Message ID: 20210423173018.23133-10-jack@suse.cz (mailing list archive)
State: New, archived
Series: fs: Hole punch vs page cache filling races
Hi Jan,
I love your patch! Perhaps something to improve:
[auto build test WARNING on ext4/dev]
[also build test WARNING on fuse/for-next linus/master v5.12-rc8]
[cannot apply to hnaz-linux-mm/master next-20210423]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Jan-Kara/fs-Hole-punch-vs-page-cache-filling-races/20210424-013114
base: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
config: s390-randconfig-s031-20210424 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# https://github.com/0day-ci/linux/commit/800cf89f11d437415eead2da969c3b07908fd406
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jan-Kara/fs-Hole-punch-vs-page-cache-filling-races/20210424-013114
git checkout 800cf89f11d437415eead2da969c3b07908fd406
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=s390
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
mm/shmem.c: In function 'shmem_writepage':
>> mm/shmem.c:1326:10: warning: variable 'index' set but not used [-Wunused-but-set-variable]
1326 | pgoff_t index;
| ^~~~~
vim +/index +1326 mm/shmem.c
^1da177e4c3f41 Linus Torvalds 2005-04-16 1316
^1da177e4c3f41 Linus Torvalds 2005-04-16 1317 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1318 * Move the page from the page cache to the swap cache.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1319 */
^1da177e4c3f41 Linus Torvalds 2005-04-16 1320 static int shmem_writepage(struct page *page, struct writeback_control *wbc)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1321 {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1322 struct shmem_inode_info *info;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1323 struct address_space *mapping;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1324 struct inode *inode;
6922c0c7abd387 Hugh Dickins 2011-08-03 1325 swp_entry_t swap;
6922c0c7abd387 Hugh Dickins 2011-08-03 @1326 pgoff_t index;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1327
800d8c63b2e989 Kirill A. Shutemov 2016-07-26 1328 VM_BUG_ON_PAGE(PageCompound(page), page);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1329 BUG_ON(!PageLocked(page));
^1da177e4c3f41 Linus Torvalds 2005-04-16 1330 mapping = page->mapping;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1331 index = page->index;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1332 inode = mapping->host;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1333 info = SHMEM_I(inode);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1334 if (info->flags & VM_LOCKED)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1335 goto redirty;
d9fe526a83b84e Hugh Dickins 2008-02-04 1336 if (!total_swap_pages)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1337 goto redirty;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1338
d9fe526a83b84e Hugh Dickins 2008-02-04 1339 /*
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1340 * Our capabilities prevent regular writeback or sync from ever calling
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1341 * shmem_writepage; but a stacking filesystem might use ->writepage of
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1342 * its underlying filesystem, in which case tmpfs should write out to
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1343 * swap only in response to memory pressure, and not for the writeback
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1344 * threads or sync.
d9fe526a83b84e Hugh Dickins 2008-02-04 1345 */
48f170fb7d7db8 Hugh Dickins 2011-07-25 1346 if (!wbc->for_reclaim) {
48f170fb7d7db8 Hugh Dickins 2011-07-25 1347 WARN_ON_ONCE(1); /* Still happens? Tell us about it! */
48f170fb7d7db8 Hugh Dickins 2011-07-25 1348 goto redirty;
48f170fb7d7db8 Hugh Dickins 2011-07-25 1349 }
1635f6a74152f1 Hugh Dickins 2012-05-29 1350
1635f6a74152f1 Hugh Dickins 2012-05-29 1351 /*
1635f6a74152f1 Hugh Dickins 2012-05-29 1352 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
1635f6a74152f1 Hugh Dickins 2012-05-29 1353 * value into swapfile.c, the only way we can correctly account for a
1635f6a74152f1 Hugh Dickins 2012-05-29 1354 * fallocated page arriving here is now to initialize it and write it.
800cf89f11d437 Jan Kara 2021-04-23 1355 * Since a page added by currently running fallocate call cannot be
800cf89f11d437 Jan Kara 2021-04-23 1356 * dirtied and thus arrive here we know the fallocate has already
800cf89f11d437 Jan Kara 2021-04-23 1357 * completed and we are fine writing it out.
1635f6a74152f1 Hugh Dickins 2012-05-29 1358 */
1635f6a74152f1 Hugh Dickins 2012-05-29 1359 if (!PageUptodate(page)) {
1635f6a74152f1 Hugh Dickins 2012-05-29 1360 clear_highpage(page);
1635f6a74152f1 Hugh Dickins 2012-05-29 1361 flush_dcache_page(page);
1635f6a74152f1 Hugh Dickins 2012-05-29 1362 SetPageUptodate(page);
1635f6a74152f1 Hugh Dickins 2012-05-29 1363 }
1635f6a74152f1 Hugh Dickins 2012-05-29 1364
38d8b4e6bdc872 Huang Ying 2017-07-06 1365 swap = get_swap_page(page);
48f170fb7d7db8 Hugh Dickins 2011-07-25 1366 if (!swap.val)
48f170fb7d7db8 Hugh Dickins 2011-07-25 1367 goto redirty;
d9fe526a83b84e Hugh Dickins 2008-02-04 1368
b1dea800ac3959 Hugh Dickins 2011-05-11 1369 /*
b1dea800ac3959 Hugh Dickins 2011-05-11 1370 * Add inode to shmem_unuse()'s list of swapped-out inodes,
6922c0c7abd387 Hugh Dickins 2011-08-03 1371 * if it's not already there. Do it now before the page is
6922c0c7abd387 Hugh Dickins 2011-08-03 1372 * moved to swap cache, when its pagelock no longer protects
b1dea800ac3959 Hugh Dickins 2011-05-11 1373 * the inode from eviction. But don't unlock the mutex until
6922c0c7abd387 Hugh Dickins 2011-08-03 1374 * we've incremented swapped, because shmem_unuse_inode() will
6922c0c7abd387 Hugh Dickins 2011-08-03 1375 * prune a !swapped inode from the swaplist under this mutex.
b1dea800ac3959 Hugh Dickins 2011-05-11 1376 */
b1dea800ac3959 Hugh Dickins 2011-05-11 1377 mutex_lock(&shmem_swaplist_mutex);
05bf86b4ccfd0f Hugh Dickins 2011-05-14 1378 if (list_empty(&info->swaplist))
b56a2d8af9147a Vineeth Remanan Pillai 2019-03-05 1379 list_add(&info->swaplist, &shmem_swaplist);
b1dea800ac3959 Hugh Dickins 2011-05-11 1380
4afab1cd256e42 Yang Shi 2019-11-30 1381 if (add_to_swap_cache(page, swap,
3852f6768ede54 Joonsoo Kim 2020-08-11 1382 __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
3852f6768ede54 Joonsoo Kim 2020-08-11 1383 NULL) == 0) {
4595ef88d13613 Kirill A. Shutemov 2016-07-26 1384 spin_lock_irq(&info->lock);
6922c0c7abd387 Hugh Dickins 2011-08-03 1385 shmem_recalc_inode(inode);
267a4c76bbdb95 Hugh Dickins 2015-12-11 1386 info->swapped++;
4595ef88d13613 Kirill A. Shutemov 2016-07-26 1387 spin_unlock_irq(&info->lock);
6922c0c7abd387 Hugh Dickins 2011-08-03 1388
267a4c76bbdb95 Hugh Dickins 2015-12-11 1389 swap_shmem_alloc(swap);
267a4c76bbdb95 Hugh Dickins 2015-12-11 1390 shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
267a4c76bbdb95 Hugh Dickins 2015-12-11 1391
6922c0c7abd387 Hugh Dickins 2011-08-03 1392 mutex_unlock(&shmem_swaplist_mutex);
d9fe526a83b84e Hugh Dickins 2008-02-04 1393 BUG_ON(page_mapped(page));
9fab5619bdd7f8 Hugh Dickins 2009-03-31 1394 swap_writepage(page, wbc);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1395 return 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1396 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1397
6922c0c7abd387 Hugh Dickins 2011-08-03 1398 mutex_unlock(&shmem_swaplist_mutex);
75f6d6d29a40b5 Minchan Kim 2017-07-06 1399 put_swap_page(page, swap);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1400 redirty:
^1da177e4c3f41 Linus Torvalds 2005-04-16 1401 set_page_dirty(page);
d9fe526a83b84e Hugh Dickins 2008-02-04 1402 if (wbc->for_reclaim)
d9fe526a83b84e Hugh Dickins 2008-02-04 1403 return AOP_WRITEPAGE_ACTIVATE; /* Return with page locked */
d9fe526a83b84e Hugh Dickins 2008-02-04 1404 unlock_page(page);
d9fe526a83b84e Hugh Dickins 2008-02-04 1405 return 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1406 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1407
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
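
The warning itself is mechanical: commit 800cf89f removes the only reader of 'index' (the shmem_falloc range check in shmem_writepage()), so the variable is now set but never used. A minimal fixup, assuming the patch were kept, would drop the declaration and the assignment; the hunk below is illustrative (line numbers approximate), not part of the posted series:

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1322,12 +1322,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct shmem_inode_info *info;
 	struct address_space *mapping;
 	struct inode *inode;
 	swp_entry_t swap;
-	pgoff_t index;
 
 	VM_BUG_ON_PAGE(PageCompound(page), page);
 	BUG_ON(!PageLocked(page));
 	mapping = page->mapping;
-	index = page->index;
 	inode = mapping->host;
 	info = SHMEM_I(inode);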
On Fri, 23 Apr 2021, Jan Kara wrote:
> We have to handle pages added by currently running shmem_fallocate()
> specially in shmem_writepage(). For this we use serialization mechanism
> using structure attached to inode->i_private field. If we protect
> allocation of pages in shmem_fallocate() with invalidate_lock instead,
> we are sure added pages cannot be dirtied until shmem_fallocate() is done
> (invalidate_lock blocks faults, i_rwsem blocks writes) and thus we
> cannot see those pages in shmem_writepage() and there's no need for the
> serialization mechanism.

Appealing diffstat, but NAK, this patch is based on a false premise.

All those pages that are added by shmem_fallocate() are marked dirty:
see the set_page_dirty(page) and comment above it in shmem_fallocate().

It's intentional, that they should percolate through to shmem_writepage()
when memory is tight, and give feedback to fallocate that it should stop
(instead of getting into that horrid OOM-kill flurry that never even
frees any tmpfs anyway).

Hugh

>
> CC: Hugh Dickins <hughd@google.com>
> CC: <linux-mm@kvack.org>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  mm/shmem.c | 61 ++++++------------------------------------------------
>  1 file changed, 6 insertions(+), 55 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index f34162ac46de..7a2b0744031e 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -94,18 +94,6 @@ static struct vfsmount *shm_mnt;
>  /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
>  #define SHORT_SYMLINK_LEN 128
>
> -/*
> - * shmem_fallocate communicates with shmem_writepage via inode->i_private (with
> - * i_rwsem making sure that it has only one user at a time): we would prefer
> - * not to enlarge the shmem inode just for that.
> - */
> -struct shmem_falloc {
> -	pgoff_t start;		/* start of range currently being fallocated */
> -	pgoff_t next;		/* the next page offset to be fallocated */
> -	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
> -	pgoff_t nr_unswapped;	/* how often writepage refused to swap out */
> -};
> -
>  struct shmem_options {
>  	unsigned long long blocks;
>  	unsigned long long inodes;
> @@ -1364,28 +1352,11 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
>  	 * value into swapfile.c, the only way we can correctly account for a
>  	 * fallocated page arriving here is now to initialize it and write it.
> -	 *
> -	 * That's okay for a page already fallocated earlier, but if we have
> -	 * not yet completed the fallocation, then (a) we want to keep track
> -	 * of this page in case we have to undo it, and (b) it may not be a
> -	 * good idea to continue anyway, once we're pushing into swap. So
> -	 * reactivate the page, and let shmem_fallocate() quit when too many.

(b) there commenting on communicating back to fallocate by nr_unswapped.

> +	 * Since a page added by currently running fallocate call cannot be
> +	 * dirtied and thus arrive here we know the fallocate has already
> +	 * completed and we are fine writing it out.
>  	 */
>  	if (!PageUptodate(page)) {
> -		if (inode->i_private) {
> -			struct shmem_falloc *shmem_falloc;
> -			spin_lock(&inode->i_lock);
> -			shmem_falloc = inode->i_private;
> -			if (shmem_falloc &&
> -			    index >= shmem_falloc->start &&
> -			    index < shmem_falloc->next)
> -				shmem_falloc->nr_unswapped++;
> -			else
> -				shmem_falloc = NULL;
> -			spin_unlock(&inode->i_lock);
> -			if (shmem_falloc)
> -				goto redirty;
> -		}
>  		clear_highpage(page);
>  		flush_dcache_page(page);
>  		SetPageUptodate(page);
> @@ -2629,9 +2600,9 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  							 loff_t len)
>  {
>  	struct inode *inode = file_inode(file);
> +	struct address_space *mapping = file->f_mapping;
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> -	struct shmem_falloc shmem_falloc;
>  	pgoff_t start, index, end;
>  	int error;
>
> @@ -2641,7 +2612,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  	inode_lock(inode);
>
>  	if (mode & FALLOC_FL_PUNCH_HOLE) {
> -		struct address_space *mapping = file->f_mapping;
>  		loff_t unmap_start = round_up(offset, PAGE_SIZE);
>  		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
>
> @@ -2680,14 +2650,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  		goto out;
>  	}
>
> -	shmem_falloc.start = start;
> -	shmem_falloc.next = start;
> -	shmem_falloc.nr_falloced = 0;
> -	shmem_falloc.nr_unswapped = 0;
> -	spin_lock(&inode->i_lock);
> -	inode->i_private = &shmem_falloc;
> -	spin_unlock(&inode->i_lock);
> -
> +	down_write(&mapping->invalidate_lock);
>  	for (index = start; index < end; index++) {
>  		struct page *page;
>
> @@ -2697,8 +2660,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  		 */
>  		if (signal_pending(current))
>  			error = -EINTR;
> -		else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
> -			error = -ENOMEM;
>  		else
>  			error = shmem_getpage(inode, index, &page, SGP_FALLOC);
>  		if (error) {
> @@ -2711,14 +2672,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  			goto undone;
>  		}
>
> -		/*
> -		 * Inform shmem_writepage() how far we have reached.
> -		 * No need for lock or barrier: we have the page lock.
> -		 */
> -		shmem_falloc.next++;
> -		if (!PageUptodate(page))
> -			shmem_falloc.nr_falloced++;
> -
>  		/*
>  		 * If !PageUptodate, leave it that way so that freeable pages
>  		 * can be recognized if we need to rollback on error later.

Which goes on to say:

		 * But set_page_dirty so that memory pressure will swap rather
		 * than free the pages we are allocating (and SGP_CACHE pages
		 * might still be clean: we now need to mark those dirty too).
		 */
		set_page_dirty(page);
		unlock_page(page);
		put_page(page);
		cond_resched();

> @@ -2736,9 +2689,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  		i_size_write(inode, offset + len);
>  		inode->i_ctime = current_time(inode);
>  undone:
> -	spin_lock(&inode->i_lock);
> -	inode->i_private = NULL;
> -	spin_unlock(&inode->i_lock);
> +	up_write(&mapping->invalidate_lock);
>  out:
>  	inode_unlock(inode);
>  	return error;
> --
> 2.26.2
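
To make the feedback loop Hugh describes easier to follow, here is a condensed sketch of the two sides of the i_private handshake as they exist before this patch. It is simplified from the code quoted above (locking, error paths and page-cache details trimmed), and the _sketch function names are hypothetical, not the actual mm/shmem.c entry points:

/* Condensed, simplified sketch -- not the real mm/shmem.c code. */
struct shmem_falloc {
	pgoff_t start;		/* start of range currently being fallocated */
	pgoff_t next;		/* the next page offset to be fallocated */
	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
	pgoff_t nr_unswapped;	/* how often writepage refused to swap out */
};

/* fallocate side: publish progress through inode->i_private */
static long shmem_fallocate_sketch(struct inode *inode, pgoff_t start, pgoff_t end)
{
	struct shmem_falloc shmem_falloc = {
		.start = start, .next = start,
		.nr_falloced = 0, .nr_unswapped = 0,
	};
	pgoff_t index;
	long error = 0;

	inode->i_private = &shmem_falloc;	/* under inode->i_lock in reality */

	for (index = start; index < end; index++) {
		/*
		 * writepage has redirtied more pages than we managed to
		 * allocate: memory is too tight, so stop here instead of
		 * pushing on into an OOM-kill flurry.
		 */
		if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced) {
			error = -ENOMEM;
			break;
		}
		/* ... allocate the page at index and set_page_dirty() it ... */
		shmem_falloc.next++;
		shmem_falloc.nr_falloced++;
	}

	inode->i_private = NULL;
	return error;
}

/* writepage side: refuse to swap a page the running fallocate just added */
static bool shmem_writepage_refuses_sketch(struct inode *inode, pgoff_t index)
{
	struct shmem_falloc *f = inode->i_private;	/* under inode->i_lock in reality */

	if (f && index >= f->start && index < f->next) {
		f->nr_unswapped++;	/* feedback: tell fallocate to quit */
		return true;		/* caller does "goto redirty" */
	}
	return false;
}

This is exactly the coupling the patch below removes, and why the removal loses the memory-pressure feedback Hugh is defending.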
On Wed 28-04-21 20:24:59, Hugh Dickins wrote:
> On Fri, 23 Apr 2021, Jan Kara wrote:
>
> > We have to handle pages added by currently running shmem_fallocate()
> > specially in shmem_writepage(). For this we use serialization mechanism
> > using structure attached to inode->i_private field. If we protect
> > allocation of pages in shmem_fallocate() with invalidate_lock instead,
> > we are sure added pages cannot be dirtied until shmem_fallocate() is done
> > (invalidate_lock blocks faults, i_rwsem blocks writes) and thus we
> > cannot see those pages in shmem_writepage() and there's no need for the
> > serialization mechanism.
>
> Appealing diffstat, but NAK, this patch is based on a false premise.
>
> All those pages that are added by shmem_fallocate() are marked dirty:
> see the set_page_dirty(page) and comment above it in shmem_fallocate().

Aha, I missed that set_page_dirty(). Thanks for correcting me.

> It's intentional, that they should percolate through to shmem_writepage()
> when memory is tight, and give feedback to fallocate that it should stop
> (instead of getting into that horrid OOM-kill flurry that never even
> frees any tmpfs anyway).

I understand the reason, I just feel there should be a better way of
communicating memory pressure than ->writepage talking to ->fallocate
through structure attached to inode->i_private. But now I see it isn't as
simple as I thought ;) Anyway for now I'll just remove this patch. Thanks
for review!

								Honza

> > CC: Hugh Dickins <hughd@google.com>
> > CC: <linux-mm@kvack.org>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  mm/shmem.c | 61 ++++++------------------------------------------------
> >  1 file changed, 6 insertions(+), 55 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index f34162ac46de..7a2b0744031e 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -94,18 +94,6 @@ static struct vfsmount *shm_mnt;
> >  /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
> >  #define SHORT_SYMLINK_LEN 128
> >
> > -/*
> > - * shmem_fallocate communicates with shmem_writepage via inode->i_private (with
> > - * i_rwsem making sure that it has only one user at a time): we would prefer
> > - * not to enlarge the shmem inode just for that.
> > - */
> > -struct shmem_falloc {
> > -	pgoff_t start;		/* start of range currently being fallocated */
> > -	pgoff_t next;		/* the next page offset to be fallocated */
> > -	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
> > -	pgoff_t nr_unswapped;	/* how often writepage refused to swap out */
> > -};
> > -
> >  struct shmem_options {
> >  	unsigned long long blocks;
> >  	unsigned long long inodes;
> > @@ -1364,28 +1352,11 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >  	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
> >  	 * value into swapfile.c, the only way we can correctly account for a
> >  	 * fallocated page arriving here is now to initialize it and write it.
> > -	 *
> > -	 * That's okay for a page already fallocated earlier, but if we have
> > -	 * not yet completed the fallocation, then (a) we want to keep track
> > -	 * of this page in case we have to undo it, and (b) it may not be a
> > -	 * good idea to continue anyway, once we're pushing into swap. So
> > -	 * reactivate the page, and let shmem_fallocate() quit when too many.
>
> (b) there commenting on communicating back to fallocate by nr_unswapped.
>
> > +	 * Since a page added by currently running fallocate call cannot be
> > +	 * dirtied and thus arrive here we know the fallocate has already
> > +	 * completed and we are fine writing it out.
> >  	 */
> >  	if (!PageUptodate(page)) {
> > -		if (inode->i_private) {
> > -			struct shmem_falloc *shmem_falloc;
> > -			spin_lock(&inode->i_lock);
> > -			shmem_falloc = inode->i_private;
> > -			if (shmem_falloc &&
> > -			    index >= shmem_falloc->start &&
> > -			    index < shmem_falloc->next)
> > -				shmem_falloc->nr_unswapped++;
> > -			else
> > -				shmem_falloc = NULL;
> > -			spin_unlock(&inode->i_lock);
> > -			if (shmem_falloc)
> > -				goto redirty;
> > -		}
> >  		clear_highpage(page);
> >  		flush_dcache_page(page);
> >  		SetPageUptodate(page);
> > @@ -2629,9 +2600,9 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  							 loff_t len)
> >  {
> >  	struct inode *inode = file_inode(file);
> > +	struct address_space *mapping = file->f_mapping;
> >  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> >  	struct shmem_inode_info *info = SHMEM_I(inode);
> > -	struct shmem_falloc shmem_falloc;
> >  	pgoff_t start, index, end;
> >  	int error;
> >
> > @@ -2641,7 +2612,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  	inode_lock(inode);
> >
> >  	if (mode & FALLOC_FL_PUNCH_HOLE) {
> > -		struct address_space *mapping = file->f_mapping;
> >  		loff_t unmap_start = round_up(offset, PAGE_SIZE);
> >  		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
> >
> > @@ -2680,14 +2650,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  		goto out;
> >  	}
> >
> > -	shmem_falloc.start = start;
> > -	shmem_falloc.next = start;
> > -	shmem_falloc.nr_falloced = 0;
> > -	shmem_falloc.nr_unswapped = 0;
> > -	spin_lock(&inode->i_lock);
> > -	inode->i_private = &shmem_falloc;
> > -	spin_unlock(&inode->i_lock);
> > -
> > +	down_write(&mapping->invalidate_lock);
> >  	for (index = start; index < end; index++) {
> >  		struct page *page;
> >
> > @@ -2697,8 +2660,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  		 */
> >  		if (signal_pending(current))
> >  			error = -EINTR;
> > -		else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
> > -			error = -ENOMEM;
> >  		else
> >  			error = shmem_getpage(inode, index, &page, SGP_FALLOC);
> >  		if (error) {
> > @@ -2711,14 +2672,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  			goto undone;
> >  		}
> >
> > -		/*
> > -		 * Inform shmem_writepage() how far we have reached.
> > -		 * No need for lock or barrier: we have the page lock.
> > -		 */
> > -		shmem_falloc.next++;
> > -		if (!PageUptodate(page))
> > -			shmem_falloc.nr_falloced++;
> > -
> >  		/*
> >  		 * If !PageUptodate, leave it that way so that freeable pages
> >  		 * can be recognized if we need to rollback on error later.
>
> Which goes on to say:
>
> 		 * But set_page_dirty so that memory pressure will swap rather
> 		 * than free the pages we are allocating (and SGP_CACHE pages
> 		 * might still be clean: we now need to mark those dirty too).
> 		 */
> 		set_page_dirty(page);
> 		unlock_page(page);
> 		put_page(page);
> 		cond_resched();
>
> > @@ -2736,9 +2689,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  		i_size_write(inode, offset + len);
> >  		inode->i_ctime = current_time(inode);
> >  undone:
> > -	spin_lock(&inode->i_lock);
> > -	inode->i_private = NULL;
> > -	spin_unlock(&inode->i_lock);
> > +	up_write(&mapping->invalidate_lock);
> >  out:
> >  	inode_unlock(inode);
> >  	return error;
> > --
> > 2.26.2
diff --git a/mm/shmem.c b/mm/shmem.c
index f34162ac46de..7a2b0744031e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -94,18 +94,6 @@ static struct vfsmount *shm_mnt;
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
-/*
- * shmem_fallocate communicates with shmem_writepage via inode->i_private (with
- * i_rwsem making sure that it has only one user at a time): we would prefer
- * not to enlarge the shmem inode just for that.
- */
-struct shmem_falloc {
-	pgoff_t start;		/* start of range currently being fallocated */
-	pgoff_t next;		/* the next page offset to be fallocated */
-	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
-	pgoff_t nr_unswapped;	/* how often writepage refused to swap out */
-};
-
 struct shmem_options {
 	unsigned long long blocks;
 	unsigned long long inodes;
@@ -1364,28 +1352,11 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
 	 * value into swapfile.c, the only way we can correctly account for a
 	 * fallocated page arriving here is now to initialize it and write it.
-	 *
-	 * That's okay for a page already fallocated earlier, but if we have
-	 * not yet completed the fallocation, then (a) we want to keep track
-	 * of this page in case we have to undo it, and (b) it may not be a
-	 * good idea to continue anyway, once we're pushing into swap. So
-	 * reactivate the page, and let shmem_fallocate() quit when too many.
+	 * Since a page added by currently running fallocate call cannot be
+	 * dirtied and thus arrive here we know the fallocate has already
+	 * completed and we are fine writing it out.
 	 */
 	if (!PageUptodate(page)) {
-		if (inode->i_private) {
-			struct shmem_falloc *shmem_falloc;
-			spin_lock(&inode->i_lock);
-			shmem_falloc = inode->i_private;
-			if (shmem_falloc &&
-			    index >= shmem_falloc->start &&
-			    index < shmem_falloc->next)
-				shmem_falloc->nr_unswapped++;
-			else
-				shmem_falloc = NULL;
-			spin_unlock(&inode->i_lock);
-			if (shmem_falloc)
-				goto redirty;
-		}
 		clear_highpage(page);
 		flush_dcache_page(page);
 		SetPageUptodate(page);
@@ -2629,9 +2600,9 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 							 loff_t len)
 {
 	struct inode *inode = file_inode(file);
+	struct address_space *mapping = file->f_mapping;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_falloc shmem_falloc;
 	pgoff_t start, index, end;
 	int error;
 
@@ -2641,7 +2612,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	inode_lock(inode);
 
 	if (mode & FALLOC_FL_PUNCH_HOLE) {
-		struct address_space *mapping = file->f_mapping;
 		loff_t unmap_start = round_up(offset, PAGE_SIZE);
 		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
 
@@ -2680,14 +2650,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		goto out;
 	}
 
-	shmem_falloc.start = start;
-	shmem_falloc.next = start;
-	shmem_falloc.nr_falloced = 0;
-	shmem_falloc.nr_unswapped = 0;
-	spin_lock(&inode->i_lock);
-	inode->i_private = &shmem_falloc;
-	spin_unlock(&inode->i_lock);
-
+	down_write(&mapping->invalidate_lock);
 	for (index = start; index < end; index++) {
 		struct page *page;
 
@@ -2697,8 +2660,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		 */
 		if (signal_pending(current))
 			error = -EINTR;
-		else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
-			error = -ENOMEM;
 		else
 			error = shmem_getpage(inode, index, &page, SGP_FALLOC);
 		if (error) {
@@ -2711,14 +2672,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 			goto undone;
 		}
 
-		/*
-		 * Inform shmem_writepage() how far we have reached.
-		 * No need for lock or barrier: we have the page lock.
-		 */
-		shmem_falloc.next++;
-		if (!PageUptodate(page))
-			shmem_falloc.nr_falloced++;
-
 		/*
 		 * If !PageUptodate, leave it that way so that freeable pages
 		 * can be recognized if we need to rollback on error later.
@@ -2736,9 +2689,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		i_size_write(inode, offset + len);
 		inode->i_ctime = current_time(inode);
 undone:
-	spin_lock(&inode->i_lock);
-	inode->i_private = NULL;
-	spin_unlock(&inode->i_lock);
+	up_write(&mapping->invalidate_lock);
 out:
 	inode_unlock(inode);
 	return error;
We have to handle pages added by currently running shmem_fallocate()
specially in shmem_writepage(). For this we use serialization mechanism
using structure attached to inode->i_private field. If we protect
allocation of pages in shmem_fallocate() with invalidate_lock instead,
we are sure added pages cannot be dirtied until shmem_fallocate() is done
(invalidate_lock blocks faults, i_rwsem blocks writes) and thus we
cannot see those pages in shmem_writepage() and there's no need for the
serialization mechanism.

CC: Hugh Dickins <hughd@google.com>
CC: <linux-mm@kvack.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/shmem.c | 61 ++++++------------------------------------------------
 1 file changed, 6 insertions(+), 55 deletions(-)
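
For readers following the series, the locking scheme this patch relied on reduces to the pattern below. Note that the patch was dropped because its premise is mistaken, as Hugh points out above: shmem_fallocate() itself dirties the pages it adds, so they can reach shmem_writepage() while fallocate is still running. The sketch only shows the intended writer/reader pairing on mapping->invalidate_lock; the function names are illustrative, not the actual mm/shmem.c code:

/* Illustrative sketch of the intended invalidate_lock pairing; not the
 * real mm/shmem.c functions, and the premise it encodes did not hold. */

static long fallocate_fill_sketch(struct address_space *mapping,
				  pgoff_t start, pgoff_t end)
{
	pgoff_t index;

	/* Writer side: while held, no page in the range can be faulted in,
	 * which (it was assumed) meant none could be dirtied either. */
	down_write(&mapping->invalidate_lock);
	for (index = start; index < end; index++) {
		/* ... allocate and add the page at index ... */
	}
	up_write(&mapping->invalidate_lock);
	return 0;
}

static int fault_path_sketch(struct address_space *mapping, pgoff_t index)
{
	int ret = 0;

	/* Reader side: faults serialize against the filling above, just as
	 * they serialize against hole punching elsewhere in the series. */
	down_read(&mapping->invalidate_lock);
	/* ... look up or instantiate the page at index ... */
	up_read(&mapping->invalidate_lock);
	return ret;
}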