Message ID: 20210423173018.23133-10-jack@suse.cz (mailing list archive)
State: New, archived
Series: fs: Hole punch vs page cache filling races
Hi Jan,
I love your patch! Perhaps something to improve:
[auto build test WARNING on ext4/dev]
[also build test WARNING on fuse/for-next linus/master v5.12-rc8]
[cannot apply to hnaz-linux-mm/master next-20210423]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Jan-Kara/fs-Hole-punch-vs-page-cache-filling-races/20210424-013114
base: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
config: s390-randconfig-s031-20210424 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# https://github.com/0day-ci/linux/commit/800cf89f11d437415eead2da969c3b07908fd406
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jan-Kara/fs-Hole-punch-vs-page-cache-filling-races/20210424-013114
git checkout 800cf89f11d437415eead2da969c3b07908fd406
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=s390
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
mm/shmem.c: In function 'shmem_writepage':
>> mm/shmem.c:1326:10: warning: variable 'index' set but not used [-Wunused-but-set-variable]
1326 | pgoff_t index;
| ^~~~~
vim +/index +1326 mm/shmem.c
^1da177e4c3f41 Linus Torvalds 2005-04-16 1316
^1da177e4c3f41 Linus Torvalds 2005-04-16 1317 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1318 * Move the page from the page cache to the swap cache.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1319 */
^1da177e4c3f41 Linus Torvalds 2005-04-16 1320 static int shmem_writepage(struct page *page, struct writeback_control *wbc)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1321 {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1322 struct shmem_inode_info *info;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1323 struct address_space *mapping;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1324 struct inode *inode;
6922c0c7abd387 Hugh Dickins 2011-08-03 1325 swp_entry_t swap;
6922c0c7abd387 Hugh Dickins 2011-08-03 @1326 pgoff_t index;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1327
800d8c63b2e989 Kirill A. Shutemov 2016-07-26 1328 VM_BUG_ON_PAGE(PageCompound(page), page);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1329 BUG_ON(!PageLocked(page));
^1da177e4c3f41 Linus Torvalds 2005-04-16 1330 mapping = page->mapping;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1331 index = page->index;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1332 inode = mapping->host;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1333 info = SHMEM_I(inode);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1334 if (info->flags & VM_LOCKED)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1335 goto redirty;
d9fe526a83b84e Hugh Dickins 2008-02-04 1336 if (!total_swap_pages)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1337 goto redirty;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1338
d9fe526a83b84e Hugh Dickins 2008-02-04 1339 /*
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1340 * Our capabilities prevent regular writeback or sync from ever calling
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1341 * shmem_writepage; but a stacking filesystem might use ->writepage of
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1342 * its underlying filesystem, in which case tmpfs should write out to
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1343 * swap only in response to memory pressure, and not for the writeback
97b713ba3ebaa6 Christoph Hellwig 2015-01-14 1344 * threads or sync.
d9fe526a83b84e Hugh Dickins 2008-02-04 1345 */
48f170fb7d7db8 Hugh Dickins 2011-07-25 1346 if (!wbc->for_reclaim) {
48f170fb7d7db8 Hugh Dickins 2011-07-25 1347 WARN_ON_ONCE(1); /* Still happens? Tell us about it! */
48f170fb7d7db8 Hugh Dickins 2011-07-25 1348 goto redirty;
48f170fb7d7db8 Hugh Dickins 2011-07-25 1349 }
1635f6a74152f1 Hugh Dickins 2012-05-29 1350
1635f6a74152f1 Hugh Dickins 2012-05-29 1351 /*
1635f6a74152f1 Hugh Dickins 2012-05-29 1352 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
1635f6a74152f1 Hugh Dickins 2012-05-29 1353 * value into swapfile.c, the only way we can correctly account for a
1635f6a74152f1 Hugh Dickins 2012-05-29 1354 * fallocated page arriving here is now to initialize it and write it.
800cf89f11d437 Jan Kara 2021-04-23 1355 * Since a page added by currently running fallocate call cannot be
800cf89f11d437 Jan Kara 2021-04-23 1356 * dirtied and thus arrive here we know the fallocate has already
800cf89f11d437 Jan Kara 2021-04-23 1357 * completed and we are fine writing it out.
1635f6a74152f1 Hugh Dickins 2012-05-29 1358 */
1635f6a74152f1 Hugh Dickins 2012-05-29 1359 if (!PageUptodate(page)) {
1635f6a74152f1 Hugh Dickins 2012-05-29 1360 clear_highpage(page);
1635f6a74152f1 Hugh Dickins 2012-05-29 1361 flush_dcache_page(page);
1635f6a74152f1 Hugh Dickins 2012-05-29 1362 SetPageUptodate(page);
1635f6a74152f1 Hugh Dickins 2012-05-29 1363 }
1635f6a74152f1 Hugh Dickins 2012-05-29 1364
38d8b4e6bdc872 Huang Ying 2017-07-06 1365 swap = get_swap_page(page);
48f170fb7d7db8 Hugh Dickins 2011-07-25 1366 if (!swap.val)
48f170fb7d7db8 Hugh Dickins 2011-07-25 1367 goto redirty;
d9fe526a83b84e Hugh Dickins 2008-02-04 1368
b1dea800ac3959 Hugh Dickins 2011-05-11 1369 /*
b1dea800ac3959 Hugh Dickins 2011-05-11 1370 * Add inode to shmem_unuse()'s list of swapped-out inodes,
6922c0c7abd387 Hugh Dickins 2011-08-03 1371 * if it's not already there. Do it now before the page is
6922c0c7abd387 Hugh Dickins 2011-08-03 1372 * moved to swap cache, when its pagelock no longer protects
b1dea800ac3959 Hugh Dickins 2011-05-11 1373 * the inode from eviction. But don't unlock the mutex until
6922c0c7abd387 Hugh Dickins 2011-08-03 1374 * we've incremented swapped, because shmem_unuse_inode() will
6922c0c7abd387 Hugh Dickins 2011-08-03 1375 * prune a !swapped inode from the swaplist under this mutex.
b1dea800ac3959 Hugh Dickins 2011-05-11 1376 */
b1dea800ac3959 Hugh Dickins 2011-05-11 1377 mutex_lock(&shmem_swaplist_mutex);
05bf86b4ccfd0f Hugh Dickins 2011-05-14 1378 if (list_empty(&info->swaplist))
b56a2d8af9147a Vineeth Remanan Pillai 2019-03-05 1379 list_add(&info->swaplist, &shmem_swaplist);
b1dea800ac3959 Hugh Dickins 2011-05-11 1380
4afab1cd256e42 Yang Shi 2019-11-30 1381 if (add_to_swap_cache(page, swap,
3852f6768ede54 Joonsoo Kim 2020-08-11 1382 __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
3852f6768ede54 Joonsoo Kim 2020-08-11 1383 NULL) == 0) {
4595ef88d13613 Kirill A. Shutemov 2016-07-26 1384 spin_lock_irq(&info->lock);
6922c0c7abd387 Hugh Dickins 2011-08-03 1385 shmem_recalc_inode(inode);
267a4c76bbdb95 Hugh Dickins 2015-12-11 1386 info->swapped++;
4595ef88d13613 Kirill A. Shutemov 2016-07-26 1387 spin_unlock_irq(&info->lock);
6922c0c7abd387 Hugh Dickins 2011-08-03 1388
267a4c76bbdb95 Hugh Dickins 2015-12-11 1389 swap_shmem_alloc(swap);
267a4c76bbdb95 Hugh Dickins 2015-12-11 1390 shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
267a4c76bbdb95 Hugh Dickins 2015-12-11 1391
6922c0c7abd387 Hugh Dickins 2011-08-03 1392 mutex_unlock(&shmem_swaplist_mutex);
d9fe526a83b84e Hugh Dickins 2008-02-04 1393 BUG_ON(page_mapped(page));
9fab5619bdd7f8 Hugh Dickins 2009-03-31 1394 swap_writepage(page, wbc);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1395 return 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1396 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1397
6922c0c7abd387 Hugh Dickins 2011-08-03 1398 mutex_unlock(&shmem_swaplist_mutex);
75f6d6d29a40b5 Minchan Kim 2017-07-06 1399 put_swap_page(page, swap);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1400 redirty:
^1da177e4c3f41 Linus Torvalds 2005-04-16 1401 set_page_dirty(page);
d9fe526a83b84e Hugh Dickins 2008-02-04 1402 if (wbc->for_reclaim)
d9fe526a83b84e Hugh Dickins 2008-02-04 1403 return AOP_WRITEPAGE_ACTIVATE; /* Return with page locked */
d9fe526a83b84e Hugh Dickins 2008-02-04 1404 unlock_page(page);
d9fe526a83b84e Hugh Dickins 2008-02-04 1405 return 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1406 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1407
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
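
The warning itself is mechanical: commit 800cf89f removes the only reader of 'index' (the shmem_falloc range check in shmem_writepage()), so the variable is now set but never used. A minimal fixup, assuming the patch were kept, would drop the declaration and the assignment; the hunk below is illustrative (line numbers approximate), not part of the posted series:

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1322,12 +1322,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct shmem_inode_info *info;
 	struct address_space *mapping;
 	struct inode *inode;
 	swp_entry_t swap;
-	pgoff_t index;
 
 	VM_BUG_ON_PAGE(PageCompound(page), page);
 	BUG_ON(!PageLocked(page));
 	mapping = page->mapping;
-	index = page->index;
 	inode = mapping->host;
 	info = SHMEM_I(inode);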
On Fri, 23 Apr 2021, Jan Kara wrote:
> We have to handle pages added by currently running shmem_fallocate()
> specially in shmem_writepage(). For this we use serialization mechanism
> using structure attached to inode->i_private field. If we protect
> allocation of pages in shmem_fallocate() with invalidate_lock instead,
> we are sure added pages cannot be dirtied until shmem_fallocate() is done
> (invalidate_lock blocks faults, i_rwsem blocks writes) and thus we
> cannot see those pages in shmem_writepage() and there's no need for the
> serialization mechanism.

Appealing diffstat, but NAK, this patch is based on a false premise.

All those pages that are added by shmem_fallocate() are marked dirty:
see the set_page_dirty(page) and comment above it in shmem_fallocate().

It's intentional, that they should percolate through to shmem_writepage()
when memory is tight, and give feedback to fallocate that it should stop
(instead of getting into that horrid OOM-kill flurry that never even
frees any tmpfs anyway).

Hugh

>
> CC: Hugh Dickins <hughd@google.com>
> CC: <linux-mm@kvack.org>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  mm/shmem.c | 61 ++++++------------------------------------------------
>  1 file changed, 6 insertions(+), 55 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index f34162ac46de..7a2b0744031e 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -94,18 +94,6 @@ static struct vfsmount *shm_mnt;
>  /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
>  #define SHORT_SYMLINK_LEN 128
>
> -/*
> - * shmem_fallocate communicates with shmem_writepage via inode->i_private (with
> - * i_rwsem making sure that it has only one user at a time): we would prefer
> - * not to enlarge the shmem inode just for that.
> - */
> -struct shmem_falloc {
> -	pgoff_t start;		/* start of range currently being fallocated */
> -	pgoff_t next;		/* the next page offset to be fallocated */
> -	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
> -	pgoff_t nr_unswapped;	/* how often writepage refused to swap out */
> -};
> -
>  struct shmem_options {
>  	unsigned long long blocks;
>  	unsigned long long inodes;
> @@ -1364,28 +1352,11 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
>  	 * value into swapfile.c, the only way we can correctly account for a
>  	 * fallocated page arriving here is now to initialize it and write it.
> -	 *
> -	 * That's okay for a page already fallocated earlier, but if we have
> -	 * not yet completed the fallocation, then (a) we want to keep track
> -	 * of this page in case we have to undo it, and (b) it may not be a
> -	 * good idea to continue anyway, once we're pushing into swap. So
> -	 * reactivate the page, and let shmem_fallocate() quit when too many.

(b) there commenting on communicating back to fallocate by nr_unswapped.

> +	 * Since a page added by currently running fallocate call cannot be
> +	 * dirtied and thus arrive here we know the fallocate has already
> +	 * completed and we are fine writing it out.
>  	 */
>  	if (!PageUptodate(page)) {
> -		if (inode->i_private) {
> -			struct shmem_falloc *shmem_falloc;
> -			spin_lock(&inode->i_lock);
> -			shmem_falloc = inode->i_private;
> -			if (shmem_falloc &&
> -			    index >= shmem_falloc->start &&
> -			    index < shmem_falloc->next)
> -				shmem_falloc->nr_unswapped++;
> -			else
> -				shmem_falloc = NULL;
> -			spin_unlock(&inode->i_lock);
> -			if (shmem_falloc)
> -				goto redirty;
> -		}
>  		clear_highpage(page);
>  		flush_dcache_page(page);
>  		SetPageUptodate(page);
> @@ -2629,9 +2600,9 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  							 loff_t len)
>  {
>  	struct inode *inode = file_inode(file);
> +	struct address_space *mapping = file->f_mapping;
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> -	struct shmem_falloc shmem_falloc;
>  	pgoff_t start, index, end;
>  	int error;
>
> @@ -2641,7 +2612,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  	inode_lock(inode);
>
>  	if (mode & FALLOC_FL_PUNCH_HOLE) {
> -		struct address_space *mapping = file->f_mapping;
>  		loff_t unmap_start = round_up(offset, PAGE_SIZE);
>  		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
>
> @@ -2680,14 +2650,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  		goto out;
>  	}
>
> -	shmem_falloc.start = start;
> -	shmem_falloc.next = start;
> -	shmem_falloc.nr_falloced = 0;
> -	shmem_falloc.nr_unswapped = 0;
> -	spin_lock(&inode->i_lock);
> -	inode->i_private = &shmem_falloc;
> -	spin_unlock(&inode->i_lock);
> -
> +	down_write(&mapping->invalidate_lock);
>  	for (index = start; index < end; index++) {
>  		struct page *page;
>
> @@ -2697,8 +2660,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  		 */
>  		if (signal_pending(current))
>  			error = -EINTR;
> -		else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
> -			error = -ENOMEM;
>  		else
>  			error = shmem_getpage(inode, index, &page, SGP_FALLOC);
>  		if (error) {
> @@ -2711,14 +2672,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  			goto undone;
>  		}
>
> -		/*
> -		 * Inform shmem_writepage() how far we have reached.
> -		 * No need for lock or barrier: we have the page lock.
> -		 */
> -		shmem_falloc.next++;
> -		if (!PageUptodate(page))
> -			shmem_falloc.nr_falloced++;
> -
>  		/*
>  		 * If !PageUptodate, leave it that way so that freeable pages
>  		 * can be recognized if we need to rollback on error later.

Which goes on to say:

		 * But set_page_dirty so that memory pressure will swap rather
		 * than free the pages we are allocating (and SGP_CACHE pages
		 * might still be clean: we now need to mark those dirty too).
		 */
		set_page_dirty(page);
		unlock_page(page);
		put_page(page);
		cond_resched();

> @@ -2736,9 +2689,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  		i_size_write(inode, offset + len);
>  		inode->i_ctime = current_time(inode);
>  undone:
> -	spin_lock(&inode->i_lock);
> -	inode->i_private = NULL;
> -	spin_unlock(&inode->i_lock);
> +	up_write(&mapping->invalidate_lock);
>  out:
>  	inode_unlock(inode);
>  	return error;
> --
> 2.26.2
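
To make the feedback loop Hugh describes easier to follow, here is a condensed sketch of the two sides of the i_private handshake as they exist before this patch. It is simplified from the code quoted above (locking, error paths and page-cache details trimmed), and the _sketch function names are hypothetical, not the actual mm/shmem.c entry points:

/* Condensed, simplified sketch -- not the real mm/shmem.c code. */
struct shmem_falloc {
	pgoff_t start;		/* start of range currently being fallocated */
	pgoff_t next;		/* the next page offset to be fallocated */
	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
	pgoff_t nr_unswapped;	/* how often writepage refused to swap out */
};

/* fallocate side: publish progress through inode->i_private */
static long shmem_fallocate_sketch(struct inode *inode, pgoff_t start, pgoff_t end)
{
	struct shmem_falloc shmem_falloc = {
		.start = start, .next = start,
		.nr_falloced = 0, .nr_unswapped = 0,
	};
	pgoff_t index;
	long error = 0;

	inode->i_private = &shmem_falloc;	/* under inode->i_lock in reality */

	for (index = start; index < end; index++) {
		/*
		 * writepage has redirtied more pages than we managed to
		 * allocate: memory is too tight, so stop here instead of
		 * pushing on into an OOM-kill flurry.
		 */
		if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced) {
			error = -ENOMEM;
			break;
		}
		/* ... allocate the page at index and set_page_dirty() it ... */
		shmem_falloc.next++;
		shmem_falloc.nr_falloced++;
	}

	inode->i_private = NULL;
	return error;
}

/* writepage side: refuse to swap a page the running fallocate just added */
static bool shmem_writepage_refuses_sketch(struct inode *inode, pgoff_t index)
{
	struct shmem_falloc *f = inode->i_private;	/* under inode->i_lock in reality */

	if (f && index >= f->start && index < f->next) {
		f->nr_unswapped++;	/* feedback: tell fallocate to quit */
		return true;		/* caller does "goto redirty" */
	}
	return false;
}

This is exactly the coupling the patch below removes, and why the removal loses the memory-pressure feedback Hugh is defending.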
On Wed 28-04-21 20:24:59, Hugh Dickins wrote:
> On Fri, 23 Apr 2021, Jan Kara wrote:
>
> > We have to handle pages added by currently running shmem_fallocate()
> > specially in shmem_writepage(). For this we use serialization mechanism
> > using structure attached to inode->i_private field. If we protect
> > allocation of pages in shmem_fallocate() with invalidate_lock instead,
> > we are sure added pages cannot be dirtied until shmem_fallocate() is done
> > (invalidate_lock blocks faults, i_rwsem blocks writes) and thus we
> > cannot see those pages in shmem_writepage() and there's no need for the
> > serialization mechanism.
>
> Appealing diffstat, but NAK, this patch is based on a false premise.
>
> All those pages that are added by shmem_fallocate() are marked dirty:
> see the set_page_dirty(page) and comment above it in shmem_fallocate().

Aha, I missed that set_page_dirty(). Thanks for correcting me.

> It's intentional, that they should percolate through to shmem_writepage()
> when memory is tight, and give feedback to fallocate that it should stop
> (instead of getting into that horrid OOM-kill flurry that never even
> frees any tmpfs anyway).

I understand the reason, I just feel there should be a better way of
communicating memory pressure than ->writepage talking to ->fallocate
through structure attached to inode->i_private. But now I see it isn't as
simple as I thought ;) Anyway for now I'll just remove this patch. Thanks
for review!

								Honza

> > CC: Hugh Dickins <hughd@google.com>
> > CC: <linux-mm@kvack.org>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  mm/shmem.c | 61 ++++++------------------------------------------------
> >  1 file changed, 6 insertions(+), 55 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index f34162ac46de..7a2b0744031e 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -94,18 +94,6 @@ static struct vfsmount *shm_mnt;
> >  /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
> >  #define SHORT_SYMLINK_LEN 128
> >
> > -/*
> > - * shmem_fallocate communicates with shmem_writepage via inode->i_private (with
> > - * i_rwsem making sure that it has only one user at a time): we would prefer
> > - * not to enlarge the shmem inode just for that.
> > - */
> > -struct shmem_falloc {
> > -	pgoff_t start;		/* start of range currently being fallocated */
> > -	pgoff_t next;		/* the next page offset to be fallocated */
> > -	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
> > -	pgoff_t nr_unswapped;	/* how often writepage refused to swap out */
> > -};
> > -
> >  struct shmem_options {
> >  	unsigned long long blocks;
> >  	unsigned long long inodes;
> > @@ -1364,28 +1352,11 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >  	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
> >  	 * value into swapfile.c, the only way we can correctly account for a
> >  	 * fallocated page arriving here is now to initialize it and write it.
> > -	 *
> > -	 * That's okay for a page already fallocated earlier, but if we have
> > -	 * not yet completed the fallocation, then (a) we want to keep track
> > -	 * of this page in case we have to undo it, and (b) it may not be a
> > -	 * good idea to continue anyway, once we're pushing into swap. So
> > -	 * reactivate the page, and let shmem_fallocate() quit when too many.
>
> (b) there commenting on communicating back to fallocate by nr_unswapped.
>
> > +	 * Since a page added by currently running fallocate call cannot be
> > +	 * dirtied and thus arrive here we know the fallocate has already
> > +	 * completed and we are fine writing it out.
> >  	 */
> >  	if (!PageUptodate(page)) {
> > -		if (inode->i_private) {
> > -			struct shmem_falloc *shmem_falloc;
> > -			spin_lock(&inode->i_lock);
> > -			shmem_falloc = inode->i_private;
> > -			if (shmem_falloc &&
> > -			    index >= shmem_falloc->start &&
> > -			    index < shmem_falloc->next)
> > -				shmem_falloc->nr_unswapped++;
> > -			else
> > -				shmem_falloc = NULL;
> > -			spin_unlock(&inode->i_lock);
> > -			if (shmem_falloc)
> > -				goto redirty;
> > -		}
> >  		clear_highpage(page);
> >  		flush_dcache_page(page);
> >  		SetPageUptodate(page);
> > @@ -2629,9 +2600,9 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  							 loff_t len)
> >  {
> >  	struct inode *inode = file_inode(file);
> > +	struct address_space *mapping = file->f_mapping;
> >  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> >  	struct shmem_inode_info *info = SHMEM_I(inode);
> > -	struct shmem_falloc shmem_falloc;
> >  	pgoff_t start, index, end;
> >  	int error;
> >
> > @@ -2641,7 +2612,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  	inode_lock(inode);
> >
> >  	if (mode & FALLOC_FL_PUNCH_HOLE) {
> > -		struct address_space *mapping = file->f_mapping;
> >  		loff_t unmap_start = round_up(offset, PAGE_SIZE);
> >  		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
> >
> > @@ -2680,14 +2650,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  		goto out;
> >  	}
> >
> > -	shmem_falloc.start = start;
> > -	shmem_falloc.next = start;
> > -	shmem_falloc.nr_falloced = 0;
> > -	shmem_falloc.nr_unswapped = 0;
> > -	spin_lock(&inode->i_lock);
> > -	inode->i_private = &shmem_falloc;
> > -	spin_unlock(&inode->i_lock);
> > -
> > +	down_write(&mapping->invalidate_lock);
> >  	for (index = start; index < end; index++) {
> >  		struct page *page;
> >
> > @@ -2697,8 +2660,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  		 */
> >  		if (signal_pending(current))
> >  			error = -EINTR;
> > -		else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
> > -			error = -ENOMEM;
> >  		else
> >  			error = shmem_getpage(inode, index, &page, SGP_FALLOC);
> >  		if (error) {
> > @@ -2711,14 +2672,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  			goto undone;
> >  		}
> >
> > -		/*
> > -		 * Inform shmem_writepage() how far we have reached.
> > -		 * No need for lock or barrier: we have the page lock.
> > -		 */
> > -		shmem_falloc.next++;
> > -		if (!PageUptodate(page))
> > -			shmem_falloc.nr_falloced++;
> > -
> >  		/*
> >  		 * If !PageUptodate, leave it that way so that freeable pages
> >  		 * can be recognized if we need to rollback on error later.
>
> Which goes on to say:
>
> 		 * But set_page_dirty so that memory pressure will swap rather
> 		 * than free the pages we are allocating (and SGP_CACHE pages
> 		 * might still be clean: we now need to mark those dirty too).
> 		 */
> 		set_page_dirty(page);
> 		unlock_page(page);
> 		put_page(page);
> 		cond_resched();
>
> > @@ -2736,9 +2689,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  		i_size_write(inode, offset + len);
> >  		inode->i_ctime = current_time(inode);
> >  undone:
> > -	spin_lock(&inode->i_lock);
> > -	inode->i_private = NULL;
> > -	spin_unlock(&inode->i_lock);
> > +	up_write(&mapping->invalidate_lock);
> >  out:
> >  	inode_unlock(inode);
> >  	return error;
> > --
> > 2.26.2
diff --git a/mm/shmem.c b/mm/shmem.c
index f34162ac46de..7a2b0744031e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -94,18 +94,6 @@ static struct vfsmount *shm_mnt;
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
-/*
- * shmem_fallocate communicates with shmem_writepage via inode->i_private (with
- * i_rwsem making sure that it has only one user at a time): we would prefer
- * not to enlarge the shmem inode just for that.
- */
-struct shmem_falloc {
-	pgoff_t start;		/* start of range currently being fallocated */
-	pgoff_t next;		/* the next page offset to be fallocated */
-	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
-	pgoff_t nr_unswapped;	/* how often writepage refused to swap out */
-};
-
 struct shmem_options {
 	unsigned long long blocks;
 	unsigned long long inodes;
@@ -1364,28 +1352,11 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
 	 * value into swapfile.c, the only way we can correctly account for a
 	 * fallocated page arriving here is now to initialize it and write it.
-	 *
-	 * That's okay for a page already fallocated earlier, but if we have
-	 * not yet completed the fallocation, then (a) we want to keep track
-	 * of this page in case we have to undo it, and (b) it may not be a
-	 * good idea to continue anyway, once we're pushing into swap. So
-	 * reactivate the page, and let shmem_fallocate() quit when too many.
+	 * Since a page added by currently running fallocate call cannot be
+	 * dirtied and thus arrive here we know the fallocate has already
+	 * completed and we are fine writing it out.
 	 */
 	if (!PageUptodate(page)) {
-		if (inode->i_private) {
-			struct shmem_falloc *shmem_falloc;
-			spin_lock(&inode->i_lock);
-			shmem_falloc = inode->i_private;
-			if (shmem_falloc &&
-			    index >= shmem_falloc->start &&
-			    index < shmem_falloc->next)
-				shmem_falloc->nr_unswapped++;
-			else
-				shmem_falloc = NULL;
-			spin_unlock(&inode->i_lock);
-			if (shmem_falloc)
-				goto redirty;
-		}
 		clear_highpage(page);
 		flush_dcache_page(page);
 		SetPageUptodate(page);
@@ -2629,9 +2600,9 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 							 loff_t len)
 {
 	struct inode *inode = file_inode(file);
+	struct address_space *mapping = file->f_mapping;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_falloc shmem_falloc;
 	pgoff_t start, index, end;
 	int error;
 
@@ -2641,7 +2612,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	inode_lock(inode);
 
 	if (mode & FALLOC_FL_PUNCH_HOLE) {
-		struct address_space *mapping = file->f_mapping;
 		loff_t unmap_start = round_up(offset, PAGE_SIZE);
 		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
 
@@ -2680,14 +2650,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		goto out;
 	}
 
-	shmem_falloc.start = start;
-	shmem_falloc.next = start;
-	shmem_falloc.nr_falloced = 0;
-	shmem_falloc.nr_unswapped = 0;
-	spin_lock(&inode->i_lock);
-	inode->i_private = &shmem_falloc;
-	spin_unlock(&inode->i_lock);
-
+	down_write(&mapping->invalidate_lock);
 	for (index = start; index < end; index++) {
 		struct page *page;
 
@@ -2697,8 +2660,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		 */
 		if (signal_pending(current))
 			error = -EINTR;
-		else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
-			error = -ENOMEM;
 		else
 			error = shmem_getpage(inode, index, &page, SGP_FALLOC);
 		if (error) {
@@ -2711,14 +2672,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 			goto undone;
 		}
 
-		/*
-		 * Inform shmem_writepage() how far we have reached.
-		 * No need for lock or barrier: we have the page lock.
-		 */
-		shmem_falloc.next++;
-		if (!PageUptodate(page))
-			shmem_falloc.nr_falloced++;
-
 		/*
 		 * If !PageUptodate, leave it that way so that freeable pages
 		 * can be recognized if we need to rollback on error later.
@@ -2736,9 +2689,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		i_size_write(inode, offset + len);
 		inode->i_ctime = current_time(inode);
 undone:
-	spin_lock(&inode->i_lock);
-	inode->i_private = NULL;
-	spin_unlock(&inode->i_lock);
+	up_write(&mapping->invalidate_lock);
 out:
 	inode_unlock(inode);
 	return error;
We have to handle pages added by currently running shmem_fallocate()
specially in shmem_writepage(). For this we use serialization mechanism
using structure attached to inode->i_private field. If we protect
allocation of pages in shmem_fallocate() with invalidate_lock instead,
we are sure added pages cannot be dirtied until shmem_fallocate() is done
(invalidate_lock blocks faults, i_rwsem blocks writes) and thus we
cannot see those pages in shmem_writepage() and there's no need for the
serialization mechanism.

CC: Hugh Dickins <hughd@google.com>
CC: <linux-mm@kvack.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/shmem.c | 61 ++++++------------------------------------------------
 1 file changed, 6 insertions(+), 55 deletions(-)
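
For readers following the series, the locking scheme this patch relied on reduces to the pattern below. Note that the patch was dropped because its premise is mistaken, as Hugh points out above: shmem_fallocate() itself dirties the pages it adds, so they can reach shmem_writepage() while fallocate is still running. The sketch only shows the intended writer/reader pairing on mapping->invalidate_lock; the function names are illustrative, not the actual mm/shmem.c code:

/* Illustrative sketch of the intended invalidate_lock pairing; not the
 * real mm/shmem.c functions, and the premise it encodes did not hold. */

static long fallocate_fill_sketch(struct address_space *mapping,
				  pgoff_t start, pgoff_t end)
{
	pgoff_t index;

	/* Writer side: while held, no page in the range can be faulted in,
	 * which (it was assumed) meant none could be dirtied either. */
	down_write(&mapping->invalidate_lock);
	for (index = start; index < end; index++) {
		/* ... allocate and add the page at index ... */
	}
	up_write(&mapping->invalidate_lock);
	return 0;
}

static int fault_path_sketch(struct address_space *mapping, pgoff_t index)
{
	int ret = 0;

	/* Reader side: faults serialize against the filling above, just as
	 * they serialize against hole punching elsewhere in the series. */
	down_read(&mapping->invalidate_lock);
	/* ... look up or instantiate the page at index ... */
	up_read(&mapping->invalidate_lock);
	return ret;
}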