[v3,07/25] fsdax: Hold dax lock over mapping insertion

Message ID	166579185727.2236710.8711235794537270051.stgit@dwillia2-xfh.jf.intel.com (mailing list archive)
State	New, archived
Headers
Return-Path: <linux-fsdevel-owner@kernel.org>
Subject: [PATCH v3 07/25] fsdax: Hold dax lock over mapping insertion
From: Dan Williams <dan.j.williams@intel.com>
To: linux-mm@kvack.org
Cc: Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
        "Darrick J. Wong" <djwong@kernel.org>,
        Jason Gunthorpe <jgg@nvidia.com>,
        Christoph Hellwig <hch@lst.de>,
        John Hubbard <jhubbard@nvidia.com>, david@fromorbit.com,
        nvdimm@lists.linux.dev, akpm@linux-foundation.org,
        linux-fsdevel@vger.kernel.org
Date: Fri, 14 Oct 2022 16:57:37 -0700
Message-ID: <166579185727.2236710.8711235794537270051.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com>
References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com>
User-Agent: StGit/0.18-3-g996c
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Precedence: bulk
Series	Fix the DAX-gup mistake
[v3,00/25] Fix the DAX-gup mistake
[v3,01/25] fsdax: Wait on @page not @page->_refcount
[v3,02/25] fsdax: Use dax_page_idle() to document DAX busy page checking
[v3,03/25] fsdax: Include unmapped inodes for page-idle detection
[v3,04/25] fsdax: Introduce dax_zap_mappings()
[v3,05/25] fsdax: Wait for pinned pages during truncate_inode_pages_final()
[v3,06/25] fsdax: Validate DAX layouts broken before truncate
[v3,07/25] fsdax: Hold dax lock over mapping insertion
[v3,08/25] fsdax: Update dax_insert_entry() calling convention to return an error
[v3,09/25] fsdax: Rework for_each_mapped_pfn() to dax_for_each_folio()
[v3,10/25] fsdax: Introduce pgmap_request_folios()
[v3,11/25] fsdax: Rework dax_insert_entry() calling convention
[v3,12/25] fsdax: Cleanup dax_associate_entry()
[v3,13/25] devdax: Minor warning fixups
[v3,14/25] devdax: Fix sparse lock imbalance warning
[v3,15/25] libnvdimm/pmem: Support pmem block devices without dax
[v3,16/25] devdax: Move address_space helpers to the DAX core
[v3,17/25] devdax: Sparse fixes for xarray locking
[v3,18/25] devdax: Sparse fixes for vmfault_t / dax-entry conversions
[v3,19/25] devdax: Sparse fixes for vm_fault_t in tracepoints
[v3,20/25] devdax: add PUD support to the DAX mapping infrastructure
[v3,21/25] devdax: Use dax_insert_entry() + dax_delete_mapping_entry()
[v3,22/25] mm/memremap_pages: Replace zone_device_page_init() with pgmap_request_folios()
[v3,23/25] mm/memremap_pages: Initialize all ZONE_DEVICE pages to start at refcount 0
[v3,24/25] mm/meremap_pages: Delete put_devmap_managed_page_refs()
[v3,25/25] mm/gup: Drop DAX pgmap accounting

Commit Message

Dan Williams Oct. 14, 2022, 11:57 p.m. UTC

In preparation for dax_insert_entry() to start taking page and pgmap
references ensure that page->pgmap is valid by holding the
dax_read_lock() over both dax_direct_access() and dax_insert_entry().

I.e. the code that wants to elevate the reference count of a pgmap page
from 0 -> 1 must ensure that the pgmap is not exiting and will not start
exiting until the proper references have been taken.

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
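
The locking rule in the commit message can be illustrated outside the kernel. The following is a minimal single-threaded sketch, not the kernel implementation: every `model_*` name is hypothetical, the reader counter merely stands in for the read-side critical section that dax_read_lock() opens, and returning -EBUSY from teardown is a modelling shortcut for "teardown has to wait for readers". It shows why the lookup and the reference-taking insertion need to sit under one lock hold: teardown cannot start in the window between them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct model_pgmap {
	int readers;   /* read-side critical sections currently open */
	bool dying;    /* set once teardown has begun */
	int page_refs; /* references taken on the mapped page */
};

static int model_read_lock(struct model_pgmap *p)
{
	return p->readers++;
}

static void model_read_unlock(struct model_pgmap *p, int id)
{
	(void)id;
	assert(p->readers > 0);
	p->readers--;
}

/* Teardown may only begin once no reader is inside its critical section. */
static int model_try_kill(struct model_pgmap *p)
{
	if (p->readers > 0)
		return -EBUSY;
	p->dying = true;
	return 0;
}

/* Models the lookup step: fails once the mapping is being torn down. */
static int model_direct_access(struct model_pgmap *p, unsigned long *pfn)
{
	if (p->dying)
		return -ENXIO; /* error code chosen for the model only */
	*pfn = 42;             /* arbitrary placeholder value */
	return 0;
}

/* Models the insertion step that elevates the page reference 0 -> 1. */
static void model_insert_entry(struct model_pgmap *p, unsigned long pfn)
{
	(void)pfn;
	p->page_refs++;
}

/* One lock hold covers both the lookup and the insertion. */
static int model_fault(struct model_pgmap *p)
{
	unsigned long pfn;
	int id, err;

	id = model_read_lock(p);
	err = model_direct_access(p, &pfn);
	if (err) {
		model_read_unlock(p, id);
		return err;
	}
	model_insert_entry(p, pfn);
	model_read_unlock(p, id);
	return 0;
}
```

In this model, calling model_try_kill() while a reader is active fails, so a fault that has passed the lookup always completes its insertion against a mapping that is not yet dying. If the lock were dropped between the two steps, teardown could begin in that gap.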

Comments

Jason Gunthorpe Oct. 17, 2022, 7:31 p.m. UTC | #1

On Fri, Oct 14, 2022 at 04:57:37PM -0700, Dan Williams wrote:
> In preparation for dax_insert_entry() to start taking page and pgmap
> references ensure that page->pgmap is valid by holding the
> dax_read_lock() over both dax_direct_access() and dax_insert_entry().
> 
> I.e. the code that wants to elevate the reference count of a pgmap page
> from 0 -> 1 must ensure that the pgmap is not exiting and will not start
> exiting until the proper references have been taken.

I'm surprised we can have a vmfault while the pgmap is exiting?

Shouldn't the FS have torn down all the inodes before it starts
killing the pgmap?

And since tearing down all the inodes now ensures all the pages have 0
reference, why do we need to do anything with the pgmap?

Jason

Dan Williams Oct. 17, 2022, 8:17 p.m. UTC | #2

Jason Gunthorpe wrote:
> On Fri, Oct 14, 2022 at 04:57:37PM -0700, Dan Williams wrote:
> > In preparation for dax_insert_entry() to start taking page and pgmap
> > references ensure that page->pgmap is valid by holding the
> > dax_read_lock() over both dax_direct_access() and dax_insert_entry().
> > 
> > I.e. the code that wants to elevate the reference count of a pgmap page
> > from 0 -> 1 must ensure that the pgmap is not exiting and will not start
> > exiting until the proper references have been taken.
> 
> I'm surprised we can have a vmfault while the pgmap is exiting?
> 
> Shouldn't the FS have torn down all the inodes before it starts
> killing the pgmap?

Historically, no. The block-device is allowed to disappear while inodes
are still live. For example, the filesystem's calls to blk_queue_enter()
will start failing, but otherwise the filesystem tries to hobble along
after the device-driver has finished ->remove(). In the typical
page-cache case this makes sense since there is still some residual
usability of cached data even after the backing device is gone.

Recently Ruan plumbed support for failure-notification callbacks into
the filesystem, or at least XFS. With that in place the driver can
theoretically notify failures like "device gone" and the FS can take
actions like tearing down inodes. However, that is FS specific enabling
/ behaviour, not something the pgmap code can rely upon. At least, not
without some layering violations.

Christoph Hellwig Oct. 18, 2022, 5:26 a.m. UTC | #3

On Mon, Oct 17, 2022 at 01:17:23PM -0700, Dan Williams wrote:
> Historically, no. The block-device is allowed to disappear while inodes
> are still live.

Btw, while I agree with what you wrote below this sentence is at least
a bit confusing.  Struct block_device/gendisk/request_queue will always
be valid as long as a file system is mounted and inodes are live due
to refcounting.  It's just as you correctly pointed out del_gendisk
might have already been called and they are dead.

Dan Williams Oct. 18, 2022, 5:30 p.m. UTC | #4

Christoph Hellwig wrote:
> On Mon, Oct 17, 2022 at 01:17:23PM -0700, Dan Williams wrote:
> > Historically, no. The block-device is allowed to disappear while inodes
> > are still live.
> 
> Btw, while I agree with what you wrote below this sentence is at least
> a bit confusing.  Struct block_device/gendisk/request_queue will always
> be valid as long as a file system is mounted and inodes are live due
> to refcounting.  It's just as you correctly pointed out del_gendisk
> might have already been called and they are dead.

Yes, when I said "allowed to disappear" I should have said "allowed to
die".
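
The "disappear" versus "die" distinction settled in this exchange can be sketched in user-space C. This is only an illustration under stated assumptions: the `model_disk` type and all `model_*` helpers are hypothetical and do not mirror the block layer's real structures, and -ENODEV is simply the error code picked for the model. The point shown is that a reference count keeps the object's memory valid, while a separate "dead" flag makes new I/O entry fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: not the block layer's real data structures. */
struct model_disk {
	int refcount; /* keeps the memory valid while holders exist */
	bool dead;    /* set when the device is removed */
};

static struct model_disk *model_disk_alloc(void)
{
	struct model_disk *d = calloc(1, sizeof(*d));

	if (d)
		d->refcount = 1; /* the driver's initial reference */
	return d;
}

static void model_disk_get(struct model_disk *d)
{
	d->refcount++;
}

/* Returns true when the last reference was dropped and the object freed. */
static bool model_disk_put(struct model_disk *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		free(d);
		return true;
	}
	return false;
}

/* Models device removal: the object dies but is not freed. */
static void model_disk_del(struct model_disk *d)
{
	d->dead = true;
}

/* Models entering the queue for I/O: fails once the device is dead. */
static int model_queue_enter(struct model_disk *d)
{
	return d->dead ? -ENODEV : 0;
}
```

A holder that took its own reference (the mounted filesystem, in the thread's terms) can still dereference the object after removal; it just sees every new entry attempt fail until it drops the final reference.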

diff --git a/fs/dax.c b/fs/dax.c
index 1d4f0072e58d..6990a6e7df9f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1107,10 +1107,9 @@  static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
 		size_t size, void **kaddr, pfn_t *pfnp)
 {
 	pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
-	int id, rc = 0;
 	long length;
+	int rc = 0;
 
-	id = dax_read_lock();
 	length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
 				   DAX_ACCESS, kaddr, pfnp);
 	if (length < 0) {
@@ -1135,7 +1134,6 @@  static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
 	if (!*kaddr)
 		rc = -EFAULT;
 out:
-	dax_read_unlock(id);
 	return rc;
 }
 
@@ -1588,7 +1586,7 @@  static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
 	loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
 	bool write = iter->flags & IOMAP_WRITE;
 	unsigned long entry_flags = pmd ? DAX_PMD : 0;
-	int err = 0;
+	int err = 0, id;
 	pfn_t pfn;
 	void *kaddr;
 
@@ -1608,11 +1606,15 @@  static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
 		return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
 	}
 
+	id = dax_read_lock();
 	err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn);
-	if (err)
+	if (err) {
+		dax_read_unlock(id);
 		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
+	}
 
 	*entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags);
+	dax_read_unlock(id);
 
 	if (write &&
 	    srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) {
