diff mbox series

[v3,07/25] fsdax: Hold dax lock over mapping insertion

Message ID 166579185727.2236710.8711235794537270051.stgit@dwillia2-xfh.jf.intel.com (mailing list archive)
State New
Headers show
Series Fix the DAX-gup mistake | expand

Commit Message

Dan Williams Oct. 14, 2022, 11:57 p.m. UTC
In preparation for dax_insert_entry() to start taking page and pgmap
references ensure that page->pgmap is valid by holding the
dax_read_lock() over both dax_direct_access() and dax_insert_entry().

I.e. the code that wants to elevate the reference count of a pgmap page
from 0 -> 1 must ensure that the pgmap is not exiting and will not start
exiting until the proper references have been taken.

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Jason Gunthorpe Oct. 17, 2022, 7:31 p.m. UTC | #1
On Fri, Oct 14, 2022 at 04:57:37PM -0700, Dan Williams wrote:
> In preparation for dax_insert_entry() to start taking page and pgmap
> references ensure that page->pgmap is valid by holding the
> dax_read_lock() over both dax_direct_access() and dax_insert_entry().
> 
> I.e. the code that wants to elevate the reference count of a pgmap page
> from 0 -> 1 must ensure that the pgmap is not exiting and will not start
> exiting until the proper references have been taken.

I'm surprised we can have a vmfault while the pgmap is exiting?

Shouldn't the FS have torn down all the inodes before it starts
killing the pgmap?

And since tearing down all the inodes now ensures all the pages have 0
reference, why do we need to do anything with the pgmap?

Jason
Dan Williams Oct. 17, 2022, 8:17 p.m. UTC | #2
Jason Gunthorpe wrote:
> On Fri, Oct 14, 2022 at 04:57:37PM -0700, Dan Williams wrote:
> > In preparation for dax_insert_entry() to start taking page and pgmap
> > references ensure that page->pgmap is valid by holding the
> > dax_read_lock() over both dax_direct_access() and dax_insert_entry().
> > 
> > I.e. the code that wants to elevate the reference count of a pgmap page
> > from 0 -> 1 must ensure that the pgmap is not exiting and will not start
> > exiting until the proper references have been taken.
> 
> I'm surprised we can have a vmfault while the pgmap is exiting?
> 
> Shouldn't the FS have torn down all the inodes before it starts
> killing the pgmap?

Historically, no. The block-device is allowed to disappear while inodes
are still live. For example, the filesystem's calls to blk_queue_enter()
will start failing, but otherwise the filesystem tries to hobble along
after the device-driver has finished ->remove(). In the typical
page-cache case this makes sense since there is still some residual
usability of cached data even after the backing device is gone.

Recently Ruan plumbed support for failure-notification callbacks into
the filesystem, or at least XFS. With that in place the driver can
theoretically notify failures like "device gone" and the FS can take
actions like tearing down inodes. However, that is FS specific enabling
/ behaviour, not something the pgmap code can rely upon. At least, not
without some layering violations.
Christoph Hellwig Oct. 18, 2022, 5:26 a.m. UTC | #3
On Mon, Oct 17, 2022 at 01:17:23PM -0700, Dan Williams wrote:
> Historically, no. The block-device is allowed to disappear while inodes
> are still live.

Btw, while I agree with what you wrote below this sentence is at least
a bit confusing.  Struct block_device/gendisk/request_queue will always
be valid as long as a file system is mounted and inodes are live due
to refcounting.  It's just as you correctly pointed out del_gendisk
might have aready been called and they are dead.
Dan Williams Oct. 18, 2022, 5:30 p.m. UTC | #4
Christoph Hellwig wrote:
> On Mon, Oct 17, 2022 at 01:17:23PM -0700, Dan Williams wrote:
> > Historically, no. The block-device is allowed to disappear while inodes
> > are still live.
> 
> Btw, while I agree with what you wrote below this sentence is at least
> a bit confusing.  Struct block_device/gendisk/request_queue will always
> be valid as long as a file system is mounted and inodes are live due
> to refcounting.  It's just as you correctly pointed out del_gendisk
> might have aready been called and they are dead.

Yes, when I said "allowed to disappear" I should have said "allowed to
die".
diff mbox series

Patch

diff --git a/fs/dax.c b/fs/dax.c
index 1d4f0072e58d..6990a6e7df9f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1107,10 +1107,9 @@  static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
 		size_t size, void **kaddr, pfn_t *pfnp)
 {
 	pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
-	int id, rc = 0;
 	long length;
+	int rc = 0;
 
-	id = dax_read_lock();
 	length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
 				   DAX_ACCESS, kaddr, pfnp);
 	if (length < 0) {
@@ -1135,7 +1134,6 @@  static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
 	if (!*kaddr)
 		rc = -EFAULT;
 out:
-	dax_read_unlock(id);
 	return rc;
 }
 
@@ -1588,7 +1586,7 @@  static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
 	loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
 	bool write = iter->flags & IOMAP_WRITE;
 	unsigned long entry_flags = pmd ? DAX_PMD : 0;
-	int err = 0;
+	int err = 0, id;
 	pfn_t pfn;
 	void *kaddr;
 
@@ -1608,11 +1606,15 @@  static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
 		return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
 	}
 
+	id = dax_read_lock();
 	err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn);
-	if (err)
+	if (err) {
+		dax_read_unlock(id);
 		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
+	}
 
 	*entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags);
+	dax_read_unlock(id);
 
 	if (write &&
 	    srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) {