Message ID | 166579185727.2236710.8711235794537270051.stgit@dwillia2-xfh.jf.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix the DAX-gup mistake | expand |
On Fri, Oct 14, 2022 at 04:57:37PM -0700, Dan Williams wrote: > In preparation for dax_insert_entry() to start taking page and pgmap > references ensure that page->pgmap is valid by holding the > dax_read_lock() over both dax_direct_access() and dax_insert_entry(). > > I.e. the code that wants to elevate the reference count of a pgmap page > from 0 -> 1 must ensure that the pgmap is not exiting and will not start > exiting until the proper references have been taken. I'm surprised we can have a vmfault while the pgmap is exiting? Shouldn't the FS have torn down all the inodes before it starts killing the pgmap? And since tearing down all the inodes now ensures all the pages have 0 reference, why do we need to do anything with the pgmap? Jason
Jason Gunthorpe wrote: > On Fri, Oct 14, 2022 at 04:57:37PM -0700, Dan Williams wrote: > > In preparation for dax_insert_entry() to start taking page and pgmap > > references ensure that page->pgmap is valid by holding the > > dax_read_lock() over both dax_direct_access() and dax_insert_entry(). > > > > I.e. the code that wants to elevate the reference count of a pgmap page > > from 0 -> 1 must ensure that the pgmap is not exiting and will not start > > exiting until the proper references have been taken. > > I'm surprised we can have a vmfault while the pgmap is exiting? > > Shouldn't the FS have torn down all the inodes before it starts > killing the pgmap? Historically, no. The block-device is allowed to disappear while inodes are still live. For example, the filesystem's calls to blk_queue_enter() will start failing, but otherwise the filesystem tries to hobble along after the device-driver has finished ->remove(). In the typical page-cache case this makes sense since there is still some residual usability of cached data even after the backing device is gone. Recently Ruan plumbed support for failure-notification callbacks into the filesystem, or at least XFS. With that in place the driver can theoretically notify failures like "device gone" and the FS can take actions like tearing down inodes. However, that is FS specific enabling / behaviour, not something the pgmap code can rely upon. At least, not without some layering violations.
On Mon, Oct 17, 2022 at 01:17:23PM -0700, Dan Williams wrote: > Historically, no. The block-device is allowed to disappear while inodes > are still live. Btw, while I agree with what you wrote below this sentence is at least a bit confusing. Struct block_device/gendisk/request_queue will always be valid as long as a file system is mounted and inodes are live due to refcounting. It's just as you correctly pointed out del_gendisk might have aready been called and they are dead.
Christoph Hellwig wrote: > On Mon, Oct 17, 2022 at 01:17:23PM -0700, Dan Williams wrote: > > Historically, no. The block-device is allowed to disappear while inodes > > are still live. > > Btw, while I agree with what you wrote below this sentence is at least > a bit confusing. Struct block_device/gendisk/request_queue will always > be valid as long as a file system is mounted and inodes are live due > to refcounting. It's just as you correctly pointed out del_gendisk > might have aready been called and they are dead. Yes, when I said "allowed to disappear" I should have said "allowed to die".
diff --git a/fs/dax.c b/fs/dax.c index 1d4f0072e58d..6990a6e7df9f 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1107,10 +1107,9 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos, size_t size, void **kaddr, pfn_t *pfnp) { pgoff_t pgoff = dax_iomap_pgoff(iomap, pos); - int id, rc = 0; long length; + int rc = 0; - id = dax_read_lock(); length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size), DAX_ACCESS, kaddr, pfnp); if (length < 0) { @@ -1135,7 +1134,6 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos, if (!*kaddr) rc = -EFAULT; out: - dax_read_unlock(id); return rc; } @@ -1588,7 +1586,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT; bool write = iter->flags & IOMAP_WRITE; unsigned long entry_flags = pmd ? DAX_PMD : 0; - int err = 0; + int err = 0, id; pfn_t pfn; void *kaddr; @@ -1608,11 +1606,15 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS; } + id = dax_read_lock(); err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn); - if (err) + if (err) { + dax_read_unlock(id); return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err); + } *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags); + dax_read_unlock(id); if (write && srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) {
In preparation for dax_insert_entry() to start taking page and pgmap references ensure that page->pgmap is valid by holding the dax_read_lock() over both dax_direct_access() and dax_insert_entry(). I.e. the code that wants to elevate the reference count of a pgmap page from 0 -> 1 must ensure that the pgmap is not exiting and will not start exiting until the proper references have been taken. Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- fs/dax.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)