Message ID | 158861253957.340223.7465334678444521655.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fscache, cachefiles: Rewrite the I/O interface in terms of kiocb/iov_iter | expand |
On Mon, May 04, 2020 at 06:15:39PM +0100, David Howells wrote: > PG_fscache is going to be used to indicate that a page is being written to > the cache, and that the page should not be modified or released until it's > finished. > > Make afs_invalidatepage() and afs_releasepage() wait for it. Well, why? Keeping a refcount on the page will prevent it from going away while it's being written to storage. And the fact that it's being written to this cache is no reason to delay the truncate of a file (is it?) Similarly, I don't see why we need to wait for the page to make it to the cache before we start to modify it. Certainly we'll need to re-write it to the cache since the cache is now stale, but why should we wait for the now-stale write to complete?
Matthew Wilcox <willy@infradead.org> wrote: > > PG_fscache is going to be used to indicate that a page is being written to > > the cache, and that the page should not be modified or released until it's > > finished. > > > > Make afs_invalidatepage() and afs_releasepage() wait for it. > > Well, why? Keeping a refcount on the page will prevent it from going > away while it's being written to storage. And the fact that it's > being written to this cache is no reason to delay the truncate of a file > (is it?) Won't that screw up ITER_MAPPING? Does that mean that ITER_MAPPING isn't viable? David
On Wed, May 06, 2020 at 08:57:58AM +0100, David Howells wrote: > Matthew Wilcox <willy@infradead.org> wrote: > > > > PG_fscache is going to be used to indicate that a page is being written to > > > the cache, and that the page should not be modified or released until it's > > > finished. > > > > > > Make afs_invalidatepage() and afs_releasepage() wait for it. > > > > Well, why? Keeping a refcount on the page will prevent it from going > > away while it's being written to storage. And the fact that it's > > being written to this cache is no reason to delay the truncate of a file > > (is it?) > > Won't that screw up ITER_MAPPING? Does that mean that ITER_MAPPING isn't > viable? Can you remind me why ITER_MAPPING needs: "The caller must guarantee that the pages are all present and they must be locked using PG_locked, PG_writeback or PG_fscache to prevent them from going away or being migrated whilst they're being accessed." An elevated refcount prevents migration, and it also prevents the pages from being freed. It doesn't prevent them from being truncated out of the file, but it does ensure the pages aren't reallocated.
Matthew Wilcox <willy@infradead.org> wrote: > > Won't that screw up ITER_MAPPING? Does that mean that ITER_MAPPING isn't > > viable? > > Can you remind me why ITER_MAPPING needs: > > "The caller must guarantee that the pages are all present and they must be > locked using PG_locked, PG_writeback or PG_fscache to prevent them from > going away or being migrated whilst they're being accessed." > > An elevated refcount prevents migration, and it also prevents the pages > from being freed. It doesn't prevent them from being truncated out of > the file, but it does ensure the pages aren't reallocated. ITER_MAPPING relies on the mapping to maintain the pointers to the pages so that it can find them rather than being like ITER_BVEC where there's a separate list. Truncate removes the pages from the mapping - at which point ITER_MAPPING can no longer find them. David
David Howells <dhowells@redhat.com> wrote: > ITER_MAPPING relies on the mapping to maintain the pointers to the pages so > that it can find them rather than being like ITER_BVEC where there's a > separate list. > > Truncate removes the pages from the mapping - at which point ITER_MAPPING can > no longer find them. It looks like ITER_MAPPING is fine with truncate, provided the invalidation waits for the iterator to complete first: int truncate_inode_page(struct address_space *mapping, struct page *page) { VM_BUG_ON_PAGE(PageTail(page), page); if (page->mapping != mapping) return -EIO; truncate_cleanup_page(mapping, page); delete_from_page_cache(page); return 0; } In which case, ->invalidatepage() needs to wait for PG_fscache. Similarly, it looks like ->releasepage() is fine, provided it waits for PG_fscache also. If I have to use ITER_BVEC, what's the advisability of using vmalloc() to allocate the bio_vec array for a transient op? Such an array can reference up to 1MiB on a 64-bit machine with 4KiB non-compound pages if it only allocates up to a single page. I'm wondering what the teardown cost is, though, if all the corresponding PTEs have to be erased from all CPUs. David
diff --git a/fs/afs/file.c b/fs/afs/file.c index ea9f6d45d9ff..b25c5ab1f4e1 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -548,6 +548,11 @@ static void afs_invalidatepage(struct page *page, unsigned int offset, /* we clean up only if the entire page is being invalidated */ if (offset == 0 && length == PAGE_SIZE) { +#ifdef CONFIG_AFS_FSCACHE + if (PageFsCache(page)) + wait_on_page_fscache(page); +#endif + if (PagePrivate(page)) { priv = page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("inval"), @@ -575,6 +580,14 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags) /* deny if page is being written to the cache and the caller hasn't * elected to wait */ +#ifdef CONFIG_AFS_FSCACHE + if (PageFsCache(page)) { + if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS)) + return false; + wait_on_page_fscache(page); + } +#endif + if (PagePrivate(page)) { priv = page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("rel"), diff --git a/fs/afs/write.c b/fs/afs/write.c index 390fee44446c..3632909fcd91 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -111,6 +111,10 @@ int afs_write_begin(struct file *file, struct address_space *mapping, SetPageUptodate(page); } +#ifdef CONFIG_AFS_FSCACHE + wait_on_page_fscache(page); +#endif + /* page won't leak in error case: it eventually gets cleaned off LRU */ *pagep = page; @@ -786,6 +790,11 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf) /* Wait for the page to be written to the cache before we allow it to * be modified. We then assume the entire page will need writing back. */ +#ifdef CONFIG_AFS_FSCACHE + if (PageFsCache(vmf->page) && + wait_on_page_bit_killable(vmf->page, PG_fscache) < 0) + return VM_FAULT_RETRY; +#endif if (PageWriteback(vmf->page) && wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)
PG_fscache is going to be used to indicate that a page is being written to the cache, and that the page should not be modified or released until it's finished. Make afs_invalidatepage() and afs_releasepage() wait for it. Signed-off-by: David Howells <dhowells@redhat.com> --- fs/afs/file.c | 13 +++++++++++++ fs/afs/write.c | 9 +++++++++ 2 files changed, 22 insertions(+)