Message ID | 20170315215107.5628-4-rgoldwyn@suse.de (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote: > This introduces a new function filemap_range_has_page() which > returns true if the file's mapping has a page within the range > mentioned. I thought you were going to replace this patch with one that starts writeback for these pages but does not wait for them? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues <rgoldwyn@suse.com> > > Find out if the write will trigger a wait due to writeback. If yes, > return -EAGAIN. > > This introduces a new function filemap_range_has_page() which > returns true if the file's mapping has a page within the range > mentioned. > > Return -EINVAL for buffered AIO: there are multiple causes of > delay such as page locks, dirty throttling logic, page loading > from disk etc. which cannot be taken care of. Also, this patch only touches the write path; we have a similar call to write_and_wait_range() in generic_file_read_iter(). Actually, why do we even have that? Why can't we satisfy an O_DIRECT read from the cache? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/16/2017 08:08 AM, Matthew Wilcox wrote: > On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote: >> This introduces a new function filemap_range_has_page() which >> returns true if the file's mapping has a page within the range >> mentioned. > > I thought you were going to replace this patch with one that starts > writeback for these pages but does not wait for them? > As mentioned by Jan, Flags to filemap_write_and_wait_range are unnecessarily complicated. The AIO-DIO API users who eye for performance usually are careful with page writes/evictions. As a fallback, they can (and should) go the wait route (without IOCB_RW_FLAG_NOWAIT). Finally, my take on this is that we don't want to perform tasks for a following system call, which may or may not immediately follow the current one. May not, because an application (DB) will offload the task from the CPU thread to the I/O thread in case of -EAGAIN. A system call should be complete in itself (and do the minimum, what is asked).
On 03/16/2017 08:20 AM, Matthew Wilcox wrote: > On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote: >> From: Goldwyn Rodrigues <rgoldwyn@suse.com> >> >> Find out if the write will trigger a wait due to writeback. If yes, >> return -EAGAIN. >> >> This introduces a new function filemap_range_has_page() which >> returns true if the file's mapping has a page within the range >> mentioned. >> >> Return -EINVAL for buffered AIO: there are multiple causes of >> delay such as page locks, dirty throttling logic, page loading >> from disk etc. which cannot be taken care of. > > Also, this patch only touches the write path; we have a similar call to > write_and_wait_range() in generic_file_read_iter(). > This patch series is concerned with direct-write AIO paths only.
diff --git a/include/linux/fs.h b/include/linux/fs.h index e8d9346..4a30e8f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2514,6 +2514,8 @@ extern int filemap_fdatawait(struct address_space *); extern void filemap_fdatawait_keep_errors(struct address_space *); extern int filemap_fdatawait_range(struct address_space *, loff_t lstart, loff_t lend); +extern int filemap_range_has_page(struct address_space *, loff_t lstart, + loff_t lend); extern int filemap_write_and_wait(struct address_space *mapping); extern int filemap_write_and_wait_range(struct address_space *mapping, loff_t lstart, loff_t lend); diff --git a/mm/filemap.c b/mm/filemap.c index e08f3b9..c020e23 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping) } EXPORT_SYMBOL(filemap_flush); +/** + * filemap_range_has_page - check if a page exists in range. + * @mapping: address space structure to wait for + * @start_byte: offset in bytes where the range starts + * @end_byte: offset in bytes where the range ends (inclusive) + * + * Find at least one page in the range supplied, usually used to check if + * direct writing in this range will trigger a writeback. + */ +int filemap_range_has_page(struct address_space *mapping, + loff_t start_byte, loff_t end_byte) +{ + pgoff_t index = start_byte >> PAGE_SHIFT; + pgoff_t end = end_byte >> PAGE_SHIFT; + struct pagevec pvec; + int ret; + + if (end_byte < start_byte) + return 0; + + if (mapping->nrpages == 0) + return 0; + + pagevec_init(&pvec, 0); + ret = pagevec_lookup(&pvec, mapping, index, 1); + if (!ret) + return 0; + ret = (pvec.pages[0]->index <= end); + pagevec_release(&pvec); + return ret; +} +EXPORT_SYMBOL(filemap_range_has_page); + static int __filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte, loff_t end_byte) { @@ -2640,6 +2673,9 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) pos = iocb->ki_pos; + if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)) + return -EINVAL; + if (limit != RLIM_INFINITY) { if (iocb->ki_pos >= limit) { send_sig(SIGXFSZ, current, 0); @@ -2709,9 +2745,17 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from) write_len = iov_iter_count(from); end = (pos + write_len - 1) >> PAGE_SHIFT; - written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1); - if (written) - goto out; + if (iocb->ki_flags & IOCB_NOWAIT) { + /* If there are pages to writeback, return */ + if (filemap_range_has_page(inode->i_mapping, pos, + pos + iov_iter_count(from))) + return -EAGAIN; + } else { + written = filemap_write_and_wait_range(mapping, pos, + pos + write_len - 1); + if (written) + goto out; + } /* * After a write we want buffered reads to be sure to go to disk to get