Message ID | 20170228233610.25456-4-rgoldwyn@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Feb 28, 2017 at 05:36:05PM -0600, Goldwyn Rodrigues wrote: > Find out if the write will trigger a wait due to writeback. If yes, > return -EAGAIN. > > This introduces a new function filemap_range_has_page() which > returns true if the file's mapping has a page within the range > mentioned. Ugh, this is pretty inefficient. If that's all you want to know, then using the radix tree directly will be far more efficient than spinning up all the pagevec machinery only to discard the pages found. But what's going to kick these pages out of cache? Shouldn't we rather find the pages, kick them out if clean, start writeback if not, and *then* return -EAGAIN? So maybe we want to spin up the pagevec machinery after all so we can do that extra work?
On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote: > Ugh, this is pretty inefficient. If that's all you want to know, then > using the radix tree directly will be far more efficient than spinning > up all the pagevec machinery only to discard the pages found. > > But what's going to kick these pages out of cache? Shouldn't we rather > find the pages, kick them out if clean, start writeback if not, and *then* > return -EAGAIN? > > So maybe we want to spin up the pagevec machinery after all so we can > do that extra work? As pointed out in the last round of these patches I think we really need to pass a flags argument to filemap_write_and_wait_range to communicate the non-blocking nature and only return -EAGAIN if we'd block. As a bonus that can indeed start to kick the pages out.
On Wed 01-03-17 07:38:57, Christoph Hellwig wrote: > On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote: > > Ugh, this is pretty inefficient. If that's all you want to know, then > > using the radix tree directly will be far more efficient than spinning > > up all the pagevec machinery only to discard the pages found. > > > > But what's going to kick these pages out of cache? Shouldn't we rather > > find the pages, kick them out if clean, start writeback if not, and *then* > > return -EAGAIN? > > > > So maybe we want to spin up the pagevec machinery after all so we can > > do that extra work? > > As pointed out in the last round of these patches I think we really > need to pass a flags argument to filemap_write_and_wait_range to > communicate the non-blocking nature and only return -EAGAIN if we'd > block. As a bonus that can indeed start to kick the pages out. Aren't flags to filemap_write_and_wait_range() unnecessary complication? Realistically, most users wanting performance from AIO DIO so badly that they bother with this API won't have any pages to write / evict. If they do by some bad accident, they can fall back to standard "blocking" AIO DIO. So I don't see much value in teaching filemap_write_and_wait_range() about a non-blocking mode... Honza
On Thu, Mar 02, 2017 at 11:38:45AM +0100, Jan Kara wrote: > On Wed 01-03-17 07:38:57, Christoph Hellwig wrote: > > On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote: > > > But what's going to kick these pages out of cache? Shouldn't we rather > > > find the pages, kick them out if clean, start writeback if not, and *then* > > > return -EAGAIN? > > > > As pointed out in the last round of these patches I think we really > > need to pass a flags argument to filemap_write_and_wait_range to > > communicate the non-blocking nature and only return -EAGAIN if we'd > > block. As a bonus that can indeed start to kick the pages out. > > Aren't flags to filemap_write_and_wait_range() unnecessary complication? > Realistically, most users wanting performance from AIO DIO so badly that > they bother with this API won't have any pages to write / evict. If they do > by some bad accident, they can fall back to standard "blocking" AIO DIO. > So I don't see much value in teaching filemap_write_and_wait_range() about > a non-blocking mode... That lets me execute a DoS against a user using this API. All I have to do is open the file they're using read-only and read a byte from it. Page goes into page-cache, and they'll only get -EAGAIN from calling this syscall until the page ages out. Also, I don't understand why this is a flag. Isn't the point of AIO to be non-blocking? Why isn't this just a change to how we do AIO?
On Thu 02-03-17 06:12:45, Matthew Wilcox wrote: > On Thu, Mar 02, 2017 at 11:38:45AM +0100, Jan Kara wrote: > > On Wed 01-03-17 07:38:57, Christoph Hellwig wrote: > > > On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote: > > > > But what's going to kick these pages out of cache? Shouldn't we rather > > > > find the pages, kick them out if clean, start writeback if not, and *then* > > > > return -EAGAIN? > > > > > > As pointed out in the last round of these patches I think we really > > > need to pass a flags argument to filemap_write_and_wait_range to > > > communicate the non-blocking nature and only return -EAGAIN if we'd > > > block. As a bonus that can indeed start to kick the pages out. > > > > Aren't flags to filemap_write_and_wait_range() unnecessary complication? > > Realistically, most users wanting performance from AIO DIO so badly that > > they bother with this API won't have any pages to write / evict. If they do > > by some bad accident, they can fall back to standard "blocking" AIO DIO. > > So I don't see much value in teaching filemap_write_and_wait_range() about > > a non-blocking mode... > > That lets me execute a DoS against a user using this API. All I have > to do is open the file they're using read-only and read a byte from it. > Page goes into page-cache, and they'll only get -EAGAIN from calling > this syscall until the page ages out. It will not be a DoS. This non-blocking AIO can always return EAGAIN when it feels like it and the caller is required to fall back to a blocking version in that case if he wants to guarantee forward progress. It is just a performance optimization which allows user (database) to submit IO from a computation thread instead of having to offload it to an IO thread... > Also, I don't understand why this is a flag. Isn't the point of AIO to > be non-blocking? Why isn't this just a change to how we do AIO? Because this is an API change and the caller has to implement some handling to guarantee a forward progress of non-blocking IO... Honza
diff --git a/include/linux/fs.h b/include/linux/fs.h index ab2f556..527ef53 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2494,6 +2494,8 @@ extern int filemap_fdatawait(struct address_space *); extern void filemap_fdatawait_keep_errors(struct address_space *); extern int filemap_fdatawait_range(struct address_space *, loff_t lstart, loff_t lend); +extern int filemap_range_has_page(struct address_space *, loff_t lstart, + loff_t lend); extern int filemap_write_and_wait(struct address_space *mapping); extern int filemap_write_and_wait_range(struct address_space *mapping, loff_t lstart, loff_t lend); diff --git a/mm/filemap.c b/mm/filemap.c index 78dd50e..82335f4 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -375,6 +375,39 @@ int filemap_flush(struct address_space *mapping) } EXPORT_SYMBOL(filemap_flush); +/** + * filemap_range_has_page - check if a page exists in range. + * @mapping: address space structure to wait for + * @start_byte: offset in bytes where the range starts + * @end_byte: offset in bytes where the range ends (inclusive) + * + * Find at least one page in the range supplied, usually used to check if + * direct writing in this range will trigger a writeback. + */ +int filemap_range_has_page(struct address_space *mapping, + loff_t start_byte, loff_t end_byte) +{ + pgoff_t index = start_byte >> PAGE_SHIFT; + pgoff_t end = end_byte >> PAGE_SHIFT; + struct pagevec pvec; + int ret; + + if (end_byte < start_byte) + return 0; + + if (mapping->nrpages == 0) + return 0; + + pagevec_init(&pvec, 0); + ret = pagevec_lookup(&pvec, mapping, index, 1); + if (!ret) + return 0; + ret = (pvec.pages[0]->index <= end); + pagevec_release(&pvec); + return ret; +} +EXPORT_SYMBOL(filemap_range_has_page); + static int __filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte, loff_t end_byte) { @@ -2631,6 +2664,9 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) pos = iocb->ki_pos; + if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)) + return -EINVAL; + if (limit != RLIM_INFINITY) { if (iocb->ki_pos >= limit) { send_sig(SIGXFSZ, current, 0); @@ -2700,9 +2736,17 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from) write_len = iov_iter_count(from); end = (pos + write_len - 1) >> PAGE_SHIFT; - written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1); - if (written) - goto out; + if (iocb->ki_flags & IOCB_NOWAIT) { + /* If there are pages to writeback, return */ + if (filemap_range_has_page(inode->i_mapping, pos, + pos + iov_iter_count(from))) + return -EAGAIN; + } else { + written = filemap_write_and_wait_range(mapping, pos, + pos + write_len - 1); + if (written) + goto out; + } /* * After a write we want buffered reads to be sure to go to disk to get