Message ID | 20220608171741.3875418-7-shr@fb.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | io-uring/xfs: support async buffered writes | expand |
On Wed, Jun 08, 2022 at 10:17:33AM -0700, Stefan Roesch wrote: > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index b06a5c24a4db..f701dcb7c26a 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -829,7 +829,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) > length -= status; > } while (iov_iter_count(i) && length); > > - return written ? written : status; > + if (status == -EAGAIN) { > + iov_iter_revert(i, written); > + return -EAGAIN; > + } > + if (written) > + return written; > + return status; > } I still don't understand how this can possibly work. Walk me through it. Let's imagine we have a file laid out such that extent 1 is bytes 0-4095 of the file and extent 2 is extent 4096-16385 of the file. We do a write of 5000 bytes starting at offset 4000 of the file. iomap_iter() tells us about the first extent and we write the first 96 bytes of our data to the first extent, returning 96. iomap_iter() tells us about the second extent, and we write the next 4000 bytes to the second extent. Then we get a page fault and get to the -EAGAIN case. We rewind the iter 4000 bytes. How do we not end up writing garbage when the kworker does the retry? I'd understand if we rewound the iter all the way to the start. Or if we didn't rewind the iter at all and were able to pick up partway through the write. But rewinding to the start of the extent feels like it can't possibly work.
On 6/8/22 12:02 PM, Matthew Wilcox wrote: > On Wed, Jun 08, 2022 at 10:17:33AM -0700, Stefan Roesch wrote: >> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c >> index b06a5c24a4db..f701dcb7c26a 100644 >> --- a/fs/iomap/buffered-io.c >> +++ b/fs/iomap/buffered-io.c >> @@ -829,7 +829,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) >> length -= status; >> } while (iov_iter_count(i) && length); >> >> - return written ? written : status; >> + if (status == -EAGAIN) { >> + iov_iter_revert(i, written); >> + return -EAGAIN; >> + } >> + if (written) >> + return written; >> + return status; >> } > > I still don't understand how this can possibly work. Walk me through it. > > Let's imagine we have a file laid out such that extent 1 is bytes > 0-4095 of the file and extent 2 is extent 4096-16385 of the file. > We do a write of 5000 bytes starting at offset 4000 of the file. > > iomap_iter() tells us about the first extent and we write the first > 96 bytes of our data to the first extent, returning 96. iomap_iter() > tells us about the second extent, and we write the next 4000 bytes to > the second extent. Then we get a page fault and get to the -EAGAIN case. > We rewind the iter 4000 bytes. > We have two data structures, the iomap_iter and iov_iter. After the first 96 bytes, the iov_iter offset get updated in iomap_write_iter() and then the iomap_iter pos gets updated in iomap_iter()->iomap_iter_advance(). We then get the second extend from iomap_iter(). In iomap_write_iter() the first page is obtained and written successfully, then the second page is faulted. At this point the iov offset of the iov_iter has advanced. To reset it to the state when the function iomap_write_iter() was entered, the iov_iter is reset to iov_offset - written bytes. iomap_write_iter() is exited and returns -EAGAIN. As iomap_write_iter() returns an error, the iomap_iter pos is not updated in iomap_iter(). Only the number of bytes written in the write of the first extent from iomap_file_buffered_write() is returned from iomap_file_buffered_write(). In xfs_file_buffered_write we updated the iocb->ki_pos with the number of bytes written. In io-uring, the io_write() call receives the short write result. It copies the iov_iter struct into the work context for the io worker. The io_worker uses that information to complete the rest of the write. The above reset is required to keep the pos in iomap_iter and the offset in iov_iter in sync. Side Note: I had an earlier version of the patch that was changing the signature of the function iomap_write_iter(). It was returning a return code and changing the processed value of the iomap_iter (which then also changes the pos value of the iomap_iter). This version (version 7 of the patch) does not require to reset the offset of the iov_iter. It can update the pos in iomap_iter even when -EAGAIN is returned. > How do we not end up writing garbage when the kworker does the retry? > I'd understand if we rewound the iter all the way to the start. Or if > we didn't rewind the iter at all and were able to pick up partway through > the write. But rewinding to the start of the extent feels like it can't > possibly work.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index b06a5c24a4db..f701dcb7c26a 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -829,7 +829,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) length -= status; } while (iov_iter_count(i) && length); - return written ? written : status; + if (status == -EAGAIN) { + iov_iter_revert(i, written); + return -EAGAIN; + } + if (written) + return written; + return status; } ssize_t
If iomap_write_iter() encounters -EAGAIN, return -EAGAIN to the caller. Signed-off-by: Stefan Roesch <shr@fb.com> --- fs/iomap/buffered-io.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)