Message ID | 20220601210141.3773402-16-shr@fb.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | io-uring/xfs: support async buffered writes | expand |
On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote: > This adds the async buffered write support to XFS. For async buffered > write requests, the request will return -EAGAIN if the ilock cannot be > obtained immediately. breaks generic/471...
On 6/30/22 10:39 PM, Al Viro wrote: > On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote: >> This adds the async buffered write support to XFS. For async buffered >> write requests, the request will return -EAGAIN if the ilock cannot be >> obtained immediately. > > breaks generic/471... That test case is odd, because it makes some weird assumptions about what RWF_NOWAIT means. Most notably that it makes it mean if we should instantiate blocks or not. Where did those assumed semantics come from? On the read side, we have clearly documented that it should "not wait for data which is not immediately available". Now it is possible that we're returning a spurious -EAGAIN here when we should not be. And that would be a bug imho. I'll dig in and see what's going on.
On 7/1/22 8:19 AM, Jens Axboe wrote: > On 6/30/22 10:39 PM, Al Viro wrote: >> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote: >>> This adds the async buffered write support to XFS. For async buffered >>> write requests, the request will return -EAGAIN if the ilock cannot be >>> obtained immediately. >> >> breaks generic/471... > > That test case is odd, because it makes some weird assumptions about > what RWF_NOWAIT means. Most notably that it makes it mean if we should > instantiate blocks or not. Where did those assumed semantics come from? > On the read side, we have clearly documented that it should "not wait > for data which is not immediately available". > > Now it is possible that we're returning a spurious -EAGAIN here when we > should not be. And that would be a bug imho. I'll dig in and see what's > going on. This is the timestamp update that needs doing which will now return -EAGAIN if IOCB_NOWAIT is set as it may block. I do wonder if we should just allow inode time updates with IOCB_NOWAIT, even on the io_uring side. Either that, or passed in RWF_NOWAIT semantics don't map completely to internal IOCB_NOWAIT semantics. At least in terms of what generic/471 is doing, but I'm not sure who came up with that and if it's established semantics or just some made up ones from whomever wrote that test. I don't think they make any sense, to be honest.
On 7/1/22 8:30 AM, Jens Axboe wrote: > On 7/1/22 8:19 AM, Jens Axboe wrote: >> On 6/30/22 10:39 PM, Al Viro wrote: >>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote: >>>> This adds the async buffered write support to XFS. For async buffered >>>> write requests, the request will return -EAGAIN if the ilock cannot be >>>> obtained immediately. >>> >>> breaks generic/471... >> >> That test case is odd, because it makes some weird assumptions about >> what RWF_NOWAIT means. Most notably that it makes it mean if we should >> instantiate blocks or not. Where did those assumed semantics come from? >> On the read side, we have clearly documented that it should "not wait >> for data which is not immediately available". >> >> Now it is possible that we're returning a spurious -EAGAIN here when we >> should not be. And that would be a bug imho. I'll dig in and see what's >> going on. > > This is the timestamp update that needs doing which will now return > -EAGAIN if IOCB_NOWAIT is set as it may block. > > I do wonder if we should just allow inode time updates with IOCB_NOWAIT, > even on the io_uring side. Either that, or passed in RWF_NOWAIT > semantics don't map completely to internal IOCB_NOWAIT semantics. At > least in terms of what generic/471 is doing, but I'm not sure who came > up with that and if it's established semantics or just some made up ones > from whomever wrote that test. I don't think they make any sense, to be > honest. Further support that generic/471 is just randomly made up semantics, it needs to special case btrfs with nocow or you'd get -EAGAIN anyway for that test. And it's relying on some random timing to see if this works. I really think that test case is just hot garbage, and doesn't test anything meaningful.
On Fri, Jul 01, 2022 at 08:38:07AM -0600, Jens Axboe wrote: > On 7/1/22 8:30 AM, Jens Axboe wrote: > > On 7/1/22 8:19 AM, Jens Axboe wrote: > >> On 6/30/22 10:39 PM, Al Viro wrote: > >>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote: > >>>> This adds the async buffered write support to XFS. For async buffered > >>>> write requests, the request will return -EAGAIN if the ilock cannot be > >>>> obtained immediately. > >>> > >>> breaks generic/471... > >> > >> That test case is odd, because it makes some weird assumptions about > >> what RWF_NOWAIT means. Most notably that it makes it mean if we should > >> instantiate blocks or not. Where did those assumed semantics come from? > >> On the read side, we have clearly documented that it should "not wait > >> for data which is not immediately available". > >> > >> Now it is possible that we're returning a spurious -EAGAIN here when we > >> should not be. And that would be a bug imho. I'll dig in and see what's > >> going on. > > > > This is the timestamp update that needs doing which will now return > > -EAGAIN if IOCB_NOWAIT is set as it may block. > > > > I do wonder if we should just allow inode time updates with IOCB_NOWAIT, > > even on the io_uring side. Either that, or passed in RWF_NOWAIT > > semantics don't map completely to internal IOCB_NOWAIT semantics. At > > least in terms of what generic/471 is doing, but I'm not sure who came > > up with that and if it's established semantics or just some made up ones > > from whomever wrote that test. I don't think they make any sense, to be > > honest. > > Further support that generic/471 is just randomly made up semantics, > it needs to special case btrfs with nocow or you'd get -EAGAIN anyway > for that test. > > And it's relying on some random timing to see if this works. I really > think that test case is just hot garbage, and doesn't test anything > meaningful. <shrug> I had thought that NOWAIT means "don't wait for *any*thing", which would include timestamp updates... but then I've never been all that clear on what specifically NOWAIT will and won't wait for. :/ --D > -- > Jens Axboe >
On 7/1/22 12:05 PM, Darrick J. Wong wrote: > On Fri, Jul 01, 2022 at 08:38:07AM -0600, Jens Axboe wrote: >> On 7/1/22 8:30 AM, Jens Axboe wrote: >>> On 7/1/22 8:19 AM, Jens Axboe wrote: >>>> On 6/30/22 10:39 PM, Al Viro wrote: >>>>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote: >>>>>> This adds the async buffered write support to XFS. For async buffered >>>>>> write requests, the request will return -EAGAIN if the ilock cannot be >>>>>> obtained immediately. >>>>> >>>>> breaks generic/471... >>>> >>>> That test case is odd, because it makes some weird assumptions about >>>> what RWF_NOWAIT means. Most notably that it makes it mean if we should >>>> instantiate blocks or not. Where did those assumed semantics come from? >>>> On the read side, we have clearly documented that it should "not wait >>>> for data which is not immediately available". >>>> >>>> Now it is possible that we're returning a spurious -EAGAIN here when we >>>> should not be. And that would be a bug imho. I'll dig in and see what's >>>> going on. >>> >>> This is the timestamp update that needs doing which will now return >>> -EAGAIN if IOCB_NOWAIT is set as it may block. >>> >>> I do wonder if we should just allow inode time updates with IOCB_NOWAIT, >>> even on the io_uring side. Either that, or passed in RWF_NOWAIT >>> semantics don't map completely to internal IOCB_NOWAIT semantics. At >>> least in terms of what generic/471 is doing, but I'm not sure who came >>> up with that and if it's established semantics or just some made up ones >>> from whomever wrote that test. I don't think they make any sense, to be >>> honest. >> >> Further support that generic/471 is just randomly made up semantics, >> it needs to special case btrfs with nocow or you'd get -EAGAIN anyway >> for that test. >> >> And it's relying on some random timing to see if this works. I really >> think that test case is just hot garbage, and doesn't test anything >> meaningful. > > <shrug> I had thought that NOWAIT means "don't wait for *any*thing", > which would include timestamp updates... but then I've never been all > that clear on what specifically NOWAIT will and won't wait for. :/ Agree, at least the read semantics (kind of) make sense, but the ones seemingly made up by generic/471 don't seem to make any sense at all.
On Fri, Jul 01, 2022 at 12:14:41PM -0600, Jens Axboe wrote: > On 7/1/22 12:05 PM, Darrick J. Wong wrote: > > On Fri, Jul 01, 2022 at 08:38:07AM -0600, Jens Axboe wrote: > >> On 7/1/22 8:30 AM, Jens Axboe wrote: > >>> On 7/1/22 8:19 AM, Jens Axboe wrote: > >>>> On 6/30/22 10:39 PM, Al Viro wrote: > >>>>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote: > >>>>>> This adds the async buffered write support to XFS. For async buffered > >>>>>> write requests, the request will return -EAGAIN if the ilock cannot be > >>>>>> obtained immediately. > >>>>> > >>>>> breaks generic/471... > >>>> > >>>> That test case is odd, because it makes some weird assumptions about > >>>> what RWF_NOWAIT means. Most notably that it makes it mean if we should > >>>> instantiate blocks or not. Where did those assumed semantics come from? > >>>> On the read side, we have clearly documented that it should "not wait > >>>> for data which is not immediately available". > >>>> > >>>> Now it is possible that we're returning a spurious -EAGAIN here when we > >>>> should not be. And that would be a bug imho. I'll dig in and see what's > >>>> going on. > >>> > >>> This is the timestamp update that needs doing which will now return > >>> -EAGAIN if IOCB_NOWAIT is set as it may block. > >>> > >>> I do wonder if we should just allow inode time updates with IOCB_NOWAIT, > >>> even on the io_uring side. Either that, or passed in RWF_NOWAIT > >>> semantics don't map completely to internal IOCB_NOWAIT semantics. At > >>> least in terms of what generic/471 is doing, but I'm not sure who came > >>> up with that and if it's established semantics or just some made up ones > >>> from whomever wrote that test. I don't think they make any sense, to be > >>> honest. > >> > >> Further support that generic/471 is just randomly made up semantics, > >> it needs to special case btrfs with nocow or you'd get -EAGAIN anyway > >> for that test. > >> > >> And it's relying on some random timing to see if this works. I really > >> think that test case is just hot garbage, and doesn't test anything > >> meaningful. > > > > <shrug> I had thought that NOWAIT means "don't wait for *any*thing", > > which would include timestamp updates... but then I've never been all > > that clear on what specifically NOWAIT will and won't wait for. :/ > > Agree, at least the read semantics (kind of) make sense, but the ones > seemingly made up by generic/471 don't seem to make any sense at all. > Added Goldwyn to the CC list for this. This appears to be just a confusion about what we think NOWAIT should mean. Looking at the btrfs code it seems like Goldwyn took it as literally as possible so we wouldn't do any NOWAIT IO's unless it was into a NOCOW area, meaning we literally wouldn't do anything other than wrap the bio up and fire it off. The general consensus seems to be that NOWAIT isn't that strict, and that BTRFS's definition was too strict. I wrote initial patches to give to Stefan to clean up the Btrfs side to allow us to use NOWAIT under a lot more circumstances. Goldwyn, this test seems to be a little specific to our case, and can be flakey if the timing isn't just right. I think we should just remove it? Especially since how we define NOWAIT isn't quite right. Does that sound reasonable to you? I think a decent followup would be to add a NOWAIT specific fio test to fsperf so we still can catch any NOWAIT related regressions, without trying to test for specific behavior for something that can fail under a whole lot of conditions unrelated to our implementation. Thanks, Josef
On 9:47 05/07, Josef Bacik wrote: > On Fri, Jul 01, 2022 at 12:14:41PM -0600, Jens Axboe wrote: > > On 7/1/22 12:05 PM, Darrick J. Wong wrote: > > > On Fri, Jul 01, 2022 at 08:38:07AM -0600, Jens Axboe wrote: > > >> On 7/1/22 8:30 AM, Jens Axboe wrote: > > >>> On 7/1/22 8:19 AM, Jens Axboe wrote: > > >>>> On 6/30/22 10:39 PM, Al Viro wrote: > > >>>>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote: > > >>>>>> This adds the async buffered write support to XFS. For async buffered > > >>>>>> write requests, the request will return -EAGAIN if the ilock cannot be > > >>>>>> obtained immediately. > > >>>>> > > >>>>> breaks generic/471... > > >>>> > > >>>> That test case is odd, because it makes some weird assumptions about > > >>>> what RWF_NOWAIT means. Most notably that it makes it mean if we should > > >>>> instantiate blocks or not. Where did those assumed semantics come from? > > >>>> On the read side, we have clearly documented that it should "not wait > > >>>> for data which is not immediately available". > > >>>> > > >>>> Now it is possible that we're returning a spurious -EAGAIN here when we > > >>>> should not be. And that would be a bug imho. I'll dig in and see what's > > >>>> going on. > > >>> > > >>> This is the timestamp update that needs doing which will now return > > >>> -EAGAIN if IOCB_NOWAIT is set as it may block. > > >>> > > >>> I do wonder if we should just allow inode time updates with IOCB_NOWAIT, > > >>> even on the io_uring side. Either that, or passed in RWF_NOWAIT > > >>> semantics don't map completely to internal IOCB_NOWAIT semantics. At > > >>> least in terms of what generic/471 is doing, but I'm not sure who came > > >>> up with that and if it's established semantics or just some made up ones > > >>> from whomever wrote that test. I don't think they make any sense, to be > > >>> honest. > > >> > > >> Further support that generic/471 is just randomly made up semantics, > > >> it needs to special case btrfs with nocow or you'd get -EAGAIN anyway > > >> for that test. > > >> > > >> And it's relying on some random timing to see if this works. I really > > >> think that test case is just hot garbage, and doesn't test anything > > >> meaningful. > > > > > > <shrug> I had thought that NOWAIT means "don't wait for *any*thing", > > > which would include timestamp updates... but then I've never been all > > > that clear on what specifically NOWAIT will and won't wait for. :/ > > > > Agree, at least the read semantics (kind of) make sense, but the ones > > seemingly made up by generic/471 don't seem to make any sense at all. > > > > Added Goldwyn to the CC list for this. > > This appears to be just a confusion about what we think NOWAIT should mean. > Looking at the btrfs code it seems like Goldwyn took it as literally as possible > so we wouldn't do any NOWAIT IO's unless it was into a NOCOW area, meaning we > literally wouldn't do anything other than wrap the bio up and fire it off. When I introduced NOWAIT, it was only meant for writes and for those which would block on block allocations or locking. Over time it was included for buffered reads as well. > > The general consensus seems to be that NOWAIT isn't that strict, and that > BTRFS's definition was too strict. I wrote initial patches to give to Stefan to > clean up the Btrfs side to allow us to use NOWAIT under a lot more > circumstances. BTRFS COW path would allocate blocks and the reason it returns -EAGAIN. > > Goldwyn, this test seems to be a little specific to our case, and can be flakey > if the timing isn't just right. I think we should just remove it? Especially > since how we define NOWAIT isn't quite right. Does that sound reasonable to > you? Yes, I agree. It was based on the initial definition and now can be removed.
On Tue, Jul 05, 2022 at 09:47:25AM -0400, Josef Bacik wrote: > This appears to be just a confusion about what we think NOWAIT should mean. > Looking at the btrfs code it seems like Goldwyn took it as literally as possible > so we wouldn't do any NOWAIT IO's unless it was into a NOCOW area, meaning we > literally wouldn't do anything other than wrap the bio up and fire it off. It means don't do anything that would block. And I suspect in btrfs the above is about as much as you can do without blocking.
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index a60632ecc3f0..4d65ff007c7d 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -410,7 +410,7 @@ xfs_file_write_checks( spin_unlock(&ip->i_flags_lock); out: - return file_modified(file); + return kiocb_modified(iocb); } static int @@ -700,12 +700,11 @@ xfs_file_buffered_write( bool cleared_space = false; unsigned int iolock; - if (iocb->ki_flags & IOCB_NOWAIT) - return -EOPNOTSUPP; - write_retry: iolock = XFS_IOLOCK_EXCL; - xfs_ilock(ip, iolock); + ret = xfs_ilock_iocb(iocb, iolock); + if (ret) + return ret; ret = xfs_file_write_checks(iocb, from, &iolock); if (ret) @@ -1165,7 +1164,7 @@ xfs_file_open( { if (xfs_is_shutdown(XFS_M(inode->i_sb))) return -EIO; - file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC; + file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC; return generic_file_open(inode, file); } diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index bcf7c3694290..5d50fed291b4 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -886,6 +886,7 @@ xfs_buffered_write_iomap_begin( bool eof = false, cow_eof = false, shared = false; int allocfork = XFS_DATA_FORK; int error = 0; + unsigned int lockmode = XFS_ILOCK_EXCL; if (xfs_is_shutdown(mp)) return -EIO; @@ -897,7 +898,9 @@ xfs_buffered_write_iomap_begin( ASSERT(!XFS_IS_REALTIME_INODE(ip)); - xfs_ilock(ip, XFS_ILOCK_EXCL); + error = xfs_ilock_for_iomap(ip, flags, &lockmode); + if (error) + return error; if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(&ip->i_df)) || XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {