Message ID | 39f8b446-dce3-373f-eb86-e3333b31122c@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs: allow inode time modification with IOCB_NOWAIT | expand |
On Fri, Jul 01, 2022 at 02:09:32PM -0600, Jens Axboe wrote: > generic/471 complains because it expects any write done with RWF_NOWAIT > to succeed as long as the blocks for the write are already instantiated. > This isn't necessarily a correct assumption, as there are other conditions > that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range > is already there. > > Since the risk of blocking off this path is minor, just allow inode > time updates with IOCB_NOWAIT set. Then we can later decide if we should > catch this further down the stack. I think this is broken. Please just drop the test, the non-blocking behavior here makes a lot of sense. At least for XFS, the update will end up allocating and commit a transaction which involves memory allocation, a blocking lock and possibly waiting for lock space.
On 7/2/22 12:05 AM, Christoph Hellwig wrote: > On Fri, Jul 01, 2022 at 02:09:32PM -0600, Jens Axboe wrote: >> generic/471 complains because it expects any write done with RWF_NOWAIT >> to succeed as long as the blocks for the write are already instantiated. >> This isn't necessarily a correct assumption, as there are other conditions >> that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range >> is already there. >> >> Since the risk of blocking off this path is minor, just allow inode >> time updates with IOCB_NOWAIT set. Then we can later decide if we should >> catch this further down the stack. > > I think this is broken. Please just drop the test, the non-blocking > behavior here makes a lot of sense. At least for XFS, the update > will end up allocating and commit a transaction which involves memory > allocation, a blocking lock and possibly waiting for lock space. I'm fine with that, I've made my opinions on that test case clear in previous emails. I'll drop the patch for now. I will say though that even in low memory testing, I never saw XFS block off the inode time update. So at least we have room for future improvements here, it's wasteful to return -EAGAIN here when the vast majority of time updates don't end up blocking. One issue there too is that, by default, XFS uses a high granularity threshold for when the time should be updated, making the problem worse.
On Sat, Jul 02, 2022 at 09:45:23AM -0600, Jens Axboe wrote: > On 7/2/22 12:05 AM, Christoph Hellwig wrote: > > On Fri, Jul 01, 2022 at 02:09:32PM -0600, Jens Axboe wrote: > >> generic/471 complains because it expects any write done with RWF_NOWAIT > >> to succeed as long as the blocks for the write are already instantiated. > >> This isn't necessarily a correct assumption, as there are other conditions > >> that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range > >> is already there. > >> > >> Since the risk of blocking off this path is minor, just allow inode > >> time updates with IOCB_NOWAIT set. Then we can later decide if we should > >> catch this further down the stack. > > > > I think this is broken. Please just drop the test, the non-blocking > > behavior here makes a lot of sense. At least for XFS, the update > > will end up allocating and commit a transaction which involves memory > > allocation, a blocking lock and possibly waiting for lock space. > > I'm fine with that, I've made my opinions on that test case clear in > previous emails. I'll drop the patch for now. > > I will say though that even in low memory testing, I never saw XFS block > off the inode time update. So at least we have room for future > improvements here, it's wasteful to return -EAGAIN here when the vast > majority of time updates don't end up blocking. It's not low memory testing that you should be concerned about - it's when the journal runs out of space that you'll get long, unbound latencies waiting for timestamp updates. Waiting for journal space to become available could, in the worst case, entail waiting for tens of thousands of small random metadata IOs to be submitted and completed.... > One issue there too is that, by default, XFS uses a high granularity > threshold for when the time should be updated, making the problem worse. That's not an XFS issue - we're just following the VFS rules for when mtime needs to be changed. If you want to avoid frequent transactional (on-disk) mtime updates, then use the lazytime mount option to limit on-disk mtime updates to once per day. Cheers, Dave.
On 7/2/22 6:06 PM, Dave Chinner wrote: > On Sat, Jul 02, 2022 at 09:45:23AM -0600, Jens Axboe wrote: >> On 7/2/22 12:05 AM, Christoph Hellwig wrote: >>> On Fri, Jul 01, 2022 at 02:09:32PM -0600, Jens Axboe wrote: >>>> generic/471 complains because it expects any write done with RWF_NOWAIT >>>> to succeed as long as the blocks for the write are already instantiated. >>>> This isn't necessarily a correct assumption, as there are other conditions >>>> that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range >>>> is already there. >>>> >>>> Since the risk of blocking off this path is minor, just allow inode >>>> time updates with IOCB_NOWAIT set. Then we can later decide if we should >>>> catch this further down the stack. >>> >>> I think this is broken. Please just drop the test, the non-blocking >>> behavior here makes a lot of sense. At least for XFS, the update >>> will end up allocating and commit a transaction which involves memory >>> allocation, a blocking lock and possibly waiting for lock space. >> >> I'm fine with that, I've made my opinions on that test case clear in >> previous emails. I'll drop the patch for now. >> >> I will say though that even in low memory testing, I never saw XFS block >> off the inode time update. So at least we have room for future >> improvements here, it's wasteful to return -EAGAIN here when the vast >> majority of time updates don't end up blocking. > > It's not low memory testing that you should be concerned about - > it's when the journal runs out of space that you'll get long, > unbound latencies waiting for timestamp updates. Waiting for journal > space to become available could, in the worst case, entail waiting > for tens of thousands of small random metadata IOs to be submitted > and completed.... Right, I know it's not a specifically targeted test. But testing blocking on a new journal space would certainly be interesting in itself, if someone where to make xfs_vn_update_time() honor IOCB_NOWAIT for that. More realistic is probably just checking SB_LAZYTIME and allowing IOCB_NOWAIT for that. >> One issue there too is that, by default, XFS uses a high granularity >> threshold for when the time should be updated, making the problem worse. > > That's not an XFS issue - we're just following the VFS rules for > when mtime needs to be changed. If you want to avoid frequent > transactional (on-disk) mtime updates, then use the lazytime mount > option to limit on-disk mtime updates to once per day. Seems like that would be a better default (for anyone, not just XFS), but that's more likely a bigger can of worms I don't have any interest in opening :-)
diff --git a/fs/inode.c b/fs/inode.c index 259ebf438893..98a48fbfa0ad 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2150,8 +2150,6 @@ static int file_modified_flags(struct file *file, int flags) ret = inode_needs_update_time(inode, &now); if (ret <= 0) return ret; - if (flags & IOCB_NOWAIT) - return -EAGAIN; return __file_update_time(file, &now, ret); }
generic/471 complains because it expects any write done with RWF_NOWAIT to succeed as long as the blocks for the write are already instantiated. This isn't necessarily a correct assumption, as there are other conditions that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range is already there. Since the risk of blocking off this path is minor, just allow inode time updates with IOCB_NOWAIT set. Then we can later decide if we should catch this further down the stack. Fixes: 4faa13bd5d3b ("fs: Add async write file modification handling.") Signed-off-by: Jens Axboe <axboe@kernel.dk> --- fs/inode.c | 2 -- 1 file changed, 2 deletions(-)