Message ID | 20220415013735.1610091-1-chengzhihao1@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs-writeback: Flush plug before next iteration in wb_writeback() | expand |
On Fri, Apr 15, 2022 at 09:37:35AM +0800, Zhihao Cheng wrote: > + if (progress) { > + /* > + * The progress may be false postive in page redirty > + * case (which is caused by failing to get buffer head > + * lock), which will requeue dirty inodes and start > + * next writeback iteration, and other tasks maybe > + * stuck for getting tags for new requests. So, flush > + * plug to schedule requests holding tags. > + * > + * The code can be removed after buffer head > + * disappering from linux. > + */ > + blk_flush_plug(current->plug, false); This basically removes plugging entirely, so we might as well stop adding the plug if we can't solve it any other way. But it seems like that fake progress needs to be fixed instead.
在 2022/4/15 14:39, Christoph Hellwig 写道: Hi, Christoph > This basically removes plugging entirely, so we might as well stop > adding the plug if we can't solve it any other way. But it seems > like that fake progress needs to be fixed instead. > Maybe there is a more ideal solution: diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index e524c0a1749c..9723f77841f8 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1855,7 +1855,7 @@ static long writeback_sb_inodes(struct super_block *sb, wbc_detach_inode(&wbc); work->nr_pages -= write_chunk - wbc.nr_to_write; - wrote += write_chunk - wbc.nr_to_write; + wrote += write_chunk - wbc.nr_to_write - wbc.pages_skipped; if (need_resched()) { /* , or following is better(It looks awkward.): diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index e524c0a1749c..5f310e53bf1e 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1780,6 +1780,7 @@ static long writeback_sb_inodes(struct super_block *sb, while (!list_empty(&wb->b_io)) { struct inode *inode = wb_inode(wb->b_io.prev); struct bdi_writeback *tmp_wb; + long tmp_wrote; if (inode->i_sb != sb) { if (work->sb) { @@ -1854,8 +1855,11 @@ static long writeback_sb_inodes(struct super_block *sb, __writeback_single_inode(inode, &wbc); wbc_detach_inode(&wbc); - work->nr_pages -= write_chunk - wbc.nr_to_write; - wrote += write_chunk - wbc.nr_to_write; + tmp_wrote = write_chunk - wbc.nr_to_write >= wbc.pages_skipped ? + write_chunk - wbc.nr_to_write - wbc.pages_skipped : + write_chunk - wbc.nr_to_write;workaround + work->nr_pages -= tmp_wrote; + wrote += tmp_wrote; if (need_resched()) { /* It depends on how specific filesystem behaves after invoking redirty_page_for_writepage(). Most filesystems return AOP_WRITEPAGE_ACTIVATE or 0 after invoking redirty_page_for_writepage() in writepage, but there still exist accidential examples: 1. metapage_writepage() could returns -EIO after redirty_page_for_writepage() 2. ext4_writepage() could returns -ENOMEM after redirty_page_for_writepage() write_cache_pages error = (*writepage)(page, wbc, data); if (unlikely(error)) { ... break; } --wbc->nr_to_write // Skip if 'error < 0'. And if writepage invokes redirty_page_for_writepage(), wrote could be negative. I think the root cause is fsync gets buffer head's lock without locking corresponding page, fixing 'progess' and flushing plug are both workarounds.
> I think the root cause is fsync gets buffer head's lock without locking > corresponding page, fixing 'progess' and flushing plug are both > workarounds. So let's fix that.
在 2022/4/16 13:42, Christoph Hellwig 写道: >> I think the root cause is fsync gets buffer head's lock without locking >> corresponding page, fixing 'progess' and flushing plug are both >> workarounds. > > So let's fix that. > I think adding page lock before locking buffer head is a little difficult and risky: 1. There are too many places getting buffer head before submitting bio, and not all filesystems behave same in readpage/writepage/write_inode. For example, ntfs_read_block() has locked page before locking buffer head and then submitting bh, ext4(no journal) and fat may lock buffer head without locking page while writing inode. It's a huge work to check all places. 2. Import page lock before locking buffer head may bring new unknown problem(other deadlocks about page ?). Taking page lock before locking buffer head(in all processes which can be concurrent with wb_writeback) is a dangerous thing. So, how about applying the safe and simple method(flush plug) for the time being? PS: Maybe someday buffer head is removed from all filesystems, then we can remove this superfluous blk_flush_plug.
On Sat, Apr 16, 2022 at 04:41:35PM +0800, Zhihao Cheng wrote: > So, how about applying the safe and simple method(flush plug) for the time > being? As said before - if you flush everytime here the plug is basically useless and we could remove it. But it was added for a reason. So let's at least improve the accounting for the skipped writeback as suggested in your last mail.
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 591fe9cf1659..e524c0a1749c 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -2036,8 +2036,21 @@ static long wb_writeback(struct bdi_writeback *wb, * mean the overall work is done. So we keep looping as long * as made some progress on cleaning pages or inodes. */ - if (progress) + if (progress) { + /* + * The progress may be false postive in page redirty + * case (which is caused by failing to get buffer head + * lock), which will requeue dirty inodes and start + * next writeback iteration, and other tasks maybe + * stuck for getting tags for new requests. So, flush + * plug to schedule requests holding tags. + * + * The code can be removed after buffer head + * disappering from linux. + */ + blk_flush_plug(current->plug, false); continue; + } /* * No more inodes for IO, bail */