Message ID | 1458669320-6819-3-git-send-email-axboe@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote: > If you call sync, the initial call to wakeup_flusher_threads() ends up > calling wb_start_writeback() with reason=WB_REASON_SYNC, but > wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode. > Ensure that we use WB_SYNC_ALL for a sync operation. This seems wrong to me. We want background write to happen as quickly as possible and /not block/ when we first kick sync. The latter blocking passes of sync use WB_SYNC_ALL to ensure that we block waiting for all remaining IO to be issued and waited on, but the background writeback doesn't need to do this. Cheers, Dave.
On 03/22/2016 03:34 PM, Dave Chinner wrote: > On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote: >> If you call sync, the initial call to wakeup_flusher_threads() ends up >> calling wb_start_writeback() with reason=WB_REASON_SYNC, but >> wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode. >> Ensure that we use WB_SYNC_ALL for a sync operation. > > This seems wrong to me. We want background write to happen as > quickly as possible and /not block/ when we first kick sync. It's not going to block. wakeup_flusher_threads() async queues writeback work through wb_start_writeback(). > The latter blocking passes of sync use WB_SYNC_ALL to ensure that we > block waiting for all remaining IO to be issued and waited on, but > the background writeback doesn't need to do this. That's fine, they can get to wait on the previously issued IO, which was properly submitted with WB_SYNC_ALL. Maybe I'm missing your point?
On 03/22/2016 03:40 PM, Jens Axboe wrote: > On 03/22/2016 03:34 PM, Dave Chinner wrote: >> On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote: >>> If you call sync, the initial call to wakeup_flusher_threads() ends up >>> calling wb_start_writeback() with reason=WB_REASON_SYNC, but >>> wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode. >>> Ensure that we use WB_SYNC_ALL for a sync operation. >> >> This seems wrong to me. We want background write to happen as >> quickly as possible and /not block/ when we first kick sync. > > It's not going to block. wakeup_flusher_threads() async queues writeback > work through wb_start_writeback(). For block here, you mean the async work ending up doing wait_on_page_writeback() because we're doing WB_SYNC_ALL instead of WB_SYNC_NONE? And if so: >> The latter blocking passes of sync use WB_SYNC_ALL to ensure that we >> block waiting for all remaining IO to be issued and waited on, but >> the background writeback doesn't need to do this. why not have it do that?
On Tue, Mar 22, 2016 at 03:40:28PM -0600, Jens Axboe wrote: > On 03/22/2016 03:34 PM, Dave Chinner wrote: > >On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote: > >>If you call sync, the initial call to wakeup_flusher_threads() ends up > >>calling wb_start_writeback() with reason=WB_REASON_SYNC, but > >>wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode. > >>Ensure that we use WB_SYNC_ALL for a sync operation. > > > >This seems wrong to me. We want background write to happen as > >quickly as possible and /not block/ when we first kick sync. > > It's not going to block. wakeup_flusher_threads() async queues > writeback work through wb_start_writeback(). The flusher threads block, not the initial wakeup. e.g. they will now block waiting for data writeback to complete before writing the inode. i.e. this code in __writeback_single_inode() is now triggered by the background flusher: /* * Make sure to wait on the data before writing out the metadata. * This is important for filesystems that modify metadata on data * I/O completion. We don't do it for sync(2) writeback because it has a * separate, external IO completion path and ->sync_fs for guaranteeing * inode metadata is written back correctly. */ if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) { int err = filemap_fdatawait(mapping); if (ret == 0) ret = err; } It also changes the writeback chunk size in write_cache_pages(), so instead of doing a bit of writeback from all dirty inodes, it tries to write everything from each inode in turn (see writeback_chunk_size()) which will further exacerbate the wait above. > >The latter blocking passes of sync use WB_SYNC_ALL to ensure that we > >block waiting for all remaining IO to be issued and waited on, but > >the background writeback doesn't need to do this. > > That's fine, they can get to wait on the previously issued IO, which > was properly submitted with WB_SYNC_ALL. > > Maybe I'm missing your point? Making the background flusher block and wait for data makes it completely ineffective in speeding up sync() processing... Cheers, Dave.
On 03/22/2016 04:04 PM, Dave Chinner wrote: > On Tue, Mar 22, 2016 at 03:40:28PM -0600, Jens Axboe wrote: >> On 03/22/2016 03:34 PM, Dave Chinner wrote: >>> On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote: >>>> If you call sync, the initial call to wakeup_flusher_threads() ends up >>>> calling wb_start_writeback() with reason=WB_REASON_SYNC, but >>>> wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode. >>>> Ensure that we use WB_SYNC_ALL for a sync operation. >>> >>> This seems wrong to me. We want background write to happen as >>> quickly as possible and /not block/ when we first kick sync. >> >> It's not going to block. wakeup_flusher_threads() async queues >> writeback work through wb_start_writeback(). > > The flusher threads block, not the initial wakeup. e.g. they will > now block waiting for data writeback to complete before writing the > inode. i.e. this code in __writeback_single_inode() is now triggered > by the background flusher: > > /* > * Make sure to wait on the data before writing out the metadata. > * This is important for filesystems that modify metadata on data > * I/O completion. We don't do it for sync(2) writeback because it has a > * separate, external IO completion path and ->sync_fs for guaranteeing > * inode metadata is written back correctly. > */ > if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) { > int err = filemap_fdatawait(mapping); > if (ret == 0) > ret = err; > } Yeah, that's not ideal, for this case we'd really like something that WB_SYNC_ALL_NOWAIT... > It also changes the writeback chunk size in write_cache_pages(), so > instead of doing a bit of writeback from all dirty inodes, it tries > to write everything from each inode in turn (see > writeback_chunk_size()) which will further exacerbate the wait > above. But that part is fine, if it wasn't for the waiting. >>> The latter blocking passes of sync use WB_SYNC_ALL to ensure that we >>> block waiting for all remaining IO to be issued and waited on, but >>> the background writeback doesn't need to do this. >> >> That's fine, they can get to wait on the previously issued IO, which >> was properly submitted with WB_SYNC_ALL. >> >> Maybe I'm missing your point? > > Making the background flusher block and wait for data makes it > completely ineffective in speeding up sync() processing... Agree, we should not wait on the pages individually, we want them submitted and then waited on. And I suppose it's no differently than handling the normal buffered write from an application, which then gets waited on with fsync() or similar. So I can drop this patch, it'll work fine without it.
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 5c46ed9f3e14..97a9e9987134 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -936,7 +936,10 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, return; } - work->sync_mode = WB_SYNC_NONE; + if (reason == WB_REASON_SYNC) + work->sync_mode = WB_SYNC_ALL; + else + work->sync_mode = WB_SYNC_NONE; work->nr_pages = nr_pages; work->range_cyclic = range_cyclic; work->reason = reason;
If you call sync, the initial call to wakeup_flusher_threads() ends up calling wb_start_writeback() with reason=WB_REASON_SYNC, but wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode. Ensure that we use WB_SYNC_ALL for a sync operation. Signed-off-by: Jens Axboe <axboe@fb.com> --- fs/fs-writeback.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)