Message ID | 1445714897-26342-1-git-send-email-jack@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Jan, Thanks for looking into this and promptly sending a patch. On 2015-10-24 21:28:17 +0200, Jan Kara wrote: > These days do_sync_mapping_range() went away and we can switch > sync_file_range(2) back to issuing WB_SYNC_NONE writeback. That should > help PostgreSQL avoid large latency spikes when flushing data in the > background. > > diff --git a/fs/sync.c b/fs/sync.c > index fbc98ee62044..ef60e812d771 100644 > --- a/fs/sync.c > +++ b/fs/sync.c > @@ -343,7 +343,8 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes, > } > > if (flags & SYNC_FILE_RANGE_WRITE) { > - ret = filemap_fdatawrite_range(mapping, offset, endbyte); > + ret = __filemap_fdatawrite_range(mapping, offset, endbyte, > + WB_SYNC_NONE); > if (ret < 0) > goto out_put; > } Thanks. Would scheduling a comparative benchmark of this be helpful pushing htis forward ? Would probably only be early next week, I'm at the european postgresql conference right now. Regards, Andres -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Andres, On Wed 28-10-15 10:30:40, Andres Freund wrote: > On 2015-10-24 21:28:17 +0200, Jan Kara wrote: > > These days do_sync_mapping_range() went away and we can switch > > sync_file_range(2) back to issuing WB_SYNC_NONE writeback. That should > > help PostgreSQL avoid large latency spikes when flushing data in the > > background. > > > > diff --git a/fs/sync.c b/fs/sync.c > > index fbc98ee62044..ef60e812d771 100644 > > --- a/fs/sync.c > > +++ b/fs/sync.c > > @@ -343,7 +343,8 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes, > > } > > > > if (flags & SYNC_FILE_RANGE_WRITE) { > > - ret = filemap_fdatawrite_range(mapping, offset, endbyte); > > + ret = __filemap_fdatawrite_range(mapping, offset, endbyte, > > + WB_SYNC_NONE); > > if (ret < 0) > > goto out_put; > > } > > Thanks. Would scheduling a comparative benchmark of this be helpful > pushing htis forward ? Would probably only be early next week, I'm at > the european postgresql conference right now. If you could run it, it would be nice. Thanks! Honza
Hi, On 2015-10-28 15:18:52 +0100, Jan Kara wrote: > On Wed 28-10-15 10:30:40, Andres Freund wrote: > > On 2015-10-24 21:28:17 +0200, Jan Kara wrote: > > Thanks. Would scheduling a comparative benchmark of this be helpful > > pushing htis forward ? Would probably only be early next week, I'm at > > the european postgresql conference right now. > > If you could run it, it would be nice. Thanks! Sorry that it took longer. Had some $work deadline making a surprise attack (sneaky bastard) and then difficulity getting time on a bigger box. Hence I only have numbers from a single SSD drive (840 EVO 1TB) laptop (16GB RAM, i7-4800MQ), with nothing else running. I've compared 4.3.0-andres-07965-gd1e41ff, with/without the patch applied. Best of three results: Before: transaction type: TPC-B (sort of) scaling factor: 300 query mode: prepared number of clients: 48 number of threads: 48 duration: 1000 s number of transactions actually processed: 7873859 latency average: 6.094 ms latency stddev: 35.376 ms tps = 7871.887045 (including connections establishing) tps = 7872.135871 (excluding connections establishing) After: transaction type: TPC-B (sort of) scaling factor: 300 query mode: prepared number of clients: 48 number of threads: 48 duration: 1000 s number of transactions actually processed: 9370637 latency average: 5.119 ms latency stddev: 24.423 ms tps = 9369.212486 (including connections establishing) tps = 9369.725595 (excluding connections establishing) Pretty tidy improvement I'd say. Feel free to add a Tested-By: Andres Freund <andres@anarazel.de> There's still quite some further room of improvement. E.g. some quite massive peaks in latency, which don't coincide with background work in postgres: progress: 869.0 s, 9971.2 tps, lat 4.828 ms stddev 6.201 progress: 870.0 s, 8196.4 tps, lat 4.632 ms stddev 6.774 progress: 871.0 s, 497.0 tps, lat 89.598 ms stddev 308.398 progress: 872.0 s, 10360.6 tps, lat 5.917 ms stddev 40.041 progress: 873.0 s, 7005.5 tps, lat 6.743 ms stddev 21.985 Without controlling writeout via sync_file_range(), we frequently can see stalls in the tens of seconds on busy machines though... So this is still much better ;) Greetings, Andres Freund -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Fri 06-11-15 14:06:12, Andres Freund wrote: > On 2015-10-28 15:18:52 +0100, Jan Kara wrote: > > On Wed 28-10-15 10:30:40, Andres Freund wrote: > > > On 2015-10-24 21:28:17 +0200, Jan Kara wrote: > > > Thanks. Would scheduling a comparative benchmark of this be helpful > > > pushing htis forward ? Would probably only be early next week, I'm at > > > the european postgresql conference right now. > > > > If you could run it, it would be nice. Thanks! > > Sorry that it took longer. Had some $work deadline making a surprise > attack (sneaky bastard) and then difficulity getting time on a bigger > box. Hence I only have numbers from a single SSD drive (840 EVO 1TB) > laptop (16GB RAM, i7-4800MQ), with nothing else running. > > I've compared 4.3.0-andres-07965-gd1e41ff, with/without the patch > applied. Best of three results: > > Before: > transaction type: TPC-B (sort of) > scaling factor: 300 > query mode: prepared > number of clients: 48 > number of threads: 48 > duration: 1000 s > number of transactions actually processed: 7873859 > latency average: 6.094 ms > latency stddev: 35.376 ms > tps = 7871.887045 (including connections establishing) > tps = 7872.135871 (excluding connections establishing) > > > After: > transaction type: TPC-B (sort of) > scaling factor: 300 > query mode: prepared > number of clients: 48 > number of threads: 48 > duration: 1000 s > number of transactions actually processed: 9370637 > latency average: 5.119 ms > latency stddev: 24.423 ms > tps = 9369.212486 (including connections establishing) > tps = 9369.725595 (excluding connections establishing) > > Pretty tidy improvement I'd say. > > Feel free to add a > Tested-By: Andres Freund <andres@anarazel.de> > > There's still quite some further room of improvement. E.g. some > quite massive peaks in latency, which don't coincide with background > work in postgres: > progress: 869.0 s, 9971.2 tps, lat 4.828 ms stddev 6.201 > progress: 870.0 s, 8196.4 tps, lat 4.632 ms stddev 6.774 > progress: 871.0 s, 497.0 tps, lat 89.598 ms stddev 308.398 > progress: 872.0 s, 10360.6 tps, lat 5.917 ms stddev 40.041 > progress: 873.0 s, 7005.5 tps, lat 6.743 ms stddev 21.985 > > Without controlling writeout via sync_file_range(), we frequently can > see stalls in the tens of seconds on busy machines though... So this is > still much better ;) Thanks for doing the test and I'm happy it improved your results. Andrew has already picked up the patch into his tree so it will eventually hit the upstream kernel (not sure if for 4.4 already but for 4.5 certainly). Honza
diff --git a/fs/sync.c b/fs/sync.c index fbc98ee62044..ef60e812d771 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -343,7 +343,8 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes, } if (flags & SYNC_FILE_RANGE_WRITE) { - ret = filemap_fdatawrite_range(mapping, offset, endbyte); + ret = __filemap_fdatawrite_range(mapping, offset, endbyte, + WB_SYNC_NONE); if (ret < 0) goto out_put; }
sync_file_range(2) is documented to issue writeback only for pages that are not currently being written. After all the system call has been created for userspace to be able to issue background writeout and so waiting for in-flight IO is undesirable there. However commit ee53a891f474 (mm: do_sync_mapping_range integrity fix) switched do_sync_mapping_range() and thus sync_file_range() to issue writeback in WB_SYNC_ALL mode since do_sync_mapping_range() was used by other code relying on WB_SYNC_ALL semantics. These days do_sync_mapping_range() went away and we can switch sync_file_range(2) back to issuing WB_SYNC_NONE writeback. That should help PostgreSQL avoid large latency spikes when flushing data in the background. Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jan Kara <jack@suse.com> --- fs/sync.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)