Message ID | 20220621102201.26337-1-jack@suse.cz |
---|---|
Series | block: Fix IO priority mess |
On 6/21/22 19:24, Jan Kara wrote:
> Hello,
>
> This is the fourth revision of my patches fixing IO priority handling in
> the block layer.

Thanks for this. I reviewed and this all looks good to me.
Nevertheless, I will give this a spin tomorrow (with ATA drives that have
NCQ priority).

> Changes since v3:
> * Added Reviewed-by tags from Damien
> * Fixed build failure without CONFIG_BLOCK
> * Separated refactoring of get_current_ioprio() into a separate patch
>
> Changes since v2:
> * added some comments to better explain things
> * changed handling of ioprio_get(2)
> * a few small tweaks based on Damien's feedback
>
> Original cover letter:
> Recently, I've been looking into a 10% regression reported by our
> performance measurement infrastructure in the reaim benchmark that was
> bisected down to 5a9d041ba2f6 ("block: move io_context creation into where
> it's needed"). This didn't really make much sense and it took me a while
> to understand, but the culprit is actually the even older commit
> e70344c05995 ("block: fix default IO priority handling"); 5a9d041ba2f6
> just made the breakage visible. Essentially the problem was that after
> these commits some IO was queued with IO priority class IOPRIO_CLASS_BE
> while other IO was queued with IOPRIO_CLASS_NONE, and as a result the two
> could not be merged together, resulting in the performance regression. I
> think what commit e70344c05995 ("block: fix default IO priority handling")
> did is actually broken not only because of this performance regression but
> for other reasons as well (see the changelog of patch 3/8 for details).
> Besides this problem, there are multiple other inconsistencies in IO
> priority handling throughout the block stack that we identified when
> discussing this with Damien Le Moal. So this patch set aims at fixing all
> these various issues.
>
> Note that there are a few choices I've made that I'm not 100% sold on. In
> particular, the conversion of blk-ioprio away from rqos is somewhat
> disputable since it now incurs a cost similar to blk-throttle in the bio
> submission fast path (we need to load
> bio->bi_blkg->pd[ioprio_policy.plid]). If people think the removed
> boilerplate code is not worth the cost, I can certainly go via the
> "additional rqos hook" path.
>
> Honza
>
> Previous versions:
> Link: http://lore.kernel.org/r/20220601132347.13543-1-jack@suse.cz # v1
> Link: http://lore.kernel.org/r/20220615160437.5478-1-jack@suse.cz # v2
> Link: http://lore.kernel.org/r/20220620160726.19798-1-jack@suse.cz # v3
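As background for the priority classes the cover letter refers to, here is a
minimal userspace sketch. It is not part of the patch series; the macro values
mirror the uapi <linux/ioprio.h> definitions and the syscall is invoked
directly because glibc provides no wrapper. It decodes the calling task's
ioprio and shows why IO tagged IOPRIO_CLASS_NONE and IO tagged IOPRIO_CLASS_BE
carry different encoded values even though both end up scheduled as
best-effort, which is what kept such requests from being merged:

/*
 * Sketch: decode the current task's IO priority.  The macro values mirror
 * include/uapi/linux/ioprio.h; a task that never called ioprio_set(2)
 * typically reports IOPRIO_CLASS_NONE, which the kernel treats as
 * best-effort with a level derived from the task's nice value.
 */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_CLASS_SHIFT      13
#define IOPRIO_PRIO_MASK        ((1 << IOPRIO_CLASS_SHIFT) - 1)
#define IOPRIO_PRIO_CLASS(v)    ((v) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_PRIO_DATA(v)     ((v) & IOPRIO_PRIO_MASK)
#define IOPRIO_PRIO_VALUE(c, d) (((c) << IOPRIO_CLASS_SHIFT) | (d))

enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };
enum { IOPRIO_WHO_PROCESS = 1 };

int main(void)
{
        /* No glibc wrapper exists, so invoke the syscall directly. */
        long ioprio = syscall(SYS_ioprio_get, IOPRIO_WHO_PROCESS, 0);

        if (ioprio < 0) {
                perror("ioprio_get");
                return 1;
        }
        printf("class=%ld level=%ld\n",
               IOPRIO_PRIO_CLASS(ioprio), IOPRIO_PRIO_DATA(ioprio));

        /*
         * IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0) is 0, while the default
         * best-effort value IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 4) is
         * (2 << 13) | 4, so the two compare unequal even though both are
         * effectively best-effort IO -- the mismatch described above.
         */
        return 0;
}

What exactly ioprio_get(2) reports for a task that never set a priority is one
of the details this series adjusts (see "changed handling of ioprio_get(2)" in
the v2 changelog), so the printed class may differ between kernel versions.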
On Tue 21-06-22 22:03:54, Damien Le Moal wrote:
> On 6/21/22 19:24, Jan Kara wrote:
> > Hello,
> >
> > This is the fourth revision of my patches fixing IO priority handling in
> > the block layer.
>
> Thanks for this. I reviewed and this all looks good to me.
> Nevertheless, I will give this a spin tomorrow (with ATA drives that have
> NCQ priority).

Great, thanks for the review and the testing! I will wait for the results and
then post the final version of the patchset.

Honza
On 6/21/22 19:24, Jan Kara wrote:
> Hello,
>
> This is the fourth revision of my patches fixing IO priority handling in
> the block layer.

Hey,

I ran some tests with the RT IO class on an ATA HDD with NCQ priority.
Results look good:

randread: (groupid=0, jobs=1): err= 0: pid=1862: Thu Jun 23 09:22:42 2022
  read: IOPS=152, BW=19.1MiB/s (20.0MB/s)(11.1GiB/592929msec)
    slat (usec): min=37, max=3280, avg=174.96, stdev=120.90
    clat (msec): min=7, max=918, avg=156.95, stdev=149.43
     lat (msec): min=7, max=918, avg=157.13, stdev=149.43
    clat prio 0/0 (msec): min=7, max=918, avg=171.68, stdev=150.48
    clat prio 1/0 (msec): min=8, max=166, avg=24.64, stdev= 5.93
    clat percentiles (msec):
     |  1.00th=[   15],  5.00th=[   20], 10.00th=[   23], 20.00th=[   32],
     | 30.00th=[   51], 40.00th=[   77], 50.00th=[  108], 60.00th=[  146],
     | 70.00th=[  194], 80.00th=[  262], 90.00th=[  372], 95.00th=[  477],
     | 99.00th=[  659], 99.50th=[  701], 99.90th=[  793], 99.95th=[  818],
     | 99.99th=[  885]
    clat prio 0/0 (89.99% of IOs) percentiles (msec):
     |  1.00th=[   15],  5.00th=[   21], 10.00th=[   27], 20.00th=[   46],
     | 30.00th=[   68], 40.00th=[   94], 50.00th=[  126], 60.00th=[  163],
     | 70.00th=[  213], 80.00th=[  279], 90.00th=[  388], 95.00th=[  489],
     | 99.00th=[  659], 99.50th=[  709], 99.90th=[  793], 99.95th=[  827],
     | 99.99th=[  885]
    clat prio 1/0 (10.01% of IOs) percentiles (msec):
     |  1.00th=[   14],  5.00th=[   17], 10.00th=[   18], 20.00th=[   20],
     | 30.00th=[   22], 40.00th=[   23], 50.00th=[   25], 60.00th=[   26],
     | 70.00th=[   28], 80.00th=[   30], 90.00th=[   33], 95.00th=[   35],
     | 99.00th=[   40], 99.50th=[   42], 99.90th=[   47], 99.95th=[   50],
     | 99.99th=[  167]

Clearly significantly lower tail latencies for the 10% of IOs with class RT
(1/0), as expected. This is with the "none" scheduler at QD=24 (128K random
reads).

Feel free to add:

Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

> Changes since v3:
> * Added Reviewed-by tags from Damien
> * Fixed build failure without CONFIG_BLOCK
> * Separated refactoring of get_current_ioprio() into a separate patch
>
> Changes since v2:
> * added some comments to better explain things
> * changed handling of ioprio_get(2)
> * a few small tweaks based on Damien's feedback
>
> Original cover letter:
> Recently, I've been looking into a 10% regression reported by our
> performance measurement infrastructure in the reaim benchmark that was
> bisected down to 5a9d041ba2f6 ("block: move io_context creation into where
> it's needed"). This didn't really make much sense and it took me a while
> to understand, but the culprit is actually the even older commit
> e70344c05995 ("block: fix default IO priority handling"); 5a9d041ba2f6
> just made the breakage visible. Essentially the problem was that after
> these commits some IO was queued with IO priority class IOPRIO_CLASS_BE
> while other IO was queued with IOPRIO_CLASS_NONE, and as a result the two
> could not be merged together, resulting in the performance regression. I
> think what commit e70344c05995 ("block: fix default IO priority handling")
> did is actually broken not only because of this performance regression but
> for other reasons as well (see the changelog of patch 3/8 for details).
> Besides this problem, there are multiple other inconsistencies in IO
> priority handling throughout the block stack that we identified when
> discussing this with Damien Le Moal. So this patch set aims at fixing all
> these various issues.
>
> Note that there are a few choices I've made that I'm not 100% sold on. In
> particular, the conversion of blk-ioprio away from rqos is somewhat
> disputable since it now incurs a cost similar to blk-throttle in the bio
> submission fast path (we need to load
> bio->bi_blkg->pd[ioprio_policy.plid]). If people think the removed
> boilerplate code is not worth the cost, I can certainly go via the
> "additional rqos hook" path.
>
> Honza
>
> Previous versions:
> Link: http://lore.kernel.org/r/20220601132347.13543-1-jack@suse.cz # v1
> Link: http://lore.kernel.org/r/20220615160437.5478-1-jack@suse.cz # v2
> Link: http://lore.kernel.org/r/20220620160726.19798-1-jack@suse.cz # v3
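The per-priority latency breakdown above comes from fio, which in this run
tagged roughly 10% of its IOs with class 1 (RT), reported as the separate
"clat prio 1/0" bucket; the exact job file is not part of this thread. As a
hedged illustration of the mechanism being exercised, the sketch below marks
the calling thread's IO as IOPRIO_CLASS_RT via ioprio_set(2) before issuing
128K direct reads; on an ATA disk whose ncq_prio_enable attribute is turned
on, such IO is the kind libata submits as high-priority NCQ commands. The
device path is a placeholder and the loop is sequential rather than the random
QD=24 pattern used in the test.

/*
 * Sketch only (not Damien's fio setup): issue reads from a thread whose IO
 * class has been set to RT.  The ioprio macros mirror the uapi definitions;
 * the device path below is a placeholder.
 */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_CLASS_SHIFT      13
#define IOPRIO_PRIO_VALUE(c, d) (((c) << IOPRIO_CLASS_SHIFT) | (d))

enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };
enum { IOPRIO_WHO_PROCESS = 1 };

int main(void)
{
        const size_t bs = 128 * 1024;   /* 128K, matching the test's block size */
        void *buf;
        int fd;

        /* Tag all subsequent IO from this thread as RT class, level 0. */
        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                    IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)) < 0) {
                perror("ioprio_set");
                return 1;
        }

        /* O_DIRECT needs an aligned buffer. */
        if (posix_memalign(&buf, 4096, bs))
                return 1;

        fd = open("/dev/sdX", O_RDONLY | O_DIRECT);     /* placeholder device */
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* A handful of 128K reads; the fio run above was random at QD=24. */
        for (int i = 0; i < 64; i++)
                if (pread(fd, buf, bs, (off_t)i * bs) < 0)
                        perror("pread");

        close(fd);
        free(buf);
        return 0;
}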
On Thu 23-06-22 09:24:56, Damien Le Moal wrote:
> On 6/21/22 19:24, Jan Kara wrote:
> > Hello,
> >
> > This is the fourth revision of my patches fixing IO priority handling in
> > the block layer.
>
> Hey,
>
> I ran some tests with the RT IO class on an ATA HDD with NCQ priority.
> Results look good:
>
> randread: (groupid=0, jobs=1): err= 0: pid=1862: Thu Jun 23 09:22:42 2022
>   read: IOPS=152, BW=19.1MiB/s (20.0MB/s)(11.1GiB/592929msec)
>     slat (usec): min=37, max=3280, avg=174.96, stdev=120.90
>     clat (msec): min=7, max=918, avg=156.95, stdev=149.43
>      lat (msec): min=7, max=918, avg=157.13, stdev=149.43
>     clat prio 0/0 (msec): min=7, max=918, avg=171.68, stdev=150.48
>     clat prio 1/0 (msec): min=8, max=166, avg=24.64, stdev= 5.93
>     clat percentiles (msec):
>      |  1.00th=[   15],  5.00th=[   20], 10.00th=[   23], 20.00th=[   32],
>      | 30.00th=[   51], 40.00th=[   77], 50.00th=[  108], 60.00th=[  146],
>      | 70.00th=[  194], 80.00th=[  262], 90.00th=[  372], 95.00th=[  477],
>      | 99.00th=[  659], 99.50th=[  701], 99.90th=[  793], 99.95th=[  818],
>      | 99.99th=[  885]
>     clat prio 0/0 (89.99% of IOs) percentiles (msec):
>      |  1.00th=[   15],  5.00th=[   21], 10.00th=[   27], 20.00th=[   46],
>      | 30.00th=[   68], 40.00th=[   94], 50.00th=[  126], 60.00th=[  163],
>      | 70.00th=[  213], 80.00th=[  279], 90.00th=[  388], 95.00th=[  489],
>      | 99.00th=[  659], 99.50th=[  709], 99.90th=[  793], 99.95th=[  827],
>      | 99.99th=[  885]
>     clat prio 1/0 (10.01% of IOs) percentiles (msec):
>      |  1.00th=[   14],  5.00th=[   17], 10.00th=[   18], 20.00th=[   20],
>      | 30.00th=[   22], 40.00th=[   23], 50.00th=[   25], 60.00th=[   26],
>      | 70.00th=[   28], 80.00th=[   30], 90.00th=[   33], 95.00th=[   35],
>      | 99.00th=[   40], 99.50th=[   42], 99.90th=[   47], 99.95th=[   50],
>      | 99.99th=[  167]
>
> Clearly significantly lower tail latencies for the 10% of IOs with class
> RT (1/0), as expected. This is with the "none" scheduler at QD=24 (128K
> random reads).

Nice! :)

> Feel free to add:
>
> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

Thanks for testing! I'll post the final patch version shortly.

Honza