Message ID | 20240907024901.405881-1-zhanghui31@xiaomi.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v6] block: move non sync requests complete flow to softirq |
On 9/6/24 8:49 PM, ZhangHui wrote:
> From: zhanghui <zhanghui31@xiaomi.com>
>
> Currently, for a controller that supports multiple queues, like UFS4.0,
> the mq_ops->complete is executed in the interrupt top-half. Therefore,
> the file system's end io is executed during the request completion process,
> such as f2fs_write_end_io on smartphone.
>
> However, we found that the execution time of the file system end io
> is strongly related to the size of the bio and the processing speed
> of the CPU. Because the file system's end io will traverse every page
> in bio, this is a very time-consuming operation.
>
> We measured that the 80M bio write operation on the little CPU will
> cause the execution time of the top-half to be greater than 100ms,
> which will undoubtedly affect interrupt response latency.
>
> Let's fix this issue by moving non sync requests completion to softirq
> context, and keeping sync requests completion in the IRQ top-half context.

You keep ignoring the feedback, and hence I too shall be ignoring this
patch going forward then.

The key issue here is that the completion takes so long, and adding a
heuristic that equates not-sync with latency-not-important is pretty
bogus and not a good way to attempt to work around it.
On 2024/9/7 21:46, Jens Axboe wrote:
> On 9/6/24 8:49 PM, ZhangHui wrote:
>> From: zhanghui <zhanghui31@xiaomi.com>
>>
>> Currently, for a controller that supports multiple queues, like UFS4.0,
>> the mq_ops->complete is executed in the interrupt top-half. Therefore,
>> the file system's end io is executed during the request completion process,
>> such as f2fs_write_end_io on smartphone.
>>
>> However, we found that the execution time of the file system end io
>> is strongly related to the size of the bio and the processing speed
>> of the CPU. Because the file system's end io will traverse every page
>> in bio, this is a very time-consuming operation.
>>
>> We measured that the 80M bio write operation on the little CPU will
>> cause the execution time of the top-half to be greater than 100ms,
>> which will undoubtedly affect interrupt response latency.
>>
>> Let's fix this issue by moving non sync requests completion to softirq
>> context, and keeping sync requests completion in the IRQ top-half context.
> You keep ignoring the feedback, and hence I too shall be ignoring this
> patch going forward then.
>
> The key issue here is that the completion takes so long, and adding a
> heuristic that equates not-sync with latency-not-important is pretty
> bogus and not a good way to attempt to work around it.
>
> --
> Jens Axboe
>
hi Jens,

Sorry for not replying in time.

We have basically determined the plan for the f2fs side. The short-term
plan is to limit the size of a single bio, and the long-term plan is to
change f2fs from page to folio to reduce the pagecache traversal time.

However, I think it also makes sense to move less urgent work out of the
IRQ top-half.

Thanks
Zhang
On 9/8/24 8:17 PM, ZhangHui wrote:
> On 2024/9/7 21:46, Jens Axboe wrote:
>> On 9/6/24 8:49 PM, ZhangHui wrote:
>>> From: zhanghui <zhanghui31@xiaomi.com>
>>>
>>> Currently, for a controller that supports multiple queues, like UFS4.0,
>>> the mq_ops->complete is executed in the interrupt top-half. Therefore,
>>> the file system's end io is executed during the request completion process,
>>> such as f2fs_write_end_io on smartphone.
>>>
>>> However, we found that the execution time of the file system end io
>>> is strongly related to the size of the bio and the processing speed
>>> of the CPU. Because the file system's end io will traverse every page
>>> in bio, this is a very time-consuming operation.
>>>
>>> We measured that the 80M bio write operation on the little CPU will
>>> cause the execution time of the top-half to be greater than 100ms,
>>> which will undoubtedly affect interrupt response latency.
>>>
>>> Let's fix this issue by moving non sync requests completion to softirq
>>> context, and keeping sync requests completion in the IRQ top-half context.
>> You keep ignoring the feedback, and hence I too shall be ignoring this
>> patch going forward then.
>>
>> The key issue here is that the completion takes so long, and adding a
>> heuristic that equates not-sync with latency-not-important is pretty
>> bogus and not a good way to attempt to work around it.
>>
>> --
>> Jens Axboe
>>
> hi Jens,
>
> Sorry for not replying in time.
>
> We have basically determined the plan for the f2fs side. The short-term
> plan is to limit the size of a single bio, and the long-term plan is to
> change f2fs from page to folio to reduce the pagecache traversal time.
>
> However, I think it also makes sense to move less urgent work out of the
> IRQ top-half.

What you are missing is that what you think is less urgent, may be just
as urgent as other requests to others. !rq_is_sync() doesn't mean that
it's necessarily a background or low priority request.

So no, I'm not interested in merging an odd work-around for what is
really a different issue. Fixing f2fs is indeed the right way, and I'd
suggest in the mean time you just limit the per-request size to
something a lot more reasonable. If you see high latencies with 80MB
requests, then perhaps don't be doing 80MB requests. That should be well
beyond the diminishing returns point for bandwidth anyway, there's no
reason why anyone should be doing requests that huge and not expect
longer processing times.
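For a rough sense of the numbers discussed above, here is a back-of-the-envelope sketch in userspace C. It is not kernel code. It assumes 4 KiB pages, reads "80M" as 80 MiB, and assumes end-io cost scales linearly with the page count; none of these is stated in the thread. The helper names (pages_in_bio, per_page_cost_ns, max_bio_bytes) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages a bio of `bytes` bytes spans, rounded up. */
static uint64_t pages_in_bio(uint64_t bytes, uint64_t page_size)
{
	assert(page_size > 0);
	return (bytes + page_size - 1) / page_size;
}

/* Average end-io cost per page, from a measured total (integer ns). */
static uint64_t per_page_cost_ns(uint64_t total_ns, uint64_t pages)
{
	assert(pages > 0);
	return total_ns / pages;
}

/*
 * Largest bio size whose end-io would fit in budget_ns, under the
 * (assumed) linear per-page cost model.
 */
static uint64_t max_bio_bytes(uint64_t budget_ns, uint64_t cost_ns,
			      uint64_t page_size)
{
	assert(cost_ns > 0);
	return (budget_ns / cost_ns) * page_size;
}
```

Under those assumptions, an 80 MiB bio spans 20480 pages, so 100 ms of top-half time works out to roughly 4.9 us per page on the little CPU. Since the report says "greater than 100ms", that is a lower bound. A hypothetical 1 ms budget (picked only as an example) would then allow about 204 pages, or 816 KiB, per bio.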
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e3c3c0c21b55..45e4d255ea3b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1210,7 +1210,11 @@ bool blk_mq_complete_request_remote(struct request *rq)
 		return true;
 	}
 
-	if (rq->q->nr_hw_queues == 1) {
+	/*
+	 * To reduce the execution time in the IRQ top-half,
+	 * move non-sync request completions to softirq context.
+	 */
+	if (rq->q->nr_hw_queues == 1 || !rq_is_sync(rq)) {
 		blk_mq_raise_softirq(rq);
 		return true;
 	}
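To make the effect of the changed condition easier to see, here is a minimal userspace model of just that one check. It is a sketch, not the kernel function: struct model_rq and the helper names are hypothetical, the earlier branches of blk_mq_complete_request_remote() are left out, and it assumes rq_is_sync() is true for reads and for writes flagged as synchronous (the real helper may also cover other flags).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few request fields the check reads. */
struct model_rq {
	unsigned int nr_hw_queues;	/* models rq->q->nr_hw_queues */
	bool is_read;			/* request is a read */
	bool sync_flagged;		/* write submitted as synchronous */
};

/* Assumed behaviour of rq_is_sync(): reads and sync-flagged writes. */
static bool model_rq_is_sync(const struct model_rq *rq)
{
	assert(rq);
	return rq->is_read || rq->sync_flagged;
}

/* Condition before the patch: only single-queue devices defer. */
static bool softirq_before_patch(const struct model_rq *rq)
{
	assert(rq);
	return rq->nr_hw_queues == 1;
}

/* Condition after the patch: every non-sync request defers as well. */
static bool softirq_after_patch(const struct model_rq *rq)
{
	assert(rq);
	return rq->nr_hw_queues == 1 || !model_rq_is_sync(rq);
}
```

In this model, a write without the sync flag on a multi-queue device is the only case whose outcome changes: it completes in the top-half before the patch and in softirq after it, whatever its size or how urgently someone is waiting on it. That is the heuristic objected to in the replies above.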