Message ID | 20210520223353.11561-2-jack@suse.cz (mailing list archive)
State      | New, archived
Series     | block: Fix deadlock when merging requests with BFQ
On Fri, May 21, 2021 at 12:33:52AM +0200, Jan Kara wrote:
> Most of the merging happens at bio level. There should not be much
> merging happening at request level anymore. Furthermore if we backmerged
> a request to the previous one, the chances to be able to merge the
> result to even previous request are slim - that could succeed only if
> requests were inserted in 2 1 3 order. Merging more requests in

Right, but some workloads have this kind of pattern.

For example, qemu IO emulation can often be thought of as a single job
doing native aio and direct io with a high queue depth. The IO originates
from one VM, but may come from multiple jobs inside the VM, so bio merging
may not hit much because of IO emulation timing (virtio-scsi/blk's MQ, or
IO can be interleaved from multiple jobs via the SQ transport), but
request merging can really make a difference; see the recent patch in the
following link:

https://lore.kernel.org/linux-block/3f61e939-d95a-1dd1-6870-e66795cfc1b1@suse.de/T/#t

> elv_attempt_insert_merge() will be difficult to handle when we want to
> pass requests to free back to the caller of
> blk_mq_sched_try_insert_merge(). So just remove the possibility of
> merging multiple requests in elv_attempt_insert_merge().

This change will cause a regression.

Thanks,
Ming
On Fri 21-05-21 08:42:16, Ming Lei wrote:
> On Fri, May 21, 2021 at 12:33:52AM +0200, Jan Kara wrote:
> > Most of the merging happens at bio level. There should not be much
> > merging happening at request level anymore. Furthermore if we backmerged
> > a request to the previous one, the chances to be able to merge the
> > result to even previous request are slim - that could succeed only if
> > requests were inserted in 2 1 3 order. Merging more requests in
>
> Right, but some workloads have this kind of pattern.
>
> For example, qemu IO emulation can often be thought of as a single job
> doing native aio and direct io with a high queue depth. The IO originates
> from one VM, but may come from multiple jobs inside the VM, so bio
> merging may not hit much because of IO emulation timing (virtio-scsi/blk's
> MQ, or IO can be interleaved from multiple jobs via the SQ transport),
> but request merging can really make a difference; see the recent patch in
> the following link:
>
> https://lore.kernel.org/linux-block/3f61e939-d95a-1dd1-6870-e66795cfc1b1@suse.de/T/#t

Oh, request merging definitely does make a difference. But the elevator
hash & merge logic I'm modifying here is used only by BFQ and MQ-DEADLINE
AFAICT. And these IO schedulers will already call blk_mq_sched_try_merge()
from their .bio_merge handler which gets called from blk_mq_submit_bio().
So all the merging that can happen in the code I remove should have
already happened. Or am I missing something?

								Honza
On Fri, May 21, 2021 at 01:53:54PM +0200, Jan Kara wrote:
> On Fri 21-05-21 08:42:16, Ming Lei wrote:
> > On Fri, May 21, 2021 at 12:33:52AM +0200, Jan Kara wrote:
> > > Most of the merging happens at bio level. There should not be much
> > > merging happening at request level anymore. Furthermore if we backmerged
> > > a request to the previous one, the chances to be able to merge the
> > > result to even previous request are slim - that could succeed only if
> > > requests were inserted in 2 1 3 order. Merging more requests in
> >
> > Right, but some workloads have this kind of pattern.
> >
> > For example, qemu IO emulation can often be thought of as a single job
> > doing native aio and direct io with a high queue depth. The IO
> > originates from one VM, but may come from multiple jobs inside the VM,
> > so bio merging may not hit much because of IO emulation timing
> > (virtio-scsi/blk's MQ, or IO can be interleaved from multiple jobs via
> > the SQ transport), but request merging can really make a difference;
> > see the recent patch in the following link:
> >
> > https://lore.kernel.org/linux-block/3f61e939-d95a-1dd1-6870-e66795cfc1b1@suse.de/T/#t
>
> Oh, request merging definitely does make a difference. But the elevator
> hash & merge logic I'm modifying here is used only by BFQ and MQ-DEADLINE
> AFAICT. And these IO schedulers will already call blk_mq_sched_try_merge()
> from their .bio_merge handler which gets called from blk_mq_submit_bio().
> So all the merging that can happen in the code I remove should have
> already happened. Or am I missing something?

There might be at least two reasons:

1) When .bio_merge() is called, some requests are kept in the plug list,
so the bio may not be merged with requests in the scheduler queue; when
the plug list is flushed and those requests are inserted into the
scheduler queue, we have to try to merge them further.

2) Only blk_mq_sched_try_insert_merge() is capable of doing aggressive
request merging: when req A is merged into req B, the function continues
trying to merge req B with other in-queue requests, until no further
merge can be done; neither blk_mq_sched_try_merge() nor
blk_attempt_plug_merge() can do such aggressive request merging.

Thanks,
Ming
On Fri 21-05-21 21:12:14, Ming Lei wrote:
> On Fri, May 21, 2021 at 01:53:54PM +0200, Jan Kara wrote:
> > On Fri 21-05-21 08:42:16, Ming Lei wrote:
> > > On Fri, May 21, 2021 at 12:33:52AM +0200, Jan Kara wrote:
> > > > Most of the merging happens at bio level. There should not be much
> > > > merging happening at request level anymore. Furthermore if we backmerged
> > > > a request to the previous one, the chances to be able to merge the
> > > > result to even previous request are slim - that could succeed only if
> > > > requests were inserted in 2 1 3 order. Merging more requests in
> > >
> > > Right, but some workloads have this kind of pattern.
> > >
> > > For example, qemu IO emulation can often be thought of as a single
> > > job doing native aio and direct io with a high queue depth. The IO
> > > originates from one VM, but may come from multiple jobs inside the
> > > VM, so bio merging may not hit much because of IO emulation timing
> > > (virtio-scsi/blk's MQ, or IO can be interleaved from multiple jobs
> > > via the SQ transport), but request merging can really make a
> > > difference; see the recent patch in the following link:
> > >
> > > https://lore.kernel.org/linux-block/3f61e939-d95a-1dd1-6870-e66795cfc1b1@suse.de/T/#t
> >
> > Oh, request merging definitely does make a difference. But the elevator
> > hash & merge logic I'm modifying here is used only by BFQ and
> > MQ-DEADLINE AFAICT. And these IO schedulers will already call
> > blk_mq_sched_try_merge() from their .bio_merge handler which gets
> > called from blk_mq_submit_bio(). So all the merging that can happen in
> > the code I remove should have already happened. Or am I missing
> > something?
>
> There might be at least two reasons:
>
> 1) When .bio_merge() is called, some requests are kept in the plug list,
> so the bio may not be merged with requests in the scheduler queue; when
> the plug list is flushed and those requests are inserted into the
> scheduler queue, we have to try to merge them further.

Oh, right, I forgot that the plug list already stores requests, not bios.

> 2) Only blk_mq_sched_try_insert_merge() is capable of doing aggressive
> request merging: when req A is merged into req B, the function continues
> trying to merge req B with other in-queue requests, until no further
> merge can be done; neither blk_mq_sched_try_merge() nor
> blk_attempt_plug_merge() can do such aggressive request merging.

Yes, fair point. I was thinking only about a few requests but if the
request sequence is like 0 2 4 6 ... 2n 1 3 5 7 ... 2n+1, then bio merging
will result in n requests while request merging will be able to get it
down to 1 request. I'll keep the recursive merge and pass back a list of
requests to free instead. Thanks for the explanations!

								Honza
diff --git a/block/elevator.c b/block/elevator.c
index 440699c28119..098f4bd226f5 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -350,12 +350,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
  * we can append 'rq' to an existing request, so we can throw 'rq' away
  * afterwards.
  *
- * Returns true if we merged, false otherwise
+ * Returns true if we merged, false otherwise.
  */
 bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
 {
 	struct request *__rq;
-	bool ret;
 
 	if (blk_queue_nomerges(q))
 		return false;
@@ -369,21 +368,13 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
 	if (blk_queue_noxmerges(q))
 		return false;
 
-	ret = false;
 	/*
 	 * See if our hash lookup can find a potential backmerge.
 	 */
-	while (1) {
-		__rq = elv_rqhash_find(q, blk_rq_pos(rq));
-		if (!__rq || !blk_attempt_req_merge(q, __rq, rq))
-			break;
-
-		/* The merged request could be merged with others, try again */
-		ret = true;
-		rq = __rq;
-	}
-
-	return ret;
+	__rq = elv_rqhash_find(q, blk_rq_pos(rq));
+	if (!__rq || !blk_attempt_req_merge(q, __rq, rq))
+		return false;
+	return true;
 }
 
 void elv_merged_request(struct request_queue *q, struct request *rq,
Most of the merging happens at bio level. There should not be much
merging happening at request level anymore. Furthermore if we backmerged
a request to the previous one, the chances to be able to merge the
result to even previous request are slim - that could succeed only if
requests were inserted in 2 1 3 order. Merging more requests in
elv_attempt_insert_merge() will be difficult to handle when we want to
pass requests to free back to the caller of
blk_mq_sched_try_insert_merge(). So just remove the possibility of
merging multiple requests in elv_attempt_insert_merge().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/elevator.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)