Message ID: 1635852455-39935-1-git-send-email-john.garry@huawei.com (mailing list archive)
Series: blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags
On 02/11/2021 11:27, John Garry wrote:

Hi Kashyap,

Any chance you can try this series or give an update on the issue
reported earlier?

Thanks @ Ming for the reviews.

Cheers,
John

> In [0], Kashyap reports high CPU usage for blk_mq_queue_tag_busy_iter()
> and callees for shared tags.
>
> Indeed blk_mq_queue_tag_busy_iter() would be less optimal after moving to
> shared tags, but it was not optimal previously either.
>
> So I would like this series tested, and also to know what is triggering
> blk_mq_queue_tag_busy_iter() from userspace to cause such high CPU
> loading.
>
> As suggested by Ming, reading /proc/diskstats in a while true loop
> can trigger blk_mq_queue_tag_busy_iter(); I do so in a test with 2x
> separate consoles, and here are the results:
>
> v5.15
> blk_mq_queue_tag_busy_iter() 6.2%
> part_stat_read_all()         6.7%
>
> pre-v5.16 (Linus' master branch @ commit bfc484fe6abb)
> blk_mq_queue_tag_busy_iter() 4.5%
> part_stat_read_all()         6.2%
>
> pre-v5.16 + this series
> blk_mq_queue_tag_busy_iter() not shown in top users
> part_stat_read_all()         7.5%
>
> These results are from perf top, on a system with 7x disks, with
> hisi_sas, which has 16x HW queues.
>
> [0] https://lore.kernel.org/linux-block/e4e92abbe9d52bcba6b8cc6c91c442cc@mail.gmail.com/
>
> John Garry (3):
>   blk-mq: Drop busy_iter_fn blk_mq_hw_ctx argument
>   blk-mq: Delete busy_iter_fn
>   blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags
>
>  block/blk-mq-tag.c     | 58 +++++++++++++++++++++++++++---------------
>  block/blk-mq-tag.h     |  2 +-
>  block/blk-mq.c         | 17 ++++++-------
>  include/linux/blk-mq.h |  2 --
>  4 files changed, 47 insertions(+), 32 deletions(-)
>
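For reference, the "while true" reading of /proc/diskstats described in the
cover letter can be approximated with a small userspace poller along the
lines of the sketch below. The polling interval and buffer size are
assumptions for illustration, not details taken from the original test.

/*
 * Minimal sketch of the load generator described above: repeatedly read
 * /proc/diskstats, which makes the kernel recompute per-disk stats and
 * in-flight counts via part_stat_read_all() and
 * blk_mq_queue_tag_busy_iter().
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	char buf[64 * 1024];

	for (;;) {
		int fd = open("/proc/diskstats", O_RDONLY);

		if (fd < 0) {
			perror("open /proc/diskstats");
			return EXIT_FAILURE;
		}
		/* Drain the whole file; the read itself drives the
		 * kernel-side iteration, the contents are discarded. */
		while (read(fd, buf, sizeof(buf)) > 0)
			;
		close(fd);
		usleep(100 * 1000);	/* ~100 ms between polls (assumed) */
	}
}

Running two of these in parallel (the "2x separate consoles" mentioned
above) while watching perf top should reproduce the
blk_mq_queue_tag_busy_iter() and part_stat_read_all() samples quoted in
the cover letter.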
> Hi Kashyap,
>
> Any chance you can try this series or give an update on the issue
> reported earlier?

John - I will try something this week and let you know.

>
> Thanks @ Ming for the reviews.
>
> Cheers,
> John
>
> -----Original Message-----
> From: Kashyap Desai [mailto:kashyap.desai@broadcom.com]
> Sent: Monday, November 15, 2021 10:02 PM
> To: 'John Garry' <john.garry@huawei.com>; 'axboe@kernel.dk' <axboe@kernel.dk>
> Cc: 'linux-block@vger.kernel.org' <linux-block@vger.kernel.org>;
> 'linux-kernel@vger.kernel.org' <linux-kernel@vger.kernel.org>;
> 'ming.lei@redhat.com' <ming.lei@redhat.com>; 'hare@suse.de' <hare@suse.de>
> Subject: RE: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter()
> for shared tags
>
> > Hi Kashyap,
> >
> > Any chance you can try this series or give an update on the issue
> > reported earlier?
>
> John -
>
> I will try something this week and let you know.

John - I tested the patchset and it looks good. The issue reported in the
thread below is fixed:
https://lore.kernel.org/linux-block/e4e92abbe9d52bcba6b8cc6c91c442cc@mail.gmail.com/

Here is the perf top data:

  5.70%  [megaraid_sas]  [k] complete_cmd_fusion
  3.80%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
  2.75%  [kernel]        [k] syscall_return_via_sysret
  2.68%  [kernel]        [k] entry_SYSCALL_64
  2.22%  [kernel]        [k] io_submit_one
  2.19%  [megaraid_sas]  [k] megasas_build_ldio_fusion
  1.95%  [kernel]        [k] llist_add_batch
  1.80%  [kernel]        [k] llist_reverse_order
  1.79%  [kernel]        [k] scsi_complete
  1.73%  [kernel]        [k] scsi_queue_rq
  1.66%  [kernel]        [k] check_preemption_disabled
  1.37%  [megaraid_sas]  [k] megasas_queue_command
  1.16%  [kernel]        [k] native_irq_return_iret
  1.11%  [kernel]        [k] aio_complete_rw
  1.07%  [kernel]        [k] read_tsc
  1.06%  fio             [.] __fio_gettime
  1.05%  [kernel]        [k] flush_smp_call_function_queue
  1.03%  [kernel]        [k] blk_complete_reqs
  1.00%  [kernel]        [k] blk_mq_free_request
  0.98%  [kernel]        [k] sbitmap_get

I will continue testing and let you know how it goes.

Kashyap

> >
> > Thanks @ Ming for the reviews.
> >
> > Cheers,
> > John
> >
On 25/11/2021 13:46, Kashyap Desai wrote:
>> John -
>>
>> I will try something this week and let you know.
> John - I tested the patchset and it looks good. The issue reported in the
> thread below is fixed:
> https://lore.kernel.org/linux-block/e4e92abbe9d52bcba6b8cc6c91c442cc@mail.gmail.com/
>
> Here is the perf top data:
>
>   5.70%  [megaraid_sas]  [k] complete_cmd_fusion
>   3.80%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
>   2.75%  [kernel]        [k] syscall_return_via_sysret
>   2.68%  [kernel]        [k] entry_SYSCALL_64
>   2.22%  [kernel]        [k] io_submit_one
>   2.19%  [megaraid_sas]  [k] megasas_build_ldio_fusion
>   1.95%  [kernel]        [k] llist_add_batch
>   1.80%  [kernel]        [k] llist_reverse_order
>   1.79%  [kernel]        [k] scsi_complete
>   1.73%  [kernel]        [k] scsi_queue_rq
>   1.66%  [kernel]        [k] check_preemption_disabled
>   1.37%  [megaraid_sas]  [k] megasas_queue_command
>   1.16%  [kernel]        [k] native_irq_return_iret
>   1.11%  [kernel]        [k] aio_complete_rw
>   1.07%  [kernel]        [k] read_tsc
>   1.06%  fio             [.] __fio_gettime
>   1.05%  [kernel]        [k] flush_smp_call_function_queue
>   1.03%  [kernel]        [k] blk_complete_reqs
>   1.00%  [kernel]        [k] blk_mq_free_request
>   0.98%  [kernel]        [k] sbitmap_get
>
>
> I will continue testing and let you know how it goes.

ok, good to know, thanks. But I would still like to know what is triggering
blk_mq_queue_tag_busy_iter() so often. Indeed, as mentioned in the cover
letter, this function was hardly optimised before for shared sbitmap.

And any opinion on whether we would want this as a fix? The information
requested above would help explain why we would need it as a fix.

Cheers,
John
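To put the point about shared sbitmap in concrete terms: with shared tags,
every hardware queue uses the same tag space, so iterating the tags once
per hctx repeats the same walk many times over. The toy sketch below
(plain userspace C; the names and counts are invented and do not
correspond to the real blk-mq code) only illustrates that redundancy - the
actual mechanism used by patch 3/3 is in the series itself.

/*
 * Toy illustration, not kernel code: count how many tag slots get
 * examined when one shared tag space is walked per hctx versus once
 * per request queue.
 */
#include <stdio.h>

#define NR_HW_QUEUES	16	/* e.g. hisi_sas exposes 16 HW queues */
#define NR_TAGS		1024	/* size of the shared tag space (assumed) */

/* Pre-series style: each hctx walks the whole shared tag space.  The
 * real code skips requests owned by other hctxs, but still pays for a
 * full walk per hctx. */
static unsigned long per_hctx_walk(void)
{
	unsigned long checks = 0;

	for (int hctx = 0; hctx < NR_HW_QUEUES; hctx++)
		for (int tag = 0; tag < NR_TAGS; tag++)
			checks++;	/* iterator callback would run here */
	return checks;
}

/* Optimised style: the shared tag space is walked once per queue. */
static unsigned long shared_walk(void)
{
	unsigned long checks = 0;

	for (int tag = 0; tag < NR_TAGS; tag++)
		checks++;
	return checks;
}

int main(void)
{
	printf("per-hctx walk: %lu tag checks\n", per_hctx_walk());
	printf("shared walk:   %lu tag checks\n", shared_walk());
	return 0;
}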
> >
> >
> > I will continue testing and let you know how it goes.
>
> ok, good to know, thanks. But I would still like to know what is
> triggering blk_mq_queue_tag_busy_iter() so often. Indeed, as mentioned
> in the cover letter, this function was hardly optimised before for
> shared sbitmap.

If I give the "--disk_util=0" option in my fio run, the callers of
blk_mq_queue_tag_busy_iter() are reduced drastically.

As part of a fio run, the application performs diskutil operations, which
is almost the same as doing "cat /proc/diskstats" in a loop. Looking at
the fio code, it reads diskstats every 250 msec. Here are sample fio logs:

diskutil 87720 /sys/block/sdb/stat: stat read ok? 0
diskutil 87720 update io ticks
diskutil 87720 open stat file: /sys/block/sdb/stat
diskutil 87720 /sys/block/sdb/stat: 127853173 0 1022829056 241827073
         0 0 0 0 255 984012 241827073 0 0 0 0 0 0

There is one more call trace, but I am not sure why it is getting executed
in my test. The path below does not execute so frequently, but it consumes
CPU (not noticeable on my setup):

kthread
worker_thread
process_one_work
blk_mq_timeout_work
blk_mq_queue_tag_busy_iter
bt_iter
blk_mq_find_and_get_req
_raw_spin_lock_irqsave
native_queued_spin_lock_slowpath

This patch set improves the above call trace even after disk_util=0 is set.

Kashyap

>
> And any opinion on whether we would want this as a fix? The information
> requested above would help explain why we would need it as a fix.
>
> Cheers,
> John
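For context, fio's diskutil helper behaves roughly like the sketch below:
it re-reads /sys/block/<dev>/stat every 250 msec, and serving that read
includes computing the in-flight count, which is where
blk_mq_queue_tag_busy_iter() ends up being called. The device name and
field handling are illustrative assumptions; the field layout follows
Documentation/block/stat.rst.

/*
 * Rough sketch of a fio-diskutil-style poller: re-read the per-device
 * stat file every 250 msec and pick out a few fields.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/block/sdb/stat";	/* device assumed */
	char line[512];

	for (;;) {
		FILE *f = fopen(path, "r");
		unsigned long long v[11] = { 0 };

		if (!f) {
			perror(path);
			return 1;
		}
		if (fgets(line, sizeof(line), f) &&
		    sscanf(line,
			   "%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
			   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6],
			   &v[7], &v[8], &v[9], &v[10]) >= 9) {
			/* v[0] = read I/Os, v[2] = sectors read,
			 * v[8] = in_flight */
			printf("read ios=%llu sectors=%llu in_flight=%llu\n",
			       v[0], v[2], v[8]);
		}
		fclose(f);
		usleep(250 * 1000);	/* fio polls diskutil every 250 msec */
	}
}

In the sample fio log quoted above, the ninth field (255) is that in-flight
count, consistent with a read-only workload with 255 requests outstanding.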
On 26/11/2021 11:25, Kashyap Desai wrote:
>>>
>>> I will continue testing and let you know how it goes.
>>
>> ok, good to know, thanks. But I would still like to know what is
>> triggering blk_mq_queue_tag_busy_iter() so often. Indeed, as mentioned
>> in the cover letter, this function was hardly optimised before for
>> shared sbitmap.
>
> If I give the "--disk_util=0" option in my fio run, the callers of
> blk_mq_queue_tag_busy_iter() are reduced drastically.
>
> As part of a fio run, the application performs diskutil operations, which
> is almost the same as doing "cat /proc/diskstats" in a loop. Looking at
> the fio code, it reads diskstats every 250 msec. Here are sample fio logs:
>
> diskutil 87720 /sys/block/sdb/stat: stat read ok? 0
> diskutil 87720 update io ticks
> diskutil 87720 open stat file: /sys/block/sdb/stat
> diskutil 87720 /sys/block/sdb/stat: 127853173 0 1022829056 241827073
>          0 0 0 0 255 984012 241827073 0 0 0 0 0 0
>
> There is one more call trace, but I am not sure why it is getting executed
> in my test. The path below does not execute so frequently, but it consumes
> CPU (not noticeable on my setup):
>
> kthread
> worker_thread
> process_one_work
> blk_mq_timeout_work
> blk_mq_queue_tag_busy_iter
> bt_iter
> blk_mq_find_and_get_req
> _raw_spin_lock_irqsave
> native_queued_spin_lock_slowpath
>

It would still be nice to know where this is coming from.

> This patch set improves the above call trace even after disk_util=0 is set.

ok, fine. Thanks for testing.

So I guess that this is a regression, and you would want this series for
v5.16, right? My changes were made with v5.17 in mind.

I am not sure how Jens feels about it, since the changes are significant.
It would be a lot easier to argue for v5.16 if we had got to this point
earlier in the cycle...

Anyway, it would be good to have a full review first, so please help with
that.

@Ming, can you please give feedback on 3/3 here?

BTW, I am on vacation next week and can't help progress things then, so any
assistance would be good.

Thanks,
John
On 26/11/2021 11:51, John Garry wrote:
>> This patch set improves the above call trace even after disk_util=0 is
>> set.
> ok, fine. Thanks for testing.
>
> So I guess that this is a regression, and you would want this series for
> v5.16, right? My changes were made with v5.17 in mind.
>
> I am not sure how Jens feels about it, since the changes are
> significant. It would be a lot easier to argue for v5.16 if we had got
> to this point earlier in the cycle...
>

note: I will now resend for 5.17 and add your tested-by, please let me know
if you are unhappy about that.

> Anyway, it would be good to have a full review first, so please help with
> that.
>
> @Ming, can you please give feedback on 3/3 here?

Thanks, and also to Hannes for the reviews.

John
> -----Original Message-----
> From: John Garry [mailto:john.garry@huawei.com]
> Sent: Monday, December 6, 2021 3:28 PM
> To: Kashyap Desai <kashyap.desai@broadcom.com>; axboe@kernel.dk
> Cc: linux-block@vger.kernel.org; linux-kernel@vger.kernel.org;
> ming.lei@redhat.com; hare@suse.de
> Subject: Re: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter()
> for shared tags
>
> On 26/11/2021 11:51, John Garry wrote:
> >> This patch set improves the above call trace even after disk_util=0
> >> is set.
> > ok, fine. Thanks for testing.
> >
> > So I guess that this is a regression, and you would want this series
> > for v5.16, right? My changes were made with v5.17 in mind.
> >
> > I am not sure how Jens feels about it, since the changes are
> > significant. It would be a lot easier to argue for v5.16 if we had got
> > to this point earlier in the cycle...
> >
>
> note: I will now resend for 5.17 and add your tested-by, please let me
> know if you are unhappy about that.

John - 5.17 should be OK. I am doing additional testing, and so far no
issues.

Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>

>
> > Anyway, it would be good to have a full review first, so please help
> > with that.
> >
> > @Ming, can you please give feedback on 3/3 here?
>
> Thanks, and also to Hannes for the reviews.
>
> John
On 06/12/2021 10:03, Kashyap Desai wrote:
>> note: I will now resend for 5.17 and add your tested-by, please let me
>> know if you are unhappy about that.
> John - 5.17 should be OK. I am doing additional testing, and so far no
> issues.
>
> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
>

Hi Kashyap,

On a related topic, you mentioned in an earlier discussion another possible
regression you saw around 5.11 or 5.12, unrelated to the shared tags
transition:
https://lore.kernel.org/linux-block/e4e92abbe9d52bcba6b8cc6c91c442cc@mail.gmail.com/

Did you come to any conclusion on that?

Thanks,
John