Message ID: 20230530180959.1108766-1-stefanha@redhat.com
Series: block: add blk_io_plug_call() API
On 31.05.2023 at 21:50, Stefan Hajnoczi wrote:
> Hi Kevin,
> Do you want to review the thread-local blk_io_plug() patch series or
> should I merge it?

I haven't reviewed it in detail, but at a high level it looks good to me,
and you already got reviews for the details.

Acked-by: Kevin Wolf <kwolf@redhat.com>
On Tue, May 30, 2023 at 02:09:53PM -0400, Stefan Hajnoczi wrote:
> v3
> - Patch 5: Mention why dev_max_batch condition was dropped [Stefano]
>
> v2
> - Patch 1: "is not be freed" -> "is not freed" [Eric]
> - Patch 2: Remove unused nvme_process_completion_queue_plugged trace event
>   [Stefano]
> - Patch 3: Add missing #include and fix blkio_unplug_fn() prototype [Stefano]
> - Patch 4: Removed whitespace hunk [Eric]
>
> The existing blk_io_plug() API is not block layer multi-queue friendly because
> the plug state is per-BlockDriverState.
>
> Change blk_io_plug()'s implementation so it is thread-local. This is done by
> introducing the blk_io_plug_call() function that block drivers use to batch
> calls while plugged. It is relatively easy to convert block drivers from
> .bdrv_co_io_plug() to blk_io_plug_call().
>
> Random read 4KB performance with virtio-blk on a host NVMe block device:
>
> iodepth   iops     change vs today
>       1    45612   -4%
>       2    87967   +2%
>       4   129872   +0%
>       8   171096   -3%
>      16   194508   -4%
>      32   208947   -1%
>      64   217647   +0%
>     128   229629   +0%
>
> The results are within the noise for these benchmarks. This is to be expected
> because the plugging behavior for a single thread hasn't changed in this patch
> series; only the state is thread-local now.
>
> The following graph compares several approaches:
> https://vmsplice.net/~stefan/blk_io_plug-thread-local.png
>
> - v7.2.0: before most of the multi-queue block layer changes landed.
> - with-blk_io_plug: today's post-8.0.0 QEMU.
> - blk_io_plug-thread-local: this patch series.
> - no-blk_io_plug: what happens when we simply remove plugging?
> - call-after-dispatch: what if we integrate plugging into the event loop? I
>   decided against this approach in the end because it's more likely to
>   introduce performance regressions, since I/O submission is deferred until
>   the end of the event loop iteration.
>
> Aside from the no-blk_io_plug case, which bottlenecks much earlier than the
> others, we see that all plugging approaches are more or less equivalent in
> this benchmark. It is also clear that QEMU 8.0.0 has lower performance than
> 7.2.0.
>
> The Ansible playbook, fio results, and a Jupyter notebook are available here:
> https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug
>
> Stefan Hajnoczi (6):
>   block: add blk_io_plug_call() API
>   block/nvme: convert to blk_io_plug_call() API
>   block/blkio: convert to blk_io_plug_call() API
>   block/io_uring: convert to blk_io_plug_call() API
>   block/linux-aio: convert to blk_io_plug_call() API
>   block: remove bdrv_co_io_plug() API
>
>  MAINTAINERS                       |   1 +
>  include/block/block-io.h          |   3 -
>  include/block/block_int-common.h  |  11 ---
>  include/block/raw-aio.h           |  14 ---
>  include/sysemu/block-backend-io.h |  13 +--
>  block/blkio.c                     |  43 ++++----
>  block/block-backend.c             |  22 -----
>  block/file-posix.c                |  38 -------
>  block/io.c                        |  37 -------
>  block/io_uring.c                  |  44 ++++-----
>  block/linux-aio.c                 |  41 +++-----
>  block/nvme.c                      |  44 +++------
>  block/plug.c                      | 159 ++++++++++++++++++++++++++++++
>  hw/block/dataplane/xen-block.c    |   8 +-
>  hw/block/virtio-blk.c             |   4 +-
>  hw/scsi/virtio-scsi.c             |   6 +-
>  block/meson.build                 |   1 +
>  block/trace-events                |   6 +-
>  18 files changed, 239 insertions(+), 256 deletions(-)
>  create mode 100644 block/plug.c
>
> --
> 2.40.1

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan
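
For context, here is a minimal sketch of how a block driver might use the
blk_io_plug_call() API described in the cover letter above. Only
blk_io_plug(), blk_io_unplug() and blk_io_plug_call() come from this series;
MyDriverState, my_unplug_fn() and my_submit_req() are hypothetical names used
purely for illustration, and the queueing details are elided.

    /*
     * Sketch of a driver-side conversion to blk_io_plug_call().
     * MyDriverState, my_unplug_fn() and my_submit_req() are hypothetical.
     */
    #include "qemu/osdep.h"
    #include "sysemu/block-backend-io.h" /* modified by the series (see diffstat) */

    typedef struct {
        /* ... per-driver queue of pending requests ... */
    } MyDriverState;

    /* Flush all requests queued in @opaque to the host in one batch. */
    static void my_unplug_fn(void *opaque)
    {
        MyDriverState *s = opaque;
        /* ... issue every queued request ... */
        (void)s;
    }

    static void my_submit_req(MyDriverState *s /* , request ... */)
    {
        /* ... append the request to s's pending queue ... */

        /*
         * Defer submission: while the current thread is between
         * blk_io_plug() and blk_io_unplug(), my_unplug_fn(s) is called
         * once at unplug time; outside a plugged section it runs
         * immediately, so no request is left behind.
         */
        blk_io_plug_call(my_unplug_fn, s);
    }

An emitter such as virtio-blk then brackets a batch of request submissions
with blk_io_plug() and blk_io_unplug(); because the plug state is now
per-thread rather than per-BlockDriverState, the deferred callback fires once
per batch in the submitting thread.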