Message ID | 20161115063715.12561-2-pbutsykin@virtuozzo.com (mailing list archive)
---|---
State | New, archived
On 15.11.2016 at 07:36, Pavel Butsykin wrote:
> These are just byte-based wrappers over bdrv_co_aio_prw_vector(),
> which provide a byte-based interface for AIO read/write.
>
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>

I'm in the process of phasing out the last users of bdrv_aio_*() so
that this set of interfaces can be removed. I'm doing this because it's
an unnecessary redundancy: we have too many wrapper functions that
expose the same functionality with different syntax. So let's not add
new users.

At first sight, you don't even seem to use bdrv_aio_preadv() for actual
parallelism, but you often have a pattern like this:

    void foo_cb(void *opaque)
    {
        ...
        qemu_coroutine_enter(acb->co);
    }

    void caller()
    {
        ...
        acb = bdrv_aio_preadv(...);
        qemu_coroutine_yield();
    }

The code will actually become a lot simpler if you use bdrv_co_preadv()
instead: you don't need a callback, and you get pure sequential code.

The part that actually has some parallelism, pcache_readahead_request(),
already creates its own coroutine, so it runs in the background without
using callback-style interfaces.

Kevin
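For illustration, here is a minimal sketch of the sequential pattern
described above. The function name and surrounding logic are
hypothetical; only bdrv_co_preadv() and its byte-based signature
(offset, bytes, qiov, flags) are actual QEMU block-layer API of that
time:

    /* The callback/yield pair collapses into one sequential call:
     * bdrv_co_preadv() yields internally and only returns once the read
     * has completed, so the code that lived in foo_cb() simply follows. */
    static int coroutine_fn read_example(BlockDriverState *bs,
                                         int64_t offset, unsigned int bytes,
                                         QEMUIOVector *qiov)
    {
        int ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, 0);
        if (ret < 0) {
            return ret;    /* error handling also stays inline */
        }
        /* ... whatever foo_cb() did with the result goes here ... */
        return 0;
    }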
On 23.11.2016 17:28, Kevin Wolf wrote:
> On 15.11.2016 at 07:36, Pavel Butsykin wrote:
>> These are just byte-based wrappers over bdrv_co_aio_prw_vector(),
>> which provide a byte-based interface for AIO read/write.
>>
>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>
> I'm in the process of phasing out the last users of bdrv_aio_*() so
> that this set of interfaces can be removed. I'm doing this because
> it's an unnecessary redundancy: we have too many wrapper functions
> that expose the same functionality with different syntax. So let's
> not add new users.
>
> At first sight, you don't even seem to use bdrv_aio_preadv() for
> actual parallelism, but you often have a pattern like this:
>
>     void foo_cb(void *opaque)
>     {
>         ...
>         qemu_coroutine_enter(acb->co);
>     }
>
>     void caller()
>     {
>         ...
>         acb = bdrv_aio_preadv(...);
>         qemu_coroutine_yield();
>     }
>
> The code will actually become a lot simpler if you use
> bdrv_co_preadv() instead: you don't need a callback, and you get pure
> sequential code.
>
> The part that actually has some parallelism,
> pcache_readahead_request(), already creates its own coroutine, so it
> runs in the background without using callback-style interfaces.

I used bdrv_co_preadv() because it conveniently solves the partial
cache hit. To solve a partial cache hit, we need to split a request
into smaller parts, make asynchronous requests and wait for all the
requests in one place.

Do you propose to create a coroutine for each part of the request? It
seemed to me that bdrv_co_preadv() is a wrapper that allows us to get
rid of duplicated code.

> Kevin
On 24.11.2016 at 11:58, Pavel Butsykin wrote:
> On 23.11.2016 17:28, Kevin Wolf wrote:
>> On 15.11.2016 at 07:36, Pavel Butsykin wrote:
>>> These are just byte-based wrappers over bdrv_co_aio_prw_vector(),
>>> which provide a byte-based interface for AIO read/write.
>>>
>>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>>
>> I'm in the process of phasing out the last users of bdrv_aio_*() so
>> that this set of interfaces can be removed. I'm doing this because
>> it's an unnecessary redundancy: we have too many wrapper functions
>> that expose the same functionality with different syntax. So let's
>> not add new users.
>>
>> At first sight, you don't even seem to use bdrv_aio_preadv() for
>> actual parallelism, but you often have a pattern like this:
>>
>>     void foo_cb(void *opaque)
>>     {
>>         ...
>>         qemu_coroutine_enter(acb->co);
>>     }
>>
>>     void caller()
>>     {
>>         ...
>>         acb = bdrv_aio_preadv(...);
>>         qemu_coroutine_yield();
>>     }
>>
>> The code will actually become a lot simpler if you use
>> bdrv_co_preadv() instead: you don't need a callback, and you get pure
>> sequential code.
>>
>> The part that actually has some parallelism,
>> pcache_readahead_request(), already creates its own coroutine, so it
>> runs in the background without using callback-style interfaces.
>
> I used bdrv_co_preadv() because it conveniently solves the partial
> cache hit. To solve a partial cache hit, we need to split a request
> into smaller parts, make asynchronous requests and wait for all the
> requests in one place.
>
> Do you propose to create a coroutine for each part of the request? It
> seemed to me that bdrv_co_preadv() is a wrapper that allows us to get
> rid of duplicated code.

It's actually the other way round: bdrv_co_preadv() is the "native"
block layer API, and bdrv_aio_*() are wrappers providing an alternative
interface.

I'm looking at pcache_co_readahead(), for example. It looks like this:

    bdrv_aio_preadv(bs->file, node->common.offset, &readahead_acb.qiov,
                    node->common.bytes, pcache_aio_readahead_cb,
                    &readahead_acb);
    qemu_coroutine_yield();

And then we have pcache_aio_readahead_cb(), which ends in:

    qemu_coroutine_enter(acb->co);

So here the callback style doesn't buy you anything; it just rips the
code apart into two functions. There is no parallelism here anyway:
pcache_co_readahead() doesn't do anything until the callback reenters
it. This is a very obvious example where bdrv_co_preadv() will simplify
the code.

It's similar with the other bdrv_aio_preadv() calls, which are in
pcache_co_preadv():

    if (bytes > s->max_aio_size) {
        bdrv_aio_preadv(bs->file, offset, qiov, bytes,
                        pcache_aio_read_cb, &acb);
        goto out;
    }

    update_req_stats(s->req_stats, offset, bytes);

    status = pcache_lookup_data(&acb);
    if (status == CACHE_MISS) {
        bdrv_aio_preadv(bs->file, offset, qiov, bytes,
                        pcache_aio_read_cb, &acb);
    } else if (status == PARTIAL_CACHE_HIT) {
        assert(acb.part.qiov.niov != 0);
        bdrv_aio_preadv(bs->file, acb.part.offset, &acb.part.qiov,
                        acb.part.bytes, pcache_aio_read_cb, &acb);
    }

    pcache_readahead_request(&acb);

    if (status == CACHE_HIT && --acb.ref == 0) {
        return 0;
    }

out:
    qemu_coroutine_yield();

Here you have mainly the pcache_readahead_request() call between
bdrv_aio_preadv() and the yield. It only spawns a new coroutine, which
works in the background, so I think you can move it to before the
reads; then the reads can trivially become bdrv_co_preadv() and the
callback can again be inlined instead of ripping the function in two
parts.

The bdrv_aio_pwritev() call in pcache_co_pwritev() is just the same
thing, and using the coroutine version results in obvious code
improvements.

And I think these are all the uses of bdrv_aio_*() in the pcache
driver, so converting it to use bdrv_co_*() instead isn't only
possible, but will improve the legibility of your code, too. It's a
clear win in all three places.

Kevin
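To make the suggested restructuring concrete, here is a rough sketch of
pcache_co_preadv() rewritten along these lines. It is reconstructed
only from the snippet quoted above, so the types (BDRVPCacheState,
PCacheAIOCB, CacheStatus), the initialization of acb, and keeping acb
alive across the background readahead are all assumptions, elided or
marked as such:

    static int coroutine_fn pcache_co_preadv(BlockDriverState *bs,
                                             uint64_t offset, uint64_t bytes,
                                             QEMUIOVector *qiov, int flags)
    {
        BDRVPCacheState *s = bs->opaque;
        PCacheAIOCB acb;
        CacheStatus status;

        /* ... initialize acb as in the original function ... */

        if (bytes > s->max_aio_size) {
            /* too big to cache: pass through directly, no callback needed */
            return bdrv_co_preadv(bs->file, offset, bytes, qiov, 0);
        }

        update_req_stats(s->req_stats, offset, bytes);

        status = pcache_lookup_data(&acb);

        /* Spawns its own coroutine, so it proceeds in the background
         * while we read.  (Keeping acb alive until it finishes is a
         * detail this sketch does not address.) */
        pcache_readahead_request(&acb);

        if (status == CACHE_MISS) {
            return bdrv_co_preadv(bs->file, offset, bytes, qiov, 0);
        } else if (status == PARTIAL_CACHE_HIT) {
            assert(acb.part.qiov.niov != 0);
            return bdrv_co_preadv(bs->file, acb.part.offset, acb.part.bytes,
                                  &acb.part.qiov, 0);
        }

        return 0;   /* CACHE_HIT: request fully served from the cache */
    }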
On 24.11.2016 15:36, Kevin Wolf wrote:
> On 24.11.2016 at 11:58, Pavel Butsykin wrote:
>> On 23.11.2016 17:28, Kevin Wolf wrote:
>>> On 15.11.2016 at 07:36, Pavel Butsykin wrote:
>>>> These are just byte-based wrappers over bdrv_co_aio_prw_vector(),
>>>> which provide a byte-based interface for AIO read/write.
>>>>
>>>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>>>
>>> I'm in the process of phasing out the last users of bdrv_aio_*() so
>>> that this set of interfaces can be removed. I'm doing this because
>>> it's an unnecessary redundancy: we have too many wrapper functions
>>> that expose the same functionality with different syntax. So let's
>>> not add new users.
>>>
>>> At first sight, you don't even seem to use bdrv_aio_preadv() for
>>> actual parallelism, but you often have a pattern like this:
>>>
>>>     void foo_cb(void *opaque)
>>>     {
>>>         ...
>>>         qemu_coroutine_enter(acb->co);
>>>     }
>>>
>>>     void caller()
>>>     {
>>>         ...
>>>         acb = bdrv_aio_preadv(...);
>>>         qemu_coroutine_yield();
>>>     }
>>>
>>> The code will actually become a lot simpler if you use
>>> bdrv_co_preadv() instead: you don't need a callback, and you get
>>> pure sequential code.
>>>
>>> The part that actually has some parallelism,
>>> pcache_readahead_request(), already creates its own coroutine, so it
>>> runs in the background without using callback-style interfaces.
>>
>> I used bdrv_co_preadv() because it conveniently solves the partial
>> cache hit. To solve a partial cache hit, we need to split a request
>> into smaller parts, make asynchronous requests and wait for all the
>> requests in one place.
>>
>> Do you propose to create a coroutine for each part of the request? It
>> seemed to me that bdrv_co_preadv() is a wrapper that allows us to get
>> rid of duplicated code.
>
> It's actually the other way round: bdrv_co_preadv() is the "native"
> block layer API, and bdrv_aio_*() are wrappers providing an
> alternative interface.

Sorry, I mixed up bdrv_co_preadv() and bdrv_aio_preadv() :) Also I
forgot that in this version, pcache can't split one request into many
small requests (as in the RFC version).

> I'm looking at pcache_co_readahead(), for example. It looks like this:
>
>     bdrv_aio_preadv(bs->file, node->common.offset, &readahead_acb.qiov,
>                     node->common.bytes, pcache_aio_readahead_cb,
>                     &readahead_acb);
>     qemu_coroutine_yield();
>
> And then we have pcache_aio_readahead_cb(), which ends in:
>
>     qemu_coroutine_enter(acb->co);
>
> So here the callback style doesn't buy you anything; it just rips the
> code apart into two functions. There is no parallelism here anyway:
> pcache_co_readahead() doesn't do anything until the callback reenters
> it. This is a very obvious example where bdrv_co_preadv() will
> simplify the code.

I agree.

> It's similar with the other bdrv_aio_preadv() calls, which are in
> pcache_co_preadv():
>
>     if (bytes > s->max_aio_size) {
>         bdrv_aio_preadv(bs->file, offset, qiov, bytes,
>                         pcache_aio_read_cb, &acb);
>         goto out;
>     }
>
>     update_req_stats(s->req_stats, offset, bytes);
>
>     status = pcache_lookup_data(&acb);
>     if (status == CACHE_MISS) {
>         bdrv_aio_preadv(bs->file, offset, qiov, bytes,
>                         pcache_aio_read_cb, &acb);
>     } else if (status == PARTIAL_CACHE_HIT) {
>         assert(acb.part.qiov.niov != 0);
>         bdrv_aio_preadv(bs->file, acb.part.offset, &acb.part.qiov,
>                         acb.part.bytes, pcache_aio_read_cb, &acb);
>     }
>
>     pcache_readahead_request(&acb);
>
>     if (status == CACHE_HIT && --acb.ref == 0) {
>         return 0;
>     }
>
> out:
>     qemu_coroutine_yield();
>
> Here you have mainly the pcache_readahead_request() call between
> bdrv_aio_preadv() and the yield. It only spawns a new coroutine, which
> works in the background, so I think you can move it to before the
> reads; then the reads can trivially become bdrv_co_preadv() and the
> callback can again be inlined instead of ripping the function in two
> parts.
>
> The bdrv_aio_pwritev() call in pcache_co_pwritev() is just the same
> thing, and using the coroutine version results in obvious code
> improvements.
>
> And I think these are all the uses of bdrv_aio_*() in the pcache
> driver, so converting it to use bdrv_co_*() instead isn't only
> possible, but will improve the legibility of your code, too. It's a
> clear win in all three places.

Yes, you're right: this scheme was acceptable when the caller was
bdrv_aio_*(), but now it is just confusing. Thanks for the explanation!

> Kevin
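For reference, the "very obvious example" both sides agree on might end
up looking roughly like this after conversion. This is a sketch based
only on the quoted snippet; PCacheNode, the qiov parameter and the
error-handling placeholder are assumptions:

    static void coroutine_fn pcache_co_readahead(BlockDriverState *bs,
                                                 PCacheNode *node,
                                                 QEMUIOVector *qiov)
    {
        /* No callback, no explicit yield: the coroutine resumes here
         * exactly when the read completes, with ret as the result. */
        int ret = bdrv_co_preadv(bs->file, node->common.offset,
                                 node->common.bytes, qiov, 0);
        if (ret < 0) {
            /* ... what pcache_aio_readahead_cb() did on error,
             * e.g. dropping the readahead node, moves here inline ... */
        }
    }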
diff --git a/block/io.c b/block/io.c
index fdf7080..099bddb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1991,6 +1991,22 @@ int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 /**************************************************************/
 /* async I/Os */
 
+BlockAIOCB *bdrv_aio_preadv(BdrvChild *child, int64_t offset,
+                            QEMUIOVector *qiov, unsigned int bytes,
+                            BlockCompletionFunc *cb, void *opaque)
+{
+    assert(bytes == qiov->size);
+    return bdrv_co_aio_prw_vector(child, offset, qiov, 0, cb, opaque, false);
+}
+
+BlockAIOCB *bdrv_aio_pwritev(BdrvChild *child, int64_t offset,
+                             QEMUIOVector *qiov, unsigned int bytes,
+                             BlockCompletionFunc *cb, void *opaque)
+{
+    assert(bytes == qiov->size);
+    return bdrv_co_aio_prw_vector(child, offset, qiov, 0, cb, opaque, true);
+}
+
 BlockAIOCB *bdrv_aio_readv(BdrvChild *child, int64_t sector_num,
                            QEMUIOVector *qiov, int nb_sectors,
                            BlockCompletionFunc *cb, void *opaque)
diff --git a/include/block/block.h b/include/block/block.h
index e18233a..6728219 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -305,6 +305,13 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp);
 
 /* async block I/O */
+
+BlockAIOCB *bdrv_aio_preadv(BdrvChild *child, int64_t offset,
+                            QEMUIOVector *qiov, unsigned int bytes,
+                            BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *bdrv_aio_pwritev(BdrvChild *child, int64_t offset,
+                             QEMUIOVector *qiov, unsigned int bytes,
+                             BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *bdrv_aio_readv(BdrvChild *child, int64_t sector_num,
                            QEMUIOVector *iov, int nb_sectors,
                            BlockCompletionFunc *cb, void *opaque);
These are just byte-based wrappers over bdrv_co_aio_prw_vector(), which
provide a byte-based interface for AIO read/write.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
---
 block/io.c            | 16 ++++++++++++++++
 include/block/block.h |  7 +++++++
 2 files changed, 23 insertions(+)
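To illustrate how the added byte-based AIO entry points are meant to be
called, here is a small hypothetical user. bdrv_aio_preadv() and the
BlockCompletionFunc signature come from the patch; MyRequest,
my_read_cb() and submit_read() are invented for the example:

    typedef struct MyRequest {
        QEMUIOVector qiov;
        int ret;
    } MyRequest;

    /* Completion callback: invoked with the result of the read. */
    static void my_read_cb(void *opaque, int ret)
    {
        MyRequest *req = opaque;

        req->ret = ret;
        qemu_iovec_destroy(&req->qiov);   /* free the iovec set up below */
    }

    /* Read 'bytes' bytes starting at an arbitrary byte offset; no
     * sector alignment is required, unlike bdrv_aio_readv(). */
    static void submit_read(BdrvChild *child, int64_t offset,
                            void *buf, unsigned int bytes, MyRequest *req)
    {
        qemu_iovec_init(&req->qiov, 1);
        qemu_iovec_add(&req->qiov, buf, bytes);

        /* 'bytes' must equal req->qiov.size; the wrapper asserts this */
        bdrv_aio_preadv(child, offset, &req->qiov, bytes, my_read_cb, req);
    }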