[v1,01/18] block/io: add bdrv_aio_{preadv, pwritev}

Message ID: 20161115063715.12561-2-pbutsykin@virtuozzo.com (mailing list archive)
State: New, archived

Commit Message

Pavel Butsykin Nov. 15, 2016, 6:36 a.m. UTC
These are just byte-based wrappers over bdrv_co_aio_prw_vector(), providing
a byte-based interface for AIO read/write.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
---
 block/io.c            | 16 ++++++++++++++++
 include/block/block.h |  7 +++++++
 2 files changed, 23 insertions(+)

Comments

Kevin Wolf Nov. 23, 2016, 2:28 p.m. UTC | #1
Am 15.11.2016 um 07:36 hat Pavel Butsykin geschrieben:
> These are just byte-based wrappers over bdrv_co_aio_prw_vector(), providing
> a byte-based interface for AIO read/write.
> 
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>

I'm in the process of phasing out the last users of bdrv_aio_*() so that
this set of interfaces can be removed. I'm doing this because it's
unnecessary redundancy: we have too many wrapper functions that expose
the same functionality with different syntax. So let's not add new
users.

At first sight, you don't even seem to use bdrv_aio_preadv() for actual
parallelism, but you often have a pattern like this:

    void foo_cb(void *opaque)
    {
        ...
        qemu_coroutine_enter(acb->co);
    }

    void caller()
    {
        ...
        acb = bdrv_aio_preadv(...);
        qemu_coroutine_yield();
    }

The code will actually become a lot simpler if you use bdrv_co_preadv()
instead, because you don't need a callback; you get pure sequential
code.
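
A minimal sketch of what that sequential shape could look like, assuming
the bdrv_co_preadv() prototype from block/io.c (the function and
variable names here are placeholders, not code from the series):

    int coroutine_fn caller(BdrvChild *child, int64_t offset,
                            unsigned int bytes, QEMUIOVector *qiov)
    {
        int ret;

        /* No callback and no explicit yield: bdrv_co_preadv() suspends
         * this coroutine internally and resumes it on completion. */
        ret = bdrv_co_preadv(child, offset, bytes, qiov, 0);
        if (ret < 0) {
            return ret;
        }

        /* ... continue sequentially with what foo_cb() used to do ... */
        return 0;
    }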

The part that actually has some parallelism, pcache_readahead_request(),
already creates its own coroutine, so it runs in the background without
using callback-style interfaces.
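
For reference, spawning such a background coroutine boils down to
something like this (pcache_co_readahead_entry is a made-up entry point
name, not the driver's actual function):

    /* Run the readahead in its own coroutine; the caller doesn't wait. */
    Coroutine *co = qemu_coroutine_create(pcache_co_readahead_entry, acb);
    qemu_coroutine_enter(co);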

Kevin
Pavel Butsykin Nov. 24, 2016, 10:58 a.m. UTC | #2
On 23.11.2016 17:28, Kevin Wolf wrote:
> Am 15.11.2016 um 07:36 hat Pavel Butsykin geschrieben:
>> These are just byte-based wrappers over bdrv_co_aio_prw_vector(), providing
>> a byte-based interface for AIO read/write.
>>
>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>
> I'm in the process of phasing out the last users of bdrv_aio_*() so that
> this set of interfaces can be removed. I'm doing this because it's
> unnecessary redundancy: we have too many wrapper functions that expose
> the same functionality with different syntax. So let's not add new
> users.
>
> At first sight, you don't even seem to use bdrv_aio_preadv() for actual
> parallelism, but you often have a pattern like this:
>
>      void foo_cb(void *opaque)
>      {
>          ...
>          qemu_coroutine_enter(acb->co);
>      }
>
>      void caller()
>      {
>          ...
>          acb = bdrv_aio_preadv(...);
>          qemu_coroutine_yield();
>      }
>
> The code will actually become a lot simpler if you use bdrv_co_preadv()
> instead, because you don't need a callback; you get pure sequential
> code.
>
> The part that actually has some parallelism, pcache_readahead_request(),
> already creates its own coroutine, so it runs in the background without
> using callback-style interfaces.

I used bdrv_co_preadv() because it conveniently solves the partial
cache hit. To handle a partial cache hit, we need to split a request
into smaller parts, issue the requests asynchronously, and wait for all
of them in one place.

Do you propose to create a coroutine for each part of the request? It
seemed to me that bdrv_co_preadv() is a wrapper that allows us to avoid
duplicating that code.

> Kevin
>
Kevin Wolf Nov. 24, 2016, 12:36 p.m. UTC | #3
Am 24.11.2016 um 11:58 hat Pavel Butsykin geschrieben:
> On 23.11.2016 17:28, Kevin Wolf wrote:
> >Am 15.11.2016 um 07:36 hat Pavel Butsykin geschrieben:
> >>These are just byte-based wrappers over bdrv_co_aio_prw_vector(), providing
> >>a byte-based interface for AIO read/write.
> >>
> >>Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> >
> >I'm in the process of phasing out the last users of bdrv_aio_*() so that
> >this set of interfaces can be removed. I'm doing this because it's
> >unnecessary redundancy: we have too many wrapper functions that expose
> >the same functionality with different syntax. So let's not add new
> >users.
> >
> >At first sight, you don't even seem to use bdrv_aio_preadv() for actual
> >parallelism, but you often have a pattern like this:
> >
> >     void foo_cb(void *opaque)
> >     {
> >         ...
> >         qemu_coroutine_enter(acb->co);
> >     }
> >
> >     void caller()
> >     {
> >         ...
> >         acb = bdrv_aio_preadv(...);
> >         qemu_coroutine_yield();
> >     }
> >
> >The code will actually become a lot simpler if you use bdrv_co_preadv()
> >instead, because you don't need a callback; you get pure sequential
> >code.
> >
> >The part that actually has some parallelism, pcache_readahead_request(),
> >already creates its own coroutine, so it runs in the background without
> >using callback-style interfaces.
> 
> I used bdrv_co_preadv() because it conveniently solves the partial
> cache hit. To handle a partial cache hit, we need to split a request
> into smaller parts, issue the requests asynchronously, and wait for all
> of them in one place.
> 
> Do you propose to create a coroutine for each part of the request? It
> seemed to me that bdrv_co_preadv() is a wrapper that allows us to avoid
> duplicating that code.

It's actually the other way round: bdrv_co_preadv() is the "native"
block layer API, and bdrv_aio_*() are wrappers providing an alternative
interface.


I'm looking at pcache_co_readahead(), for example. It looks like this:

    bdrv_aio_preadv(bs->file, node->common.offset, &readahead_acb.qiov,
                    node->common.bytes, pcache_aio_readahead_cb,
                    &readahead_acb);
    qemu_coroutine_yield();

And then we have pcache_aio_readahead_cb(), which ends in:

    qemu_coroutine_enter(acb->co);

So here the callback style doesn't buy you anything; it just rips the
code apart into two functions. There is no parallelism here anyway:
pcache_co_readahead() doesn't do anything until the callback reenters
it. This is a very obvious example where bdrv_co_preadv() will simplify
the code.
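
Converted, the read and the tail of its callback collapse into plain
sequential code; a rough, untested sketch:

    ret = bdrv_co_preadv(bs->file, node->common.offset, node->common.bytes,
                         &readahead_acb.qiov, 0);
    /* Execution resumes here once the read has completed, so the body
     * of pcache_aio_readahead_cb() can simply follow inline. */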


It's similar with the other bdrv_aio_preadv() calls, which are in
pcache_co_preadv():

        if (bytes > s->max_aio_size) {
            bdrv_aio_preadv(bs->file, offset, qiov, bytes,
                            pcache_aio_read_cb, &acb);
            goto out;
        }

        update_req_stats(s->req_stats, offset, bytes);

        status = pcache_lookup_data(&acb);
        if (status == CACHE_MISS) {
            bdrv_aio_preadv(bs->file, offset, qiov, bytes,
                            pcache_aio_read_cb, &acb);
        } else if (status == PARTIAL_CACHE_HIT) {
            assert(acb.part.qiov.niov != 0);
            bdrv_aio_preadv(bs->file, acb.part.offset, &acb.part.qiov,
                            acb.part.bytes, pcache_aio_read_cb, &acb);
        }

        pcache_readahead_request(&acb);

        if (status == CACHE_HIT && --acb.ref == 0) {
            return 0;
        }

    out:
        qemu_coroutine_yield();

Here you mainly have the pcache_readahead_request() call between
bdrv_aio_preadv() and the yield. It only spawns a new coroutine, which
works in the background, so I think you can move it to before the
reads; then the reads can trivially become bdrv_co_preadv() and the
callback can again be inlined instead of ripping the function into two
parts.
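
An untested sketch of that reordering (only the lookup path is shown;
the bytes > s->max_aio_size branch would be converted the same way):

    update_req_stats(s->req_stats, offset, bytes);

    /* Spawn the readahead coroutine first; it runs in the background. */
    pcache_readahead_request(&acb);

    status = pcache_lookup_data(&acb);
    if (status == CACHE_MISS) {
        return bdrv_co_preadv(bs->file, offset, bytes, qiov, 0);
    } else if (status == PARTIAL_CACHE_HIT) {
        assert(acb.part.qiov.niov != 0);
        return bdrv_co_preadv(bs->file, acb.part.offset, acb.part.bytes,
                              &acb.part.qiov, 0);
    }

    /* CACHE_HIT: the data was already copied out of the cache. */
    return 0;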


The bdrv_aio_pwritev() call in pcache_co_pwritev() is just the same
thing, and using the coroutine version results in obvious code
improvements.


And I think these are all the uses of bdrv_aio_*() in the pcache driver,
so converting it to use bdrv_co_*() instead isn't only possible, but
will improve the legibility of your code, too. It's a clear win in all
three places.

Kevin
Pavel Butsykin Nov. 24, 2016, 3:10 p.m. UTC | #4
On 24.11.2016 15:36, Kevin Wolf wrote:
> Am 24.11.2016 um 11:58 hat Pavel Butsykin geschrieben:
>> On 23.11.2016 17:28, Kevin Wolf wrote:
>>> Am 15.11.2016 um 07:36 hat Pavel Butsykin geschrieben:
>>>> These are just byte-based wrappers over bdrv_co_aio_prw_vector(), providing
>>>> a byte-based interface for AIO read/write.
>>>>
>>>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>>>
>>> I'm in the process of phasing out the last users of bdrv_aio_*() so that
>>> this set of interfaces can be removed. I'm doing this because it's
>>> unnecessary redundancy: we have too many wrapper functions that expose
>>> the same functionality with different syntax. So let's not add new
>>> users.
>>>
>>> At first sight, you don't even seem to use bdrv_aio_preadv() for actual
>>> parallelism, but you often have a pattern like this:
>>>
>>>      void foo_cb(void *opaque)
>>>      {
>>>          ...
>>>          qemu_coroutine_enter(acb->co);
>>>      }
>>>
>>>      void caller()
>>>      {
>>>          ...
>>>          acb = bdrv_aio_preadv(...);
>>>          qemu_coroutine_yield();
>>>      }
>>>
>>> The code will actually become a lot simpler if you use bdrv_co_preadv()
>>> instead, because you don't need a callback; you get pure sequential
>>> code.
>>>
>>> The part that actually has some parallelism, pcache_readahead_request(),
>>> already creates its own coroutine, so it runs in the background without
>>> using callback-style interfaces.
>>
>> I used bdrv_co_preadv() because it conveniently solves the partial
>> cache hit. To handle a partial cache hit, we need to split a request
>> into smaller parts, issue the requests asynchronously, and wait for all
>> of them in one place.
>>
>> Do you propose to create a coroutine for each part of the request? It
>> seemed to me that bdrv_co_preadv() is a wrapper that allows us to avoid
>> duplicating that code.
>
> It's actually the other way round: bdrv_co_preadv() is the "native"
> block layer API, and bdrv_aio_*() are wrappers providing an alternative
> interface.
>

Sorry, I mixed up bdrv_co_preadv() and bdrv_aio_preadv() :)

Also, I forgot that in this version pcache can't split one request into
many small requests (as in the RFC version).

> I'm looking at pcache_co_readahead(), for example. It looks like this:
>
>      bdrv_aio_preadv(bs->file, node->common.offset, &readahead_acb.qiov,
>                      node->common.bytes, pcache_aio_readahead_cb,
>                      &readahead_acb);
>      qemu_coroutine_yield();
>
> And then we have pcache_aio_readahead_cb(), which ends in:
>
>      qemu_coroutine_enter(acb->co);
>
> So here the callback style doesn't buy you anything; it just rips the
> code apart into two functions. There is no parallelism here anyway:
> pcache_co_readahead() doesn't do anything until the callback reenters
> it. This is a very obvious example where bdrv_co_preadv() will simplify
> the code.
>

I agree.

> It's similar with the other bdrv_aio_preadv() calls, which are in
> pcache_co_preadv():
>
>          if (bytes > s->max_aio_size) {
>              bdrv_aio_preadv(bs->file, offset, qiov, bytes,
>                              pcache_aio_read_cb, &acb);
>              goto out;
>          }
>
>          update_req_stats(s->req_stats, offset, bytes);
>
>          status = pcache_lookup_data(&acb);
>          if (status == CACHE_MISS) {
>              bdrv_aio_preadv(bs->file, offset, qiov, bytes,
>                              pcache_aio_read_cb, &acb);
>          } else if (status == PARTIAL_CACHE_HIT) {
>              assert(acb.part.qiov.niov != 0);
>              bdrv_aio_preadv(bs->file, acb.part.offset, &acb.part.qiov,
>                              acb.part.bytes, pcache_aio_read_cb, &acb);
>          }
>
>          pcache_readahead_request(&acb);
>
>          if (status == CACHE_HIT && --acb.ref == 0) {
>              return 0;
>          }
>
>      out:
>          qemu_coroutine_yield();
>
> Here you mainly have the pcache_readahead_request() call between
> bdrv_aio_preadv() and the yield. It only spawns a new coroutine, which
> works in the background, so I think you can move it to before the
> reads; then the reads can trivially become bdrv_co_preadv() and the
> callback can again be inlined instead of ripping the function into two
> parts.
>
>
> The bdrv_aio_pwritev() call in pcache_co_pwritev() is just the same
> thing, and using the coroutine version results in obvious code
> improvements.
>
>
> And I think these are all the uses of bdrv_aio_*() in the pcache driver,
> so converting it to use bdrv_co_*() instead isn't only possible, but
> will improve the legibility of your code, too. It's a clear win in all
> three places.
>

Yes, you're right: this scheme was acceptable when the caller was
bdrv_aio_*(), but now it is just confusing. Thanks for the explanation!

> Kevin
>

Patch

diff --git a/block/io.c b/block/io.c
index fdf7080..099bddb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1991,6 +1991,22 @@  int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 /**************************************************************/
 /* async I/Os */
 
+BlockAIOCB *bdrv_aio_preadv(BdrvChild *child, int64_t offset,
+                            QEMUIOVector *qiov, unsigned int bytes,
+                            BlockCompletionFunc *cb, void *opaque)
+{
+    assert(bytes == qiov->size);
+    return bdrv_co_aio_prw_vector(child, offset, qiov, 0, cb, opaque, false);
+}
+
+BlockAIOCB *bdrv_aio_pwritev(BdrvChild *child, int64_t offset,
+                             QEMUIOVector *qiov, unsigned int bytes,
+                             BlockCompletionFunc *cb, void *opaque)
+{
+    assert(bytes == qiov->size);
+    return bdrv_co_aio_prw_vector(child, offset, qiov, 0, cb, opaque, true);
+}
+
 BlockAIOCB *bdrv_aio_readv(BdrvChild *child, int64_t sector_num,
                            QEMUIOVector *qiov, int nb_sectors,
                            BlockCompletionFunc *cb, void *opaque)
diff --git a/include/block/block.h b/include/block/block.h
index e18233a..6728219 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -305,6 +305,13 @@  BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp);
 
 /* async block I/O */
+
+BlockAIOCB *bdrv_aio_preadv(BdrvChild *child, int64_t offset,
+                            QEMUIOVector *qiov, unsigned int bytes,
+                            BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *bdrv_aio_pwritev(BdrvChild *child, int64_t offset,
+                             QEMUIOVector *qiov, unsigned int bytes,
+                             BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *bdrv_aio_readv(BdrvChild *child, int64_t sector_num,
                            QEMUIOVector *iov, int nb_sectors,
                            BlockCompletionFunc *cb, void *opaque);