
[v11,05/13] copy-on-read: limit COR operations to base in COR driver

Message ID 1602524605-481160-6-git-send-email-andrey.shinkevich@virtuozzo.com
State New, archived
Series Apply COR-filter to the block-stream permanently

Commit Message

Andrey Shinkevich Oct. 12, 2020, 5:43 p.m. UTC
Limit COR operations by the base node in the backing chain when the
overlay base node name is given. It will be useful for a block stream
job when the COR-filter is applied. The overlay base node is passed as
the base itself may change due to concurrent commit jobs on the same
backing chain.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)
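As a rough illustration of the rule this commit message describes (a toy model, not QEMU code): for one chunk, copy-on-read is wanted only when the data is not already in the top node but is allocated somewhere from the top's backing node down to the base overlay, inclusive. That matches passing `include_base = true` to bdrv_is_allocated_above() in the patch. The boolean-array layout and `chunk_needs_cor()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of a backing chain for a single chunk (hypothetical, not QEMU
 * code): alloc[0] is the top node, alloc[len - 1] is the bottom-most node,
 * and overlay_idx is the position of the base overlay (the node directly
 * above the base).
 */
static bool chunk_needs_cor(const bool *alloc, size_t len, size_t overlay_idx)
{
    assert(overlay_idx < len);
    if (alloc[0]) {
        return false;           /* data already lives in the top node */
    }
    /* Search [backing of top .. base overlay], base overlay included. */
    for (size_t i = 1; i <= overlay_idx; i++) {
        if (alloc[i]) {
            return true;        /* data would be copied up into the top */
        }
    }
    return false;               /* only the base (or nothing) has the data */
}
```

Data found only in the base is left where it is, so naming the base overlay, rather than the base that a concurrent commit job may change, is enough to bound what gets copied.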

Comments

Max Reitz Oct. 14, 2020, 11:59 a.m. UTC | #1
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Limit COR operations by the base node in the backing chain when the
> overlay base node name is given. It will be useful for a block stream
> job when the COR-filter is applied. The overlay base node is passed as
> the base itself may change due to concurrent commit jobs on the same
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index c578b1b..dfbd6ad 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>                                             size_t qiov_offset,
>                                             int flags)
>  {
> -    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> -                               flags | BDRV_REQ_COPY_ON_READ);
> +    int64_t n = 0;
> +    int64_t size = offset + bytes;
> +    int local_flags;
> +    int ret;
> +    BDRVStateCOR *state = bs->opaque;
> +
> +    if (!state->base_overlay) {
> +        return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> +                                   flags | BDRV_REQ_COPY_ON_READ);
> +    }
> +
> +    while (offset < size) {
> +        local_flags = flags;
> +
> +        /* In case of failure, try to copy-on-read anyway */
> +        ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
> +        if (!ret) {

In case of failure, a negative value is going to be returned, we won’t
go into this conditional block, and local_flags isn’t going to contain
BDRV_REQ_COPY_ON_READ.

So the idea of CORing in case of failure sounds sound to me, but it
doesn’t look like that’s done.

> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),

I think this should either be bdrv_backing_chain_next() or we must rule
out the possibility of bs->file->bs being a filter somewhere.  I think
I’d prefer the former.

> +                                          state->base_overlay, true, offset,
> +                                          n, &n);
> +            if (ret) {

“ret == 1 || ret < 0” would be more explicit (and in line with the “!ret
|| ret < 0” probably needed above), but correct either way.

Max
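A toy model of the per-chunk decision with both of Max's suggestions applied: query the lower layers when the top-level query returns 0 or fails, copy when the lower-level query returns 1 or fails, and find the next layer in a filter-aware way. `Node`, `skip_filters()`, `cow_bs()`, `backing_chain_next()`, `is_allocated_above()` and `chunk_wants_cor()` are hypothetical stand-ins loosely modelled on bdrv_skip_filters(), bdrv_cow_bs(), bdrv_backing_chain_next() and bdrv_is_allocated_above(); the real functions operate on BlockDriverState/BdrvChild and do considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node; NOT QEMU's BlockDriverState. */
typedef struct Node {
    bool is_filter;      /* filters only pass I/O down to `child` */
    struct Node *child;  /* filtered child (filter) or backing node (format) */
    int alloc;           /* for one chunk: 1 = allocated, 0 = not, <0 = error */
} Node;

/* Loosely models bdrv_skip_filters(): descend through filter nodes. */
static Node *skip_filters(Node *bs)
{
    while (bs && bs->is_filter) {
        bs = bs->child;
    }
    return bs;
}

/* Loosely models bdrv_cow_bs(): a filter has no COW (backing) child. */
static Node *cow_bs(Node *bs)
{
    return (bs && !bs->is_filter) ? bs->child : NULL;
}

/* Loosely models bdrv_backing_chain_next(): skip filters around the COW link. */
static Node *backing_chain_next(Node *bs)
{
    return skip_filters(cow_bs(skip_filters(bs)));
}

/* Loosely models bdrv_is_allocated_above(top, base_overlay, true, ...). */
static int is_allocated_above(Node *top, Node *base_overlay)
{
    /* A NULL start means there is nothing to search in this model. */
    for (Node *n = skip_filters(top); n; n = backing_chain_next(n)) {
        if (n->alloc != 0) {
            return n->alloc;    /* 1, or a negative error code */
        }
        if (n == base_overlay) {
            break;              /* base_overlay itself is included */
        }
    }
    return 0;
}

/* Decide whether one chunk should be read with BDRV_REQ_COPY_ON_READ. */
static bool chunk_wants_cor(Node *file_bs, Node *base_overlay)
{
    Node *top = skip_filters(file_bs);
    assert(top != NULL);
    int ret = top->alloc;                       /* like bdrv_is_allocated() */
    if (ret == 0 || ret < 0) {                  /* unallocated, or failed */
        ret = is_allocated_above(backing_chain_next(file_bs), base_overlay);
        if (ret == 1 || ret < 0) {              /* found, or failed: COR anyway */
            return true;
        }
    }
    return false;
}
```

With a filter sitting above the top node, a plain COW-child lookup on the filter yields nothing, so no lower layer is consulted at all; the filter-aware lookup reaches the node that actually holds the data.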
Max Reitz Oct. 14, 2020, 12:01 p.m. UTC | #2
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Limit COR operations by the base node in the backing chain when the
> overlay base node name is given. It will be useful for a block stream
> job when the COR-filter is applied. The overlay base node is passed as
> the base itself may change due to concurrent commit jobs on the same
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index c578b1b..dfbd6ad 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>                                             size_t qiov_offset,
>                                             int flags)
>  {
> -    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> -                               flags | BDRV_REQ_COPY_ON_READ);
> +    int64_t n = 0;
> +    int64_t size = offset + bytes;

Just when I hit send I noticed that “end” would be a more fitting name
for this variable.

> +    int local_flags;
> +    int ret;
> +    BDRVStateCOR *state = bs->opaque;
> +
> +    if (!state->base_overlay) {
> +        return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> +                                   flags | BDRV_REQ_COPY_ON_READ);
> +    }
> +
> +    while (offset < size) {

(because I got a bit confused looking at this)

(Though dropping @size (or @end) and just checking when @bytes becomes 0
should work, too.)

> +        local_flags = flags;
> +
> +        /* In case of failure, try to copy-on-read anyway */
> +        ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
> +        if (!ret) {
> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
> +                                          state->base_overlay, true, offset,
> +                                          n, &n);
> +            if (ret) {
> +                local_flags |= BDRV_REQ_COPY_ON_READ;
> +            }
> +        }

Furthermore, I just noticed – can the is_allocated functions not return
0 in @n, when @offset is at the EOF?  Is that something to look out for?
 (I’m not sure.)

Max
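Max's two remarks here (terminate on @bytes instead of a separate @size/@end, and beware of a status query reporting 0 bytes in @n at EOF) can be sketched with a toy model. `toy_status()`, `chunked_read()`, `ReadStats` and the 4 KiB run layout are hypothetical; the `n == 0` fallback (read the remainder in one go without COR) is only one possible policy, but without some such guard the loop would never advance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RUN 4096  /* hypothetical: COR status alternates every 4 KiB */

/*
 * Hypothetical status query for [offset, offset + bytes).  Returns whether
 * the first run needs COR and stores that run's length in *pnum (capped at
 * bytes).  At or past backing_end it reports *pnum = 0, i.e. the EOF case.
 */
static bool toy_status(int64_t offset, int64_t bytes, int64_t backing_end,
                       int64_t *pnum)
{
    if (offset >= backing_end) {
        *pnum = 0;
        return false;
    }
    int64_t run_end = (offset / RUN + 1) * RUN;
    if (run_end > backing_end) {
        run_end = backing_end;
    }
    *pnum = run_end - offset < bytes ? run_end - offset : bytes;
    return (offset / RUN) % 2 == 0;     /* even-numbered runs need COR */
}

typedef struct {
    int64_t cor_bytes;    /* bytes that would be read with the COR flag */
    int64_t plain_bytes;  /* bytes that would be read without it */
    int reads;            /* number of read requests issued */
} ReadStats;

static void chunked_read(int64_t offset, int64_t bytes, int64_t backing_end,
                         ReadStats *st)
{
    while (bytes > 0) {                 /* no separate @size/@end needed */
        int64_t n;
        bool cor = toy_status(offset, bytes, backing_end, &n);
        if (n == 0) {
            /* Past the backing EOF: nothing to copy up, read the rest plainly. */
            cor = false;
            n = bytes;
        }
        assert(n > 0 && n <= bytes);    /* guarantees forward progress */
        if (cor) {
            st->cor_bytes += n;
        } else {
            st->plain_bytes += n;
        }
        st->reads++;
        offset += n;
        bytes -= n;
    }
}
```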

> +
> +        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
> +                                  local_flags);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        offset += n;
> +        qiov_offset += n;
> +        bytes -= n;
> +    }
> +
> +    return 0;
>  }
>  
>  
>
Andrey Shinkevich Oct. 14, 2020, 5:43 p.m. UTC | #3
On 14.10.2020 14:59, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Limit COR operations by the base node in the backing chain when the
>> overlay base node name is given. It will be useful for a block stream
>> job when the COR-filter is applied. The overlay base node is passed as
>> the base itself may change due to concurrent commit jobs on the same
>> backing chain.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index c578b1b..dfbd6ad 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>                                              size_t qiov_offset,
>>                                              int flags)
>>   {
>> -    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
>> -                               flags | BDRV_REQ_COPY_ON_READ);
>> +    int64_t n = 0;
>> +    int64_t size = offset + bytes;
>> +    int local_flags;
>> +    int ret;
>> +    BDRVStateCOR *state = bs->opaque;
>> +
>> +    if (!state->base_overlay) {
>> +        return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
>> +                                   flags | BDRV_REQ_COPY_ON_READ);
>> +    }
>> +
>> +    while (offset < size) {
>> +        local_flags = flags;
>> +
>> +        /* In case of failure, try to copy-on-read anyway */
>> +        ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
>> +        if (!ret) {
> 
> In case of failure, a negative value is going to be returned, we won’t
> go into this conditional block, and local_flags isn’t going to contain
> BDRV_REQ_COPY_ON_READ.
> 
> So the idea of CORing in case of failure sounds sound to me, but it
> doesn’t look like that’s done.
> 

Yes, that's clear. It was my mistake to leave out the additional "ret < 0"
condition. Thank you for noticing that.

Andrey

>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
> 
> I think this should either be bdrv_backing_chain_next() or we must rule
> out the possibility of bs->file->bs being a filter somewhere.  I think
> I’d prefer the former.
> 
>> +                                          state->base_overlay, true, offset,
>> +                                          n, &n);
>> +            if (ret) {
> 
> “ret == 1 || ret < 0” would be more explicit (and in line with the “!ret
> || ret < 0” probably needed above), but correct either way.
> 
> Max
>
Andrey Shinkevich Oct. 14, 2020, 6:57 p.m. UTC | #4
On 14.10.2020 15:01, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Limit COR operations by the base node in the backing chain when the
>> overlay base node name is given. It will be useful for a block stream
>> job when the COR-filter is applied. The overlay base node is passed as
>> the base itself may change due to concurrent commit jobs on the same
>> backing chain.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index c578b1b..dfbd6ad 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>                                              size_t qiov_offset,
>>                                              int flags)
>>   {

[...]

>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>> +                                          state->base_overlay, true, offset,
>> +                                          n, &n);
>> +            if (ret) {
>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>> +            }
>> +        }
> 
> Furthermore, I just noticed – can the is_allocated functions not return
> 0 in @n, when @offset is at the EOF?  Is that something to look out for?
>   (I’m not sure.)
> 
> Max
> 

The EOF check is already done earlier, in stream_run(), for a
block-stream job. For other users of the COR-filter, an EOF check can be
added to cor_co_preadv_part().
I would be more than happy if we could avoid the duplicated is_allocated
check in block-stream. But how could stream_run() stop calling
blk_co_preadv() once EOF is reached if the is_allocated check were removed
from it? Could cor_co_preadv_part() return EOF (or another error code)
for the caller to handle if (ret == 0 && n == 0 && (flags &
BDRV_REQ_PREFETCH))?

Andrey
Max Reitz Oct. 15, 2020, 3:56 p.m. UTC | #5
On 14.10.20 20:57, Andrey Shinkevich wrote:
> On 14.10.2020 15:01, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> Limit COR operations by the base node in the backing chain when the
>>> overlay base node name is given. It will be useful for a block stream
>>> job when the COR-filter is applied. The overlay base node is passed as
>>> the base itself may change due to concurrent commit jobs on the same
>>> backing chain.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>   1 file changed, 37 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index c578b1b..dfbd6ad 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>>                                              size_t qiov_offset,
>>>                                              int flags)
>>>   {
> 
> [...]
> 
>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>>> +                                          state->base_overlay, true, offset,
>>> +                                          n, &n);
>>> +            if (ret) {
>>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>>> +            }
>>> +        }
>>
>> Furthermore, I just noticed – can the is_allocated functions not return
>> 0 in @n, when @offset is at the EOF?  Is that something to look out for?
>>   (I’m not sure.)
>>
>> Max
>>
> 
> The EOF check is already done earlier, in stream_run(), for a
> block-stream job. For other users of the COR-filter, an EOF check can be
> added to cor_co_preadv_part().
> I would be more than happy if we could avoid the duplicated is_allocated
> check in block-stream. But how could stream_run() stop calling
> blk_co_preadv() once EOF is reached if the is_allocated check were removed
> from it?

True.  Is it that bad to lose that optimization, though?  (And I would
expect the case of a short backing file to be rather rare, too.)

> Could cor_co_preadv_part() return EOF (or another error code)
> for the caller to handle if (ret == 0 && n == 0 && (flags &
> BDRV_REQ_PREFETCH))?

That sounds like a bad hack.  I’d rather keep the double is_allocated().

But what would be the problem with losing the short backing file
optimization?  Just performance?  Or would we end up writing actual
zeroes into the overlay past the end of the backing file?  Hm, probably
not, if the COR filter would detect that case and handle it like stream
does.

So it seems only a question of performance to me, and I don’t think it
would be that bad, in this rather rare case, to have a bunch of useless
is_allocated and is_allocated_above calls past the backing file’s EOF.
(Maybe I’m wrong, though.)

Max
Andrey Shinkevich Oct. 15, 2020, 5:37 p.m. UTC | #6
On 15.10.2020 18:56, Max Reitz wrote:
> On 14.10.20 20:57, Andrey Shinkevich wrote:
>> On 14.10.2020 15:01, Max Reitz wrote:
>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>> Limit COR operations by the base node in the backing chain when the
>>>> overlay base node name is given. It will be useful for a block stream
>>>> job when the COR-filter is applied. The overlay base node is passed as
>>>> the base itself may change due to concurrent commit jobs on the same
>>>> backing chain.
>>>>
>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>> ---
>>>>    block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>>    1 file changed, 37 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>> index c578b1b..dfbd6ad 100644
>>>> --- a/block/copy-on-read.c
>>>> +++ b/block/copy-on-read.c
>>>> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>>>                                               size_t qiov_offset,
>>>>                                               int flags)
>>>>    {
>>
>> [...]
>>
>>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>>>> +                                          state->base_overlay, true, offset,
>>>> +                                          n, &n);
>>>> +            if (ret) {
>>>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>>>> +            }
>>>> +        }
>>>
>>> Furthermore, I just noticed – can the is_allocated functions not return
>>> 0 in @n, when @offset is at the EOF?  Is that something to look out for?
>>>    (I’m not sure.)
>>>
>>> Max
>>>
>>
>> The EOF check is already done earlier, in stream_run(), for a
>> block-stream job. For other users of the COR-filter, an EOF check can be
>> added to cor_co_preadv_part().
>> I would be more than happy if we could avoid the duplicated is_allocated
>> check in block-stream. But how could stream_run() stop calling
>> blk_co_preadv() once EOF is reached if the is_allocated check were removed
>> from it?
> 
> True.  Is it that bad to lose that optimization, though?  (And I would
> expect the case of a short backing file to be rather rare, too.)
> 
>> Could cor_co_preadv_part() return EOF (or another error code)
>> for the caller to handle if (ret == 0 && n == 0 && (flags &
>> BDRV_REQ_PREFETCH))?
> 
> That sounds like a bad hack.  I’d rather keep the double is_allocated().
> 
> But what would be the problem with losing the short backing file
> optimization?  Just performance?  Or would we end up writing actual
> zeroes into the overlay past the end of the backing file?  Hm, probably
> not, if the COR filter would detect that case and handle it like stream
> does.
> 
> So it seems only a question of performance to me, and I don’t think it
> would be that bad, in this rather rare case, to have a bunch of useless
> is_allocated and is_allocated_above calls past the backing file’s EOF.
> (Maybe I’m wrong, though.)
> 
> Max
> 

Thank you, Max, for sharing your thoughts on this subject.
The double is_allocated check in stream_run() degrades performance, too.
And we will have to check for EOF in cor_co_preadv_part() in either case,
won't we?

Andrey
Vladimir Sementsov-Ogievskiy Oct. 16, 2020, 2:28 p.m. UTC | #7
15.10.2020 20:37, Andrey Shinkevich wrote:
> On 15.10.2020 18:56, Max Reitz wrote:
>> On 14.10.20 20:57, Andrey Shinkevich wrote:
>>> On 14.10.2020 15:01, Max Reitz wrote:
>>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>>> Limit COR operations by the base node in the backing chain when the
>>>>> overlay base node name is given. It will be useful for a block stream
>>>>> job when the COR-filter is applied. The overlay base node is passed as
>>>>> the base itself may change due to concurrent commit jobs on the same
>>>>> backing chain.
>>>>>
>>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>>> ---
>>>>>    block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>>>    1 file changed, 37 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>>> index c578b1b..dfbd6ad 100644
>>>>> --- a/block/copy-on-read.c
>>>>> +++ b/block/copy-on-read.c
>>>>> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>>>>                                               size_t qiov_offset,
>>>>>                                               int flags)
>>>>>    {
>>>
>>> [...]
>>>
>>>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>>>>> +                                          state->base_overlay, true, offset,
>>>>> +                                          n, &n);
>>>>> +            if (ret) {
>>>>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>>>>> +            }
>>>>> +        }
>>>>
>>>> Furthermore, I just noticed – can the is_allocated functions not return
>>>> 0 in @n, when @offset is at the EOF?  Is that something to look out for?
>>>>    (I’m not sure.)
>>>>
>>>> Max
>>>>
>>>
>>> The EOF check is already done earlier, in stream_run(), for a
>>> block-stream job. For other users of the COR-filter, an EOF check can be
>>> added to cor_co_preadv_part().
>>> I would be more than happy if we could avoid the duplicated is_allocated
>>> check in block-stream. But how could stream_run() stop calling
>>> blk_co_preadv() once EOF is reached if the is_allocated check were removed
>>> from it?
>>
>> True.  Is it that bad to lose that optimization, though?  (And I would
>> expect the case of a short backing file to be rather rare, too.)
>>
>>> Could cor_co_preadv_part() return EOF (or another error code)
>>> for the caller to handle if (ret == 0 && n == 0 && (flags &
>>> BDRV_REQ_PREFETCH))?
>>
>> That sounds like a bad hack.  I’d rather keep the double is_allocated().
>>
>> But what would be the problem with losing the short backing file
>> optimization?  Just performance?  Or would we end up writing actual
>> zeroes into the overlay past the end of the backing file?  Hm, probably
>> not, if the COR filter would detect that case and handle it like stream
>> does.
>>
>> So it seems only a question of performance to me, and I don’t think it
>> would be that bad, in this rather rare case, to have a bunch of useless
>> is_allocated and is_allocated_above calls past the backing file’s EOF.
>> (Maybe I’m wrong, though.)
>>
>> Max
>>
> 
> Thank you, Max, for sharing your thoughts on this subject.
> The double is_allocated check in stream_run() degrades performance, too.
> And we will have to check for EOF in cor_co_preadv_part() in either case, won't we?
> 
> Andrey


I'd keep the is_allocated logic in block-stream as is for now. It's not good that we check block status several times (in block-stream, then in the COR filter, then in the generic COR code), but it shouldn't be a real problem, and we can postpone optimizations to the next step.

Also, the resulting architecture is not final. I believe that in the bright future block-stream will work through block-copy like backup does. Then the COR filter will call block_copy() by itself, and the generic COR code will be dropped together with the BDRV_REQ_COPY_ON_READ flag. And stream will make just one background block_copy() call for the whole device, as backup does by the end of my in-flight backup series. All the extra levels of block_status checking will then go away.

About the EOF problem discussed here, let's look at the more generic problem: we are going to skip a _large_ area, but skipping it chunk by chunk is inefficient. So we just want to learn to skip large areas. The simplest way is to call is_allocated/is_allocated_above from the current offset to the device end once we have decided to skip the current chunk. Then we'll know how much to skip in total. But that kind of optimization is not directly related to this series and may be done separately if needed.
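Vladimir's suggestion can be sketched with a toy allocation map: copy chunk by chunk, but whenever a chunk turns out to need no copying, issue one more status query over the whole remaining range [offset, device end) and jump past the entire skippable run. `Range`, `query()`, `walk()` and `CHUNK` are hypothetical stand-ins; `query()` plays the role of an is_allocated/is_allocated_above call, and the counter shows how many such calls each strategy makes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK 65536  /* hypothetical per-iteration chunk size */

/* Hypothetical toy map: sorted, disjoint [start, end) ranges that need copying. */
typedef struct {
    int64_t start, end;
} Range;

/*
 * Stand-in for an is_allocated/is_allocated_above call over
 * [offset, offset + bytes): returns whether the first byte needs copying and
 * stores the length of the run sharing that status in *pnum (capped at
 * bytes).  Every call increments *queries.
 */
static bool query(const Range *r, int nr, int64_t offset, int64_t bytes,
                  int64_t *pnum, int *queries)
{
    int64_t limit = offset + bytes;

    (*queries)++;
    for (int i = 0; i < nr; i++) {
        if (offset < r[i].start) {      /* in the gap before range i */
            *pnum = (r[i].start < limit ? r[i].start : limit) - offset;
            return false;
        }
        if (offset < r[i].end) {        /* inside range i */
            *pnum = (r[i].end < limit ? r[i].end : limit) - offset;
            return true;
        }
    }
    *pnum = bytes;                      /* past the last range */
    return false;
}

/*
 * Walk [0, len) chunk by chunk and return the number of bytes copied.  With
 * skip_large set, a chunk that needs no copying triggers one more query up
 * to the device end, and the whole skippable run is jumped over at once.
 */
static int64_t walk(const Range *r, int nr, int64_t len, bool skip_large,
                    int *queries)
{
    int64_t copied = 0;

    for (int64_t offset = 0; offset < len;) {
        int64_t n;
        int64_t chunk = len - offset < CHUNK ? len - offset : CHUNK;

        if (query(r, nr, offset, chunk, &n, queries)) {
            copied += n;                /* a real job would copy n bytes here */
        } else if (skip_large) {
            query(r, nr, offset, len - offset, &n, queries);
        }
        assert(n > 0);                  /* query() always reports progress */
        offset += n;
    }
    return copied;
}
```

On a 12-chunk device where only chunks 0 and 10 need copying, the chunk-by-chunk walk makes 12 queries and the large-skip walk makes 6, while both copy the same 128 KiB.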

Patch

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index c578b1b..dfbd6ad 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -122,8 +122,43 @@  static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
                                            size_t qiov_offset,
                                            int flags)
 {
-    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
-                               flags | BDRV_REQ_COPY_ON_READ);
+    int64_t n = 0;
+    int64_t size = offset + bytes;
+    int local_flags;
+    int ret;
+    BDRVStateCOR *state = bs->opaque;
+
+    if (!state->base_overlay) {
+        return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+                                   flags | BDRV_REQ_COPY_ON_READ);
+    }
+
+    while (offset < size) {
+        local_flags = flags;
+
+        /* In case of failure, try to copy-on-read anyway */
+        ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
+        if (!ret) {
+            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
+                                          state->base_overlay, true, offset,
+                                          n, &n);
+            if (ret) {
+                local_flags |= BDRV_REQ_COPY_ON_READ;
+            }
+        }
+
+        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+                                  local_flags);
+        if (ret < 0) {
+            return ret;
+        }
+
+        offset += n;
+        qiov_offset += n;
+        bytes -= n;
+    }
+
+    return 0;
 }