[v14,0/1] qcow2: cluster space preallocation

Message ID 20190516142749.81019-1-anton.nefedov@virtuozzo.com

Message

Anton Nefedov May 16, 2019, 2:27 p.m. UTC
...apparently v13 ended up in a weird base64 encoding that would not git-am easily.
Resending.

----

hi,

this used to be a large 10-patch series, but now thanks to Kevin's
BDRV_REQ_NO_FALLBACK series
(http://lists.nongnu.org/archive/html/qemu-devel/2019-03/msg06389.html)
the core block layer functionality is already in place: the existing flag
can be reused.

A few accompanying patches were also dropped from the series as those can
be processed separately.

So,

new in v13:
   - patch 1 (was patch 9) rebased to use s->data_file pointer
   - comments fixed/added
   - other patches dropped ;)

v12: http://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg02761.html
v11: http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg04342.html
v10: http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00121.html

----

This series starts to improve a few performance aspects of the qcow2
format:

  1. non-cluster-aligned write requests (to unallocated clusters) explicitly
     pad the data with zeroes if there is no backing data.
     The resulting increase in the number of operations and the potential
     cluster fragmentation (in the host file) is already addressed by:
       ee22a9d qcow2: Merge the writing of the COW regions with the guest data
     However, when the COW regions are zero, writing them can be avoided
     entirely: the whole cluster is preallocated and zeroed in a single
     efficient write_zeroes() operation, and only the guest data is written
     (see the sketch after this list).

  2. moreover, the efficient write_zeroes() operation can be used to
     preallocate space a configurable number of megabytes ahead, which gives
     a noticeable improvement on some storage types (e.g. distributed
     storage, where the space allocation operation might be expensive).
     (Not included in this patchset since v6.)

  3. this will also make it possible to allow simultaneous writes to the
     same unallocated cluster once the space has been allocated and zeroed
     but before the first data is written and the cluster is linked into L2.
     (Not included in this patchset.)
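
To make point 1 concrete, below is a minimal, self-contained sketch (not the
qcow2 code itself) of the two write strategies against a plain host file,
with Linux fallocate(FALLOC_FL_ZERO_RANGE) standing in for an efficient
write_zeroes(); the file name, cluster size and offsets are made up for the
illustration.

/* Illustrative sketch only -- not the actual qcow2 patch.  A 4 KiB guest
 * write lands in the middle of an unallocated 64 KiB cluster with no backing
 * data.  Old path: write one zero-padded cluster buffer.  New path: zero the
 * whole cluster with one cheap range operation, then write only the guest
 * data.  Assumes Linux and a filesystem supporting FALLOC_FL_ZERO_RANGE. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CLUSTER_SIZE (64 * 1024)

/* Old path: the zero COW head/tail are part of one big buffered write. */
static int write_padded_cluster(int fd, off_t cluster_off,
                                const void *data, size_t len, off_t in_cluster)
{
    char *buf = calloc(1, CLUSTER_SIZE);   /* zeroes stand in for COW areas */
    if (!buf) {
        return -1;
    }
    memcpy(buf + in_cluster, data, len);
    ssize_t ret = pwrite(fd, buf, CLUSTER_SIZE, cluster_off);
    free(buf);
    return ret == CLUSTER_SIZE ? 0 : -1;
}

/* New path: preallocate and zero the cluster in one efficient operation,
 * then write only the guest data at its offset inside the cluster. */
static int write_prezeroed_cluster(int fd, off_t cluster_off,
                                   const void *data, size_t len, off_t in_cluster)
{
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, cluster_off, CLUSTER_SIZE) != 0) {
        return -1;                         /* no fast zeroing available */
    }
    ssize_t ret = pwrite(fd, data, len, cluster_off + in_cluster);
    return ret == (ssize_t)len ? 0 : -1;
}

int main(void)
{
    int fd = open("demo.img", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char data[4096];
    memset(data, 0xaa, sizeof(data));      /* pretend guest data */

    /* Try the new path first; fall back to the padded write if it fails. */
    if (write_prezeroed_cluster(fd, 0, data, sizeof(data), 8 * 1024) != 0 &&
        write_padded_cluster(fd, 0, data, sizeof(data), 8 * 1024) != 0) {
        perror("write");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}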

An efficient write_zeroes usually implies that the blocks are not actually
written to but only reserved and marked as zeroed by the storage.  The
existing bdrv_write_zeroes() falls back to writing zero buffers if
write_zeroes is not supported by the driver, which is why BDRV_REQ_NO_FALLBACK
is used here.
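
As a rough illustration of the flag's semantics (again at the host-file level
rather than in the QEMU block layer, with fallocate() standing in for a
driver's write_zeroes support): the caller wants either an efficient zeroing
operation or an error, never an emulated buffer write.

/* Sketch of the no-fallback idea on a plain file (assumption: Linux,
 * fallocate() as a stand-in for a driver's efficient write_zeroes).  If the
 * fast path is unavailable, return an error instead of emulating the zeroing
 * with buffer writes; the caller then keeps its existing COW write path. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Zero [offset, offset + len) efficiently, or fail without writing anything. */
static int zero_range_no_fallback(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0) {
        return 0;              /* blocks reserved and marked zeroed */
    }
    /* Unsupported or failed: do NOT fall back to writing zero buffers. */
    return -errno;
}

int main(void)
{
    int fd = open("demo.img", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (zero_range_no_fallback(fd, 0, 64 * 1024) < 0) {
        /* The optimization is simply skipped; the regular path is used. */
        fprintf(stderr, "no efficient zeroing, keeping the old write path\n");
    }
    close(fd);
    return 0;
}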

simple perf test:

  qemu-img create -f qcow2 test.img 4G && \
  qemu-img bench -c $((1024*1024)) -f qcow2 -n -s 4k -t none -w test.img

test results (seconds):

    +-----------+--------------+--------------+------+
    |   file    |    before    |     after    | gain |
    +-----------+--------------+--------------+------+
    |    ssd    |      61.153  |      36.313  |  41% |
    |    hdd    |     112.676  |     122.056  |  -8% |
    +-----------+--------------+--------------+------+

Anton Nefedov (1):
  qcow2: skip writing zero buffers to empty COW areas

 qapi/block-core.json       |  4 +-
 block/qcow2.h              |  6 +++
 block/qcow2-cluster.c      |  2 +-
 block/qcow2.c              | 93 +++++++++++++++++++++++++++++++++++++-
 block/trace-events         |  1 +
 tests/qemu-iotests/060     |  7 ++-
 tests/qemu-iotests/060.out |  5 +-
 7 files changed, 112 insertions(+), 6 deletions(-)

Comments

Max Reitz May 24, 2019, 1:56 p.m. UTC | #1
On 16.05.19 16:27, Anton Nefedov wrote:
> [...]
>
> test results (seconds):
> 
>     +-----------+--------------+--------------+------+
>     |   file    |    before    |     after    | gain |
>     +-----------+--------------+--------------+------+
>     |    ssd    |      61.153  |      36.313  |  41% |
>     |    hdd    |     112.676  |     122.056  |  -8% |
>     +-----------+--------------+--------------+------+

I’ve done a few more tests, and I’ve seen more slowdown on an HDD.
(Like 30 % when doing 64 kB requests that are not aligned to clusters.)
On the other hand, the SSD gain is generally in the same ballpark (38 %
when issuing the same kind of requests.)

On the basis that:

(1) I believe that SSD performance is more important than HDD performance,

(2) I can’t think of a simple way to automatically decide whether the
new or the old codepath is more efficient[1], and

(3) users probably would not use a switch if we introduced one,

I have applied this patch to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Thanks!


[1] Hm.  We can probably investigate whether the file is stored on a
rotational medium or not.  Is there a fundamental reason why this patch
seems to degrade performance on an HDD but improves it on an SSD?  If
so, we can probably make a choice based on that.
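
A minimal sketch of such a rotational check (assumptions: Linux sysfs, and
the name of the backing block device is already known; "sda" below is just a
placeholder, and resolving which device actually backs the image file is left
out of the sketch):

/* Read the Linux sysfs "rotational" attribute for a block device.
 * Returns 1 for rotational (HDD), 0 for non-rotational (SSD/NVMe),
 * -1 if the attribute cannot be read. */
#include <stdio.h>

static int is_rotational(const char *devname)
{
    char path[256];
    snprintf(path, sizeof(path), "/sys/block/%s/queue/rotational", devname);

    FILE *f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    int val = -1;
    if (fscanf(f, "%d", &val) != 1) {
        val = -1;
    }
    fclose(f);
    return val;
}

int main(void)
{
    int rot = is_rotational("sda");    /* placeholder device name */
    if (rot < 0) {
        fprintf(stderr, "could not determine rotational status\n");
        return 1;
    }
    printf("rotational: %d\n", rot);
    return 0;
}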

Max
Alberto Garcia May 26, 2019, 3:01 p.m. UTC | #2
On Fri 24 May 2019 03:56:21 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>>     +-----------+--------------+--------------+------+
>>     |   file    |    before    |     after    | gain |
>>     +-----------+--------------+--------------+------+
>>     |    ssd    |      61.153  |      36.313  |  41% |
>>     |    hdd    |     112.676  |     122.056  |  -8% |
>>     +-----------+--------------+--------------+------+
>
> I’ve done a few more tests, and I’ve seen more slowdown on an HDD.
> (Like 30 % when doing 64 kB requests that are not aligned to
> clusters.)  On the other hand, the SSD gain is generally in the same
> ballpark (38 % when issuing the same kind of requests.)
  [...]
> [1] Hm.  We can probably investigate whether the file is stored on a
> rotational medium or not.  Is there a fundamental reason why this
> patch seems to degrade performance on an HDD but improves it on an
> SSD?  If so, we can probably make a choice based on that.

This is when writing to an unallocated cluster with no existing data on
the backing image, right? Then it's probably because you need 2
operations (write zeros + write data) instead of just one.

Berto
Max Reitz May 27, 2019, 12:14 p.m. UTC | #3
On 26.05.19 17:01, Alberto Garcia wrote:
> On Fri 24 May 2019 03:56:21 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>>>     +-----------+--------------+--------------+------+
>>>     |   file    |    before    |     after    | gain |
>>>     +-----------+--------------+--------------+------+
>>>     |    ssd    |      61.153  |      36.313  |  41% |
>>>     |    hdd    |     112.676  |     122.056  |  -8% |
>>>     +-----------+--------------+--------------+------+
>>
>> I’ve done a few more tests, and I’ve seen more slowdown on an HDD.
>> (Like 30 % when doing 64 kB requests that are not aligned to
>> clusters.)  On the other hand, the SSD gain is generally in the same
>> ballpark (38 % when issuing the same kind of requests.)
>   [...]
>> [1] Hm.  We can probably investigate whether the file is stored on a
>> rotational medium or not.  Is there a fundamental reason why this
>> patch seems to degrade performance on an HDD but improves it on an
>> SSD?  If so, we can probably make a choice based on that.
> 
> This is when writing to an unallocated cluster with no existing data on
> the backing image, right? Then it's probably because you need 2
> operations (write zeros + write data) instead of just one.

Hm, yes.  I didn’t test writing tail and head separately, which should
be even worse.

Max