diff mbox

[1/4] block/qcow2: add compression_algorithm create option

Message ID 560f62fb-f6fb-42ae-2fd9-b10c9c478350@kamp.de (mailing list archive)
State New, archived
Headers show

Commit Message

Peter Lieven June 27, 2017, 2:49 p.m. UTC
Am 27.06.2017 um 14:49 schrieb Eric Blake:
> On 06/27/2017 07:34 AM, Peter Lieven wrote:
>> this patch adds a new compression_algorithm option when creating qcow2 images.
>> The current default for the compresison algorithm is zlib and zlib will be
> s/compresison/compression/
>
>> used when this option is omitted (like before).
>>
>> If the option is specified e.g. with:
>>
>>   qemu-img create -f qcow2 -o compression_algorithm=zlib image.qcow2 1G
>>
>> then a new compression algorithm header extension is added and an incompatible
>> feature bit is set. This means that if the header is present it must be parsed
>> by Qemu on qcow2_open and it must be validated if the specified compression
>> algorithm is supported by the current build of Qemu.
>>
>> This means if the compression_algorithm option is specified Qemu prior to this
>> commit will not be able to open the created image.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>   block/qcow2.c             | 93 ++++++++++++++++++++++++++++++++++++++++++++---
>>   block/qcow2.h             | 20 +++++++---
>>   docs/interop/qcow2.txt    |  8 +++-
> Focusing on just the spec change first:
>
>> +++ b/docs/interop/qcow2.txt
>> @@ -85,7 +85,11 @@ in the description of a field.
>>                                   be written to (unless for regaining
>>                                   consistency).
>>   
>> -                    Bits 2-63:  Reserved (set to 0)
>> +                    Bit 2:      Compress algorithm bit.  If this bit is set then
>> +                                the compress algorithm extension must be parsed
>> +                                and checked for compatiblity.
> s/compatiblity/compatibility/
>
>> +
>> +                    Bits 3-63:  Reserved (set to 0)
>>   
>>            80 -  87:  compatible_features
>>                       Bitmask of compatible features. An implementation can
>> @@ -135,6 +139,8 @@ be stored. Each extension has a structure like the following:
>>                           0xE2792ACA - Backing file format name
>>                           0x6803f857 - Feature name table
>>                           0x23852875 - Bitmaps extension
>> +                        0xC0318300 - Compression Algorithm
>> +                        0xC03183xx - Reserved for compression algorithm params
> s/params/parameters/
>
> You have now introduced 256 different reserved headers, without
> documenting any of their formats.  You absolutely MUST include a
> documentation of how the new 0xC0318300 header is laid out (see, for
> example, our section on "Bitmaps extension"), along with text mentioning
> that the new header MUST be present if incompatible-feature bit is set
> and MUST be absent otherwise.  But I also think that with a bit of
> proper design work, you only need ONE header for all possible algorithm
> parameters, rather than burning an additional 255 unspecified
> reservations.  That is, make sure your new header includes a common
> prefix including a length field and the algorightm in use, and then the
> length covers a variable-length suffix that can be parsed in a
> per-algorithm-specific manner for whatever additional parameters are
> needed for that algorithm.
>

Before I continue, can you please give feedback on the following spec change:


Thanks,
Peter

Comments

Eric Blake June 27, 2017, 3:04 p.m. UTC | #1
On 06/27/2017 09:49 AM, Peter Lieven wrote:

>>
> 
> Before I continue, can you please give feedback on the following spec
> change:
> 
> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> index 80cdfd0..f1428e9 100644
> --- a/docs/interop/qcow2.txt
> +++ b/docs/interop/qcow2.txt
> @@ -85,7 +85,11 @@ in the description of a field.
>                                  be written to (unless for regaining
>                                  consistency).
> 
> -                    Bits 2-63:  Reserved (set to 0)
> +                    Bit 2:      Compression format bit.  Iff this bit

I know what this means, but spelling it "If and only if" or "When" might
make more sense to other readers, as "Iff" is not common in English.

> is set then
> +                                the compression format extension MUST
> be present
> +                                and MUST be parsed and checked for
> compatibility.
> +
> +                    Bits 3-63:  Reserved (set to 0)
> 
>           80 -  87:  compatible_features
>                      Bitmask of compatible features. An implementation can
> @@ -135,6 +139,7 @@ be stored. Each extension has a structure like the
> following:
>                          0xE2792ACA - Backing file format name
>                          0x6803f857 - Feature name table
>                          0x23852875 - Bitmaps extension
> +                        0xC0318300 - Compression format extension

Now that you aren't burning 256 magic numbers, it may make sense to have
the last two hex digits be non-zero.


> +== Compression format extension ==
> +
> +The compression format extension is an optional header extension. It
> provides

Inline pasting created interesting wrapping, but the actual patch will
obviously read better.

> +the ability to specify the compression algorithm and compression
> parameters
> +that are used for compressed clusters. This new header MUST be present if
> +the incompatible-feature bit "compression format bit" is set and MUST
> be absent
> +otherwise.
> +
> +The fields of the compression format extension are:
> +
> +    Byte  0 - 15:  compression_format_name (padded with zeros, but not
> +                   necessarily null terminated if it has full length)

Do we really want arbitrary names of formats, or do we want to specify
specific algorithms (gzip, lzo, zstd) as an enum?  Which way gives us
maximum likelihood of interoperability?

> +
> +              16:  compression_level (uint8_t)
> +                   0 = default compression level
> +                   1 = lowest compression level
> +                   x = highest compression level (the highest compression
> +                       level may vary for different compression formats)
> +
> +         17 - 23:  Reserved for future use, must be zero.

Feels pretty limited - you don't have a length field for variable-length
extension of additional parameters, but have to fit all additions in the
next 8 bytes.  Yes, all extension headers are already paired with a
length parameter outside of the struct, sent alongside the header magic
number, but embedding a length directly in the header (while redundant)
makes it easier to keep information local to the header.  See
extra_data_size under Bitmap directory, for example.  Of course, we may
turn those 8 bytes INTO a length field, that then describe the rest of
the variable length parameters, but why not do it up front?

If we go with an enum mapping of supported compression formats, then you
can go into further details on exactly what extra parameters are
supports for each algorithm; while leaving it as a free-form text string
makes it harder to interpret what any additional payload will represent.
Peter Lieven June 27, 2017, 3:11 p.m. UTC | #2
Am 27.06.2017 um 17:04 schrieb Eric Blake:
> On 06/27/2017 09:49 AM, Peter Lieven wrote:
>
>> Before I continue, can you please give feedback on the following spec
>> change:
>>
>> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
>> index 80cdfd0..f1428e9 100644
>> --- a/docs/interop/qcow2.txt
>> +++ b/docs/interop/qcow2.txt
>> @@ -85,7 +85,11 @@ in the description of a field.
>>                                   be written to (unless for regaining
>>                                   consistency).
>>
>> -                    Bits 2-63:  Reserved (set to 0)
>> +                    Bit 2:      Compression format bit.  Iff this bit
> I know what this means, but spelling it "If and only if" or "When" might
> make more sense to other readers, as "Iff" is not common in English.
>
>> is set then
>> +                                the compression format extension MUST
>> be present
>> +                                and MUST be parsed and checked for
>> compatibility.
>> +
>> +                    Bits 3-63:  Reserved (set to 0)
>>
>>            80 -  87:  compatible_features
>>                       Bitmask of compatible features. An implementation can
>> @@ -135,6 +139,7 @@ be stored. Each extension has a structure like the
>> following:
>>                           0xE2792ACA - Backing file format name
>>                           0x6803f857 - Feature name table
>>                           0x23852875 - Bitmaps extension
>> +                        0xC0318300 - Compression format extension
> Now that you aren't burning 256 magic numbers, it may make sense to have
> the last two hex digits be non-zero.
>
>
>> +== Compression format extension ==
>> +
>> +The compression format extension is an optional header extension. It
>> provides
> Inline pasting created interesting wrapping, but the actual patch will
> obviously read better.
>
>> +the ability to specify the compression algorithm and compression
>> parameters
>> +that are used for compressed clusters. This new header MUST be present if
>> +the incompatible-feature bit "compression format bit" is set and MUST
>> be absent
>> +otherwise.
>> +
>> +The fields of the compression format extension are:
>> +
>> +    Byte  0 - 15:  compression_format_name (padded with zeros, but not
>> +                   necessarily null terminated if it has full length)
> Do we really want arbitrary names of formats, or do we want to specify
> specific algorithms (gzip, lzo, zstd) as an enum?  Which way gives us
> maximum likelihood of interoperability?

At some point I have to parse the name and convert it into a number.
This could be the same routine for the parsing of the compress.format
parameter and the compression_format_name in the header field.
The idea was that an accidently change in the enum cannot break
anything as it is local to the qemu binary. But I am ok to have an enum
if the values for each format a fixed. In this case it might be sufficient
to have an uint8_t for the compression format + an uint8_t for the compression
level. Then 2 bytes reserved + an uint32_t for the extra data size. So currently (the extra
data is not necessary at the moment for any format) the header would
be limited to 8 bytes.

Anyway if we glue this into a QAPI scheme, I would appreciate to get
some help with it as I am not that familiar with it yet.

Thanks,
Peter
Peter Lieven June 27, 2017, 3:13 p.m. UTC | #3
Am 27.06.2017 um 17:11 schrieb Peter Lieven:
> Am 27.06.2017 um 17:04 schrieb Eric Blake:
>> On 06/27/2017 09:49 AM, Peter Lieven wrote:
>>
>>> Before I continue, can you please give feedback on the following spec
>>> change:
>>>
>>> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
>>> index 80cdfd0..f1428e9 100644
>>> --- a/docs/interop/qcow2.txt
>>> +++ b/docs/interop/qcow2.txt
>>> @@ -85,7 +85,11 @@ in the description of a field.
>>>                                   be written to (unless for regaining
>>>                                   consistency).
>>>
>>> -                    Bits 2-63:  Reserved (set to 0)
>>> +                    Bit 2:      Compression format bit.  Iff this bit
>> I know what this means, but spelling it "If and only if" or "When" might
>> make more sense to other readers, as "Iff" is not common in English.
>>
>>> is set then
>>> +                                the compression format extension MUST
>>> be present
>>> +                                and MUST be parsed and checked for
>>> compatibility.
>>> +
>>> +                    Bits 3-63:  Reserved (set to 0)
>>>
>>>            80 -  87:  compatible_features
>>>                       Bitmask of compatible features. An implementation can
>>> @@ -135,6 +139,7 @@ be stored. Each extension has a structure like the
>>> following:
>>>                           0xE2792ACA - Backing file format name
>>>                           0x6803f857 - Feature name table
>>>                           0x23852875 - Bitmaps extension
>>> +                        0xC0318300 - Compression format extension
>> Now that you aren't burning 256 magic numbers, it may make sense to have
>> the last two hex digits be non-zero.
>>
>>
>>> +== Compression format extension ==
>>> +
>>> +The compression format extension is an optional header extension. It
>>> provides
>> Inline pasting created interesting wrapping, but the actual patch will
>> obviously read better.
>>
>>> +the ability to specify the compression algorithm and compression
>>> parameters
>>> +that are used for compressed clusters. This new header MUST be present if
>>> +the incompatible-feature bit "compression format bit" is set and MUST
>>> be absent
>>> +otherwise.
>>> +
>>> +The fields of the compression format extension are:
>>> +
>>> +    Byte  0 - 15:  compression_format_name (padded with zeros, but not
>>> +                   necessarily null terminated if it has full length)
>> Do we really want arbitrary names of formats, or do we want to specify
>> specific algorithms (gzip, lzo, zstd) as an enum?  Which way gives us
>> maximum likelihood of interoperability?
>
> At some point I have to parse the name and convert it into a number.
> This could be the same routine for the parsing of the compress.format
> parameter and the compression_format_name in the header field.
> The idea was that an accidently change in the enum cannot break
> anything as it is local to the qemu binary. But I am ok to have an enum
> if the values for each format a fixed. In this case it might be sufficient
> to have an uint8_t for the compression format + an uint8_t for the compression
> level. Then 2 bytes reserved + an uint32_t for the extra data size. So currently (the extra
> data is not necessary at the moment for any format) the header would
> be limited to 8 bytes.
>
> Anyway if we glue this into a QAPI scheme, I would appreciate to get
> some help with it as I am not that familiar with it yet.

Forgot to mention, the idea to store the name here was to be able to display
an appropiate error message if a format is not supported (like compression format xyz is not supported).

Peter
Peter Lieven July 3, 2017, 7:45 p.m. UTC | #4
Am 27.06.2017 um 17:04 schrieb Eric Blake:
> On 06/27/2017 09:49 AM, Peter Lieven wrote:
>
>> Before I continue, can you please give feedback on the following spec
>> change:
>>
>> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
>> index 80cdfd0..f1428e9 100644
>> --- a/docs/interop/qcow2.txt
>> +++ b/docs/interop/qcow2.txt
>> @@ -85,7 +85,11 @@ in the description of a field.
>>                                  be written to (unless for regaining
>>                                  consistency).
>>
>> -                    Bits 2-63:  Reserved (set to 0)
>> +                    Bit 2:      Compression format bit.  Iff this bit
> I know what this means, but spelling it "If and only if" or "When" might
> make more sense to other readers, as "Iff" is not common in English.
>
>> is set then
>> +                                the compression format extension MUST
>> be present
>> +                                and MUST be parsed and checked for
>> compatibility.
>> +
>> +                    Bits 3-63:  Reserved (set to 0)
>>
>>           80 -  87:  compatible_features
>>                      Bitmask of compatible features. An implementation can
>> @@ -135,6 +139,7 @@ be stored. Each extension has a structure like the
>> following:
>>                          0xE2792ACA - Backing file format name
>>                          0x6803f857 - Feature name table
>>                          0x23852875 - Bitmaps extension
>> +                        0xC0318300 - Compression format extension
> Now that you aren't burning 256 magic numbers, it may make sense to have
> the last two hex digits be non-zero.
>
>
>> +== Compression format extension ==
>> +
>> +The compression format extension is an optional header extension. It
>> provides
> Inline pasting created interesting wrapping, but the actual patch will
> obviously read better.
>
>> +the ability to specify the compression algorithm and compression
>> parameters
>> +that are used for compressed clusters. This new header MUST be present if
>> +the incompatible-feature bit "compression format bit" is set and MUST
>> be absent
>> +otherwise.
>> +
>> +The fields of the compression format extension are:
>> +
>> +    Byte  0 - 15:  compression_format_name (padded with zeros, but not
>> +                   necessarily null terminated if it has full length)
> Do we really want arbitrary names of formats, or do we want to specify
> specific algorithms (gzip, lzo, zstd) as an enum?  Which way gives us
> maximum likelihood of interoperability?
>
>> +
>> +              16:  compression_level (uint8_t)
>> +                   0 = default compression level
>> +                   1 = lowest compression level
>> +                   x = highest compression level (the highest compression
>> +                       level may vary for different compression formats)
>> +
>> +         17 - 23:  Reserved for future use, must be zero.
> Feels pretty limited - you don't have a length field for variable-length
> extension of additional parameters, but have to fit all additions in the
> next 8 bytes.  Yes, all extension headers are already paired with a
> length parameter outside of the struct, sent alongside the header magic
> number, but embedding a length directly in the header (while redundant)
> makes it easier to keep information local to the header.  See
> extra_data_size under Bitmap directory, for example.  Of course, we may
> turn those 8 bytes INTO a length field, that then describe the rest of
> the variable length parameters, but why not do it up front?
>
> If we go with an enum mapping of supported compression formats, then you
> can go into further details on exactly what extra parameters are
> supports for each algorithm; while leaving it as a free-form text string
> makes it harder to interpret what any additional payload will represent.
>

I send a V2 of the series including the update of the spec last week.

Maybe you can have a look if this version is better.


Thanks,

Peter
diff mbox

Patch

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 80cdfd0..f1428e9 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -85,7 +85,11 @@  in the description of a field.
                                  be written to (unless for regaining
                                  consistency).

-                    Bits 2-63:  Reserved (set to 0)
+                    Bit 2:      Compression format bit.  Iff this bit is set then
+                                the compression format extension MUST be present
+                                and MUST be parsed and checked for compatibility.
+
+                    Bits 3-63:  Reserved (set to 0)

           80 -  87:  compatible_features
                      Bitmask of compatible features. An implementation can
@@ -135,6 +139,7 @@  be stored. Each extension has a structure like the following:
                          0xE2792ACA - Backing file format name
                          0x6803f857 - Feature name table
                          0x23852875 - Bitmaps extension
+                        0xC0318300 - Compression format extension
                          other      - Unknown header extension, can be safely
                                       ignored

@@ -208,6 +213,28 @@  The fields of the bitmaps extension are:
                     starts. Must be aligned to a cluster boundary.


+== Compression format extension ==
+
+The compression format extension is an optional header extension. It provides
+the ability to specify the compression algorithm and compression parameters
+that are used for compressed clusters. This new header MUST be present if
+the incompatible-feature bit "compression format bit" is set and MUST be absent
+otherwise.
+
+The fields of the compression format extension are:
+
+    Byte  0 - 15:  compression_format_name (padded with zeros, but not
+                   necessarily null terminated if it has full length)
+
+              16:  compression_level (uint8_t)
+                   0 = default compression level
+                   1 = lowest compression level
+                   x = highest compression level (the highest compression
+                       level may vary for different compression formats)
+
+         17 - 23:  Reserved for future use, must be zero.
+
+
  == Host cluster management ==

  qcow2 manages the allocation of host clusters by maintaining a reference count