Message ID | 1454394900-3586-1-git-send-email-vsementsov@virtuozzo.com (mailing list archive) |
---|---|
State | New, archived |
On Tue, 02/02 09:35, Vladimir Sementsov-Ogievskiy wrote:
> The new feature for qcow2: storing bitmaps.
>
> This patch adds new header extension to qcow2 - Bitmaps Extension. It
> provides an ability to store virtual disk related bitmaps in a qcow2
> image. For now there is only one type of such bitmaps: Dirty Tracking
> Bitmap, which just tracks virtual disk changes from some moment.
>
> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
> should be stored in this qcow2 file. The size of each bitmap
> (considering its granularity) is equal to virtual disk size.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>
> v9
> - rewordings, thanks to Max
>
> v8
> - rewordings
> - bitmap_directory_size: 4b -> 8b
> - add more descriptive description in == Bitmaps == section
> - add paragraph "Dirty tracking bitmaps"
>
>   Bitmap directory entry:
>   - extra data should not allocate additional clusters
>   - padding must be all-bytes-zero
>   - add extra_data_compatible flag (now behavior in case of unknown
>     extra data is defined by this flag)
>
> v7:
>
> - Rewordings, grammar.
>   Max, Eric, John, thank you very much.
>
> - add last paragraph: remaining bits in bitmap data clusters must be
>   zero.
>
> - s/Bitmap Directory/bitmap directory/ and other names like this at
>   the request of Max.
>
> v6:
>
> - reword bitmap_directory_size description
> - bitmap type: make 0 reserved
> - extra_data_size: resize to 4bytes
>   Also, I've marked this field as "must be zero". We can always change
>   it, if we decide allowing managing app to specify any extra data, by
>   defining some magic value as a top of user extra data.. So, for now
>   non zeor extra_data_size should be considered as an error.
> - swap name and extra_data to give good alignment to extra_data.
>
>
> v5:
>
> - 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of
>   bitmaps.
> - rewordings
> - move upper bounds to "Notes about Qemu limits"
> - s/should/must somewhere. (but not everywhere)
> - move name_size field closer to name itself in bitmap header
> - add extra data area to bitmap header
> - move bitmap data description to separate section
>
>
>  docs/specs/qcow2.txt | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 222 insertions(+), 1 deletion(-)
>
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index f236d8c..db5e666 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -103,7 +103,18 @@ in the description of a field.
>                      write to an image with unknown auto-clear features if it
>                      clears the respective bits from this field first.
>
> -                    Bits 0-63:  Reserved (set to 0)
> +                    Bit 0:      Bitmaps extension bit
> +                                This bit indicates consistency for the bitmaps
> +                                extension data.
> +
> +                                It is an error if this bit is set without the
> +                                bitmaps extension present.
> +
> +                                If the bitmaps extension is present but this
> +                                bit is unset, the bitmaps extension data must be
> +                                considered inconsistent.
> +
> +                    Bits 1-63:  Reserved (set to 0)
>
>           96 -  99:   refcount_order
>                      Describes the width of a reference count block entry (width
> @@ -123,6 +134,7 @@ be stored. Each extension has a structure like the following:
>                          0x00000000 - End of the header extension area
>                          0xE2792ACA - Backing file format name
>                          0x6803f857 - Feature name table
> +                        0x23852875 - Bitmaps extension
>                          other      - Unknown header extension, can be safely
>                                       ignored
>
> @@ -166,6 +178,36 @@ the header extension data. Each entry look like this:
>                      terminated if it has full length)
>
>
> +== Bitmaps extension ==
> +
> +The bitmaps extension is an optional header extension. It provides the ability
> +to store bitmaps related to a virtual disk. For now, there is only one bitmap
> +type: the dirty tracking bitmap, which tracks virtual disk changes from some
> +point in time.
> +
> +The data of the extension should be considered consistent only if the
> +corresponding auto-clear feature bit is set, see autoclear_features above.
> +
> +The fields of the bitmaps extension are:
> +
> +    Byte  0 -  3:  nb_bitmaps
> +                   The number of bitmaps contained in the image. Must be
> +                   greater than or equal to 1.
> +
> +                   Note: Qemu currently only supports up to 65535 bitmaps per
> +                   image.
> +
> +          4 -  7:  Reserved, must be zero.
> +
> +          8 - 15:  bitmap_directory_size
> +                   Size of the bitmap directory in bytes. It is the cumulative
> +                   size of all (nb_bitmaps) bitmap headers.
> +
> +         16 - 23:  bitmap_directory_offset
> +                   Offset into the image file at which the bitmap directory
> +                   starts. Must be aligned to a cluster boundary.
> +
> +
>  == Host cluster management ==
>
>  qcow2 manages the allocation of host clusters by maintaining a reference count
> @@ -360,3 +402,182 @@ Snapshot table entry:
>
>          variable:   Padding to round up the snapshot table entry size to the
>                      next multiple of 8.
> +
> +
> +== Bitmaps ==
> +
> +As mentioned above, the bitmaps extension provides the ability to store bitmaps
> +related to a virtual disk. This section describes how these bitmaps are stored.
> +
> +All stored bitmaps are related to the virtual disk stored in the same image, so
> +each bitmap size is equal to the virtual disk size.
> +
> +Each bit of the bitmap is responsible for strictly defined range of the virtual
> +disk. For bit number bit_nr the corresponding range (in bytes) will be:
> +
> +    [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1]
> +
> +Granularity is a property of the concrete bitmap, see below.
> +
> +
> +=== Bitmap directory ===
> +
> +Each bitmap saved in the image is described in a bitmap directory entry. The
> +bitmap directory is a contiguous area in the image file, whose starting offset
> +and length are given by the header extension fields bitmap_directory_offset and
> +bitmap_directory_size. The entries of the bitmap directory have variable
> +length, depending on the length of the bitmap name and extra data. These

s/length/lengths/ ?

> +entries are also called bitmap headers.
> +
> +Structure of a bitmap directory entry:
> +
> +    Byte 0 -  7:    bitmap_table_offset
> +                    Offset into the image file at which the bitmap table
> +                    (described below) for the bitmap starts. Must be aligned to
> +                    a cluster boundary.
> +
> +         8 - 11:    bitmap_table_size
> +                    Number of entries in the bitmap table of the bitmap.
> +
> +        12 - 15:    flags
> +                    Bit
> +                      0: in_use
> +                         The bitmap was not saved correctly and may be
> +                         inconsistent.
> +
> +                      1: auto
> +                         The bitmap must reflect all changes of the virtual
> +                         disk by any application that would write to this qcow2
> +                         file (including writes, snapshot switching, etc.). The
> +                         type of this bitmap must be 'dirty tracking bitmap'.
> +
> +                      2: extra_data_compatible
> +                         This flags is meaningful when the extra data is
> +                         unknown to the software (currently any extra data is
> +                         unknown to Qemu).
> +                         If it is set, the bitmap may be used as expected, extra
> +                         data must be left as is.
> +                         If it is not set, the bitmap must not be used, but
> +                         both it and its extra data be left as is.
> +
> +                    Bits 3 - 31 are reserved and must be 0.
> +
> +             16:    type
> +                    This field describes the sort of the bitmap.
> +                    Values:
> +                      1: Dirty tracking bitmap
> +
> +                    Values 0, 2 - 255 are reserved.
> +
> +             17:    granularity_bits
> +                    Granularity bits. Valid values: 0 - 63.
> +
> +                    Note: Qemu currently doesn't support granularity_bits
> +                    greater than 31.
> +
> +                    Granularity is calculated as
> +                        granularity = 1 << granularity_bits
> +
> +                    A bitmap's granularity is how many bytes of the image
> +                    accounts for one bit of the bitmap.
> +
> +        18 - 19:    name_size
> +                    Size of the bitmap name. Must be non-zero.
> +
> +                    Note: Qemu currently doesn't support values greater than
> +                    1023.
> +
> +        20 - 23:    extra_data_size
> +                    Size of type-specific extra data.
> +
> +                    For now, as no extra data is defined, extra_data_size is
> +                    reserved and should be zero. If it is non-zero the
> +                    behavior is defined by extra_data_compatible flag.
> +
> +        variable:   extra_data
> +                    Extra data for the bitmap, occupying extra_data_size bytes.
> +                    Extra data must never contain references to clusters or in
> +                    some other way allocate additional clusters.
> +
> +        variable:   name
> +                    The name of the bitmap (not null terminated), occupying
> +                    name_size bytes. Must be unique among all bitmap names
> +                    within the bitmaps extension.
> +
> +        variable:   Padding to round up the bitmap directory entry size to the
> +                    next multiple of 8. All bytes of the padding must be zero.

Isn't it clearer to find the next entry, if you add an "entry_size" in the
beginning, before bitmap_table_offset in each record?

> +
> +
> +=== Bitmap table ===
> +
> +Bitmaps are stored using a one-level structure (as opposed to two-level
> +structure like for refcounts and guest clusters mapping) for the mapping of

s/structure/structures/

> +bitmap data to host clusters. This structure is called the bitmap table.
> +
> +Each bitmap table has a variable size (stored in the bitmap directory entry)
> +and may use multiple clusters, however, it must be contiguous in the image
> +file.
> +
> +Structure of a bitmap table entry:
> +
> +    Bit       0:    Reserved and must be zero if bits 9 - 55 are non-zero.
> +                    If bits 9 - 55 are zero:
> +                      0: Cluster should be read as all zeros.
> +                      1: Cluster should be read as all ones.

Once bits 9 - 55 are non-zero, this bit goes useless? That doesn't make much
sense to me. In which case bit 0 is set but 9-55 are zero?

> +
> +         1 -  8:    Reserved and must be zero.
> +
> +         9 - 55:    Bits 9 - 55 of the host cluster offset. Must be aligned to
> +                    a cluster boundary. If the offset is 0, the cluster is
> +                    unallocated; in that case, bit 0 determines how this
> +                    cluster should be treated during reads.
> +
> +        56 - 63:    Reserved and must be zero.
> +
> +
> +=== Bitmap data ===
> +
> +As noted above, bitmap data is stored in separate clusters, described by the
> +bitmap table. Given an offset (in bytes) into the bitmap data, the offset into
> +the image file can be obtained as follows:
> +
> +    image_offset =
> +        bitmap_table[bitmap_data_offset / cluster_size] +
> +            (bitmap_data_offset % cluster_size)

In this pseudo code, image_offset looks like a variable, but...

> +
> +This offset is not defined if bits 9 - 55 of bitmap table entry are zero (see
> +above).
> +
> +Given an offset byte_nr into the virtual disk and the bitmap's granularity, the
> +bit offset into the bitmap can be calculated like this:
> +
> +    bit_offset =
> +        image_offset(byte_nr / granularity / 8) * 8 +
> +            (byte_nr / granularity) % 8

... here it looks like a function. Could you make it consistent?

> +
> +If the size of the bitmap data is not a multiple of the cluster size then the
> +last cluster of the bitmap data contains some unused tail bits. These bits must
> +be zero.

What defines the size of the bitmap data?

> +
> +
> +=== Dirty tracking bitmaps ===
> +
> +Bitmaps with 'type' field equal to one are dirty tracking bitmaps.
> +
> +When the virtual disk is in use dirty tracking bitmap may be 'enabled' or
> +'disabled'.
> While the bitmap is 'enabled', all writes to the virtual disk
> +should be reflected in the bitmap. A set bit in the bitmap means that the
> +corresponding range of the virtual disk (see above) was written to while the
> +bitmap was 'enabled'. An unset bit means that this range was not written to.
> +
> +The software should not sync the bitmap in the image file with its
> +representation in RAM after each write. Flag 'in_use' should be set while the
> +bitmap is not synced.

I think this is an implementation detail. IMO a software *can* keep the bitmap
synced, "should not" is an obscure and unnecessary constraint.

> +
> +In the image file the 'enabled' state is reflected by the 'auto' flag. If this
> +flag is set, the software must consider the bitmap as 'enabled' and start
> +tracking virtual disk changes to this bitmap from the first write to the
> +virtual disk. If this flag is not set then the bitmap is disabled.
> +
> +To maintain bitmap consistency, the only software which is allowed to change
> +the value of the 'auto' flag is the one which has created the bitmap.

How does one software know if the image is created by it or not?

Fam
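To make the consistency point about image_offset concrete, both lookups from the "Bitmap data" section can be written as functions. The Python below is only a sketch of one possible reading: the function signatures, the explicit cluster_size parameter and the choice to return None for the "not defined" case are assumptions, not something the patch specifies.

```python
# Sketch only: names, signatures and the None convention are assumptions.
OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9 - 55 of a bitmap table entry


def image_offset(bitmap_table, bitmap_data_offset, cluster_size):
    """Map a byte offset in the bitmap data to a byte offset in the image file.

    Returns None when bits 9 - 55 of the table entry are zero, since the
    patch leaves the offset undefined for such (unallocated) clusters.
    """
    entry = bitmap_table[bitmap_data_offset // cluster_size]
    host_cluster = entry & OFFSET_MASK
    if host_cluster == 0:
        return None
    return host_cluster + bitmap_data_offset % cluster_size


def bit_offset(bitmap_table, byte_nr, granularity, cluster_size):
    """Bit offset in the image file of the bit covering disk byte byte_nr."""
    bit_nr = byte_nr // granularity
    byte_in_image = image_offset(bitmap_table, bit_nr // 8, cluster_size)
    if byte_in_image is None:
        return None
    return byte_in_image * 8 + bit_nr % 8
```

Written this way, image_offset is a function in both places, and the undefined case is handled once instead of being left implicit in the second formula.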
On 03.02.2016 11:04, Fam Zheng wrote:
> On Tue, 02/02 09:35, Vladimir Sementsov-Ogievskiy wrote:
>> The new feature for qcow2: storing bitmaps.
>>
>> This patch adds new header extension to qcow2 - Bitmaps Extension. It
>> provides an ability to store virtual disk related bitmaps in a qcow2
>> image. For now there is only one type of such bitmaps: Dirty Tracking
>> Bitmap, which just tracks virtual disk changes from some moment.
>>
>> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
>> should be stored in this qcow2 file. The size of each bitmap
>> (considering its granularity) is equal to virtual disk size.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>
>> v9
>> - rewordings, thanks to Max
>>
>> v8
>> - rewordings
>> - bitmap_directory_size: 4b -> 8b
>> - add more descriptive description in == Bitmaps == section
>> - add paragraph "Dirty tracking bitmaps"
>>
>>   Bitmap directory entry:
>>   - extra data should not allocate additional clusters
>>   - padding must be all-bytes-zero
>>   - add extra_data_compatible flag (now behavior in case of unknown
>>     extra data is defined by this flag)
>>
>> v7:
>>
>> - Rewordings, grammar.
>>   Max, Eric, John, thank you very much.
>>
>> - add last paragraph: remaining bits in bitmap data clusters must be
>>   zero.
>>
>> - s/Bitmap Directory/bitmap directory/ and other names like this at
>>   the request of Max.
>>
>> v6:
>>
>> - reword bitmap_directory_size description
>> - bitmap type: make 0 reserved
>> - extra_data_size: resize to 4bytes
>>   Also, I've marked this field as "must be zero". We can always change
>>   it, if we decide allowing managing app to specify any extra data, by
>>   defining some magic value as a top of user extra data.. So, for now
>>   non zeor extra_data_size should be considered as an error.
>> - swap name and extra_data to give good alignment to extra_data.
>>
>>
>> v5:
>>
>> - 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of
>>   bitmaps.
>> - rewordings
>> - move upper bounds to "Notes about Qemu limits"
>> - s/should/must somewhere. (but not everywhere)
>> - move name_size field closer to name itself in bitmap header
>> - add extra data area to bitmap header
>> - move bitmap data description to separate section
>>
>>
>>  docs/specs/qcow2.txt | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 222 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>> index f236d8c..db5e666 100644
>> --- a/docs/specs/qcow2.txt
>> +++ b/docs/specs/qcow2.txt
>> @@ -103,7 +103,18 @@ in the description of a field.
>>                      write to an image with unknown auto-clear features if it
>>                      clears the respective bits from this field first.
>>
>> -                    Bits 0-63:  Reserved (set to 0)
>> +                    Bit 0:      Bitmaps extension bit
>> +                                This bit indicates consistency for the bitmaps
>> +                                extension data.
>> +
>> +                                It is an error if this bit is set without the
>> +                                bitmaps extension present.
>> +
>> +                                If the bitmaps extension is present but this
>> +                                bit is unset, the bitmaps extension data must be
>> +                                considered inconsistent.
>> +
>> +                    Bits 1-63:  Reserved (set to 0)
>>
>>           96 -  99:   refcount_order
>>                      Describes the width of a reference count block entry (width
>> @@ -123,6 +134,7 @@ be stored. Each extension has a structure like the following:
>>                          0x00000000 - End of the header extension area
>>                          0xE2792ACA - Backing file format name
>>                          0x6803f857 - Feature name table
>> +                        0x23852875 - Bitmaps extension
>>                          other      - Unknown header extension, can be safely
>>                                       ignored
>>
>> @@ -166,6 +178,36 @@ the header extension data. Each entry look like this:
>>                      terminated if it has full length)
>>
>>
>> +== Bitmaps extension ==
>> +
>> +The bitmaps extension is an optional header extension. It provides the ability
>> +to store bitmaps related to a virtual disk. For now, there is only one bitmap
>> +type: the dirty tracking bitmap, which tracks virtual disk changes from some
>> +point in time.
>> +
>> +The data of the extension should be considered consistent only if the
>> +corresponding auto-clear feature bit is set, see autoclear_features above.
>> +
>> +The fields of the bitmaps extension are:
>> +
>> +    Byte  0 -  3:  nb_bitmaps
>> +                   The number of bitmaps contained in the image. Must be
>> +                   greater than or equal to 1.
>> +
>> +                   Note: Qemu currently only supports up to 65535 bitmaps per
>> +                   image.
>> +
>> +          4 -  7:  Reserved, must be zero.
>> +
>> +          8 - 15:  bitmap_directory_size
>> +                   Size of the bitmap directory in bytes. It is the cumulative
>> +                   size of all (nb_bitmaps) bitmap headers.
>> +
>> +         16 - 23:  bitmap_directory_offset
>> +                   Offset into the image file at which the bitmap directory
>> +                   starts. Must be aligned to a cluster boundary.
>> +
>> +
>>  == Host cluster management ==
>>
>>  qcow2 manages the allocation of host clusters by maintaining a reference count
>> @@ -360,3 +402,182 @@ Snapshot table entry:
>>
>>          variable:   Padding to round up the snapshot table entry size to the
>>                      next multiple of 8.
>> +
>> +
>> +== Bitmaps ==
>> +
>> +As mentioned above, the bitmaps extension provides the ability to store bitmaps
>> +related to a virtual disk. This section describes how these bitmaps are stored.
>> +
>> +All stored bitmaps are related to the virtual disk stored in the same image, so
>> +each bitmap size is equal to the virtual disk size.
>> +
>> +Each bit of the bitmap is responsible for strictly defined range of the virtual
>> +disk. For bit number bit_nr the corresponding range (in bytes) will be:
>> +
>> +    [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1]
>> +
>> +Granularity is a property of the concrete bitmap, see below.
>> +
>> +
>> +=== Bitmap directory ===
>> +
>> +Each bitmap saved in the image is described in a bitmap directory entry. The
>> +bitmap directory is a contiguous area in the image file, whose starting offset
>> +and length are given by the header extension fields bitmap_directory_offset and
>> +bitmap_directory_size. The entries of the bitmap directory have variable
>> +length, depending on the length of the bitmap name and extra data. These
> s/length/lengths/ ?

ok

>
>> +entries are also called bitmap headers.
>> +
>> +Structure of a bitmap directory entry:
>> +
>> +    Byte 0 -  7:    bitmap_table_offset
>> +                    Offset into the image file at which the bitmap table
>> +                    (described below) for the bitmap starts. Must be aligned to
>> +                    a cluster boundary.
>> +
>> +         8 - 11:    bitmap_table_size
>> +                    Number of entries in the bitmap table of the bitmap.
>> +
>> +        12 - 15:    flags
>> +                    Bit
>> +                      0: in_use
>> +                         The bitmap was not saved correctly and may be
>> +                         inconsistent.
>> +
>> +                      1: auto
>> +                         The bitmap must reflect all changes of the virtual
>> +                         disk by any application that would write to this qcow2
>> +                         file (including writes, snapshot switching, etc.). The
>> +                         type of this bitmap must be 'dirty tracking bitmap'.
>> +
>> +                      2: extra_data_compatible
>> +                         This flags is meaningful when the extra data is
>> +                         unknown to the software (currently any extra data is
>> +                         unknown to Qemu).
>> +                         If it is set, the bitmap may be used as expected, extra
>> +                         data must be left as is.
>> +                         If it is not set, the bitmap must not be used, but
>> +                         both it and its extra data be left as is.
>> +
>> +                    Bits 3 - 31 are reserved and must be 0.
>> +
>> +             16:    type
>> +                    This field describes the sort of the bitmap.
>> +                    Values:
>> +                      1: Dirty tracking bitmap
>> +
>> +                    Values 0, 2 - 255 are reserved.
>> +
>> +             17:    granularity_bits
>> +                    Granularity bits. Valid values: 0 - 63.
>> +
>> +                    Note: Qemu currently doesn't support granularity_bits
>> +                    greater than 31.
>> +
>> +                    Granularity is calculated as
>> +                        granularity = 1 << granularity_bits
>> +
>> +                    A bitmap's granularity is how many bytes of the image
>> +                    accounts for one bit of the bitmap.
>> +
>> +        18 - 19:    name_size
>> +                    Size of the bitmap name. Must be non-zero.
>> +
>> +                    Note: Qemu currently doesn't support values greater than
>> +                    1023.
>> +
>> +        20 - 23:    extra_data_size
>> +                    Size of type-specific extra data.
>> +
>> +                    For now, as no extra data is defined, extra_data_size is
>> +                    reserved and should be zero. If it is non-zero the
>> +                    behavior is defined by extra_data_compatible flag.
>> +
>> +        variable:   extra_data
>> +                    Extra data for the bitmap, occupying extra_data_size bytes.
>> +                    Extra data must never contain references to clusters or in
>> +                    some other way allocate additional clusters.
>> +
>> +        variable:   name
>> +                    The name of the bitmap (not null terminated), occupying
>> +                    name_size bytes. Must be unique among all bitmap names
>> +                    within the bitmaps extension.
>> +
>> +        variable:   Padding to round up the bitmap directory entry size to the
>> +                    next multiple of 8. All bytes of the padding must be zero.
> Isn't it clearer to find the next entry, if you add an "entry_size" in the
> beginning, before bitmap_table_offset in each record?

Hmm, I'm not sure. It is a bad idea to have both extra_data_size and
entry_size, because that is superfluous. Also, what about padding? If
entry_size includes it (which is expected), then we will not know the exact
size of extra_data. Is that bad? Also, the current scheme is made like the one
for snapshots.

>
>> +
>> +
>> +=== Bitmap table ===
>> +
>> +Bitmaps are stored using a one-level structure (as opposed to two-level
>> +structure like for refcounts and guest clusters mapping) for the mapping of
> s/structure/structures/
>
>> +bitmap data to host clusters. This structure is called the bitmap table.
>> +
>> +Each bitmap table has a variable size (stored in the bitmap directory entry)
>> +and may use multiple clusters, however, it must be contiguous in the image
>> +file.
>> +
>> +Structure of a bitmap table entry:
>> +
>> +    Bit       0:    Reserved and must be zero if bits 9 - 55 are non-zero.
>> +                    If bits 9 - 55 are zero:
>> +                      0: Cluster should be read as all zeros.
>> +                      1: Cluster should be read as all ones.
> Once bits 9 - 55 are non-zero, this bit goes useless? That doesn't make much
> sense to me. In which case bit 0 is set but 9-55 are zero?

In case "1: Cluster should be read as all ones.".

>
>> +
>> +         1 -  8:    Reserved and must be zero.
>> +
>> +         9 - 55:    Bits 9 - 55 of the host cluster offset. Must be aligned to
>> +                    a cluster boundary. If the offset is 0, the cluster is
>> +                    unallocated; in that case, bit 0 determines how this
>> +                    cluster should be treated during reads.
>> +
>> +        56 - 63:    Reserved and must be zero.
>> +
>> +
>> +=== Bitmap data ===
>> +
>> +As noted above, bitmap data is stored in separate clusters, described by the
>> +bitmap table. Given an offset (in bytes) into the bitmap data, the offset into
>> +the image file can be obtained as follows:
>> +
>> +    image_offset =
>> +        bitmap_table[bitmap_data_offset / cluster_size] +
>> +            (bitmap_data_offset % cluster_size)
> In this pseudo code, image_offset looks like an variable, but...
>
>> +
>> +This offset is not defined if bits 9 - 55 of bitmap table entry are zero (see
>> +above).
>> +
>> +Given an offset byte_nr into the virtual disk and the bitmap's granularity, the
>> +bit offset into the bitmap can be calculated like this:
>> +
>> +    bit_offset =
>> +        image_offset(byte_nr / granularity / 8) * 8 +
>> +            (byte_nr / granularity) % 8
> ... here it looks like a function. Could you make it consistent?

ok, will do

>
>> +
>> +If the size of the bitmap data is not a multiple of the cluster size then the
>> +last cluster of the bitmap data contains some unused tail bits. These bits must
>> +be zero.
> What defines the size of the bitmap data?

bitmap size === virtual disk size.

>
>> +
>> +
>> +=== Dirty tracking bitmaps ===
>> +
>> +Bitmaps with 'type' field equal to one are dirty tracking bitmaps.
>> +
>> +When the virtual disk is in use dirty tracking bitmap may be 'enabled' or
>> +'disabled'.
>> While the bitmap is 'enabled', all writes to the virtual disk
>> +should be reflected in the bitmap. A set bit in the bitmap means that the
>> +corresponding range of the virtual disk (see above) was written to while the
>> +bitmap was 'enabled'. An unset bit means that this range was not written to.
>> +
>> +The software should not sync the bitmap in the image file with its
>> +representation in RAM after each write. Flag 'in_use' should be set while the
>> +bitmap is not synced.
> I think this is an implementation detail. IMO a software *can* keep the bitmap
> synced, "should not" is an obsecure and unnecessary constraint.

s/should not/doesn't have to/, ok?

>
>> +
>> +In the image file the 'enabled' state is reflected by the 'auto' flag. If this
>> +flag is set, the software must consider the bitmap as 'enabled' and start
>> +tracking virtual disk changes to this bitmap from the first write to the
>> +virtual disk. If this flag is not set then the bitmap is disabled.
>> +
>> +To maintain bitmap consistency, the only software which is allowed to change
>> +the value of the 'auto' flag is the one which has created the bitmap.
> How does one software know if the image is created by it or not?

I understand that this is not a very good point for a spec. I can drop it. The
idea is that "change this flag, do some writes, change it back" may do great
damage to the backup tool which created that bitmap.

>
> Fam
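On the entry_size question: with the layout in the patch, the start of the next directory entry follows from extra_data_size and name_size alone, and the exact size of extra_data stays known. The Python sketch below illustrates that under stated assumptions: it takes the 24-byte fixed part from the field list above, relies on qcow2 integers being big-endian, and the helpers entry_size, pack_entry and walk_directory are hypothetical names used only for illustration.

```python
import struct

# Fixed part of a bitmap directory entry as listed in the patch:
# bitmap_table_offset (8), bitmap_table_size (4), flags (4), type (1),
# granularity_bits (1), name_size (2), extra_data_size (4) = 24 bytes.
# qcow2 stores integers big-endian, hence ">".
FIXED = struct.Struct(">QIIBBHI")


def entry_size(name_size, extra_data_size):
    """Entry size: fixed part + extra data + name, rounded up to 8."""
    raw = FIXED.size + extra_data_size + name_size
    return (raw + 7) & ~7


def pack_entry(table_offset, table_size, flags, granularity_bits, name,
               extra=b""):
    """Hypothetical helper: serialize one entry (type 1) with zero padding."""
    fixed = FIXED.pack(table_offset, table_size, flags, 1, granularity_bits,
                       len(name), len(extra))
    body = fixed + extra + name  # extra_data comes before the name
    return body.ljust(entry_size(len(name), len(extra)), b"\0")


def walk_directory(directory, nb_bitmaps):
    """Return (list of bitmap names, number of bytes consumed)."""
    pos = 0
    names = []
    for _ in range(nb_bitmaps):
        fields = FIXED.unpack_from(directory, pos)
        name_size, extra_data_size = fields[5], fields[6]
        name_start = pos + FIXED.size + extra_data_size
        names.append(bytes(directory[name_start:name_start + name_size]))
        # No stored entry_size is needed to find the next entry.
        pos += entry_size(name_size, extra_data_size)
    return names, pos
```

After walking nb_bitmaps entries, the number of bytes consumed would be expected to equal bitmap_directory_size, which the patch defines as the cumulative size of all bitmap headers.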
On Wed, 02/03 16:45, Vladimir Sementsov-Ogievskiy wrote:
> Also current scheme is made like one for snapshots.

Okay, then I'll be fine with being consistent.

>
> >
> >>+
> >>+
> >>+=== Bitmap table ===
> >>+
> >>+Bitmaps are stored using a one-level structure (as opposed to two-level
> >>+structure like for refcounts and guest clusters mapping) for the mapping of
> >s/structure/structures/
> >
> >>+bitmap data to host clusters. This structure is called the bitmap table.
> >>+
> >>+Each bitmap table has a variable size (stored in the bitmap directory entry)
> >>+and may use multiple clusters, however, it must be contiguous in the image
> >>+file.
> >>+
> >>+Structure of a bitmap table entry:
> >>+
> >>+    Bit       0:    Reserved and must be zero if bits 9 - 55 are non-zero.
> >>+                    If bits 9 - 55 are zero:
> >>+                      0: Cluster should be read as all zeros.
> >>+                      1: Cluster should be read as all ones.
> >Once bits 9 - 55 are non-zero, this bit goes useless? That doesn't make much
> >sense to me. In which case bit 0 is set but 9-55 are zero?
>
> In case "1: Cluster should be read as all ones.".

I cannot think of a use case leading to this.

> >
> >>+
> >>+If the size of the bitmap data is not a multiple of the cluster size then the
> >>+last cluster of the bitmap data contains some unused tail bits. These bits must
> >>+be zero.
> >What defines the size of the bitmap data?
>
> bitmap size === virtual disk size.

okay.

>
> >
> >>+
> >>+
> >>+=== Dirty tracking bitmaps ===
> >>+
> >>+Bitmaps with 'type' field equal to one are dirty tracking bitmaps.
> >>+
> >>+When the virtual disk is in use dirty tracking bitmap may be 'enabled' or
> >>+'disabled'.
> >>While the bitmap is 'enabled', all writes to the virtual disk
> >>+should be reflected in the bitmap. A set bit in the bitmap means that the
> >>+corresponding range of the virtual disk (see above) was written to while the
> >>+bitmap was 'enabled'. An unset bit means that this range was not written to.
> >>+
> >>+The software should not sync the bitmap in the image file with its
> >>+representation in RAM after each write. Flag 'in_use' should be set while the
> >>+bitmap is not synced.
> >I think this is an implementation detail. IMO a software *can* keep the bitmap
> >synced, "should not" is an obsecure and unnecessary constraint.
>
> s/should not/doesn't have to/, ok?

yes, that's fine.

>
> >
> >>+
> >>+In the image file the 'enabled' state is reflected by the 'auto' flag. If this
> >>+flag is set, the software must consider the bitmap as 'enabled' and start
> >>+tracking virtual disk changes to this bitmap from the first write to the
> >>+virtual disk. If this flag is not set then the bitmap is disabled.
> >>+
> >>+To maintain bitmap consistency, the only software which is allowed to change
> >>+the value of the 'auto' flag is the one which has created the bitmap.
> >How does one software know if the image is created by it or not?
>
> I understand that this is not very good point for spec.. I can drop
> it. The idea is that "change this flag, do some writes, change it
> back" may bring great damage to backup tool, which was created that
> bitmap.

I think the only reason to switch the 'auto' flag is discarding the bitmap
data, no?

Fam
On 03.02.2016 17:41, Fam Zheng wrote:
> On Wed, 02/03 16:45, Vladimir Sementsov-Ogievskiy wrote:
>> Also current scheme is made like one for snapshots.
> Okay, then I'll be fine with being consistent.
>
>
>>>> +
>>>> +
>>>> +=== Bitmap table ===
>>>> +
>>>> +Bitmaps are stored using a one-level structure (as opposed to two-level
>>>> +structure like for refcounts and guest clusters mapping) for the mapping of
>>> s/structure/structures/
>>>
>>>> +bitmap data to host clusters. This structure is called the bitmap table.
>>>> +
>>>> +Each bitmap table has a variable size (stored in the bitmap directory entry)
>>>> +and may use multiple clusters, however, it must be contiguous in the image
>>>> +file.
>>>> +
>>>> +Structure of a bitmap table entry:
>>>> +
>>>> +    Bit       0:    Reserved and must be zero if bits 9 - 55 are non-zero.
>>>> +                    If bits 9 - 55 are zero:
>>>> +                      0: Cluster should be read as all zeros.
>>>> +                      1: Cluster should be read as all ones.
>>> Once bits 9 - 55 are non-zero, this bit goes useless? That doesn't make much
>>> sense to me. In which case bit 0 is set but 9-55 are zero?
>> In case "1: Cluster should be read as all ones.".
> I cannot think of a use case leading to this.

Why not? It is the dirty bitmap. It may be very dirty, it even may be
all-ones.

>
>>>> +
>>>> +If the size of the bitmap data is not a multiple of the cluster size then the
>>>> +last cluster of the bitmap data contains some unused tail bits. These bits must
>>>> +be zero.
>>> What defines the size of the bitmap data?
>> bitmap size === virtual disk size.
> okay.
>
>>>> +
>>>> +
>>>> +=== Dirty tracking bitmaps ===
>>>> +
>>>> +Bitmaps with 'type' field equal to one are dirty tracking bitmaps.
>>>> +
>>>> +When the virtual disk is in use dirty tracking bitmap may be 'enabled' or
>>>> +'disabled'.
>>>> While the bitmap is 'enabled', all writes to the virtual disk
>>>> +should be reflected in the bitmap. A set bit in the bitmap means that the
>>>> +corresponding range of the virtual disk (see above) was written to while the
>>>> +bitmap was 'enabled'. An unset bit means that this range was not written to.
>>>> +
>>>> +The software should not sync the bitmap in the image file with its
>>>> +representation in RAM after each write. Flag 'in_use' should be set while the
>>>> +bitmap is not synced.
>>> I think this is an implementation detail. IMO a software *can* keep the bitmap
>>> synced, "should not" is an obsecure and unnecessary constraint.
>> s/should not/doesn't have to/, ok?
> yes, that's fine.
>
>>>> +
>>>> +In the image file the 'enabled' state is reflected by the 'auto' flag. If this
>>>> +flag is set, the software must consider the bitmap as 'enabled' and start
>>>> +tracking virtual disk changes to this bitmap from the first write to the
>>>> +virtual disk. If this flag is not set then the bitmap is disabled.
>>>> +
>>>> +To maintain bitmap consistency, the only software which is allowed to change
>>>> +the value of the 'auto' flag is the one which has created the bitmap.
>>> How does one software know if the image is created by it or not?
>> I understand that this is not very good point for spec.. I can drop
>> it. The idea is that "change this flag, do some writes, change it
>> back" may bring great damage to backup tool, which was created that
>> bitmap.
> I think the only reason to switch the 'auto' flag is discarding the bitmap
> data, no?

Hmm, maybe. OK, let's drop this paranoid last paragraph. By the same logic I
could add something like "to maintain bitmap consistency, the only software
which is allowed to clear bits in it...".

>
> Fam
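For reference, the bitmap table entry rules as they stand in v9 (the ones being debated here) can be spelled out as a small classifier. This is a Python sketch of one reading of the quoted field list, not code from QEMU; the function name classify_entry, the returned labels and the use of ValueError for rule violations are assumptions made for illustration.

```python
# Sketch of the v9 bitmap table entry rules; names and labels are assumptions.
OFFSET_MASK = 0x00FFFFFFFFFFFE00    # bits 9 - 55: host cluster offset
RESERVED_MASK = 0xFF000000000001FE  # bits 1 - 8 and 56 - 63: must be zero


def classify_entry(entry):
    """Return (action, host_offset) for one 64-bit bitmap table entry."""
    if entry & RESERVED_MASK:
        raise ValueError("reserved bits must be zero")
    offset = entry & OFFSET_MASK
    if offset:
        # With an allocated cluster, bit 0 is reserved and must be zero.
        if entry & 1:
            raise ValueError("bit 0 must be zero when bits 9 - 55 are set")
        return ("read", offset)
    # Unallocated cluster: bit 0 selects the implicit content.
    return ("all-ones", None) if entry & 1 else ("all-zeros", None)
```

The "very dirty" case from the reply above is the entry value 1: no cluster is allocated, and the whole chunk of the bitmap reads as set bits.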
On Wed, 02/03 20:16, Vladimir Sementsov-Ogievskiy wrote:
> On 03.02.2016 17:41, Fam Zheng wrote:
> >On Wed, 02/03 16:45, Vladimir Sementsov-Ogievskiy wrote:
> >>Also current scheme is made like one for snapshots.
> >Okay, then I'll be fine with being consistent.
> >
> >
> >>>>+
> >>>>+
> >>>>+=== Bitmap table ===
> >>>>+
> >>>>+Bitmaps are stored using a one-level structure (as opposed to two-level
> >>>>+structure like for refcounts and guest clusters mapping) for the mapping of
> >>>s/structure/structures/
> >>>
> >>>>+bitmap data to host clusters. This structure is called the bitmap table.
> >>>>+
> >>>>+Each bitmap table has a variable size (stored in the bitmap directory entry)
> >>>>+and may use multiple clusters, however, it must be contiguous in the image
> >>>>+file.
> >>>>+
> >>>>+Structure of a bitmap table entry:
> >>>>+
> >>>>+    Bit       0:    Reserved and must be zero if bits 9 - 55 are non-zero.
> >>>>+                    If bits 9 - 55 are zero:
> >>>>+                      0: Cluster should be read as all zeros.
> >>>>+                      1: Cluster should be read as all ones.
> >>>Once bits 9 - 55 are non-zero, this bit goes useless? That doesn't make much
> >>>sense to me. In which case bit 0 is set but 9-55 are zero?
> >>In case "1: Cluster should be read as all ones.".
> >I cannot think of a use case leading to this.
>
> Why not? It is the dirty bitmap. It may be very dirty, it even may
> be all-ones.

I see what this is about. This assumes the bitmap is only saved when the image
is closed, so that if by that time the whole chunk is all-one, this bit is set
without allocating the cluster.

But again, I don't think that is the only way to save bitmap: an implementation
can save dirty bit much more frequently (to free memory), or even do it
synchronously (to be power failure proof). In these cases, this bit is hard to
use, because it's very unlikely all bits are dirtied between two adjacent
saving points.

Sorry for asking for this so late, what about making bit 0 and the offset
orthogonal?

               Bits[9..55] = 0 | Bits[9..55] != 0
  Bit[0] = 0        zero       |       read
  Bit[0] = 1        one        |       one

Fam
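
The two readings of a bitmap table entry discussed here can be compared side by side. The following is only a sketch: the function names and the returned tuples are made up for illustration, and it assumes that "bit 0" is the least significant bit of the 64-bit entry, as for the other qcow2 table entries.

```python
OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9..55: host cluster offset
FLAG_BIT = 0x1                     # bit 0
RESERVED_MASK = 0xFFFFFFFFFFFFFFFF & ~(OFFSET_MASK | FLAG_BIT)  # bits 1..8, 56..63


def _split(entry):
    """Validate the reserved bits and return (offset, flag)."""
    if not 0 <= entry < 1 << 64:
        raise ValueError("entry is not a 64-bit value")
    if entry & RESERVED_MASK:
        raise ValueError("reserved bits must be zero")
    return entry & OFFSET_MASK, bool(entry & FLAG_BIT)


def decode_v9(entry):
    """Semantics as written in the patch: bit 0 only counts if the offset is 0."""
    offset, flag = _split(entry)
    if offset:
        if flag:
            raise ValueError("bit 0 must be zero when bits 9 - 55 are non-zero")
        return ("read", offset)
    return ("one", None) if flag else ("zero", None)


def decode_orthogonal(entry):
    """Semantics of the table proposed in this mail: bit 0 always wins."""
    offset, flag = _split(entry)
    if flag:
        return ("one", None)
    return ("read", offset) if offset else ("zero", None)
```

The only entries the two decoders treat differently are those with both bit 0 and a non-zero offset: `decode_v9` rejects them, `decode_orthogonal` reads them as all ones.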
On 04.02.2016 05:25, Fam Zheng wrote:
> On Wed, 02/03 20:16, Vladimir Sementsov-Ogievskiy wrote:
>> On 03.02.2016 17:41, Fam Zheng wrote:
>>> On Wed, 02/03 16:45, Vladimir Sementsov-Ogievskiy wrote:
>>>> Also current scheme is made like one for snapshots.
>>> Okay, then I'll be fine with being consistent.
>>>
>>>
>>>>>> +
>>>>>> +
>>>>>> +=== Bitmap table ===
>>>>>> +
>>>>>> +Bitmaps are stored using a one-level structure (as opposed to two-level
>>>>>> +structure like for refcounts and guest clusters mapping) for the mapping of
>>>>> s/structure/structures/
>>>>>
>>>>>> +bitmap data to host clusters. This structure is called the bitmap table.
>>>>>> +
>>>>>> +Each bitmap table has a variable size (stored in the bitmap directory entry)
>>>>>> +and may use multiple clusters, however, it must be contiguous in the image
>>>>>> +file.
>>>>>> +
>>>>>> +Structure of a bitmap table entry:
>>>>>> +
>>>>>> +    Bit       0:    Reserved and must be zero if bits 9 - 55 are non-zero.
>>>>>> +                    If bits 9 - 55 are zero:
>>>>>> +                      0: Cluster should be read as all zeros.
>>>>>> +                      1: Cluster should be read as all ones.
>>>>> Once bits 9 - 55 are non-zero, this bit goes useless? That doesn't make much
>>>>> sense to me. In which case bit 0 is set but 9-55 are zero?
>>>> In case "1: Cluster should be read as all ones.".
>>> I cannot think of a use case leading to this.
>> Why not? It is the dirty bitmap. It may be very dirty, it even may
>> be all-ones.
> I see what this is about. This assumes the bitmap is only saved when the image
> is closed, so that if by that time the whole chunk is all-one, this bit is set
> without allocating the cluster.
>
> But again, I don't think that is the only way to save bitmap: an implementation
> can save dirty bit much more frequently (to free memory), or even do it
> synchronously (to be power failure proof). In these cases, this bit is hard to
> use, because it's very unlikely all bits are dirtied between two adjacent
> saving points.
>
> Sorry for asking for this so late, what about making bit 0 and the offset
> orthogonal?
>
>                Bits[9..55] = 0 | Bits[9..55] != 0
>   Bit[0] = 0        zero       |       read
>   Bit[0] = 1        one        |       one

And what is the meaning of bits[9..55] when bit[0] = 1? "Reserved" is better
here, I think, than "ignored". In other places of this doc we switched from
"ignored" to "reserved" during this discussion.

For example, on a snapshot switch we will (maybe) set all bits in the bitmap.
The frequency of syncing doesn't matter: the bitmap becomes more and more dirty
and is cleared only after the next incremental backup.

"this bit is hard to use" - even if it is hard, you are not forced to use it:
you can allocate a cluster and set all bits in it (or unset them) and leave
bit[0] = zero. It is an additional feature that will save disk space and I/O
in some cases.

>
> Fam
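
The "you are not forced to use it" point can be sketched from the writer's side. This is a hypothetical illustration only: `encode_cluster`, the `allocate` callback and the `use_shortcut` switch are invented names, and bit 0 is assumed to be the least significant bit of the entry.

```python
def encode_cluster(data, allocate, use_shortcut=True):
    """Return a bitmap table entry for one cluster of bitmap data.

    data         -- the bytes of one bitmap data cluster
    allocate     -- hypothetical callback that stores 'data' in the image and
                    returns its cluster-aligned host offset
    use_shortcut -- if False, always allocate a cluster and leave bit 0 clear
    """
    if use_shortcut:
        if not any(data):
            return 0            # unallocated, read as all zeros
        if all(b == 0xFF for b in data):
            return 1            # unallocated, read as all ones
    offset = allocate(data)
    # Only bits 9 - 55 may carry the offset; everything else must stay zero.
    if offset == 0 or offset & 0x1FF or offset >> 56:
        raise ValueError("host offset does not fit bits 9 - 55")
    return offset
```

With `use_shortcut=False` the writer never produces an entry with bit 0 set, which is the "allocate a cluster and set all bits in it" alternative described above; with the default it saves the cluster and the write for fully clean or fully dirty chunks.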
diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index f236d8c..db5e666 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -103,7 +103,18 @@ in the description of a field.
                     write to an image with unknown auto-clear features if it
                     clears the respective bits from this field first.
 
-                    Bits 0-63:  Reserved (set to 0)
+                    Bit 0:      Bitmaps extension bit
+                                This bit indicates consistency for the bitmaps
+                                extension data.
+
+                                It is an error if this bit is set without the
+                                bitmaps extension present.
+
+                                If the bitmaps extension is present but this
+                                bit is unset, the bitmaps extension data must be
+                                considered inconsistent.
+
+                    Bits 1-63:  Reserved (set to 0)
 
          96 -  99:   refcount_order
                      Describes the width of a reference count block entry (width
@@ -123,6 +134,7 @@ be stored. Each extension has a structure like the following:
                         0x00000000 - End of the header extension area
                         0xE2792ACA - Backing file format name
                         0x6803f857 - Feature name table
+                        0x23852875 - Bitmaps extension
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -166,6 +178,36 @@ the header extension data. Each entry look like this:
                     terminated if it has full length)
 
 
+== Bitmaps extension ==
+
+The bitmaps extension is an optional header extension. It provides the ability
+to store bitmaps related to a virtual disk. For now, there is only one bitmap
+type: the dirty tracking bitmap, which tracks virtual disk changes from some
+point in time.
+
+The data of the extension should be considered consistent only if the
+corresponding auto-clear feature bit is set, see autoclear_features above.
+
+The fields of the bitmaps extension are:
+
+          Byte  0 -  3:  nb_bitmaps
+                         The number of bitmaps contained in the image. Must be
+                         greater than or equal to 1.
+
+                         Note: Qemu currently only supports up to 65535 bitmaps per
+                         image.
+
+                4 -  7:  Reserved, must be zero.
+
+                8 - 15:  bitmap_directory_size
+                         Size of the bitmap directory in bytes. It is the cumulative
+                         size of all (nb_bitmaps) bitmap headers.
+
+               16 - 23:  bitmap_directory_offset
+                         Offset into the image file at which the bitmap directory
+                         starts. Must be aligned to a cluster boundary.
+
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
@@ -360,3 +402,182 @@ Snapshot table entry:
 
         variable:   Padding to round up the snapshot table entry size to the
                     next multiple of 8.
+
+
+== Bitmaps ==
+
+As mentioned above, the bitmaps extension provides the ability to store bitmaps
+related to a virtual disk. This section describes how these bitmaps are stored.
+
+All stored bitmaps are related to the virtual disk stored in the same image, so
+each bitmap size is equal to the virtual disk size.
+
+Each bit of the bitmap is responsible for strictly defined range of the virtual
+disk. For bit number bit_nr the corresponding range (in bytes) will be:
+
+    [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1]
+
+Granularity is a property of the concrete bitmap, see below.
+
+
+=== Bitmap directory ===
+
+Each bitmap saved in the image is described in a bitmap directory entry. The
+bitmap directory is a contiguous area in the image file, whose starting offset
+and length are given by the header extension fields bitmap_directory_offset and
+bitmap_directory_size. The entries of the bitmap directory have variable
+length, depending on the length of the bitmap name and extra data. These
+entries are also called bitmap headers.
+
+Structure of a bitmap directory entry:
+
+    Byte 0 -  7:    bitmap_table_offset
+                    Offset into the image file at which the bitmap table
+                    (described below) for the bitmap starts. Must be aligned to
+                    a cluster boundary.
+
+         8 - 11:    bitmap_table_size
+                    Number of entries in the bitmap table of the bitmap.
+
+        12 - 15:    flags
+                    Bit
+                      0: in_use
+                         The bitmap was not saved correctly and may be
+                         inconsistent.
+
+                      1: auto
+                         The bitmap must reflect all changes of the virtual
+                         disk by any application that would write to this qcow2
+                         file (including writes, snapshot switching, etc.). The
+                         type of this bitmap must be 'dirty tracking bitmap'.
+
+                      2: extra_data_compatible
+                         This flags is meaningful when the extra data is
+                         unknown to the software (currently any extra data is
+                         unknown to Qemu).
+                         If it is set, the bitmap may be used as expected, extra
+                         data must be left as is.
+                         If it is not set, the bitmap must not be used, but
+                         both it and its extra data be left as is.
+
+                    Bits 3 - 31 are reserved and must be 0.
+
+             16:    type
+                    This field describes the sort of the bitmap.
+                    Values:
+                      1: Dirty tracking bitmap
+
+                    Values 0, 2 - 255 are reserved.
+
+             17:    granularity_bits
+                    Granularity bits. Valid values: 0 - 63.
+
+                    Note: Qemu currently doesn't support granularity_bits
+                    greater than 31.
+
+                    Granularity is calculated as
+                        granularity = 1 << granularity_bits
+
+                    A bitmap's granularity is how many bytes of the image
+                    accounts for one bit of the bitmap.
+
+        18 - 19:    name_size
+                    Size of the bitmap name. Must be non-zero.
+
+                    Note: Qemu currently doesn't support values greater than
+                    1023.
+
+        20 - 23:    extra_data_size
+                    Size of type-specific extra data.
+
+                    For now, as no extra data is defined, extra_data_size is
+                    reserved and should be zero. If it is non-zero the
+                    behavior is defined by extra_data_compatible flag.
+
+        variable:   extra_data
+                    Extra data for the bitmap, occupying extra_data_size bytes.
+                    Extra data must never contain references to clusters or in
+                    some other way allocate additional clusters.
+
+        variable:   name
+                    The name of the bitmap (not null terminated), occupying
+                    name_size bytes. Must be unique among all bitmap names
+                    within the bitmaps extension.
+
+        variable:   Padding to round up the bitmap directory entry size to the
+                    next multiple of 8. All bytes of the padding must be zero.
+
+
+=== Bitmap table ===
+
+Bitmaps are stored using a one-level structure (as opposed to two-level
+structure like for refcounts and guest clusters mapping) for the mapping of
+bitmap data to host clusters. This structure is called the bitmap table.
+
+Each bitmap table has a variable size (stored in the bitmap directory entry)
+and may use multiple clusters, however, it must be contiguous in the image
+file.
+
+Structure of a bitmap table entry:
+
+    Bit       0:    Reserved and must be zero if bits 9 - 55 are non-zero.
+                    If bits 9 - 55 are zero:
+                      0: Cluster should be read as all zeros.
+                      1: Cluster should be read as all ones.
+
+         1 -  8:    Reserved and must be zero.
+
+         9 - 55:    Bits 9 - 55 of the host cluster offset. Must be aligned to
+                    a cluster boundary. If the offset is 0, the cluster is
+                    unallocated; in that case, bit 0 determines how this
+                    cluster should be treated during reads.
+
+        56 - 63:    Reserved and must be zero.
+
+
+=== Bitmap data ===
+
+As noted above, bitmap data is stored in separate clusters, described by the
+bitmap table. Given an offset (in bytes) into the bitmap data, the offset into
+the image file can be obtained as follows:
+
+    image_offset =
+        bitmap_table[bitmap_data_offset / cluster_size] +
+            (bitmap_data_offset % cluster_size)
+
+This offset is not defined if bits 9 - 55 of bitmap table entry are zero (see
+above).
+
+Given an offset byte_nr into the virtual disk and the bitmap's granularity, the
+bit offset into the bitmap can be calculated like this:
+
+    bit_offset =
+        image_offset(byte_nr / granularity / 8) * 8 +
+            (byte_nr / granularity) % 8
+
+If the size of the bitmap data is not a multiple of the cluster size then the
+last cluster of the bitmap data contains some unused tail bits. These bits must
+be zero.
+
+
+=== Dirty tracking bitmaps ===
+
+Bitmaps with 'type' field equal to one are dirty tracking bitmaps.
+
+When the virtual disk is in use dirty tracking bitmap may be 'enabled' or
+'disabled'. While the bitmap is 'enabled', all writes to the virtual disk
+should be reflected in the bitmap. A set bit in the bitmap means that the
+corresponding range of the virtual disk (see above) was written to while the
+bitmap was 'enabled'. An unset bit means that this range was not written to.
+
+The software should not sync the bitmap in the image file with its
+representation in RAM after each write. Flag 'in_use' should be set while the
+bitmap is not synced.
+
+In the image file the 'enabled' state is reflected by the 'auto' flag. If this
+flag is set, the software must consider the bitmap as 'enabled' and start
+tracking virtual disk changes to this bitmap from the first write to the
+virtual disk. If this flag is not set then the bitmap is disabled.
+
+To maintain bitmap consistency, the only software which is allowed to change
+the value of the 'auto' flag is the one which has created the bitmap.
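
The offset arithmetic in the "Bitmaps" and "Bitmap data" sections of the patch can be worked through in a few lines. This is a rough sketch rather than Qemu code: the function names are invented, the position of a bit inside its byte is left as a plain index because the patch does not spell out the bit order, and the number of table entries is derived here from "bitmap size equals virtual disk size", not quoted from the spec text.

```python
def bit_range(bit_nr, granularity_bits):
    """Byte range of the virtual disk covered by one bitmap bit (inclusive)."""
    granularity = 1 << granularity_bits
    return bit_nr * granularity, (bit_nr + 1) * granularity - 1


def locate_bit(byte_nr, granularity_bits, cluster_size):
    """Map a virtual disk offset to (table index, byte in cluster, bit in byte)."""
    bit_nr = byte_nr >> granularity_bits      # byte_nr / granularity
    data_offset = bit_nr // 8                 # byte offset into the bitmap data
    return (data_offset // cluster_size,      # index into the bitmap table
            data_offset % cluster_size,       # byte within that cluster
            bit_nr % 8)                       # bit index within that byte


def table_entries_needed(disk_size, granularity_bits, cluster_size):
    """Bitmap table entries needed to cover disk_size bytes (derived, see above)."""
    granularity = 1 << granularity_bits
    nbits = -(-disk_size // granularity)      # ceiling division
    nbytes = -(-nbits // 8)
    return -(-nbytes // cluster_size)
```

For a 1 TiB disk with granularity_bits = 16 and 64 KiB clusters this gives 32 table entries, and the last byte of the disk lands in table entry 31, byte 65535, bit 7.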
The new feature for qcow2: storing bitmaps.

This patch adds new header extension to qcow2 - Bitmaps Extension. It
provides an ability to store virtual disk related bitmaps in a qcow2
image. For now there is only one type of such bitmaps: Dirty Tracking
Bitmap, which just tracks virtual disk changes from some moment.

Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
should be stored in this qcow2 file. The size of each bitmap
(considering its granularity) is equal to virtual disk size.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

v9
- rewordings, thanks to Max

v8
- rewordings
- bitmap_directory_size: 4b -> 8b
- add more descriptive description in == Bitmaps == section
- add paragraph "Dirty tracking bitmaps"

Bitmap directory entry:
 - extra data should not allocate additional clusters
 - padding must be all-bytes-zero
 - add extra_data_compatible flag (now behavior in case of unknown
   extra data is defined by this flag)

v7:

- Rewordings, grammar.
  Max, Eric, John, thank you very much.

- add last paragraph: remaining bits in bitmap data clusters must be
  zero.

- s/Bitmap Directory/bitmap directory/ and other names like this at
  the request of Max.

v6:

- reword bitmap_directory_size description
- bitmap type: make 0 reserved
- extra_data_size: resize to 4bytes
  Also, I've marked this field as "must be zero". We can always change
  it, if we decide allowing managing app to specify any extra data, by
  defining some magic value as a top of user extra data.. So, for now
  non-zero extra_data_size should be considered as an error.
- swap name and extra_data to give good alignment to extra_data.


v5:

- 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of
  bitmaps.
- rewordings
- move upper bounds to "Notes about Qemu limits"
- s/should/must somewhere. (but not everywhere)
- move name_size field closer to name itself in bitmap header
- add extra data area to bitmap header
- move bitmap data description to separate section

 docs/specs/qcow2.txt | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 222 insertions(+), 1 deletion(-)