[v4,2/5] qcow2: Assign the L2 cache relatively to image size

Message ID 20180808221138.5770-3-lbloch@janustech.com (mailing list archive)
State New, archived
Series: qcow2: Take the image size into account when allocating the L2 cache

Commit Message

Leonid Bloch Aug. 8, 2018, 10:11 p.m. UTC
Sufficient L2 cache can noticeably improve the performance when using
large images with frequent I/O. The memory overhead is not significant
in most cases, as the cache size is only 1 MB for each 8 GB of virtual
image size (with the default cluster size of 64 KB).
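
(Each L2 table entry is 8 bytes and maps one cluster, so every byte of
L2 cache covers cluster_size / 8 bytes of virtual disk; with 64 KB
clusters:

   1 MB of L2 cache * (65536 / 8) = 8 GB of virtual disk covered)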

Previously, the L2 cache was allocated without considering the image
size, and an option existed to manually determine this size. Thus to
achieve full coverage of the image by the L2 cache (i.e. use more than
the default value of MAX(1 MB, 8 clusters)), a user needed to calculate
the required size manually or using a script, and passs this value to
the 'l2-cache-size' option.

Now, the L2 cache is assigned taking the actual image size into account,
and will cover the entire image, unless the size needed for that is
larger than a certain maximum. This maximum is set to 32 MB by default
(enough to cover a 256 GB image using the default cluster size) but can
be increased or decreased using the 'l2-cache-size' option. This option
was previously documented as the *maximum* L2 cache size, and this patch
makes it behave as such, instead of as a constant size. Also, the
existing option 'cache-size' can limit the sum of both L2 and refcount
caches, as previously.
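
As a standalone sketch of the sizing rule this patch implements (the
helper names below are illustrative, not the actual QEMU functions):

    #include <stdint.h>

    #define DEFAULT_L2_CACHE_MAX_SIZE 0x2000000U /* 32 MB */

    /* L2 cache bytes needed to map the whole image:
     * one 8-byte L2 entry per cluster. */
    static uint64_t l2_cache_full_coverage(uint64_t image_size,
                                           uint64_t cluster_size)
    {
        return image_size / (cluster_size / 8);
    }

    /* The default: cover the whole image, capped at the (configurable)
     * maximum, e.g. l2_cache_default(64 GB, 65536,
     * DEFAULT_L2_CACHE_MAX_SIZE) == 8 MB. */
    static uint64_t l2_cache_default(uint64_t image_size,
                                     uint64_t cluster_size,
                                     uint64_t max_setting)
    {
        uint64_t needed = l2_cache_full_coverage(image_size, cluster_size);
        return needed < max_setting ? needed : max_setting;
    }

With the default 64 KB clusters this gives 8 MB for a 64 GB image, while
a 1 TB image, which would need 128 MB for full coverage, is capped at
32 MB.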

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
---
 block/qcow2.c              | 33 +++++++++++++--------------------
 block/qcow2.h              |  4 +---
 docs/qcow2-cache.txt       | 24 ++++++++++++++----------
 qemu-options.hx            |  6 +++---
 tests/qemu-iotests/137     |  1 -
 tests/qemu-iotests/137.out |  1 -
 6 files changed, 31 insertions(+), 38 deletions(-)

Comments

Alberto Garcia Aug. 9, 2018, 12:14 p.m. UTC | #1
On Thu 09 Aug 2018 12:11:35 AM CEST, Leonid Bloch wrote:
> Sufficient L2 cache can noticeably improve the performance when using
> large images with frequent I/O. The memory overhead is not significant
> in most cases, as the cache size is only 1 MB for each 8 GB of virtual
> image size (with the default cluster size of 64 KB).
>
> Previously, the L2 cache was allocated without considering the image
> size, and an option existed to manually determine this size. Thus to
> achieve full coverage of the image by the L2 cache (i.e. use more than
> the default value of MAX(1 MB, 8 clusters)), a user needed to calculate
> the required size manually or using a script, and passs this value to
> the 'l2-cache-size' option.

s/passs/pass/

> diff --git a/block/qcow2.c b/block/qcow2.c
> index ec9e6238a0..98cb96aaca 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -777,29 +777,35 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
>                               uint64_t *refcount_cache_size, Error **errp)
>  {
>      BDRVQcow2State *s = bs->opaque;
> -    uint64_t combined_cache_size;
> +    uint64_t combined_cache_size, l2_cache_max_setting;
>      bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set;
> -    int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
> +    uint64_t min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;

Why do you need to change this data type here? min_refcount_cache is
guaranteed to fit in an int.

> +    uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
> +    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> +    *l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting);
> +

You need to declare virtual_disk_size and max_l2_cache at the beginning.
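
That is, roughly, with the declarations hoisted to the top of the
function, per QEMU's coding style (a sketch, not the final code):

    uint64_t virtual_disk_size, max_l2_cache;
    ...
    /* after the options have been parsed: */
    virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
    max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
    *l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting);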

>              } else {
> -                *refcount_cache_size =
> -                    MIN(combined_cache_size, min_refcount_cache);
> +                *refcount_cache_size = MIN(combined_cache_size,
> +                                           min_refcount_cache);

There are no functional changes, why do you need to change the
indentation here?

> -    } else {
> -        if (!l2_cache_size_set) {
> -            *l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE,
> -                                 (uint64_t)DEFAULT_L2_CACHE_CLUSTERS
> -                                 * s->cluster_size);
> -        }
> -        if (!refcount_cache_size_set) {
> -            *refcount_cache_size = min_refcount_cache;
> -        }
>      }

It's not obvious why you are removing the *refcount_cache_size
assignment here. I see that if you leave this out then the caller
qcow2_update_options_prepare() ensures that refcount_cache_size has the
minimum size, but that's not directly related to the rest of the changes
in this patch.

So either mention it explicitly in the commit message or remove those
lines in a separate patch.

> diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
> index 5bf2a8ad29..c7625cdeb3 100644
> --- a/docs/qcow2-cache.txt
> +++ b/docs/qcow2-cache.txt
> @@ -97,12 +97,14 @@ need:
>     l2_cache_size = disk_size_GB * 131072
>     refcount_cache_size = disk_size_GB * 32768
>  
> -QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount
> -cache of 256KB (262144 bytes), so using the formulas we've just seen
> -we have
> +QEMU will use a default L2 cache sufficient to cover the entire virtual
> +size of an image, which with the default cluster size will result in 1 MB
> +of cache for every 8 GB of virtual image size:

This is not true. QEMU will use a default size of 32MB, which may or may
not cover the entire image.

> -   1048576 / 131072 = 8 GB of virtual disk covered by that cache
> -    262144 /  32768 = 8 GB
> +   65536 / 8 = 8192 = 8 GB / 1 MB

And those formulas still apply, but with the new values.
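
Worked through, assuming a 100 GB image with the default 64 KB clusters:

   l2_cache_size       = 100 * 131072 = 13107200 bytes (12.5 MB)
   refcount_cache_size = 100 * 32768  =  3276800 bytes (~3.1 MB)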

> + - The default L2 cache size will cover the entire virtual size of an
> +   image, but is capped at 32 MB (enough for image sizes of up to 256 GB
> +   with the default cluster size).

Again, this is misleading. A constant in this series is that "The L2
cache now covers the entire image", but that's not the case at all :-)

I would prefer if you would say "we increased the default cache size so
now we cover larger images" instead of "the default cache size will now
cover the entire image", because the latter is not true.

>   - If the L2 cache is big enough to hold all of the image's L2 tables
> -   (as explained in the "Choosing the right cache sizes" section
> -   earlier in this document) then none of this is necessary and you
> -   can omit the "l2-cache-entry-size" parameter altogether.
> +   (the default behavior) then none of this is necessary and you can
> +   omit the "l2-cache-entry-size" parameter altogether.

And once more. In the previous paragraph you say that the default is
enough for images <= 256GB, and in this one you say that it's enough to
hold all L2 tables.

The previous text was accurate, you don't need to change this paragraph.

Berto
Leonid Bloch Aug. 9, 2018, 2:04 p.m. UTC | #2
On 8/9/18 3:14 PM, Alberto Garcia wrote:
> On Thu 09 Aug 2018 12:11:35 AM CEST, Leonid Bloch wrote:
>> Sufficient L2 cache can noticeably improve the performance when using
>> large images with frequent I/O. The memory overhead is not significant
>> in most cases, as the cache size is only 1 MB for each 8 GB of virtual
>> image size (with the default cluster size of 64 KB).
>>
>> Previously, the L2 cache was allocated without considering the image
>> size, and an option existed to manually determine this size. Thus to
>> achieve full coverage of the image by the L2 cache (i.e. use more than
>> the default value of MAX(1 MB, 8 clusters)), a user needed to calculate
>> the required size manually or using a script, and passs this value to
>> the 'l2-cache-size' option.
> 
> s/passs/pass/

Thanks! fixed.

> 
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index ec9e6238a0..98cb96aaca 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -777,29 +777,35 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
>>                                uint64_t *refcount_cache_size, Error **errp)
>>   {
>>       BDRVQcow2State *s = bs->opaque;
>> -    uint64_t combined_cache_size;
>> +    uint64_t combined_cache_size, l2_cache_max_setting;
>>       bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set;
>> -    int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
>> +    uint64_t min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
> 
> Why do you need to change this data type here? min_refcount_cache is
> guaranteed to fit in an int.

No necessity here; it just participates in arithmetic with other
uint64_t's afterwards, so it might as well be uint64_t from the get-go.

> 
>> +    uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
>> +    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
>> +    *l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting);
>> +
> 
> You need to declare virtual_disk_size and max_l2_cache at the beginning.

Sure, done, thanks.

> 
>>               } else {
>> -                *refcount_cache_size =
>> -                    MIN(combined_cache_size, min_refcount_cache);
>> +                *refcount_cache_size = MIN(combined_cache_size,
>> +                                           min_refcount_cache);
> 
> There are no functional changes, why do you need to change the
> indentation here?

It's in the "immediate area (few lines) of the lines [I'm] changing".

> 
>> -    } else {
>> -        if (!l2_cache_size_set) {
>> -            *l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE,
>> -                                 (uint64_t)DEFAULT_L2_CACHE_CLUSTERS
>> -                                 * s->cluster_size);
>> -        }
>> -        if (!refcount_cache_size_set) {
>> -            *refcount_cache_size = min_refcount_cache;
>> -        }
>>       }
> 
> It's not obvious why you are removing the *refcount_cache_size
> assignment here. I see that if you leave this out then the caller
> qcow2_update_options_prepare() ensures that refcount_cache_size has the
> minimum size, but that's not directly related to the rest of the changes
> in this patch.
> 
> So either mention it explicitly in the commit message or remove those
> lines in a separate patch.

OK, I'll mention it.

> 
>> diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
>> index 5bf2a8ad29..c7625cdeb3 100644
>> --- a/docs/qcow2-cache.txt
>> +++ b/docs/qcow2-cache.txt
>> @@ -97,12 +97,14 @@ need:
>>      l2_cache_size = disk_size_GB * 131072
>>      refcount_cache_size = disk_size_GB * 32768
>>   
>> -QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount
>> -cache of 256KB (262144 bytes), so using the formulas we've just seen
>> -we have
>> +QEMU will use a default L2 cache sufficient to cover the entire virtual
>> +size of an image, which with the default cluster size will result in 1 MB
>> +of cache for every 8 GB of virtual image size:
> 
> This is not true. QEMU will use a default size of 32MB, which may or may
> not cover the entire image.

But no, QEMU will not use a default size of 32MB. It will use a default 
size which is just enough to cover the image, unless the needed size is 
larger than 32 MB.
Anyway, this is a section which deals with explanations of the cache 
coverage, and not with the defaults, so I have removed the mention of 
the defaults here, as it is mentioned in the relevant section below.

> 
>> -   1048576 / 131072 = 8 GB of virtual disk covered by that cache
>> -    262144 /  32768 = 8 GB
>> +   65536 / 8 = 8192 = 8 GB / 1 MB
> 
> And those formulas still apply, but with the new values.

They apply to the coverage calculation, yes. I clarified this a bit
in v5.

> 
>> + - The default L2 cache size will cover the entire virtual size of an
>> +   image, but is capped at 32 MB (enough for image sizes of up to 256 GB
>> +   with the default cluster size).
> 
> Again, this is misleading. A constant in this series is that "The L2
> cache now covers the entire image", but that's not the case at all :-)
> 
> I would prefer if you would say "we increased the default cache size so
> now we cover larger images" instead of "the default cache size will now
> cover the entire image", because the latter is not true.

But it's not correct: we did not increase the default size, we made the 
default size fit the image size, and set a maximum. It's not the same, 
do you agree?
In any case, I have reworded this part in v5.

> 
>>    - If the L2 cache is big enough to hold all of the image's L2 tables
>> -   (as explained in the "Choosing the right cache sizes" section
>> -   earlier in this document) then none of this is necessary and you
>> -   can omit the "l2-cache-entry-size" parameter altogether.
>> +   (the default behavior) then none of this is necessary and you can
>> +   omit the "l2-cache-entry-size" parameter altogether.
> 
> And once more. In the previous paragraph you say that the default is
> enough for images <= 256GB, and in this one you say that it's enough to
> hold all L2 tables.

Reworded here as well.

Thanks for your review!
Leonid.

> 
> The previous text was accurate, you don't need to change this paragraph.
> 
> Berto
>
Alberto Garcia Aug. 9, 2018, 3:31 p.m. UTC | #3
On Thu 09 Aug 2018 04:04:23 PM CEST, Leonid Bloch wrote:
>>> -    int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
>>> +    uint64_t min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
>> 
>> Why do you need to change this data type here? min_refcount_cache is
>> guaranteed to fit in an int.
>
> No necessity here; it just participates in arithmetic with other
> uint64_t's afterwards, so it might as well be uint64_t from the
> get-go.

The compiler already does that when needed, so it's not so important
(and it adds noise to the patch).

>>> -                *refcount_cache_size =
>>> -                    MIN(combined_cache_size, min_refcount_cache);
>>> +                *refcount_cache_size = MIN(combined_cache_size,
>>> +                                           min_refcount_cache);
>> 
>> There are no functional changes, why do you need to change the
>> indentation here?
>
> It's in the "immediate area (few lines) of the lines [I'm] changing".

But there's no need to change those lines unless there's an obvious
mistake. In this case it's just a matter of style so they just add noise
to the patch.

>>> +QEMU will use a default L2 cache sufficient to cover the entire virtual
>>> +size of an image, which with the default cluster size will result in 1 MB
>>> +of cache for every 8 GB of virtual image size:
>> 
>> This is not true. QEMU will use a default size of 32MB, which may or
>> may not cover the entire image.
>
> But no, QEMU will not use a default size of 32MB. It will use a
> default size which is just enough to cover the image, unless the
> needed size is larger than 32 MB.

Now: QEMU will use a default L2 cache of up to 32MB, which may or may
not be enough to cover the entire image.

Previously: QEMU will use a default L2 cache of 1MB, which may or may
not be enough to cover the entire image.

>> I would prefer if you would say "we increased the default cache size
>> so now we cover larger images" instead of "the default cache size
>> will now cover the entire image", because the latter is not true.
>
> But it's not correct: we did not increase the default size, we made
> the default size fit the image size, and set a maximum. It's not the
> same, do you agree?

I don't think we made the default size fit the image size, because if
you override the default size QEMU will still adjust it if it's too
large. What we did is guarantee that QEMU will use *up to* l2-cache-size
bytes, regardless of whether l2-cache-size is set by the user or is the
default value. Plus, we increased that default value to 32MB.

From the end user's point of view, who had a VM with images of 8GB,
200GB and 2TB, the most visible result is that the L2 cache is now
larger, enough for the first two images but still not enough for the
third. That's the big change, both in terms of performance and memory
usage, and it's easy to measure.

The other change (the cache size fits the image size) is not immediately
visible, and certainly not with a 32MB cache.

Let's make an experiment:

   - Take QEMU stable or master (without these patches)
   - Create a 16GB qcow2 image and fill it completely with data
   - Run QEMU and attach that image with l2-cache-size=2M (2 MB L2 cache)
   - Read the complete image to make sure that all L2 tables are cached
   - Measure the amount of memory that QEMU is using, e.g. with smem
     (you can do that before and after caching the L2 tables)

Repeat the same experiment with l2-cache-size=2G (2 GB L2 cache). Do you
see any difference?

Berto
Leonid Bloch Aug. 9, 2018, 4:46 p.m. UTC | #4
On 8/9/18 6:31 PM, Alberto Garcia wrote:
> On Thu 09 Aug 2018 04:04:23 PM CEST, Leonid Bloch wrote:
>>>> -    int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
>>>> +    uint64_t min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
>>>
>>> Why do you need to change this data type here? min_refcount_cache is
>>> guaranteed to fit in an int.
>>
>> No necessity here; it just participates in arithmetic with other
>> uint64_t's afterwards, so it might as well be uint64_t from the
>> get-go.
> 
> The compiler already does that when needed, so it's not so important
> (and it adds noise to the patch).

I didn't say it's important or that it increases performance in any way.
It just looks nicer and more logical, and it's too small a change to
require a separate patch, even a trivial one. Since it's right next to
the lines I'm modifying anyway, I made this change because it looks nicer
and more consistent.

> 
>>>> -                *refcount_cache_size =
>>>> -                    MIN(combined_cache_size, min_refcount_cache);
>>>> +                *refcount_cache_size = MIN(combined_cache_size,
>>>> +                                           min_refcount_cache);
>>>
>>> There are no functional changes, why do you need to change the
>>> indentation here?
>>
>> It's in the "immediate area (few lines) of the lines [I'm] changing".
> 
> But there's no need to change those lines unless there's an obvious
> mistake. In this case it's just a matter of style so they just add noise
> to the patch.

Again, it just looks nicer, more readable, compliant with the generally 
accepted style, and right next to the functional changes. It's a style 
improvement which is in the immediate vicinity of the functional 
improvements. I made another one, you must have seen it already, in v5.

Look, it just looks better. It's possible to make another patch for 
these cosmetic changes, but is it worth it when they are right next to 
the functional changes? It's a bit of noise in the patch, versus noise 
in the Git history.

> 
>>>> +QEMU will use a default L2 cache sufficient to cover the entire virtual
>>>> +size of an image, which with the default cluster size will result in 1 MB
>>>> +of cache for every 8 GB of virtual image size:
>>>
>>> This is not true. QEMU will use a default size of 32MB, which may or
>>> may not cover the entire image.
>>
>> But no, QEMU will not use a default size of 32MB. It will use a
>> default size which is just enough to cover the image, unless the
>> needed size is larger than 32 MB.
> 
> Now: QEMU will use a default L2 cache of up to 32MB, which may or may
> not be enough to cover the entire image.
> 
> Previously: QEMU will use a default L2 cache of 1MB, which may or may
> not be enough to cover the entire image.

Now: QEMU will use just enough to fit the image, unless it's more than 
32 MB.

Then: QEMU will use MAX(1 MB, 8 * cluster_size).
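(with the default 64 KB clusters: MAX(1 MB, 8 * 64 KB) = MAX(1 MB, 512 KB) = 1 MB)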

Anyway, this place should not mention the default cache sizes, and I 
have reworded it in v5 (maybe let's fix this in patch 1/5?). This 
section is all about explaining the numbers needed for the cache. The 
defaults are mentioned below.

> 
>>> I would prefer if you would say "we increased the default cache size
>>> so now we cover larger images" instead of "the default cache size
>>> will now cover the entire image", because the latter is not true.
>>
>> But it's not correct: we did not increase the default size, we made
>> the default size fit the image size, and set a maximum. It's not the
>> same, do you agree?
> 
> I don't think we made the default size fit the image size, because if
> you override the default size QEMU will still adjust it if it's too
> large.

Now there is no such thing as the default size; there is the "default 
maximum size". It fits the image by default now, unless it needs to be 
larger than the max.

> What we did is guarantee that QEMU will use *up to* l2-cache-size
> bytes, regardless of whether l2-cache-size is set by the user or is the
> default value. Plus, we increased that default value to 32MB.

Yes! :)
But the meaning of "default" now and before is different. Before it was 
the "default size", and now it is the "default maximum size".

> 
> From the end user's point of view, who had a VM with images of 8GB,
> 200GB and 2TB, the most visible result is that the L2 cache is now
> larger, enough for the first two images but still not enough for the
> third. That's the big change, both in terms of performance and memory
> usage, and it's easy to measure.
> 
> The other change (the cache size fits the image size) is not immediately
> visible, and certainly not with a 32MB cache.

Isn't it the same change? :)

> 
> Let's make an experiment:
> 
>     - Take QEMU stable or master (without these patches)
>     - Create a 16GB qcow2 image and fill it completely with data
>     - Run QEMU and attach that image with l2-cache-size=2M (2 MB L2 cache)
>     - Read the complete image to make sure that all L2 tables are cached
>     - Measure the amount of memory that QEMU is using, e.g. with smem
>       (you can do that before and after caching the L2 tables)
> 
> Repeat the same experiment with l2-cache-size=2G (2 GB L2 cache). Do you
> see any difference?

Actually, I did this experiment before, after Kevin's suggestions. I 
know what you want to say: the memory is not actually used, just 
allocated in virtual memory. But still, with this patch the allocation 
will be just enough.

Look, we agree on the functional changes, right? It's just the cosmetic 
changes and the documentation that remain unsettled. :)

Leonid.

> 
> Berto
>
Eric Blake Aug. 9, 2018, 5:37 p.m. UTC | #5
On 08/09/2018 11:46 AM, Leonid Bloch wrote:
>>>> There are no functional changes, why do you need to change the
>>>> indentation here?
>>>
>>> It's in the "immediate area (few lines) of the lines [I'm] changing".
>>
>> But there's no need to change those lines unless there's an obvious
>> mistake. In this case it's just a matter of style so they just add noise
>> to the patch.
> 
> Again, it just looks nicer, more readable, compliant with the generally 
> accepted style, and right next to the functional changes. It's a style 
> improvement which is in the immediate vicinity of the functional 
> improvements. I made another one, you must have seen it already, in v5.
> 
> Look, it just looks better. It's possible to make another patch for 
> these cosmetic changes, but is it worth it when they are right next to 
> the functional changes? It's a bit of noise in the patch, versus noise 
> in the Git history.

Patch splitting is an art form. But it IS easier to review two patches 
(one that fixes style but has no semantic change, and one that does 
semantic change in as few lines as possible) than to review one (that 
mixes both steps at once).  The more things you do in a single patch, 
the more likely you would have been better off splitting it into 
independent patches.

A longer git history is not a problem. Our bottleneck is reviewer time, 
and everything you can do to make life easier for reviewers is a net win 
in overall time spent on the project. And splitting two distinct changes 
IS worthwhile, especially when a reviewer has requested that split.

Along those lines, I'll also comment that I've seen Berto request that 
you consider splitting even the functional part of this patch into two 
pieces - one raising the default value, and the other fixing things to 
use only what is needed rather than the full specified length when the 
specified/default length is larger than necessary.  It's not a hard 
split, and while you've continued to argue against the split, I tend to 
agree that doing the two parts separately makes the series a bit easier 
to backport to other stable branches (for example, if a distro wants to 
change to yet a different default value, but still use your patch that 
changes to not overallocate when the specified/default is larger than 
needed).
Leonid Bloch Aug. 9, 2018, 9:51 p.m. UTC | #6
On 8/9/18 8:37 PM, Eric Blake wrote:
> On 08/09/2018 11:46 AM, Leonid Bloch wrote:
>>>>> There are no functional changes, why do you need to change the
>>>>> indentation here?
>>>>
>>>> It's in the "immediate area (few lines) of the lines [I'm] changing".
>>>
>>> But there's no need to change those lines unless there's an obvious
>>> mistake. In this case it's just a matter of style so they just add noise
>>> to the patch.
>>
>> Again, it just looks nicer, more readable, compliant with the generally 
>> accepted style, and right next to the functional changes. It's a style 
>> improvement which is in the immediate vicinity of the functional 
>> improvements. I made another one, you must have seen it already, in v5.
>>
>> Look, it just looks better. It's possible to make another patch for 
>> these cosmetic changes, but is it worth it when they are right next to 
>> the functional changes? It's a bit of noise in the patch, versus noise 
>> in the Git history.
> 
> Patch splitting is an art form. But it IS easier to review two patches 
> (one that fixes style but has no semantic change, and one that does 
> semantic change in as few lines as possible) than to review one (that 
> mixes both steps at once).  The more things you do in a single patch, 
> the more likely you would have been better off splitting it into 
> independent patches.
> 
> A longer git history is not a problem. Our bottleneck is reviewer time, 
> and everything you can do to make life easier for reviewers is a net win 
> in overall time spent on the project. And splitting two distinct changes 
> IS worthwhile, especially when a reviewer has requested that split.
> 
> Along those lines, I'll also comment that I've seen Berto request that 
> you consider splitting even the functional part of this patch into two 
> pieces - one raising the default value, and the other fixing things to 
> use only what is needed rather than the full specified length when the 
> specified/default length is larger than necessary.  It's not a hard 
> split, and while you've continued to argue against the split, I tend to 
> agree that doing the two parts separately makes the series a bit easier 
> to backport to other stable branches (for example, if a distro wants to 
> change to yet a different default value, but still use your patch that 
> changes to not overallocate when the specified/default is larger than 
> needed).
> 

Hi Eric,

I agree with your arguments here: splitting out the cosmetic fixes per 
the reviewer's request, and splitting the size setting for the reason 
that you have mentioned above.

Thanks! Sending v6.

Leonid.

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index ec9e6238a0..98cb96aaca 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -777,29 +777,35 @@  static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
                              uint64_t *refcount_cache_size, Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t combined_cache_size;
+    uint64_t combined_cache_size, l2_cache_max_setting;
     bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set;
-    int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
+    uint64_t min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
 
     combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
     l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
     refcount_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_REFCOUNT_CACHE_SIZE);
 
     combined_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_CACHE_SIZE, 0);
-    *l2_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, 0);
+    l2_cache_max_setting = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE,
+                                             DEFAULT_L2_CACHE_MAX_SIZE);
     *refcount_cache_size = qemu_opt_get_size(opts,
                                              QCOW2_OPT_REFCOUNT_CACHE_SIZE, 0);
 
     *l2_cache_entry_size = qemu_opt_get_size(
         opts, QCOW2_OPT_L2_CACHE_ENTRY_SIZE, s->cluster_size);
 
+    uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
+    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
+    *l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting);
+
     if (combined_cache_size_set) {
         if (l2_cache_size_set && refcount_cache_size_set) {
             error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE
                        " and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set "
                        "at the same time");
             return;
-        } else if (*l2_cache_size > combined_cache_size) {
+        } else if (l2_cache_size_set &&
+                   (l2_cache_max_setting > combined_cache_size)) {
             error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
                        QCOW2_OPT_CACHE_SIZE);
             return;
@@ -814,29 +820,16 @@  static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
         } else if (refcount_cache_size_set) {
             *l2_cache_size = combined_cache_size - *refcount_cache_size;
         } else {
-            uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
-            uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
-
             /* Assign as much memory as possible to the L2 cache, and
              * use the remainder for the refcount cache */
-            if (combined_cache_size >= max_l2_cache + min_refcount_cache) {
-                *l2_cache_size = max_l2_cache;
+            if (combined_cache_size >= *l2_cache_size + min_refcount_cache) {
                 *refcount_cache_size = combined_cache_size - *l2_cache_size;
             } else {
-                *refcount_cache_size =
-                    MIN(combined_cache_size, min_refcount_cache);
+                *refcount_cache_size = MIN(combined_cache_size,
+                                           min_refcount_cache);
                 *l2_cache_size = combined_cache_size - *refcount_cache_size;
             }
         }
-    } else {
-        if (!l2_cache_size_set) {
-            *l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE,
-                                 (uint64_t)DEFAULT_L2_CACHE_CLUSTERS
-                                 * s->cluster_size);
-        }
-        if (!refcount_cache_size_set) {
-            *refcount_cache_size = min_refcount_cache;
-        }
     }
 
     if (*l2_cache_entry_size < (1 << MIN_CLUSTER_BITS) ||
diff --git a/block/qcow2.h b/block/qcow2.h
index 81b844e936..d77a31d932 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -73,9 +73,7 @@ 
 /* Must be at least 4 to cover all cases of refcount table growth */
 #define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
 
-/* Whichever is more */
-#define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */
-#define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */
+#define DEFAULT_L2_CACHE_MAX_SIZE 0x2000000U /* bytes */
 
 #define DEFAULT_CLUSTER_SIZE 65536
 
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 5bf2a8ad29..c7625cdeb3 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -97,12 +97,14 @@  need:
    l2_cache_size = disk_size_GB * 131072
    refcount_cache_size = disk_size_GB * 32768
 
-QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount
-cache of 256KB (262144 bytes), so using the formulas we've just seen
-we have
+QEMU will use a default L2 cache sufficient to cover the entire virtual
+size of an image, which with the default cluster size will result in 1 MB
+of cache for every 8 GB of virtual image size:
 
-   1048576 / 131072 = 8 GB of virtual disk covered by that cache
-    262144 /  32768 = 8 GB
+   65536 / 8 = 8192 = 8 GB / 1 MB
+
+A default refcount cache is 4 times the cluster size, which defaults to
+256 KB (262144 bytes).
 
 
 How to configure the cache sizes
@@ -121,8 +123,11 @@  There are a few things that need to be taken into account:
  - Both caches must have a size that is a multiple of the cluster size
    (or the cache entry size: see "Using smaller cache sizes" below).
 
- - The default L2 cache size is 8 clusters or 1MB (whichever is more),
-   and the minimum is 2 clusters (or 2 cache entries, see below).
+ - The default L2 cache size will cover the entire virtual size of an
+   image, but is capped at 32 MB (enough for image sizes of up to 256 GB
+   with the default cluster size). This maximum value can be reduced or
+   enlarged using the "l2-cache-size" option. The minimum is 2 clusters
+   (or 2 cache entries, see below).
 
  - The default (and minimum) refcount cache size is 4 clusters.
 
@@ -180,9 +185,8 @@  Some things to take into account:
    always uses the cluster size as the entry size.
 
  - If the L2 cache is big enough to hold all of the image's L2 tables
-   (as explained in the "Choosing the right cache sizes" section
-   earlier in this document) then none of this is necessary and you
-   can omit the "l2-cache-entry-size" parameter altogether.
+   (the default behavior) then none of this is necessary and you can
+   omit the "l2-cache-entry-size" parameter altogether.
 
 
 Reducing the memory usage
diff --git a/qemu-options.hx b/qemu-options.hx
index f6804758d3..d6e15b2f06 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -756,9 +756,9 @@  The maximum total size of the L2 table and refcount block caches in bytes
 
 @item l2-cache-size
 The maximum size of the L2 table cache in bytes
-(default: if cache-size is not defined - 1048576 bytes or 8 clusters, whichever
-is larger; otherwise, as large as possible or needed within the cache-size,
-while permitting the requested or the minimal refcount cache size)
+(default: if cache-size is not defined - 32M; otherwise, as large as possible
+or needed within the cache-size, while permitting the requested or the minimal
+refcount cache size)
 
 @item refcount-cache-size
 The maximum size of the refcount block cache in bytes
diff --git a/tests/qemu-iotests/137 b/tests/qemu-iotests/137
index 87965625d8..e3fb078588 100755
--- a/tests/qemu-iotests/137
+++ b/tests/qemu-iotests/137
@@ -109,7 +109,6 @@  $QEMU_IO \
     -c "reopen -o cache-size=1M,l2-cache-size=64k,refcount-cache-size=64k" \
     -c "reopen -o cache-size=1M,l2-cache-size=2M" \
     -c "reopen -o cache-size=1M,refcount-cache-size=2M" \
-    -c "reopen -o l2-cache-size=256T" \
     -c "reopen -o l2-cache-entry-size=33k" \
     -c "reopen -o l2-cache-entry-size=128k" \
     -c "reopen -o refcount-cache-size=256T" \
diff --git a/tests/qemu-iotests/137.out b/tests/qemu-iotests/137.out
index 6a2ffc71fd..70f245ae7a 100644
--- a/tests/qemu-iotests/137.out
+++ b/tests/qemu-iotests/137.out
@@ -19,7 +19,6 @@  Parameter 'lazy-refcounts' expects 'on' or 'off'
 cache-size, l2-cache-size and refcount-cache-size may not be set at the same time
 l2-cache-size may not exceed cache-size
 refcount-cache-size may not exceed cache-size
-L2 cache size too big
 L2 cache entry size must be a power of two between 512 and the cluster size (65536)
 L2 cache entry size must be a power of two between 512 and the cluster size (65536)
 Refcount cache size too big