Message ID | 20180810062647.23211-7-lbloch@janustech.com
---|---
State | New, archived
Series | Take the image size into account when allocating the L2 cache
On Fri 10 Aug 2018 08:26:44 AM CEST, Leonid Bloch wrote: > The upper limit on the L2 cache size is increased from 1 MB to 32 MB. > This is done in order to allow default full coverage with the L2 cache > for images of up to 256 GB in size (was 8 GB). Note, that only the > needed amount to cover the full image is allocated. The value which is > changed here is just the upper limit on the L2 cache size, beyond which > it will not grow, even if the size of the image will require it to. > > Signed-off-by: Leonid Bloch <lbloch@janustech.com> Reviewed-by: Alberto Garcia <berto@igalia.com> > -#define DEFAULT_L2_CACHE_MAX_SIZE (1 * MiB) > +#define DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB) The patch looks perfect to me now and I'm fine with this change, but this is quite an increase from the previous default value. If anyone thinks that this is too aggressive (or too little :)) I'm all ears. Berto
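[For readers following the numbers in this thread: with the default 64 KiB cluster size, every 8-byte L2 entry maps one cluster, so covering a whole image needs disk_size / cluster_size * 8 bytes of L2 cache. That is why 1 MiB corresponds to 8 GiB and 32 MiB to 256 GiB. A minimal illustrative sketch of that arithmetic (not QEMU code) follows.]

#include <inttypes.h>
#include <stdio.h>

/* Illustrative only (not QEMU code): the L2 cache size needed to cover a
 * whole image.  Each L2 entry is 8 bytes and maps one cluster, so full
 * coverage needs disk_size / cluster_size * 8 bytes of cache. */
static uint64_t l2_cache_for_full_coverage(uint64_t disk_size,
                                           uint64_t cluster_size)
{
    const uint64_t l2_entry_size = 8;   /* bytes per L2 table entry */
    return disk_size / cluster_size * l2_entry_size;
}

int main(void)
{
    uint64_t cluster = 64 * 1024;       /* default qcow2 cluster size */

    /* Prints 1048576 (1 MiB) and 33554432 (32 MiB): the old and new limits */
    printf("8 GiB image:   %" PRIu64 " bytes of L2 cache\n",
           l2_cache_for_full_coverage(8ULL << 30, cluster));
    printf("256 GiB image: %" PRIu64 " bytes of L2 cache\n",
           l2_cache_for_full_coverage(256ULL << 30, cluster));
    return 0;
}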
On 2018-08-10 14:00, Alberto Garcia wrote: > On Fri 10 Aug 2018 08:26:44 AM CEST, Leonid Bloch wrote: >> The upper limit on the L2 cache size is increased from 1 MB to 32 MB. >> This is done in order to allow default full coverage with the L2 cache >> for images of up to 256 GB in size (was 8 GB). Note, that only the >> needed amount to cover the full image is allocated. The value which is >> changed here is just the upper limit on the L2 cache size, beyond which >> it will not grow, even if the size of the image will require it to. >> >> Signed-off-by: Leonid Bloch <lbloch@janustech.com> > > Reviewed-by: Alberto Garcia <berto@igalia.com> > >> -#define DEFAULT_L2_CACHE_MAX_SIZE (1 * MiB) >> +#define DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB) > > The patch looks perfect to me now and I'm fine with this change, but > this is quite an increase from the previous default value. If anyone > thinks that this is too aggressive (or too little :)) I'm all ears. This is just noise from the sidelines (so nothing too serious), but anyway, I don't like it very much. My first point is that the old limit doesn't mean you can only use 8 GB qcow2 images. You can use more, you just can't access more than 8 GB randomly. I know I'm naive, but I think that the number of use cases where you need random IOPS spread out over more than 8 GB of an image are limited. My second point is that qemu still allocated 128 MB of RAM by default. Using 1/4th of that for every qcow2 image you attach to the VM seems a bit much. Now it gets a bit complicated. This series makes cache-clean-interval default to 10 minutes, so it shouldn't be an issue in practice. But one thing to note is that this is a Linux-specific feature, so on every other system, this really means 32 MB per image. (Also, 10 minutes means that whenever I boot up my VM with a couple of disks with random accesses all over the images during boot, I might end up using 32 MB per image again (for 10 min), even though I don't really need that performance.) Now if we really rely on that cache-clean-interval, why not make it always cover the whole image by default? I don't really see why we should now say "256 GB seems reasonable, and 32 MB doesn't sound like too much, let's go there". (Well, OK, I do see how you end up using 32 MB as basically a safety margin, where you'd say that anything above it is just unreasonable.) Do we update the limit in a couple of years again because people have more RAM and larger disks then? (Maybe we do?) My personal opinion is this: Most users should be fine with 8 GB of randomly accessible image space (this may be wrong). Whenever a user does have an application that uses more than 8 GB, they are probably in an area where they want to do some performance tuning anyway. Requiring them to set l2-cache-full in that case seems reasonable to me. Pushing the default to 256 GB to me looks a bit like just letting them run into the problem later. It doesn't solve the issue that you need to do some performance tuning if you have a bit of a special use case (but maybe I'm wrong and accessing more than 8 GB randomly is what everybody does with their VMs). (Maybe it's even a good thing to limit it to a smaller number so users run into the issue sooner than later...) OTOH, this change means that everyone on a non-Linux system will have to use 32 MB of their RAM per qcow2 image, and everyone on a Linux system will potentially use it e.g. 
during boot when you do access a lot randomly (even though the performance usually is not of utmost importance then (important, but not extremely so)). But then again, this will probably only affect a single disk (the one with the OS on it), so it won't be too bad. So my stance is: (1) Is it really worth pushing the default to 256 GB if you probably have to do a bit of performance tuning anyway when you get past 8 GB random IOPS? I think it's reasonable to ask users to use l2-cache-full or adjust the cache to their needs. (2) For non-Linux systems, this seems to really mean 32 MB of RAM per qcow2 image. That's 1/4th of default VM RAM. Is that worth it? (3) For Linux, I don't like it much either, but that's because I'm stupid. The fact that if you don't need this much random I/O only your boot disk may cause a RAM usage spike, and even then it's going to go down after 10 minutes, is probably enough to justify this change. I suppose my moaning would subside if we only increased the default on systems that actually support cache-clean-interval...? Max PS: I also don't quite like how you got to the default of 10 minutes of the cache-clean-interval. You can't justify using 32 MB as the default cache size by virtue of "We have a cache-clean-interval now", and then justify a CCI of 10 min by "It's just for VMs which sit idle". No. If you rely on CCI to be there to make the cache size reasonable by default for whatever the user is doing with their images, you have to consider that fact when choosing a CCI. Ideally we'd probably want a soft and a hard cache limit, but I don't know... (Like, a soft cache limit of 1 MB with a CCI of 10 min, and a hard cache limit of 32 MB with a CCI of 1 min by default. So whenever your cache uses more than 1 MB of RAM, your CCI is 1 min, and whenever it's below, your CCI is 10 min.)
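[A rough sketch of the soft/hard limit idea from the PS above. This is hypothetical: the macro names, thresholds, and intervals are made up for illustration and are not part of this series. The point is only that the cleaning interval would tighten once the cache grows past the soft limit, while the hard limit still caps how large the cache may ever get.]

#include <stddef.h>

/* Hypothetical sketch of the soft/hard limit idea; not existing QEMU code. */
#define SOFT_CACHE_LIMIT    (1u << 20)      /* 1 MiB: relaxed cleaning below */
#define HARD_CACHE_LIMIT    (32u << 20)     /* 32 MiB: absolute cap on growth */
#define CCI_RELAXED_SEC     600             /* 10 min while below soft limit */
#define CCI_AGGRESSIVE_SEC  60              /* 1 min while above soft limit */

static unsigned next_cache_clean_interval(size_t cache_bytes_in_use)
{
    /* Clean more aggressively once the cache exceeds the soft limit */
    return cache_bytes_in_use > SOFT_CACHE_LIMIT ? CCI_AGGRESSIVE_SEC
                                                 : CCI_RELAXED_SEC;
}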
On August 13, 2018 4:39:35 AM EEST, Max Reitz <mreitz@redhat.com> wrote: >On 2018-08-10 14:00, Alberto Garcia wrote: >> On Fri 10 Aug 2018 08:26:44 AM CEST, Leonid Bloch wrote: >>> The upper limit on the L2 cache size is increased from 1 MB to 32 >MB. >>> This is done in order to allow default full coverage with the L2 >cache >>> for images of up to 256 GB in size (was 8 GB). Note, that only the >>> needed amount to cover the full image is allocated. The value which >is >>> changed here is just the upper limit on the L2 cache size, beyond >which >>> it will not grow, even if the size of the image will require it to. >>> >>> Signed-off-by: Leonid Bloch <lbloch@janustech.com> >> >> Reviewed-by: Alberto Garcia <berto@igalia.com> >> >>> -#define DEFAULT_L2_CACHE_MAX_SIZE (1 * MiB) >>> +#define DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB) >> >> The patch looks perfect to me now and I'm fine with this change, but >> this is quite an increase from the previous default value. If anyone >> thinks that this is too aggressive (or too little :)) I'm all ears. > >This is just noise from the sidelines (so nothing too serious), but >anyway, I don't like it very much. > >My first point is that the old limit doesn't mean you can only use 8 GB >qcow2 images. You can use more, you just can't access more than 8 GB >randomly. I know I'm naive, but I think that the number of use cases >where you need random IOPS spread out over more than 8 GB of an image >are limited. > >My second point is that qemu still allocated 128 MB of RAM by default. >Using 1/4th of that for every qcow2 image you attach to the VM seems a >bit much. > >Now it gets a bit complicated. This series makes cache-clean-interval >default to 10 minutes, so it shouldn't be an issue in practice. But >one >thing to note is that this is a Linux-specific feature, so on every >other system, this really means 32 MB per image. (Also, 10 minutes >means that whenever I boot up my VM with a couple of disks with random >accesses all over the images during boot, I might end up using 32 MB >per >image again (for 10 min), even though I don't really need that >performance.) > >Now if we really rely on that cache-clean-interval, why not make it >always cover the whole image by default? I don't really see why we >should now say "256 GB seems reasonable, and 32 MB doesn't sound like >too much, let's go there". (Well, OK, I do see how you end up using 32 >MB as basically a safety margin, where you'd say that anything above it >is just unreasonable.) > >Do we update the limit in a couple of years again because people have >more RAM and larger disks then? (Maybe we do?) > >My personal opinion is this: Most users should be fine with 8 GB of >randomly accessible image space (this may be wrong). Whenever a user >does have an application that uses more than 8 GB, they are probably in >an area where they want to do some performance tuning anyway. >Requiring >them to set l2-cache-full in that case seems reasonable to me. Pushing >the default to 256 GB to me looks a bit like just letting them run into >the problem later. It doesn't solve the issue that you need to do some >performance tuning if you have a bit of a special use case (but maybe >I'm wrong and accessing more than 8 GB randomly is what everybody does >with their VMs). > >(Maybe it's even a good thing to limit it to a smaller number so users >run into the issue sooner than later...) 
> >OTOH, this change means that everyone on a non-Linux system will have >to >use 32 MB of their RAM per qcow2 image, and everyone on a Linux system >will potentially use it e.g. during boot when you do access a lot >randomly (even though the performance usually is not of utmost >importance then (important, but not extremely so)). But then again, >this will probably only affect a single disk (the one with the OS on >it), so it won't be too bad. > >So my stance is: > >(1) Is it really worth pushing the default to 256 GB if you probably >have to do a bit of performance tuning anyway when you get past 8 GB >random IOPS? I think it's reasonable to ask users to use l2-cache-full >or adjust the cache to their needs. > >(2) For non-Linux systems, this seems to really mean 32 MB of RAM per >qcow2 image. That's 1/4th of default VM RAM. Is that worth it? > >(3) For Linux, I don't like it much either, but that's because I'm >stupid. The fact that if you don't need this much random I/O only your >boot disk may cause a RAM usage spike, and even then it's going to go >down after 10 minutes, is probably enough to justify this change. > > >I suppose my moaning would subside if we only increased the default on >systems that actually support cache-clean-interval...? > >Max > > >PS: I also don't quite like how you got to the default of 10 minutes of >the cache-clean-interval. You can't justify using 32 MB as the default >cache size by virtue of "We have a cache-clean-interval now", and then >justify a CCI of 10 min by "It's just for VMs which sit idle". > >No. If you rely on CCI to be there to make the cache size reasonable >by >default for whatever the user is doing with their images, you have to >consider that fact when choosing a CCI. > >Ideally we'd probably want a soft and a hard cache limit, but I don't >know... > >(Like, a soft cache limit of 1 MB with a CCI of 10 min, and a hard >cache >limit of 32 MB with a CCI of 1 min by default. So whenever your cache >uses more than 1 MB of RAM, your CCI is 1 min, and whenever it's below, >your CCI is 10 min.) Max, thanks for your insight. Indeed some good points. Considering this, I'm thinking to set the limit to 16 MB, and the CCI to 5 min. What do you think? Modern Windows installations should gain performance from being able to random I/O to >8 GB chunks, and data processing tasks where each data set is 8+ GB for sure do (did benchmarks). And the maximum is only ever used if (a) the image is large enough and (b) it is indeed used. While taking 256 GB images as the "limit" can be considered an overshoot, 128 GB is quite reasonable, I think. Your idea with "soft" and "hard" limits is great! I'm tempted to implement this. Say 4 MB with 10 min., and 16 MB with 5 min? Leonid.
Am 13.08.2018 um 03:39 hat Max Reitz geschrieben: > On 2018-08-10 14:00, Alberto Garcia wrote: > > On Fri 10 Aug 2018 08:26:44 AM CEST, Leonid Bloch wrote: > >> The upper limit on the L2 cache size is increased from 1 MB to 32 MB. > >> This is done in order to allow default full coverage with the L2 cache > >> for images of up to 256 GB in size (was 8 GB). Note, that only the > >> needed amount to cover the full image is allocated. The value which is > >> changed here is just the upper limit on the L2 cache size, beyond which > >> it will not grow, even if the size of the image will require it to. > >> > >> Signed-off-by: Leonid Bloch <lbloch@janustech.com> > > > > Reviewed-by: Alberto Garcia <berto@igalia.com> > > > >> -#define DEFAULT_L2_CACHE_MAX_SIZE (1 * MiB) > >> +#define DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB) > > > > The patch looks perfect to me now and I'm fine with this change, but > > this is quite an increase from the previous default value. If anyone > > thinks that this is too aggressive (or too little :)) I'm all ears. > > This is just noise from the sidelines (so nothing too serious), but > anyway, I don't like it very much. > > My first point is that the old limit doesn't mean you can only use 8 GB > qcow2 images. You can use more, you just can't access more than 8 GB > randomly. I know I'm naive, but I think that the number of use cases > where you need random IOPS spread out over more than 8 GB of an image > are limited. I think I can see use cases for databases that are spead across more than 8 GB. But you're right, it's a tradeoff and users can always increase the cache size in theory if they need more performance. But then, they can also decrease the cache size if they need more memory. Let's be honest: While qcow2 does have some room for functional improvements, it mostly has an image problem, which comes from the fact that there are cases where performance drops drastically. Benchmarks are a very important use case and they do random I/O over more than 8 GB. Not properly supporting such cases out-of-the-box is the reason why people are requesting that we add features to raw images even if they require on-disk metadata. If we want to avoid this kind of nonsense, we need to improve the out-of-the-box experience with qcow2. > My second point is that qemu still allocated 128 MB of RAM by default. > Using 1/4th of that for every qcow2 image you attach to the VM seems a > bit much. Well, that's more because 128 MB is ridiculously low today and you won't be able to run any recent guest without overriding the default. > Now it gets a bit complicated. This series makes cache-clean-interval > default to 10 minutes, so it shouldn't be an issue in practice. But one > thing to note is that this is a Linux-specific feature, so on every > other system, this really means 32 MB per image. That's a bit inaccurate in this generality: On non-Linux, it means 32 MB per fully accessed image with a size >= 256 GB. > (Also, 10 minutes means that whenever I boot up my VM with a couple of > disks with random accesses all over the images during boot, I might > end up using 32 MB per image again (for 10 min), even though I don't > really need that performance.) If your system files are fragmented in a way that a boot will access every 512 MB chunk in a 256 GB disk, you should seriously think about fixing that... This is a pathological case that shouldn't define our defaults. Random I/O over 256 GB is really a pathological case, too, but people are likely to actually test it. 
They aren't going to systematically test a horribly fragmented system that wouldn't happen in reality. > Now if we really rely on that cache-clean-interval, why not make it > always cover the whole image by default? I don't really see why we > should now say "256 GB seems reasonable, and 32 MB doesn't sound like > too much, let's go there". (Well, OK, I do see how you end up using 32 > MB as basically a safety margin, where you'd say that anything above it > is just unreasonable.) > > Do we update the limit in a couple of years again because people have > more RAM and larger disks then? (Maybe we do?) Possibly. I see those defaults as values that we can adjust to reality whenever we think the old values don't reflect the important cases well enough any more. > My personal opinion is this: Most users should be fine with 8 GB of > randomly accessible image space (this may be wrong). Whenever a user > does have an application that uses more than 8 GB, they are probably in > an area where they want to do some performance tuning anyway. Requiring > them to set l2-cache-full in that case seems reasonable to me. In principle, I'd agree. I'd even say that management tools should always explicitly set those options instead of relying on our defaults. But management tools have been ignoring these options for a long time and keep doing so. And honestly, if you can't spend a few megabytes for the caches, it's just as reasonable that you should set l2-cache to a lower value. You'll need some more tweaking anyway to reduce the memory footprint. > Pushing the default to 256 GB to me looks a bit like just letting them > run into the problem later. It doesn't solve the issue that you need > to do some performance tuning if you have a bit of a special use case > (but maybe I'm wrong and accessing more than 8 GB randomly is what > everybody does with their VMs). > > (Maybe it's even a good thing to limit it to a smaller number so users > run into the issue sooner than later...) Definitely not when their management tool doesn't give them the option of changing the value. Being slow makes qcow2 look really bad. In contrast, I don't think I've ever heard anyone complain about memory usage of qcow2. Our choice of a default should reflect that, especially considering that we only use the memory on demand. If your image is only 32 GB, you'll never use more than 4 MB of cache. And if your image is huge, but only access part of it, we also won't use the full 32 MB. > OTOH, this change means that everyone on a non-Linux system will have to > use 32 MB of their RAM per qcow2 image, and everyone on a Linux system > will potentially use it e.g. during boot when you do access a lot > randomly (even though the performance usually is not of utmost > importance then (important, but not extremely so)). But then again, > this will probably only affect a single disk (the one with the OS on > it), so it won't be too bad. > > So my stance is: > > (1) Is it really worth pushing the default to 256 GB if you probably > have to do a bit of performance tuning anyway when you get past 8 GB > random IOPS? I think it's reasonable to ask users to use l2-cache-full > or adjust the cache to their needs. > > (2) For non-Linux systems, this seems to really mean 32 MB of RAM per > qcow2 image. That's 1/4th of default VM RAM. Is that worth it? > > (3) For Linux, I don't like it much either, but that's because I'm > stupid. 
The fact that if you don't need this much random I/O only your > boot disk may cause a RAM usage spike, and even then it's going to go > down after 10 minutes, is probably enough to justify this change. > > > I suppose my moaning would subside if we only increased the default on > systems that actually support cache-clean-interval...? > > Max > > > PS: I also don't quite like how you got to the default of 10 minutes of > the cache-clean-interval. You can't justify using 32 MB as the default > cache size by virtue of "We have a cache-clean-interval now", and then > justify a CCI of 10 min by "It's just for VMs which sit idle". > > No. If you rely on CCI to be there to make the cache size reasonable by > default for whatever the user is doing with their images, you have to > consider that fact when choosing a CCI. > > Ideally we'd probably want a soft and a hard cache limit, but I don't > know... > > (Like, a soft cache limit of 1 MB with a CCI of 10 min, and a hard cache > limit of 32 MB with a CCI of 1 min by default. So whenever your cache > uses more than 1 MB of RAM, your CCI is 1 min, and whenever it's below, > your CCI is 10 min.) I've actually thought of something like this before, too. Maybe we should do that. But that can be done on top of this series. Kevin
On 2018-08-13 13:23, Kevin Wolf wrote: > Am 13.08.2018 um 03:39 hat Max Reitz geschrieben: >> On 2018-08-10 14:00, Alberto Garcia wrote: >>> On Fri 10 Aug 2018 08:26:44 AM CEST, Leonid Bloch wrote: >>>> The upper limit on the L2 cache size is increased from 1 MB to 32 MB. >>>> This is done in order to allow default full coverage with the L2 cache >>>> for images of up to 256 GB in size (was 8 GB). Note, that only the >>>> needed amount to cover the full image is allocated. The value which is >>>> changed here is just the upper limit on the L2 cache size, beyond which >>>> it will not grow, even if the size of the image will require it to. >>>> >>>> Signed-off-by: Leonid Bloch <lbloch@janustech.com> >>> >>> Reviewed-by: Alberto Garcia <berto@igalia.com> >>> >>>> -#define DEFAULT_L2_CACHE_MAX_SIZE (1 * MiB) >>>> +#define DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB) >>> >>> The patch looks perfect to me now and I'm fine with this change, but >>> this is quite an increase from the previous default value. If anyone >>> thinks that this is too aggressive (or too little :)) I'm all ears. >> >> This is just noise from the sidelines (so nothing too serious), but >> anyway, I don't like it very much. >> >> My first point is that the old limit doesn't mean you can only use 8 GB >> qcow2 images. You can use more, you just can't access more than 8 GB >> randomly. I know I'm naive, but I think that the number of use cases >> where you need random IOPS spread out over more than 8 GB of an image >> are limited. > > I think I can see use cases for databases that are spead across more > than 8 GB. Sure, there are use cases. But that's not quite the general case, that was my point. > But you're right, it's a tradeoff and users can always > increase the cache size in theory if they need more performance. But > then, they can also decrease the cache size if they need more memory. True. But the issue here is: When your disk performance drops, you are likely to look into what causes your disk to be slow. Maybe you're lazy and switch to raw. Maybe you aren't and discover that the cache may be an issue, so you adjust those options to your needs. When your RAM runs low, at least I would never think of some disk image cache, to be honest. So I probably would either not use qemu or increase my swap size. > Let's be honest: While qcow2 does have some room for functional > improvements, it mostly has an image problem, which comes from the fact > that there are cases where performance drops drastically. Benchmarks are > a very important use case and they do random I/O over more than 8 GB. As long as it's our benchmarks, setting the right options is easy. O:-) > Not properly supporting such cases out-of-the-box is the reason why > people are requesting that we add features to raw images even if they > require on-disk metadata. If we want to avoid this kind of nonsense, we > need to improve the out-of-the-box experience with qcow2. Reasonable indeed. >> My second point is that qemu still allocated 128 MB of RAM by default. >> Using 1/4th of that for every qcow2 image you attach to the VM seems a >> bit much. > > Well, that's more because 128 MB is ridiculously low today and you won't > be able to run any recent guest without overriding the default. I'm running my L4Linux just fine over here! O:-) My point here was -- if the default RAM size is as low as it is (and nobody seems to want to increase it), does it make sense if we try to increase our defaults? 
I suppose you could say that not adjusting the RAM default is a bad decision, but it's not our decision, so there's nothing we can do about that. I suppose you could also say that adjusting the RAM size is easier than adjusting the qcow2 cache size. So, yeah. True. >> Now it gets a bit complicated. This series makes cache-clean-interval >> default to 10 minutes, so it shouldn't be an issue in practice. But one >> thing to note is that this is a Linux-specific feature, so on every >> other system, this really means 32 MB per image. > > That's a bit inaccurate in this generality: On non-Linux, it means 32 MB > per fully accessed image with a size >= 256 GB. > >> (Also, 10 minutes means that whenever I boot up my VM with a couple of >> disks with random accesses all over the images during boot, I might >> end up using 32 MB per image again (for 10 min), even though I don't >> really need that performance.) > > If your system files are fragmented in a way that a boot will access > every 512 MB chunk in a 256 GB disk, you should seriously think about > fixing that... > > This is a pathological case that shouldn't define our defaults. Random > I/O over 256 GB is really a pathological case, too, but people are > likely to actually test it. They aren't going to systematically test a > horribly fragmented system that wouldn't happen in reality. True. >> Now if we really rely on that cache-clean-interval, why not make it >> always cover the whole image by default? I don't really see why we >> should now say "256 GB seems reasonable, and 32 MB doesn't sound like >> too much, let's go there". (Well, OK, I do see how you end up using 32 >> MB as basically a safety margin, where you'd say that anything above it >> is just unreasonable.) >> >> Do we update the limit in a couple of years again because people have >> more RAM and larger disks then? (Maybe we do?) > > Possibly. I see those defaults as values that we can adjust to reality > whenever we think the old values don't reflect the important cases well > enough any more. OK. >> My personal opinion is this: Most users should be fine with 8 GB of >> randomly accessible image space (this may be wrong). Whenever a user >> does have an application that uses more than 8 GB, they are probably in >> an area where they want to do some performance tuning anyway. Requiring >> them to set l2-cache-full in that case seems reasonable to me. > > In principle, I'd agree. I'd even say that management tools should > always explicitly set those options instead of relying on our defaults. > But management tools have been ignoring these options for a long time > and keep doing so. > > And honestly, if you can't spend a few megabytes for the caches, it's > just as reasonable that you should set l2-cache to a lower value. You'll > need some more tweaking anyway to reduce the memory footprint. It isn't, because as I explained above, it is more reasonable to expect people to find out about disk options because their disk performance is abysmal than because their RAM is exhausted. I would like to say "but it is nearly as reasonable", but I really don't think so. >> Pushing the default to 256 GB to me looks a bit like just letting them >> run into the problem later. It doesn't solve the issue that you need >> to do some performance tuning if you have a bit of a special use case >> (but maybe I'm wrong and accessing more than 8 GB randomly is what >> everybody does with their VMs). 
>> >> (Maybe it's even a good thing to limit it to a smaller number so users >> run into the issue sooner than later...) > > Definitely not when their management tool doesn't give them the option > of changing the value. That is true. > Being slow makes qcow2 look really bad. In contrast, I don't think I've > ever heard anyone complain about memory usage of qcow2. Yeah, because it never was an issue. It might (in theory) become now. Also note again that people might just not realize the memory usage is due to qcow2. > Our choice of a > default should reflect that, especially considering that we only use > the memory on demand. If your image is only 32 GB, you'll never use more > than 4 MB of cache. Well, OK, yes. This is an especially important point when it really is about hosts that have limited memory. In those cases, users probably won't run huge images anyway. > And if your image is huge, but only access part of > it, we also won't use the full 32 MB. On Linux. O:-) >> OTOH, this change means that everyone on a non-Linux system will have to >> use 32 MB of their RAM per qcow2 image, and everyone on a Linux system >> will potentially use it e.g. during boot when you do access a lot >> randomly (even though the performance usually is not of utmost >> importance then (important, but not extremely so)). But then again, >> this will probably only affect a single disk (the one with the OS on >> it), so it won't be too bad. >> >> So my stance is: >> >> (1) Is it really worth pushing the default to 256 GB if you probably >> have to do a bit of performance tuning anyway when you get past 8 GB >> random IOPS? I think it's reasonable to ask users to use l2-cache-full >> or adjust the cache to their needs. >> >> (2) For non-Linux systems, this seems to really mean 32 MB of RAM per >> qcow2 image. That's 1/4th of default VM RAM. Is that worth it? >> >> (3) For Linux, I don't like it much either, but that's because I'm >> stupid. The fact that if you don't need this much random I/O only your >> boot disk may cause a RAM usage spike, and even then it's going to go >> down after 10 minutes, is probably enough to justify this change. >> >> >> I suppose my moaning would subside if we only increased the default on >> systems that actually support cache-clean-interval...? So it's good that you have calmed my nerves about how this might be problematic on Linux systems (it isn't in practice, although I disagree that people will find qcow2 to be the fault when their memory runs out), but you haven't said anything about non-Linux systems. I understand that you don't care, but as I said here, this was my only substantial concern anyway. >> Max >> >> >> PS: I also don't quite like how you got to the default of 10 minutes of >> the cache-clean-interval. You can't justify using 32 MB as the default >> cache size by virtue of "We have a cache-clean-interval now", and then >> justify a CCI of 10 min by "It's just for VMs which sit idle". >> >> No. If you rely on CCI to be there to make the cache size reasonable by >> default for whatever the user is doing with their images, you have to >> consider that fact when choosing a CCI. >> >> Ideally we'd probably want a soft and a hard cache limit, but I don't >> know... >> >> (Like, a soft cache limit of 1 MB with a CCI of 10 min, and a hard cache >> limit of 32 MB with a CCI of 1 min by default. So whenever your cache >> uses more than 1 MB of RAM, your CCI is 1 min, and whenever it's below, >> your CCI is 10 min.) 
> > I've actually thought of something like this before, too. Maybe we > should do that. But that can be done on top of this series. Sure. Max
On 2018-08-13 08:09, Leonid Bloch wrote: > On August 13, 2018 4:39:35 AM EEST, Max Reitz <mreitz@redhat.com> wrote: [...] >> Ideally we'd probably want a soft and a hard cache limit, but I don't >> know... >> >> (Like, a soft cache limit of 1 MB with a CCI of 10 min, and a hard >> cache >> limit of 32 MB with a CCI of 1 min by default. So whenever your cache >> uses more than 1 MB of RAM, your CCI is 1 min, and whenever it's below, >> your CCI is 10 min.) > > Max, thanks for your insight. Indeed some good points. > Considering this, I'm thinking to set the limit to 16 MB, and the CCI to 5 min. What do you think? I think it's good for a preliminary solution, and then later increase the limit with the soft and hard limits. OTOH, if we implement the soft/hard limits, it doesn't really matter what default you choose now... > Modern Windows installations should gain performance from being able to random I/O to >8 GB chunks, and data processing tasks where each data set is 8+ GB for sure do (did benchmarks). And the maximum is only ever used if (a) the image is large enough and (b) it is indeed used. > While taking 256 GB images as the "limit" can be considered an overshoot, 128 GB is quite reasonable, I think. > > Your idea with "soft" and "hard" limits is great! I'm tempted to implement this. Say 4 MB with 10 min., and 16 MB with 5 min? 32 MB and 2 or 3 min? :-) If you do that, I'm fine with a plain default of 32 MB for now. Max
Am 13.08.2018 um 17:11 hat Max Reitz geschrieben: > >> My personal opinion is this: Most users should be fine with 8 GB of > >> randomly accessible image space (this may be wrong). Whenever a user > >> does have an application that uses more than 8 GB, they are probably in > >> an area where they want to do some performance tuning anyway. Requiring > >> them to set l2-cache-full in that case seems reasonable to me. > > > > In principle, I'd agree. I'd even say that management tools should > > always explicitly set those options instead of relying on our defaults. > > But management tools have been ignoring these options for a long time > > and keep doing so. > > > > And honestly, if you can't spend a few megabytes for the caches, it's > > just as reasonable that you should set l2-cache to a lower value. You'll > > need some more tweaking anyway to reduce the memory footprint. > > It isn't, because as I explained above, it is more reasonable to expect > people to find out about disk options because their disk performance is > abysmal than because their RAM is exhausted. > > I would like to say "but it is nearly as reasonable", but I really don't > think so. Maybe in a perfect world, finding the option when their disk performance is abysmal is what users would do. In this world, they either just use raw and scream for backing files and dirty bitmaps and whatnot for raw, or they just directly go to some other hypervisor. Realistically, the cache options don't exist. They are hard to discover in the QEMU command line and management tools don't support them. Conclusion: We're doomed to find a one-size-fits-all default that works well in all common use cases, including benchmarks. We can try and make it adapt to the situation, but we can't reasonably expect users to manually override it. > > Our choice of a > > default should reflect that, especially considering that we only use > > the memory on demand. If your image is only 32 GB, you'll never use more > > than 4 MB of cache. > > Well, OK, yes. This is an especially important point when it really is > about hosts that have limited memory. In those cases, users probably > won't run huge images anyway. > > > And if your image is huge, but only access part of > > it, we also won't use the full 32 MB. > > On Linux. O:-) No, on any system where qemu_try_blockalign() results in a COW zero page. The Linux-only addition is returning memory even after an access. > So it's good that you have calmed my nerves about how this might be > problematic on Linux systems (it isn't in practice, although I disagree > that people will find qcow2 to be the fault when their memory runs out), > but you haven't said anything about non-Linux systems. I understand > that you don't care, but as I said here, this was my only substantial > concern anyway. I don't actually think it's so bad to keep the cache permanently allocated, but I wouldn't object to a lower default for non-Linux hosts either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB) might be more adequate. My typical desktop VMs are larger than 8 GB, but smaller than 32 GB. Kevin
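[For context on the allocation behaviour Kevin describes, here is a simplified sketch, not the actual qcow2-cache.c code, and using a plain anonymous mmap in place of qemu_try_blockalign: an anonymous mapping starts out as copy-on-write zero pages, so the cache only consumes physical memory for the tables that are actually written, and on Linux the pages of unused tables can later be handed back with madvise(MADV_DONTNEED), which is what cache-clean-interval relies on.]

#include <sys/mman.h>
#include <string.h>

/* Simplified illustration, not the qcow2 cache code: a 32 MiB anonymous
 * mapping costs almost no physical memory until entries are written, and on
 * Linux madvise(MADV_DONTNEED) returns the pages of unused tables. */
int main(void)
{
    size_t cache_size = 32u << 20;              /* 32 MiB upper limit */
    unsigned char *cache = mmap(NULL, cache_size, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (cache == MAP_FAILED) {
        return 1;
    }

    memset(cache, 0xaa, 64 * 1024);             /* touch one 64 KiB table:
                                                   only that much becomes
                                                   resident */
#ifdef MADV_DONTNEED
    madvise(cache, 64 * 1024, MADV_DONTNEED);   /* Linux: hand pages back */
#endif
    munmap(cache, cache_size);
    return 0;
}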
Am 13.08.2018 um 17:16 hat Max Reitz geschrieben: > On 2018-08-13 08:09, Leonid Bloch wrote: > > On August 13, 2018 4:39:35 AM EEST, Max Reitz <mreitz@redhat.com> wrote: > > [...] > > >> Ideally we'd probably want a soft and a hard cache limit, but I don't > >> know... > >> > >> (Like, a soft cache limit of 1 MB with a CCI of 10 min, and a hard > >> cache > >> limit of 32 MB with a CCI of 1 min by default. So whenever your cache > >> uses more than 1 MB of RAM, your CCI is 1 min, and whenever it's below, > >> your CCI is 10 min.) > > > > Max, thanks for your insight. Indeed some good points. > > Considering this, I'm thinking to set the limit to 16 MB, and the CCI to 5 min. What do you think? > > I think it's good for a preliminary solution, and then later increase > the limit with the soft and hard limits. > > OTOH, if we implement the soft/hard limits, it doesn't really matter > what default you choose now... > > > Modern Windows installations should gain performance from being able to random I/O to >8 GB chunks, and data processing tasks where each data set is 8+ GB for sure do (did benchmarks). And the maximum is only ever used if (a) the image is large enough and (b) it is indeed used. > > While taking 256 GB images as the "limit" can be considered an overshoot, 128 GB is quite reasonable, I think. > > > > Your idea with "soft" and "hard" limits is great! I'm tempted to implement this. Say 4 MB with 10 min., and 16 MB with 5 min? > > 32 MB and 2 or 3 min? :-) > > If you do that, I'm fine with a plain default of 32 MB for now. I would be happy with that. Kevin
On 2018-08-13 17:58, Kevin Wolf wrote: > Am 13.08.2018 um 17:11 hat Max Reitz geschrieben: >>>> My personal opinion is this: Most users should be fine with 8 GB of >>>> randomly accessible image space (this may be wrong). Whenever a user >>>> does have an application that uses more than 8 GB, they are probably in >>>> an area where they want to do some performance tuning anyway. Requiring >>>> them to set l2-cache-full in that case seems reasonable to me. >>> >>> In principle, I'd agree. I'd even say that management tools should >>> always explicitly set those options instead of relying on our defaults. >>> But management tools have been ignoring these options for a long time >>> and keep doing so. >>> >>> And honestly, if you can't spend a few megabytes for the caches, it's >>> just as reasonable that you should set l2-cache to a lower value. You'll >>> need some more tweaking anyway to reduce the memory footprint. >> >> It isn't, because as I explained above, it is more reasonable to expect >> people to find out about disk options because their disk performance is >> abysmal than because their RAM is exhausted. >> >> I would like to say "but it is nearly as reasonable", but I really don't >> think so. > > Maybe in a perfect world, finding the option when their disk performance > is abysmal is what users would do. In this world, they either just use > raw and scream for backing files and dirty bitmaps and whatnot for raw, > or they just directly go to some other hypervisor. > > Realistically, the cache options don't exist. They are hard to discover > in the QEMU command line and management tools don't support them. > > Conclusion: We're doomed to find a one-size-fits-all default that works > well in all common use cases, including benchmarks. We can try and make > it adapt to the situation, but we can't reasonably expect users to > manually override it. OK, saying both is unreasonable is something I can get behind. >>> Our choice of a >>> default should reflect that, especially considering that we only use >>> the memory on demand. If your image is only 32 GB, you'll never use more >>> than 4 MB of cache. >> >> Well, OK, yes. This is an especially important point when it really is >> about hosts that have limited memory. In those cases, users probably >> won't run huge images anyway. >> >>> And if your image is huge, but only access part of >>> it, we also won't use the full 32 MB. >> >> On Linux. O:-) > > No, on any system where qemu_try_blockalign() results in a COW zero > page. OK, yes, but why would you only ever access part of it? Then you might just as well have created a smaller disk from the beginning. > The Linux-only addition is returning memory even after an access. > >> So it's good that you have calmed my nerves about how this might be >> problematic on Linux systems (it isn't in practice, although I disagree >> that people will find qcow2 to be the fault when their memory runs out), >> but you haven't said anything about non-Linux systems. I understand >> that you don't care, but as I said here, this was my only substantial >> concern anyway. > > I don't actually think it's so bad to keep the cache permanently > allocated, but I wouldn't object to a lower default for non-Linux hosts > either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB) > might be more adequate. My typical desktop VMs are larger than 8 GB, but > smaller than 32 GB. Will your typical desktop VMs gain anything from the cache covering more than 8 GB? 
Anyway, I certainly won't complain about 4 MB. (My point here is that on non-Linux systems, qemu probably does not have users who have use cases where they need to access 256 GB of disk simultaneously. Probably not even more than 8 GB. If you want to increase the cache size there to 4 MB, fine, I think that won't hurt. But 32 MB might hurt, and I don't think on non-Linux systems there are users who would benefit from it -- specifically because your "typical desktop VM" wouldn't benefit from it.) Max
Am 13.08.2018 um 18:08 hat Max Reitz geschrieben: > >>> default should reflect that, especially considering that we only use > >>> the memory on demand. If your image is only 32 GB, you'll never use more > >>> than 4 MB of cache. > >> > >> Well, OK, yes. This is an especially important point when it really is > >> about hosts that have limited memory. In those cases, users probably > >> won't run huge images anyway. > >> > >>> And if your image is huge, but only access part of > >>> it, we also won't use the full 32 MB. > >> > >> On Linux. O:-) > > > > No, on any system where qemu_try_blockalign() results in a COW zero > > page. > > OK, yes, but why would you only ever access part of it? Then you might > just as well have created a smaller disk from the beginning. I always create my qcow2 images larger than I actually need them. It costs basically nothing and avoids the need to resize my partitions inside the guest later. And anyway, a disk with 100% usage is not the common case, but the point where the user will either delete stuff or resize the image. For long-running VMs, deleting stuff doesn't get rid of the large cache on non-Linux, but I think we agree that long-running guests aren't what we expect on those hosts? > > The Linux-only addition is returning memory even after an access. > > > >> So it's good that you have calmed my nerves about how this might be > >> problematic on Linux systems (it isn't in practice, although I disagree > >> that people will find qcow2 to be the fault when their memory runs out), > >> but you haven't said anything about non-Linux systems. I understand > >> that you don't care, but as I said here, this was my only substantial > >> concern anyway. > > > > I don't actually think it's so bad to keep the cache permanently > > allocated, but I wouldn't object to a lower default for non-Linux hosts > > either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB) > > might be more adequate. My typical desktop VMs are larger than 8 GB, but > > smaller than 32 GB. > > Will your typical desktop VMs gain anything from the cache covering > more than 8 GB? Good point. Probably not. > Anyway, I certainly won't complain about 4 MB. > > (My point here is that on non-Linux systems, qemu probably does not have > users who have use cases where they need to access 256 GB of disk > simultaneously. Probably not even more than 8 GB. If you want to > increase the cache size there to 4 MB, fine, I think that won't hurt. > But 32 MB might hurt, and I don't think on non-Linux systems there are > users who would benefit from it -- specifically because your "typical > desktop VM" wouldn't benefit from it.) Maybe 1 MB is fine for them, after all. Kevin
> I don't actually think it's so bad to keep the cache permanently > allocated, but I wouldn't object to a lower default for non-Linux hosts > either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB) > might be more adequate. My typical desktop VMs are larger than 8 GB, but > smaller than 32 GB. > > Kevin > And for a Windows VM just the OS installation takes above 40 GB. While we probably are not running Windows VMs for our own needs, it is very common that a customer of, for example, some cloud service uses QEMU (unknowingly) for a full-blown Windows. So 100 GB+ images which are quite heavily used is not a rare scenario. 256 GB - yeah, that would be on the higher end. So 16 MB would indeed be a reasonable default for the *max.* L2 cache now, although below that would be too little, I think. 32 MB - if we want some future-proofing. Leonid.
Am 13.08.2018 um 18:42 hat Leonid Bloch geschrieben: > > I don't actually think it's so bad to keep the cache permanently > > allocated, but I wouldn't object to a lower default for non-Linux hosts > > either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB) > > might be more adequate. My typical desktop VMs are larger than 8 GB, but > > smaller than 32 GB. > > And for a Windows VM just the OS installation takes above 40 GB. While we > probably are not running Windows VMs for our own needs, it is very common > that a customer of, for example, some cloud service uses QEMU (unknowingly) > for a full-blown Windows. So 100 GB+ images which are quite heavily used is > not a rare scenario. 256 GB - yeah, that would be on the higher end. The OS installation is mostly sequential access, though. You only need that much cache when you have completely random I/O across the whole image. Otherwise the LRU based approach of the cache is good enough to keep those tables cached that are actually in use. The maximum cache size is maybe for huge databases or indeed random I/O benchmarks, both of which are important to support (on Linux at least), but probably not the most common use case. > So 16 MB would indeed be a reasonable default for the *max.* L2 cache now, > although below that would be too little, I think. 32 MB - if we want some > future-proofing. I think we all agree that 32 MB + cache-clean-interval is okay. It's just that for non-Linux guests, cache-clean-interval doesn't work. However, we probably care less about those large random I/O cases there, so a smaller cache size like 1 MB can do on non-Linux. Kevin
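[If the split default discussed here were implemented, it could look roughly like the sketch below. This is purely illustrative: the values, and the use of QEMU's CONFIG_LINUX build flag and MiB unit macro in this way, are assumptions about one possible shape, not something this series does.]

/* Illustrative only: a platform-dependent default as discussed above.
 * CONFIG_LINUX is QEMU's build-time flag for Linux hosts; MiB comes from
 * "qemu/units.h".  The values are the ones floated in this thread. */
#ifdef CONFIG_LINUX
#define DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB)   /* cache-clean-interval works */
#define DEFAULT_CACHE_CLEAN_INTERVAL 600       /* seconds */
#else
#define DEFAULT_L2_CACHE_MAX_SIZE (1 * MiB)    /* memory is never given back */
#define DEFAULT_CACHE_CLEAN_INTERVAL 0         /* disabled */
#endif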
On 8/14/18 11:18 AM, Kevin Wolf wrote: > Am 13.08.2018 um 18:42 hat Leonid Bloch geschrieben: >>> I don't actually think it's so bad to keep the cache permanently >>> allocated, but I wouldn't object to a lower default for non-Linux hosts >>> either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB) >>> might be more adequate. My typical desktop VMs are larger than 8 GB, but >>> smaller than 32 GB. >> >> And for a Windows VM just the OS installation takes above 40 GB. While we >> probably are not running Windows VMs for our own needs, it is very common >> that a customer of, for example, some cloud service uses QEMU (unknowingly) >> for a full-blown Windows. So 100 GB+ images which are quite heavily used is >> not a rare scenario. 256 GB - yeah, that would be on the higher end. > > The OS installation is mostly sequential access, though. You only need > that much cache when you have completely random I/O across the whole > image. Otherwise the LRU based approach of the cache is good enough to > keep those tables cached that are actually in use. Sorry, by "OS installation" I meant the installed size of the OS, which should be available for fast and frequent access, not the installation process itself. Obviously for one-time tasks like the installation process it's not worth it, unless one installs all the time, instead of using ready images, for some reason. :) > > The maximum cache size is maybe for huge databases or indeed random I/O > benchmarks, both of which are important to support (on Linux at least), > but probably not the most common use case. > >> So 16 MB would indeed be a reasonable default for the *max.* L2 cache now, >> although below that would be too little, I think. 32 MB - if we want some >> future-proofing. > > I think we all agree that 32 MB + cache-clean-interval is okay. > > It's just that for non-Linux guests, cache-clean-interval doesn't work. > However, we probably care less about those large random I/O cases there, > so a smaller cache size like 1 MB can do on non-Linux. > > Kevin >
Am 14.08.2018 um 13:34 hat Leonid Bloch geschrieben: > On 8/14/18 11:18 AM, Kevin Wolf wrote: > > Am 13.08.2018 um 18:42 hat Leonid Bloch geschrieben: > > > > I don't actually think it's so bad to keep the cache permanently > > > > allocated, but I wouldn't object to a lower default for non-Linux hosts > > > > either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB) > > > > might be more adequate. My typical desktop VMs are larger than 8 GB, but > > > > smaller than 32 GB. > > > > > > And for a Windows VM just the OS installation takes above 40 GB. While we > > > probably are not running Windows VMs for our own needs, it is very common > > > that a customer of, for example, some cloud service uses QEMU (unknowingly) > > > for a full-blown Windows. So 100 GB+ images which are quite heavily used is > > > not a rare scenario. 256 GB - yeah, that would be on the higher end. > > > > The OS installation is mostly sequential access, though. You only need > > that much cache when you have completely random I/O across the whole > > image. Otherwise the LRU based approach of the cache is good enough to > > keep those tables cached that are actually in use. > > Sorry, by "OS installation" I meant the installed size of the OS, which > should be available for fast and frequent access, not the installation > process itself. Obviously for one-time tasks like the installation process > it's not worth it, unless one installs all the time, instead of using ready > images, for some reason. :) But you never use everything that is present in an OS installation of 40 GB (is it really _that_ huge these days?), and you don't read OS files non-stop. The most frequently used parts of the OS are actually in the guest RAM. I don't think you'll really notice the difference in qcow2 unless you have a really I/O intensive workload - and that is not usually for OS files, but for user data. For only occasional accesses, the additional 64k for the metadata table wouldn't play a big role. Kevin
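[To put numbers on Kevin's point about granularity (illustrative arithmetic, not QEMU code): the cache fills one L2 table at a time, each table is a single cluster of 8-byte entries, and with 64 KiB clusters one cached 64 KiB table already maps 512 MiB of guest data, so occasional accesses pin only a handful of tables rather than the full 32 MiB.]

/* Illustrative only: how much guest data one cached L2 table maps.
 * A table is one cluster of 8-byte entries, each entry mapping one cluster:
 *   coverage = (cluster_size / 8) * cluster_size
 * For 64 KiB clusters: 8192 entries * 64 KiB = 512 MiB per 64 KiB table. */
static unsigned long long l2_table_coverage(unsigned long long cluster_size)
{
    return (cluster_size / 8) * cluster_size;
}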
On 8/14/18 2:44 PM, Kevin Wolf wrote: > Am 14.08.2018 um 13:34 hat Leonid Bloch geschrieben: >> On 8/14/18 11:18 AM, Kevin Wolf wrote: >>> Am 13.08.2018 um 18:42 hat Leonid Bloch geschrieben: >>>>> I don't actually think it's so bad to keep the cache permanently >>>>> allocated, but I wouldn't object to a lower default for non-Linux hosts >>>>> either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB) >>>>> might be more adequate. My typical desktop VMs are larger than 8 GB, but >>>>> smaller than 32 GB. >>>> >>>> And for a Windows VM just the OS installation takes above 40 GB. While we >>>> probably are not running Windows VMs for our own needs, it is very common >>>> that a customer of, for example, some cloud service uses QEMU (unknowingly) >>>> for a full-blown Windows. So 100 GB+ images which are quite heavily used is >>>> not a rare scenario. 256 GB - yeah, that would be on the higher end. >>> >>> The OS installation is mostly sequential access, though. You only need >>> that much cache when you have completely random I/O across the whole >>> image. Otherwise the LRU based approach of the cache is good enough to >>> keep those tables cached that are actually in use. >> >> Sorry, by "OS installation" I meant the installed size of the OS, which >> should be available for fast and frequent access, not the installation >> process itself. Obviously for one-time tasks like the installation process >> it's not worth it, unless one installs all the time, instead of using ready >> images, for some reason. :) > > But you never use everything that is present in an OS installation of > 40 GB (is it really _that_ huge these days?), and you don't read OS > files non-stop. The most frequently used parts of the OS are actually in > the guest RAM. Yes, Windows 8.1, with all the desktop bloat - just above 40 GB. :] I did a proper benchmarking indeed only on heavy I/O load, where full cache did show above 50% improvement, although just regular usage felt faster as well, but maybe it's just psychosomatic. :) Leonid. > > I don't think you'll really notice the difference in qcow2 unless you > have a really I/O intensive workload - and that is not usually for OS > files, but for user data. For only occasional accesses, the additional > 64k for the metadata table wouldn't play a big role. > > Kevin >
diff --git a/block/qcow2.h b/block/qcow2.h
index d917b5f577..e699a55d02 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -74,7 +74,7 @@
 /* Must be at least 4 to cover all cases of refcount table growth */
 #define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
 
-#define DEFAULT_L2_CACHE_MAX_SIZE (1 * MiB)
+#define DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB)
 
 #define DEFAULT_CLUSTER_SIZE (64 * KiB)
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 69af306267..6ad1081d1a 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -125,8 +125,8 @@ There are a few things that need to be taken into account:
   (or the cache entry size: see "Using smaller cache sizes" below).
 
 - The default L2 cache size will cover the entire virtual size of an
-  image, up to a certain maximum. This maximum is 1 MB by default
-  (enough for image sizes of up to 8 GB with the default cluster size)
+  image, up to a certain maximum. This maximum is 32 MB by default
+  (enough for image sizes of up to 256 GB with the default cluster size)
   and it can be reduced or enlarged using the "l2-cache-size" option.
   The minimum is 2 clusters (or 2 cache entries, see below).
@@ -186,7 +186,7 @@ Some things to take into account:
   always uses the cluster size as the entry size.
 
 - If the L2 cache is big enough to hold all of the image's L2 tables
-  (the default behavior for images of up to 8 GB in size) then none
+  (the default behavior for images of up to 256 GB in size) then none
   of this is necessary and you can omit the "l2-cache-entry-size"
   parameter altogether.
diff --git a/qemu-options.hx b/qemu-options.hx
index 22e8e2d113..4c44cdbc23 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -756,7 +756,7 @@ The maximum total size of the L2 table and refcount block caches in bytes
 @item l2-cache-size
 The maximum size of the L2 table cache in bytes
-(default: if cache-size is not specified - 1M; otherwise, as large as possible
+(default: if cache-size is not specified - 32M; otherwise, as large as possible
 within the cache-size, while permitting the requested or the minimal refcount
 cache size)
The upper limit on the L2 cache size is increased from 1 MB to 32 MB.
This is done in order to allow default full coverage with the L2 cache
for images of up to 256 GB in size (was 8 GB). Note, that only the
needed amount to cover the full image is allocated. The value which is
changed here is just the upper limit on the L2 cache size, beyond which
it will not grow, even if the size of the image will require it to.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
---
 block/qcow2.h        | 2 +-
 docs/qcow2-cache.txt | 6 +++---
 qemu-options.hx      | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)