| Message ID | 20191108092247.16207-1-kchamart@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [qemu-web] Add a blog post on "Micro-Optimizing KVM VM-Exits" |
[Cc: Rich Jones, addressing his feedback on IRC, below.] On Fri, Nov 08, 2019 at 10:22:47AM +0100, Kashyap Chamarthy wrote: > This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1], > given by Andrea Arcangeli at the recently concluded KVM Forum 2019. > > [1] https://kvmforum2019.sched.com/event/Tmwr/micro-optimizing-kvm-vm-exits-andrea-arcangeli-red-hat-inc > > Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com> > --- [...] > +The microbechmark: CPUID in a one million loop > +---------------------------------------------- > + > +The synthetic microbenchmark (meaning, focus on measuring the > +performance of a specific area of code) Andrea used was to run the CPUID > +instruction one million times, without any GCC optimizations or caching. > +This was done to test the latency of VM-Exits. I can send a v2 (but will wait for any other feedback), or when applying someone please replace the above paragraph with the following: "Andrea constructed a synthetic microbenchmark program (without any GCC optimizations or caching) which runs the CPUID instructions one million times in a loop. This microbenchmark is meant to focus on measuring the performance of a specific area of the code -- in this case, to test the latency of VM-Exits." (Rich, hope that reads better. Thanks for the review.) > +While stressing that the results of these microbenchmarks do not > +represent real-world workloads, he had two goals in mind with it: (a) > +explain how the software mitigation works; and (b) to justify to the > +broader community the value of the software optimizations he's working > +on in KVM. > + > +Andrea then reasoned through several interesting graphs that show how > +CPU computation time gets impacted when you disable or enable the > +various kernel-space mitigations for Spectre v2, L1TF, MDS, et al. [...]
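For readers who want to see concretely what the microbenchmark described above looks like, here is a minimal sketch in C (hypothetical code, not Andrea's actual test program): it executes CPUID one million times in a loop and times the run with RDTSC. Inside a KVM guest every CPUID traps to the host, so the average cost per iteration approximates the VM-Exit/VM-Enter round trip. Build it without optimizations, e.g. `gcc -O0 cpuid-bench.c -o cpuid-bench`.

```c
/* cpuid-bench.c -- hypothetical sketch, not Andrea's actual program.
 * Runs CPUID in a tight loop; in a KVM guest each CPUID causes a
 * (light-weight) VM-Exit, so the per-iteration cost approximates the
 * VM-Exit/VM-Enter round-trip latency.  Compile with -O0.
 */
#include <stdint.h>
#include <stdio.h>

static inline void cpuid(uint32_t leaf)
{
    uint32_t a, b, c, d;
    __asm__ volatile("cpuid"
                     : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                     : "a"(leaf), "c"(0));
}

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    const long iterations = 1000000;   /* "CPUID in a one million loop" */
    uint64_t start = rdtsc();

    for (long i = 0; i < iterations; i++)
        cpuid(0);

    uint64_t cycles = rdtsc() - start;
    printf("%.1f cycles per CPUID on average over %ld iterations\n",
           (double)cycles / iterations, iterations);
    return 0;
}
```

Run once on the bare-metal host and once inside a guest, the difference between the two per-iteration figures gives a rough idea of the exit latency the talk is concerned with.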
On Fri, Nov 08, 2019 at 10:22:47AM +0100, Kashyap Chamarthy wrote: > +The proposal: "KVM Monolithic" > +------------------------------ > + > +Based on his investigation, Andrea proposed a patch series, ["KVM > +monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM s/monolithc/monolithic/ Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
On 08/11/2019 10.22, Kashyap Chamarthy wrote: > This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1], > given by Andrea Arcangeli at the recently concluded KVM Forum 2019. > Hi Kashyap, first thanks for writing up this article! It's a really nice summary of the presentation, I think. But before we include it, let me ask a meta-question: Is an article about the KVM *kernel* code suitable for the *QEMU* blog? Or is there maybe a better place for this, like an article on www.linux-kvm.org ? Opinions? Ideas? Thomas > --- > ...019-11-06-micro-optimizing-kvm-vmexits.txt | 115 ++++++++++++++++++ > 1 file changed, 115 insertions(+) > create mode 100644 _posts/2019-11-06-micro-optimizing-kvm-vmexits.txt > > diff --git a/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt > new file mode 100644 > index 0000000000000000000000000000000000000000..f4a28d58ddb40103dd599fdfd861eeb4c41ed976 > --- /dev/null > +++ b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt > @@ -0,0 +1,115 @@ > +--- > +layout: post > +title: "Micro-Optimizing KVM VM-Exits" > +date: 2019-11-08 > +categories: [kvm, optimization] > +--- > + > +Background on VM-Exits > +---------------------- > + > +KVM (Kernel-based Virtual Machine) is the Linux kernel module that > +allows a host to run virtualized guests (Linux, Windows, etc). The KVM > +"guest execution loop", with QEMU (the open source emulator and > +virtualizer) as its user space, is roughly as follows: QEMU issues the > +ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU's "Guest Mode" > +-- a special processor mode which allows guest code to safely run > +directly on the physical CPU. The guest code, which is inside a "jail" > +and thus cannot interfere with the rest of the system, keeps running on > +the hardware until it encounters a request it cannot handle. Then the > +processor gives the control back (referred to as "VM-Exit") either to > +kernel space, or to the user space to handle the request. Once the > +request is handled, native execution of guest code on the processor > +resumes again. And the loop goes on. > + > +There are dozens of reasons for VM-Exits (Intel's Software Developer > +Manual outlines 64 "Basic Exit Reasons"). For example, when a guest > +needs to emulate the CPUID instruction, it causes a "light-weight exit" > +to kernel space, because CPUID (among a few others) is emulated in the > +kernel itself, for performance reasons. But when the kernel _cannot_ > +handle a request, e.g. to emulate certain hardware, it results in a > +"heavy-weight exit" to QEMU, to perform the emulation. These VM-Exits > +and subsequent re-entries ("VM-Enters"), even the light-weight ones, can > +be expensive. What can be done about it? > + > +Guest workloads that are hard to virtualize > +------------------------------------------- > + > +At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea > +Arcangeli, attempted to address the kernel part of minimizing VM-Exits. > + > +His talk touched on the cost of VM-Exits into the kernel, especially for > +guest workloads (e.g. enterprise databases) that are sensitive to their > +performance penalty. However, these workloads cannot avoid triggering > +VM-Exits with a high frequency. Andrea then outlined some of the > +optimizations he's been working on to improve the VM-Exit performance in > +the KVM code path -- especially in light of applying mitigations for > +speculative execution flaws (Spectre v2, MDS, L1TF). 
> + > +Andrea gave a brief recap of the different kinds of speculative > +execution attacks (retpolines, IBPB, PTI, SSBD, etc). Followed by that > +he outlined the performance impact of Spectre-v2 mitigations in context > +of KVM. > + > +The microbechmark: CPUID in a one million loop > +---------------------------------------------- > + > +The synthetic microbenchmark (meaning, focus on measuring the > +performance of a specific area of code) Andrea used was to run the CPUID > +instruction one million times, without any GCC optimizations or caching. > +This was done to test the latency of VM-Exits. > + > +While stressing that the results of these microbenchmarks do not > +represent real-world workloads, he had two goals in mind with it: (a) > +explain how the software mitigation works; and (b) to justify to the > +broader community the value of the software optimizations he's working > +on in KVM. > + > +Andrea then reasoned through several interesting graphs that show how > +CPU computation time gets impacted when you disable or enable the > +various kernel-space mitigations for Spectre v2, L1TF, MDS, et al. > + > +The proposal: "KVM Monolithic" > +------------------------------ > + > +Based on his investigation, Andrea proposed a patch series, ["KVM > +monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM > +common module, 'kvm.ko'. Instead the KVM common code gets linked twice > +into each of the vendor-specific KVM modules, 'kvm-intel.ko' and > +'kvm-amd.ko'. > + > +The reason for doing this is that the 'kvm.ko' module indirectly calls > +(via the "retpoline" technique) the vendor-specific KVM modules at every > +VM-Exit, several times. These indirect calls were not optimal before, > +but the "retpoline" mitigation (which isolates indirect branches, that > +allow a CPU to execute code from arbitrary locations, from speculative > +execution) for Spectre v2 compounds the problem, as it degrades > +performance. > + > +This approach will result in a few MiB of increased disk space for > +'kvm-intel.ko' and 'kvm-amd.ko', but the upside in saved indirect calls, > +and the elimination of "retpoline" overhead at run-time more than > +compensate for it. > + > +With the "KVM Monolithic" patch series applied, Andrea's microbenchmarks > +show a double-digit improvement in performance with default mitigations > +(for Spectre v2, et al) enabled on both Intel 'VMX' and AMD 'SVM'. And > +with 'spectre_v2=off' or for CPUs with IBRS_ALL in ARCH_CAPABILITIES > +"KVM monolithic" still improve[s] performance, albiet it's on the order > +of 1%. > + > +Conclusion > +---------- > + > +Removal of the common KVM module has a non-negligible positive > +performance impact. And the "KVM Monolitic" patch series is still > +actively being reviewed, modulo some pending clean-ups. Based on the > +upstream review discussion, KVM Maintainer, Paolo Bonzini, and other > +reviewers seemed amenable to merge the series. > + > +Although, we still have to deal with mitigations for 'indirect branch > +prediction' for a long time, reducing the VM-Exit latency is important > +in general; and more specifically, for guest workloads that happen to > +trigger frequent VM-Exits, without having to disable Spectre v2 > +mitigations on the host, as Andrea stated in the cover letter of his > +patch series. >
On 15/11/19 13:08, Thomas Huth wrote: > On 08/11/2019 10.22, Kashyap Chamarthy wrote: >> This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1], >> given by Andrea Arcangeli at the recently concluded KVM Forum 2019. >> > > Hi Kashyap, > > first thanks for writing up this article! It's a really nice summary of > the presentation, I think. > > But before we include it, let me ask a meta-question: Is an article > about the KVM *kernel* code suitable for the *QEMU* blog? Or is there > maybe a better place for this, like an article on www.linux-kvm.org ? I'm not sure there is such a thing as articles on www.linux-kvm.org. :) I have the same doubt, actually. Unfortunately I cannot think of another place that would host KVM-specific articles. Paolo > > Opinions? Ideas? > > Thomas > > >> --- >> ...019-11-06-micro-optimizing-kvm-vmexits.txt | 115 ++++++++++++++++++ >> 1 file changed, 115 insertions(+) >> create mode 100644 _posts/2019-11-06-micro-optimizing-kvm-vmexits.txt >> >> diff --git a/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt >> new file mode 100644 >> index 0000000000000000000000000000000000000000..f4a28d58ddb40103dd599fdfd861eeb4c41ed976 >> --- /dev/null >> +++ b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt >> @@ -0,0 +1,115 @@ >> +--- >> +layout: post >> +title: "Micro-Optimizing KVM VM-Exits" >> +date: 2019-11-08 >> +categories: [kvm, optimization] >> +--- >> + >> +Background on VM-Exits >> +---------------------- >> + >> +KVM (Kernel-based Virtual Machine) is the Linux kernel module that >> +allows a host to run virtualized guests (Linux, Windows, etc). The KVM >> +"guest execution loop", with QEMU (the open source emulator and >> +virtualizer) as its user space, is roughly as follows: QEMU issues the >> +ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU's "Guest Mode" >> +-- a special processor mode which allows guest code to safely run >> +directly on the physical CPU. The guest code, which is inside a "jail" >> +and thus cannot interfere with the rest of the system, keeps running on >> +the hardware until it encounters a request it cannot handle. Then the >> +processor gives the control back (referred to as "VM-Exit") either to >> +kernel space, or to the user space to handle the request. Once the >> +request is handled, native execution of guest code on the processor >> +resumes again. And the loop goes on. >> + >> +There are dozens of reasons for VM-Exits (Intel's Software Developer >> +Manual outlines 64 "Basic Exit Reasons"). For example, when a guest >> +needs to emulate the CPUID instruction, it causes a "light-weight exit" >> +to kernel space, because CPUID (among a few others) is emulated in the >> +kernel itself, for performance reasons. But when the kernel _cannot_ >> +handle a request, e.g. to emulate certain hardware, it results in a >> +"heavy-weight exit" to QEMU, to perform the emulation. These VM-Exits >> +and subsequent re-entries ("VM-Enters"), even the light-weight ones, can >> +be expensive. What can be done about it? >> + >> +Guest workloads that are hard to virtualize >> +------------------------------------------- >> + >> +At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea >> +Arcangeli, attempted to address the kernel part of minimizing VM-Exits. >> + >> +His talk touched on the cost of VM-Exits into the kernel, especially for >> +guest workloads (e.g. enterprise databases) that are sensitive to their >> +performance penalty. 
However, these workloads cannot avoid triggering >> +VM-Exits with a high frequency. Andrea then outlined some of the >> +optimizations he's been working on to improve the VM-Exit performance in >> +the KVM code path -- especially in light of applying mitigations for >> +speculative execution flaws (Spectre v2, MDS, L1TF). >> + >> +Andrea gave a brief recap of the different kinds of speculative >> +execution attacks (retpolines, IBPB, PTI, SSBD, etc). Followed by that >> +he outlined the performance impact of Spectre-v2 mitigations in context >> +of KVM. >> + >> +The microbechmark: CPUID in a one million loop >> +---------------------------------------------- >> + >> +The synthetic microbenchmark (meaning, focus on measuring the >> +performance of a specific area of code) Andrea used was to run the CPUID >> +instruction one million times, without any GCC optimizations or caching. >> +This was done to test the latency of VM-Exits. >> + >> +While stressing that the results of these microbenchmarks do not >> +represent real-world workloads, he had two goals in mind with it: (a) >> +explain how the software mitigation works; and (b) to justify to the >> +broader community the value of the software optimizations he's working >> +on in KVM. >> + >> +Andrea then reasoned through several interesting graphs that show how >> +CPU computation time gets impacted when you disable or enable the >> +various kernel-space mitigations for Spectre v2, L1TF, MDS, et al. >> + >> +The proposal: "KVM Monolithic" >> +------------------------------ >> + >> +Based on his investigation, Andrea proposed a patch series, ["KVM >> +monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM >> +common module, 'kvm.ko'. Instead the KVM common code gets linked twice >> +into each of the vendor-specific KVM modules, 'kvm-intel.ko' and >> +'kvm-amd.ko'. >> + >> +The reason for doing this is that the 'kvm.ko' module indirectly calls >> +(via the "retpoline" technique) the vendor-specific KVM modules at every >> +VM-Exit, several times. These indirect calls were not optimal before, >> +but the "retpoline" mitigation (which isolates indirect branches, that >> +allow a CPU to execute code from arbitrary locations, from speculative >> +execution) for Spectre v2 compounds the problem, as it degrades >> +performance. >> + >> +This approach will result in a few MiB of increased disk space for >> +'kvm-intel.ko' and 'kvm-amd.ko', but the upside in saved indirect calls, >> +and the elimination of "retpoline" overhead at run-time more than >> +compensate for it. >> + >> +With the "KVM Monolithic" patch series applied, Andrea's microbenchmarks >> +show a double-digit improvement in performance with default mitigations >> +(for Spectre v2, et al) enabled on both Intel 'VMX' and AMD 'SVM'. And >> +with 'spectre_v2=off' or for CPUs with IBRS_ALL in ARCH_CAPABILITIES >> +"KVM monolithic" still improve[s] performance, albiet it's on the order >> +of 1%. >> + >> +Conclusion >> +---------- >> + >> +Removal of the common KVM module has a non-negligible positive >> +performance impact. And the "KVM Monolitic" patch series is still >> +actively being reviewed, modulo some pending clean-ups. Based on the >> +upstream review discussion, KVM Maintainer, Paolo Bonzini, and other >> +reviewers seemed amenable to merge the series. 
>> + >> +Although, we still have to deal with mitigations for 'indirect branch >> +prediction' for a long time, reducing the VM-Exit latency is important >> +in general; and more specifically, for guest workloads that happen to >> +trigger frequent VM-Exits, without having to disable Spectre v2 >> +mitigations on the host, as Andrea stated in the cover letter of his >> +patch series. >> >
Thomas Huth <thuth@redhat.com> writes: > On 08/11/2019 10.22, Kashyap Chamarthy wrote: >> This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1], >> given by Andrea Arcangeli at the recently concluded KVM Forum 2019. >> > > Hi Kashyap, > > first thanks for writing up this article! It's a really nice summary of > the presentation, I think. > > But before we include it, let me ask a meta-question: Is an article > about the KVM *kernel* code suitable for the *QEMU* blog? Or is there > maybe a better place for this, like an article on www.linux-kvm.org ? > > Opinions? Ideas? I don't think it is a particular problem hosting it on the QEMU blog given the closeness of the two projects. It would get syndicated to planet.libvirt as well ;-) > > Thomas > > >> --- >> ...019-11-06-micro-optimizing-kvm-vmexits.txt | 115 ++++++++++++++++++ >> 1 file changed, 115 insertions(+) >> create mode 100644 _posts/2019-11-06-micro-optimizing-kvm-vmexits.txt >> >> diff --git a/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt >> new file mode 100644 >> index 0000000000000000000000000000000000000000..f4a28d58ddb40103dd599fdfd861eeb4c41ed976 >> --- /dev/null >> +++ b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt >> @@ -0,0 +1,115 @@ >> +--- >> +layout: post >> +title: "Micro-Optimizing KVM VM-Exits" >> +date: 2019-11-08 >> +categories: [kvm, optimization] >> +--- >> + >> +Background on VM-Exits >> +---------------------- >> + >> +KVM (Kernel-based Virtual Machine) is the Linux kernel module that >> +allows a host to run virtualized guests (Linux, Windows, etc). The KVM >> +"guest execution loop", with QEMU (the open source emulator and >> +virtualizer) as its user space, is roughly as follows: QEMU issues the >> +ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU's "Guest Mode" >> +-- a special processor mode which allows guest code to safely run >> +directly on the physical CPU. The guest code, which is inside a "jail" >> +and thus cannot interfere with the rest of the system, keeps running on >> +the hardware until it encounters a request it cannot handle. Then the >> +processor gives the control back (referred to as "VM-Exit") either to >> +kernel space, or to the user space to handle the request. Once the >> +request is handled, native execution of guest code on the processor >> +resumes again. And the loop goes on. >> + >> +There are dozens of reasons for VM-Exits (Intel's Software Developer >> +Manual outlines 64 "Basic Exit Reasons"). For example, when a guest >> +needs to emulate the CPUID instruction, it causes a "light-weight exit" >> +to kernel space, because CPUID (among a few others) is emulated in the >> +kernel itself, for performance reasons. But when the kernel _cannot_ >> +handle a request, e.g. to emulate certain hardware, it results in a >> +"heavy-weight exit" to QEMU, to perform the emulation. These VM-Exits >> +and subsequent re-entries ("VM-Enters"), even the light-weight ones, can >> +be expensive. What can be done about it? >> + >> +Guest workloads that are hard to virtualize >> +------------------------------------------- >> + >> +At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea >> +Arcangeli, attempted to address the kernel part of minimizing VM-Exits. >> + >> +His talk touched on the cost of VM-Exits into the kernel, especially for >> +guest workloads (e.g. enterprise databases) that are sensitive to their >> +performance penalty. 
However, these workloads cannot avoid triggering >> +VM-Exits with a high frequency. Andrea then outlined some of the >> +optimizations he's been working on to improve the VM-Exit performance in >> +the KVM code path -- especially in light of applying mitigations for >> +speculative execution flaws (Spectre v2, MDS, L1TF). >> + >> +Andrea gave a brief recap of the different kinds of speculative >> +execution attacks (retpolines, IBPB, PTI, SSBD, etc). Followed by that >> +he outlined the performance impact of Spectre-v2 mitigations in context >> +of KVM. >> + >> +The microbechmark: CPUID in a one million loop >> +---------------------------------------------- >> + >> +The synthetic microbenchmark (meaning, focus on measuring the >> +performance of a specific area of code) Andrea used was to run the CPUID >> +instruction one million times, without any GCC optimizations or caching. >> +This was done to test the latency of VM-Exits. >> + >> +While stressing that the results of these microbenchmarks do not >> +represent real-world workloads, he had two goals in mind with it: (a) >> +explain how the software mitigation works; and (b) to justify to the >> +broader community the value of the software optimizations he's working >> +on in KVM. >> + >> +Andrea then reasoned through several interesting graphs that show how >> +CPU computation time gets impacted when you disable or enable the >> +various kernel-space mitigations for Spectre v2, L1TF, MDS, et al. >> + >> +The proposal: "KVM Monolithic" >> +------------------------------ >> + >> +Based on his investigation, Andrea proposed a patch series, ["KVM >> +monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM >> +common module, 'kvm.ko'. Instead the KVM common code gets linked twice >> +into each of the vendor-specific KVM modules, 'kvm-intel.ko' and >> +'kvm-amd.ko'. >> + >> +The reason for doing this is that the 'kvm.ko' module indirectly calls >> +(via the "retpoline" technique) the vendor-specific KVM modules at every >> +VM-Exit, several times. These indirect calls were not optimal before, >> +but the "retpoline" mitigation (which isolates indirect branches, that >> +allow a CPU to execute code from arbitrary locations, from speculative >> +execution) for Spectre v2 compounds the problem, as it degrades >> +performance. >> + >> +This approach will result in a few MiB of increased disk space for >> +'kvm-intel.ko' and 'kvm-amd.ko', but the upside in saved indirect calls, >> +and the elimination of "retpoline" overhead at run-time more than >> +compensate for it. >> + >> +With the "KVM Monolithic" patch series applied, Andrea's microbenchmarks >> +show a double-digit improvement in performance with default mitigations >> +(for Spectre v2, et al) enabled on both Intel 'VMX' and AMD 'SVM'. And >> +with 'spectre_v2=off' or for CPUs with IBRS_ALL in ARCH_CAPABILITIES >> +"KVM monolithic" still improve[s] performance, albiet it's on the order >> +of 1%. >> + >> +Conclusion >> +---------- >> + >> +Removal of the common KVM module has a non-negligible positive >> +performance impact. And the "KVM Monolitic" patch series is still >> +actively being reviewed, modulo some pending clean-ups. Based on the >> +upstream review discussion, KVM Maintainer, Paolo Bonzini, and other >> +reviewers seemed amenable to merge the series. 
>> + >> +Although, we still have to deal with mitigations for 'indirect branch >> +prediction' for a long time, reducing the VM-Exit latency is important >> +in general; and more specifically, for guest workloads that happen to >> +trigger frequent VM-Exits, without having to disable Spectre v2 >> +mitigations on the host, as Andrea stated in the cover letter of his >> +patch series. >> -- Alex Bennée
On Fri, Nov 15, 2019 at 01:08:53PM +0100, Thomas Huth wrote: > On 08/11/2019 10.22, Kashyap Chamarthy wrote: > > This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1], > > given by Andrea Arcangeli at the recently concluded KVM Forum 2019. > > > > Hi Kashyap, > > first thanks for writing up this article! It's a really nice summary of > the presentation, I think. > > But before we include it, let me ask a meta-question: Is an article > about the KVM *kernel* code suitable for the *QEMU* blog? Or is there > maybe a better place for this, like an article on www.linux-kvm.org ? > > Opinions? Ideas? I don't see a problem with this. KVM and QEMU developers work very closely together and many users of QEMU care about the whole stack, so KVM is on-topic IMHO Regards, Daniel
On Fri, Nov 15, 2019 at 01:08:53PM +0100, Thomas Huth wrote: > On 08/11/2019 10.22, Kashyap Chamarthy wrote: > > This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1], > > given by Andrea Arcangeli at the recently concluded KVM Forum 2019. > > > > Hi Kashyap, > > first thanks for writing up this article! It's a really nice summary of > the presentation, I think. Hi Thomas, Thanks! > But before we include it, let me ask a meta-question: Is an article > about the KVM *kernel* code suitable for the *QEMU* blog? I had the same thought, and expressed it to Stefan as such, when he suggested qemu.org :-). I too found it odd to have a kernel-heavy article on qemu.org. > Or is there > maybe a better place for this, like an article on www.linux-kvm.org ? I thought about it; but I've never seen anyone write an "article" there; as it's a WikiSpace. And, like Paolo, I couldn't think of a better place either. FWIW, the qemu.org blog is indexed by a few blog "planet" aggregators; and linux-kvm.org is largely a static site that is occasionally updated by people if they happened to notice something (especially if it's egregiously wrong). > Opinions? Ideas? Another _potential_ venue: Given the topic is kernel space-related, it is likely to fit in with the LWN audience. LWN itself says they generally look for kernel-related articles. Although, I'm aware that there's already a few LWN articles being written on KVM Forum-based talks. (Perhaps once the "KVM Monolithic" patch series merges, this can be reworked into a standalone LWN kernel article — assuming LWN is amenable to it; need to check with LWN.) [...]
On 15/11/19 13:37, Kashyap Chamarthy wrote: >> Opinions? Ideas? > Another _potential_ venue: Given the topic is kernel space-related, it > is likely to fit in with the LWN audience. LWN itself says they > generally look for kernel-related articles. Although, I'm aware that > there's already a few LWN articles being written on KVM Forum-based > talks. (Perhaps once the "KVM Monolithic" patch series merges, this can > be reworked into a standalone LWN kernel article — assuming LWN is > amenable to it; need to check with LWN.) Yeah, perhaps later. For now I guess qemu.org is the best. Paolo
On 11/08/19 10:22, Kashyap Chamarthy wrote: > This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1], > given by Andrea Arcangeli at the recently concluded KVM Forum 2019. > > [1] https://kvmforum2019.sched.com/event/Tmwr/micro-optimizing-kvm-vm-exits-andrea-arcangeli-red-hat-inc > > Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com> > --- > ...019-11-06-micro-optimizing-kvm-vmexits.txt | 115 ++++++++++++++++++ > 1 file changed, 115 insertions(+) > create mode 100644 _posts/2019-11-06-micro-optimizing-kvm-vmexits.txt > > diff --git a/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt > new file mode 100644 > index 0000000000000000000000000000000000000000..f4a28d58ddb40103dd599fdfd861eeb4c41ed976 > --- /dev/null > +++ b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt > @@ -0,0 +1,115 @@ > +--- > +layout: post > +title: "Micro-Optimizing KVM VM-Exits" > +date: 2019-11-08 > +categories: [kvm, optimization] > +--- > + > +Background on VM-Exits > +---------------------- > + > +KVM (Kernel-based Virtual Machine) is the Linux kernel module that > +allows a host to run virtualized guests (Linux, Windows, etc). The KVM > +"guest execution loop", with QEMU (the open source emulator and > +virtualizer) as its user space, is roughly as follows: QEMU issues the > +ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU's "Guest Mode" > +-- a special processor mode which allows guest code to safely run > +directly on the physical CPU. The guest code, which is inside a "jail" > +and thus cannot interfere with the rest of the system, keeps running on > +the hardware until it encounters a request it cannot handle. Then the > +processor gives the control back (referred to as "VM-Exit") either to > +kernel space, or to the user space to handle the request. Once the > +request is handled, native execution of guest code on the processor > +resumes again. And the loop goes on. > + > +There are dozens of reasons for VM-Exits (Intel's Software Developer > +Manual outlines 64 "Basic Exit Reasons"). For example, when a guest > +needs to emulate the CPUID instruction, it causes a "light-weight exit" > +to kernel space, because CPUID (among a few others) is emulated in the > +kernel itself, for performance reasons. But when the kernel _cannot_ > +handle a request, e.g. to emulate certain hardware, it results in a > +"heavy-weight exit" to QEMU, to perform the emulation. These VM-Exits > +and subsequent re-entries ("VM-Enters"), even the light-weight ones, can > +be expensive. What can be done about it? > + > +Guest workloads that are hard to virtualize > +------------------------------------------- > + > +At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea > +Arcangeli, attempted to address the kernel part of minimizing VM-Exits. I'd suggest "addressed", not "attempted to address". > + > +His talk touched on the cost of VM-Exits into the kernel, especially for > +guest workloads (e.g. enterprise databases) that are sensitive to their > +performance penalty. However, these workloads cannot avoid triggering > +VM-Exits with a high frequency. Andrea then outlined some of the > +optimizations he's been working on to improve the VM-Exit performance in > +the KVM code path -- especially in light of applying mitigations for > +speculative execution flaws (Spectre v2, MDS, L1TF). > + > +Andrea gave a brief recap of the different kinds of speculative > +execution attacks (retpolines, IBPB, PTI, SSBD, etc). 
Followed by that > +he outlined the performance impact of Spectre-v2 mitigations in context > +of KVM. > + > +The microbechmark: CPUID in a one million loop > +---------------------------------------------- > + > +The synthetic microbenchmark (meaning, focus on measuring the > +performance of a specific area of code) Andrea used was to run the CPUID > +instruction one million times, without any GCC optimizations or caching. > +This was done to test the latency of VM-Exits. > + > +While stressing that the results of these microbenchmarks do not > +represent real-world workloads, he had two goals in mind with it: (a) > +explain how the software mitigation works; and (b) to justify to the > +broader community the value of the software optimizations he's working > +on in KVM. > + > +Andrea then reasoned through several interesting graphs that show how > +CPU computation time gets impacted when you disable or enable the > +various kernel-space mitigations for Spectre v2, L1TF, MDS, et al. > + > +The proposal: "KVM Monolithic" > +------------------------------ > + > +Based on his investigation, Andrea proposed a patch series, ["KVM > +monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM > +common module, 'kvm.ko'. Instead the KVM common code gets linked twice > +into each of the vendor-specific KVM modules, 'kvm-intel.ko' and > +'kvm-amd.ko'. > + > +The reason for doing this is that the 'kvm.ko' module indirectly calls > +(via the "retpoline" technique) the vendor-specific KVM modules at every > +VM-Exit, several times. These indirect calls were not optimal before, > +but the "retpoline" mitigation (which isolates indirect branches, that > +allow a CPU to execute code from arbitrary locations, from speculative > +execution) for Spectre v2 compounds the problem, as it degrades > +performance. > + > +This approach will result in a few MiB of increased disk space for > +'kvm-intel.ko' and 'kvm-amd.ko', but the upside in saved indirect calls, > +and the elimination of "retpoline" overhead at run-time more than > +compensate for it. > + > +With the "KVM Monolithic" patch series applied, Andrea's microbenchmarks > +show a double-digit improvement in performance with default mitigations > +(for Spectre v2, et al) enabled on both Intel 'VMX' and AMD 'SVM'. And > +with 'spectre_v2=off' or for CPUs with IBRS_ALL in ARCH_CAPABILITIES > +"KVM monolithic" still improve[s] performance, albiet it's on the order > +of 1%. > + > +Conclusion > +---------- > + > +Removal of the common KVM module has a non-negligible positive > +performance impact. And the "KVM Monolitic" patch series is still > +actively being reviewed, modulo some pending clean-ups. Based on the > +upstream review discussion, KVM Maintainer, Paolo Bonzini, and other > +reviewers seemed amenable to merge the series. > + > +Although, we still have to deal with mitigations for 'indirect branch > +prediction' for a long time, reducing the VM-Exit latency is important > +in general; and more specifically, for guest workloads that happen to > +trigger frequent VM-Exits, without having to disable Spectre v2 > +mitigations on the host, as Andrea stated in the cover letter of his > +patch series. > This article refers to "indirect calls" and "indirect branches" quite a few times. I suggest mentioning "function pointers" at least once... (AIUI, the core of the issue is that kvm.ko calls kvm-intel.ko and kvm-amd.ko through function pointers. 
Such calls are the target of malicious branch predictor mis-training, and therefore, as a counter-measure, they are compiled into retpolines, rather than the directly corresponding indirect call assembly instructions. But retpolines run slowly, in comparison. Calling the functions in question by name, in the C source code, rather than via function pointers, eliminates the indirect call assembly instructions, and obviates the need for retpolines. The resultant C source code is less abstract and less dynamic at runtime, but the original indirection isn't inherently necessary at runtime.) I couldn't attend Andrea's presentation, nor have I seen the slides, or a recording thereof, or the patchset; so I could easily be off. My point is, *if* the expression "function pointers" applies in this context, please do mention it; otherwise "indirect calls" just hangs in the air, IMHO. It might be as simple as replacing These indirect calls were not optimal before, with These indirect calls -- via function pointers in the C source code -- were not optimal before, Thanks! Laszlo
On Fri, Nov 15, 2019 at 01:41:01PM +0100, Paolo Bonzini wrote: > On 15/11/19 13:37, Kashyap Chamarthy wrote: > >> Opinions? Ideas? > > Another _potential_ venue: Given the topic is kernel space-related, it > > is likely to fit in with the LWN audience. LWN itself says they > > generally look for kernel-related articles. Although, I'm aware that > > there's already a few LWN articles being written on KVM Forum-based > > talks. (Perhaps once the "KVM Monolithic" patch series merges, this can > > be reworked into a standalone LWN kernel article — assuming LWN is > > amenable to it; need to check with LWN.) > > Yeah, perhaps later. For now I guess qemu.org is the best. Sure; others also seem to agree it's okay to be on qemu.org.
On Fri, Nov 15, 2019 at 01:45:51PM +0100, Laszlo Ersek wrote: > On 11/08/19 10:22, Kashyap Chamarthy wrote: [...] > > +Guest workloads that are hard to virtualize > > +------------------------------------------- > > + > > +At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea > > +Arcangeli, attempted to address the kernel part of minimizing VM-Exits. > > I'd suggest "addressed", not "attempted to address". Will fix in next iteration. [...] > > +Conclusion > > +---------- [...] > > +Although, we still have to deal with mitigations for 'indirect branch > > +prediction' for a long time, reducing the VM-Exit latency is important > > +in general; and more specifically, for guest workloads that happen to > > +trigger frequent VM-Exits, without having to disable Spectre v2 > > +mitigations on the host, as Andrea stated in the cover letter of his > > +patch series. > > > > This article refers to "indirect calls" and "indirect branches" quite a > few times. > > I suggest mentioning "function pointers" at least once... > > (AIUI, the core of the issue is that kvm.ko calls kvm-intel.ko and > kvm-amd.ko through function pointers. Such calls are the target of > malicious branch predictor mis-training, and therefore, as a > counter-measure, they are compiled into retpolines, rather than the > directly corresponding indirect call assembly instructions. But > retpolines run slowly, in comparison. Calling the functions in question > by name, in the C source code, rather than via function pointers, > eliminates the indirect call assembly instructions, and obviates the > need for retpolines. The resultant C source code is less abstract and > less dynamic at runtime, but the original indirection isn't inherently > necessary at runtime.) > > I couldn't attend Andrea's presentation, nor have I seen the slides, or > a recording thereof, or the patchset; so I could easily be off. I think your above explanation is indeed correct (which I couldn't have articulated so well; thanks!), based on my understanding, and reading Andrea's patch[*] and its commit message: "This [patch] replaces all kvm_x86_ops pointer to functions with regular external functions that don't require indirect calls. "[...] The pointer to function virtual template model cannot provide any runtime benefit because kvm-intel and kvm-amd can't be loaded at the same time. [...]" [*] https://lkml.org/lkml/2019/9/20/932 -- [PATCH 02/17] KVM: monolithic: x86: convert the kvm_x86_ops methods to external functions > My point is, *if* the expression "function pointers" applies in this > context, please do mention it; otherwise "indirect calls" just hangs > in the air, IMHO. > > It might be as simple as replacing > > These indirect calls were not optimal before, > > with > > These indirect calls -- via function pointers in the C source code > -- were not optimal before, Will fix; thanks for the thorough review. If you want to read Andrea's slides, here they are: https://static.sched.com/hosted_files/kvmforum2019/3b/kvm-monolithic.pdf Thanks for the review!
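To make the "function pointers vs. direct calls" point above concrete, here is a toy, self-contained C illustration (the names are simplified stand-ins modeled on kvm_x86_ops; this is not the actual kernel code). The first dispatch path mirrors what kvm.ko historically did -- an indirect call through an ops table, which the compiler turns into a retpoline thunk when CONFIG_RETPOLINE is enabled -- while the second mirrors the "KVM monolithic" approach of calling the vendor implementation by name:

```c
#include <stdio.h>

struct kvm_vcpu { int id; };                  /* stand-in type for illustration */

/* "Vendor" implementation -- the analogue of code in kvm-intel.ko: */
static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
{
    printf("entering guest mode for vCPU %d\n", vcpu->id);
}

/* Before: common code dispatches through a table of function pointers.
 * With CONFIG_RETPOLINE, this indirect call is compiled into a retpoline
 * thunk, which is what hurts VM-Exit latency. */
struct kvm_x86_ops {
    void (*vcpu_run)(struct kvm_vcpu *vcpu);
};
static const struct kvm_x86_ops vmx_ops = { .vcpu_run = vmx_vcpu_run };
static const struct kvm_x86_ops *kvm_x86_ops = &vmx_ops;

static void enter_guest_indirect(struct kvm_vcpu *vcpu)
{
    kvm_x86_ops->vcpu_run(vcpu);              /* indirect call */
}

/* After "KVM monolithic": the common code is linked into the vendor
 * module itself, so it can simply call the implementation by name. */
static void enter_guest_direct(struct kvm_vcpu *vcpu)
{
    vmx_vcpu_run(vcpu);                       /* direct call, no retpoline */
}

int main(void)
{
    struct kvm_vcpu vcpu = { .id = 0 };
    enter_guest_indirect(&vcpu);
    enter_guest_direct(&vcpu);
    return 0;
}
```

The observable behaviour of the two paths is identical; the difference is only in the call instruction the compiler emits, which is why the change shows up mainly in exit-latency measurements.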
diff --git a/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt new file mode 100644 index 0000000000000000000000000000000000000000..f4a28d58ddb40103dd599fdfd861eeb4c41ed976 --- /dev/null +++ b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt @@ -0,0 +1,115 @@ +--- +layout: post +title: "Micro-Optimizing KVM VM-Exits" +date: 2019-11-08 +categories: [kvm, optimization] +--- + +Background on VM-Exits +---------------------- + +KVM (Kernel-based Virtual Machine) is the Linux kernel module that +allows a host to run virtualized guests (Linux, Windows, etc). The KVM +"guest execution loop", with QEMU (the open source emulator and +virtualizer) as its user space, is roughly as follows: QEMU issues the +ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU's "Guest Mode" +-- a special processor mode which allows guest code to safely run +directly on the physical CPU. The guest code, which is inside a "jail" +and thus cannot interfere with the rest of the system, keeps running on +the hardware until it encounters a request it cannot handle. Then the +processor gives the control back (referred to as "VM-Exit") either to +kernel space, or to the user space to handle the request. Once the +request is handled, native execution of guest code on the processor +resumes again. And the loop goes on. + +There are dozens of reasons for VM-Exits (Intel's Software Developer +Manual outlines 64 "Basic Exit Reasons"). For example, when a guest +needs to emulate the CPUID instruction, it causes a "light-weight exit" +to kernel space, because CPUID (among a few others) is emulated in the +kernel itself, for performance reasons. But when the kernel _cannot_ +handle a request, e.g. to emulate certain hardware, it results in a +"heavy-weight exit" to QEMU, to perform the emulation. These VM-Exits +and subsequent re-entries ("VM-Enters"), even the light-weight ones, can +be expensive. What can be done about it? + +Guest workloads that are hard to virtualize +------------------------------------------- + +At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea +Arcangeli, attempted to address the kernel part of minimizing VM-Exits. + +His talk touched on the cost of VM-Exits into the kernel, especially for +guest workloads (e.g. enterprise databases) that are sensitive to their +performance penalty. However, these workloads cannot avoid triggering +VM-Exits with a high frequency. Andrea then outlined some of the +optimizations he's been working on to improve the VM-Exit performance in +the KVM code path -- especially in light of applying mitigations for +speculative execution flaws (Spectre v2, MDS, L1TF). + +Andrea gave a brief recap of the different kinds of speculative +execution attacks (retpolines, IBPB, PTI, SSBD, etc). Followed by that +he outlined the performance impact of Spectre-v2 mitigations in context +of KVM. + +The microbechmark: CPUID in a one million loop +---------------------------------------------- + +The synthetic microbenchmark (meaning, focus on measuring the +performance of a specific area of code) Andrea used was to run the CPUID +instruction one million times, without any GCC optimizations or caching. +This was done to test the latency of VM-Exits. 
+ +While stressing that the results of these microbenchmarks do not +represent real-world workloads, he had two goals in mind with it: (a) +explain how the software mitigation works; and (b) to justify to the +broader community the value of the software optimizations he's working +on in KVM. + +Andrea then reasoned through several interesting graphs that show how +CPU computation time gets impacted when you disable or enable the +various kernel-space mitigations for Spectre v2, L1TF, MDS, et al. + +The proposal: "KVM Monolithic" +------------------------------ + +Based on his investigation, Andrea proposed a patch series, ["KVM +monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM +common module, 'kvm.ko'. Instead the KVM common code gets linked twice +into each of the vendor-specific KVM modules, 'kvm-intel.ko' and +'kvm-amd.ko'. + +The reason for doing this is that the 'kvm.ko' module indirectly calls +(via the "retpoline" technique) the vendor-specific KVM modules at every +VM-Exit, several times. These indirect calls were not optimal before, +but the "retpoline" mitigation (which isolates indirect branches, that +allow a CPU to execute code from arbitrary locations, from speculative +execution) for Spectre v2 compounds the problem, as it degrades +performance. + +This approach will result in a few MiB of increased disk space for +'kvm-intel.ko' and 'kvm-amd.ko', but the upside in saved indirect calls, +and the elimination of "retpoline" overhead at run-time more than +compensate for it. + +With the "KVM Monolithic" patch series applied, Andrea's microbenchmarks +show a double-digit improvement in performance with default mitigations +(for Spectre v2, et al) enabled on both Intel 'VMX' and AMD 'SVM'. And +with 'spectre_v2=off' or for CPUs with IBRS_ALL in ARCH_CAPABILITIES +"KVM monolithic" still improve[s] performance, albiet it's on the order +of 1%. + +Conclusion +---------- + +Removal of the common KVM module has a non-negligible positive +performance impact. And the "KVM Monolitic" patch series is still +actively being reviewed, modulo some pending clean-ups. Based on the +upstream review discussion, KVM Maintainer, Paolo Bonzini, and other +reviewers seemed amenable to merge the series. + +Although, we still have to deal with mitigations for 'indirect branch +prediction' for a long time, reducing the VM-Exit latency is important +in general; and more specifically, for guest workloads that happen to +trigger frequent VM-Exits, without having to disable Spectre v2 +mitigations on the host, as Andrea stated in the cover letter of his +patch series.
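As an aside for readers less familiar with the KVM API mentioned at the top of the post, the "guest execution loop" can be sketched in a few dozen lines of user-space C. The following is a condensed, error-handling-light demo (in the spirit of the classic "Using the KVM API" example; it is not how QEMU structures its loop): it runs a tiny real-mode guest that writes one byte to a port and halts, and services the resulting VM-Exits in user space.

```c
/* kvm-run-loop.c -- a condensed sketch of the KVM "guest execution loop"
 * (error handling mostly omitted; not how QEMU does it).  The guest is
 * seven bytes of real-mode code: write 'A' to port 0x3f8, then halt. */
#include <err.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
    const uint8_t code[] = {
        0xba, 0xf8, 0x03,   /* mov $0x3f8, %dx */
        0xb0, 'A',          /* mov $'A', %al   */
        0xee,               /* out %al, (%dx)  */
        0xf4,               /* hlt             */
    };

    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (kvm < 0)
        err(1, "/dev/kvm");
    int vmfd = ioctl(kvm, KVM_CREATE_VM, 0UL);

    /* One page of guest RAM at guest-physical address 0x1000. */
    uint8_t *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(mem, code, sizeof(code));
    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = 0x1000,
        .memory_size = 0x1000,
        .userspace_addr = (uintptr_t)mem,
    };
    ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);

    int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0UL);
    int mmap_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
    struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpufd, 0);

    /* Point the vCPU at the code (real mode, cs = 0, rip = 0x1000). */
    struct kvm_sregs sregs;
    ioctl(vcpufd, KVM_GET_SREGS, &sregs);
    sregs.cs.base = 0;
    sregs.cs.selector = 0;
    ioctl(vcpufd, KVM_SET_SREGS, &sregs);
    struct kvm_regs regs = { .rip = 0x1000, .rflags = 0x2 };
    ioctl(vcpufd, KVM_SET_REGS, &regs);

    /* The guest execution loop: KVM_RUN until a request comes back to
     * user space ("heavy-weight exit"), handle it, then re-enter. */
    for (;;) {
        ioctl(vcpufd, KVM_RUN, 0UL);
        switch (run->exit_reason) {
        case KVM_EXIT_IO:    /* port I/O is emulated here, in user space */
            putchar(*((char *)run + run->io.data_offset));
            break;
        case KVM_EXIT_HLT:   /* the guest executed hlt: we are done */
            return 0;
        default:
            fprintf(stderr, "unhandled exit reason %d\n", run->exit_reason);
            return 1;
        }
    }
}
```

The KVM_EXIT_IO case is the "heavy-weight" path the post describes -- the request travels all the way back to user space -- whereas an instruction like CPUID is normally handled inside the kernel modules without the loop above ever noticing.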
This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1], given by Andrea Arcangeli at the recently concluded KVM Forum 2019. [1] https://kvmforum2019.sched.com/event/Tmwr/micro-optimizing-kvm-vm-exits-andrea-arcangeli-red-hat-inc Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com> --- ...019-11-06-micro-optimizing-kvm-vmexits.txt | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 _posts/2019-11-06-micro-optimizing-kvm-vmexits.txt