
[RFC,0/2] KVM: Support Heterogeneous RT VCPU Configurations.

Message ID 20210728073700.120449-1-suleiman@google.com (mailing list archive)
Series KVM: Support Heterogeneous RT VCPU Configurations.

Message

Suleiman Souhlal July 28, 2021, 7:36 a.m. UTC
Hello,

This series attempts to solve some issues that arise from
having some VCPUs be real-time while others aren't.

We are trying to play media inside a VM on a desktop environment
(Chromebooks), which requires us to have some tasks in the guest
be serviced at real-time priority on the host so that the media
can be played smoothly.

To achieve this, we give a VCPU real-time priority on the host
and use isolcpus= to ensure that only designated tasks are allowed
to run on the RT VCPU.
In order to avoid priority inversions (for example when the RT
VCPU preempts a non-RT VCPU that's holding a lock that it wants to
acquire), we dedicate a host core to the RT VCPU: only the RT
VCPU is allowed to run on that CPU, while all the other non-RT
VCPUs run on all the other host CPUs.

This approach works on machines that have a large enough number
of CPUs where it's possible to dedicate a whole CPU for this,
but we also have machines that only have 2 CPUs and doing this
on those is too costly.

This patch series makes it possible to have an RT VCPU without
having to dedicate a whole host core for it.
It does this by making it so that non-RT VCPUs can't be
preempted if they are in a critical section, which we
approximate as having interrupts disabled or non-zero
preempt_count. Once the VCPU is found to not be in a critical
section anymore, it will give up the CPU.
There are measures to ensure that preemption isn't delayed too
many times.
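
To make the mechanism concrete, here is a minimal sketch of the kind
of check involved. All names (the struct, vcpu_defer_preemption(),
MAX_PREEMPT_DEFERRALS) are illustrative only and are not taken from
the actual patches:

#include <linux/types.h>

/* Guest-visible state plus host-side bookkeeping (illustrative). */
struct vcpu_crit_state {
	u32 preempt_count;	/* reported by the guest */
	u32 irqs_disabled;	/* guest interrupt flag, e.g. from guest RFLAGS */
	u32 deferrals;		/* preemptions deferred so far */
};

#define MAX_PREEMPT_DEFERRALS	3	/* arbitrary bound, for illustration */

/*
 * Called when the host would like to preempt a non-RT VCPU thread:
 * returns true if the preemption should be deferred because the guest
 * appears to be inside a critical section.
 */
static bool vcpu_defer_preemption(struct vcpu_crit_state *s)
{
	bool in_critical = s->irqs_disabled || s->preempt_count != 0;

	if (in_critical && s->deferrals < MAX_PREEMPT_DEFERRALS) {
		s->deferrals++;
		return true;	/* hold off until the section is exited */
	}

	s->deferrals = 0;	/* not critical, or bound exhausted: preempt */
	return false;
}

The bounded deferral is what keeps a misbehaving guest from
monopolizing the host CPU: once the bound is hit, the VCPU is
preempted regardless of what it reports.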

(I realize that the hooks in the scheduler aren't very
tasteful, but I couldn't figure out a better way.
SVM support will be added when sending the patch for
inclusion.)

Feedback or alternatives are appreciated.

Thanks,
-- Suleiman


Suleiman Souhlal (2):
  kvm,x86: Support heterogeneous RT VCPU configurations.
  kvm,x86: Report preempt_count to host.

 arch/x86/Kconfig                     | 11 +++++
 arch/x86/include/asm/kvm_host.h      |  7 +++
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/kvm.c                | 10 ++++
 arch/x86/kvm/Kconfig                 | 13 ++++++
 arch/x86/kvm/cpuid.c                 |  3 ++
 arch/x86/kvm/vmx/vmx.c               | 15 ++++++
 arch/x86/kvm/x86.c                   | 70 +++++++++++++++++++++++++++-
 arch/x86/kvm/x86.h                   |  2 +
 include/linux/kvm_host.h             |  4 ++
 include/linux/preempt.h              |  7 +++
 kernel/sched/core.c                  | 30 ++++++++++++
 virt/kvm/Kconfig                     |  3 ++
 virt/kvm/kvm_main.c                  | 13 ++++++
 14 files changed, 189 insertions(+), 1 deletion(-)
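
As a rough illustration of the guest/host interface implied by patch 2
("Report preempt_count to host"), the guest could publish its
preempt_count in a per-CPU structure whose address is handed to the
host through a paravirt MSR, similar to how steal time is registered
today. The MSR name, feature bit and layout below are hypothetical,
not the ABI actually proposed in these patches:

#include <linux/percpu.h>
#include <asm/kvm_para.h>
#include <asm/msr.h>

/* Hypothetical layout; made-up names for illustration only. */
struct kvm_preempt_state {
	u32 preempt_count;	/* written by the guest preempt_count code */
};

static DEFINE_PER_CPU(struct kvm_preempt_state, preempt_state) __aligned(64);

static void kvm_register_preempt_state(void)
{
	struct kvm_preempt_state *st = this_cpu_ptr(&preempt_state);

	if (!kvm_para_has_feature(KVM_FEATURE_PREEMPT_STATE))
		return;

	/* Tell the host where this VCPU's critical-section state lives. */
	wrmsrl(MSR_KVM_PREEMPT_STATE, slow_virt_to_phys(st) | 1 /* enable */);
}

The host side would then read this structure when deciding whether to
honor or defer a preemption, with the bounded deferral above keeping
the guest from abusing it.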

Comments

Peter Zijlstra July 28, 2021, 8:10 a.m. UTC | #1
On Wed, Jul 28, 2021 at 04:36:58PM +0900, Suleiman Souhlal wrote:
> Hello,
> 
> This series attempts to solve some issues that arise from
> having some VCPUs be real-time while others aren't.
> 
> We are trying to play media inside a VM on a desktop environment
> (Chromebooks), which requires us to have some tasks in the guest
> be serviced at real-time priority on the host so that the media
> can be played smoothly.
> 
> To achieve this, we give a VCPU real-time priority on the host
> and use isolcpus= to ensure that only designated tasks are allowed
> to run on the RT VCPU.

WTH do you need isolcpus for that? What's wrong with cpusets?

> In order to avoid priority inversions (for example when the RT
> VCPU preempts a non-RT VCPU that's holding a lock that it wants to
> acquire), we dedicate a host core to the RT VCPU: only the RT
> VCPU is allowed to run on that CPU, while all the other non-RT
> VCPUs run on all the other host CPUs.
> 
> This approach works on machines that have a large enough number
> of CPUs where it's possible to dedicate a whole CPU for this,
> but we also have machines that only have 2 CPUs and doing this
> on those is too costly.
> 
> This patch series makes it possible to have an RT VCPU without
> having to dedicate a whole host core for it.
> It does this by making it so that non-RT VCPUs can't be
> preempted if they are in a critical section, which we
> approximate as having interrupts disabled or non-zero
> preempt_count. Once the VCPU is found to not be in a critical
> section anymore, it will give up the CPU.
> There are measures to ensure that preemption isn't delayed too
> many times.
> 
> (I realize that the hooks in the scheduler aren't very
> tasteful, but I couldn't figure out a better way.
> SVM support will be added when sending the patch for
> inclusion.)
> 
> Feedback or alternatives are appreciated.

This is disgusting and completely wrecks the host scheduling. You're
placing guest over host, that's fundamentally wrong.

NAK!

If you want co-ordinated RT scheduling, look at paravirtualized deadline
scheduling.
Marcelo Tosatti July 28, 2021, 10:32 a.m. UTC | #2
On Wed, Jul 28, 2021 at 10:10:31AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 28, 2021 at 04:36:58PM +0900, Suleiman Souhlal wrote:
> > Hello,
> > 
> > This series attempts to solve some issues that arise from
> > having some VCPUs be real-time while others aren't.
> > 
> > We are trying to play media inside a VM on a desktop environment
> > (Chromebooks), which requires us to have some tasks in the guest
> > be serviced at real-time priority on the host so that the media
> > can be played smoothly.
> > 
> > To achieve this, we give a VCPU real-time priority on the host
> > and use isolcpus= to ensure that only designated tasks are allowed
> > to run on the RT VCPU.
> 
> WTH do you need isolcpus for that? What's wrong with cpusets?
> 
> > In order to avoid priority inversions (for example when the RT
> > VCPU preempts a non-RT VCPU that's holding a lock that it wants to
> > acquire), we dedicate a host core to the RT VCPU: only the RT
> > VCPU is allowed to run on that CPU, while all the other non-RT
> > VCPUs run on all the other host CPUs.
> > 
> > This approach works on machines that have a large enough number
> > of CPUs where it's possible to dedicate a whole CPU for this,
> > but we also have machines that only have 2 CPUs and doing this
> > on those is too costly.
> > 
> > This patch series makes it possible to have an RT VCPU without
> > having to dedicate a whole host core for it.
> > It does this by making it so that non-RT VCPUs can't be
> > preempted if they are in a critical section, which we
> > approximate as having interrupts disabled or non-zero
> > preempt_count. Once the VCPU is found to not be in a critical
> > section anymore, it will give up the CPU.
> > There are measures to ensure that preemption isn't delayed too
> > many times.
> > 
> > (I realize that the hooks in the scheduler aren't very
> > tasteful, but I couldn't figure out a better way.
> > SVM support will be added when sending the patch for
> > inclusion.)
> > 
> > Feedback or alternatives are appreciated.
> 
> This is disgusting and completely wrecks the host scheduling. You're
> placing guest over host, that's fundamentally wrong.
> 
> NAK!
> 
> If you want co-ordinated RT scheduling, look at paravirtualized deadline
> scheduling.

Peter, I'm not sure what exactly you are thinking of (to solve this
particular problem with pv deadline scheduling).

Shouldn't it be possible to, through paravirt locks, boost the priority
of the non-RT vCPU (when locking fails in the -RT vCPU)?
Suleiman Souhlal July 28, 2021, 10:37 a.m. UTC | #3
On Wed, Jul 28, 2021 at 7:34 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Wed, Jul 28, 2021 at 10:10:31AM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 28, 2021 at 04:36:58PM +0900, Suleiman Souhlal wrote:
> > > Hello,
> > >
> > > This series attempts to solve some issues that arise from
> > > having some VCPUs be real-time while others aren't.
> > >
> > > We are trying to play media inside a VM on a desktop environment
> > > (Chromebooks), which requires us to have some tasks in the guest
> > > be serviced at real-time priority on the host so that the media
> > > can be played smoothly.
> > >
> > > To achieve this, we give a VCPU real-time priority on the host
> > > and use isolcpus= to ensure that only designated tasks are allowed
> > > to run on the RT VCPU.
> >
> > WTH do you need isolcpus for that? What's wrong with cpusets?
> >
> > > In order to avoid priority inversions (for example when the RT
> > > VCPU preempts a non-RT VCPU that's holding a lock that it wants to
> > > acquire), we dedicate a host core to the RT VCPU: only the RT
> > > VCPU is allowed to run on that CPU, while all the other non-RT
> > > VCPUs run on all the other host CPUs.
> > >
> > > This approach works on machines that have a large enough number
> > > of CPUs where it's possible to dedicate a whole CPU for this,
> > > but we also have machines that only have 2 CPUs and doing this
> > > on those is too costly.
> > >
> > > This patch series makes it possible to have an RT VCPU without
> > > having to dedicate a whole host core for it.
> > > It does this by making it so that non-RT VCPUs can't be
> > > preempted if they are in a critical section, which we
> > > approximate as having interrupts disabled or non-zero
> > > preempt_count. Once the VCPU is found to not be in a critical
> > > section anymore, it will give up the CPU.
> > > There are measures to ensure that preemption isn't delayed too
> > > many times.
> > >
> > > (I realize that the hooks in the scheduler aren't very
> > > tasteful, but I couldn't figure out a better way.
> > > SVM support will be added when sending the patch for
> > > inclusion.)
> > >
> > > Feedback or alternatives are appreciated.
> >
> > This is disgusting and completely wrecks the host scheduling. You're
> > placing guest over host, that's fundamentally wrong.
> >
> > NAK!
> >
> > If you want co-ordinated RT scheduling, look at paravirtualized deadline
> > scheduling.
>
> Peter, I'm not sure what exactly you are thinking of (to solve this
> particular problem with pv deadline scheduling).
>
> Shouldn't it be possible to, through paravirt locks, boost the priority
> of the non-RT vCPU (when locking fails in the -RT vCPU)?

Unfortunately, paravirt locks don't work in this configuration
because sched_yield() doesn't work across scheduling classes (non-RT
vs RT). :-(

-- Suleiman
Peter Zijlstra July 28, 2021, 10:46 a.m. UTC | #4
On Wed, Jul 28, 2021 at 07:32:53AM -0300, Marcelo Tosatti wrote:
> Peter, I'm not sure what exactly you are thinking of (to solve this
> particular problem with pv deadline scheduling).
> 
> Shouldn't it be possible to, through paravirt locks, boost the priority
> of the non-RT vCPU (when locking fails in the -RT vCPU)?

No. Static priority scheduling does not compose. Any scheme that relies
on the guest behaving 'nice' is unacceptable.
Suleiman Souhlal July 30, 2021, 9:09 a.m. UTC | #5
Hi Peter,

On Wed, Jul 28, 2021 at 5:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Jul 28, 2021 at 04:36:58PM +0900, Suleiman Souhlal wrote:
> > Hello,
> >
> > This series attempts to solve some issues that arise from
> > having some VCPUs be real-time while others aren't.
> >
> > We are trying to play media inside a VM on a desktop environment
> > (Chromebooks), which requires us to have some tasks in the guest
> > be serviced at real-time priority on the host so that the media
> > can be played smoothly.
> >
> > To achieve this, we give a VCPU real-time priority on the host
> > and use isolcpus= to ensure that only designated tasks are allowed
> > to run on the RT VCPU.
>
> WTH do you need isolcpus for that? What's wrong with cpusets?

I regret mentioning isolcpus here.
The patchset doesn't dictate how the guest is supposed to use RT.
cpusets also work.

> > In order to avoid priority inversions (for example when the RT
> > VCPU preempts a non-RT VCPU that's holding a lock that it wants to
> > acquire), we dedicate a host core to the RT VCPU: only the RT
> > VCPU is allowed to run on that CPU, while all the other non-RT
> > VCPUs run on all the other host CPUs.
> >
> > This approach works on machines that have a large enough number
> > of CPUs where it's possible to dedicate a whole CPU for this,
> > but we also have machines that only have 2 CPUs and doing this
> > on those is too costly.
> >
> > This patch series makes it possible to have an RT VCPU without
> > having to dedicate a whole host core for it.
> > It does this by making it so that non-RT VCPUs can't be
> > preempted if they are in a critical section, which we
> > approximate as having interrupts disabled or non-zero
> > preempt_count. Once the VCPU is found to not be in a critical
> > section anymore, it will give up the CPU.
> > There are measures to ensure that preemption isn't delayed too
> > many times.
> >
> > (I realize that the hooks in the scheduler aren't very
> > tasteful, but I couldn't figure out a better way.
> > SVM support will be added when sending the patch for
> > inclusion.)
> >
> > Feedback or alternatives are appreciated.
>
> This is disgusting and completely wrecks the host scheduling. You're
> placing guest over host, that's fundamentally wrong.

I understand the sentiment.

For what it's worth, the patchset doesn't completely rely on a
well-behaved guest: It only delays preemption a bounded number of
times, after which it yields back no matter what.

> NAK!
>
> If you want co-ordinated RT scheduling, look at paravirtualized deadline
> scheduling.

Thanks for the suggestion, I will look into it.

-- Suleiman