| Message ID | 20231214024727.3503870-9-vineeth@bitbyteword.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Dynamic vcpu priority management in kvm |
On Wed, Dec 13 2023 at 21:47, Vineeth Pillai (Google) wrote:
> The host proactively boosts the VCPU threads during irq/nmi injection.
> However, the host is unaware of posted interrupts, and therefore, the
> guest should request a boost if it has not already been boosted.
>
> Similarly, guest should request an unboost on irq/nmi/softirq exit if
> the vcpu doesn't need the boost any more.

That's giving a hint but no context for someone who is not familiar with the problem being solved here.

> @@ -327,6 +327,13 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
> 		.exit_rcu = false,
> 	};
>
> +#ifdef CONFIG_PARAVIRT_SCHED
> +	instrumentation_begin();

Slapping instrumentation_begin() at it silences the objtool checker, but that does not make it correct in any way. You _cannot_ call random code _before_ the kernel has established context. It's clearly documented:

  https://www.kernel.org/doc/html/latest/core-api/entry.html

No?

> +	if (pv_sched_enabled())
> +		pv_sched_boost_vcpu_lazy();
> +	instrumentation_end();
> +#endif
> +
> 	if (user_mode(regs)) {
> 		irqentry_enter_from_user_mode(regs);
> 		return ret;
> @@ -452,6 +459,18 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
> 		if (state.exit_rcu)
> 			ct_irq_exit();
> 	}
> +
> +#ifdef CONFIG_PARAVIRT_SCHED
> +	instrumentation_begin();

Broken too.

> +	/*
> +	 * On irq exit, request a deboost from hypervisor if no softirq pending
> +	 * and current task is not RT and !need_resched.
> +	 */
> +	if (pv_sched_enabled() && !local_softirq_pending() &&
> +	    !need_resched() && !task_is_realtime(current))
> +		pv_sched_unboost_vcpu();
> +	instrumentation_end();
> +#endif
> }
>
> irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
> @@ -469,6 +488,11 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
> 	kmsan_unpoison_entry_regs(regs);
> 	trace_hardirqs_off_finish();
> 	ftrace_nmi_enter();
> +
> +#ifdef CONFIG_PARAVIRT_SCHED
> +	if (pv_sched_enabled())
> +		pv_sched_boost_vcpu_lazy();
> +#endif
> 	instrumentation_end();
>
> 	return irq_state;
> @@ -482,6 +506,12 @@ void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
> 		trace_hardirqs_on_prepare();
> 		lockdep_hardirqs_on_prepare();
> 	}
> +
> +#ifdef CONFIG_PARAVIRT_SCHED
> +	if (pv_sched_enabled() && !in_hardirq() && !local_softirq_pending() &&
> +	    !need_resched() && !task_is_realtime(current))
> +		pv_sched_unboost_vcpu();
> +#endif

Symmetry is overrated. Just pick a spot and slap your hackery in.

Aside of that this whole #ifdeffery is tasteless at best.

> 	instrumentation_end();

> +#ifdef CONFIG_PARAVIRT_SCHED
> +	if (pv_sched_enabled())
> +		pv_sched_boost_vcpu_lazy();
> +#endif

But what's worse is that this is just another approach of sprinkling random hints all over the place and see what sticks.

Abusing KVM as the man in the middle to communicate between guest and host scheduler is creative, but ill defined.

From the host scheduler POV the vCPU is just a user space thread and making the guest special is just the wrong approach.

The kernel has no proper general interface to convey information from a user space task to the scheduler.

So instead of adding some ad-hoc KVM hackery the right thing is to solve this problem from ground up in a generic way and KVM can just utilize that instead of having the special snow-flake hackery which is just a maintainability nightmare.

Thanks,

        tglx
[...snip..]

> > +#ifdef CONFIG_PARAVIRT_SCHED
> > +	if (pv_sched_enabled() && !in_hardirq() && !local_softirq_pending() &&
> > +	    !need_resched() && !task_is_realtime(current))
> > +		pv_sched_unboost_vcpu();
> > +#endif
>
> Symmetry is overrated. Just pick a spot and slap your hackery in.
>
> Aside of that this whole #ifdeffery is tasteless at best.
>
> > 	instrumentation_end();
>
> > +#ifdef CONFIG_PARAVIRT_SCHED
> > +	if (pv_sched_enabled())
> > +		pv_sched_boost_vcpu_lazy();
> > +#endif
>
> But what's worse is that this is just another approach of sprinkling
> random hints all over the place and see what sticks.

Understood. Will have a full rewrite of the guest side logic for v2.

> Abusing KVM as the man in the middle to communicate between guest and
> host scheduler is creative, but ill defined.
>
> From the host scheduler POV the vCPU is just a user space thread and
> making the guest special is just the wrong approach.
>
> The kernel has no proper general interface to convey information from a
> user space task to the scheduler.
>
> So instead of adding some ad-hoc KVM hackery the right thing is to solve
> this problem from ground up in a generic way and KVM can just utilize
> that instead of having the special snow-flake hackery which is just a
> maintainability nightmare.

We had a constructive and productive discussion on the cover letter thread and are reaching a consensus on the kvm side of the design. Will work towards that and post iterative revisions.

Thanks,
Vineeth
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index fae56faac0b0..c69912b71725 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -327,6 +327,13 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 		.exit_rcu = false,
 	};
 
+#ifdef CONFIG_PARAVIRT_SCHED
+	instrumentation_begin();
+	if (pv_sched_enabled())
+		pv_sched_boost_vcpu_lazy();
+	instrumentation_end();
+#endif
+
 	if (user_mode(regs)) {
 		irqentry_enter_from_user_mode(regs);
 		return ret;
@@ -452,6 +459,18 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 		if (state.exit_rcu)
 			ct_irq_exit();
 	}
+
+#ifdef CONFIG_PARAVIRT_SCHED
+	instrumentation_begin();
+	/*
+	 * On irq exit, request a deboost from hypervisor if no softirq pending
+	 * and current task is not RT and !need_resched.
+	 */
+	if (pv_sched_enabled() && !local_softirq_pending() &&
+	    !need_resched() && !task_is_realtime(current))
+		pv_sched_unboost_vcpu();
+	instrumentation_end();
+#endif
 }
 
 irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
@@ -469,6 +488,11 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 	kmsan_unpoison_entry_regs(regs);
 	trace_hardirqs_off_finish();
 	ftrace_nmi_enter();
+
+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled())
+		pv_sched_boost_vcpu_lazy();
+#endif
 	instrumentation_end();
 
 	return irq_state;
@@ -482,6 +506,12 @@ void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
 		trace_hardirqs_on_prepare();
 		lockdep_hardirqs_on_prepare();
 	}
+
+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled() && !in_hardirq() && !local_softirq_pending() &&
+	    !need_resched() && !task_is_realtime(current))
+		pv_sched_unboost_vcpu();
+#endif
 	instrumentation_end();
 
 	ct_nmi_exit();
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 807b34ccd797..90a127615e16 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -530,6 +530,11 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	in_hardirq = lockdep_softirq_start();
 	account_softirq_enter(current);
 
+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled())
+		pv_sched_boost_vcpu_lazy();
+#endif
+
 restart:
 	/* Reset the pending bitmask before enabling irqs */
 	set_softirq_pending(0);
@@ -577,6 +582,12 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 		wakeup_softirqd();
 	}
 
+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled() && !need_resched() &&
+	    !task_is_realtime(current))
+		pv_sched_unboost_vcpu();
+#endif
+
 	account_softirq_exit(current);
 	lockdep_softirq_end(in_hardirq);
 	softirq_handle_end();