
[RFC,v3,01/11] KVM: Capture VM start

Message ID 20220104194918.373612-2-rananta@google.com (mailing list archive)
State New, archived
Series KVM: arm64: Add support for hypercall services selection

Commit Message

Raghavendra Rao Ananta Jan. 4, 2022, 7:49 p.m. UTC
Capture the start of the KVM VM, which is basically the
start of any vCPU run. This state of the VM is helpful
in the upcoming patches to prevent user-space from
configuring certain VM features after the VM has started
running.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 include/linux/kvm_host.h | 3 +++
 virt/kvm/kvm_main.c      | 9 +++++++++
 2 files changed, 12 insertions(+)

Comments

Reiji Watanabe Jan. 7, 2022, 6:06 a.m. UTC | #1
Hi Raghu,

On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> Capture the start of the KVM VM, which is basically the
> start of any vCPU run. This state of the VM is helpful
> in the upcoming patches to prevent user-space from
> configuring certain VM features after the VM has started
> running.
>
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  include/linux/kvm_host.h | 3 +++
>  virt/kvm/kvm_main.c      | 9 +++++++++
>  2 files changed, 12 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c310648cc8f1..d0bd8f7a026c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -623,6 +623,7 @@ struct kvm {
>         struct notifier_block pm_notifier;
>  #endif
>         char stats_id[KVM_STATS_NAME_SIZE];
> +       bool vm_started;

Since KVM_RUN on any vCPU doesn't necessarily mean that the VM has
started yet, the name might be a bit misleading IMHO.  I would
think 'has_run_once' or 'ran_once' might be clearer (?).


>  };
>
>  #define kvm_err(fmt, ...) \
> @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
>         }
>  }
>
> +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> +
>  extern bool kvm_rebooting;
>
>  extern unsigned int halt_poll_ns;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 72c4e6b39389..962b91ac2064 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
>         int r;
>         struct kvm_fpu *fpu = NULL;
>         struct kvm_sregs *kvm_sregs = NULL;
> +       struct kvm *kvm = vcpu->kvm;
>
>         if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
>                 return -EIO;
> @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
>                         if (oldpid)
>                                 synchronize_rcu();
>                         put_pid(oldpid);
> +
> +                       /*
> +                        * Since we land here even on the first vCPU run,
> +                        * we can mark that the VM has started running.
> +                        */

It might be nicer to add a comment explaining why the code below gets kvm->lock.

Anyway, the patch generally looks good to me, and thank you
for making this change (it works for my purpose as well).

Reviewed-by: Reiji Watanabe <reijiw@google.com>

Thanks,
Reiji


> +                       mutex_lock(&kvm->lock);
> +                       kvm->vm_started = true;
> +                       mutex_unlock(&kvm->lock);
>                 }
>                 r = kvm_arch_vcpu_ioctl_run(vcpu);
>                 trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> --
> 2.34.1.448.ga2b2bfdf31-goog
>
Raghavendra Rao Ananta Jan. 7, 2022, 11:43 p.m. UTC | #2
Hi Reiji,

On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
>
> Hi Raghu,
>
> On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> <rananta@google.com> wrote:
> >
> > Capture the start of the KVM VM, which is basically the
> > start of any vCPU run. This state of the VM is helpful
> > in the upcoming patches to prevent user-space from
> > configuring certain VM features after the VM has started
> > running.
> >
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > ---
> >  include/linux/kvm_host.h | 3 +++
> >  virt/kvm/kvm_main.c      | 9 +++++++++
> >  2 files changed, 12 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index c310648cc8f1..d0bd8f7a026c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -623,6 +623,7 @@ struct kvm {
> >         struct notifier_block pm_notifier;
> >  #endif
> >         char stats_id[KVM_STATS_NAME_SIZE];
> > +       bool vm_started;
>
> Since KVM_RUN on any vCPU doesn't necessarily mean that the VM has
> started yet, the name might be a bit misleading IMHO.  I would
> think 'has_run_once' or 'ran_once' might be clearer (?).
>
I always struggle with names, but if you feel that 'ran_once'
makes more sense for a reader, I can change it.
>
> >  };
> >
> >  #define kvm_err(fmt, ...) \
> > @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> >         }
> >  }
> >
> > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> > +
> >  extern bool kvm_rebooting;
> >
> >  extern unsigned int halt_poll_ns;
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 72c4e6b39389..962b91ac2064 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> >         int r;
> >         struct kvm_fpu *fpu = NULL;
> >         struct kvm_sregs *kvm_sregs = NULL;
> > +       struct kvm *kvm = vcpu->kvm;
> >
> >         if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> >                 return -EIO;
> > @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> >                         if (oldpid)
> >                                 synchronize_rcu();
> >                         put_pid(oldpid);
> > +
> > +                       /*
> > +                        * Since we land here even on the first vCPU run,
> > +                        * we can mark that the VM has started running.
> > +                        */
>
> It might be nicer to add a comment explaining why the code below gets kvm->lock.
>
I've been going back and forth on this one. Initially I considered
simply going with atomic_t, but the implementation of
kvm_arm_set_fw_reg_bmap() in patch 4/11 (KVM: arm64: Setup a
framework for hypercall bitmap firmware registers) felt like it
needs a lock to make the whole 'is the register busy?' operation
atomic. But that's just one of the applications.
> Anyway, the patch generally looks good to me, and thank you
> for making this change (it works for my purpose as well).
>
> Reviewed-by: Reiji Watanabe <reijiw@google.com>
>
Glad that it's helping you as well and thanks for the review.

Regards,
Raghavendra

> Thanks,
> Reiji
>
>
> > +                       mutex_lock(&kvm->lock);
> > +                       kvm->vm_started = true;
> > +                       mutex_unlock(&kvm->lock);
> >                 }
> >                 r = kvm_arch_vcpu_ioctl_run(vcpu);
> >                 trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> > --
> > 2.34.1.448.ga2b2bfdf31-goog
> >
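To make that motivation concrete, the lock-protected check-and-update
referenced from patch 4/11 would have roughly the following shape (a
hedged sketch: only the function name comes from the thread; the
signature, return code, and body are assumptions):

  static int kvm_arm_set_fw_reg_bmap(struct kvm *kvm, u64 *fw_reg_bmap, u64 val)
  {
  	int ret = 0;

  	/*
  	 * kvm->lock makes the "has the VM started?" check and the register
  	 * update atomic as a pair; a bare atomic flag could not close the
  	 * race between the check and the write.
  	 */
  	mutex_lock(&kvm->lock);
  	if (kvm_vm_has_started(kvm))
  		ret = -EBUSY;	/* register is "busy" once any vCPU has run */
  	else
  		*fw_reg_bmap = val;
  	mutex_unlock(&kvm->lock);

  	return ret;
  }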
Jim Mattson Jan. 8, 2022, 12:04 a.m. UTC | #3
On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> Hi Reiji,
>
> On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> >
> > Hi Raghu,
> >
> > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > <rananta@google.com> wrote:
> > >
> > > Capture the start of the KVM VM, which is basically the
> > > start of any vCPU run. This state of the VM is helpful
> > > in the upcoming patches to prevent user-space from
> > > configuring certain VM features after the VM has started
> > > running.

What about live migration, where the VM has already technically been
started before the first call to KVM_RUN?
Sean Christopherson Jan. 8, 2022, 1:06 a.m. UTC | #4
On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> Capture the start of the KVM VM, which is basically the

Please wrap at ~75 chars.

> start of any vCPU run. This state of the VM is helpful
> in the upcoming patches to prevent user-space from
> configuring certain VM features after the VM has started
> running.

Please provide context of how the flag will be used.  I glanced at the future
patches, and knowing very little about arm, I was unable to glean useful info
about exactly who is being prevented from doing what.

> 
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  include/linux/kvm_host.h | 3 +++
>  virt/kvm/kvm_main.c      | 9 +++++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c310648cc8f1..d0bd8f7a026c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -623,6 +623,7 @@ struct kvm {
>  	struct notifier_block pm_notifier;
>  #endif
>  	char stats_id[KVM_STATS_NAME_SIZE];
> +	bool vm_started;
>  };
>  
>  #define kvm_err(fmt, ...) \
> @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +#define kvm_vm_has_started(kvm) (kvm->vm_started)

Needs parentheses around (kvm), but why bother with a macro?  This is the same
header that defines struct kvm.

> +
>  extern bool kvm_rebooting;
>  
>  extern unsigned int halt_poll_ns;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 72c4e6b39389..962b91ac2064 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  	int r;
>  	struct kvm_fpu *fpu = NULL;
>  	struct kvm_sregs *kvm_sregs = NULL;
> +	struct kvm *kvm = vcpu->kvm;

If you're going to bother grabbing kvm, replace the instances below that also do
vcpu->kvm.

>  
>  	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
>  		return -EIO;
> @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  			if (oldpid)
>  				synchronize_rcu();
>  			put_pid(oldpid);
> +
> +			/*
> +			 * Since we land here even on the first vCPU run,
> +			 * we can mark that the VM has started running.

Please avoid "we", "us", etc.

"vm_started" is also ambiguous.  If we end up with a flag, then I would prefer a
much more literal name, a la created_vcpus, e.g. ran_vcpus or something.

> +			 */
> +			mutex_lock(&kvm->lock);

This adds unnecessary lock contention when running vCPUs.  The naive solution
would be:
			if (!kvm->vm_started) {
				...
			}

> +			kvm->vm_started = true;
> +			mutex_unlock(&kvm->lock);

Lastly, why is this in generic KVM?

>  		}
>  		r = kvm_arch_vcpu_ioctl_run(vcpu);
>  		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> -- 
> 2.34.1.448.ga2b2bfdf31-goog
> 
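Concretely, the check-before-lock shape suggested above would look
something like this in the KVM_RUN path (an illustrative sketch, not
part of the posted patch):

  	if (!kvm->vm_started) {
  		/*
  		 * Only the "first" KVM_RUN for a VM takes kvm->lock; once
  		 * vm_started is observed to be true, subsequent runs skip
  		 * the lock (and the store) entirely.
  		 */
  		mutex_lock(&kvm->lock);
  		kvm->vm_started = true;
  		mutex_unlock(&kvm->lock);
  	}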
Raghavendra Rao Ananta Jan. 10, 2022, 11:07 p.m. UTC | #5
On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> <rananta@google.com> wrote:
> >
> > Hi Reiji,
> >
> > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> > >
> > > Hi Raghu,
> > >
> > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > <rananta@google.com> wrote:
> > > >
> > > > Capture the start of the KVM VM, which is basically the
> > > > start of any vCPU run. This state of the VM is helpful
> > > > in the upcoming patches to prevent user-space from
> > > > configuring certain VM features after the VM has started
> > > > running.
>
> What about live migration, where the VM has already technically been
> started before the first call to KVM_RUN?

My understanding is that a new 'struct kvm' is created on the target
machine and this flag should be reset, which would allow the VMM to
restore the firmware registers. However, we would be running KVM_RUN
for the first time on the target machine, thus setting the flag.
So, you are right; it's more of a resume operation from the guest's
point of view. I guess the name of the variable is what's confusing
here.

Thanks,
Raghavendra
Raghavendra Rao Ananta Jan. 10, 2022, 11:23 p.m. UTC | #6
On Fri, Jan 7, 2022 at 5:06 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> > Capture the start of the KVM VM, which is basically the
>
> Please wrap at ~75 chars.
>
> > start of any vCPU run. This state of the VM is helpful
> > in the upcoming patches to prevent user-space from
> > configuring certain VM features after the VM has started
> > running.
>
> Please provide context of how the flag will be used.  I glanced at the future
> patches, and knowing very little about arm, I was unable to glean useful info
> about exactly who is being prevented from doing what.
>
> >
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > ---
> >  include/linux/kvm_host.h | 3 +++
> >  virt/kvm/kvm_main.c      | 9 +++++++++
> >  2 files changed, 12 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index c310648cc8f1..d0bd8f7a026c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -623,6 +623,7 @@ struct kvm {
> >       struct notifier_block pm_notifier;
> >  #endif
> >       char stats_id[KVM_STATS_NAME_SIZE];
> > +     bool vm_started;
> >  };
> >
> >  #define kvm_err(fmt, ...) \
> > @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> >       }
> >  }
> >
> > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
>
> Needs parentheses around (kvm), but why bother with a macro?  This is the same
> header that defines struct kvm.
>
No specific reason for creating a macro as such. I can remove it if it
feels noisy.
> > +
> >  extern bool kvm_rebooting;
> >
> >  extern unsigned int halt_poll_ns;
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 72c4e6b39389..962b91ac2064 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> >       int r;
> >       struct kvm_fpu *fpu = NULL;
> >       struct kvm_sregs *kvm_sregs = NULL;
> > +     struct kvm *kvm = vcpu->kvm;
>
> If you're going to bother grabbing kvm, replace the instances below that also do
> vcpu->kvm.
>
> >
> >       if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> >               return -EIO;
> > @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> >                       if (oldpid)
> >                               synchronize_rcu();
> >                       put_pid(oldpid);
> > +
> > +                     /*
> > +                      * Since we land here even on the first vCPU run,
> > +                      * we can mark that the VM has started running.
>
> Please avoid "we", "us", etc.
>
> "vm_started" is also ambiguous.  If we end up with a flag, then I would prefer a
> much more literal name, a la created_vcpus, e.g. ran_vcpus or something.
>
> > +                      */
> > +                     mutex_lock(&kvm->lock);
>
> This adds unnecessary lock contention when running vCPUs.  The naive solution
> would be:
>                         if (!kvm->vm_started) {
>                                 ...
>                         }
>
Not sure if I understood the solution...

> > +                     kvm->vm_started = true;
> > +                     mutex_unlock(&kvm->lock);
>
> Lastly, why is this in generic KVM?
>
The v1 of the series originally had it in the arm-specific code.
However, it was suggested that I move it to the generic code since the
bookkeeping is not arch-specific and could be helpful to others too [1].

Thanks for the review. I'll address your other comments as well.

Regards,
Raghavendra

[1]: https://lore.kernel.org/kvmarm/YYMKphExkqttn2w0@google.com/

> >               }
> >               r = kvm_arch_vcpu_ioctl_run(vcpu);
> >               trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> > --
> > 2.34.1.448.ga2b2bfdf31-goog
> >
Jim Mattson Jan. 10, 2022, 11:57 p.m. UTC | #7
On Mon, Jan 10, 2022 at 3:07 PM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <jmattson@google.com> wrote:
> >
> > On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> > <rananta@google.com> wrote:
> > >
> > > Hi Reiji,
> > >
> > > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> > > >
> > > > Hi Raghu,
> > > >
> > > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > > <rananta@google.com> wrote:
> > > > >
> > > > > Capture the start of the KVM VM, which is basically the
> > > > > start of any vCPU run. This state of the VM is helpful
> > > > > in the upcoming patches to prevent user-space from
> > > > > configuring certain VM features after the VM has started
> > > > > running.
> >
> > What about live migration, where the VM has already technically been
> > started before the first call to KVM_RUN?
>
> My understanding is that a new 'struct kvm' is created on the target
> machine and this flag should be reset, which would allow the VMM to
> restore the firmware registers. However, we would be running KVM_RUN
> for the first time on the target machine, thus setting the flag.
> So, you are right; it's more of a resume operation from the guest's
> point of view. I guess the name of the variable is what's confusing
> here.

I was actually thinking that live migration gives userspace an easy
way to circumvent your restriction. You said, "This state of the VM is
helpful in the upcoming patches to prevent user-space from configuring
certain VM features after the VM has started running." However, if you
don't ensure that these VM features are configured the same way on the
target machine as they were on the source machine, you have not
actually accomplished your stated goal.

> Thanks,
> Raghavendra
Reiji Watanabe Jan. 11, 2022, 12:03 a.m. UTC | #8
On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> Hi Reiji,
>
> On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> >
> > Hi Raghu,
> >
> > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > <rananta@google.com> wrote:
> > >
> > > Capture the start of the KVM VM, which is basically the
> > > start of any vCPU run. This state of the VM is helpful
> > > in the upcoming patches to prevent user-space from
> > > configuring certain VM features after the VM has started
> > > running.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > > ---
> > >  include/linux/kvm_host.h | 3 +++
> > >  virt/kvm/kvm_main.c      | 9 +++++++++
> > >  2 files changed, 12 insertions(+)
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index c310648cc8f1..d0bd8f7a026c 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -623,6 +623,7 @@ struct kvm {
> > >         struct notifier_block pm_notifier;
> > >  #endif
> > >         char stats_id[KVM_STATS_NAME_SIZE];
> > > +       bool vm_started;
> >
> > Since KVM_RUN on any vCPU doesn't necessarily mean that the VM has
> > started yet, the name might be a bit misleading IMHO.  I would
> > think 'has_run_once' or 'ran_once' might be clearer (?).
> >
> I always struggle with names, but if you feel that 'ran_once'
> makes more sense for a reader, I can change it.

I would prefer 'ran_once'.


> > >  };
> > >
> > >  #define kvm_err(fmt, ...) \
> > > @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> > >         }
> > >  }
> > >
> > > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> > > +
> > >  extern bool kvm_rebooting;
> > >
> > >  extern unsigned int halt_poll_ns;
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 72c4e6b39389..962b91ac2064 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > >         int r;
> > >         struct kvm_fpu *fpu = NULL;
> > >         struct kvm_sregs *kvm_sregs = NULL;
> > > +       struct kvm *kvm = vcpu->kvm;
> > >
> > >         if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> > >                 return -EIO;
> > > @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > >                         if (oldpid)
> > >                                 synchronize_rcu();
> > >                         put_pid(oldpid);
> > > +
> > > +                       /*
> > > +                        * Since we land here even on the first vCPU run,
> > > +                        * we can mark that the VM has started running.
> > > +                        */
> >
> > It might be nicer to add a comment explaining why the code below gets kvm->lock.
> >
> I've been going back and forth on this one. Initially I considered
> simply going with atomic_t, but the implementation of
> kvm_arm_set_fw_reg_bmap() in patch 4/11 (KVM: arm64: Setup a
> framework for hypercall bitmap firmware registers) felt like it
> needs a lock to make the whole 'is the register busy?' operation
> atomic. But that's just one of the applications.

I understand why you need the code to get the lock here with the
current implementation.
But, since the code just sets the one field (vm_started) with the lock,
I thought the intention of getting the lock might not be so obvious.
(But, maybe it's clear enough looking at the code in patch 4.)

Thanks,
Reiji


> > Anyway, the patch generally looks good to me, and thank you
> > for making this change (it works for my purpose as well).
> >
> > Reviewed-by: Reiji Watanabe <reijiw@google.com>
> >
> Glad that it's helping you as well and thanks for the review.
>
> Regards,
> Raghavendra
>
> > Thanks,
> > Reiji
> >
> >
> > > +                       mutex_lock(&kvm->lock);
> > > +                       kvm->vm_started = true;
> > > +                       mutex_unlock(&kvm->lock);
> > >                 }
> > >                 r = kvm_arch_vcpu_ioctl_run(vcpu);
> > >                 trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> > > --
> > > 2.34.1.448.ga2b2bfdf31-goog
> > >
Sean Christopherson Jan. 11, 2022, 5:36 p.m. UTC | #9
On Mon, Jan 10, 2022, Raghavendra Rao Ananta wrote:
> On Fri, Jan 7, 2022 at 5:06 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> > > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> >
> > Needs parentheses around (kvm), but why bother with a macro?  This is the same
> > header that defines struct kvm.
> >
> No specific reason for creating a macro as such. I can remove it if it
> feels noisy.

Please do.  In the future, don't use a macro unless there's a good reason to do
so.  Don't get me wrong, I love abusing macros, but for things like this they are
completely inferior to

  static inline bool kvm_vm_has_started(struct kvm *kvm)
  {
  	return kvm->vm_started;
  }

because a helper function gives us type safety, doesn't suffer from concatenation
of tokens potentially doing weird things, is easier to extend to a multi-line
implementation, etc...

An example of when it's ok to use a macro is x86's

  #define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)

which uses a macro instead of a proper function to avoid a circular dependency
due to arch/x86/include/asm/kvm_host.h being included by include/linux/kvm_host.h
and thus x86's implementation of kvm_arch_vcpu_memslots_id() coming before the
definition of struct kvm_vcpu.  But that's very much an exception and done only
because the alternatives suck more.

> > > +                      */
> > > +                     mutex_lock(&kvm->lock);
> >
> > This adds unnecessary lock contention when running vCPUs.  The naive solution
> > would be:
> >                         if (!kvm->vm_started) {
> >                                 ...
> >                         }
> >
> Not sure if I understood the solution...

In your proposed patch, KVM_RUN will take kvm->lock _every_ time.  That introduces
unnecessary contention as it will serialize this bit of code if multiple vCPUs
are attempting KVM_RUN.  By checking !vm_started, only the "first" KVM_RUN for a
VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
There's still a possibility that multiple vCPUs will contend for kvm->lock on their
first KVM_RUN, hence the quotes.  I called it "naive" because it's possible there's
a more elegant solution depending on the use case, e.g. a lockless approach might
work (or it might not).

> > > +                     kvm->vm_started = true;
> > > +                     mutex_unlock(&kvm->lock);
> >
> > Lastly, why is this in generic KVM?
> >
> The v1 of the series originally had it in the arm-specific code.
> However, it was suggested that I move it to the generic code since the
> bookkeeping is not arch-specific and could be helpful to others too [1].

I'm definitely in favor of moving/adding thing to generic KVM when it makes sense,
but I'm skeptical in this particular case.  The code _is_ arch specific in that
arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
versus a hypothetical x86 use case that might be completely ok with a lockless
implementation.  And it's not obvious that there's a plausible, safe use case
outside of arm64, e.g. on x86, there is very, very little that is truly shared
across the entire VM/system, most things are per-thread/core/package in some way,
shape, or form.  In other words, I'm a wary of providing something like this for
x86 because odds are good that any use will be functionally incorrect.
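For reference, one possible reading of the "lockless approach"
mentioned above is an acquire/release pairing (illustrative only;
whether it is actually safe depends entirely on what the flag guards):

  	/*
  	 * Writer, in the first-run path of kvm_vcpu_ioctl(): publish the
  	 * flag with release semantics so earlier setup is visible to any
  	 * reader that observes vm_started == true.
  	 */
  	smp_store_release(&kvm->vm_started, true);

  	/* Reader, e.g. in a register-update ioctl: pair with acquire. */
  	if (smp_load_acquire(&kvm->vm_started))
  		return -EBUSY;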
Raghavendra Rao Ananta Jan. 11, 2022, 6:46 p.m. UTC | #10
On Tue, Jan 11, 2022 at 9:36 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jan 10, 2022, Raghavendra Rao Ananta wrote:
> > On Fri, Jan 7, 2022 at 5:06 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> > > > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> > >
> > > Needs parentheses around (kvm), but why bother with a macro?  This is the same
> > > header that defines struct kvm.
> > >
> > No specific reason for creating a macro as such. I can remove it if it
> > feels noisy.
>
> Please do.  In the future, don't use a macro unless there's a good reason to do
> so.  Don't get me wrong, I love abusing macros, but for things like this they are
> completely inferior to
>
>   static inline bool kvm_vm_has_started(struct kvm *kvm)
>   {
>         return kvm->vm_started;
>   }
>
> because a helper function gives us type safety, doesn't suffer from concatenation
> of tokens potentially doing weird things, is easier to extend to a multi-line
> implementation, etc...
>
> An example of when it's ok to use a macro is x86's
>
>   #define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
>
> which uses a macro instead of a proper function to avoid a circular dependency
> due to arch/x86/include/asm/kvm_host.h being included by include/linux/kvm_host.h
> and thus x86's implementation of kvm_arch_vcpu_memslots_id() coming before the
> definition of struct kvm_vcpu.  But that's very much an exception and done only
> because the alternatives suck more.
>
Understood. Thanks for the explanation! Will switch to an inline function.

> > > > +                      */
> > > > +                     mutex_lock(&kvm->lock);
> > >
> > > This adds unnecessary lock contention when running vCPUs.  The naive solution
> > > would be:
> > >                         if (!kvm->vm_started) {
> > >                                 ...
> > >                         }
> > >
> > Not sure if I understood the solution...
>
> In your proposed patch, KVM_RUN will take kvm->lock _every_ time.  That introduces
> unnecessary contention as it will serialize this bit of code if multiple vCPUs
> are attempting KVM_RUN.  By checking !vm_started, only the "first" KVM_RUN for a
> VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
> There's still a possibility that multiple vCPUs will contend for kvm->lock on their
> first KVM_RUN, hence the quotes.  I called it "naive" because it's possible there's
> a more elegant solution depending on the use case, e.g. a lockless approach might
> work (or it might not).
>
But is it safe to read kvm->vm_started without grabbing the lock in
the first place? Maybe use atomic_t for this?

> > > > +                     kvm->vm_started = true;
> > > > +                     mutex_unlock(&kvm->lock);
> > >
> > > Lastly, why is this in generic KVM?
> > >
> > The v1 of the series originally had it in the arm-specific code.
> > However, it was suggested that I move it to the generic code since the
> > bookkeeping is not arch-specific and could be helpful to others too [1].
>
> I'm definitely in favor of moving/adding thing to generic KVM when it makes sense,
> but I'm skeptical in this particular case.  The code _is_ arch specific in that
> arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
> versus a hypothetical x86 use case that might be completely ok with a lockless
> implementation.  And it's not obvious that there's a plausible, safe use case
> outside of arm64, e.g. on x86, there is very, very little that is truly shared
> across the entire VM/system, most things are per-thread/core/package in some way,
> shape, or form.  In other words, I'm a wary of providing something like this for
> x86 because odds are good that any use will be functionally incorrect.
I've been going back and forth on this. I've seen a couple of
variables declared in the generic struct and used only in the arch
code. vcpu->valid_wakeup, for instance, is used only by the s390
arch. Maybe I'm looking at it the wrong way as to what can and can't
go in the generic kvm code.

Thanks,
Raghavendra
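For comparison, the atomic_t variant floated here would look roughly
like the following (hypothetical sketch; Sean addresses its
suitability in a later reply):

  	/* In struct kvm, replacing the bool: */
  	atomic_t vm_started;

  	/* First KVM_RUN: the flag update itself becomes race-free. */
  	atomic_set(&kvm->vm_started, 1);

  	/*
  	 * But a reader still cannot check the flag atomically with any
  	 * *other* state the flag is meant to guard, which is the point
  	 * of contention in this thread.
  	 */
  	if (atomic_read(&kvm->vm_started))
  		return -EBUSY;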
Raghavendra Rao Ananta Jan. 11, 2022, 6:52 p.m. UTC | #11
On Mon, Jan 10, 2022 at 3:57 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Mon, Jan 10, 2022 at 3:07 PM Raghavendra Rao Ananta
> <rananta@google.com> wrote:
> >
> > On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <jmattson@google.com> wrote:
> > >
> > > On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> > > <rananta@google.com> wrote:
> > > >
> > > > Hi Reiji,
> > > >
> > > > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> > > > >
> > > > > Hi Raghu,
> > > > >
> > > > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > > > <rananta@google.com> wrote:
> > > > > >
> > > > > > Capture the start of the KVM VM, which is basically the
> > > > > > start of any vCPU run. This state of the VM is helpful
> > > > > > in the upcoming patches to prevent user-space from
> > > > > > configuring certain VM features after the VM has started
> > > > > > running.
> > >
> > > What about live migration, where the VM has already technically been
> > > started before the first call to KVM_RUN?
> >
> > My understanding is that a new 'struct kvm' is created on the target
> > machine and this flag should be reset, which would allow the VMM to
> > restore the firmware registers. However, we would be running KVM_RUN
> > for the first time on the target machine, thus setting the flag.
> > So, you are right; it's more of a resume operation from the guest's
> > point of view. I guess the name of the variable is what's confusing
> > here.
>
> I was actually thinking that live migration gives userspace an easy
> way to circumvent your restriction. You said, "This state of the VM is
> helpful in the upcoming patches to prevent user-space from configuring
> certain VM features after the VM has started running." However, if you
> don't ensure that these VM features are configured the same way on the
> target machine as they were on the source machine, you have not
> actually accomplished your stated goal.
>
Isn't that up to the VMM to save/restore and validate the registers
across migrations?
Perhaps I have to re-word my intentions for the patch: userspace
should be able to configure the registers before issuing the first
KVM_RUN.

Thanks,
Raghavendra

> > Thanks,
> > Raghavendra
Raghavendra Rao Ananta Jan. 11, 2022, 6:54 p.m. UTC | #12
On Mon, Jan 10, 2022 at 4:04 PM Reiji Watanabe <reijiw@google.com> wrote:
>
> On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> <rananta@google.com> wrote:
> >
> > Hi Reiji,
> >
> > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> > >
> > > Hi Raghu,
> > >
> > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > <rananta@google.com> wrote:
> > > >
> > > > Capture the start of the KVM VM, which is basically the
> > > > start of any vCPU run. This state of the VM is helpful
> > > > in the upcoming patches to prevent user-space from
> > > > configuring certain VM features after the VM has started
> > > > running.
> > > >
> > > > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > > > ---
> > > >  include/linux/kvm_host.h | 3 +++
> > > >  virt/kvm/kvm_main.c      | 9 +++++++++
> > > >  2 files changed, 12 insertions(+)
> > > >
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index c310648cc8f1..d0bd8f7a026c 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -623,6 +623,7 @@ struct kvm {
> > > >         struct notifier_block pm_notifier;
> > > >  #endif
> > > >         char stats_id[KVM_STATS_NAME_SIZE];
> > > > +       bool vm_started;
> > >
> > > Since KVM_RUN on any vCPU doesn't necessarily mean that the VM has
> > > started yet, the name might be a bit misleading IMHO.  I would
> > > think 'has_run_once' or 'ran_once' might be clearer (?).
> > >
> > I always struggle with names, but if you feel that 'ran_once'
> > makes more sense for a reader, I can change it.
>
> I would prefer 'ran_once'.
>
Yes, that makes sense. I think the name created a lot of confusion for
the patch.

Thanks,
Raghavendra
>
> > > >  };
> > > >
> > > >  #define kvm_err(fmt, ...) \
> > > > @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> > > >         }
> > > >  }
> > > >
> > > > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> > > > +
> > > >  extern bool kvm_rebooting;
> > > >
> > > >  extern unsigned int halt_poll_ns;
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 72c4e6b39389..962b91ac2064 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > > >         int r;
> > > >         struct kvm_fpu *fpu = NULL;
> > > >         struct kvm_sregs *kvm_sregs = NULL;
> > > > +       struct kvm *kvm = vcpu->kvm;
> > > >
> > > >         if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> > > >                 return -EIO;
> > > > @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > > >                         if (oldpid)
> > > >                                 synchronize_rcu();
> > > >                         put_pid(oldpid);
> > > > +
> > > > +                       /*
> > > > +                        * Since we land here even on the first vCPU run,
> > > > +                        * we can mark that the VM has started running.
> > > > +                        */
> > >
> > > It might be nicer to add a comment explaining why the code below gets kvm->lock.
> > >
> > I've been going back and forth on this one. Initially I considered
> > simply going with atomic_t, but the implementation of
> > kvm_arm_set_fw_reg_bmap() in patch 4/11 (KVM: arm64: Setup a
> > framework for hypercall bitmap firmware registers) felt like it
> > needs a lock to make the whole 'is the register busy?' operation
> > atomic. But that's just one of the applications.
>
> I understand why you need the code to get the lock here with the
> current implementation.
> But, since the code just sets the one field (vm_started) with the lock,
> I thought the intention of getting the lock might not be so obvious.
> (But, maybe it's clear enough looking at the code in patch 4.)
>
> Thanks,
> Reiji
>
>
> > > Anyway, the patch generally looks good to me, and thank you
> > > for making this change (it works for my purpose as well).
> > >
> > > Reviewed-by: Reiji Watanabe <reijiw@google.com>
> > >
> > Glad that it's helping you as well and thanks for the review.
> >
> > Regards,
> > Raghavendra
> >
> > > Thanks,
> > > Reiji
> > >
> > >
> > > > +                       mutex_lock(&kvm->lock);
> > > > +                       kvm->vm_started = true;
> > > > +                       mutex_unlock(&kvm->lock);
> > > >                 }
> > > >                 r = kvm_arch_vcpu_ioctl_run(vcpu);
> > > >                 trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> > > > --
> > > > 2.34.1.448.ga2b2bfdf31-goog
> > > >
Sean Christopherson Jan. 11, 2022, 7:04 p.m. UTC | #13
On Tue, Jan 11, 2022, Raghavendra Rao Ananta wrote:
> On Tue, Jan 11, 2022 at 9:36 AM Sean Christopherson <seanjc@google.com> wrote:
> > In your proposed patch, KVM_RUN will take kvm->lock _every_ time.  That introduces
> > unnecessary contention as it will serialize this bit of code if multiple vCPUs
> > are attempting KVM_RUN.  By checking !vm_started, only the "first" KVM_RUN for a
> > VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
> > There's still a possibility that multiple vCPUs will contend for kvm->lock on their
> > first KVM_RUN, hence the quotes.  I called it "naive" because it's possible there's
> > a more elegant solution depending on the use case, e.g. a lockless approach might
> > work (or it might not).
> >
> But is it safe to read kvm->vm_started without grabbing the lock in
> the first place?

Don't know, but that's my point.  Without a consumer in generic KVM, and given
my lack of arm64 knowledge and the absence of a high-level description of how
the flag will be used by arm64, it's really difficult to determine what's safe
and what's not.  For other architectures, it's an impossible question to answer
because we don't know how the flag might be used.

> Maybe use atomic_t for this?

No.  An atomic_t is generally useful only if there are multiple writers that can
possibly write different values.  It's highly unlikely that simply switching to an
atomic_t will address the needs of arm64.

> > > > > +                     kvm->vm_started = true;
> > > > > +                     mutex_unlock(&kvm->lock);
> > > >
> > > > Lastly, why is this in generic KVM?
> > > >
> > > The v1 of the series originally had it in the arm-specific code.
> > > However, it was suggested that I move it to the generic code since the
> > > bookkeeping is not arch-specific and could be helpful to others too [1].
> >
> > I'm definitely in favor of moving/adding thing to generic KVM when it makes sense,
> > but I'm skeptical in this particular case.  The code _is_ arch specific in that
> > arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
> > versus a hypothetical x86 use case that might be completely ok with a lockless
> > implementation.  And it's not obvious that there's a plausible, safe use case
> > outside of arm64, e.g. on x86, there is very, very little that is truly shared
> > across the entire VM/system, most things are per-thread/core/package in some way,
> > shape, or form.  In other words, I'm a wary of providing something like this for
> > x86 because odds are good that any use will be functionally incorrect.
> I've been going back and forth on this. I've seen a couple of
> variables declared in the generic struct and used only in the arch
> code. vcpu->valid_wakeup, for instance, is used only by the s390
> arch. Maybe I'm looking at it the wrong way as to what can and can't
> go in the generic kvm code.

Ya, valid_wakeup is an oddball; I don't know why it's in kvm_vcpu instead of
arch code that's wrapped with e.g. kvm_arch_vcpu_valid_wakeup().

That said, valid_wakeup is consumed by generic KVM, i.e. has well defined semantics
for how it is used, so it's purely a "this code is rather odd" issue.  vm_started
on the other hand is only produced by generic KVM, and so its required semantics are
unclear.
Jim Mattson Jan. 11, 2022, 7:16 p.m. UTC | #14
On Tue, Jan 11, 2022 at 10:52 AM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> On Mon, Jan 10, 2022 at 3:57 PM Jim Mattson <jmattson@google.com> wrote:
> >
> > On Mon, Jan 10, 2022 at 3:07 PM Raghavendra Rao Ananta
> > <rananta@google.com> wrote:
> > >
> > > On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <jmattson@google.com> wrote:
> > > >
> > > > On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> > > > <rananta@google.com> wrote:
> > > > >
> > > > > Hi Reiji,
> > > > >
> > > > > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> > > > > >
> > > > > > Hi Raghu,
> > > > > >
> > > > > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > > > > <rananta@google.com> wrote:
> > > > > > >
> > > > > > > Capture the start of the KVM VM, which is basically the
> > > > > > > start of any vCPU run. This state of the VM is helpful
> > > > > > > in the upcoming patches to prevent user-space from
> > > > > > > configuring certain VM features after the VM has started
> > > > > > > running.
> > > >
> > > > What about live migration, where the VM has already technically been
> > > > started before the first call to KVM_RUN?
> > >
> > > My understanding is that a new 'struct kvm' is created on the target
> > > machine and this flag should be reset, which would allow the VMM to
> > > restore the firmware registers. However, we would be running KVM_RUN
> > > for the first time on the target machine, thus setting the flag.
> > > So, you are right; it's more of a resume operation from the guest's
> > > point of view. I guess the name of the variable is what's confusing
> > > here.
> >
> > I was actually thinking that live migration gives userspace an easy
> > way to circumvent your restriction. You said, "This state of the VM is
> > helpful in the upcoming patches to prevent user-space from configuring
> > certain VM features after the VM has started running." However, if you
> > don't ensure that these VM features are configured the same way on the
> > target machine as they were on the source machine, you have not
> > actually accomplished your stated goal.
> >
> Isn't that up to the VMM to save/restore and validate the registers
> across migrations?

Yes, just as it is up to userspace not to make bad configuration
changes after the first VMRUN.

> Perhaps I have to re-word my intentions for the patch: userspace
> should be able to configure the registers before issuing the first
> KVM_RUN.

Perhaps it would help if you explained *why* you are doing this. It
sounds like you are either trying to protect against a malicious
userspace, or you are trying to keep userspace from doing something
stupid. In general, kvm only enforces constraints that are necessary
to protect the host. If that's what you're doing, I don't understand
why live migration doesn't provide an end-run around your protections.
Raghavendra Rao Ananta Jan. 12, 2022, 6:08 p.m. UTC | #15
On Tue, Jan 11, 2022 at 11:04 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jan 11, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 9:36 AM Sean Christopherson <seanjc@google.com> wrote:
> > > In your proposed patch, KVM_RUN will take kvm->lock _every_ time.  That introduces
> > > unnecessary contention as it will serialize this bit of code if multiple vCPUs
> > > are attempting KVM_RUN.  By checking !vm_started, only the "first" KVM_RUN for a
> > > VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
> > > There's still a possibility that multiple vCPUs will contend for kvm->lock on their
> > > first KVM_RUN, hence the quotes.  I called it "naive" because it's possible there's
> > > a more elegant solution depending on the use case, e.g. a lockless approach might
> > > work (or it might not).
> > >
> > But is it safe to read kvm->vm_started without grabbing the lock in
> > the first place?
>
> Don't know, but that's my point.  Without a consumer in generic KVM, and given
> my lack of arm64 knowledge and the absence of a high-level description of how
> the flag will be used by arm64, it's really difficult to determine what's safe
> and what's not.  For other architectures, it's an impossible question to answer
> because we don't know how the flag might be used.
>
> > Maybe use atomic_t for this?
>
> No.  An atomic_t is generally useful only if there are multiple writers that can
> possibly write different values.  It's highly unlikely that simply switching to an
> atomic_t will address the needs of arm64.
>
> > > > > > +                     kvm->vm_started = true;
> > > > > > +                     mutex_unlock(&kvm->lock);
> > > > >
> > > > > Lastly, why is this in generic KVM?
> > > > >
> > > > The v1 of the series originally had it in the arm-specific code.
> > > > However, it was suggested that I move it to the generic code since the
> > > > bookkeeping is not arch-specific and could be helpful to others too [1].
> > >
> > > I'm definitely in favor of moving/adding thing to generic KVM when it makes sense,
> > > but I'm skeptical in this particular case.  The code _is_ arch specific in that
> > > arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
> > > versus a hypothetical x86 use case that might be completely ok with a lockless
> > > implementation.  And it's not obvious that there's a plausible, safe use case
> > > outside of arm64, e.g. on x86, there is very, very little that is truly shared
> > > across the entire VM/system, most things are per-thread/core/package in some way,
> > > shape, or form.  In other words, I'm a wary of providing something like this for
> > > x86 because odds are good that any use will be functionally incorrect.
> > I've been going back and forth on this. I've seen a couple of
> > variables declared in the generic struct and used only in the arch
> > code. vcpu->valid_wakeup, for instance, is used only by the s390
> > arch. Maybe I'm looking at it the wrong way as to what can and can't
> > go in the generic kvm code.
>
> Ya, valid_wakeup is an oddball; I don't know why it's in kvm_vcpu instead of
> arch code that's wrapped with e.g. kvm_arch_vcpu_valid_wakeup().
>
> That said, valid_wakeup is consumed by generic KVM, i.e. has well defined semantics
> for how it is used, so it's purely a "this code is rather odd" issue.  vm_started
> on the other hand is only produced by generic KVM, and so its required semantics are
> unclear.

Understood. I'll move it to arm64 and we can refactor it if there's a
need for any other arch(s).

Thanks,
Raghavendra
Sean Christopherson Jan. 12, 2022, 6:24 p.m. UTC | #16
On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> Understood. I'll move it to arm64 and we can refactor it if there's a
> need for any other arch(s).

Before you do that, can you answer Jim's question on _why_ arm64 needs this?
Raghavendra Rao Ananta Jan. 12, 2022, 6:29 p.m. UTC | #17
On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote:
>
> On Tue, Jan 11, 2022 at 10:52 AM Raghavendra Rao Ananta
> <rananta@google.com> wrote:
> >
> > On Mon, Jan 10, 2022 at 3:57 PM Jim Mattson <jmattson@google.com> wrote:
> > >
> > > On Mon, Jan 10, 2022 at 3:07 PM Raghavendra Rao Ananta
> > > <rananta@google.com> wrote:
> > > >
> > > > On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <jmattson@google.com> wrote:
> > > > >
> > > > > On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> > > > > <rananta@google.com> wrote:
> > > > > >
> > > > > > Hi Reiji,
> > > > > >
> > > > > > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> > > > > > >
> > > > > > > Hi Raghu,
> > > > > > >
> > > > > > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > > > > > <rananta@google.com> wrote:
> > > > > > > >
> > > > > > > > Capture the start of the KVM VM, which is basically the
> > > > > > > > start of any vCPU run. This state of the VM is helpful
> > > > > > > > in the upcoming patches to prevent user-space from
> > > > > > > > configuring certain VM features after the VM has started
> > > > > > > > running.
> > > > >
> > > > > What about live migration, where the VM has already technically been
> > > > > started before the first call to KVM_RUN?
> > > >
> > > > My understanding is that a new 'struct kvm' is created on the target
> > > > machine and this flag should be reset, which would allow the VMM to
> > > > restore the firmware registers. However, we would be running KVM_RUN
> > > > for the first time on the target machine, thus setting the flag.
> > > > So, you are right; it's more of a resume operation from the guest's
> > > > point of view. I guess the name of the variable is what's confusing
> > > > here.
> > >
> > > I was actually thinking that live migration gives userspace an easy
> > > way to circumvent your restriction. You said, "This state of the VM is
> > > helpful in the upcoming patches to prevent user-space from configuring
> > > certain VM features after the VM has started running." However, if you
> > > don't ensure that these VM features are configured the same way on the
> > > target machine as they were on the source machine, you have not
> > > actually accomplished your stated goal.
> > >
> > Isn't that up to the VMM to save/restore and validate the registers
> > across migrations?
>
> Yes, just as it is up to userspace not to make bad configuration
> changes after the first VMRUN.
>
> > Perhaps I have to re-word my intentions for the patch: userspace
> > should be able to configure the registers before issuing the first
> > KVM_RUN.
>
> Perhaps it would help if you explained *why* you are doing this. It
> sounds like you are either trying to protect against a malicious
> userspace, or you are trying to keep userspace from doing something
> stupid. In general, kvm only enforces constraints that are necessary
> to protect the host. If that's what you're doing, I don't understand
> why live migration doesn't provide an end-run around your protections.
It's mainly to safeguard the guests. With respect to migration, KVM
and the userspace are collectively playing a role here. It's up to the
userspace to ensure that the registers are configured the same across
migrations, and KVM ensures that the userspace doesn't modify the
registers after KVM_RUN so that the guest doesn't see features turned
OFF/ON during execution. I'm not sure if it falls into the definition
of protecting the host. Do you see value in adding this extra
protection from KVM?

Regards,
Raghavendra
Sean Christopherson Jan. 12, 2022, 6:31 p.m. UTC | #18
On Wed, Jan 12, 2022, Sean Christopherson wrote:
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > Understood. I'll move it to arm64 and we can refactor it if there's a
> > need for any other arch(s).
> 
> Before you do that, can you answer Jim's question on _why_ arm64 needs this?

Gah, sorry, kvmarm@lists.cs.columbia.edu once again lost the spam battle with Gmail.
Sean Christopherson Jan. 13, 2022, 5:21 p.m. UTC | #19
On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote:
> > Perhaps it would help if you explained *why* you are doing this. It
> > sounds like you are either trying to protect against a malicious
> > userspace, or you are trying to keep userspace from doing something
> > stupid. In general, kvm only enforces constraints that are necessary
> > to protect the host. If that's what you're doing, I don't understand
> > why live migration doesn't provide an end-run around your protections.
> It's mainly to safeguard the guests. With respect to migration, KVM
> and the userspace are collectively playing a role here. It's up to the
> userspace to ensure that the registers are configured the same across
> migrations, and KVM ensures that the userspace doesn't modify the
> registers after KVM_RUN so that the guest doesn't see features turned
> OFF/ON during execution. I'm not sure if it falls into the definition
> of protecting the host. Do you see value in adding this extra
> protection from KVM?

Short answer: probably not?

There is precedent for disallowing userspace from doing stupid things, but that's
either for KVM's protection (as Jim pointed out), or because KVM can't honor the
change, e.g. x86 is currently in the process of disallowing most CPUID changes
after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't
support updating some of its own internal state (because removing features like
GB hugepage support is nonsensical and would require a large pile of complicated,
messy code).

Restricting CPUID changes does offer some "protection" to the guest, but that's
not the goal.  E.g. KVM won't detect CPUID misconfiguration in the migration
case, and trying to do so is a fool's errand.

If restricting updates on arm64 is necessary to ensure KVM provides sane
behavior, then it could be justified.  But if it's purely a sanity check on
behalf of the guest, then it's not justified.
Raghavendra Rao Ananta Jan. 14, 2022, 12:42 a.m. UTC | #20
On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote:
> > > Perhaps it would help if you explained *why* you are doing this. It
> > > sounds like you are either trying to protect against a malicious
> > > userspace, or you are trying to keep userspace from doing something
> > > stupid. In general, kvm only enforces constraints that are necessary
> > > to protect the host. If that's what you're doing, I don't understand
> > > why live migration doesn't provide an end-run around your protections.
> > It's mainly to safeguard the guests. With respect to migration, KVM
> > and the userspace are collectively playing a role here. It's up to the
> > userspace to ensure that the registers are configured the same across
> > migrations, and KVM ensures that the userspace doesn't modify the
> > registers after KVM_RUN so that the guest doesn't see features turned
> > OFF/ON during execution. I'm not sure if it falls into the definition
> > of protecting the host. Do you see value in adding this extra
> > protection from KVM?
>
> Short answer: probably not?
>
> There is precedent for disallowing userspace from doing stupid things, but that's
> either for KVM's protection (as Jim pointed out), or because KVM can't honor the
> change, e.g. x86 is currently in the process of disallowing most CPUID changes
> after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't
> support updating some of its own internal state (because removing features like
> GB hugepage support is nonsensical and would require a large pile of complicated,
> messy code).
>
> Restricting CPUID changes does offer some "protection" to the guest, but that's
> not the goal.  E.g. KVM won't detect CPUID misconfiguration in the migration
> case, and trying to do so is a fool's errand.
>
> If restricting updates on arm64 is necessary to ensure KVM provides sane
> behavior, then it could be justified.  But if it's purely a sanity check on
> behalf of the guest, then it's not justified.
Agreed that KVM doesn't really safeguard the guests, but just curious,
is there really a downside to adding this thin layer of safety checks?
On the bright side, the guests would be safe, and it could save
developers some time hunting down bugs in this path, no?

Regards,
Raghavendra
Sean Christopherson Jan. 14, 2022, 1:10 a.m. UTC | #21
On Thu, Jan 13, 2022, Raghavendra Rao Ananta wrote:
> On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > If restricting updates on arm64 is necessary to ensure KVM provides sane
> > behavior, then it could be justified.  But if it's purely a sanity check on
> > behalf of the guest, then it's not justified.
> Agreed that KVM doesn't really safeguard the guests, but just curious,
> is there really a downside to adding this thin layer of safety checks?

It's more stuff that KVM has to maintain, creates an ABI that KVM must adhere to,
potentially creates inconsistencies in KVM, and prevents using KVM to intentionally
do stupid things to test scenarios that are "impossible".  And we also try to avoid
defining arbitrary CPU behavior in KVM (that may not be the case here).

> On the bright side, the guests would be safe, and it could save the
> developers some time in hunting down the bugs in this path, no?

Yes, but that can be said for lots and lots of things.  This is both a slippery
slope argument and the inconsistency argument above, e.g. if KVM actively prevents
userspace from doing X, why doesn't KVM prevent userspace from doing Y?  Having a
decently defined rule for these types of things, e.g. protect KVM/kernel and adhere
to the architecture but otherwise let userspace do whatever, avoids spending too
much time arguing over what KVM should/shouldn't allow, or wondering why on earth
KVM does XYZ, at least in theory :-)

There are certainly times where KVM could have saved userspace some pain, but
overall I do think KVM is better off staying out of the way when possible.
Reiji Watanabe Jan. 14, 2022, 9:51 p.m. UTC | #22
On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote:
> > > Perhaps it would help if you explained *why* you are doing this. It
> > > sounds like you are either trying to protect against a malicious
> > > userspace, or you are trying to keep userspace from doing something
> > > stupid. In general, kvm only enforces constraints that are necessary
> > > to protect the host. If that's what you're doing, I don't understand
> > > why live migration doesn't provide an end-run around your protections.
> > It's mainly to safeguard the guests. With respect to migration, KVM
> > and the userspace are collectively playing a role here. It's up to the
> > userspace to ensure that the registers are configured the same across
> > migrations and KVM ensures that the userspace doesn't modify the
> > registers after KVM_RUN so that they don't see features turned OFF/ON
> > during execution. I'm not sure if it falls into the definition of
> > protecting the host. Do you see a value in adding this extra
> > protection from KVM?
>
> Short answer: probably not?
>
> There is precedent for disallowing userspace from doing stupid things, but that's
> either for KVM's protection (as Jim pointed out), or because KVM can't honor the
> change, e.g. x86 is currently in the process of disallowing most CPUID changes
> after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't
> support updating some of its own internal state (because removing features like
> GB hugepage support is nonsensical and would require a large pile of complicated,
> messy code).
>
> Restricting CPUID changes does offer some "protection" to the guest, but that's
> not the goal.  E.g. KVM won't detect CPUID misconfiguration in the migration
> case, and trying to do so is a fool's errand.
>
> If restricting updates on arm64 is necessary to ensure KVM provides sane
> behavior, then it could be justified.  But if it's purely a sanity check on
> behalf of the guest, then it's not justified.

The pseudo firmware hvc registers, which this series is adding, are
used by KVM to identify available hvc features for the guest, and not
directly exposed to the guest as registers.
The ways the KVM code in the series consumes the registers' values are
very limited, and no KVM data/state is created based on their values.
But, as the code that consumes the registers grows in the future,
I wouldn't be surprised if KVM consumes them differently than it does
now (e.g. create another data structure based on the register values).
I'm not sure though :)

The restriction, which means KVM doesn't need to worry about changes
to the registers after KVM_RUN, could potentially protect KVM and
simplify future changes/maintenance of the KVM code that consumes
the values.
I thought this was one of the reasons for having the restriction.

Thanks,
Reiji
Raghavendra Rao Ananta Jan. 18, 2022, 10:54 p.m. UTC | #23
On Fri, Jan 14, 2022 at 1:51 PM Reiji Watanabe <reijiw@google.com> wrote:
>
> On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote:
> > > > Perhaps it would help if you explained *why* you are doing this. It
> > > > sounds like you are either trying to protect against a malicious
> > > > userspace, or you are trying to keep userspace from doing something
> > > > stupid. In general, kvm only enforces constraints that are necessary
> > > > to protect the host. If that's what you're doing, I don't understand
> > > > why live migration doesn't provide an end-run around your protections.
> > > It's mainly to safeguard the guests. With respect to migration, KVM
> > > and the userspace are collectively playing a role here. It's up to the
> > > userspace to ensure that the registers are configured the same across
> > > migrations and KVM ensures that the userspace doesn't modify the
> > > registers after KVM_RUN so that they don't see features turned OFF/ON
> > > during execution. I'm not sure if it falls into the definition of
> > > protecting the host. Do you see a value in adding this extra
> > > protection from KVM?
> >
> > Short answer: probably not?
> >
> > There is precedent for disallowing userspace from doing stupid things, but that's
> > either for KVM's protection (as Jim pointed out), or because KVM can't honor the
> > change, e.g. x86 is currently in the process of disallowing most CPUID changes
> > after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't
> > support updating some of its own internal state (because removing features like
> > GB hugepage support is nonsensical and would require a large pile of complicated,
> > messy code).
> >
> > Restricting CPUID changes does offer some "protection" to the guest, but that's
> > not the goal.  E.g. KVM won't detect CPUID misconfiguration in the migration
> > case, and trying to do so is a fool's errand.
> >
> > If restricting updates on arm64 is necessary to ensure KVM provides sane
> > behavior, then it could be justified.  But if it's purely a sanity check on
> > behalf of the guest, then it's not justified.
>
> The pseudo firmware hvc registers, which this series is adding, are
> used by KVM to identify available hvc features for the guest, and not
> directly exposed to the guest as registers.
> The ways the KVM code in the series consumes the registers' values are
> very limited, and no KVM data/state is created based on their values.
> But, as the code that consumes the registers grows in the future,
> I wouldn't be surprised if KVM consumes them differently than it does
> now (e.g. create another data structure based on the register values).
> I'm not sure though :)
>
> The restriction, which means KVM doesn't need to worry about changes
> to the registers after KVM_RUN, could potentially protect KVM and
> simplify future changes/maintenance of the KVM code that consumes
> the values.
> I thought this was one of the reasons for having the restriction.
>
Well, that wasn't the original intention of the patch; it was just to
protect the guests from userspace's dynamic updates. Having said
that, and based on what Sean mentioned in his last reply, it could be
inconsistent with what KVM has been doing so far, and it would be
difficult to cover all the scenarios in which userspace can mess
things up for guests.
I'll plan to drop this patch in the next version and bring it back
to arm64 if we really need it.

Thanks Sean, Jim, and Reiji for the comments and discussion.

Regards,
Raghavendra
> Thanks,
> Reiji
Sean Christopherson Jan. 19, 2022, 12:07 a.m. UTC | #24
On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> The restriction, which means KVM doesn't need to worry about changes
> to the registers after KVM_RUN, could potentially protect KVM and
> simplify future changes/maintenance of the KVM code that consumes
> the values.

That sort of protection is definitely welcome; the previously mentioned CPUID mess
on x86 would have benefited greatly from KVM being restrictive in the past.  That said,
hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
e.g. if there are side effects that are visible in other registers, then an update
can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
inconsistent.

If possible, preventing modification if kvm->created_vcpus > 0 is ideal as it's
a relatively common pattern in KVM, and provides a clear boundary to userspace
regarding what is/isn't allowed.
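
For illustration, the created_vcpus pattern Sean describes might look
like this minimal kernel-style sketch when written as a VM-scoped
setter; the helper and the field it writes are made up, and only the
locking and the created_vcpus check mirror the existing KVM convention:

/*
 * Hypothetical helper: reject a VM-wide configuration update once any
 * vCPU has been created. kvm->created_vcpus is written under kvm->lock
 * in kvm_vm_ioctl_create_vcpu(), so taking kvm->lock here keeps the
 * check race-free.
 */
static int kvm_vm_set_example_attr(struct kvm *kvm, u64 val)
{
	int r = -EBUSY;

	mutex_lock(&kvm->lock);
	if (!kvm->created_vcpus) {
		kvm->example_attr = val;	/* made-up field */
		r = 0;
	}
	mutex_unlock(&kvm->lock);
	return r;
}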
Reiji Watanabe Jan. 19, 2022, 7:47 a.m. UTC | #25
On Tue, Jan 18, 2022 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > The restriction, which means KVM doesn't need to worry about changes
> > to the registers after KVM_RUN, could potentially protect KVM and
> > simplify future changes/maintenance of the KVM code that consumes
> > the values.
>
> That sort of protection is definitely welcome; the previously mentioned CPUID mess
> on x86 would have benefited greatly from KVM being restrictive in the past.  That said,
> hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
> e.g. if there are side effects that are visible in other registers, then an update
> can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> inconsistent.

Thank you for your comments !
I think I understand your concern, and that's a great point.
That's not the case for those pseudo registers though at least for now :)
BTW, is this concern specific to hooking KVM_RUN ? (Wouldn't it be the
same for the option with "if kvm->created_vcpus > 0" ?)


> If possible, preventing modification if kvm->created_vcpus > 0 is ideal as it's
> a relatively common pattern in KVM, and provides a clear boundary to userspace
> regarding what is/isn't allowed.

Yes, I agree that would be better in general.  For (pseudo) registers,
I would think preventing modification if kvm->created_vcpus > 0 might
not be a very good option for KVM/ARM though considering usage of
KVM_GET_REG_LIST and KVM_{G,S}ET_ONE_REG.

Thanks,
Reiji
Sean Christopherson Jan. 20, 2022, 12:27 a.m. UTC | #26
On Tue, Jan 18, 2022, Reiji Watanabe wrote:
> On Tue, Jan 18, 2022 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > > The restriction, which means KVM doesn't need to worry about changes
> > > to the registers after KVM_RUN, could potentially protect KVM and
> > > simplify future changes/maintenance of the KVM code that consumes
> > > the values.
> >
> > That sort of protection is definitely welcome; the previously mentioned CPUID mess
> > on x86 would have benefited greatly from KVM being restrictive in the past.  That said,
> > hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> > Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> > all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
> > e.g. if there are side effects that are visible in other registers, then an update
> > can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> > modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> > inconsistent.
> 
> Thank you for your comments !
> I think I understand your concern, and that's a great point.
> That's not the case for those pseudo registers though at least for now :)
> BTW, is this concern specific to hooking KVM_RUN ? (Wouldn't it be the
> same for the option with "if kvm->created_vcpus > 0" ?)

Not really?  The goal with created_vcpus is to avoid having inconsistent state in
"struct kvm_vcpu" with respect to the VM as whole.  "struct kvm" obvioulsy can't
be inconsistent with itself, e.g. even if userspace consumes some side effect,
that's simply "the state".  Did that make sense?  Hard to explain in writing :-)

> > If possible, preventing modification if kvm->created_vcpus > 0 is ideal as it's
> > a relatively common pattern in KVM, and provides a clear boundary to userspace
> > regarding what is/isn't allowed.
> 
> Yes, I agree that would be better in general.  For (pseudo) registers,

What exactly are these pseudo registers?  If it's something that's an immutable
property of the (virtual) system, then it might make sense to use a separate,
non-vCPU mechanism for setting/getting their values.  Then you can easily restrict
the <whatever> to pre-created_vcpus, e.g. see x86's KVM_SET_IDENTITY_MAP_ADDR.
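
For reference, the x86 handler Sean points to sits in
kvm_arch_vm_ioctl()'s switch and follows exactly this shape;
the snippet below is paraphrased from the v5.16-era
arch/x86/kvm/x86.c, so error codes and label names may differ
slightly from any given tree:

	case KVM_SET_IDENTITY_MAP_ADDR: {
		u64 ident_addr;

		mutex_lock(&kvm->lock);
		r = -EINVAL;
		/* The address becomes immutable once a vCPU exists. */
		if (kvm->created_vcpus)
			goto set_identity_unlock;
		r = -EFAULT;
		if (copy_from_user(&ident_addr, argp, sizeof(ident_addr)))
			goto set_identity_unlock;
		r = kvm_vm_ioctl_set_identity_map_addr(kvm, ident_addr);
set_identity_unlock:
		mutex_unlock(&kvm->lock);
		break;
	}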

> I would think preventing modification if kvm->created_vcpus > 0 might
> not be a very good option for KVM/ARM though considering usage of
> KVM_GET_REG_LIST and KVM_{G,S}ET_ONE_REG.
Raghavendra Rao Ananta Jan. 20, 2022, 7:16 p.m. UTC | #27
On Wed, Jan 19, 2022 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jan 18, 2022, Reiji Watanabe wrote:
> > On Tue, Jan 18, 2022 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > > > The restriction, which means KVM doesn't need to worry about changes
> > > > to the registers after KVM_RUN, could potentially protect KVM and
> > > > simplify future changes/maintenance of the KVM code that consumes
> > > > the values.
> > >
> > > That sort of protection is definitely welcome; the previously mentioned CPUID mess
> > > on x86 would have benefited greatly from KVM being restrictive in the past.  That said,
> > > hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> > > Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> > > all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
> > > e.g. if there are side effects that are visible in other registers, then an update
> > > can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> > > modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> > > inconsistent.
> >
> > Thank you for your comments !
> > I think I understand your concern, and that's a great point.
> > That's not the case for those pseudo registers though at least for now :)
> > BTW, is this concern specific to hooking KVM_RUN ? (Wouldn't it be the
> > same for the option with "if kvm->created_vcpus > 0" ?)
>
> Not really?  The goal with created_vcpus is to avoid having inconsistent state in
> "struct kvm_vcpu" with respect to the VM as whole.  "struct kvm" obvioulsy can't
> be inconsistent with itself, e.g. even if userspace consumes some side effect,
> that's simply "the state".  Did that make sense?  Hard to explain in writing :-)
>
> > > If possible, preventing modification if kvm->created_vcpus > 0 is ideal as it's
> > > a relatively common pattern in KVM, and provides a clear boundary to userspace
> > > regarding what is/isn't allowed.
> >
> > Yes, I agree that would be better in general.  For (pseudo) registers,
>
> What exactly are these pseudo registers?  If it's something that's an immutable
> property of the (virtual) system, then it might make sense to use a separate,
> non-vCPU mechanism for setting/getting their values.  Then you can easily restrict
> the <whatever> to pre-created_vcpus, e.g. see x86's KVM_SET_IDENTITY_MAP_ADDR.
>
In general, these pseudo-registers are reserved non-architectural
register spaces, currently being used to represent KVM-as-a-firmware's
versioning across guests' migrations [1]. That is, the user-space
configures these registers so that the guests see the same 'firmware'
version before and after a migration. The model is built over the
existing KVM_GET_REG_LIST and KVM_[SET|GET]_ONE_REG APIs. Since this
series' efforts fall into the same realm, the idea was to keep this
consistent with the existing model that VMMs (such as QEMU) are
already used to.
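
To make that model concrete, here is a minimal userspace sketch of
carrying one of the existing firmware pseudo-registers,
KVM_REG_ARM_PSCI_VERSION (KVM_REG_ARM_FW_REG(0)), across a migration
with the ONE_REG API. The function and fd names are illustrative, and
a real VMM iterates over KVM_GET_REG_LIST rather than hard-coding a
single register:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>	/* pulls in the arm64 KVM_REG_ARM_* definitions */

/*
 * Sketch: read the PSCI 'firmware' version register from a source vCPU
 * and write it to a destination vCPU before KVM_RUN, so the guest sees
 * the same version before and after the migration.
 */
static int migrate_psci_version(int src_vcpu_fd, int dst_vcpu_fd)
{
	uint64_t val;
	struct kvm_one_reg reg = {
		.id   = KVM_REG_ARM_PSCI_VERSION,
		.addr = (uint64_t)&val,
	};

	if (ioctl(src_vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
		return -1;
	return ioctl(dst_vcpu_fd, KVM_SET_ONE_REG, &reg);
}

The hypercall bitmap registers proposed in this series are meant to
flow through this same save/restore path.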

Granted, even though these registers should technically be immutable,
no similar protection was employed for the existing
pseudo-registers. I was wondering if there could be any value in
starting to provide one; but I guess not, since it may break
user-space's expectations of KVM (which is probably why we didn't
have it earlier).

Regards,
Raghavendra

[1]: https://github.com/torvalds/linux/blob/master/Documentation/virt/kvm/arm/psci.rst

> > I would think preventing modification if kvm->created_vcpus > 0 might
> > not be a very good option for KVM/ARM though considering usage of
> > KVM_GET_REG_LIST and KVM_{G,S}ET_ONE_REG.
Marc Zyngier Jan. 25, 2022, 3:10 p.m. UTC | #28
It's probably time I jump on this thread

On Thu, 13 Jan 2022 17:21:19 +0000,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote:
> > > Perhaps it would help if you explained *why* you are doing this. It
> > > sounds like you are either trying to protect against a malicious
> > > userspace, or you are trying to keep userspace from doing something
> > > stupid. In general, kvm only enforces constraints that are necessary
> > > to protect the host. If that's what you're doing, I don't understand
> > > why live migration doesn't provide an end-run around your protections.
> > It's mainly to safeguard the guests. With respect to migration, KVM
> > and the userspace are collectively playing a role here. It's up to the
> > userspace to ensure that the registers are configured the same across
> > migrations and KVM ensures that the userspace doesn't modify the
> > registers after KVM_RUN so that they don't see features turned OFF/ON
> > during execution. I'm not sure if it falls into the definition of
> > protecting the host. Do you see a value in adding this extra
> > protection from KVM?
> 
> Short answer: probably not?

Well, I beg to differ.

> There is precedent for disallowing userspace from doing stupid
> things, but that's either for KVM's protection (as Jim pointed out),
> or because KVM can't honor the change, e.g. x86 is currently in the
> process of disallowing most CPUID changes after KVM_RUN because KVM
> itself consumes the CPUID information and KVM doesn't support
> updating some of its own internal state (because removing features
> like GB hugepage support is nonsensical and would require a large
> pile of complicated, messy code).

We provide quite a lot of CPU emulation based on the feature
registers exposed to the guest. Allowing userspace to change this
stuff once the guest is running is a massive attack vector.

> Restricting CPUID changes does offer some "protection" to the guest, but that's
> not the goal.  E.g. KVM won't detect CPUID misconfiguration in the migration
> case, and trying to do so is a fool's errand.
> 
> If restricting updates on arm64 is necessary to ensure KVM provides sane
> behavior, then it could be justified.  But if it's purely a sanity check on
> behalf of the guest, then it's not justified.

No. This is about preventing userspace from tripping the kernel into
doing things it really shouldn't by flipping bits of configuration
that should be set in stone.

	M.
Marc Zyngier Jan. 25, 2022, 3:15 p.m. UTC | #29
On Wed, 19 Jan 2022 00:07:44 +0000,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > The restriction, which means KVM doesn't need to worry about changes
> > to the registers after KVM_RUN, could potentially protect KVM and
> > simplify future changes/maintenance of the KVM code that consumes
> > the values.
> 
> That sort of protection is definitely welcome; the previously mentioned CPUID mess
> on x86 would have benefited greatly from KVM being restrictive in the past.  That said,
> hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
> e.g. if there are side effects that are visible in other registers, then an update
> can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> inconsistent.
> 
> If possible, preventing modification if kvm->created_vcpus > 0 is
> ideal as it's a relatively common pattern in KVM, and provides a
> clear boundary to userspace regarding what is/isn't allowed.

No, that's way too late. The configuration is in general per-CPU, and
I really don't want to expand the surface of the userspace API to
allow all sorts of magic tricks depending on the nature of what you
save/restore.

The "first run" crap is already there. We have it on a per-CPU basis,
and we need it at the VM level for other reasons (see the recent
discussion about PMU filtering vs binding to a specific PMU
implementation).

	M.
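
To make Marc's point concrete, a VM-level first-run gate on the
firmware bitmap registers might look like the sketch below. The
'ran_once' and 'fw_reg_bmap' fields and the helper name are made up
for illustration; IIRC the version of this series that eventually
landed upstream gates writes on a similar VM-wide
KVM_ARCH_FLAG_HAS_RAN_ONCE flag, though the details differ:

/*
 * Illustrative sketch only: refuse firmware pseudo-register updates
 * once any vCPU has entered the guest. Both fields below are
 * hypothetical stand-ins.
 */
static int example_set_fw_bitmap(struct kvm *kvm, u64 val)
{
	int r = -EBUSY;

	mutex_lock(&kvm->lock);
	if (!kvm->arch.ran_once) {	/* set on the first vCPU run */
		kvm->arch.fw_reg_bmap = val;
		r = 0;
	}
	mutex_unlock(&kvm->lock);
	return r;
}
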
diff mbox series

Patch

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c310648cc8f1..d0bd8f7a026c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -623,6 +623,7 @@  struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
+	bool vm_started;
 };
 
 #define kvm_err(fmt, ...) \
@@ -1666,6 +1667,8 @@  static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
 	}
 }
 
+#define kvm_vm_has_started(kvm) (kvm->vm_started)
+
 extern bool kvm_rebooting;
 
 extern unsigned int halt_poll_ns;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 72c4e6b39389..962b91ac2064 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3686,6 +3686,7 @@  static long kvm_vcpu_ioctl(struct file *filp,
 	int r;
 	struct kvm_fpu *fpu = NULL;
 	struct kvm_sregs *kvm_sregs = NULL;
+	struct kvm *kvm = vcpu->kvm;
 
 	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
 		return -EIO;
@@ -3723,6 +3724,14 @@  static long kvm_vcpu_ioctl(struct file *filp,
 			if (oldpid)
 				synchronize_rcu();
 			put_pid(oldpid);
+
+			/*
+			 * Since we land here even on the first vCPU run,
+			 * we can mark that the VM has started running.
+			 */
+			mutex_lock(&kvm->lock);
+			kvm->vm_started = true;
+			mutex_unlock(&kvm->lock);
 		}
 		r = kvm_arch_vcpu_ioctl_run(vcpu);
 		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);