Message ID | 20220104194918.373612-2-rananta@google.com (mailing list archive)
---|---
State | New, archived
Series | KVM: arm64: Add support for hypercall services selection
Hi Raghu,

On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> Capture the start of the KVM VM, which is basically the
> start of any vCPU run. This state of the VM is helpful
> in the upcoming patches to prevent user-space from
> configuring certain VM features after the VM has started
> running.
>
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  include/linux/kvm_host.h | 3 +++
>  virt/kvm/kvm_main.c      | 9 +++++++++
>  2 files changed, 12 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c310648cc8f1..d0bd8f7a026c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -623,6 +623,7 @@ struct kvm {
>         struct notifier_block pm_notifier;
>  #endif
>         char stats_id[KVM_STATS_NAME_SIZE];
> +       bool vm_started;

Since KVM_RUN on any vCPUs doesn't necessarily mean that the VM
started yet, the name might be a bit misleading IMHO. I would
think 'has_run_once' or 'ran_once' might be more clear (?).

>  };
>
>  #define kvm_err(fmt, ...) \
> @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
>         }
>  }
>
> +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> +
>  extern bool kvm_rebooting;
>
>  extern unsigned int halt_poll_ns;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 72c4e6b39389..962b91ac2064 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
>         int r;
>         struct kvm_fpu *fpu = NULL;
>         struct kvm_sregs *kvm_sregs = NULL;
> +       struct kvm *kvm = vcpu->kvm;
>
>         if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
>                 return -EIO;
> @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
>                 if (oldpid)
>                         synchronize_rcu();
>                 put_pid(oldpid);
> +
> +               /*
> +                * Since we land here even on the first vCPU run,
> +                * we can mark that the VM has started running.
> +                */

It might be nicer to add a comment why the code below gets kvm->lock.

Anyway, the patch generally looks good to me, and thank you
for making this change (it works for my purpose as well).

Reviewed-by: Reiji Watanabe <reijiw@google.com>

Thanks,
Reiji

> +               mutex_lock(&kvm->lock);
> +               kvm->vm_started = true;
> +               mutex_unlock(&kvm->lock);
>         }
>         r = kvm_arch_vcpu_ioctl_run(vcpu);
>         trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> --
> 2.34.1.448.ga2b2bfdf31-goog
>
Hi Reiji,

On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
>
> Hi Raghu,
>
> On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> <rananta@google.com> wrote:
> >
> > Capture the start of the KVM VM, which is basically the
> > start of any vCPU run. This state of the VM is helpful
> > in the upcoming patches to prevent user-space from
> > configuring certain VM features after the VM has started
> > running.
> >
> > [...]
> >
> > +       bool vm_started;
>
> Since KVM_RUN on any vCPUs doesn't necessarily mean that the VM
> started yet, the name might be a bit misleading IMHO. I would
> think 'has_run_once' or 'ran_once' might be more clear (?).
>
I always struggle with the names; but if you feel that 'ran_once'
makes more sense for a reader, I can change it.

> > [...]
> >
> > +               /*
> > +                * Since we land here even on the first vCPU run,
> > +                * we can mark that the VM has started running.
> > +                */
>
> It might be nicer to add a comment why the code below gets kvm->lock.
>
I've been going back and forth on this one. Initially I considered
simply going with atomic_t, but the patch 4/11 (KVM: arm64: Setup a
framework for hypercall bitmap firmware registers)
kvm_arm_set_fw_reg_bmap()'s implementation felt like we need a lock to
have the whole 'is the register busy?' operation atomic. But, that's
just one of the applications.

> Anyway, the patch generally looks good to me, and thank you
> for making this change (it works for my purpose as well).
>
> Reviewed-by: Reiji Watanabe <reijiw@google.com>
>
Glad that it's helping you as well and thanks for the review.

Regards,
Raghavendra
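For context, a rough sketch of the set-path being described here. The name
kvm_arm_set_fw_reg_bmap() comes from patch 4/11 of the series, but its body is
not shown anywhere in this thread, so the shape below is an assumption, not the
actual code:

        /*
         * Hypothetical sketch (not the series' actual code): the check
         * "has any vCPU run?" and the bitmap update must form one atomic
         * unit, which kvm->lock provides naturally -- an atomic_t flag
         * alone would still leave a window between the check and the write.
         */
        static int kvm_arm_set_fw_reg_bmap(struct kvm *kvm, u64 *fw_reg_bmap, u64 val)
        {
                int ret = 0;

                mutex_lock(&kvm->lock);

                /* The register is "busy" once the VM has started running. */
                if (kvm_vm_has_started(kvm))
                        ret = -EBUSY;
                else
                        *fw_reg_bmap = val;

                mutex_unlock(&kvm->lock);
                return ret;
        }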
On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> Hi Reiji,
>
> On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> >
> > Hi Raghu,
> >
> > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > <rananta@google.com> wrote:
> > >
> > > Capture the start of the KVM VM, which is basically the
> > > start of any vCPU run. This state of the VM is helpful
> > > in the upcoming patches to prevent user-space from
> > > configuring certain VM features after the VM has started
> > > running.

What about live migration, where the VM has already technically been
started before the first call to KVM_RUN?
On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> Capture the start of the KVM VM, which is basically the

Please wrap at ~75 chars.

> start of any vCPU run. This state of the VM is helpful
> in the upcoming patches to prevent user-space from
> configuring certain VM features after the VM has started
> running.

Please provide context of how the flag will be used. I glanced at the future
patches, and knowing very little about arm, I was unable to glean useful info
about exactly who is being prevented from doing what.

>
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  include/linux/kvm_host.h | 3 +++
>  virt/kvm/kvm_main.c      | 9 +++++++++
>  2 files changed, 12 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c310648cc8f1..d0bd8f7a026c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -623,6 +623,7 @@ struct kvm {
>         struct notifier_block pm_notifier;
>  #endif
>         char stats_id[KVM_STATS_NAME_SIZE];
> +       bool vm_started;
>  };
>
>  #define kvm_err(fmt, ...) \
> @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
>         }
>  }
>
> +#define kvm_vm_has_started(kvm) (kvm->vm_started)

Needs parentheses around (kvm), but why bother with a macro? This is the same
header that defines struct kvm.

> +
>  extern bool kvm_rebooting;
>
>  extern unsigned int halt_poll_ns;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 72c4e6b39389..962b91ac2064 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
>         int r;
>         struct kvm_fpu *fpu = NULL;
>         struct kvm_sregs *kvm_sregs = NULL;
> +       struct kvm *kvm = vcpu->kvm;

If you're going to bother grabbing kvm, replace the instances below that also do
vcpu->kvm.

>
>         if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
>                 return -EIO;
> @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
>                 if (oldpid)
>                         synchronize_rcu();
>                 put_pid(oldpid);
> +
> +               /*
> +                * Since we land here even on the first vCPU run,
> +                * we can mark that the VM has started running.

Please avoid "we", "us", etc..

"vm_started" is also ambiguous. If we end up with a flag, then I would prefer a
much more literal name, a la created_vcpus, e.g. ran_vcpus or something.

> +                */
> +               mutex_lock(&kvm->lock);

This adds unnecessary lock contention when running vCPUs. The naive solution
would be:

        if (!kvm->vm_started) {
                ...
        }

> +               kvm->vm_started = true;
> +               mutex_unlock(&kvm->lock);

Lastly, why is this in generic KVM?

>         }
>         r = kvm_arch_vcpu_ioctl_run(vcpu);
>         trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> --
> 2.34.1.448.ga2b2bfdf31-goog
>
On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> <rananta@google.com> wrote:
> >
> > [...]
> > > > Capture the start of the KVM VM, which is basically the
> > > > start of any vCPU run. This state of the VM is helpful
> > > > in the upcoming patches to prevent user-space from
> > > > configuring certain VM features after the VM has started
> > > > running.
>
> What about live migration, where the VM has already technically been
> started before the first call to KVM_RUN?

My understanding is that a new 'struct kvm' is created on the target
machine and this flag should be reset, which would allow the VMM to
restore the firmware registers. However, we would be running KVM_RUN
for the first time on the target machine, thus setting the flag.
So, you are right; it's more of a resume operation from the guest's
point of view. I guess the name of the variable is what's confusing
here.

Thanks,
Raghavendra
On Fri, Jan 7, 2022 at 5:06 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> > Capture the start of the KVM VM, which is basically the
>
> Please wrap at ~75 chars.
>
> > start of any vCPU run. This state of the VM is helpful
> > in the upcoming patches to prevent user-space from
> > configuring certain VM features after the VM has started
> > running.
>
> Please provide context of how the flag will be used. I glanced at the future
> patches, and knowing very little about arm, I was unable to glean useful info
> about exactly who is being prevented from doing what.
>
> > [...]
> > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
>
> Needs parentheses around (kvm), but why bother with a macro? This is the same
> header that defines struct kvm.
>
No specific reason for creating a macro as such. I can remove it if it
feels noisy.

> > [...]
> > +               mutex_lock(&kvm->lock);
>
> This adds unnecessary lock contention when running vCPUs. The naive solution
> would be:
>
>         if (!kvm->vm_started) {
>                 ...
>         }
>
Not sure if I understood the solution..

> > +               kvm->vm_started = true;
> > +               mutex_unlock(&kvm->lock);
>
> Lastly, why is this in generic KVM?
>
The v1 of the series originally had it in the arm specific code.
However, I was suggested to move it to the generic code since the
bookkeeping is not arch specific and could be helpful to others too [1].

Thanks for the review. I'll address your other comments as well.

Regards,
Raghavendra

[1]: https://lore.kernel.org/kvmarm/YYMKphExkqttn2w0@google.com/
On Mon, Jan 10, 2022 at 3:07 PM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <jmattson@google.com> wrote:
> >
> > [...]
> > What about live migration, where the VM has already technically been
> > started before the first call to KVM_RUN?
>
> My understanding is that a new 'struct kvm' is created on the target
> machine and this flag should be reset, which would allow the VMM to
> restore the firmware registers. However, we would be running KVM_RUN
> for the first time on the target machine, thus setting the flag.
> So, you are right; it's more of a resume operation from the guest's
> point of view. I guess the name of the variable is what's confusing
> here.

I was actually thinking that live migration gives userspace an easy
way to circumvent your restriction. You said, "This state of the VM is
helpful in the upcoming patches to prevent user-space from configuring
certain VM features after the VM has started running." However, if you
don't ensure that these VM features are configured the same way on the
target machine as they were on the source machine, you have not
actually accomplished your stated goal.

> Thanks,
> Raghavendra
On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> Hi Reiji,
>
> On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <reijiw@google.com> wrote:
> >
> > [...]
> > Since KVM_RUN on any vCPUs doesn't necessarily mean that the VM
> > started yet, the name might be a bit misleading IMHO. I would
> > think 'has_run_once' or 'ran_once' might be more clear (?).
> >
> I always struggle with the names; but if you feel that 'ran_once'
> makes more sense for a reader, I can change it.

I would prefer 'ran_once'.

> > [...]
> > It might be nicer to add a comment why the code below gets kvm->lock.
> >
> I've been going back and forth on this one. Initially I considered
> simply going with atomic_t, but the patch 4/11 (KVM: arm64: Setup a
> framework for hypercall bitmap firmware registers)
> kvm_arm_set_fw_reg_bmap()'s implementation felt like we need a lock to
> have the whole 'is the register busy?' operation atomic. But, that's
> just one of the applications.

I understand why you need the code to get the lock here with the
current implementation. But, since the code just sets the one field
(vm_started) with the lock, I thought the intention of getting the
lock might not be so obvious. (But, maybe it's clear enough looking at
the code in patch-4.)

Thanks,
Reiji

> > Anyway, the patch generally looks good to me, and thank you
> > for making this change (it works for my purpose as well).
> >
> > Reviewed-by: Reiji Watanabe <reijiw@google.com>
> >
> Glad that it's helping you as well and thanks for the review.
>
> Regards,
> Raghavendra
On Mon, Jan 10, 2022, Raghavendra Rao Ananta wrote:
> On Fri, Jan 7, 2022 at 5:06 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> > > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> >
> > Needs parentheses around (kvm), but why bother with a macro? This is the same
> > header that defines struct kvm.
> >
> No specific reason for creating a macro as such. I can remove it if it
> feels noisy.

Please do. In the future, don't use a macro unless there's a good reason to do
so. Don't get me wrong, I love abusing macros, but for things like this they are
completely inferior to

        static inline bool kvm_vm_has_started(struct kvm *kvm)
        {
                return kvm->vm_started;
        }

because a helper function gives us type safety, doesn't suffer from concatenation
of tokens potentially doing weird things, is easier to extend to a multi-line
implementation, etc...

An example of when it's ok to use a macro is x86's

        #define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)

which uses a macro instead of a proper function to avoid a circular dependency
due to arch/x86/include/asm/kvm_host.h being included by include/linux/kvm_host.h
and thus x86's implementation of kvm_arch_vcpu_memslots_id() coming before the
definition of struct kvm_vcpu. But that's very much an exception and done only
because the alternatives suck more.

> > > +                */
> > > +               mutex_lock(&kvm->lock);
> >
> > This adds unnecessary lock contention when running vCPUs. The naive solution
> > would be:
> >
> >         if (!kvm->vm_started) {
> >                 ...
> >         }
> >
> Not sure if I understood the solution..

In your proposed patch, KVM_RUN will take kvm->lock _every_ time. That introduces
unnecessary contention as it will serialize this bit of code if multiple vCPUs
are attempting KVM_RUN. By checking !vm_started, only the "first" KVM_RUN for a
VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
There's still a possibility that multiple vCPUs will contend for kvm->lock on their
first KVM_RUN, hence the quotes. I called it "naive" because it's possible there's
a more elegant solution depending on the use case, e.g. a lockless approach might
work (or it might not).

> > > +               kvm->vm_started = true;
> > > +               mutex_unlock(&kvm->lock);
> >
> > Lastly, why is this in generic KVM?
> >
> The v1 of the series originally had it in the arm specific code.
> However, I was suggested to move it to the generic code since the
> bookkeeping is not arch specific and could be helpful to others too [1].

I'm definitely in favor of moving/adding things to generic KVM when it makes sense,
but I'm skeptical in this particular case. The code _is_ arch specific in that
arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
versus a hypothetical x86 use case that might be completely ok with a lockless
implementation. And it's not obvious that there's a plausible, safe use case
outside of arm64, e.g. on x86, there is very, very little that is truly shared
across the entire VM/system, most things are per-thread/core/package in some way,
shape, or form. In other words, I'm wary of providing something like this for
x86 because odds are good that any use will be functionally incorrect.
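To make the suggestion concrete, the guarded version would look roughly like
this in kvm_vcpu_ioctl() (an illustration of Sean's snippet, not code from the
series). Because every writer stores the same value, a race between two
first-time vCPUs is benign: the loser merely takes the lock one extra time.
Whether the unlocked read is safe for arm64's consumers is exactly what gets
debated next.

        /* On the KVM_RUN path: only the first run(s) pay for kvm->lock. */
        if (!kvm->vm_started) {
                mutex_lock(&kvm->lock);
                kvm->vm_started = true;
                mutex_unlock(&kvm->lock);
        }
        r = kvm_arch_vcpu_ioctl_run(vcpu);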
On Tue, Jan 11, 2022 at 9:36 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jan 10, 2022, Raghavendra Rao Ananta wrote:
> > [...]
> > No specific reason for creating a macro as such. I can remove it if it
> > feels noisy.
>
> Please do. In the future, don't use a macro unless there's a good reason to do
> so. Don't get me wrong, I love abusing macros, but for things like this they are
> completely inferior to
>
>         static inline bool kvm_vm_has_started(struct kvm *kvm)
>         {
>                 return kvm->vm_started;
>         }
>
> because a helper function gives us type safety, doesn't suffer from concatenation
> of tokens potentially doing weird things, is easier to extend to a multi-line
> implementation, etc...
>
> [...]
>
Understood. Thanks for the explanation! Will switch to an inline function.

> > > This adds unnecessary lock contention when running vCPUs. The naive solution
> > > would be:
> > >
> > >         if (!kvm->vm_started) {
> > >                 ...
> > >         }
> > >
> > Not sure if I understood the solution..
>
> In your proposed patch, KVM_RUN will take kvm->lock _every_ time. That introduces
> unnecessary contention as it will serialize this bit of code if multiple vCPUs
> are attempting KVM_RUN. By checking !vm_started, only the "first" KVM_RUN for a
> VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
> There's still a possibility that multiple vCPUs will contend for kvm->lock on their
> first KVM_RUN, hence the quotes. I called it "naive" because it's possible there's
> a more elegant solution depending on the use case, e.g. a lockless approach might
> work (or it might not).
>
But is it safe to read kvm->vm_started without grabbing the lock in
the first place? Use atomic_t maybe for this?

> > > Lastly, why is this in generic KVM?
> > >
> > The v1 of the series originally had it in the arm specific code.
> > However, I was suggested to move it to the generic code since the
> > bookkeeping is not arch specific and could be helpful to others too [1].
>
> I'm definitely in favor of moving/adding things to generic KVM when it makes sense,
> but I'm skeptical in this particular case. The code _is_ arch specific in that
> arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
> versus a hypothetical x86 use case that might be completely ok with a lockless
> implementation. And it's not obvious that there's a plausible, safe use case
> outside of arm64, e.g. on x86, there is very, very little that is truly shared
> across the entire VM/system, most things are per-thread/core/package in some way,
> shape, or form. In other words, I'm wary of providing something like this for
> x86 because odds are good that any use will be functionally incorrect.

I've been going back and forth on this. I've seen a couple of
variables declared in the generic struct and used only in the arch
code. vcpu->valid_wakeup for instance, which is used only by s390
arch. Maybe I'm looking at it the wrong way as to what can and can't
go in the generic kvm code.

Thanks,
Raghavendra
On Mon, Jan 10, 2022 at 3:57 PM Jim Mattson <jmattson@google.com> wrote:
>
> [...]
> I was actually thinking that live migration gives userspace an easy
> way to circumvent your restriction. You said, "This state of the VM is
> helpful in the upcoming patches to prevent user-space from configuring
> certain VM features after the VM has started running." However, if you
> don't ensure that these VM features are configured the same way on the
> target machine as they were on the source machine, you have not
> actually accomplished your stated goal.
>
Isn't that up to the VMM to save/restore and validate the registers
across migrations?
Perhaps I have to re-word my intentions for the patch: userspace
should be able to configure the registers before issuing the first
KVM_RUN.

Thanks,
Raghavendra
On Mon, Jan 10, 2022 at 4:04 PM Reiji Watanabe <reijiw@google.com> wrote:
>
> [...]
> > I always struggle with the names; but if you feel that 'ran_once'
> > makes more sense for a reader, I can change it.
>
> I would prefer 'ran_once'.
>
Yes, that makes sense. I think the name created a lot of confusion for
the patch.

Thanks,
Raghavendra

> [...]
> I understand why you need the code to get the lock here with the
> current implementation. But, since the code just sets the one field
> (vm_started) with the lock, I thought the intention of getting the
> lock might not be so obvious. (But, maybe it's clear enough looking at
> the code in patch-4.)
>
> Thanks,
> Reiji
On Tue, Jan 11, 2022, Raghavendra Rao Ananta wrote:
> On Tue, Jan 11, 2022 at 9:36 AM Sean Christopherson <seanjc@google.com> wrote:
> > In your proposed patch, KVM_RUN will take kvm->lock _every_ time. That introduces
> > unnecessary contention as it will serialize this bit of code if multiple vCPUs
> > are attempting KVM_RUN. By checking !vm_started, only the "first" KVM_RUN for a
> > VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
> > There's still a possibility that multiple vCPUs will contend for kvm->lock on their
> > first KVM_RUN, hence the quotes. I called it "naive" because it's possible there's
> > a more elegant solution depending on the use case, e.g. a lockless approach might
> > work (or it might not).
> >
> But is it safe to read kvm->vm_started without grabbing the lock in
> the first place?

Don't know, but that's my point. Without a consumer in generic KVM and due to
my lack of arm64 knowledge, without a high-level description of how the flag will
be used by arm64, it's really difficult to determine what's safe and what's not.
For other architectures, it's an impossible question to answer because we don't
know how the flag might be used.

> Use atomic_t maybe for this?

No. An atomic_t is generally useful only if there are multiple writers that can
possibly write different values. It's highly unlikely that simply switching to an
atomic addresses the needs of arm64.

> > > > Lastly, why is this in generic KVM?
> > > >
> > > The v1 of the series originally had it in the arm specific code.
> > > However, I was suggested to move it to the generic code since the
> > > bookkeeping is not arch specific and could be helpful to others too [1].
> >
> > I'm definitely in favor of moving/adding things to generic KVM when it makes sense,
> > but I'm skeptical in this particular case. The code _is_ arch specific in that
> > arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
> > versus a hypothetical x86 use case that might be completely ok with a lockless
> > implementation. And it's not obvious that there's a plausible, safe use case
> > outside of arm64, e.g. on x86, there is very, very little that is truly shared
> > across the entire VM/system, most things are per-thread/core/package in some way,
> > shape, or form. In other words, I'm wary of providing something like this for
> > x86 because odds are good that any use will be functionally incorrect.
> >
> I've been going back and forth on this. I've seen a couple of
> variables declared in the generic struct and used only in the arch
> code. vcpu->valid_wakeup for instance, which is used only by s390
> arch. Maybe I'm looking at it the wrong way as to what can and can't
> go in the generic kvm code.

Ya, valid_wakeup is an oddball, I don't know why it's in kvm_vcpu instead of
arch code that's wrapped with e.g. kvm_arch_vcpu_valid_wakeup().

That said, valid_wakeup is consumed by generic KVM, i.e. has well defined semantics
for how it is used, so it's purely a "this code is rather odd" issue. vm_started
on the other hand is only produced by generic KVM, and so its required semantics are
unclear.
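Purely for illustration, and not from the series: the "lockless approach" Sean
alludes to might pair a release store with an acquire load, e.g. (the field
name ran_once is assumed from the naming discussion above):

        /*
         * Producer, on the first KVM_RUN: a reader that observes
         * ran_once == true also observes everything written before
         * the VM first ran.
         */
        smp_store_release(&kvm->ran_once, true);

        /* Consumer, e.g. an arm64 register-write path: */
        if (smp_load_acquire(&kvm->ran_once))
                return -EBUSY;

        /*
         * Caveat: the consumer's check-then-write is still racy against
         * a concurrent first KVM_RUN, which is precisely why the series
         * pairs the flag with kvm->lock instead.
         */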
On Tue, Jan 11, 2022 at 10:52 AM Raghavendra Rao Ananta
<rananta@google.com> wrote:
>
> [...]
> > I was actually thinking that live migration gives userspace an easy
> > way to circumvent your restriction. [...]
> >
> Isn't that up to the VMM to save/restore and validate the registers
> across migrations?

Yes, just as it is up to userspace not to make bad configuration
changes after the first VMRUN.

> Perhaps I have to re-word my intentions for the patch: userspace
> should be able to configure the registers before issuing the first
> KVM_RUN.

Perhaps it would help if you explained *why* you are doing this. It
sounds like you are either trying to protect against a malicious
userspace, or you are trying to keep userspace from doing something
stupid. In general, kvm only enforces constraints that are necessary
to protect the host. If that's what you're doing, I don't understand
why live migration doesn't provide an end-run around your protections.
On Tue, Jan 11, 2022 at 11:04 AM Sean Christopherson <seanjc@google.com> wrote:
>
> [...]
> Ya, valid_wakeup is an oddball, I don't know why it's in kvm_vcpu instead of
> arch code that's wrapped with e.g. kvm_arch_vcpu_valid_wakeup().
>
> That said, valid_wakeup is consumed by generic KVM, i.e. has well defined semantics
> for how it is used, so it's purely a "this code is rather odd" issue. vm_started
> on the other hand is only produced by generic KVM, and so its required semantics are
> unclear.

Understood. I'll move it to arm64 and we can refactor it if there's a
need for any other arch(s).

Thanks,
Raghavendra
On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> Understood. I'll move it to arm64 and we can refactor it if there's a
> need for any other arch(s).

Before you do that, can you answer Jim's question on _why_ arm64 needs this?
On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote:
>
> [...]
> Perhaps it would help if you explained *why* you are doing this. It
> sounds like you are either trying to protect against a malicious
> userspace, or you are trying to keep userspace from doing something
> stupid. In general, kvm only enforces constraints that are necessary
> to protect the host. If that's what you're doing, I don't understand
> why live migration doesn't provide an end-run around your protections.

It's mainly to safeguard the guests. With respect to migration, KVM
and the userspace are collectively playing a role here. It's up to the
userspace to ensure that the registers are configured the same across
migrations and KVM ensures that the userspace doesn't modify the
registers after KVM_RUN so that they don't see features turned OFF/ON
during execution. I'm not sure if it falls into the definition of
protecting the host. Do you see a value in adding this extra
protection from KVM?

Regards,
Raghavendra
On Wed, Jan 12, 2022, Sean Christopherson wrote:
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > Understood. I'll move it to arm64 and we can refactor it if there's a
> > need for any other arch(s).
>
> Before you do that, can you answer Jim's question on _why_ arm64 needs this?

Gah, sorry, kvmarm@lists.cs.columbia.edu once again lost the spam battle with
Gmail.
On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote:
> > Perhaps it would help if you explained *why* you are doing this. It
> > sounds like you are either trying to protect against a malicious
> > userspace, or you are trying to keep userspace from doing something
> > stupid. In general, kvm only enforces constraints that are necessary
> > to protect the host. If that's what you're doing, I don't understand
> > why live migration doesn't provide an end-run around your protections.
> >
> It's mainly to safeguard the guests. With respect to migration, KVM
> and the userspace are collectively playing a role here. It's up to the
> userspace to ensure that the registers are configured the same across
> migrations and KVM ensures that the userspace doesn't modify the
> registers after KVM_RUN so that they don't see features turned OFF/ON
> during execution. I'm not sure if it falls into the definition of
> protecting the host. Do you see a value in adding this extra
> protection from KVM?

Short answer: probably not?

There is precedent for disallowing userspace from doing stupid things, but that's
either for KVM's protection (as Jim pointed out), or because KVM can't honor the
change, e.g. x86 is currently in the process of disallowing most CPUID changes
after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't
support updating some of its own internal state (because removing features like
GB hugepage support is nonsensical and would require a large pile of complicated,
messy code).

Restricting CPUID changes does offer some "protection" to the guest, but that's
not the goal. E.g. KVM won't detect CPUID misconfiguration in the migration
case, and trying to do so is a fool's errand.

If restricting updates in the arm64 case is necessary to ensure KVM provides sane
behavior, then it could be justified. But if it's purely a sanity check on
behalf of the guest, then it's not justified.
On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <seanjc@google.com> wrote:
>
> [...]
> If restricting updates in the arm64 case is necessary to ensure KVM provides sane
> behavior, then it could be justified. But if it's purely a sanity check on
> behalf of the guest, then it's not justified.

Agreed that KVM doesn't really safeguard the guests, but just curious,
is there really a downside to adding this thin layer of safety checking?
On the bright side, the guests would be safe, and it could save the
developers some time in hunting down the bugs in this path, no?

Regards,
Raghavendra
On Thu, Jan 13, 2022, Raghavendra Rao Ananta wrote:
> On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > If restricting updates in the arm64 case is necessary to ensure KVM provides sane
> > behavior, then it could be justified. But if it's purely a sanity check on
> > behalf of the guest, then it's not justified.
> >
> Agreed that KVM doesn't really safeguard the guests, but just curious,
> is there really a downside to adding this thin layer of safety checking?

It's more stuff that KVM has to maintain, creates an ABI that KVM must adhere to,
potentially creates inconsistencies in KVM, and prevents using KVM to intentionally
do stupid things to test scenarios that are "impossible". And we also try to avoid
defining arbitrary CPU behavior in KVM (that may not be the case here).

> On the bright side, the guests would be safe, and it could save the
> developers some time in hunting down the bugs in this path, no?

Yes, but that can be said for lots and lots of things. This is both a slippery
slope argument and the inconsistency argument above, e.g. if KVM actively prevents
userspace from doing X, why doesn't KVM prevent userspace from doing Y? Having a
decently defined rule for these types of things, e.g. protect KVM/kernel and adhere
to the architecture but otherwise let userspace do whatever, avoids spending too
much time arguing over what KVM should/shouldn't allow, or wondering why on earth
KVM does XYZ, at least in theory :-)

There are certainly times where KVM could have saved userspace some pain, but
overall I do think KVM is better off staying out of the way when possible.
On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <seanjc@google.com> wrote:
>
> [...]
> If restricting updates in the arm64 case is necessary to ensure KVM provides sane
> behavior, then it could be justified. But if it's purely a sanity check on
> behalf of the guest, then it's not justified.

The pseudo firmware hvc registers, which this series is adding, are
used by KVM to identify available hvc features for the guest, and not
directly exposed to the guest as registers.
The ways the KVM code in the series consumes the registers' values are
very limited, and no KVM data/state is created based on their values.
But, as the code that consumes the registers grows in the future,
I wouldn't be surprised if KVM consumes them differently than it does
now (e.g. creates another data structure based on the register values).
I'm not sure though :)

The restriction, with which KVM doesn't need to worry about the changes
in the registers after KVM_RUN, could potentially protect or be useful
to protect KVM and simplify future changes/maintenance of the KVM code
that consumes the values.
I thought this was one of the reasons for having the restriction.

Thanks,
Reiji
On Fri, Jan 14, 2022 at 1:51 PM Reiji Watanabe <reijiw@google.com> wrote: > > On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <seanjc@google.com> wrote: > > > > On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote: > > > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote: > > > > Perhaps it would help if you explained *why* you are doing this. It > > > > sounds like you are either trying to protect against a malicious > > > > userspace, or you are trying to keep userspace from doing something > > > > stupid. In general, kvm only enforces constraints that are necessary > > > > to protect the host. If that's what you're doing, I don't understand > > > > why live migration doesn't provide an end-run around your protections. > > > It's mainly to safeguard the guests. With respect to migration, KVM > > > and the userspace are collectively playing a role here. It's up to the > > > userspace to ensure that the registers are configured the same across > > > migrations and KVM ensures that the userspace doesn't modify the > > > registers after KVM_RUN so that they don't see features turned OFF/ON > > > during execution. I'm not sure if it falls into the definition of > > > protecting the host. Do you see a value in adding this extra > > > protection from KVM? > > > > Short answer: probably not? > > > > There is precedent for disallowing userspace from doing stupid things, but that's > > either for KVM's protection (as Jim pointed out), or because KVM can't honor the > > change, e.g. x86 is currently in the process of disallowing most CPUID changes > > after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't > > support updating some of it's own internal state (because removing features like > > GB hugepage support is nonsensical and would require a large pile of complicated, > > messy code). > > > > Restricing CPUID changes does offer some "protection" to the guest, but that's > > not the goal. E.g. KVM won't detect CPUID misconfiguration in the migration > > case, and trying to do so is a fool's errand. > > > > If restricting updates in the arm64 is necessary to ensure KVM provides sane > > behavior, then it could be justified. But if it's purely a sanity check on > > behalf of the guest, then it's not justified. > > The pseudo firmware hvc registers, which this series are adding, are > used by KVM to identify available hvc features for the guest, and not > directly exposed to the guest as registers. > The ways the KVM code in the series consumes the registers' values are > very limited, and no KVM data/state is created based on their values. > But, as the code that consumes the registers grows in the future, > I wouldn't be surprised if KVM consumes them differently than it does > now (e.g. create another data structure based on the register values). > I'm not sure though :) > > The restriction, with which KVM doesn't need to worry about the changes > in the registers after KVM_RUN, could potentially protect or be useful > to protect KVM and simplify future changes/maintenance of the KVM codes > that consumes the values. > I thought this was one of the reasons for having the restriction. > Well, that wasn't the original intention of the patch, but just to protect the guests from the userspace's dynamic updates. Having said that, and based on what Sean mentioned in his last reply, it could be inconsistent from what KVM has been doing so far and would be difficult to cover all the scenarios that userspace can mess things up for guests. 
I'll plan to drop this patch in the next version, and bring it back
to arm64 if we really need it.

Thanks Sean, Jim, and Reiji for the comments and discussion.

Regards,
Raghavendra

> Thanks,
> Reiji
On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> The restriction, with which KVM doesn't need to worry about changes
> to the registers after KVM_RUN, could potentially protect KVM and
> simplify future changes/maintenance of the KVM code that consumes the
> values.

That sort of protection is definitely welcome; the previously mentioned CPUID mess
on x86 would have benefited greatly from KVM being restrictive in the past. That said,
hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
e.g. if there are side effects that are visible in other registers, then an update
can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
inconsistent.

If possible, preventing modification if kvm->created_vcpus > 0 is ideal, as it's
a relatively common pattern in KVM and provides a clear boundary to userspace
regarding what is/isn't allowed.
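[For readers unfamiliar with it, the kvm->created_vcpus guard Sean describes typically takes the shape of the minimal sketch below. Only kvm->lock and kvm->created_vcpus are real KVM names; the ioctl helper and the kvm->arch.widget field are hypothetical. Holding kvm->lock is what serializes the check against concurrent vCPU creation.]

/*
 * Minimal sketch of the created_vcpus guard pattern; the function and
 * the guarded kvm->arch.widget field are hypothetical.
 */
static int kvm_vm_ioctl_set_widget(struct kvm *kvm, u64 val)
{
        int ret = 0;

        mutex_lock(&kvm->lock);
        if (kvm->created_vcpus) {
                /* Too late: a vCPU may already depend on the old value. */
                ret = -EBUSY;
        } else {
                kvm->arch.widget = val;
        }
        mutex_unlock(&kvm->lock);

        return ret;
}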
On Tue, Jan 18, 2022 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > The restriction, with which KVM doesn't need to worry about changes
> > to the registers after KVM_RUN, could potentially protect KVM and
> > simplify future changes/maintenance of the KVM code that consumes the
> > values.
>
> That sort of protection is definitely welcome; the previously mentioned CPUID mess
> on x86 would have benefited greatly from KVM being restrictive in the past. That said,
> hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
> e.g. if there are side effects that are visible in other registers, then an update
> can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> inconsistent.

Thank you for your comments!
I think I understand your concern, and that's a great point.
That's not the case for those pseudo registers, though, at least for now :)
BTW, is this concern specific to hooking KVM_RUN? (Wouldn't it be the
same for the option with "if kvm->created_vcpus > 0"?)

> If possible, preventing modification if kvm->created_vcpus > 0 is ideal, as it's
> a relatively common pattern in KVM and provides a clear boundary to userspace
> regarding what is/isn't allowed.

Yes, I agree that would be better in general. For (pseudo) registers,
I would think preventing modification if kvm->created_vcpus > 0 might
not be a very good option for KVM/ARM, though, considering the usage
of KVM_GET_REG_LIST and KVM_{G,S}ET_ONE_REG.

Thanks,
Reiji
On Tue, Jan 18, 2022, Reiji Watanabe wrote:
> On Tue, Jan 18, 2022 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > > The restriction, with which KVM doesn't need to worry about changes
> > > to the registers after KVM_RUN, could potentially protect KVM and
> > > simplify future changes/maintenance of the KVM code that consumes the
> > > values.
> >
> > That sort of protection is definitely welcome; the previously mentioned CPUID mess
> > on x86 would have benefited greatly from KVM being restrictive in the past. That said,
> > hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> > Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> > all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
> > e.g. if there are side effects that are visible in other registers, then an update
> > can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> > modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> > inconsistent.
>
> Thank you for your comments!
> I think I understand your concern, and that's a great point.
> That's not the case for those pseudo registers, though, at least for now :)
> BTW, is this concern specific to hooking KVM_RUN? (Wouldn't it be the
> same for the option with "if kvm->created_vcpus > 0"?)

Not really? The goal with created_vcpus is to avoid having inconsistent state in
"struct kvm_vcpu" with respect to the VM as a whole. "struct kvm" obviously can't
be inconsistent with itself, e.g. even if userspace consumes some side effect,
that's simply "the state". Did that make sense? Hard to explain in writing :-)

> > If possible, preventing modification if kvm->created_vcpus > 0 is ideal, as it's
> > a relatively common pattern in KVM and provides a clear boundary to userspace
> > regarding what is/isn't allowed.
>
> Yes, I agree that would be better in general. For (pseudo) registers,

What exactly are these pseudo registers? If they represent an immutable property
of the (virtual) system, then it might make sense to use a separate, non-vCPU
mechanism for setting/getting their values. Then you can easily restrict the
<whatever> to pre-created_vcpus, e.g. see x86's KVM_SET_IDENTITY_MAP_ADDR.

> I would think preventing modification if kvm->created_vcpus > 0 might
> not be a very good option for KVM/ARM, though, considering the usage
> of KVM_GET_REG_LIST and KVM_{G,S}ET_ONE_REG.
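[The KVM_SET_IDENTITY_MAP_ADDR precedent Sean cites is handled along roughly these lines in kvm_vm_ioctl() in arch/x86/kvm/x86.c. This is an approximate sketch from memory, not a verbatim excerpt; the point is that the VM-scoped setting is rejected with -EINVAL once any vCPU exists.]

        case KVM_SET_IDENTITY_MAP_ADDR: {
                u64 ident_addr;

                /* Reject the update once any vCPU has been created. */
                mutex_lock(&kvm->lock);
                r = -EINVAL;
                if (kvm->created_vcpus)
                        goto set_identity_unlock;
                r = -EFAULT;
                if (copy_from_user(&ident_addr, argp, sizeof(ident_addr)))
                        goto set_identity_unlock;
                r = kvm_vm_ioctl_set_identity_map_addr(kvm, ident_addr);
set_identity_unlock:
                mutex_unlock(&kvm->lock);
                break;
        }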
On Wed, Jan 19, 2022 at 4:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jan 18, 2022, Reiji Watanabe wrote:
> > On Tue, Jan 18, 2022 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > > > The restriction, with which KVM doesn't need to worry about changes
> > > > to the registers after KVM_RUN, could potentially protect KVM and
> > > > simplify future changes/maintenance of the KVM code that consumes the
> > > > values.
> > >
> > > That sort of protection is definitely welcome; the previously mentioned CPUID mess
> > > on x86 would have benefited greatly from KVM being restrictive in the past. That said,
> > > hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> > > Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> > > all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
> > > e.g. if there are side effects that are visible in other registers, then an update
> > > can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> > > modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> > > inconsistent.
> >
> > Thank you for your comments!
> > I think I understand your concern, and that's a great point.
> > That's not the case for those pseudo registers, though, at least for now :)
> > BTW, is this concern specific to hooking KVM_RUN? (Wouldn't it be the
> > same for the option with "if kvm->created_vcpus > 0"?)
>
> Not really? The goal with created_vcpus is to avoid having inconsistent state in
> "struct kvm_vcpu" with respect to the VM as a whole. "struct kvm" obviously can't
> be inconsistent with itself, e.g. even if userspace consumes some side effect,
> that's simply "the state". Did that make sense? Hard to explain in writing :-)
>
> > > If possible, preventing modification if kvm->created_vcpus > 0 is ideal, as it's
> > > a relatively common pattern in KVM and provides a clear boundary to userspace
> > > regarding what is/isn't allowed.
> >
> > Yes, I agree that would be better in general. For (pseudo) registers,
>
> What exactly are these pseudo registers? If they represent an immutable property
> of the (virtual) system, then it might make sense to use a separate, non-vCPU
> mechanism for setting/getting their values. Then you can easily restrict the
> <whatever> to pre-created_vcpus, e.g. see x86's KVM_SET_IDENTITY_MAP_ADDR.
>
In general, these pseudo-registers are reserved, non-architectural
register spaces, currently used to represent KVM-as-a-firmware's
versioning across guest migrations [1]. That is, the user-space
configures these registers so that the guests see the same 'firmware'
version before and after a migration. The model is built over the
existing KVM_GET_REG_LIST and KVM_[SET|GET]_ONE_REG APIs. Since this
series' effort falls into the same realm, the idea was to keep it
consistent with the existing model, which VMMs (such as QEMU) are
already used to.

Granted, even though these registers should technically be immutable,
no similar protection was employed for the existing pseudo-registers.
I was wondering if it could be of any value if we started providing
one, but I guess not, since it may break user-space's expectations of
KVM (and that's probably why we didn't have it earlier).
Regards,
Raghavendra

[1]: https://github.com/torvalds/linux/blob/master/Documentation/virt/kvm/arm/psci.rst

> > I would think preventing modification if kvm->created_vcpus > 0 might
> > not be a very good option for KVM/ARM, though, considering the usage
> > of KVM_GET_REG_LIST and KVM_{G,S}ET_ONE_REG.
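[As context for the model Raghavendra describes, a VMM saves and restores a firmware pseudo-register through the ordinary ONE_REG ioctls. Below is a minimal userspace sketch using KVM_REG_ARM_PSCI_VERSION, the existing register documented in [1]; it assumes an arm64 host (where <linux/kvm.h> pulls in the register ID) and an already-open vCPU file descriptor, with error handling omitted.]

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Save the firmware register on the migration source. */
static uint64_t get_psci_version(int vcpu_fd)
{
        uint64_t val = 0;
        struct kvm_one_reg reg = {
                .id   = KVM_REG_ARM_PSCI_VERSION,
                .addr = (uint64_t)&val,
        };

        ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
        return val;
}

/* Restore it on the target, before the first KVM_RUN. */
static void set_psci_version(int vcpu_fd, uint64_t val)
{
        struct kvm_one_reg reg = {
                .id   = KVM_REG_ARM_PSCI_VERSION,
                .addr = (uint64_t)&val,
        };

        ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}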
It's probably time I jumped on this thread...

On Thu, 13 Jan 2022 17:21:19 +0000,
Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <jmattson@google.com> wrote:
> > > Perhaps it would help if you explained *why* you are doing this. It
> > > sounds like you are either trying to protect against a malicious
> > > userspace, or you are trying to keep userspace from doing something
> > > stupid. In general, kvm only enforces constraints that are necessary
> > > to protect the host. If that's what you're doing, I don't understand
> > > why live migration doesn't provide an end-run around your protections.
> > It's mainly to safeguard the guests. With respect to migration, KVM
> > and the userspace are collectively playing a role here. It's up to the
> > userspace to ensure that the registers are configured the same across
> > migrations and KVM ensures that the userspace doesn't modify the
> > registers after KVM_RUN so that they don't see features turned OFF/ON
> > during execution. I'm not sure if it falls into the definition of
> > protecting the host. Do you see a value in adding this extra
> > protection from KVM?
>
> Short answer: probably not?

Well, I beg to differ.

> There is precedent for disallowing userspace from doing stupid
> things, but that's either for KVM's protection (as Jim pointed out),
> or because KVM can't honor the change, e.g. x86 is currently in the
> process of disallowing most CPUID changes after KVM_RUN because KVM
> itself consumes the CPUID information and KVM doesn't support
> updating some of its own internal state (because removing features
> like GB hugepage support is nonsensical and would require a large
> pile of complicated, messy code).

We provide quite a lot of CPU emulation based on the feature
registers exposed to the guest. Allowing userspace to change this
stuff once the guest is running is a massive attack vector.

> Restricting CPUID changes does offer some "protection" to the guest, but that's
> not the goal. E.g. KVM won't detect CPUID misconfiguration in the migration
> case, and trying to do so is a fool's errand.
>
> If restricting updates on arm64 is necessary to ensure KVM provides sane
> behavior, then it could be justified. But if it's purely a sanity check on
> behalf of the guest, then it's not justified.

No. This is about preventing userspace from tripping the kernel into
doing things it really shouldn't by flipping configuration bits that
should be set in stone.

        M.
On Wed, 19 Jan 2022 00:07:44 +0000,
Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > The restriction, with which KVM doesn't need to worry about changes
> > to the registers after KVM_RUN, could potentially protect KVM and
> > simplify future changes/maintenance of the KVM code that consumes the
> > values.
>
> That sort of protection is definitely welcome; the previously mentioned CPUID mess
> on x86 would have benefited greatly from KVM being restrictive in the past. That said,
> hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
> e.g. if there are side effects that are visible in other registers, then an update
> can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> inconsistent.
>
> If possible, preventing modification if kvm->created_vcpus > 0 is
> ideal, as it's a relatively common pattern in KVM and provides a
> clear boundary to userspace regarding what is/isn't allowed.

No, that's way too late. The configuration is in general per-CPU, and
I really don't want to expand the surface of the userspace API to
allow all sorts of magic tricks depending on the nature of what you
save/restore. The "first run" crap is already there. We have it on a
per-CPU basis, and we need it at the VM level for other reasons (see
the recent discussion about PMU filtering vs binding to a specific
PMU implementation).

        M.
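[The per-vCPU "first run" state Marc refers to follows roughly this shape in the arm64 code of that era. This is a simplified sketch from memory, not a verbatim excerpt; a VM-level flag, as this series proposes, would take the same form at the struct kvm level.]

/* Simplified sketch of arm64's per-vCPU first-run gate. */
static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
{
        if (likely(vcpu->arch.has_run_once))
                return 0;

        vcpu->arch.has_run_once = true;

        /*
         * One-time setup that must not be repeated (e.g. finalizing
         * the vGIC) happens here; configuration that must be "set in
         * stone" can be rejected once this flag is observed.
         */
        return 0;
}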
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c310648cc8f1..d0bd8f7a026c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -623,6 +623,7 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
+	bool vm_started;
 };
 
 #define kvm_err(fmt, ...) \
@@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
 	}
 }
 
+#define kvm_vm_has_started(kvm) (kvm->vm_started)
+
 extern bool kvm_rebooting;
 
 extern unsigned int halt_poll_ns;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 72c4e6b39389..962b91ac2064 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
 	int r;
 	struct kvm_fpu *fpu = NULL;
 	struct kvm_sregs *kvm_sregs = NULL;
+	struct kvm *kvm = vcpu->kvm;
 
 	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
 		return -EIO;
@@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
 		if (oldpid)
 			synchronize_rcu();
 		put_pid(oldpid);
+
+		/*
+		 * Since we land here even on the first vCPU run,
+		 * we can mark that the VM has started running.
+		 */
+		mutex_lock(&kvm->lock);
+		kvm->vm_started = true;
+		mutex_unlock(&kvm->lock);
 	}
 	r = kvm_arch_vcpu_ioctl_run(vcpu);
 	trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
Capture the start of the KVM VM, which is basically the
start of any vCPU run. This state of the VM is helpful
in the upcoming patches to prevent user-space from
configuring certain VM features after the VM has started
running.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 include/linux/kvm_host.h | 3 +++
 virt/kvm/kvm_main.c      | 9 +++++++++
 2 files changed, 12 insertions(+)
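[To illustrate how the flag added by the patch above would be consumed, here is a hypothetical arch-side check guarded by kvm_vm_has_started(). The function name echoes the firmware-register work later in the series, but the body is purely illustrative, not the actual patch code.]

/* Hypothetical consumer: refuse register writes once the VM has run. */
static int kvm_arm_set_fw_reg_bmap(struct kvm *kvm, u64 reg_id, u64 val)
{
        int ret = 0;

        mutex_lock(&kvm->lock);
        if (kvm_vm_has_started(kvm)) {
                /* The guest may already have observed the old value. */
                ret = -EBUSY;
                goto out;
        }

        /* ... validate val and update the bitmap for reg_id ... */
out:
        mutex_unlock(&kvm->lock);
        return ret;
}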