Message ID | 20221123231206.274392-1-mizhang@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu |
On Wed, Nov 23, 2022, Mingwei Zhang wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4736d7849c60..075d31b0db9c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -955,12 +955,12 @@ static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
>
>  	if (!rmap_head->val) {
>  		pr_err("%s: %p 0->BUG\n", __func__, spte);
> -		BUG();
> +		KVM_BUG();

This won't compile. KVM_BUG() isn't a direct replacement for BUG(), it's more
akin to WARN().

And that's why I suggested this be RFC: @kvm needs to be plumbed down here in order
to use KVM_BUG(). I don't mind that too much, it's just a little unfortunate.
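Why the patch fails to build: BUG() takes no arguments and never returns, whereas KVM_BUG() (to our understanding, `KVM_BUG(cond, kvm, fmt...)` in include/linux/kvm_host.h) takes a condition, the VM to blame and a message, warns via WARN_ONCE() if that VM is not already marked bugged, marks only that VM as bugged and dead, and evaluates to the condition so the caller can bail out. The userspace model below illustrates that contract; `struct kvm` is a hypothetical two-field stand-in, and `model_kvm_bug()` and `remove_from_rmap()` are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kvm: only the flags this sketch needs. */
struct kvm {
	bool vm_bugged;	/* a KVM bug was detected in this VM */
	bool vm_dead;	/* this VM should refuse to run any further */
};

/*
 * Userspace model of the KVM_BUG(cond, kvm, fmt...) contract: unlike BUG(),
 * it needs a condition and the VM to blame, warns only while the VM is not
 * yet marked bugged, poisons only that VM, and returns the condition.
 */
static bool model_kvm_bug(bool cond, struct kvm *kvm, const char *msg)
{
	assert(kvm != NULL);	/* without @kvm there is nothing to poison */
	if (cond && !kvm->vm_bugged) {
		fprintf(stderr, "WARNING: %s\n", msg);	/* WARN stand-in */
		kvm->vm_bugged = true;
		kvm->vm_dead = true;
	}
	return cond;
}

/* Hypothetical caller: returns -1 instead of halting the whole host. */
static int remove_from_rmap(struct kvm *kvm, unsigned long head_val)
{
	if (model_kvm_bug(head_val == 0, kvm, "remove from empty rmap"))
		return -1;	/* bail out; only this VM is marked dead */
	return 0;
}
```

The key design difference is that the caller regains control and must return early, which is exactly why `pte_list_remove()` needs a `struct kvm *` in scope.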
On Wed, Nov 23, 2022 at 3:18 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Nov 23, 2022, Mingwei Zhang wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 4736d7849c60..075d31b0db9c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -955,12 +955,12 @@ static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
> >
> >  	if (!rmap_head->val) {
> >  		pr_err("%s: %p 0->BUG\n", __func__, spte);
> > -		BUG();
> > +		KVM_BUG();
>
> This won't compile. KVM_BUG() isn't a direct replacement for BUG(), it's more
> akin to WARN().
>
> And that's why I suggested this be RFC: @kvm needs to be plumbed down here in order
> to use KVM_BUG(). I don't mind that too much, it's just a little unfortunate.

I wonder if using kvm_get_running_vcpu()->kvm is safe here? Assuming we can,
then @kvm plumbing shouldn't be a problem.
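Whether `kvm_get_running_vcpu()->kvm` is safe depends on the calling context: kvm_get_running_vcpu() reports the vCPU currently loaded on this CPU and returns NULL when none is loaded. As far as we understand, shadow-MMU rmap removal can also be reached without a loaded vCPU (for example from MMU-notifier invalidation or VM teardown), where an unconditional `->kvm` dereference would itself crash the host. The sketch below models that under those assumptions; `running_vcpu`, `model_get_running_vcpu()` and `resolve_kvm()` are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the kernel structures. */
struct kvm { int id; };
struct kvm_vcpu { struct kvm *kvm; };

/* Models the per-CPU "vCPU currently loaded on this CPU" pointer. */
static struct kvm_vcpu *running_vcpu;

static struct kvm_vcpu *model_get_running_vcpu(void)
{
	return running_vcpu;	/* NULL outside vCPU context */
}

/*
 * Pick the VM to blame: an explicitly plumbed @kvm always works; the
 * running-vCPU fallback only works when a vCPU is actually loaded.
 */
static struct kvm *resolve_kvm(struct kvm *plumbed)
{
	struct kvm_vcpu *vcpu;

	if (plumbed)
		return plumbed;
	vcpu = model_get_running_vcpu();
	return vcpu ? vcpu->kvm : NULL;	/* caller must handle NULL */
}
```

Plumbing @kvm through the callers removes the NULL case entirely, which is the main argument for it over the running-vCPU shortcut.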
```diff
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4736d7849c60..075d31b0db9c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -955,12 +955,12 @@ static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 
 	if (!rmap_head->val) {
 		pr_err("%s: %p 0->BUG\n", __func__, spte);
-		BUG();
+		KVM_BUG();
 	} else if (!(rmap_head->val & 1)) {
 		rmap_printk("%p 1->0\n", spte);
 		if ((u64 *)rmap_head->val != spte) {
 			pr_err("%s: %p 1->BUG\n", __func__, spte);
-			BUG();
+			KVM_BUG();
 		}
 		rmap_head->val = 0;
 	} else {
@@ -979,7 +979,7 @@ static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 			desc = desc->more;
 		}
 		pr_err("%s: %p many->many\n", __func__, spte);
-		BUG();
+		KVM_BUG();
 	}
 }
 
```
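The three BUG() sites in the diff correspond to the three encodings of `rmap_head->val` visible in the hunk: 0 (empty rmap, "0->BUG"), low bit clear (a single SPTE pointer, "1->BUG"), and low bit set (a tagged pointer to descriptors, "many->many"). The simplified userspace model below mirrors that decoding with @kvm plumbed in and each former BUG() turned into an early error return. It is a sketch, not the kernel implementation: the single-array `struct desc`, `model_kvm_bug()` and the `int` return value are hypothetical, and unlike the kernel it neither walks a `desc->more` chain nor frees emptied descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_SLOTS 4

/* Hypothetical flat descriptor; the kernel chains descriptors via ->more. */
struct desc {
	uint64_t *sptes[DESC_SLOTS];
	int count;
};
struct rmap_head { uintptr_t val; };	/* 0, spte pointer, or desc | 1 */
struct kvm { bool vm_bugged; };		/* hypothetical minimal VM state */

/* Marks only this VM as bugged and hands the condition back. */
static bool model_kvm_bug(bool cond, struct kvm *kvm)
{
	if (cond)
		kvm->vm_bugged = true;
	return cond;
}

static int model_pte_list_remove(struct kvm *kvm, uint64_t *spte,
				 struct rmap_head *head)
{
	/* "0->BUG": removing from an empty rmap. */
	if (model_kvm_bug(head->val == 0, kvm))
		return -1;

	/* Low bit clear: val is the one and only spte pointer. */
	if (!(head->val & 1)) {
		if (model_kvm_bug((uint64_t *)head->val != spte, kvm))
			return -1;	/* "1->BUG": wrong spte */
		head->val = 0;
		return 0;
	}

	/* Low bit set: strip the tag to get the descriptor. */
	struct desc *d = (struct desc *)(head->val & ~(uintptr_t)1);
	for (int i = 0; i < d->count; i++) {
		if (d->sptes[i] == spte) {
			/* Move the last entry into the hole. */
			d->sptes[i] = d->sptes[--d->count];
			return 0;
		}
	}
	model_kvm_bug(true, kvm);	/* "many->many": spte not found */
	return -1;
}
```

The tagged-pointer trick relies on SPTE pointers and descriptors being at least 2-byte aligned, so bit 0 is free to mark "this is a descriptor".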
Replace BUG() in pte_list_remove() with KVM_BUG() to avoid crashing the host.

MMU bugs are difficult to discover due to various race conditions and corner
cases, and are thus extremely hard to debug. The situation gets much worse
when a bug triggers a host shutdown: a host crash wipes out everything,
including potential clues for debugging.

From a cloud computing service perspective, BUG() or BUG_ON() is probably no
longer appropriate, as host reliability is the top priority. Crashing the
physical machine is almost never a good option, as it takes down innocent VMs
and causes a service outage on a larger scale. Even worse, if an attacker can
reliably trigger this code by diverting the control flow or corrupting
memory, this becomes a VM-of-death attack. This is a huge attack vector for
cloud providers, because the death of a single host machine is not the end of
the story: without manual intervention, a failed cloud job may be dispatched
to other hosts and keep crashing them until all of them are dead.

For the above reasons, shrink the scope of the crash to the target VM only.

Cc: Nagareddy Reddy <nspreddy@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)