| Message ID | 20090312171843.GU27823@random.random (mailing list archive) |
| --- | --- |
| State | Accepted |
Andrea Arcangeli wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> While looking at invlpg out of sync code with Izik I think I noticed a
> missing smp tlb flush here. Without this the other cpu can still write
> to a freed host physical page. The smp tlb flush must always happen
> before mmu_lock is released if rmap_remove is called, because the VM
> will take the mmu_lock before it can finally add the page to the
> freelist after swapout. The mmu notifier makes it safe to flush the
> tlb after freeing the page (otherwise it would never be safe), so we
> can do a single flush for multiple invalidated sptes.

Izik pointed out that for invlpg, the guest is responsible for smp tlb
flushes, and mmu notifiers will protect against pageout. We still have
a couple of holes, though, with the current code:

- tlb loaded with an entry
- guest invlpg
- invlpg code drops the spte and rmap entry
- pageout
- mmu notifiers don't find an rmap entry, so the tlb is not flushed

The second hole is much simpler: we need at least a local invlpg. This
doesn't show up on Intel, since a vmexit will flush the entire tlb (and
most AMDs have NPT by now).

I think we can fix this without taking the hit of the IPI by:

- running a local invlpg()
- making need_flush a vm flag instead of a local
- clearing need_flush whenever remote tlbs are flushed
- flushing remote tlbs on an mmu_notifier call when need_flush is set

Since mmu notifier calls are rare, this would collapse many remote tlb
flushes into one.
On Sun, Mar 15, 2009 at 12:35:48PM +0200, Avi Kivity wrote:
> Izik pointed out that for invlpg, the guest is responsible for smp tlb
> flushes, and mmu notifiers will protect against pageout.

How will the mmu notifier protect against pageout if the spte is
already invalid and removed from the rmapp chain? The mmu notifier will
search the rmapp chain, find nothing, and do nothing, so the page will
be freed under the other cpus without any ipi flushing their VT tlbs.

All the mmu notifier does is protect against pageout until the mmu_lock
is released, so that you can send a single ipi to the other physical
cpus after a flood of rmap_remove calls, without having to build an
array of pages like arch/x86/include/asm/tlb.h does. If the VM was in
the process of swapping out that page while we were inside the mmu_lock
protected critical section, the mmu notifier will force the swap path
to take vcpu->kvm->mmu_lock first for each kvm instance registered with
the mmu notifier. But after taking that lock, the mmu notifier will do
nothing if rmap_remove already ran before the mmu_lock was released (as
in this case). The mmu_lock just stops the swap temporarily, so that
the free of the page waits until the ipi has been delivered to all
cpus. As far as I can tell, it's up to the kvm code that takes the lock
to flush the tlbs of any other running guest before it is allowed to
release the mmu_lock.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrea Arcangeli wrote:
> On Sun, Mar 15, 2009 at 12:35:48PM +0200, Avi Kivity wrote:
>
>> Izik pointed out that for invlpg, the guest is responsible for smp tlb
>> flushes, and mmu notifiers will protect against pageout.
>
> How will the mmu notifier protect against pageout if the spte is
> already invalid and removed from the rmapp chain? The mmu notifier
> will search the rmapp chain, find nothing, and do nothing, so the
> page will be freed under the other cpus without any ipi flushing
> their VT tlbs.

I mentioned this:

> I think we can fix this without taking the hit of the IPI by:
> - running a local invlpg()
> - making need_flush a vm flag instead of a local
> - clearing need_flush whenever remote tlbs are flushed
> - flushing remote tlbs on an mmu_notifier call when need_flush is set
On Sun, Mar 15, 2009 at 06:19:58PM +0200, Avi Kivity wrote:
> I mentioned this:
>
>> I think we can fix this without taking the hit of the IPI by:
>> - running a local invlpg()
>> - making need_flush a vm flag instead of a local
>> - clearing need_flush whenever remote tlbs are flushed
>> - flushing remote tlbs on an mmu_notifier call when need_flush is set

Ah, so this was a proposed fix for this bug. I thought you were talking
about different bugs and that you hadn't acknowledged this as a bug,
sorry!

About need_flush: it could also become a per-vcpu bit, cleared at every
exit, so perhaps we'd never have to flush; but it would need to stay in
the vcpu structure to avoid cacheline bouncing.
Andrea Arcangeli wrote:
> Ah, so this was a proposed fix for this bug. I thought you were
> talking about different bugs and that you hadn't acknowledged this as
> a bug, sorry!

If ignoring bugs could make them go away...

> About need_flush: it could also become a per-vcpu bit, cleared at
> every exit, so perhaps we'd never have to flush; but it would need to
> stay in the vcpu structure to avoid cacheline bouncing.

But then we need to set it for all vcpus on every invlpg. I'm assuming
invlpg is much more frequent than mmu notifier calls, so it's better to
keep it global. We've already taken a shared cacheline when we acquired
mmu_lock anyway.

btw, it's probably better to apply your patch first, then adapt it to
the non-IPIing version; your patch is more suitable for -stable.
On Sun, Mar 15, 2009 at 06:35:02PM +0200, Avi Kivity wrote:
> Andrea Arcangeli wrote:
>> Ah, so this was a proposed fix for this bug. I thought you were
>> talking about different bugs and that you hadn't acknowledged this
>> as a bug, sorry!
>
> If ignoring bugs could make them go away...

;)

>> About need_flush: it could also become a per-vcpu bit, cleared at
>> every exit, so perhaps we'd never have to flush; but it would need
>> to stay in the vcpu structure to avoid cacheline bouncing.
>
> But then we need to set it for all vcpus on every invlpg. I'm
> assuming invlpg is much more frequent than mmu notifier calls, so
> it's better to keep it global.
>
> We've already taken a shared cacheline when we acquired mmu_lock.

Ok.

> btw, it's probably better to apply your patch first, then adapt it to
> the non-IPIing version; your patch is more suitable for -stable.

It's up to you; I guess it'll make life easier with the compat code ;).
On Thu, Mar 12, 2009 at 06:18:43PM +0100, Andrea Arcangeli wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> While looking at invlpg out of sync code with Izik I think I noticed a
> missing smp tlb flush here. Without this the other cpu can still write
> to a freed host physical page. The smp tlb flush must always happen
> before mmu_lock is released if rmap_remove is called, because the VM
> will take the mmu_lock before it can finally add the page to the
> freelist after swapout. The mmu notifier makes it safe to flush the
> tlb after freeing the page (otherwise it would never be safe), so we
> can do a single flush for multiple invalidated sptes.

I think this fix is more expensive than it needs to be, but better than
being unsafe for now.

Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti wrote:
> On Thu, Mar 12, 2009 at 06:18:43PM +0100, Andrea Arcangeli wrote:
>
>> From: Andrea Arcangeli <aarcange@redhat.com>
>>
>> While looking at invlpg out of sync code with Izik I think I noticed
>> a missing smp tlb flush here. Without this the other cpu can still
>> write to a freed host physical page. The smp tlb flush must always
>> happen before mmu_lock is released if rmap_remove is called, because
>> the VM will take the mmu_lock before it can finally add the page to
>> the freelist after swapout. The mmu notifier makes it safe to flush
>> the tlb after freeing the page (otherwise it would never be safe),
>> so we can do a single flush for multiple invalidated sptes.
>
> I think this fix is more expensive than it needs to be, but better
> than being unsafe for now.
>
> Acked-by: Marcelo Tosatti <mtosatti@redhat.com>

What about this, inside mmu_set_spte():

	} else if (pfn != spte_to_pfn(*shadow_pte)) {
		pgprintk("hfn old %lx new %lx\n",
			 spte_to_pfn(*shadow_pte), pfn);
		rmap_remove(vcpu->kvm, shadow_pte);
	} else

Doesn't this require a tlb flush for all the cpus as well?
Andrea Arcangeli wrote:
> It's up to you, I guess it'll make life easier with the compat code ;).
I applied it, looking now at reducing the cost.
On Sun, Mar 15, 2009 at 10:11:29PM +0200, Izik Eidus wrote:
> Marcelo Tosatti wrote:
>> On Thu, Mar 12, 2009 at 06:18:43PM +0100, Andrea Arcangeli wrote:
>>
>>> From: Andrea Arcangeli <aarcange@redhat.com>
>>>
>>> While looking at invlpg out of sync code with Izik I think I noticed
>>> a missing smp tlb flush here. Without this the other cpu can still
>>> write to a freed host physical page. The smp tlb flush must always
>>> happen before mmu_lock is released if rmap_remove is called, because
>>> the VM will take the mmu_lock before it can finally add the page to
>>> the freelist after swapout. The mmu notifier makes it safe to flush
>>> the tlb after freeing the page (otherwise it would never be safe),
>>> so we can do a single flush for multiple invalidated sptes.
>>
>> I think this fix is more expensive than it needs to be, but better
>> than being unsafe for now.
>>
>> Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> What about this, inside mmu_set_spte():
>
> 	} else if (pfn != spte_to_pfn(*shadow_pte)) {
> 		pgprintk("hfn old %lx new %lx\n",
> 			 spte_to_pfn(*shadow_pte), pfn);
> 		rmap_remove(vcpu->kvm, shadow_pte);
> 	} else
>
> Doesn't this require a tlb flush for all the cpus as well?

Probably. This particular condition can only happen without mmu
notifiers, and when doing madvise(MADV_DONTNEED), though.
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a0c11ea..855eb71 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -445,6 +445,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 	gpa_t pte_gpa = -1;
 	int level;
 	u64 *sptep;
+	int need_flush = 0;
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 
@@ -464,6 +465,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 			rmap_remove(vcpu->kvm, sptep);
 			if (is_large_pte(*sptep))
 				--vcpu->kvm->stat.lpages;
+			need_flush = 1;
 		}
 		set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
 		break;
@@ -473,6 +475,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 		break;
 	}
 
+	if (need_flush)
+		kvm_flush_remote_tlbs(vcpu->kvm);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	if (pte_gpa == -1)