Message ID: B2D15215269B544CADD246097EACE747395AD759@dggeml511-mbx.china.huawei.com (mailing list archive)
State: New, archived
Hi Wanpeng,

> 2017-05-11 21:43 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> > 2017-05-11 20:24 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> >>
> >> On 11/05/2017 14:07, Zhoujian (jay) wrote:
> >>> -	 * Scan sptes if dirty logging has been stopped, dropping those
> >>> -	 * which can be collapsed into a single large-page spte.  Later
> >>> -	 * page faults will create the large-page sptes.
> >>> +	 * Reset each vcpu's mmu, then page faults will create the large-page
> >>> +	 * sptes later.
> >>> 	 */
> >>> 	if ((change != KVM_MR_DELETE) &&
> >>> 		(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> >>> -		!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >>> -		kvm_mmu_zap_collapsible_sptes(kvm, new);
> >
> > This is an unlikely branch (unless guest live migration fails and the
> > guest continues to run on the source machine) rather than a hot path.
> > Do you have any performance numbers for your real workloads?
>
> I found the original discussion by googling:
> https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg04143.html
> You will not go to this branch if the guest live migration completes
> successfully.

In our tests, this branch is taken even when live migration is successful.
AFAIK, the kmod does not know whether live migration was successful or not
when handling the KVM_SET_USER_MEMORY_REGION ioctl. Do I miss something?

Regards,
Jay Zhou
2017-05-11 22:18 GMT+08:00 Zhoujian (jay) <jianjay.zhou@huawei.com>:
> Hi Wanpeng,
>
>> 2017-05-11 21:43 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
>> > 2017-05-11 20:24 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> >>
>> >> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>> >>> -	 * Scan sptes if dirty logging has been stopped, dropping those
>> >>> -	 * which can be collapsed into a single large-page spte.  Later
>> >>> -	 * page faults will create the large-page sptes.
>> >>> +	 * Reset each vcpu's mmu, then page faults will create the large-page
>> >>> +	 * sptes later.
>> >>> 	 */
>> >>> 	if ((change != KVM_MR_DELETE) &&
>> >>> 		(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> >>> -		!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> >>> -		kvm_mmu_zap_collapsible_sptes(kvm, new);
>> >
>> > This is an unlikely branch (unless guest live migration fails and the
>> > guest continues to run on the source machine) rather than a hot path.
>> > Do you have any performance numbers for your real workloads?
>>
>> I found the original discussion by googling:
>> https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg04143.html
>> You will not go to this branch if the guest live migration completes
>> successfully.
>
> In our tests, this branch is taken even when live migration is successful.
> AFAIK, the kmod does not know whether live migration was successful or not
> when handling the KVM_SET_USER_MEMORY_REGION ioctl. Do I miss something?

Originally there was a bug where the memslot dirty log flag was not cleared
after live migration failed; a patch was submitted to fix it:
https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg00794.html
However, I can't remember whether the dirty log flag was also cleared when
live migration completed successfully at that time, but maybe not. Paolo
replied to the patch that he had a better method.

Then I was too busy and didn't follow the qemu patch for this fix any more.
I just found that this commit has now been merged:
http://git.qemu.org/?p=qemu.git;a=commit;h=6f6a5ef3e429f92f987678ea8c396aab4dc6aa19
This commit clears the memslot dirty log flag after live migration, no
matter whether the migration succeeded or not.

Regards,
Wanpeng Li
On 05/11/2017 08:24 PM, Paolo Bonzini wrote:
>
> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>> -	 * Scan sptes if dirty logging has been stopped, dropping those
>> -	 * which can be collapsed into a single large-page spte.  Later
>> -	 * page faults will create the large-page sptes.
>> +	 * Reset each vcpu's mmu, then page faults will create the large-page
>> +	 * sptes later.
>> 	 */
>> 	if ((change != KVM_MR_DELETE) &&
>> 		(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> -		!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> -		kvm_mmu_zap_collapsible_sptes(kvm, new);
>> +		!(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>> +		kvm_for_each_vcpu(i, vcpu, kvm)
>> +			kvm_mmu_reset_context(vcpu);
>
> This should be "kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);" but
> I am not sure it is enough. I think that if you do not zap the SPTEs,
> the page faults will use 4K SPTEs, not large ones (though I'd have to
> check better; CCing Xiao and Wanpeng).

Yes, Paolo is right. kvm_mmu_reset_context() just reloads the vCPU's
root page table; the 4K mappings are still kept.

There are two issues reported:

- One is in kvm_mmu_slot_apply_flags(), when enabling dirty log tracking.

  Its root cause is that kvm_mmu_slot_remove_write_access() takes too much
  time.

  We can make the code adaptive and use the new fast-write-protect facility
  introduced by my patchset, i.e., if the number of pages contained in this
  memslot is more than TOTAL * FAST_WRITE_PROTECT_PAGE_PERCENTAGE, then we
  use fast-write-protect instead.

- The other is in kvm_mmu_zap_collapsible_sptes(), when disabling dirty log
  tracking.

  kvm_mmu_zap_collapsible_sptes() zaps 4K mappings so that later faults can
  rebuild large pages and make memory reads fast again. It is not required
  by the semantics of KVM_SET_USER_MEMORY_REGION and is not urgent for the
  vCPU's running, so it could be done in a separate thread using a
  lock-break technique.

Thanks!
On 2017/5/12 16:09, Xiao Guangrong wrote:
>
> On 05/11/2017 08:24 PM, Paolo Bonzini wrote:
>>
>> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>>> -	 * Scan sptes if dirty logging has been stopped, dropping those
>>> -	 * which can be collapsed into a single large-page spte.  Later
>>> -	 * page faults will create the large-page sptes.
>>> +	 * Reset each vcpu's mmu, then page faults will create the large-page
>>> +	 * sptes later.
>>> 	 */
>>> 	if ((change != KVM_MR_DELETE) &&
>>> 		(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>>> -		!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>> -		kvm_mmu_zap_collapsible_sptes(kvm, new);
>>> +		!(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>>> +		kvm_for_each_vcpu(i, vcpu, kvm)
>>> +			kvm_mmu_reset_context(vcpu);
>>
>> This should be "kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);" but
>> I am not sure it is enough. I think that if you do not zap the SPTEs,
>> the page faults will use 4K SPTEs, not large ones (though I'd have to
>> check better; CCing Xiao and Wanpeng).
>
> Yes, Paolo is right. kvm_mmu_reset_context() just reloads the vCPU's
> root page table; the 4K mappings are still kept.
>
> There are two issues reported:
>
> - One is in kvm_mmu_slot_apply_flags(), when enabling dirty log tracking.
>
>   Its root cause is that kvm_mmu_slot_remove_write_access() takes too much
>   time.
>
>   We can make the code adaptive and use the new fast-write-protect facility
>   introduced by my patchset, i.e., if the number of pages contained in this
>   memslot is more than TOTAL * FAST_WRITE_PROTECT_PAGE_PERCENTAGE, then we
>   use fast-write-protect instead.
>
> - The other is in kvm_mmu_zap_collapsible_sptes(), when disabling dirty log
>   tracking.
>
>   kvm_mmu_zap_collapsible_sptes() zaps 4K mappings so that later faults can
>   rebuild large pages and make memory reads fast again. It is not required
>   by the semantics of KVM_SET_USER_MEMORY_REGION and is not urgent for the
>   vCPU's running, so it could be done in a separate thread using a
>   lock-break technique.

How about moving the action of stopping the dirty log into
migrate_fd_cleanup() directly, which is processed in the main thread as a
BH after migration is completed? It would not have any side effects even if
migration fails or the user cancels it, no?

> Thanks!
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 464da93..fe26ee5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8313,6 +8313,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				enum kvm_mr_change change)
 {
 	int nr_mmu_pages = 0;
+	int i;
+	struct kvm_vcpu *vcpu;
 
 	if (!kvm->arch.n_requested_mmu_pages)
 		nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
@@ -8328,14 +8330,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	 * in the source machine (for example if live migration fails), small
 	 * sptes will remain around and cause bad performance.
 	 *
-	 * Scan sptes if dirty logging has been stopped, dropping those
-	 * which can be collapsed into a single large-page spte.  Later
-	 * page faults will create the large-page sptes.
+	 * Reset each vcpu's mmu, then page faults will create the large-page
+	 * sptes later.
 	 */
 	if ((change != KVM_MR_DELETE) &&
 		(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-		!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-		kvm_mmu_zap_collapsible_sptes(kvm, new);
+		!(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			kvm_mmu_reset_context(vcpu);
+	}
 
 	/*
 	 * Set up write protection and/or dirty logging for the new slot.