Message ID | 20230825093528.1637-7-shameerali.kolothum.thodi@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: arm64: Implement SW/HW combined dirty log | expand |
On Fri, Aug 25, 2023 at 10:35:26AM +0100, Shameer Kolothum wrote: > From: Keqian Zhu <zhukeqian1@huawei.com> > > This function write protects all PTEs between the ffs and fls of mask. > There may be unset bits between this range. It works well under pure > software dirty log, as software dirty log is not working during this > process. > > But it will unexpectly clear dirty status of PTE when hardware dirty > log is enabled. So change it to only write protect selected PTE. Ah, I did wonder about losing the dirty status. The equivalent to S1 would be for kvm_pgtable_stage2_wrprotect() to set a software dirty bit. I'm only superficially familiar with how KVM does dirty tracking for live migration. Does it need to first write-protect the pages and disable DBM? Is DBM re-enabled later? Or does stage2_wp_range() with your patches leave the DBM on? If the latter, the 'wp' aspect is a bit confusing since DBM basically means writeable (and maybe clean). So better to have something like stage2_clean_range().
On Fri, Sep 22, 2023 at 05:00:40PM +0100, Catalin Marinas wrote: > On Fri, Aug 25, 2023 at 10:35:26AM +0100, Shameer Kolothum wrote: > > From: Keqian Zhu <zhukeqian1@huawei.com> > > > > This function write protects all PTEs between the ffs and fls of mask. > > There may be unset bits between this range. It works well under pure > > software dirty log, as software dirty log is not working during this > > process. > > > > But it will unexpectly clear dirty status of PTE when hardware dirty > > log is enabled. So change it to only write protect selected PTE. > > Ah, I did wonder about losing the dirty status. The equivalent to S1 > would be for kvm_pgtable_stage2_wrprotect() to set a software dirty bit. > > I'm only superficially familiar with how KVM does dirty tracking for > live migration. Does it need to first write-protect the pages and > disable DBM? Is DBM re-enabled later? Or does stage2_wp_range() with > your patches leave the DBM on? If the latter, the 'wp' aspect is a bit > confusing since DBM basically means writeable (and maybe clean). So > better to have something like stage2_clean_range(). KVM has never enabled DBM and we solely rely on write-protection faults for dirty tracking. IOW, we do not have a writable-clean state for stage-2 PTEs (yet).
On Fri, Sep 22, 2023 at 04:59:08PM +0000, Oliver Upton wrote: > On Fri, Sep 22, 2023 at 05:00:40PM +0100, Catalin Marinas wrote: > > On Fri, Aug 25, 2023 at 10:35:26AM +0100, Shameer Kolothum wrote: > > > From: Keqian Zhu <zhukeqian1@huawei.com> > > > > > > This function write protects all PTEs between the ffs and fls of mask. > > > There may be unset bits between this range. It works well under pure > > > software dirty log, as software dirty log is not working during this > > > process. > > > > > > But it will unexpectly clear dirty status of PTE when hardware dirty > > > log is enabled. So change it to only write protect selected PTE. > > > > Ah, I did wonder about losing the dirty status. The equivalent to S1 > > would be for kvm_pgtable_stage2_wrprotect() to set a software dirty bit. > > > > I'm only superficially familiar with how KVM does dirty tracking for > > live migration. Does it need to first write-protect the pages and > > disable DBM? Is DBM re-enabled later? Or does stage2_wp_range() with > > your patches leave the DBM on? If the latter, the 'wp' aspect is a bit > > confusing since DBM basically means writeable (and maybe clean). So > > better to have something like stage2_clean_range(). > > KVM has never enabled DBM and we solely rely on write-protection faults > for dirty tracking. IOW, we do not have a writable-clean state for > stage-2 PTEs (yet). When I did the stage 2 AF support I left out DBM as it was unlikely to be of any use in the real world. Now with dirty tracking for migration, we may have a better use for this feature. What I find confusing with these patches is that stage2_wp_range() is supposed to make a stage 2 pte read-only, as the name implies. However, if the pte was writeable, it leaves it writeable, clean with DBM enabled. Doesn't the change to kvm_pgtable_stage2_wrprotect() in patch 4 break other uses of stage2_wp_range()? E.g. kvm_mmu_wp_memory_region()? Unless I misunderstood, I'd rather change kvm_arch_mmu_enable_log_dirty_pt_masked() to call a new function, stage2_clean_range(), which clears S2AP[1] together with setting DBM if previously writeable. But we should not confuse this with write-protecting or change the write-protecting functions to mark a pte writeable+clean.
On Tue, Sep 26, 2023 at 04:58:03PM +0100, Catalin Marinas wrote: > On Fri, Sep 22, 2023 at 04:59:08PM +0000, Oliver Upton wrote: > > On Fri, Sep 22, 2023 at 05:00:40PM +0100, Catalin Marinas wrote: > > > On Fri, Aug 25, 2023 at 10:35:26AM +0100, Shameer Kolothum wrote: > > > > From: Keqian Zhu <zhukeqian1@huawei.com> > > > > > > > > This function write protects all PTEs between the ffs and fls of mask. > > > > There may be unset bits between this range. It works well under pure > > > > software dirty log, as software dirty log is not working during this > > > > process. > > > > > > > > But it will unexpectly clear dirty status of PTE when hardware dirty > > > > log is enabled. So change it to only write protect selected PTE. > > > > > > Ah, I did wonder about losing the dirty status. The equivalent to S1 > > > would be for kvm_pgtable_stage2_wrprotect() to set a software dirty bit. > > > > > > I'm only superficially familiar with how KVM does dirty tracking for > > > live migration. Does it need to first write-protect the pages and > > > disable DBM? Is DBM re-enabled later? Or does stage2_wp_range() with > > > your patches leave the DBM on? If the latter, the 'wp' aspect is a bit > > > confusing since DBM basically means writeable (and maybe clean). So > > > better to have something like stage2_clean_range(). > > > > KVM has never enabled DBM and we solely rely on write-protection faults > > for dirty tracking. IOW, we do not have a writable-clean state for > > stage-2 PTEs (yet). > > When I did the stage 2 AF support I left out DBM as it was unlikely > to be of any use in the real world. Now with dirty tracking for > migration, we may have a better use for this feature. > > What I find confusing with these patches is that stage2_wp_range() is > supposed to make a stage 2 pte read-only, as the name implies. However, > if the pte was writeable, it leaves it writeable, clean with DBM > enabled. Doesn't the change to kvm_pgtable_stage2_wrprotect() in patch 4 > break other uses of stage2_wp_range()? E.g. kvm_mmu_wp_memory_region()? Ah, that's also used for dirty tracking, so maybe it's ok. AFAICT KVM doesn't do any form of stage 2 pte change from writeable to read-only other than dirty tracking (all other cases triggered via MMU notifier end up unmapping at stage 2). > Unless I misunderstood, I'd rather change > kvm_arch_mmu_enable_log_dirty_pt_masked() to call a new function, > stage2_clean_range(), which clears S2AP[1] together with setting DBM if > previously writeable. But we should not confuse this with > write-protecting or change the write-protecting functions to mark a pte > writeable+clean. I think it's still good to rename stage2_wp_range() to make it clear that it's about clean ptes rather than read-only.
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index f5ae4b97df4d..34251932560e 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1132,10 +1132,17 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, phys_addr_t base_gfn = slot->base_gfn + gfn_offset; phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT; phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT; + int rs, re; lockdep_assert_held_write(&kvm->mmu_lock); - stage2_wp_range(&kvm->arch.mmu, start, end); + for_each_set_bitrange(rs, re, &mask, BITS_PER_LONG) { + phys_addr_t addr_s, addr_e; + + addr_s = (base_gfn + rs) << PAGE_SHIFT; + addr_e = (base_gfn + re) << PAGE_SHIFT; + stage2_wp_range(&kvm->arch.mmu, addr_s, addr_e); + } /* * Eager-splitting is done when manual-protect is set. We