diff mbox series

[RFC,v2,6/8] KVM: arm64: Only write protect selected PTE

Message ID 20230825093528.1637-7-shameerali.kolothum.thodi@huawei.com (mailing list archive)
State New, archived
Headers show
Series KVM: arm64: Implement SW/HW combined dirty log | expand

Commit Message

Shameer Kolothum Aug. 25, 2023, 9:35 a.m. UTC
From: Keqian Zhu <zhukeqian1@huawei.com>

This function write protects all PTEs between the ffs and fls of mask.
There may be unset bits between this range. It works well under pure
software dirty log, as software dirty log is not working during this
process.

But it will unexpectly clear dirty status of PTE when hardware dirty
log is enabled. So change it to only write protect selected PTE.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/kvm/mmu.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Catalin Marinas Sept. 22, 2023, 4 p.m. UTC | #1
On Fri, Aug 25, 2023 at 10:35:26AM +0100, Shameer Kolothum wrote:
> From: Keqian Zhu <zhukeqian1@huawei.com>
> 
> This function write protects all PTEs between the ffs and fls of mask.
> There may be unset bits between this range. It works well under pure
> software dirty log, as software dirty log is not working during this
> process.
> 
> But it will unexpectly clear dirty status of PTE when hardware dirty
> log is enabled. So change it to only write protect selected PTE.

Ah, I did wonder about losing the dirty status. The equivalent to S1
would be for kvm_pgtable_stage2_wrprotect() to set a software dirty bit.

I'm only superficially familiar with how KVM does dirty tracking for
live migration. Does it need to first write-protect the pages and
disable DBM? Is DBM re-enabled later? Or does stage2_wp_range() with
your patches leave the DBM on? If the latter, the 'wp' aspect is a bit
confusing since DBM basically means writeable (and maybe clean). So
better to have something like stage2_clean_range().
Oliver Upton Sept. 22, 2023, 4:59 p.m. UTC | #2
On Fri, Sep 22, 2023 at 05:00:40PM +0100, Catalin Marinas wrote:
> On Fri, Aug 25, 2023 at 10:35:26AM +0100, Shameer Kolothum wrote:
> > From: Keqian Zhu <zhukeqian1@huawei.com>
> > 
> > This function write protects all PTEs between the ffs and fls of mask.
> > There may be unset bits between this range. It works well under pure
> > software dirty log, as software dirty log is not working during this
> > process.
> > 
> > But it will unexpectly clear dirty status of PTE when hardware dirty
> > log is enabled. So change it to only write protect selected PTE.
> 
> Ah, I did wonder about losing the dirty status. The equivalent to S1
> would be for kvm_pgtable_stage2_wrprotect() to set a software dirty bit.
> 
> I'm only superficially familiar with how KVM does dirty tracking for
> live migration. Does it need to first write-protect the pages and
> disable DBM? Is DBM re-enabled later? Or does stage2_wp_range() with
> your patches leave the DBM on? If the latter, the 'wp' aspect is a bit
> confusing since DBM basically means writeable (and maybe clean). So
> better to have something like stage2_clean_range().

KVM has never enabled DBM and we solely rely on write-protection faults
for dirty tracking. IOW, we do not have a writable-clean state for
stage-2 PTEs (yet).
Catalin Marinas Sept. 26, 2023, 3:58 p.m. UTC | #3
On Fri, Sep 22, 2023 at 04:59:08PM +0000, Oliver Upton wrote:
> On Fri, Sep 22, 2023 at 05:00:40PM +0100, Catalin Marinas wrote:
> > On Fri, Aug 25, 2023 at 10:35:26AM +0100, Shameer Kolothum wrote:
> > > From: Keqian Zhu <zhukeqian1@huawei.com>
> > > 
> > > This function write protects all PTEs between the ffs and fls of mask.
> > > There may be unset bits between this range. It works well under pure
> > > software dirty log, as software dirty log is not working during this
> > > process.
> > > 
> > > But it will unexpectly clear dirty status of PTE when hardware dirty
> > > log is enabled. So change it to only write protect selected PTE.
> > 
> > Ah, I did wonder about losing the dirty status. The equivalent to S1
> > would be for kvm_pgtable_stage2_wrprotect() to set a software dirty bit.
> > 
> > I'm only superficially familiar with how KVM does dirty tracking for
> > live migration. Does it need to first write-protect the pages and
> > disable DBM? Is DBM re-enabled later? Or does stage2_wp_range() with
> > your patches leave the DBM on? If the latter, the 'wp' aspect is a bit
> > confusing since DBM basically means writeable (and maybe clean). So
> > better to have something like stage2_clean_range().
> 
> KVM has never enabled DBM and we solely rely on write-protection faults
> for dirty tracking. IOW, we do not have a writable-clean state for
> stage-2 PTEs (yet).

When I did the stage 2 AF support I left out DBM as it was unlikely
to be of any use in the real world. Now with dirty tracking for
migration, we may have a better use for this feature.

What I find confusing with these patches is that stage2_wp_range() is
supposed to make a stage 2 pte read-only, as the name implies. However,
if the pte was writeable, it leaves it writeable, clean with DBM
enabled. Doesn't the change to kvm_pgtable_stage2_wrprotect() in patch 4
break other uses of stage2_wp_range()? E.g. kvm_mmu_wp_memory_region()?

Unless I misunderstood, I'd rather change
kvm_arch_mmu_enable_log_dirty_pt_masked() to call a new function,
stage2_clean_range(), which clears S2AP[1] together with setting DBM if
previously writeable. But we should not confuse this with
write-protecting or change the write-protecting functions to mark a pte
writeable+clean.
Catalin Marinas Sept. 26, 2023, 4:10 p.m. UTC | #4
On Tue, Sep 26, 2023 at 04:58:03PM +0100, Catalin Marinas wrote:
> On Fri, Sep 22, 2023 at 04:59:08PM +0000, Oliver Upton wrote:
> > On Fri, Sep 22, 2023 at 05:00:40PM +0100, Catalin Marinas wrote:
> > > On Fri, Aug 25, 2023 at 10:35:26AM +0100, Shameer Kolothum wrote:
> > > > From: Keqian Zhu <zhukeqian1@huawei.com>
> > > > 
> > > > This function write protects all PTEs between the ffs and fls of mask.
> > > > There may be unset bits between this range. It works well under pure
> > > > software dirty log, as software dirty log is not working during this
> > > > process.
> > > > 
> > > > But it will unexpectly clear dirty status of PTE when hardware dirty
> > > > log is enabled. So change it to only write protect selected PTE.
> > > 
> > > Ah, I did wonder about losing the dirty status. The equivalent to S1
> > > would be for kvm_pgtable_stage2_wrprotect() to set a software dirty bit.
> > > 
> > > I'm only superficially familiar with how KVM does dirty tracking for
> > > live migration. Does it need to first write-protect the pages and
> > > disable DBM? Is DBM re-enabled later? Or does stage2_wp_range() with
> > > your patches leave the DBM on? If the latter, the 'wp' aspect is a bit
> > > confusing since DBM basically means writeable (and maybe clean). So
> > > better to have something like stage2_clean_range().
> > 
> > KVM has never enabled DBM and we solely rely on write-protection faults
> > for dirty tracking. IOW, we do not have a writable-clean state for
> > stage-2 PTEs (yet).
> 
> When I did the stage 2 AF support I left out DBM as it was unlikely
> to be of any use in the real world. Now with dirty tracking for
> migration, we may have a better use for this feature.
> 
> What I find confusing with these patches is that stage2_wp_range() is
> supposed to make a stage 2 pte read-only, as the name implies. However,
> if the pte was writeable, it leaves it writeable, clean with DBM
> enabled. Doesn't the change to kvm_pgtable_stage2_wrprotect() in patch 4
> break other uses of stage2_wp_range()? E.g. kvm_mmu_wp_memory_region()?

Ah, that's also used for dirty tracking, so maybe it's ok.

AFAICT KVM doesn't do any form of stage 2 pte change from writeable to
read-only other than dirty tracking (all other cases triggered via MMU
notifier end up unmapping at stage 2).

> Unless I misunderstood, I'd rather change
> kvm_arch_mmu_enable_log_dirty_pt_masked() to call a new function,
> stage2_clean_range(), which clears S2AP[1] together with setting DBM if
> previously writeable. But we should not confuse this with
> write-protecting or change the write-protecting functions to mark a pte
> writeable+clean.

I think it's still good to rename stage2_wp_range() to make it clear
that it's about clean ptes rather than read-only.
diff mbox series

Patch

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5ae4b97df4d..34251932560e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1132,10 +1132,17 @@  void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
+	int rs, re;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
-	stage2_wp_range(&kvm->arch.mmu, start, end);
+	for_each_set_bitrange(rs, re, &mask, BITS_PER_LONG) {
+		phys_addr_t addr_s, addr_e;
+
+		addr_s = (base_gfn + rs) << PAGE_SHIFT;
+		addr_e = (base_gfn + re) << PAGE_SHIFT;
+		stage2_wp_range(&kvm->arch.mmu, addr_s, addr_e);
+	}
 
 	/*
 	 * Eager-splitting is done when manual-protect is set.  We