| Message ID | 20210729104009.382-5-shameerali.kolothum.thodi@huawei.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | kvm/arm: New VMID allocator based on asid |
On Thu, Jul 29, 2021 at 11:40:09AM +0100, Shameer Kolothum wrote: > Like ASID allocator, we copy the active_vmids into the > reserved_vmids on a rollover. But it's unlikely that > every CPU will have a vCPU as current task and we may > end up unnecessarily reserving the VMID space. > > Hence, clear active_vmids when scheduling out a vCPU. > > Suggested-by: Will Deacon <will@kernel.org> > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> > --- > arch/arm64/include/asm/kvm_host.h | 1 + > arch/arm64/kvm/arm.c | 1 + > arch/arm64/kvm/vmid.c | 6 ++++++ > 3 files changed, 8 insertions(+) > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index bb993bce1363..d93141cb8d16 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -687,6 +687,7 @@ extern unsigned int kvm_arm_vmid_bits; > int kvm_arm_vmid_alloc_init(void); > void kvm_arm_vmid_alloc_free(void); > void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid); > +void kvm_arm_vmid_clear_active(void); > > static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch) > { > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index 077e55a511a9..b134a1b89c84 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -435,6 +435,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > kvm_timer_vcpu_put(vcpu); > kvm_vgic_put(vcpu); > kvm_vcpu_pmu_restore_host(vcpu); > + kvm_arm_vmid_clear_active(); > > vcpu->cpu = -1; > } > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c > index 5584e84aed95..5fd51f5445c1 100644 > --- a/arch/arm64/kvm/vmid.c > +++ b/arch/arm64/kvm/vmid.c > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid *kvm_vmid) > return idx2vmid(vmid) | generation; > } > > +/* Call with preemption disabled */ > +void kvm_arm_vmid_clear_active(void) > +{ > + atomic64_set(this_cpu_ptr(&active_vmids), 0); > +} I think this is very broken, as it will force everybody to take the slow-path when they see an active_vmid of 0. It also doesn't solve the issue I mentioned before, as an active_vmid of 0 means that the reserved vmid is preserved. Needs more thought... Will
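The fast path being defeated here is the lock-free check in kvm_arm_vmid_update(): it only skips cpu_vmid_lock when this CPU's active_vmids is non-zero, so clearing it to 0 on every sched-out guarantees the slow path. A condensed sketch, assuming the definitions from this series (the real code is visible as removed lines in a diff later in this thread), rather than the exact upstream code:

```c
/*
 * Condensed sketch of the kvm_arm_vmid_update() fast path from this series;
 * not the exact upstream code. If active_vmids has been cleared to 0 on
 * sched-out, the old_active_vmid check below fails and every subsequent vCPU
 * sched-in is forced down the locked slow path.
 */
static void vmid_update_fast_path_sketch(struct kvm_vmid *kvm_vmid)
{
	u64 vmid = atomic64_read(&kvm_vmid->id);
	u64 old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));

	if (old_active_vmid && vmid_gen_match(vmid) &&
	    atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
				     old_active_vmid, vmid))
		return;	/* fast path: VMID still valid on this CPU */

	/* slow path: take cpu_vmid_lock, re-check the generation, new_vmid() */
}
```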
> -----Original Message----- > From: Will Deacon [mailto:will@kernel.org] > Sent: 03 August 2021 12:41 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; > linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; > james.morse@arm.com; julien.thierry.kdev@gmail.com; > suzuki.poulose@arm.com; jean-philippe@linaro.org; > Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm > <linuxarm@huawei.com> > Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU > schedule out > > On Thu, Jul 29, 2021 at 11:40:09AM +0100, Shameer Kolothum wrote: > > Like ASID allocator, we copy the active_vmids into the > > reserved_vmids on a rollover. But it's unlikely that > > every CPU will have a vCPU as current task and we may > > end up unnecessarily reserving the VMID space. > > > > Hence, clear active_vmids when scheduling out a vCPU. > > > > Suggested-by: Will Deacon <will@kernel.org> > > Signed-off-by: Shameer Kolothum > <shameerali.kolothum.thodi@huawei.com> > > --- > > arch/arm64/include/asm/kvm_host.h | 1 + > > arch/arm64/kvm/arm.c | 1 + > > arch/arm64/kvm/vmid.c | 6 ++++++ > > 3 files changed, 8 insertions(+) > > > > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > > index bb993bce1363..d93141cb8d16 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -687,6 +687,7 @@ extern unsigned int kvm_arm_vmid_bits; > > int kvm_arm_vmid_alloc_init(void); > > void kvm_arm_vmid_alloc_free(void); > > void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid); > > +void kvm_arm_vmid_clear_active(void); > > > > static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch > *vcpu_arch) > > { > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > > index 077e55a511a9..b134a1b89c84 100644 > > --- a/arch/arm64/kvm/arm.c > > +++ b/arch/arm64/kvm/arm.c > > @@ -435,6 +435,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > > kvm_timer_vcpu_put(vcpu); > > kvm_vgic_put(vcpu); > > kvm_vcpu_pmu_restore_host(vcpu); > > + kvm_arm_vmid_clear_active(); > > > > vcpu->cpu = -1; > > } > > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c > > index 5584e84aed95..5fd51f5445c1 100644 > > --- a/arch/arm64/kvm/vmid.c > > +++ b/arch/arm64/kvm/vmid.c > > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid > *kvm_vmid) > > return idx2vmid(vmid) | generation; > > } > > > > +/* Call with preemption disabled */ > > +void kvm_arm_vmid_clear_active(void) > > +{ > > + atomic64_set(this_cpu_ptr(&active_vmids), 0); > > +} > > I think this is very broken, as it will force everybody to take the > slow-path when they see an active_vmid of 0. Yes. I have seen that happening in my test setup. > It also doesn't solve the issue I mentioned before, as an active_vmid of 0 > means that the reserved vmid is preserved. > > Needs more thought... How about we clear all the active_vmids in kvm_arch_free_vm() if it matches the kvm_vmid->id ? But we may have to hold the lock there. Thanks, Shameer
On Tue, Aug 03, 2021 at 12:55:25PM +0000, Shameerali Kolothum Thodi wrote: > > > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c > > > index 5584e84aed95..5fd51f5445c1 100644 > > > --- a/arch/arm64/kvm/vmid.c > > > +++ b/arch/arm64/kvm/vmid.c > > > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid > > *kvm_vmid) > > > return idx2vmid(vmid) | generation; > > > } > > > > > > +/* Call with preemption disabled */ > > > +void kvm_arm_vmid_clear_active(void) > > > +{ > > > + atomic64_set(this_cpu_ptr(&active_vmids), 0); > > > +} > > > > I think this is very broken, as it will force everybody to take the > > slow-path when they see an active_vmid of 0. > > Yes. I have seen that happening in my test setup. Why didn't you say so?! > > It also doesn't solve the issue I mentioned before, as an active_vmid of 0 > > means that the reserved vmid is preserved. > > > > Needs more thought... > > How about we clear all the active_vmids in kvm_arch_free_vm() if it > matches the kvm_vmid->id ? But we may have to hold the lock > there I think we have to be really careful not to run into the "suspended animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID allocator handle suspended animation") if we go down this road. Maybe something along the lines of: ROLLOVER * Take lock * Inc generation => This will force everybody down the slow path * Record active VMIDs * Broadcast TLBI => Only active VMIDs can be dirty => Reserve active VMIDs and mark as allocated VCPU SCHED IN * Set active VMID * Check generation * If mismatch then: * Take lock * Try to match a reserved VMID * If no reserved VMID, allocate new VCPU SCHED OUT * Clear active VMID but I'm not daft enough to think I got it right first time. I think it needs both implementing *and* modelling in TLA+ before we merge it! Will
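A rough C rendering of the ROLLOVER step in the scheme above, assuming the per-CPU active_vmids, the vmid_map bitmap and a generation counter as used elsewhere in this series; the reserved-VMID bookkeeping is simplified, and vmid2idx(), vmid_generation and VMID_FIRST_VERSION are illustrative names rather than confirmed identifiers from the merged code:

```c
/*
 * Rough sketch of the ROLLOVER step described above; the caller is assumed
 * to hold cpu_vmid_lock. Names flagged in the lead-in are illustrative.
 */
static void vmid_rollover_sketch(void)
{
	int cpu;
	u64 vmid;

	/* New generation: every vCPU is forced down the slow path on sched-in. */
	atomic64_add(VMID_FIRST_VERSION, &vmid_generation);
	bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);

	/* Record the VMIDs that are live right now... */
	for_each_possible_cpu(cpu) {
		vmid = atomic64_read(&per_cpu(active_vmids, cpu));
		if (!vmid)
			continue;	/* no vCPU currently resident on this CPU */
		/* ...reserve them and mark them as allocated... */
		per_cpu(reserved_vmids, cpu) = vmid;
		__set_bit(vmid2idx(vmid), vmid_map);
	}

	/* ...then broadcast TLBI: only the active VMIDs can have dirty entries. */
	kvm_call_hyp(__kvm_flush_vm_context);
}
```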
> -----Original Message----- > From: Will Deacon [mailto:will@kernel.org] > Sent: 03 August 2021 16:31 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; > linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; > james.morse@arm.com; julien.thierry.kdev@gmail.com; > suzuki.poulose@arm.com; jean-philippe@linaro.org; > Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm > <linuxarm@huawei.com> > Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU > schedule out > > On Tue, Aug 03, 2021 at 12:55:25PM +0000, Shameerali Kolothum Thodi > wrote: > > > > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c > > > > index 5584e84aed95..5fd51f5445c1 100644 > > > > --- a/arch/arm64/kvm/vmid.c > > > > +++ b/arch/arm64/kvm/vmid.c > > > > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid > > > *kvm_vmid) > > > > return idx2vmid(vmid) | generation; > > > > } > > > > > > > > +/* Call with preemption disabled */ > > > > +void kvm_arm_vmid_clear_active(void) > > > > +{ > > > > + atomic64_set(this_cpu_ptr(&active_vmids), 0); > > > > +} > > > > > > I think this is very broken, as it will force everybody to take the > > > slow-path when they see an active_vmid of 0. > > > > Yes. I have seen that happening in my test setup. > > Why didn't you say so?! Sorry. I thought of getting some performance numbers with and without this patch and measure the impact. But didn't quite get time to finish it yet. > > > > It also doesn't solve the issue I mentioned before, as an active_vmid of 0 > > > means that the reserved vmid is preserved. > > > > > > Needs more thought... > > > > How about we clear all the active_vmids in kvm_arch_free_vm() if it > > matches the kvm_vmid->id ? But we may have to hold the lock > > there > > I think we have to be really careful not to run into the "suspended > animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID > allocator handle suspended animation") if we go down this road. Ok. I will go through that. > Maybe something along the lines of: > > ROLLOVER > > * Take lock > * Inc generation > => This will force everybody down the slow path > * Record active VMIDs > * Broadcast TLBI > => Only active VMIDs can be dirty > => Reserve active VMIDs and mark as allocated > > VCPU SCHED IN > > * Set active VMID > * Check generation > * If mismatch then: > * Take lock > * Try to match a reserved VMID > * If no reserved VMID, allocate new > > VCPU SCHED OUT > > * Clear active VMID > > but I'm not daft enough to think I got it right first time. I think it > needs both implementing *and* modelling in TLA+ before we merge it! > Ok. I need some time to digest the above first :). On another note, how serious do you think is the problem of extra reservation of the VMID space? Just wondering if we can skip this patch for now or not.. Thanks, Shameer
> -----Original Message----- > From: Shameerali Kolothum Thodi > Sent: 03 August 2021 16:57 > To: 'Will Deacon' <will@kernel.org> > Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; > linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; > james.morse@arm.com; julien.thierry.kdev@gmail.com; > suzuki.poulose@arm.com; jean-philippe@linaro.org; > Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm > <linuxarm@huawei.com> > Subject: RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU > schedule out > > > > > -----Original Message----- > > From: Will Deacon [mailto:will@kernel.org] > > Sent: 03 August 2021 16:31 > > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > > Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; > > linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; > > james.morse@arm.com; julien.thierry.kdev@gmail.com; > > suzuki.poulose@arm.com; jean-philippe@linaro.org; > > Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm > > <linuxarm@huawei.com> > > Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU > > schedule out > > > > On Tue, Aug 03, 2021 at 12:55:25PM +0000, Shameerali Kolothum Thodi > > wrote: > > > > > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c > > > > > index 5584e84aed95..5fd51f5445c1 100644 > > > > > --- a/arch/arm64/kvm/vmid.c > > > > > +++ b/arch/arm64/kvm/vmid.c > > > > > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid > > > > *kvm_vmid) > > > > > return idx2vmid(vmid) | generation; > > > > > } > > > > > > > > > > +/* Call with preemption disabled */ > > > > > +void kvm_arm_vmid_clear_active(void) > > > > > +{ > > > > > + atomic64_set(this_cpu_ptr(&active_vmids), 0); > > > > > +} > > > > > > > > I think this is very broken, as it will force everybody to take the > > > > slow-path when they see an active_vmid of 0. > > > > > > Yes. I have seen that happening in my test setup. > > > > Why didn't you say so?! > > Sorry. I thought of getting some performance numbers with and > without this patch and measure the impact. But didn't quite get time > to finish it yet. These are some test numbers with and without this patch, run on two different test setups. a)Test Setup -1 ----------------------- Platform: HiSilicon D06 with 128 CPUs, VMID bits = 16 Run 128 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench 5 times before exiting. Measurements taken avg. of 10 Runs. Image : 5.14-rc3 --------------------------- Time(s) 44.43813888 No. of exits 145,348,264 Image: 5.14-rc3 + vmid-v3 ---------------------------------------- Time(s) 46.59789034 No. of exits 133,587,307 %diff against 5.14-rc3 Time: 4.8% more Exits: 8% less Image: 5.14-rc3 + vmid-v3 + Without active_asid clear --------------------------------------------------------------------------- Time(s) 44.5031782 No. of exits 144,443,188 %diff against 5.14-rc3 Time: 0.15% more Exits: 2.42% less b)Test Setup -2 ----------------------- Platform: HiSilicon D06 + Kernel with maxcpus set to 8 and VMID bits set to 4. Run 40 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench 5 times before exiting. Measurements taken avg. of 10 Runs. Image : 5.14-rc3-vmid4bit ------------------------------------ Time(s) 46.19963266 No. of exits 23,699,546 Image: 5.14-rc3-vmid4bit + vmid-v3 --------------------------------------------------- Time(s) 45.83307736 No. 
of exits 23,260,203 %diff against 5.14-rc3-vmid4bit Time: 0.8% less Exits: 1.85% less Image: 5.14-rc3-vmid4bit + vmid-v3 + Without active_asid clear ----------------------------------------------------------------------------------------- Time(s) 44.5031782 No. of exits 144,443,188 %diff against 5.14-rc3-vmid4bit Time: 1.05% less Exits: 2.06% less As expected, the active_asid clear on schedule out is not helping. But without this patch, the numbers seems to be better than the vanilla kernel when we force the setup(cpus=8, vmd=4bits) to perform rollover. Please let me know your thoughts. Thanks, Shameer > > > > > > > It also doesn't solve the issue I mentioned before, as an active_vmid of 0 > > > > means that the reserved vmid is preserved. > > > > > > > > Needs more thought... > > > > > > How about we clear all the active_vmids in kvm_arch_free_vm() if it > > > matches the kvm_vmid->id ? But we may have to hold the lock > > > there > > > > I think we have to be really careful not to run into the "suspended > > animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID > > allocator handle suspended animation") if we go down this road. > > > Ok. I will go through that. > > > Maybe something along the lines of: > > > > ROLLOVER > > > > * Take lock > > * Inc generation > > => This will force everybody down the slow path > > * Record active VMIDs > > * Broadcast TLBI > > => Only active VMIDs can be dirty > > => Reserve active VMIDs and mark as allocated > > > > VCPU SCHED IN > > > > * Set active VMID > > * Check generation > > * If mismatch then: > > * Take lock > > * Try to match a reserved VMID > > * If no reserved VMID, allocate new > > > > VCPU SCHED OUT > > > > * Clear active VMID > > > > but I'm not daft enough to think I got it right first time. I think it > > needs both implementing *and* modelling in TLA+ before we merge it! > > > > Ok. I need some time to digest the above first :). > > On another note, how serious do you think is the problem of extra > reservation of the VMID space? Just wondering if we can skip this > patch for now or not.. > > Thanks, > Shameer
On Fri, Aug 06, 2021 at 12:24:36PM +0000, Shameerali Kolothum Thodi wrote: > These are some test numbers with and without this patch, run on two > different test setups. > > > a)Test Setup -1 > ----------------------- > > Platform: HiSilicon D06 with 128 CPUs, VMID bits = 16 > Run 128 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench > 5 times before exiting. > > Measurements taken avg. of 10 Runs. > > Image : 5.14-rc3 > --------------------------- > Time(s) 44.43813888 > No. of exits 145,348,264 > > Image: 5.14-rc3 + vmid-v3 > ---------------------------------------- > Time(s) 46.59789034 > No. of exits 133,587,307 > > %diff against 5.14-rc3 > Time: 4.8% more > Exits: 8% less > > Image: 5.14-rc3 + vmid-v3 + Without active_asid clear > --------------------------------------------------------------------------- > Time(s) 44.5031782 > No. of exits 144,443,188 > > %diff against 5.14-rc3 > Time: 0.15% more > Exits: 2.42% less > > b)Test Setup -2 > ----------------------- > > Platform: HiSilicon D06 + Kernel with maxcpus set to 8 and VMID bits set to 4. > Run 40 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench > 5 times before exiting. > > Measurements taken avg. of 10 Runs. > > Image : 5.14-rc3-vmid4bit > ------------------------------------ > Time(s) 46.19963266 > No. of exits 23,699,546 > > Image: 5.14-rc3-vmid4bit + vmid-v3 > --------------------------------------------------- > Time(s) 45.83307736 > No. of exits 23,260,203 > > %diff against 5.14-rc3-vmid4bit > Time: 0.8% less > Exits: 1.85% less > > Image: 5.14-rc3-vmid4bit + vmid-v3 + Without active_asid clear > ----------------------------------------------------------------------------------------- > Time(s) 44.5031782 > No. of exits 144,443,188 Really? The *exact* same numbers as the "Image: 5.14-rc3 + vmid-v3 + Without active_asid clear" configuration? Guessing a copy-paste error here. > %diff against 5.14-rc3-vmid4bit > Time: 1.05% less > Exits: 2.06% less > > As expected, the active_asid clear on schedule out is not helping. > But without this patch, the numbers seems to be better than the > vanilla kernel when we force the setup(cpus=8, vmd=4bits) > to perform rollover. I'm struggling a bit to understand these numbers. Are you saying that clearing the active_asid helps in the 16-bit VMID case but not in the 4-bit case? Why would the active_asid clear have any impact on the number of exits? The problem I see with not having the active_asid clear is that we will roll over more frequently as the number of reserved VMIDs increases. Will
> -----Original Message----- > From: Will Deacon [mailto:will@kernel.org] > Sent: 09 August 2021 14:09 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; > linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; > james.morse@arm.com; julien.thierry.kdev@gmail.com; > suzuki.poulose@arm.com; jean-philippe@linaro.org; > Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm > <linuxarm@huawei.com> > Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU > schedule out > > On Fri, Aug 06, 2021 at 12:24:36PM +0000, Shameerali Kolothum Thodi > wrote: > > These are some test numbers with and without this patch, run on two > > different test setups. > > > > > > a)Test Setup -1 > > ----------------------- > > > > Platform: HiSilicon D06 with 128 CPUs, VMID bits = 16 > > Run 128 VMs concurrently each with 2 vCPUs. Each Guest will execute > hackbench > > 5 times before exiting. > > > > Measurements taken avg. of 10 Runs. > > > > Image : 5.14-rc3 > > --------------------------- > > Time(s) 44.43813888 > > No. of exits 145,348,264 > > > > Image: 5.14-rc3 + vmid-v3 > > ---------------------------------------- > > Time(s) 46.59789034 > > No. of exits 133,587,307 > > > > %diff against 5.14-rc3 > > Time: 4.8% more > > Exits: 8% less > > > > Image: 5.14-rc3 + vmid-v3 + Without active_asid clear > > --------------------------------------------------------------------------- > > Time(s) 44.5031782 > > No. of exits 144,443,188 > > > > %diff against 5.14-rc3 > > Time: 0.15% more > > Exits: 2.42% less > > > > b)Test Setup -2 > > ----------------------- > > > > Platform: HiSilicon D06 + Kernel with maxcpus set to 8 and VMID bits set to > 4. > > Run 40 VMs concurrently each with 2 vCPUs. Each Guest will execute > hackbench > > 5 times before exiting. > > > > Measurements taken avg. of 10 Runs. > > > > Image : 5.14-rc3-vmid4bit > > ------------------------------------ > > Time(s) 46.19963266 > > No. of exits 23,699,546 > > > > Image: 5.14-rc3-vmid4bit + vmid-v3 > > --------------------------------------------------- > > Time(s) 45.83307736 > > No. of exits 23,260,203 > > > > %diff against 5.14-rc3-vmid4bit > > Time: 0.8% less > > Exits: 1.85% less > > > > Image: 5.14-rc3-vmid4bit + vmid-v3 + Without active_asid clear > > ----------------------------------------------------------------------------------------- > > Time(s) 44.5031782 > > No. of exits 144,443,188 > > Really? The *exact* same numbers as the "Image: 5.14-rc3 + vmid-v3 + > Without > active_asid clear" configuration? Guessing a copy-paste error here. > > > %diff against 5.14-rc3-vmid4bit > > Time: 1.05% less > > Exits: 2.06% less > > > > As expected, the active_asid clear on schedule out is not helping. > > But without this patch, the numbers seems to be better than the > > vanilla kernel when we force the setup(cpus=8, vmd=4bits) > > to perform rollover. > > I'm struggling a bit to understand these numbers. Are you saying that > clearing the active_asid helps in the 16-bit VMID case but not in the > 4-bit case? Nope, the other way around.. The point I was trying to make is that clearing the active_vmids definitely have an impact in 16-bit vmid case, where rollover is not happening, as it ends up taking the slow path more frequently. Test setup-1, case 2(with active_vmids clear): Around 4.8% more time to finish the test compared to vanilla kernel. Test setup-1, case 3(Without clear): 0.15% more time compared to vanilla kernel. 
For the 4-bit vmid case, the impact of clearing vmids is not that obvious probably because we have more rollovers. Test setup-2, case 2(with active_vmids clear):0.8% less time compared to vanilla. Test setup-2, case 3(Without clear): 1.05% less time compared to vanilla kernel. So between the two(with and without clearing the active_vmids), the "without" one has better numbers for both Test setups. > Why would the active_asid clear have any impact on the number of exits? In 16 bit vmid case, it looks like the no. of exits is considerably lower if we clear active_vmids. . Not sure it is because of the frequent slow path or not. But anyway, the time to finish the test is higher. > The problem I see with not having the active_asid clear is that we will > roll over more frequently as the number of reserved VMIDs increases. Ok. The idea of running the 4-bit test setup was to capture that. It doesn't look like it has a major impact when compared to the original kernel. May be I should take an average of more test runs. Please let me know if there is a better way to measure that impact. Hope, I am clear. Thanks, Shameer
Hi Will, > -----Original Message----- > From: Will Deacon [mailto:will@kernel.org] > Sent: 03 August 2021 16:31 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; > linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; > james.morse@arm.com; julien.thierry.kdev@gmail.com; > suzuki.poulose@arm.com; jean-philippe@linaro.org; > Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm > <linuxarm@huawei.com> > Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU > schedule out [...] > I think we have to be really careful not to run into the "suspended > animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID > allocator handle suspended animation") if we go down this road. > > Maybe something along the lines of: > > ROLLOVER > > * Take lock > * Inc generation > => This will force everybody down the slow path > * Record active VMIDs > * Broadcast TLBI > => Only active VMIDs can be dirty > => Reserve active VMIDs and mark as allocated > > VCPU SCHED IN > > * Set active VMID > * Check generation > * If mismatch then: > * Take lock > * Try to match a reserved VMID > * If no reserved VMID, allocate new > > VCPU SCHED OUT > > * Clear active VMID > > but I'm not daft enough to think I got it right first time. I think it > needs both implementing *and* modelling in TLA+ before we merge it! I attempted to implement the above algo as below. It seems to be working in both 16-bit vmid and 4-bit vmid test setup. Though I am not quite sure this Is exactly what you had in mind above and covers all corner cases. Please take a look and let me know. (The diff below is against this v3 series) Thanks, Shameer --->8<---- --- a/arch/arm64/kvm/vmid.c +++ b/arch/arm64/kvm/vmid.c @@ -43,7 +43,7 @@ static void flush_context(void) bitmap_clear(vmid_map, 0, NUM_USER_VMIDS); for_each_possible_cpu(cpu) { - vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0); + vmid = atomic64_read(&per_cpu(active_vmids, cpu)); /* Preserve reserved VMID */ if (vmid == 0) @@ -125,32 +125,17 @@ void kvm_arm_vmid_clear_active(void) void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid) { unsigned long flags; - u64 vmid, old_active_vmid; + u64 vmid; vmid = atomic64_read(&kvm_vmid->id); - - /* - * Please refer comments in check_and_switch_context() in - * arch/arm64/mm/context.c. - */ - old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids)); - if (old_active_vmid && vmid_gen_match(vmid) && - atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids), - old_active_vmid, vmid)) + if (vmid_gen_match(vmid)) { + atomic64_set(this_cpu_ptr(&active_vmids), vmid); return; - - raw_spin_lock_irqsave(&cpu_vmid_lock, flags); - - /* Check that our VMID belongs to the current generation. */ - vmid = atomic64_read(&kvm_vmid->id); - if (!vmid_gen_match(vmid)) { - vmid = new_vmid(kvm_vmid); - atomic64_set(&kvm_vmid->id, vmid); } - + raw_spin_lock_irqsave(&cpu_vmid_lock, flags); + vmid = new_vmid(kvm_vmid); + atomic64_set(&kvm_vmid->id, vmid); atomic64_set(this_cpu_ptr(&active_vmids), vmid); raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags); } --->8<----
> -----Original Message----- > From: Shameerali Kolothum Thodi > Sent: 11 August 2021 09:48 > To: 'Will Deacon' <will@kernel.org> > Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; > linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; > james.morse@arm.com; julien.thierry.kdev@gmail.com; > suzuki.poulose@arm.com; jean-philippe@linaro.org; > Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm > <linuxarm@huawei.com> > Subject: RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU > schedule out > > Hi Will, > > > -----Original Message----- > > From: Will Deacon [mailto:will@kernel.org] > > Sent: 03 August 2021 16:31 > > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > > Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; > > linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; > > james.morse@arm.com; julien.thierry.kdev@gmail.com; > > suzuki.poulose@arm.com; jean-philippe@linaro.org; > > Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm > > <linuxarm@huawei.com> > > Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU > > schedule out > > [...] > > > I think we have to be really careful not to run into the "suspended > > animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID > > allocator handle suspended animation") if we go down this road. > > > > Maybe something along the lines of: > > > > ROLLOVER > > > > * Take lock > > * Inc generation > > => This will force everybody down the slow path > > * Record active VMIDs > > * Broadcast TLBI > > => Only active VMIDs can be dirty > > => Reserve active VMIDs and mark as allocated > > > > VCPU SCHED IN > > > > * Set active VMID > > * Check generation > > * If mismatch then: > > * Take lock > > * Try to match a reserved VMID > > * If no reserved VMID, allocate new > > > > VCPU SCHED OUT > > > > * Clear active VMID > > > > but I'm not daft enough to think I got it right first time. I think it > > needs both implementing *and* modelling in TLA+ before we merge it! > > I attempted to implement the above algo as below. It seems to be > working in both 16-bit vmid and 4-bit vmid test setup. It is not :(. I did an extended, overnight test run and it fails. It looks to me in my below implementation there is no synchronization on setting the active VMID and a concurrent rollover. I will have another go. Thanks, Shameer Though I am > not quite sure this Is exactly what you had in mind above and covers > all corner cases. > > Please take a look and let me know. > (The diff below is against this v3 series) > > Thanks, > Shameer > > --->8<---- > > --- a/arch/arm64/kvm/vmid.c > +++ b/arch/arm64/kvm/vmid.c > @@ -43,7 +43,7 @@ static void flush_context(void) > bitmap_clear(vmid_map, 0, NUM_USER_VMIDS); > > for_each_possible_cpu(cpu) { > - vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, > cpu), 0); > + vmid = atomic64_read(&per_cpu(active_vmids, cpu)); > > /* Preserve reserved VMID */ > if (vmid == 0) > @@ -125,32 +125,17 @@ void kvm_arm_vmid_clear_active(void) > void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid) > { > unsigned long flags; > - u64 vmid, old_active_vmid; > + u64 vmid; > > vmid = atomic64_read(&kvm_vmid->id); > - > - /* > - * Please refer comments in check_and_switch_context() in > - * arch/arm64/mm/context.c. 
> - */ > - old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids)); > - if (old_active_vmid && vmid_gen_match(vmid) && > - atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids), > - old_active_vmid, vmid)) > + if (vmid_gen_match(vmid)) { > + atomic64_set(this_cpu_ptr(&active_vmids), vmid); > return; > - > - raw_spin_lock_irqsave(&cpu_vmid_lock, flags); > - > - /* Check that our VMID belongs to the current generation. */ > - vmid = atomic64_read(&kvm_vmid->id); > - if (!vmid_gen_match(vmid)) { > - vmid = new_vmid(kvm_vmid); > - atomic64_set(&kvm_vmid->id, vmid); > } > > - > + raw_spin_lock_irqsave(&cpu_vmid_lock, flags); > + vmid = new_vmid(kvm_vmid); > + atomic64_set(&kvm_vmid->id, vmid); > atomic64_set(this_cpu_ptr(&active_vmids), vmid); > raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags); > } > --->8<---- > > >
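One interleaving consistent with the failure reported above, assuming the simplified fast path in the diff (a plain atomic64_set() with no cmpxchg); this is an illustration of the suspected race, not a confirmed trace:

```c
/*
 * Illustrative interleaving of the suspected race with the simplified
 * fast path (plain atomic64_set(), no cmpxchg):
 *
 *   CPU0: vCPU sched-in                    CPU1: rollover in new_vmid()
 *   ---------------------------------      ---------------------------------
 *   vmid_gen_match(vmid) -> true
 *                                          takes cpu_vmid_lock
 *                                          flush_context(): snapshots CPU0's
 *                                            active_vmids, bumps generation,
 *                                            broadcasts TLBI
 *   atomic64_set(active_vmids, vmid)
 *     -> publishes an old-generation
 *        VMID after the TLBI, without
 *        ever taking the slow path
 *
 * The same VMID value can then be handed out to a different VM in the new
 * generation while CPU0 is still running the old one. The xchg-to-zero in
 * flush_context() plus the cmpxchg fast path removed by the diff above
 * appear to be what closes this window in the original scheme.
 */
```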
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bb993bce1363..d93141cb8d16 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -687,6 +687,7 @@ extern unsigned int kvm_arm_vmid_bits;
 int kvm_arm_vmid_alloc_init(void);
 void kvm_arm_vmid_alloc_free(void);
 void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
+void kvm_arm_vmid_clear_active(void);
 
 static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
 {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 077e55a511a9..b134a1b89c84 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -435,6 +435,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_timer_vcpu_put(vcpu);
 	kvm_vgic_put(vcpu);
 	kvm_vcpu_pmu_restore_host(vcpu);
+	kvm_arm_vmid_clear_active();
 
 	vcpu->cpu = -1;
 }
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index 5584e84aed95..5fd51f5445c1 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid *kvm_vmid)
 	return idx2vmid(vmid) | generation;
 }
 
+/* Call with preemption disabled */
+void kvm_arm_vmid_clear_active(void)
+{
+	atomic64_set(this_cpu_ptr(&active_vmids), 0);
+}
+
 void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
 {
 	unsigned long flags;
Like the ASID allocator, we copy the active_vmids into the reserved_vmids
on a rollover. But it is unlikely that every CPU will have a vCPU as its
current task, so we may end up unnecessarily reserving VMID space.

Hence, clear active_vmids when scheduling out a vCPU.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h | 1 +
 arch/arm64/kvm/arm.c              | 1 +
 arch/arm64/kvm/vmid.c             | 6 ++++++
 3 files changed, 8 insertions(+)