
[v3,4/8] ARM: KVM: Memory virtualization setup

Message ID 20110603150346.17011.70545.stgit@ubuntu (mailing list archive)
State New, archived

Commit Message

Christoffer Dall June 3, 2011, 3:03 p.m. UTC
Initializes a blank level-1 translation table for the second stage
translation and handles freeing it as well.
---
 arch/arm/include/asm/kvm_host.h |    4 ++-
 arch/arm/include/asm/kvm_mmu.h  |    5 ++++
 arch/arm/kvm/arm.c              |   54 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 61 insertions(+), 2 deletions(-)



Comments

Avi Kivity June 5, 2011, 12:41 p.m. UTC | #1
On 06/03/2011 06:03 PM, Christoffer Dall wrote:
> Initializes a blank level-1 translation table for the second stage
> translation and handles freeing it as well.
>
> +	start = (unsigned long)kvm,
> +	end = start + sizeof(struct kvm);
> +	ret = create_hyp_mappings(kvm_hyp_pgd, start, end);

Why not map all GFP_KERNEL memory?
Christoffer Dall June 5, 2011, 2:50 p.m. UTC | #2
On Sun, Jun 5, 2011 at 2:41 PM, Avi Kivity <avi@redhat.com> wrote:
> On 06/03/2011 06:03 PM, Christoffer Dall wrote:
>>
>> Initializes a blank level-1 translation table for the second stage
>> translation and handles freeing it as well.
>>
>> +       start = (unsigned long)kvm,
>> +       end = start + sizeof(struct kvm);
>> +       ret = create_hyp_mappings(kvm_hyp_pgd, start, end);
>
> Why not map all GFP_KERNEL memory?
>
I wanted to only map things I was sure would be there and stay there
so no assumptions were made about existing pages which could have been
removed, since I don't handle aborts taken in the hypervisor itself.
But, if it would be as safe to map all GFP_KERNEL memory and that also
maps the necessary code segments, then we could do that. Do you feel
it would be simpler/faster/easier?
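
For illustration, mapping all of lowmem (where GFP_KERNEL allocations
come from) into Hyp once at init could look roughly like the sketch
below. It assumes the create_hyp_mappings()/kvm_hyp_pgd interface from
this series plus the generic memblock iterator, and it is only a
sketch, not something in the posted patches:

/*
 * Hypothetical sketch: map every lowmem region into the Hyp page
 * tables once at init time instead of mapping individual structures.
 * Assumes create_hyp_mappings() and kvm_hyp_pgd from this series and
 * the generic for_each_memblock() iterator; highmem is skipped since
 * only lowmem has a permanent linear mapping.
 */
static int map_all_lowmem_to_hyp(void)
{
	struct memblock_region *reg;
	int err;

	for_each_memblock(memory, reg) {
		unsigned long start = (unsigned long)__va(reg->base);
		unsigned long end = (unsigned long)__va(reg->base + reg->size);

		if (end > (unsigned long)high_memory)
			end = (unsigned long)high_memory;
		if (start >= end)
			continue;

		err = create_hyp_mappings(kvm_hyp_pgd, start, end);
		if (err)
			return err;
	}
	return 0;
}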

> --
> error compiling committee.c: too many arguments to function
>
>
Avi Kivity June 5, 2011, 2:53 p.m. UTC | #3
On 06/05/2011 05:50 PM, Christoffer Dall wrote:
> On Sun, Jun 5, 2011 at 2:41 PM, Avi Kivity<avi@redhat.com>  wrote:
> >  On 06/03/2011 06:03 PM, Christoffer Dall wrote:
> >>
> >>  Initializes a blank level-1 translation table for the second stage
> >>  translation and handles freeing it as well.
> >>
> >>  +       start = (unsigned long)kvm,
> >>  +       end = start + sizeof(struct kvm);
> >>  +       ret = create_hyp_mappings(kvm_hyp_pgd, start, end);
> >
> >  Why not map all GFP_KERNEL memory?
> >
> I wanted to only map things I was sure would be there and stay there
> so no assumptions were made about existing pages which could have been
> removed, since I don't handle aborts taken in the hypervisor itself.
> But, if it would be as safe to map all GFP_KERNEL memory and that also
> maps the necessary code segments, then we could do that. Do you feel
> it would be simpler/faster/easier?

I think so - you wouldn't have to worry about dereferencing pointers 
within the vcpu structure.

Of course, it's up to you; I don't really have enough understanding of 
the architecture.
Avi Kivity June 5, 2011, 3:14 p.m. UTC | #4
On 06/05/2011 05:53 PM, Avi Kivity wrote:
> On 06/05/2011 05:50 PM, Christoffer Dall wrote:
>> On Sun, Jun 5, 2011 at 2:41 PM, Avi Kivity<avi@redhat.com>  wrote:
>> >  On 06/03/2011 06:03 PM, Christoffer Dall wrote:
>> >>
>> >>  Initializes a blank level-1 translation table for the second stage
>> >>  translation and handles freeing it as well.
>> >>
>> >>  +       start = (unsigned long)kvm,
>> >>  +       end = start + sizeof(struct kvm);
>> >>  +       ret = create_hyp_mappings(kvm_hyp_pgd, start, end);
>> >
>> >  Why not map all GFP_KERNEL memory?
>> >
>> I wanted to only map things I was sure would be there and stay there
>> so no assumptions were made about existing pages which could have been
>> removed, since I don't handle aborts taken in the hypervisor itself.
>> But, if it would be as safe to map all GFP_KERNEL memory and that also
>> maps the necessary code segments, then we could do that. Do you feel
>> it would be simpler/faster/easier?
>
> I think so - you wouldn't have to worry about dereferencing pointers 
> within the vcpu structure.

Also, you could use huge pages for the mapping, yes? That should 
improve switching performance a bit.

Can you run the host kernel in hypervisor mode?  That may reduce 
switching time even further.
Christoffer Dall June 5, 2011, 3:27 p.m. UTC | #5
On Sun, Jun 5, 2011 at 5:14 PM, Avi Kivity <avi@redhat.com> wrote:
> On 06/05/2011 05:53 PM, Avi Kivity wrote:
>>
>> On 06/05/2011 05:50 PM, Christoffer Dall wrote:
>>>
>>> On Sun, Jun 5, 2011 at 2:41 PM, Avi Kivity<avi@redhat.com>  wrote:
>>> >  On 06/03/2011 06:03 PM, Christoffer Dall wrote:
>>> >>
>>> >>  Initializes a blank level-1 translation table for the second stage
>>> >>  translation and handles freeing it as well.
>>> >>
>>> >>  +       start = (unsigned long)kvm,
>>> >>  +       end = start + sizeof(struct kvm);
>>> >>  +       ret = create_hyp_mappings(kvm_hyp_pgd, start, end);
>>> >
>>> >  Why not map all GFP_KERNEL memory?
>>> >
>>> I wanted to only map things I was sure would be there and stay there
>>> so no assumptions were made about existing pages which could have been
>>> removed, since I don't handle aborts taken in the hypervisor itself.
>>> But, if it would be as safe to map all GFP_KERNEL memory and that also
>>> maps the necessary code segments, then we could do that. Do you feel
>>> it would be simpler/faster/easier?
>>
>> I think so - you wouldn't have to worry about dereferencing pointers
>> within the vcpu structure.
>
> Also, you could use huge pages for the mapping, yes? That should improve
> switching performance a bit.

Well, the only advantage here would be to save a few entries in the
TLB, right? So it would really only help if the data and the code
accessed during switches lie within the same sections, which may or
may not be the case. I don't see a big performance gain here, only
slightly more complicated code. For instance, if the VCPU struct is
mapped using a page, not a section mapping, I cannot use section
mappings since I would end up mapping unrelated memory. So I would
have to support both when allocating and freeing.
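
For illustration, the check for whether a given Hyp range could use a
section mapping at all might look like the hypothetical helper below;
SECTION_SIZE is only a stand-in for whatever block size the Hyp
translation format provides, and none of this is in the patch:

/*
 * Hypothetical helper: a section (block) mapping is only possible when
 * both the virtual range and the backing physical address are aligned
 * to the section size and the range spans at least one full section.
 * SECTION_SIZE is used as a stand-in for the Hyp block size.
 */
static bool can_use_section_mapping(unsigned long start, unsigned long end)
{
	phys_addr_t pa = virt_to_phys((void *)start);

	return IS_ALIGNED(start, SECTION_SIZE) &&
	       IS_ALIGNED(pa, SECTION_SIZE) &&
	       (end - start) >= SECTION_SIZE;
}

A page-granular allocation like the kmem_cache-backed VCPU struct fails
such a check, which is why those mappings would still need page-sized
entries.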

I suggest keeping this in place for now and experimenting with
performance later on to see if there's a gain. Mapping all of
GFP_KERNEL memory could be good for preventing bugs, but I would like
to check with the ARM memory experts that it is in fact a good idea.

>
> Can you run the host kernel in hypervisor mode?  That may reduce switching
> time even further.

No, I think the implications would be far too widespread throughout the kernel.

>
> --
> error compiling committee.c: too many arguments to function
>
>
Avi Kivity June 5, 2011, 4:02 p.m. UTC | #6
On 06/05/2011 06:27 PM, Christoffer Dall wrote:
> On Sun, Jun 5, 2011 at 5:14 PM, Avi Kivity<avi@redhat.com>  wrote:
> >  On 06/05/2011 05:53 PM, Avi Kivity wrote:
> >>
> >>  On 06/05/2011 05:50 PM, Christoffer Dall wrote:
> >>>
> >>>  On Sun, Jun 5, 2011 at 2:41 PM, Avi Kivity<avi@redhat.com>    wrote:
> >>>  >    On 06/03/2011 06:03 PM, Christoffer Dall wrote:
> >>>  >>
> >>>  >>    Initializes a blank level-1 translation table for the second stage
> >>>  >>    translation and handles freeing it as well.
> >>>  >>
> >>>  >>    +       start = (unsigned long)kvm,
> >>>  >>    +       end = start + sizeof(struct kvm);
> >>>  >>    +       ret = create_hyp_mappings(kvm_hyp_pgd, start, end);
> >>>  >
> >>>  >    Why not map all GFP_KERNEL memory?
> >>>  >
> >>>  I wanted to only map things I was sure would be there and stay there
> >>>  so no assumptions were made about existing pages which could have been
> >>>  removed, since I don't handle aborts taken in the hypervisor itself.
> >>>  But, if it would be as safe to map all GFP_KERNEL memory and that also
> >>>  maps the necessary code segments, then we could do that. Do you feel
> >>>  it would be simpler/faster/easier?
> >>
> >>  I think so - you wouldn't have to worry about dereferencing pointers
> >>  within the vcpu structure.
> >
> >  Also, you could use huge pages for the mapping, yes? That should improve
> >  switching performance a bit.
>
> Well, the only advantage here would be to save a few entries in the
> TLB, right? So it would really only help if the data and the code
> accessed during switches lie within the same sections, which may or
> may not be the case. I don't see a big performance gain here, only
> slightly more complicated code. For instance, if the VCPU struct is
> mapped using a page, not a section mapping, I cannot use section
> mappings since I would end up mapping unrelated memory. So I would
> have to support both when allocating and freeing.
>
> I suggest keeping this in place for now and experimenting with
> performance later on to see if there's a gain. Mapping all of
> GFP_KERNEL memory could be good for preventing bugs, but I would like
> to check with the ARM memory experts that it is in fact a good idea.

Sure.  All of my arch-related comments are made from ignorance anyway;
feel free to ignore or use them as you like.  The only important ones 
are those related to the API.

> >
> >  Can you run the host kernel in hypervisor mode?  That may reduce switching
> >  time even further.
>
> No, I think the implications would be far too widespread throughout the kernel.

Okay.

Patch

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 9fa9b20..5955ff4 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -31,7 +31,9 @@  struct kvm_vcpu;
 u32* kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 
 struct kvm_arch {
-	pgd_t *pgd;     /* 1-level 2nd stage table */
+	u32    vmid;	/* The VMID used for the virt. memory system */
+	pgd_t *pgd;	/* 1-level 2nd stage table */
+	u64    vttbr;	/* VTTBR value associated with above pgd and vmid */
 };
 
 #define EXCEPTION_NONE      0
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index d22aad0..a64ab2d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -37,4 +37,9 @@  void remove_hyp_mappings(pgd_t *hyp_pgd,
 			 unsigned long end);
 void free_hyp_pmds(pgd_t *hyp_pgd);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 4f691be..714f415 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -77,13 +77,56 @@  void kvm_arch_sync_events(struct kvm *kvm)
 
 int kvm_arch_init_vm(struct kvm *kvm)
 {
-	return 0;
+	int ret = 0;
+	phys_addr_t pgd_phys;
+	unsigned long vmid;
+	unsigned long start, end;
+
+
+	mutex_lock(&kvm_vmids_mutex);
+	vmid = find_first_zero_bit(kvm_vmids, VMID_SIZE);
+	if (vmid >= VMID_SIZE) {
+		mutex_unlock(&kvm_vmids_mutex);
+		return -EBUSY;
+	}
+	__set_bit(vmid, kvm_vmids);
+	kvm->arch.vmid = vmid;
+	mutex_unlock(&kvm_vmids_mutex);
+
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+
+	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	kvm->arch.vttbr = (pgd_phys & ((1LLU << 40) - 1) & ~((2 << VTTBR_X) - 1)) |
+			  ((u64)vmid << 48);
+
+	start = (unsigned long)kvm;
+	end = start + sizeof(struct kvm);
+	ret = create_hyp_mappings(kvm_hyp_pgd, start, end);
+	if (ret)
+		goto out_fail_hyp_mappings;
+
+	return ret;
+out_fail_hyp_mappings:
+	remove_hyp_mappings(kvm_hyp_pgd, start, end);
+out_fail_alloc:
+	clear_bit(vmid, kvm_vmids);
+	return ret;
 }
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
+	if (kvm->arch.vmid != 0) {
+		mutex_lock(&kvm_vmids_mutex);
+		clear_bit(kvm->arch.vmid, kvm_vmids);
+		mutex_unlock(&kvm_vmids_mutex);
+	}
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -158,6 +201,7 @@  struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 {
 	int err;
 	struct kvm_vcpu *vcpu;
+	unsigned long start, end;
 
 	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
 	if (!vcpu) {
@@ -169,7 +213,15 @@  struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	start = (unsigned long)vcpu;
+	end = start + sizeof(struct kvm_vcpu);
+	err = create_hyp_mappings(kvm_hyp_pgd, start, end);
+	if (err)
+		goto out_fail_hyp_mappings;
+
 	return vcpu;
+out_fail_hyp_mappings:
+	remove_hyp_mappings(kvm_hyp_pgd, start, end);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 out: