From patchwork Tue Jul 3 09:01:03 2012
X-Patchwork-Submitter: Christoffer Dall
X-Patchwork-Id: 1149571
Subject: [PATCH v9 10/16] ARM: KVM: Memory virtualization setup
To: android-virt@lists.cs.columbia.edu, kvm@vger.kernel.org
From: Christoffer Dall
Cc: tech@virtualopensystems.com
Date: Tue, 03 Jul 2012 05:01:03 -0400
Message-ID: <20120703090103.27746.37222.stgit@ubuntu>
In-Reply-To: <20120703085841.27746.82730.stgit@ubuntu>
References: <20120703085841.27746.82730.stgit@ubuntu>
User-Agent: StGit/0.15

From: Christoffer Dall

This commit introduces the framework for guest memory management through
the use of 2nd stage translation. Each VM has a pointer to a level-1
table (the pgd field in struct kvm_arch) which is used for the 2nd stage
translations. Entries are added when handling guest faults (see a later
patch) and the table itself can be allocated and freed through the
following functions implemented in arch/arm/kvm/mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Furthermore, each entry in the TLBs and caches is tagged with a VMID in
addition to the ASID. VMIDs are assigned consecutively, in the order the
VMs are executed, and the caches and TLBs are invalidated once the VMID
space has been exhausted; this allows more than 255 guests to run over
the lifetime of the host.

The 2nd stage pgd is allocated in kvm_arch_init_vm() and freed in
kvm_arch_destroy_vm(); both functions are called from the main KVM code.
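To make the VMID recycling scheme concrete, here is a minimal sketch of the
generation-counter approach described above. This is illustrative only and
not the code from this series (the actual assignment logic arrives in a
later patch): the vm_ctx structure, the helpers, the 8-bit VMID width, and
the absence of locking are all simplifying assumptions.

#define VMID_BITS	8
#define VMID_MAX	((1UL << VMID_BITS) - 1)	/* 255 usable VMIDs */

static unsigned long vmid_gen = 1;	/* global generation; 0 means "never assigned" */
static unsigned long next_vmid = 1;	/* VMID 0 is reserved for the host */

struct vm_ctx {
	unsigned long vmid;
	unsigned long vmid_gen;		/* generation this VMID was handed out in */
};

static void flush_all_guest_tlbs(void)
{
	/*
	 * Hypothetical stand-in for the TLB/cache invalidation the commit
	 * message describes; real code would issue the relevant TLBI
	 * operations on all CPUs.
	 */
}

static void assign_vmid(struct vm_ctx *vm)
{
	/* Fast path: the cached VMID is still from the current generation. */
	if (vm->vmid_gen == vmid_gen)
		return;

	if (next_vmid > VMID_MAX) {
		/*
		 * VMID space exhausted: bump the generation, restart the
		 * counter, and invalidate TLBs/caches so that stale entries
		 * tagged with a recycled VMID can never be hit again. This
		 * is what allows more than 255 guests over the lifetime of
		 * the host.
		 */
		vmid_gen++;
		next_vmid = 1;
		flush_all_guest_tlbs();
	}

	vm->vmid = next_vmid++;
	vm->vmid_gen = vmid_gen;
}

Setting kvm->arch.vmid_gen = 0 in kvm_arch_init_vm() below plays the role of
the "never assigned" state: the first comparison against the global
generation is guaranteed to fail, forcing a fresh VMID before the VM first
runs.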
Signed-off-by: Christoffer Dall
---
 arch/arm/include/asm/kvm_mmu.h |    5 ++
 arch/arm/kvm/arm.c             |   37 ++++++++++++++-
 arch/arm/kvm/mmu.c             |  102 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 3a2a56c..dca7803 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -32,4 +32,9 @@ int create_hyp_mappings(void *from, void *to);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 63593ee..ce3d258 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -74,12 +74,34 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	int ret = 0;
+
 	if (type)
 		return -EINVAL;
 
-	return 0;
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+	mutex_init(&kvm->arch.pgd_mutex);
+
+	ret = create_hyp_mappings(kvm, kvm + 1);
+	if (ret)
+		goto out_free_stage2_pgd;
+
+	/* Mark the initial VMID generation invalid */
+	kvm->arch.vmid_gen = 0;
+
+	return ret;
+out_free_stage2_pgd:
+	kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+	return ret;
 }
 
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -97,10 +119,16 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 	return 0;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -176,7 +204,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	err = create_hyp_mappings(vcpu, vcpu + 1);
+	if (err)
+		goto vcpu_uninit;
+
 	return vcpu;
+vcpu_uninit:
+	kvm_vcpu_uninit(vcpu);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 out:
@@ -185,6 +219,7 @@ out:
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 8142eb6..ddfb3df 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -162,6 +162,108 @@ out:
 	return err;
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates only the 1st level table, of the size defined by PGD2_ORDER
+ * (it can support either full 40-bit input addresses or be limited to
+ * 32-bit input addresses), and clears the allocated pages.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+
+	if (kvm->arch.pgd != NULL) {
+		kvm_err("kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
+	if (!pgd)
+		return -ENOMEM;
+
+	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
+	kvm->arch.pgd = pgd;
+
+	return 0;
+}
+
+static void free_guest_pages(pte_t *pte, unsigned long addr)
+{
+	unsigned int i;
+	struct page *page;
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (pte_present(*pte)) {
+			page = pfn_to_page(pte_pfn(*pte));
+			put_page(page);
+		}
+		pte++;
+	}
+}
+
+static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
+{
+	unsigned int i;
+	pte_t *pte;
+	struct page *page;
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		BUG_ON(pmd_sect(*pmd));
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			free_guest_pages(pte, addr);
+			page = virt_to_page((void *)pte);
+			WARN_ON(atomic_read(&page->_count) != 1);
+			pte_free_kernel(NULL, pte);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * kvm_free_stage2_pgd - free all stage-2 tables
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
+ * underlying level-2 and level-3 tables before freeing the actual level-1
+ * table and setting the struct pointer to NULL.
+ */
+void kvm_free_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long long i, addr;
+
+	if (kvm->arch.pgd == NULL)
+		return;
+
+	/*
+	 * We do this slightly differently than in other places, since we
+	 * need more than 32 bits and, for instance, pgd_addr_end converts
+	 * to unsigned long.
+	 */
+	addr = 0;
+	for (i = 0; i < PTRS_PER_PGD2; i++) {
+		addr = i * (unsigned long long)PGDIR_SIZE;
+		pgd = kvm->arch.pgd + i;
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none(*pud))
+			continue;
+
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_stage2_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+	}
+
+	free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
+	kvm->arch.pgd = NULL;
+}
+
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return -EINVAL;
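A closing note on the PGD2_ORDER allocation in kvm_alloc_stage2_pgd() above.
With LPAE stage-2 translation, each level-1 descriptor is 8 bytes and maps
1 GiB of input address space, so covering the full 40-bit range mentioned in
the function's comment takes more than a single page of level-1 entries. The
arithmetic below is a hedged illustration of that sizing; the real
PTRS_PER_PGD2 and PGD2_ORDER constants are defined in the series' headers
and are not reproduced here.

#include <stdio.h>

int main(void)
{
	/* Illustrative sizing of the stage-2 level-1 table. */
	unsigned int input_bits = 40;		/* full 40-bit IPA space */
	unsigned int l1_shift = 30;		/* 1 GiB per level-1 entry */
	unsigned long entries = 1UL << (input_bits - l1_shift);
	unsigned long bytes = entries * 8;	/* 8-byte LPAE descriptors */

	printf("entries=%lu table=%lu KiB pages=%lu\n",
	       entries, bytes / 1024, bytes / 4096);
	return 0;
}

When run, this prints "entries=1024 table=8 KiB pages=2": the level-1 table
is a multi-page allocation, which is why kvm_alloc_stage2_pgd() uses
__get_free_pages() with an order argument rather than a single-page
allocator.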