[RFC,KVM,19/27] kvm/isolation: initialize the KVM page table with core mappings

Message ID 1557758315-12667-20-git-send-email-alexandre.chartre@oracle.com (mailing list archive)
State New, archived
Series KVM Address Space Isolation

Commit Message

Alexandre Chartre May 13, 2019, 2:38 p.m. UTC
The KVM page table is initialized by adding the core memory mappings:
the kernel text, the per-cpu memory, the kvm module, the cpu_entry_area,
%esp fixup stacks, IRQ stacks.

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/kernel/cpu/common.c |    2 +
 arch/x86/kvm/isolation.c     |  131 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/isolation.h     |   10 +++
 include/linux/percpu.h       |    2 +
 mm/percpu.c                  |    6 +-
 5 files changed, 149 insertions(+), 2 deletions(-)

Comments

Dave Hansen May 13, 2019, 3:50 p.m. UTC | #1
> +	/*
> +	 * Copy the mapping for all the kernel text. We copy at the PMD
> +	 * level since the PUD is shared with the module mapping space.
> +	 */
> +	rv = kvm_copy_mapping((void *)__START_KERNEL_map, KERNEL_IMAGE_SIZE,
> +	     PGT_LEVEL_PMD);
> +	if (rv)
> +		goto out_uninit_page_table;

Could you double-check this?  We (I) have had some repeated confusion
with the PTI code and kernel text vs. kernel data vs. __init.
KERNEL_IMAGE_SIZE looks to be 512MB which is quite a bit bigger than
kernel text.

> +	/*
> +	 * Copy the mapping for cpu_entry_area and %esp fixup stacks
> +	 * (this is based on the PTI userland address space, but probably
> +	 * not needed because the KVM address space is not directly
> +	 * entered from userspace). They can both be copied at the P4D
> +	 * level since they each have a dedicated P4D entry.
> +	 */
> +	rv = kvm_copy_mapping((void *)CPU_ENTRY_AREA_PER_CPU, P4D_SIZE,
> +	     PGT_LEVEL_P4D);
> +	if (rv)
> +		goto out_uninit_page_table;

cpu_entry_area is used for more than just entry from userspace.  The gdt
mapping, for instance, is needed everywhere.  You might want to go look
at 'struct cpu_entry_area' in some more detail.

> +#ifdef CONFIG_X86_ESPFIX64
> +	rv = kvm_copy_mapping((void *)ESPFIX_BASE_ADDR, P4D_SIZE,
> +	     PGT_LEVEL_P4D);
> +	if (rv)
> +		goto out_uninit_page_table;
> +#endif

Why are these mappings *needed*?  I thought we only actually used these
fixup stacks for some crazy iret-to-userspace handling.  We're certainly
not doing that from KVM context.

Am I forgetting something?

> +#ifdef CONFIG_VMAP_STACK
> +	/*
> +	 * Interrupt stacks are vmap'ed with guard pages, so we need to
> +	 * copy mappings.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		stack = per_cpu(hardirq_stack_ptr, cpu);
> +		pr_debug("IRQ Stack %px\n", stack);
> +		if (!stack)
> +			continue;
> +		rv = kvm_copy_ptes(stack - IRQ_STACK_SIZE, IRQ_STACK_SIZE);
> +		if (rv)
> +			goto out_uninit_page_table;
> +	}
> +
> +#endif

I seem to remember that the KVM VMENTRY/VMEXIT context is very special.
 Interrupts (and even NMIs?) are disabled.  Would it be feasible to do
the switching in there so that we never even *get* interrupts in the KVM
context?

I also share Peter's concerns about letting modules do this.  If we ever
go down this road, we're going to have to think very carefully how we
let KVM do this without giving all the not-so-nice out-of-tree modules
the keys to the castle.

A high-level comment: it looks like this is "working", but has probably
erred on the side of mapping too much.  The hard part is paring this
back to a truly minimal set of mappings.

Andy Lutomirski May 13, 2019, 4 p.m. UTC | #2
On Mon, May 13, 2019 at 8:50 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> > +     /*
> > +      * Copy the mapping for all the kernel text. We copy at the PMD
> > +      * level since the PUD is shared with the module mapping space.
> > +      */
> > +     rv = kvm_copy_mapping((void *)__START_KERNEL_map, KERNEL_IMAGE_SIZE,
> > +          PGT_LEVEL_PMD);
> > +     if (rv)
> > +             goto out_uninit_page_table;
>
> Could you double-check this?  We (I) have had some repeated confusion
> with the PTI code and kernel text vs. kernel data vs. __init.
> KERNEL_IMAGE_SIZE looks to be 512MB which is quite a bit bigger than
> kernel text.
>
> > +     /*
> > +      * Copy the mapping for cpu_entry_area and %esp fixup stacks
> > +      * (this is based on the PTI userland address space, but probably
> > +      * not needed because the KVM address space is not directly
> > +      * entered from userspace). They can both be copied at the P4D
> > +      * level since they each have a dedicated P4D entry.
> > +      */
> > +     rv = kvm_copy_mapping((void *)CPU_ENTRY_AREA_PER_CPU, P4D_SIZE,
> > +          PGT_LEVEL_P4D);
> > +     if (rv)
> > +             goto out_uninit_page_table;
>
> cpu_entry_area is used for more than just entry from userspace.  The gdt
> mapping, for instance, is needed everywhere.  You might want to go look
> at 'struct cpu_entry_area' in some more detail.
>
> > +#ifdef CONFIG_X86_ESPFIX64
> > +     rv = kvm_copy_mapping((void *)ESPFIX_BASE_ADDR, P4D_SIZE,
> > +          PGT_LEVEL_P4D);
> > +     if (rv)
> > +             goto out_uninit_page_table;
> > +#endif
>
> Why are these mappings *needed*?  I thought we only actually used these
> fixup stacks for some crazy iret-to-userspace handling.  We're certainly
> not doing that from KVM context.
>
> Am I forgetting something?
>
> > +#ifdef CONFIG_VMAP_STACK
> > +     /*
> > +      * Interrupt stacks are vmap'ed with guard pages, so we need to
> > +      * copy mappings.
> > +      */
> > +     for_each_possible_cpu(cpu) {
> > +             stack = per_cpu(hardirq_stack_ptr, cpu);
> > +             pr_debug("IRQ Stack %px\n", stack);
> > +             if (!stack)
> > +                     continue;
> > +             rv = kvm_copy_ptes(stack - IRQ_STACK_SIZE, IRQ_STACK_SIZE);
> > +             if (rv)
> > +                     goto out_uninit_page_table;
> > +     }
> > +
> > +#endif
>
> I seem to remember that the KVM VMENTRY/VMEXIT context is very special.
>  Interrupts (and even NMIs?) are disabled.  Would it be feasible to do
> the switching in there so that we never even *get* interrupts in the KVM
> context?

That would be nicer.

Looking at this code, it occurs to me that mapping the IRQ stacks
seems questionable.  As it stands, this series switches to a normal
CR3 in some C code somewhere moderately deep in the APIC IRQ code.  By
that time, I think you may have executed traceable code, and, if that
happens, you lose.  I hate to say this, but any shenanigans like this
patch does might need to happen in the entry code *before* even
switching to the IRQ stack.  Or perhaps shortly thereafter.

We've talked about moving context tracking to C.  If we go that route,
then this KVM context mess could go there, too -- we'd have a
low-level C wrapper for each entry that would deal with getting us
ready to run normal C code.

(We need to do something about terminology.  This kvm_mm thing isn't
an mm in the normal sense.  An mm has normal kernel mappings and
varying user mappings.  For example, the PTI "userspace" page tables
aren't an mm.  And we really don't want a situation where the vmalloc
fault code runs with the "kvm_mm" mm active -- it will totally
malfunction.)

Sean Christopherson May 13, 2019, 4:46 p.m. UTC | #3
On Mon, May 13, 2019 at 08:50:19AM -0700, Dave Hansen wrote:
> I seem to remember that the KVM VMENTRY/VMEXIT context is very special.
> Interrupts (and even NMIs?) are disabled.  Would it be feasible to do
> the switching in there so that we never even *get* interrupts in the KVM
> context?

NMIs are enabled on VMX's VM-Exit.  On SVM, NMIs are blocked on exit until
STGI is executed.

Alexandre Chartre May 13, 2019, 4:47 p.m. UTC | #4
On 5/13/19 5:50 PM, Dave Hansen wrote:
>> +	/*
>> +	 * Copy the mapping for all the kernel text. We copy at the PMD
>> +	 * level since the PUD is shared with the module mapping space.
>> +	 */
>> +	rv = kvm_copy_mapping((void *)__START_KERNEL_map, KERNEL_IMAGE_SIZE,
>> +	     PGT_LEVEL_PMD);
>> +	if (rv)
>> +		goto out_uninit_page_table;
> 
> Could you double-check this?  We (I) have had some repeated confusion
> with the PTI code and kernel text vs. kernel data vs. __init.
> KERNEL_IMAGE_SIZE looks to be 512MB which is quite a bit bigger than
> kernel text.

I probably have the same confusion :-) but I will try to check again.


>> +	/*
>> +	 * Copy the mapping for cpu_entry_area and %esp fixup stacks
>> +	 * (this is based on the PTI userland address space, but probably
>> +	 * not needed because the KVM address space is not directly
>> +	 * entered from userspace). They can both be copied at the P4D
>> +	 * level since they each have a dedicated P4D entry.
>> +	 */
>> +	rv = kvm_copy_mapping((void *)CPU_ENTRY_AREA_PER_CPU, P4D_SIZE,
>> +	     PGT_LEVEL_P4D);
>> +	if (rv)
>> +		goto out_uninit_page_table;
> 
> cpu_entry_area is used for more than just entry from userspace.  The gdt
> mapping, for instance, is needed everywhere.  You might want to go look
> at 'struct cpu_entry_area' in some more detail.

Ok. Thanks.

>> +#ifdef CONFIG_X86_ESPFIX64
>> +	rv = kvm_copy_mapping((void *)ESPFIX_BASE_ADDR, P4D_SIZE,
>> +	     PGT_LEVEL_P4D);
>> +	if (rv)
>> +		goto out_uninit_page_table;
>> +#endif
> 
> Why are these mappings *needed*?  I thought we only actually used these
> fixup stacks for some crazy iret-to-userspace handling.  We're certainly
> not doing that from KVM context.

Right. I initially looked at what was used for PTI, and I probably copied
unneeded mappings.

> Am I forgetting something?
> 
>> +#ifdef CONFIG_VMAP_STACK
>> +	/*
>> +	 * Interrupt stacks are vmap'ed with guard pages, so we need to
>> +	 * copy mappings.
>> +	 */
>> +	for_each_possible_cpu(cpu) {
>> +		stack = per_cpu(hardirq_stack_ptr, cpu);
>> +		pr_debug("IRQ Stack %px\n", stack);
>> +		if (!stack)
>> +			continue;
>> +		rv = kvm_copy_ptes(stack - IRQ_STACK_SIZE, IRQ_STACK_SIZE);
>> +		if (rv)
>> +			goto out_uninit_page_table;
>> +	}
>> +
>> +#endif
> 
> I seem to remember that the KVM VMENTRY/VMEXIT context is very special.
>   Interrupts (and even NMIs?) are disabled.  Would it be feasible to do
> the switching in there so that we never even *get* interrupts in the KVM
> context?

Ideally we would like to run with the KVM address space when handling a VM-exit
(so between a VMEXIT and the next VMENTER) where interrupts are not disabled.

> I also share Peter's concerns about letting modules do this.  If we ever
> go down this road, we're going to have to think very carefully how we
> let KVM do this without giving all the not-so-nice out-of-tree modules
> the keys to the castle.

Right, we probably need a more generic framework for creating a limited
kernel context space which kvm (or other modules?) can deal with. I think
kvm is a good place to start for having this kind of limited context, hence
this RFC and my request for advice on how best to do it.

> A high-level comment: it looks like this is "working", but has probably
> erred on the side of mapping too much.  The hard part is paring this
> back to a truly minimal set of mappings.
> 

Agree.

Thanks,

alex.

Alexandre Chartre May 13, 2019, 5 p.m. UTC | #5
On 5/13/19 6:00 PM, Andy Lutomirski wrote:
> On Mon, May 13, 2019 at 8:50 AM Dave Hansen <dave.hansen@intel.com> wrote:
>>
>>> +     /*
>>> +      * Copy the mapping for all the kernel text. We copy at the PMD
>>> +      * level since the PUD is shared with the module mapping space.
>>> +      */
>>> +     rv = kvm_copy_mapping((void *)__START_KERNEL_map, KERNEL_IMAGE_SIZE,
>>> +          PGT_LEVEL_PMD);
>>> +     if (rv)
>>> +             goto out_uninit_page_table;
>>
>> Could you double-check this?  We (I) have had some repeated confusion
>> with the PTI code and kernel text vs. kernel data vs. __init.
>> KERNEL_IMAGE_SIZE looks to be 512MB which is quite a bit bigger than
>> kernel text.
>>
>>> +     /*
>>> +      * Copy the mapping for cpu_entry_area and %esp fixup stacks
>>> +      * (this is based on the PTI userland address space, but probably
>>> +      * not needed because the KVM address space is not directly
>>> +      * entered from userspace). They can both be copied at the P4D
>>> +      * level since they each have a dedicated P4D entry.
>>> +      */
>>> +     rv = kvm_copy_mapping((void *)CPU_ENTRY_AREA_PER_CPU, P4D_SIZE,
>>> +          PGT_LEVEL_P4D);
>>> +     if (rv)
>>> +             goto out_uninit_page_table;
>>
>> cpu_entry_area is used for more than just entry from userspace.  The gdt
>> mapping, for instance, is needed everywhere.  You might want to go look
>> at 'struct cpu_entry_area' in some more detail.
>>
>>> +#ifdef CONFIG_X86_ESPFIX64
>>> +     rv = kvm_copy_mapping((void *)ESPFIX_BASE_ADDR, P4D_SIZE,
>>> +          PGT_LEVEL_P4D);
>>> +     if (rv)
>>> +             goto out_uninit_page_table;
>>> +#endif
>>
>> Why are these mappings *needed*?  I thought we only actually used these
>> fixup stacks for some crazy iret-to-userspace handling.  We're certainly
>> not doing that from KVM context.
>>
>> Am I forgetting something?
>>
>>> +#ifdef CONFIG_VMAP_STACK
>>> +     /*
>>> +      * Interrupt stacks are vmap'ed with guard pages, so we need to
>>> +      * copy mappings.
>>> +      */
>>> +     for_each_possible_cpu(cpu) {
>>> +             stack = per_cpu(hardirq_stack_ptr, cpu);
>>> +             pr_debug("IRQ Stack %px\n", stack);
>>> +             if (!stack)
>>> +                     continue;
>>> +             rv = kvm_copy_ptes(stack - IRQ_STACK_SIZE, IRQ_STACK_SIZE);
>>> +             if (rv)
>>> +                     goto out_uninit_page_table;
>>> +     }
>>> +
>>> +#endif
>>
>> I seem to remember that the KVM VMENTRY/VMEXIT context is very special.
>>   Interrupts (and even NMIs?) are disabled.  Would it be feasible to do
>> the switching in there so that we never even *get* interrupts in the KVM
>> context?
> 
> That would be nicer.
> 
> Looking at this code, it occurs to me that mapping the IRQ stacks
> seems questionable.  As it stands, this series switches to a normal
> CR3 in some C code somewhere moderately deep in the APIC IRQ code.  By
> that time, I think you may have executed traceable code, and, if that
> happens, you lose.  I hate to say this, but any shenanigans like this
> patch does might need to happen in the entry code *before* even
> switching to the IRQ stack.  Or perhaps shortly thereafter.
>
> We've talked about moving context tracking to C.  If we go that route,
> then this KVM context mess could go there, too -- we'd have a
> low-level C wrapper for each entry that would deal with getting us
> ready to run normal C code.
> 
> (We need to do something about terminology.  This kvm_mm thing isn't
> an mm in the normal sense.  An mm has normal kernel mappings and
> varying user mappings.  For example, the PTI "userspace" page tables
> aren't an mm.  And we really don't want a situation where the vmalloc
> fault code runs with the "kvm_mm" mm active -- it will totally
> malfunction.)
> 

One of my next steps is to try to put the KVM page table in the PTI userspace
page tables, and not switch CR3 on the KVM_RUN ioctl. That way, we will run with
a regular mm (but using the userspace page table). Then interrupts would switch
CR3 to the kernel page table (like paranoid idtentry currently does).
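
As an illustration of the CR3 switch described above, here is a minimal user-space sketch. It assumes the PTI-style layout in which the kernel and user PGDs form an 8 KiB-aligned pair, with the kernel PGD first, so that bit 12 of the CR3 value selects between them. The names (sketch_cr3_to_user, sketch_cr3_to_kernel, SKETCH_USER_PGTABLE_MASK) are hypothetical, and PCID bits are deliberately ignored; the real entry code handles more than this.

```c
#include <stdint.h>

/*
 * Assumption: 4 KiB pages, and the user PGD sits exactly one page
 * after the kernel PGD, so bit 12 distinguishes the two tables.
 */
#define SKETCH_PAGE_SHIFT		12
#define SKETCH_USER_PGTABLE_MASK	(UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Select the "userspace" page table: set the selector bit. */
static uint64_t sketch_cr3_to_user(uint64_t cr3)
{
	return cr3 | SKETCH_USER_PGTABLE_MASK;
}

/* Select the kernel page table: clear the selector bit. */
static uint64_t sketch_cr3_to_kernel(uint64_t cr3)
{
	return cr3 & ~SKETCH_USER_PGTABLE_MASK;
}

/* Non-zero if the CR3 value points at the "userspace" page table. */
static int sketch_cr3_is_user(uint64_t cr3)
{
	return (cr3 & SKETCH_USER_PGTABLE_MASK) != 0;
}
```

Under that assumption, an interrupt arriving while the KVM mappings are active would only need the equivalent of sketch_cr3_to_kernel() before running normal kernel code, and the operation is idempotent if the kernel table is already loaded.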

alex.

Alexandre Chartre May 14, 2019, 10:26 a.m. UTC | #6
On 5/13/19 6:47 PM, Alexandre Chartre wrote:
> 
> 
> On 5/13/19 5:50 PM, Dave Hansen wrote:
>>> +    /*
>>> +     * Copy the mapping for all the kernel text. We copy at the PMD
>>> +     * level since the PUD is shared with the module mapping space.
>>> +     */
>>> +    rv = kvm_copy_mapping((void *)__START_KERNEL_map, KERNEL_IMAGE_SIZE,
>>> +         PGT_LEVEL_PMD);
>>> +    if (rv)
>>> +        goto out_uninit_page_table;
>>
>> Could you double-check this?  We (I) have had some repeated confusion
>> with the PTI code and kernel text vs. kernel data vs. __init.
>> KERNEL_IMAGE_SIZE looks to be 512MB which is quite a bit bigger than
>> kernel text.
> 
> I probably have the same confusion :-) but I will try to check again.
> 
> 

mm.txt says that kernel text is 512MB, and that's probably why I used
KERNEL_IMAGE_SIZE.

https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt

========================================================================================================================
     Start addr    |   Offset   |     End addr     |  Size   | VM area description
========================================================================================================================
  [...]
  ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
  [...]


However, vmlinux.lds.S does:

. = ASSERT((_end - _text <= KERNEL_IMAGE_SIZE),
            "kernel image bigger than KERNEL_IMAGE_SIZE");

So this covers everything between _text and _end, which includes text, data,
init and other stuff.

The end of the text section is tagged with _etext. So the text section is
effectively (_etext - _text). This matches what efi_setup_page_tables()
uses when copying the kernel text:

int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
{
	[...]
         npages = (_etext - _text) >> PAGE_SHIFT;
         text = __pa(_text);
         pfn = text >> PAGE_SHIFT;

         pf = _PAGE_RW | _PAGE_ENC;
         if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
                 pr_err("Failed to map kernel text 1:1\n");
                 return 1;
         }
	[...]
}
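
To make the size difference concrete, here is a small user-space sketch of the arithmetic. It assumes 4 KiB pages and 2 MiB PMD entries (x86-64), and the 512 MB KERNEL_IMAGE_SIZE quoted above. The helper names and the 20 MiB text size used in the usage note are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 constants: 4 KiB pages, 2 MiB per PMD entry. */
#define SKETCH_PAGE_SHIFT		12
#define SKETCH_PMD_SHIFT		21
#define SKETCH_PMD_SIZE			(UINT64_C(1) << SKETCH_PMD_SHIFT)
#define SKETCH_KERNEL_IMAGE_SIZE	(UINT64_C(512) << 20)	/* 512 MB */

/* Pages spanned by [start, end), like (_etext - _text) >> PAGE_SHIFT. */
static uint64_t sketch_npages(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return (end - start) >> SKETCH_PAGE_SHIFT;
}

/*
 * PMD entries needed to cover [start, start + size), rounding the
 * range outward to PMD boundaries.
 */
static uint64_t sketch_pmd_entries(uint64_t start, uint64_t size)
{
	uint64_t first = start & ~(SKETCH_PMD_SIZE - 1);
	uint64_t last = (start + size + SKETCH_PMD_SIZE - 1) &
			~(SKETCH_PMD_SIZE - 1);

	return (last - first) >> SKETCH_PMD_SHIFT;
}
```

With these assumptions, copying KERNEL_IMAGE_SIZE from ffffffff80000000 at the PMD level covers 256 PMD entries, whereas a hypothetical 20 MiB text section starting on a 2 MiB boundary would need only 10 entries (5120 pages).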


alex.

Patch

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3764054..0fa44b1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1511,6 +1511,8 @@  static __init int setup_clearcpuid(char *arg)
 EXPORT_PER_CPU_SYMBOL(current_task);
 
 DEFINE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+EXPORT_PER_CPU_SYMBOL_GPL(hardirq_stack_ptr);
+
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
diff --git a/arch/x86/kvm/isolation.c b/arch/x86/kvm/isolation.c
index 2052abf..cf5ee0d 100644
--- a/arch/x86/kvm/isolation.c
+++ b/arch/x86/kvm/isolation.c
@@ -10,6 +10,8 @@ 
 #include <linux/printk.h>
 #include <linux/slab.h>
 
+#include <asm/cpu_entry_area.h>
+#include <asm/processor.h>
 #include <asm/mmu_context.h>
 #include <asm/pgalloc.h>
 
@@ -88,6 +90,8 @@  struct mm_struct kvm_mm = {
 DEFINE_STATIC_KEY_FALSE(kvm_isolation_enabled);
 EXPORT_SYMBOL(kvm_isolation_enabled);
 
+static void kvm_isolation_uninit_page_table(void);
+static void kvm_isolation_uninit_mm(void);
 static void kvm_clear_mapping(void *ptr, size_t size,
 			      enum page_table_level level);
 
@@ -1024,10 +1028,130 @@  int kvm_copy_percpu_mapping(void *percpu_ptr, size_t size)
 EXPORT_SYMBOL(kvm_copy_percpu_mapping);
 
 
+static int kvm_isolation_init_page_table(void)
+{
+	void *stack;
+	int cpu, rv;
+
+	/*
+	 * Copy the mapping for all the kernel text. We copy at the PMD
+	 * level since the PUD is shared with the module mapping space.
+	 */
+	rv = kvm_copy_mapping((void *)__START_KERNEL_map, KERNEL_IMAGE_SIZE,
+	     PGT_LEVEL_PMD);
+	if (rv)
+		goto out_uninit_page_table;
+
+	/* copy the mapping of per cpu memory */
+	rv = kvm_copy_mapping(pcpu_base_addr, pcpu_unit_size * pcpu_nr_units,
+	     PGT_LEVEL_PMD);
+	if (rv)
+		goto out_uninit_page_table;
+
+	/*
+	 * Copy the mapping for cpu_entry_area and %esp fixup stacks
+	 * (this is based on the PTI userland address space, but probably
+	 * not needed because the KVM address space is not directly
+	 * entered from userspace). They can both be copied at the P4D
+	 * level since they each have a dedicated P4D entry.
+	 */
+	rv = kvm_copy_mapping((void *)CPU_ENTRY_AREA_PER_CPU, P4D_SIZE,
+	     PGT_LEVEL_P4D);
+	if (rv)
+		goto out_uninit_page_table;
+
+#ifdef CONFIG_X86_ESPFIX64
+	rv = kvm_copy_mapping((void *)ESPFIX_BASE_ADDR, P4D_SIZE,
+	     PGT_LEVEL_P4D);
+	if (rv)
+		goto out_uninit_page_table;
+#endif
+
+#ifdef CONFIG_VMAP_STACK
+	/*
+	 * Interrupt stacks are vmap'ed with guard pages, so we need to
+	 * copy mappings.
+	 */
+	for_each_possible_cpu(cpu) {
+		stack = per_cpu(hardirq_stack_ptr, cpu);
+		pr_debug("IRQ Stack %px\n", stack);
+		if (!stack)
+			continue;
+		rv = kvm_copy_ptes(stack - IRQ_STACK_SIZE, IRQ_STACK_SIZE);
+		if (rv)
+			goto out_uninit_page_table;
+	}
+
+#endif
+
+	/* copy mapping of the current module (kvm) */
+	rv = kvm_copy_module_mapping();
+	if (rv)
+		goto out_uninit_page_table;
+
+	return 0;
+
+out_uninit_page_table:
+	kvm_isolation_uninit_page_table();
+	return rv;
+}
+
+/*
+ * Free all buffers used by the kvm page table. These buffers are stored
+ * in the kvm_pgt_dgroup_list.
+ */
+static void kvm_isolation_uninit_page_table(void)
+{
+	struct pgt_directory_group *dgroup, *dgroup_next;
+	enum page_table_level level;
+	void *ptr;
+	int i;
+
+	mutex_lock(&kvm_pgt_dgroup_lock);
+
+	list_for_each_entry_safe(dgroup, dgroup_next,
+				 &kvm_pgt_dgroup_list, list) {
+
+		for (i = 0; i < dgroup->count; i++) {
+			ptr = dgroup->directory[i].ptr;
+			level = dgroup->directory[i].level;
+
+			switch (dgroup->directory[i].level) {
+
+			case PGT_LEVEL_PTE:
+				kvm_pte_free(NULL, ptr);
+				break;
+
+			case PGT_LEVEL_PMD:
+				kvm_pmd_free(NULL, ptr);
+				break;
+
+			case PGT_LEVEL_PUD:
+				kvm_pud_free(NULL, ptr);
+				break;
+
+			case PGT_LEVEL_P4D:
+				kvm_p4d_free(NULL, ptr);
+				break;
+
+			default:
+				pr_err("unexpected page directory %d for %px\n",
+				       level, ptr);
+			}
+		}
+
+		list_del(&dgroup->list);
+		kfree(dgroup);
+	}
+
+	mutex_unlock(&kvm_pgt_dgroup_lock);
+}
+
 static int kvm_isolation_init_mm(void)
 {
 	pgd_t *kvm_pgd;
 	gfp_t gfp_mask;
+	int rv;
 
 	gfp_mask = GFP_KERNEL | __GFP_ZERO;
 	kvm_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);
@@ -1054,6 +1178,12 @@  static int kvm_isolation_init_mm(void)
 	mm_init_cpumask(&kvm_mm);
 	init_new_context(NULL, &kvm_mm);
 
+	rv = kvm_isolation_init_page_table();
+	if (rv) {
+		kvm_isolation_uninit_mm();
+		return rv;
+	}
+
 	return 0;
 }
 
@@ -1065,6 +1195,7 @@  static void kvm_isolation_uninit_mm(void)
 
 	destroy_context(&kvm_mm);
 
+	kvm_isolation_uninit_page_table();
 	kvm_free_all_range_mapping();
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
diff --git a/arch/x86/kvm/isolation.h b/arch/x86/kvm/isolation.h
index 3ef2060..1f79e28 100644
--- a/arch/x86/kvm/isolation.h
+++ b/arch/x86/kvm/isolation.h
@@ -3,6 +3,16 @@ 
 #define ARCH_X86_KVM_ISOLATION_H
 
 #include <linux/kvm_host.h>
+#include <linux/export.h>
+
+/*
+ * Copy the memory mapping for the current module. This is defined as a
+ * macro to ensure it is expanded in the module making the call so that
+ * THIS_MODULE has the correct value.
+ */
+#define kvm_copy_module_mapping()			\
+	(kvm_copy_ptes(THIS_MODULE->core_layout.base,	\
+	    THIS_MODULE->core_layout.size))
 
 DECLARE_STATIC_KEY_FALSE(kvm_isolation_enabled);
 
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 70b7123..fb0ab9a 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -70,6 +70,8 @@ 
 
 extern void *pcpu_base_addr;
 extern const unsigned long *pcpu_unit_offsets;
+extern int pcpu_unit_size;
+extern int pcpu_nr_units;
 
 struct pcpu_group_info {
 	int			nr_units;	/* aligned # of units */
diff --git a/mm/percpu.c b/mm/percpu.c
index 68dd2e7..b68b3d8 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -119,8 +119,10 @@ 
 #endif	/* CONFIG_SMP */
 
 static int pcpu_unit_pages __ro_after_init;
-static int pcpu_unit_size __ro_after_init;
-static int pcpu_nr_units __ro_after_init;
+int pcpu_unit_size __ro_after_init;
+EXPORT_SYMBOL(pcpu_unit_size);
+int pcpu_nr_units __ro_after_init;
+EXPORT_SYMBOL(pcpu_nr_units);
 static int pcpu_atom_size __ro_after_init;
 int pcpu_nr_slots __ro_after_init;
 static size_t pcpu_chunk_struct_size __ro_after_init;