Message ID | 1557758315-12667-7-git-send-email-alexandre.chartre@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM Address Space Isolation |
On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>
> From: Liran Alon <liran.alon@oracle.com>
>
> Next commits will change most of KVM #VMExit handlers to run
> in KVM isolated address space. Any interrupt handler raised
> during execution in KVM address space needs to switch back
> to host address space.
>
> This patch makes sure that IRQ handlers will run in full
> host address space instead of KVM isolated address space.

IMO this needs to be somewhere a lot more central. What about NMI and
MCE? Or async page faults? Or any other entry?

--Andy
On 5/13/19 5:51 PM, Andy Lutomirski wrote:
> On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
> <alexandre.chartre@oracle.com> wrote:
>>
>> From: Liran Alon <liran.alon@oracle.com>
>>
>> Next commits will change most of KVM #VMExit handlers to run
>> in KVM isolated address space. Any interrupt handler raised
>> during execution in KVM address space needs to switch back
>> to host address space.
>>
>> This patch makes sure that IRQ handlers will run in full
>> host address space instead of KVM isolated address space.
>
> IMO this needs to be somewhere a lot more central. What about NMI and
> MCE? Or async page faults? Or any other entry?
>

Actually, I am not sure this is effectively useful because the IRQ
handler is probably faulting before it tries to exit isolation, so
the isolation exit will be done by the kvm page fault handler. I need
to check that.

alex.
On Mon, May 13, 2019 at 9:28 AM Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>
> On 5/13/19 5:51 PM, Andy Lutomirski wrote:
> > On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
> > <alexandre.chartre@oracle.com> wrote:
> >>
> >> From: Liran Alon <liran.alon@oracle.com>
> >>
> >> Next commits will change most of KVM #VMExit handlers to run
> >> in KVM isolated address space. Any interrupt handler raised
> >> during execution in KVM address space needs to switch back
> >> to host address space.
> >>
> >> This patch makes sure that IRQ handlers will run in full
> >> host address space instead of KVM isolated address space.
> >
> > IMO this needs to be somewhere a lot more central. What about NMI and
> > MCE? Or async page faults? Or any other entry?
> >
>
> Actually, I am not sure this is effectively useful because the IRQ
> handler is probably faulting before it tries to exit isolation, so
> the isolation exit will be done by the kvm page fault handler. I need
> to check that.
>

The whole idea of having #PF exit with a different CR3 than was loaded
on entry seems questionable to me. I'd be a lot more comfortable with
the whole idea if a page fault due to accessing the wrong data was an
OOPS and the code instead just did the right thing directly.

--Andy
On Mon, May 13, 2019 at 11:13:34AM -0700, Andy Lutomirski wrote:
> On Mon, May 13, 2019 at 9:28 AM Alexandre Chartre
> <alexandre.chartre@oracle.com> wrote:
> > Actually, I am not sure this is effectively useful because the IRQ
> > handler is probably faulting before it tries to exit isolation, so
> > the isolation exit will be done by the kvm page fault handler. I need
> > to check that.
> >
>
> The whole idea of having #PF exit with a different CR3 than was loaded
> on entry seems questionable to me. I'd be a lot more comfortable with
> the whole idea if a page fault due to accessing the wrong data was an
> OOPS and the code instead just did the right thing directly.

So I've run into this idea before; it basically allows a lazy approach
to things.

I'm somewhat conflicted about it: on the one hand, changing CR3 from
#PF is a natural extension in that #PF already changes page tables (for
userspace / vmalloc etc.); on the other hand, there's a thin line
between being lazy and being sloppy.

If we're going down this route, I think we need a very coherent design
and strong rules.
On 5/14/19 9:07 AM, Peter Zijlstra wrote:
> On Mon, May 13, 2019 at 11:13:34AM -0700, Andy Lutomirski wrote:
>> On Mon, May 13, 2019 at 9:28 AM Alexandre Chartre
>> <alexandre.chartre@oracle.com> wrote:
>
>>> Actually, I am not sure this is effectively useful because the IRQ
>>> handler is probably faulting before it tries to exit isolation, so
>>> the isolation exit will be done by the kvm page fault handler. I need
>>> to check that.
>>>
>>
>> The whole idea of having #PF exit with a different CR3 than was loaded
>> on entry seems questionable to me. I'd be a lot more comfortable with
>> the whole idea if a page fault due to accessing the wrong data was an
>> OOPS and the code instead just did the right thing directly.
>
> So I've run into this idea before; it basically allows a lazy approach
> to things.
>
> I'm somewhat conflicted about it: on the one hand, changing CR3 from
> #PF is a natural extension in that #PF already changes page tables (for
> userspace / vmalloc etc.); on the other hand, there's a thin line
> between being lazy and being sloppy.
>
> If we're going down this route, I think we need a very coherent design
> and strong rules.
>

Right. In particular, we should ensure that the KVM page table remains
a subset of the kernel page table: page-table changes (e.g. for vmalloc
etc.) should happen in the kernel page table and not in the KVM page
table, so we should probably enforce switching to the kernel page table
when doing operations like vmalloc. The current code doesn't enforce
this, but I can see it faulting on any allocation (because the KVM page
table doesn't have all the structures used during an allocation).

alex.
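[Editor's note: the subset invariant discussed above can be made concrete with a toy userspace model, treating each page table as a set of mapped pages. A fault against the isolated table may only be resolved lazily by copying a mapping that already exists in the kernel table; anything else would be an OOPS, matching Andy's objection. All names here are illustrative, not from the patch series.]

```c
#include <stdbool.h>

/* Toy model: pages are indices into a bitmap; "mapped" means present
 * in that page table.  NPAGES is an arbitrary illustrative size. */
#define NPAGES 16

static bool kernel_mapped[NPAGES];  /* full kernel page table */
static bool kvm_mapped[NPAGES];     /* KVM isolated page table */

/* vmalloc-style mapping changes happen in the kernel table only,
 * per the rule proposed above. */
static void kernel_map(int page)
{
	kernel_mapped[page] = true;
}

/* Lazy #PF-style resolution for the isolated table: copy the mapping
 * if and only if the kernel table already has it; otherwise the fault
 * is a real bug (would OOPS), not a lazy-fill case. */
static bool kvm_fault(int page)
{
	if (!kernel_mapped[page])
		return false;
	kvm_mapped[page] = true;
	return true;
}

/* The invariant: every KVM mapping also exists in the kernel table. */
static bool subset_invariant_holds(void)
{
	for (int i = 0; i < NPAGES; i++)
		if (kvm_mapped[i] && !kernel_mapped[i])
			return false;
	return true;
}
```

Because `kvm_fault()` only ever copies existing kernel mappings, the subset invariant holds by construction no matter how many faults are taken.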
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 130e81e..606da8f 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -515,7 +515,7 @@ static inline unsigned int read_apic_id(void)
 static inline void entering_irq(void)
 {
 	irq_enter();
-	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_cpu_may_access_sensitive_data();
 }
 
 static inline void entering_ack_irq(void)
@@ -528,7 +528,7 @@ static inline void ipi_entering_ack_irq(void)
 {
 	irq_enter();
 	ack_APIC_irq();
-	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_cpu_may_access_sensitive_data();
 }
 
 static inline void exiting_irq(void)
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index d9069bb..e082ecb 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -80,4 +80,14 @@ static inline bool kvm_get_cpu_l1tf_flush_l1d(void)
 static inline void kvm_set_cpu_l1tf_flush_l1d(void) { }
 #endif /* IS_ENABLED(CONFIG_KVM_INTEL) */
 
+#ifdef CONFIG_HAVE_KVM
+extern void (*kvm_isolation_exit_handler)(void);
+
+static inline void kvm_cpu_may_access_sensitive_data(void)
+{
+	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_isolation_exit_handler();
+}
+#endif
+
 #endif /* _ASM_X86_HARDIRQ_H */
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 04adc8d..b99fda0 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -261,7 +261,7 @@ __visible void __irq_entry smp_reschedule_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	inc_irq_stat(irq_resched_count);
-	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_cpu_may_access_sensitive_data();
 
 	if (trace_resched_ipi_enabled()) {
 		/*
diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c
index 1297e18..83a17ca 100644
--- a/arch/x86/platform/uv/tlb_uv.c
+++ b/arch/x86/platform/uv/tlb_uv.c
@@ -1285,7 +1285,7 @@ void uv_bau_message_interrupt(struct pt_regs *regs)
 	struct msg_desc msgdesc;
 
 	ack_APIC_irq();
-	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_cpu_may_access_sensitive_data();
 
 	time_start = get_cycles();
 	bcp = &per_cpu(bau_control, smp_processor_id());
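[Editor's note: the diff above routes every IRQ entry point through `kvm_cpu_may_access_sensitive_data()`, which chains the existing L1D flush request with an indirect call through the global `kvm_isolation_exit_handler` pointer. A minimal userspace sketch of that function-pointer hook pattern follows; apart from the names taken from the diff, everything (the flag standing in for the CR3 switch, the default no-op) is illustrative.]

```c
#include <stdbool.h>

/* Stand-in for "which address space is loaded"; in the real patch this
 * would be the CR3 value. */
static bool in_isolated_address_space = true;

/* Default handler: nothing to do when no vCPU uses isolation. */
static void default_isolation_exit(void)
{
}

/* Analogue of kvm_isolation_exit_handler: the KVM module repoints
 * this when the isolated address space may be active. */
static void (*isolation_exit_handler)(void) = default_isolation_exit;

/* Analogue of kvm_cpu_may_access_sensitive_data(): request the L1D
 * flush (elided here) and leave the isolated address space. */
static void cpu_may_access_sensitive_data(void)
{
	isolation_exit_handler();
}

/* Handler KVM would install: stand-in for the CR3 write back to the
 * full kernel page table. */
static void kvm_isolation_exit(void)
{
	in_isolated_address_space = false;
}
```

The indirection keeps the generic IRQ entry code free of any KVM dependency: with KVM absent or idle, the call costs one indirect jump to a no-op, and only the KVM module ever installs a handler that actually switches address spaces.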