Message ID | 20210414144945.3460554-1-ltykernel@gmail.com (mailing list archive) |
---|---|
Series | x86/Hyper-V: Add Hyper-V Isolation VM support |
> +/*
> + * hv_set_mem_host_visibility - Set host visibility for specified memory.
> + */

I don't think this comment really clarifies anything over the function
name. What is 'host visibility'?

> +int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)

Should size be a size_t?
Should visibility be an enum of some kind?

> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)

Not sure what this does either.

> +	local_irq_save(flags);
> +	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)

Is there a chance we could find a shorter but still descriptive
name for this variable? Why do we need the cast?

> +#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
> +#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)

Pointlessly overlong line.
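A minimal sketch of the signature change being asked for here, assuming an
enum is the right shape. The two VMBUS_PAGE_* values are the ones the patch
defines; the enum name and the NOT_VISIBLE value are illustrative:

	enum hv_mem_host_visibility {
		VMBUS_PAGE_NOT_VISIBLE		= 0,
		VMBUS_PAGE_VISIBLE_READ_ONLY	= HV_MAP_GPA_READABLE,
		VMBUS_PAGE_VISIBLE_READ_WRITE	=
			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
	};

	int hv_set_mem_host_visibility(void *kbuffer, size_t size,
				       enum hv_mem_host_visibility visibility);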
> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
Just curious, who is going to use all these exports? These seem like
extremely low-level functionality. Isn't there a way to build a more
useful higher level API?
> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
> +				  unsigned long offset, size_t size,
> +				  enum dma_data_direction dir,
> +				  unsigned long attrs)
> +{
> +	phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
> +
> +	if (!hv_is_isolation_supported())
> +		return phys;
> +
> +	map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
> +				     attrs);
> +	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
> +		return DMA_MAPPING_ERROR;
> +
> +	return map;
> +}

This largely duplicates what dma-direct + swiotlb does. Please use
force_dma_unencrypted to force bounce buffering and just use the generic
code.

> +	if (hv_isolation_type_snp()) {
> +		ret = hv_set_mem_host_visibility(
> +				phys_to_virt(hyperv_io_tlb_start),
> +				hyperv_io_tlb_size,
> +				VMBUS_PAGE_VISIBLE_READ_WRITE);
> +		if (ret)
> +			panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
> +			      __func__, ret);
> +
> +		hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
> +					+ ms_hyperv.shared_gpa_boundary,
> +					hyperv_io_tlb_size);
> +		if (!hyperv_io_tlb_remap)
> +			panic("%s: Fail to remap io tlb.\n", __func__);
> +
> +		memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
> +		swiotlb_set_bounce_remap(hyperv_io_tlb_remap);

And this really needs to go into a common hook where we currently just
call set_memory_decrypted so that all the different schemes for these
trusted VMs (we have about half a dozen now) can share code rather than
reinventing it.
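For reference, the generic path already has the hook Christoph means:
dma-direct consults force_dma_unencrypted() and bounces through swiotlb on
its own. A minimal sketch of wiring Hyper-V isolation VMs into it; in a real
tree this would have to coexist with the existing SEV/SME logic in
arch/x86/mm/mem_encrypt.c rather than replace it:

	/* Sketch only: make dma-direct bounce-buffer for isolation VMs. */
	bool force_dma_unencrypted(struct device *dev)
	{
		if (hv_is_isolation_supported())
			return true;

		/* ... existing SEV/SME checks stay here ... */
		return false;
	}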
Hi Christoph:
	Thanks for your review.

On 4/14/2021 11:40 PM, Christoph Hellwig wrote:
>> +/*
>> + * hv_set_mem_host_visibility - Set host visibility for specified memory.
>> + */
>
> I don't think this comment really clarifies anything over the function
> name. What is 'host visibility'?

OK. Will update the comment.

>
>> +int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
>
> Should size be a size_t?
> Should visibility be an enum of some kind?
>

Will update.

>> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
>
> Not sure what this does either.

Will add a comment.

>
>> +	local_irq_save(flags);
>> +	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
>
> Is there a chance we could find a shorter but still descriptive
> name for this variable? Why do we need the cast?

Sure. The cast is to avoid a build error due to
"incompatible-pointer-types".

>
>> +#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
>> +#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)
>
> Pointlessly overlong line.
>
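On the cast question: one way to confine it, sketched below, is to go
through a void ** local so the final assignment converts implicitly; the
shorter variable name is just an illustration:

	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
	void **input_arg;

	local_irq_save(flags);
	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
	input = *input_arg;	/* void * converts without a cast */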
On 4/14/2021 11:41 PM, Christoph Hellwig wrote:
>> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
>
> Just curious, who is going to use all these exports? These seem like
> extremely low-level functionality. Isn't there a way to build a more
> useful higher level API?
>

Yes, will remove it.
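One shape a higher-level API could take, as a sketch; the helper name is
illustrative, and the dispatch condition is the one this series already
uses elsewhere:

	/* Callers say what they want; the GHCB vs. WRMSR plumbing stays here. */
	static inline void hv_set_msr(u64 reg, u64 value)
	{
		if (hv_isolation_type_snp())
			hv_ghcb_msr_write(reg, value);
		else
			wrmsrl(reg, value);
	}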
On Wed, Apr 14, 2021 at 10:49:37AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> Hyper-V provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers and these registers are emulated by
> Hypervisor rather than paravisor.

What is a paravisor? Is that the VMPL0, to borrow AMD speak from
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf ?

>
> Hyper-V requests to write SINTx MSR registers twice(once via
> GHCB and once via wrmsr instruction including the proxy bit 21)

Why? And what does 'proxy bit 21' mean? Isn't it just setting a bit
on the MSR?

Oh. Are you writing to it twice because you need to let the hypervisor
(Hyper-V) know, which is done via the WRMSR, and then the inner
hypervisor (VMPL0) via the SINTx MSR?

> Guest OS ID MSR also needs to be set via GHCB.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c       |  18 +----
>  arch/x86/hyperv/ivm.c           | 130 ++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h |  87 +++++++++++++++++----
>  arch/x86/kernel/cpu/mshyperv.c  |   3 +
>  drivers/hv/hv.c                 |  65 +++++++++++-----
>  include/asm-generic/mshyperv.h  |   4 +-
>  6 files changed, 261 insertions(+), 46 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 90e65fbf4c58..87b1dd9c84d6 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -475,6 +475,9 @@ void __init hyperv_init(void)
>
> 		ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> 		*ghcb_base = ghcb_va;
> +
> +		/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> +		hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 	}
>
> 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -561,6 +564,7 @@ void hyperv_cleanup(void)
>
> 	/* Reset our OS id */
> 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> +	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>
> 	/*
> 	 * Reset hypercall page reference before reset the page,
> @@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
> 	return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
> }
> EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> -	if (!(ms_hyperv.features_b & HV_ISOLATION))
> -		return HV_ISOLATION_TYPE_NONE;
> -	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> -}
> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> -
> -bool hv_is_isolation_supported(void)
> -{
> -	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> -}
> -EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index a5950b7a9214..2ec64b367aaf 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -6,12 +6,139 @@
>  *  Tianyu Lan <Tianyu.Lan@microsoft.com>
>  */
>
> +#include <linux/types.h>
> +#include <linux/bitfield.h>
> #include <linux/hyperv.h>
> #include <linux/types.h>
> #include <linux/bitfield.h>
> #include <asm/io.h>
> +#include <asm/svm.h>
> +#include <asm/sev-es.h>
> #include <asm/mshyperv.h>
>
> +union hv_ghcb {
> +	struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);
> +
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return;
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
> +
> +	hv_ghcb->ghcb.protocol_version = 1;
> +	hv_ghcb->ghcb.ghcb_usage = 0;
> +
> +	ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> +	ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
> +	ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
> +	ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
> +	ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
> +
> +	VMGEXIT();
> +
> +	if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
> +		pr_warn("Fail to write msr via ghcb.\n.");
> +
> +	local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
> +
> +void hv_ghcb_msr_read(u64 msr, u64 *value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return;
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	memset(hv_ghcb, 0x00, PAGE_SIZE);
> +	hv_ghcb->ghcb.protocol_version = 1;
> +	hv_ghcb->ghcb.ghcb_usage = 0;
> +
> +	ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> +	ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 0);
> +	ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
> +
> +	VMGEXIT();
> +
> +	if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
> +		pr_warn("Fail to write msr via ghcb.\n.");
> +	else
> +		*value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
> +			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
> +	local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
> +
> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
> +{
> +	hv_ghcb_msr_read(msr, value);
> +}
> +EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
> +
> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
> +{
> +	hv_ghcb_msr_write(msr, value);
> +
> +	/* Write proxy bit vua wrmsrl instruction. */
> +	if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
> +		wrmsrl(msr, value | 1 << 20);
> +}
> +EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
> +
> +inline void hv_signal_eom_ghcb(void)
> +{
> +	hv_sint_wrmsrl_ghcb(HV_X64_MSR_EOM, 0);
> +}
> +EXPORT_SYMBOL_GPL(hv_signal_eom_ghcb);
> +
> +enum hv_isolation_type hv_get_isolation_type(void)
> +{
> +	if (!(ms_hyperv.features_b & HV_ISOLATION))
> +		return HV_ISOLATION_TYPE_NONE;
> +	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> +}
> +EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> +
> +bool hv_is_isolation_supported(void)
> +{
> +	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> +}
> +EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
> +
> +DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +
> +bool hv_isolation_type_snp(void)
> +{
> +	return static_branch_unlikely(&isolation_type_snp);
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> +
> /*
>  * hv_set_mem_host_visibility - Set host visibility for specified memory.
>  */
> @@ -22,6 +149,9 @@ int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
> 	u64 *pfn_array;
> 	int ret = 0;
>
> +	if (!hv_is_isolation_supported())
> +		return 0;
> +
> 	pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
> 	if (!pfn_array)
> 		return -ENOMEM;
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index d9437f096ce5..73501dbbc240 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -10,6 +10,8 @@
> #include <asm/nospec-branch.h>
> #include <asm/paravirt.h>
>
> +DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
> +
> typedef int (*hyperv_fill_flush_list_func)(
> 		struct hv_guest_mapping_flush_list *flush,
> 		void *data);
> @@ -19,24 +21,64 @@ typedef int (*hyperv_fill_flush_list_func)(
> #define hv_init_timer_config(timer, val) \
> 	wrmsrl(HV_X64_MSR_STIMER0_CONFIG + (2*timer), val)
>
> -#define hv_get_simp(val) rdmsrl(HV_X64_MSR_SIMP, val)
> -#define hv_set_simp(val) wrmsrl(HV_X64_MSR_SIMP, val)
> +#define hv_get_sint_reg(val, reg) {		\
> +	if (hv_isolation_type_snp())		\
> +		hv_get_##reg##_ghcb(&val);	\
> +	else					\
> +		rdmsrl(HV_X64_MSR_##reg, val);	\
> +	}
> +
> +#define hv_set_sint_reg(val, reg) {		\
> +	if (hv_isolation_type_snp())		\
> +		hv_set_##reg##_ghcb(val);	\
> +	else					\
> +		wrmsrl(HV_X64_MSR_##reg, val);	\
> +	}
> +
>
> -#define hv_get_siefp(val) rdmsrl(HV_X64_MSR_SIEFP, val)
> -#define hv_set_siefp(val) wrmsrl(HV_X64_MSR_SIEFP, val)
> +#define hv_get_simp(val) hv_get_sint_reg(val, SIMP)
> +#define hv_get_siefp(val) hv_get_sint_reg(val, SIEFP)
>
> -#define hv_get_synic_state(val) rdmsrl(HV_X64_MSR_SCONTROL, val)
> -#define hv_set_synic_state(val) wrmsrl(HV_X64_MSR_SCONTROL, val)
> +#define hv_set_simp(val) hv_set_sint_reg(val, SIMP)
> +#define hv_set_siefp(val) hv_set_sint_reg(val, SIEFP)
> +
> +#define hv_get_synic_state(val) {			\
> +	if (hv_isolation_type_snp())			\
> +		hv_get_synic_state_ghcb(&val);		\
> +	else						\
> +		rdmsrl(HV_X64_MSR_SCONTROL, val);	\
> +	}
> +#define hv_set_synic_state(val) {			\
> +	if (hv_isolation_type_snp())			\
> +		hv_set_synic_state_ghcb(val);		\
> +	else						\
> +		wrmsrl(HV_X64_MSR_SCONTROL, val);	\
> +	}
>
> #define hv_get_vp_index(index) rdmsrl(HV_X64_MSR_VP_INDEX, index)
>
> -#define hv_signal_eom() wrmsrl(HV_X64_MSR_EOM, 0)
> +#define hv_signal_eom() {				\
> +	if (hv_isolation_type_snp() &&			\
> +	    old_msg_type != HVMSG_TIMER_EXPIRED)	\
> +		hv_signal_eom_ghcb();			\
> +	else						\
> +		wrmsrl(HV_X64_MSR_EOM, 0);		\
> +	}
>
> -#define hv_get_synint_state(int_num, val) \
> -	rdmsrl(HV_X64_MSR_SINT0 + int_num, val)
> -#define hv_set_synint_state(int_num, val) \
> -	wrmsrl(HV_X64_MSR_SINT0 + int_num, val)
> -#define hv_recommend_using_aeoi() \
> +#define hv_get_synint_state(int_num, val) {		\
> +	if (hv_isolation_type_snp())			\
> +		hv_get_synint_state_ghcb(int_num, &val);\
> +	else						\
> +		rdmsrl(HV_X64_MSR_SINT0 + int_num, val);\
> +	}
> +#define hv_set_synint_state(int_num, val) {		\
> +	if (hv_isolation_type_snp())			\
> +		hv_set_synint_state_ghcb(int_num, val);	\
> +	else						\
> +		wrmsrl(HV_X64_MSR_SINT0 + int_num, val);\
> +	}
> +
> +#define hv_recommend_using_aeoi() \
> 	(!(ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED))
>
> #define hv_get_crash_ctl(val) \
> @@ -271,6 +313,25 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>
> int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility);
> int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
> +void hv_signal_eom_ghcb(void);
> +void hv_ghcb_msr_write(u64 msr, u64 value);
> +void hv_ghcb_msr_read(u64 msr, u64 *value);
> +
> +#define hv_get_synint_state_ghcb(int_num, val) \
> +	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> +#define hv_set_synint_state_ghcb(int_num, val) \
> +	hv_sint_wrmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> +
> +#define hv_get_SIMP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIMP, val)
> +#define hv_set_SIMP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIMP, val)
> +
> +#define hv_get_SIEFP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIEFP, val)
> +#define hv_set_SIEFP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIEFP, val)
> +
> +#define hv_get_synic_state_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
> +#define hv_set_synic_state_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
> #else /* CONFIG_HYPERV */
> static inline void hyperv_init(void) {}
> static inline void hyperv_setup_mmu_ops(void) {}
> @@ -289,9 +350,9 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
> {
> 	return -1;
> }
> +static inline void hv_signal_eom_ghcb(void) { };
> #endif /* CONFIG_HYPERV */
>
> -
> #include <asm-generic/mshyperv.h>
>
> #endif
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index aeafd4017c89..6eaa0891c0f7 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -333,6 +333,9 @@ static void __init ms_hyperv_init_platform(void)
>
> 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
> 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> +
> +		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
> +			static_branch_enable(&isolation_type_snp);
> 	}
>
> 	if (ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED) {
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index f202ac7f4b3d..069530eeb7c6 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -99,17 +99,24 @@ int hv_synic_alloc(void)
> 		tasklet_init(&hv_cpu->msg_dpc,
> 			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
>
> -		hv_cpu->synic_message_page =
> -			(void *)get_zeroed_page(GFP_ATOMIC);
> -		if (hv_cpu->synic_message_page == NULL) {
> -			pr_err("Unable to allocate SYNIC message page\n");
> -			goto err;
> -		}
> +		/*
> +		 * Synic message and event pages are allocated by paravisor.
> +		 * Skip these pages allocation here.
> +		 */
> +		if (!hv_isolation_type_snp()) {
> +			hv_cpu->synic_message_page =
> +				(void *)get_zeroed_page(GFP_ATOMIC);
> +			if (hv_cpu->synic_message_page == NULL) {
> +				pr_err("Unable to allocate SYNIC message page\n");
> +				goto err;
> +			}
>
> -		hv_cpu->synic_event_page = (void *)get_zeroed_page(GFP_ATOMIC);
> -		if (hv_cpu->synic_event_page == NULL) {
> -			pr_err("Unable to allocate SYNIC event page\n");
> -			goto err;
> +			hv_cpu->synic_event_page =
> +				(void *)get_zeroed_page(GFP_ATOMIC);
> +			if (hv_cpu->synic_event_page == NULL) {
> +				pr_err("Unable to allocate SYNIC event page\n");
> +				goto err;
> +			}
> 		}
>
> 		hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
> @@ -136,10 +143,17 @@ void hv_synic_free(void)
> 	for_each_present_cpu(cpu) {
> 		struct hv_per_cpu_context *hv_cpu
> 			= per_cpu_ptr(hv_context.cpu_context, cpu);
> +		free_page((unsigned long)hv_cpu->post_msg_page);
> +
> +		/*
> +		 * Synic message and event pages are allocated by paravisor.
> +		 * Skip free these pages here.
> +		 */
> +		if (hv_isolation_type_snp())
> +			continue;
>
> 		free_page((unsigned long)hv_cpu->synic_event_page);
> 		free_page((unsigned long)hv_cpu->synic_message_page);
> -		free_page((unsigned long)hv_cpu->post_msg_page);
> 	}
>
> 	kfree(hv_context.hv_numa_map);
> @@ -161,20 +175,36 @@ void hv_synic_enable_regs(unsigned int cpu)
> 	union hv_synic_sint shared_sint;
> 	union hv_synic_scontrol sctrl;
>
> -	/* Setup the Synic's message page */
> +
> +	/* Setup the Synic's message. */
> 	hv_get_simp(simp.as_uint64);
> 	simp.simp_enabled = 1;
> -	simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> -		>> HV_HYP_PAGE_SHIFT;
> +
> +	if (hv_isolation_type_snp()) {
> +		hv_cpu->synic_message_page
> +			= ioremap_cache(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
> +					HV_HYP_PAGE_SIZE);
> +		if (!hv_cpu->synic_message_page)
> +			pr_err("Fail to map syinc message page.\n");
> +	} else {
> +		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +	}
>
> 	hv_set_simp(simp.as_uint64);
>
> 	/* Setup the Synic's event page */
> 	hv_get_siefp(siefp.as_uint64);
> 	siefp.siefp_enabled = 1;
> -	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> -		>> HV_HYP_PAGE_SHIFT;
> -
> +	if (hv_isolation_type_snp()) {
> +		hv_cpu->synic_event_page = ioremap_cache(
> +			siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE);
> +		if (!hv_cpu->synic_event_page)
> +			pr_err("Fail to map syinc event page.\n");
> +	} else {
> +		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +	}
> 	hv_set_siefp(siefp.as_uint64);
>
> 	/* Setup the shared SINT. */
> @@ -188,7 +218,6 @@ void hv_synic_enable_regs(unsigned int cpu)
> 	/* Enable the global synic bit */
> 	hv_get_synic_state(sctrl.as_uint64);
> 	sctrl.enable = 1;
> -
> 	hv_set_synic_state(sctrl.as_uint64);
> }
>
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index b73e201abc70..59777377e641 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -23,6 +23,7 @@
> #include <linux/bitops.h>
> #include <linux/cpumask.h>
> #include <asm/ptrace.h>
> +#include <asm/mshyperv.h>
> #include <asm/hyperv-tlfs.h>
>
> struct ms_hyperv_info {
> @@ -52,7 +53,7 @@ extern struct ms_hyperv_info ms_hyperv;
>
> extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
> extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
> -
> +extern bool hv_isolation_type_snp(void);
>
> /* Generate the guest OS identifier as described in the Hyper-V TLFS */
> static inline  __u64 generate_guest_id(__u64 d_info1, __u64 kernel_version,
> @@ -186,6 +187,7 @@ bool hv_is_hyperv_initialized(void);
> bool hv_is_hibernation_supported(void);
> enum hv_isolation_type hv_get_isolation_type(void);
> bool hv_is_isolation_supported(void);
> +bool hv_isolation_type_snp(void);
> void hyperv_cleanup(void);
> #else /* CONFIG_HYPERV */
> static inline bool hv_is_hyperv_initialized(void) { return false; }
> --
> 2.25.1
>
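To make the double-write question concrete, here is the SINTx write path
from the patch above with the two steps annotated. Note the code sets bit
20 (1 << 20) while the commit message says "proxy bit 21"; which endpoint
consumes which write is exactly the open question:

	void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
	{
		/* Write 1: through the GHCB page, completed by VMGEXIT. */
		hv_ghcb_msr_write(msr, value);

		/* Write 2: a plain WRMSR with the proxy bit set. */
		if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
			wrmsrl(msr, value | 1 << 20);
	}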
On Wed, Apr 14, 2021 at 10:49:40AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> VMbus ring buffer are shared with host and it's need to
> be accessed via extra address space of Isolation VM with
> SNP support. This patch is to map the ring buffer
> address in extra address space via ioremap(). HV host

Why do you need to use ioremap()? Why not just use vmap?

> visibility hvcall smears data in the ring buffer and
> so reset the ring buffer memory to zero after calling
> visibility hvcall.

So you are exposing these two:
EXPORT_SYMBOL_GPL(get_vm_area);
EXPORT_SYMBOL_GPL(ioremap_page_range);

But if you used vmap wouldn't you get the same thing for free?

>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  drivers/hv/channel.c      | 10 +++++
>  drivers/hv/hyperv_vmbus.h |  2 +
>  drivers/hv/ring_buffer.c  | 83 +++++++++++++++++++++++++++----------
>  mm/ioremap.c              |  1 +
>  mm/vmalloc.c              |  1 +
>  5 files changed, 76 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 407b74d72f3f..4a9fb7ad4c72 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -634,6 +634,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
> 	if (err)
> 		goto error_clean_ring;
>
> +	err = hv_ringbuffer_post_init(&newchannel->outbound,
> +				      page, send_pages);
> +	if (err)
> +		goto error_free_gpadl;
> +
> +	err = hv_ringbuffer_post_init(&newchannel->inbound,
> +				      &page[send_pages], recv_pages);
> +	if (err)
> +		goto error_free_gpadl;
> +
> 	/* Create and init the channel open message */
> 	open_info = kzalloc(sizeof(*open_info) +
> 			   sizeof(struct vmbus_channel_open_channel),
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 0778add21a9c..d78a04ad5490 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
> /* Interface */
>
> void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> +		struct page *pages, u32 page_cnt);
>
> int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
> 		       struct page *pages, u32 pagecnt);
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 35833d4d1a1d..c8b0f7b45158 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -17,6 +17,8 @@
> #include <linux/vmalloc.h>
> #include <linux/slab.h>
> #include <linux/prefetch.h>
> +#include <linux/io.h>
> +#include <asm/mshyperv.h>
>
> #include "hyperv_vmbus.h"
>
> @@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
> 	mutex_init(&channel->outbound.ring_buffer_mutex);
> }
>
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> +		       struct page *pages, u32 page_cnt)
> +{
> +	struct vm_struct *area;
> +	u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
> +	unsigned long vaddr;
> +	int err = 0;
> +
> +	if (!hv_isolation_type_snp())
> +		return 0;
> +
> +	physic_addr += ms_hyperv.shared_gpa_boundary;
> +	area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP);
> +	if (!area || !area->addr)
> +		return -EFAULT;
> +
> +	vaddr = (unsigned long)area->addr;
> +	err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE,
> +			   physic_addr, PAGE_KERNEL_IO);
> +	err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE,
> +			   vaddr + (2 * page_cnt - 1) * PAGE_SIZE,
> +			   physic_addr + PAGE_SIZE, PAGE_KERNEL_IO);
> +	if (err) {
> +		vunmap((void *)vaddr);
> +		return -EFAULT;
> +	}
> +
> +	/* Clean memory after setting host visibility. */
> +	memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
> +
> +	ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
> +	ring_info->ring_buffer->read_index = 0;
> +	ring_info->ring_buffer->write_index = 0;
> +	ring_info->ring_buffer->feature_bits.value = 1;
> +
> +	return 0;
> +}
> +
> /* Initialize the ring buffer. */
> int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
> 		       struct page *pages, u32 page_cnt)
> @@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>
> 	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
>
> -	/*
> -	 * First page holds struct hv_ring_buffer, do wraparound mapping for
> -	 * the rest.
> -	 */
> -	pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> -				   GFP_KERNEL);
> -	if (!pages_wraparound)
> -		return -ENOMEM;
> -
> -	pages_wraparound[0] = pages;
> -	for (i = 0; i < 2 * (page_cnt - 1); i++)
> -		pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
> +	if (!hv_isolation_type_snp()) {
> +		/*
> +		 * First page holds struct hv_ring_buffer, do wraparound mapping for
> +		 * the rest.
> +		 */
> +		pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> +					   GFP_KERNEL);
> +		if (!pages_wraparound)
> +			return -ENOMEM;
>
> -	ring_info->ring_buffer = (struct hv_ring_buffer *)
> -		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
> +		pages_wraparound[0] = pages;
> +		for (i = 0; i < 2 * (page_cnt - 1); i++)
> +			pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
>
> -	kfree(pages_wraparound);
> +		ring_info->ring_buffer = (struct hv_ring_buffer *)
> +			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
>
> +		kfree(pages_wraparound);
>
> -	if (!ring_info->ring_buffer)
> -		return -ENOMEM;
> +		if (!ring_info->ring_buffer)
> +			return -ENOMEM;
>
> -	ring_info->ring_buffer->read_index =
> -		ring_info->ring_buffer->write_index = 0;
> +		ring_info->ring_buffer->read_index =
> +			ring_info->ring_buffer->write_index = 0;
>
> -	/* Set the feature bit for enabling flow control. */
> -	ring_info->ring_buffer->feature_bits.value = 1;
> +		/* Set the feature bit for enabling flow control. */
> +		ring_info->ring_buffer->feature_bits.value = 1;
> +	}
>
> 	ring_info->ring_size = page_cnt << PAGE_SHIFT;
> 	ring_info->ring_size_div10_reciprocal =
> diff --git a/mm/ioremap.c b/mm/ioremap.c
> index 5fa1ab41d152..d63c4ba067f9 100644
> --- a/mm/ioremap.c
> +++ b/mm/ioremap.c
> @@ -248,6 +248,7 @@ int ioremap_page_range(unsigned long addr,
>
> 	return err;
> }
> +EXPORT_SYMBOL_GPL(ioremap_page_range);
>
> #ifdef CONFIG_GENERIC_IOREMAP
> void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot)
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e6f352bf0498..19724a8ebcb7 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2131,6 +2131,7 @@ struct vm_struct *get_vm_area(unsigned long size, unsigned long flags)
> 				  NUMA_NO_NODE, GFP_KERNEL,
> 				  __builtin_return_address(0));
> }
> +EXPORT_SYMBOL_GPL(get_vm_area);
>
> struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags,
> 				const void *caller)
> --
> 2.25.1
>
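For what it's worth, a sketch of the vmap-flavoured alternative: vmap()
itself wants struct page pointers, which the alias above
shared_gpa_boundary does not have, but vmap_pfn() takes raw PFNs and would
avoid exporting either symbol. The helper name is illustrative and error
handling is elided:

	static void *hv_ring_buffer_remap(struct page *pages, u32 page_cnt)
	{
		u64 boundary_pfn = ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT;
		unsigned long *pfns;
		void *vaddr;
		int i;

		/* First page holds struct hv_ring_buffer, wrap the rest twice. */
		pfns = kcalloc(page_cnt * 2 - 1, sizeof(*pfns), GFP_KERNEL);
		if (!pfns)
			return NULL;

		pfns[0] = page_to_pfn(pages) + boundary_pfn;
		for (i = 0; i < 2 * (page_cnt - 1); i++)
			pfns[i + 1] = page_to_pfn(pages) +
				      i % (page_cnt - 1) + 1 + boundary_pfn;

		vaddr = vmap_pfn(pfns, page_cnt * 2 - 1, PAGE_KERNEL_IO);
		kfree(pfns);
		return vaddr;
	}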
On Wed, Apr 14, 2021 at 10:49:42AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> For Hyper-V isolation VM with AMD SEV SNP, the bounce buffer(shared memory)
> needs to be accessed via extra address space(e.g address above bit39).
> Hyper-V code may remap extra address space outside of swiotlb. swiotlb_bounce()

May? Isn't this a MUST in this case?

> needs to use remap virtual address to copy data from/to bounce buffer. Add
> new interface swiotlb_set_bounce_remap() to do that.

I am a bit lost - why can't you use the swiotlb_init and pass in your
DMA pool that is already above bit 39?

>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  include/linux/swiotlb.h |  5 +++++
>  kernel/dma/swiotlb.c    | 13 ++++++++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index d9c9fc9ca5d2..3ccd08116683 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -82,8 +82,13 @@ unsigned int swiotlb_max_segment(void);
> size_t swiotlb_max_mapping_size(struct device *dev);
> bool is_swiotlb_active(void);
> void __init swiotlb_adjust_size(unsigned long new_size);
> +void swiotlb_set_bounce_remap(unsigned char *vaddr);
> #else
> #define swiotlb_force SWIOTLB_NO_FORCE
> +static inline void swiotlb_set_bounce_remap(unsigned char *vaddr)
> +{
> +}
> +
> static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> {
> 	return false;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 7c42df6e6100..5fd2db6aa149 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -94,6 +94,7 @@ static unsigned int io_tlb_index;
>  * not be bounced (unless SWIOTLB_FORCE is set).
>  */
> static unsigned int max_segment;
> +static unsigned char *swiotlb_bounce_remap_addr;
>
> /*
>  * We need to save away the original address corresponding to a mapped entry
> @@ -421,6 +422,11 @@ void __init swiotlb_exit(void)
> 	swiotlb_cleanup();
> }
>
> +void swiotlb_set_bounce_remap(unsigned char *vaddr)
> +{
> +	swiotlb_bounce_remap_addr = vaddr;
> +}
> +
> /*
>  * Bounce: copy the swiotlb buffer from or back to the original dma location
>  */
> @@ -428,7 +434,12 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
> 			   size_t size, enum dma_data_direction dir)
> {
> 	unsigned long pfn = PFN_DOWN(orig_addr);
> -	unsigned char *vaddr = phys_to_virt(tlb_addr);
> +	unsigned char *vaddr;
> +
> +	if (swiotlb_bounce_remap_addr)
> +		vaddr = swiotlb_bounce_remap_addr + tlb_addr - io_tlb_start;
> +	else
> +		vaddr = phys_to_virt(tlb_addr);
>
> 	if (PageHighMem(pfn_to_page(pfn))) {
> 		/* The buffer does not have a mapping.  Map it in and copy */
> --
> 2.25.1
>
On Thu, Apr 15, 2021 at 04:24:15PM -0400, Konrad Rzeszutek Wilk wrote:
> So you are exposing these two:
> EXPORT_SYMBOL_GPL(get_vm_area);
> EXPORT_SYMBOL_GPL(ioremap_page_range);
>
> But if you used vmap wouldn't you get the same thing for free?

Yes, this needs to go into some vmap version, preferably reusing the
existing code in kernel/dma/remap.c.

Exporting get_vm_area is a complete dealbreaker and not going to
happen. We worked hard on not exposing it to modules.
Hi Christoph and Konrad:
	The current swiotlb code uses one bounce buffer pool for all
devices, and there is high overhead in getting or freeing a bounce
buffer during performance tests. Swiotlb protects its bounce buffer
data with a single global spin lock, and several device queues
contending for that lock introduce additional overhead.

	From a performance and security perspective, each device should
have a separate swiotlb bounce buffer pool, so this part needs to be
reworked. I want to check whether this is the right way to resolve the
performance issue with the swiotlb bounce buffer. If you have other
suggestions, welcome.

Thanks.

On 4/14/2021 11:47 PM, Christoph Hellwig wrote:
>> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
>> +				  unsigned long offset, size_t size,
>> +				  enum dma_data_direction dir,
>> +				  unsigned long attrs)
>> +{
>> +	phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
>> +
>> +	if (!hv_is_isolation_supported())
>> +		return phys;
>> +
>> +	map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
>> +				     attrs);
>> +	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
>> +		return DMA_MAPPING_ERROR;
>> +
>> +	return map;
>> +}
>
> This largely duplicates what dma-direct + swiotlb does. Please use
> force_dma_unencrypted to force bounce buffering and just use the generic
> code.
>
>> +	if (hv_isolation_type_snp()) {
>> +		ret = hv_set_mem_host_visibility(
>> +				phys_to_virt(hyperv_io_tlb_start),
>> +				hyperv_io_tlb_size,
>> +				VMBUS_PAGE_VISIBLE_READ_WRITE);
>> +		if (ret)
>> +			panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
>> +			      __func__, ret);
>> +
>> +		hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
>> +					+ ms_hyperv.shared_gpa_boundary,
>> +					hyperv_io_tlb_size);
>> +		if (!hyperv_io_tlb_remap)
>> +			panic("%s: Fail to remap io tlb.\n", __func__);
>> +
>> +		memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
>> +		swiotlb_set_bounce_remap(hyperv_io_tlb_remap);
>
> And this really needs to go into a common hook where we currently just
> call set_memory_decrypted so that all the different schemes for these
> trusted VMs (we have about half a dozen now) can share code rather than
> reinventing it.
>
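A sketch of the per-device pool shape being floated here, with the global
lock split per pool. The struct and field names are illustrative; the
restricted-DMA-pool series linked in the replies below is the real version
of this idea:

	/* Illustrative only: one bounce pool per device instead of one global. */
	struct io_tlb_pool {
		phys_addr_t start;		/* pool physical base */
		unsigned long nslabs;		/* pool size in slabs */
		unsigned long *slot_map;	/* slot allocation bitmap */
		spinlock_t lock;		/* protects only this pool */
	};

	/*
	 * A per-device pointer to such a pool would mean that
	 * swiotlb_tbl_map_single() contends only with queues of the same
	 * device, not with every swiotlb user in the system.
	 */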
On 2021-05-12 17:01, Tianyu Lan wrote:
> Hi Christoph and Konrad:
> 	The current swiotlb code uses one bounce buffer pool for all
> devices, and there is high overhead in getting or freeing a bounce
> buffer during performance tests. Swiotlb protects its bounce buffer
> data with a single global spin lock, and several device queues
> contending for that lock introduce additional overhead.
>
> 	From a performance and security perspective, each device should
> have a separate swiotlb bounce buffer pool, so this part needs to be
> reworked. I want to check whether this is the right way to resolve the
> performance issue with the swiotlb bounce buffer. If you have other
> suggestions, welcome.

We're already well on the way to factoring out SWIOTLB to allow for
just this sort of more flexible usage like per-device bounce pools -
see here:

https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/T/#t

FWIW this looks to have an awful lot in common with what's going to be
needed for Android's protected KVM and Arm's Confidential Compute
Architecture, so we'll all be better off by doing it right. I'm getting
the feeling that set_memory_decrypted() wants to grow into a more
general abstraction that can return an alias at a different GPA if
necessary.

Robin.

>
> Thanks.
>
> On 4/14/2021 11:47 PM, Christoph Hellwig wrote:
>>> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
>>> +				  unsigned long offset, size_t size,
>>> +				  enum dma_data_direction dir,
>>> +				  unsigned long attrs)
>>> +{
>>> +	phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
>>> +
>>> +	if (!hv_is_isolation_supported())
>>> +		return phys;
>>> +
>>> +	map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
>>> +				     attrs);
>>> +	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
>>> +		return DMA_MAPPING_ERROR;
>>> +
>>> +	return map;
>>> +}
>>
>> This largely duplicates what dma-direct + swiotlb does. Please use
>> force_dma_unencrypted to force bounce buffering and just use the generic
>> code.
>>
>>> +	if (hv_isolation_type_snp()) {
>>> +		ret = hv_set_mem_host_visibility(
>>> +				phys_to_virt(hyperv_io_tlb_start),
>>> +				hyperv_io_tlb_size,
>>> +				VMBUS_PAGE_VISIBLE_READ_WRITE);
>>> +		if (ret)
>>> +			panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
>>> +			      __func__, ret);
>>> +
>>> +		hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
>>> +					+ ms_hyperv.shared_gpa_boundary,
>>> +					hyperv_io_tlb_size);
>>> +		if (!hyperv_io_tlb_remap)
>>> +			panic("%s: Fail to remap io tlb.\n", __func__);
>>> +
>>> +		memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
>>> +		swiotlb_set_bounce_remap(hyperv_io_tlb_remap);
>>
>> And this really needs to go into a common hook where we currently just
>> call set_memory_decrypted so that all the different schemes for these
>> trusted VMs (we have about half a dozen now) can share code rather than
>> reinventing it.
>>
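An illustrative signature for the abstraction Robin describes; nothing
like this exists at the time of writing, and the name and shape are made
up for the sake of discussion:

	/*
	 * Hypothetical: like set_memory_decrypted(), but returns the address
	 * the caller should actually use, so platforms where the decrypted
	 * view lives at an alias (e.g. GPA + shared_gpa_boundary on Hyper-V
	 * SNP) can hand back a remapped alias instead of the original.
	 */
	void *set_memory_decrypted_remap(void *addr, int numpages);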
On 5/13/21 12:01 AM, Tianyu Lan wrote:
> Hi Christoph and Konrad:
> 	The current swiotlb code uses one bounce buffer pool for all
> devices, and there is high overhead in getting or freeing a bounce
> buffer during performance tests. Swiotlb protects its bounce buffer
> data with a single global spin lock, and several device queues
> contending for that lock introduce additional overhead.
>
> 	From a performance and security perspective, each device should
> have a separate swiotlb bounce buffer pool, so this part needs to be
> reworked. I want to check whether this is the right way to resolve the
> performance issue with the swiotlb bounce buffer. If you have other
> suggestions, welcome.

Is this what you want?

https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/

Best regards,
baolu
From: Tianyu Lan <Tianyu.Lan@microsoft.com>

"Resending all patches because someone on the Cc list didn't receive
the whole patchset. Sorry for the noise."

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) VMs and AMD SEV-SNP unenlightened Isolation VMs. This
patchset adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and the
guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, no network/storage
stack memory should be shared with the host, so bounce buffers are
required.

The VMBus channel ring buffer already plays the bounce buffer role,
because all data from/to the host is copied between the ring buffer and
IO stack memory. So mark the VMBus channel ring buffer visible.

There are two exceptions - packets sent by vmbus_sendpacket_pagebuffer()
and vmbus_sendpacket_mpb_desc(). These packets contain IO stack memory
addresses and the host will access that memory. So add bounce buffer
allocation support in VMBus for these packets.

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space which is specified by the Hyper-V CPUID
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

Tianyu Lan (12):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in Isolation VM
  x86/Hyper-V: Add new hvcall guest address host visibility support
  HV: Add Write/Read MSR registers via ghcb
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  UIO/Hyper-V: Not load UIO HV driver in the isolation VM.
  swiotlb: Add bounce buffer remap address setting function
  HV/IOMMU: Add Hyper-V dma ops support
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  70 +++++--
 arch/x86/hyperv/ivm.c              | 289 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  22 +++
 arch/x86/include/asm/mshyperv.h    |  90 +++++++--
 arch/x86/kernel/cpu/mshyperv.c     |   5 +
 arch/x86/kernel/pci-swiotlb.c      |   3 +-
 drivers/hv/channel.c               |  44 ++++-
 drivers/hv/connection.c            |  68 ++++++-
 drivers/hv/hv.c                    |  73 ++++++--
 drivers/hv/hyperv_vmbus.h          |   3 +
 drivers/hv/ring_buffer.c           |  83 ++++++---
 drivers/hv/vmbus_drv.c             |   3 +
 drivers/iommu/hyperv-iommu.c       | 127 +++++++++++++
 drivers/net/hyperv/hyperv_net.h    |  11 ++
 drivers/net/hyperv/netvsc.c        | 137 +++++++++++++-
 drivers/net/hyperv/rndis_filter.c  |   3 +
 drivers/scsi/storvsc_drv.c         |  67 ++++++-
 drivers/uio/uio_hv_generic.c       |   5 +
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  18 +-
 include/linux/hyperv.h             |  12 +-
 include/linux/swiotlb.h            |   5 +
 kernel/dma/swiotlb.c               |  13 +-
 mm/ioremap.c                       |   1 +
 mm/vmalloc.c                       |   1 +
 26 files changed, 1068 insertions(+), 88 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c
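The addressing rule from the last paragraph of the cover letter, as a
one-line sketch; ms_hyperv.shared_gpa_boundary is the field this series
adds:

	/* Host-visible alias of a shared page in an SNP Isolation VM. */
	u64 shared_gpa = gpa + ms_hyperv.shared_gpa_boundary;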