Message ID | 20220307213356.2797205-20-brijesh.singh@amd.com (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
Series | Add AMD Secure Nested Paging (SEV-SNP) Guest Support | expand |
s/Brijesh/Michael On Mon, Mar 07, 2022, Brijesh Singh wrote: > The encryption attribute for the .bss..decrypted section is cleared in the > initial page table build. This is because the section contains the data > that need to be shared between the guest and the hypervisor. > > When SEV-SNP is active, just clearing the encryption attribute in the > page table is not enough. The page state need to be updated in the RMP > table. > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > --- > arch/x86/kernel/head64.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c > index 83514b9827e6..656d2f3e2cf0 100644 > --- a/arch/x86/kernel/head64.c > +++ b/arch/x86/kernel/head64.c > @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv > if (sme_get_me_mask()) { > vaddr = (unsigned long)__start_bss_decrypted; > vaddr_end = (unsigned long)__end_bss_decrypted; > + > for (; vaddr < vaddr_end; vaddr += PMD_SIZE) { > + /* > + * On SNP, transition the page to shared in the RMP table so that > + * it is consistent with the page table attribute change. > + * > + * __start_bss_decrypted has a virtual address in the high range > + * mapping (kernel .text). PVALIDATE, by way of > + * early_snp_set_memory_shared(), requires a valid virtual > + * address but the kernel is currently running off of the identity > + * mapping so use __pa() to get a *currently* valid virtual address. > + */ > + early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD); This breaks SME on Rome and Milan when compiling with clang-13. I haven't been able to figure out exactly what goes wrong. printk isn't functional at this point, and interactive debug during boot on our test systems is beyond me. I can't even verify that the bug is specific to clang because the draconian build system for our test systems apparently is stuck pointing at gcc-4.9. I suspect the issue is related to relocation and/or encrypting memory, as skipping the call to early_snp_set_memory_shared() if SNP isn't active masks the issue. I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious use of absolute addresses. Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the moment because INIT_EX is also broken. The crash incurs a very, very slow reboot, and I was out of cycles to work on this about three hours ago. If someone on the AMD side can repro, it would be much appreciated.
On Tue, Jun 14, 2022, Sean Christopherson wrote: > s/Brijesh/Michael > > On Mon, Mar 07, 2022, Brijesh Singh wrote: > > The encryption attribute for the .bss..decrypted section is cleared in the > > initial page table build. This is because the section contains the data > > that need to be shared between the guest and the hypervisor. > > > > When SEV-SNP is active, just clearing the encryption attribute in the > > page table is not enough. The page state need to be updated in the RMP > > table. > > > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > > --- > > arch/x86/kernel/head64.c | 13 +++++++++++++ > > 1 file changed, 13 insertions(+) > > > > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c > > index 83514b9827e6..656d2f3e2cf0 100644 > > --- a/arch/x86/kernel/head64.c > > +++ b/arch/x86/kernel/head64.c > > @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv > > if (sme_get_me_mask()) { > > vaddr = (unsigned long)__start_bss_decrypted; > > vaddr_end = (unsigned long)__end_bss_decrypted; > > + > > for (; vaddr < vaddr_end; vaddr += PMD_SIZE) { > > + /* > > + * On SNP, transition the page to shared in the RMP table so that > > + * it is consistent with the page table attribute change. > > + * > > + * __start_bss_decrypted has a virtual address in the high range > > + * mapping (kernel .text). PVALIDATE, by way of > > + * early_snp_set_memory_shared(), requires a valid virtual > > + * address but the kernel is currently running off of the identity > > + * mapping so use __pa() to get a *currently* valid virtual address. > > + */ > > + early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD); > > This breaks SME on Rome and Milan when compiling with clang-13. I haven't been > able to figure out exactly what goes wrong. printk isn't functional at this point, > and interactive debug during boot on our test systems is beyond me. I can't even > verify that the bug is specific to clang because the draconian build system for our > test systems apparently is stuck pointing at gcc-4.9. > > I suspect the issue is related to relocation and/or encrypting memory, as skipping > the call to early_snp_set_memory_shared() if SNP isn't active masks the issue. > I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious > use of absolute addresses. > > Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the > moment because INIT_EX is also broken. The SEV INIT_EX was a PEBKAC issue. An SEV guest boots just fine with a clang-built kernel, so either it's a finnicky relocation issue or something specific to SME. > The crash incurs a very, very slow reboot, and I was out of cycles to work on this > about three hours ago. If someone on the AMD side can repro, it would be much > appreciated.
On 6/14/22 10:43, Sean Christopherson wrote: > On Tue, Jun 14, 2022, Sean Christopherson wrote: >> s/Brijesh/Michael >> >> On Mon, Mar 07, 2022, Brijesh Singh wrote: >>> The encryption attribute for the .bss..decrypted section is cleared in the >>> initial page table build. This is because the section contains the data >>> that need to be shared between the guest and the hypervisor. >>> >>> When SEV-SNP is active, just clearing the encryption attribute in the >>> page table is not enough. The page state need to be updated in the RMP >>> table. >>> >>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> >>> --- >>> arch/x86/kernel/head64.c | 13 +++++++++++++ >>> 1 file changed, 13 insertions(+) >>> >>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c >>> index 83514b9827e6..656d2f3e2cf0 100644 >>> --- a/arch/x86/kernel/head64.c >>> +++ b/arch/x86/kernel/head64.c >>> @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv >>> if (sme_get_me_mask()) { >>> vaddr = (unsigned long)__start_bss_decrypted; >>> vaddr_end = (unsigned long)__end_bss_decrypted; >>> + >>> for (; vaddr < vaddr_end; vaddr += PMD_SIZE) { >>> + /* >>> + * On SNP, transition the page to shared in the RMP table so that >>> + * it is consistent with the page table attribute change. >>> + * >>> + * __start_bss_decrypted has a virtual address in the high range >>> + * mapping (kernel .text). PVALIDATE, by way of >>> + * early_snp_set_memory_shared(), requires a valid virtual >>> + * address but the kernel is currently running off of the identity >>> + * mapping so use __pa() to get a *currently* valid virtual address. >>> + */ >>> + early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD); >> >> This breaks SME on Rome and Milan when compiling with clang-13. I haven't been >> able to figure out exactly what goes wrong. printk isn't functional at this point, >> and interactive debug during boot on our test systems is beyond me. I can't even >> verify that the bug is specific to clang because the draconian build system for our >> test systems apparently is stuck pointing at gcc-4.9. >> >> I suspect the issue is related to relocation and/or encrypting memory, as skipping >> the call to early_snp_set_memory_shared() if SNP isn't active masks the issue. >> I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious >> use of absolute addresses. >> >> Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the >> moment because INIT_EX is also broken. > > The SEV INIT_EX was a PEBKAC issue. An SEV guest boots just fine with a clang-built > kernel, so either it's a finnicky relocation issue or something specific to SME. I just built and booted 5.19-rc2 with clang-13 and SME enabled without issue: [ 4.118226] Memory Encryption Features active: AMD SME Maybe something with your kernel config? Can you send me your config? Thanks, Tom > >> The crash incurs a very, very slow reboot, and I was out of cycles to work on this >> about three hours ago. If someone on the AMD side can repro, it would be much >> appreciated.
On Tue, Jun 14, 2022, Tom Lendacky wrote: > On 6/14/22 10:43, Sean Christopherson wrote: > > On Tue, Jun 14, 2022, Sean Christopherson wrote: > > > s/Brijesh/Michael > > > > > > On Mon, Mar 07, 2022, Brijesh Singh wrote: > > > > The encryption attribute for the .bss..decrypted section is cleared in the > > > > initial page table build. This is because the section contains the data > > > > that need to be shared between the guest and the hypervisor. > > > > > > > > When SEV-SNP is active, just clearing the encryption attribute in the > > > > page table is not enough. The page state need to be updated in the RMP > > > > table. > > > > > > > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > > > > --- > > > > arch/x86/kernel/head64.c | 13 +++++++++++++ > > > > 1 file changed, 13 insertions(+) > > > > > > > > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c > > > > index 83514b9827e6..656d2f3e2cf0 100644 > > > > --- a/arch/x86/kernel/head64.c > > > > +++ b/arch/x86/kernel/head64.c > > > > @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv > > > > if (sme_get_me_mask()) { > > > > vaddr = (unsigned long)__start_bss_decrypted; > > > > vaddr_end = (unsigned long)__end_bss_decrypted; > > > > + > > > > for (; vaddr < vaddr_end; vaddr += PMD_SIZE) { > > > > + /* > > > > + * On SNP, transition the page to shared in the RMP table so that > > > > + * it is consistent with the page table attribute change. > > > > + * > > > > + * __start_bss_decrypted has a virtual address in the high range > > > > + * mapping (kernel .text). PVALIDATE, by way of > > > > + * early_snp_set_memory_shared(), requires a valid virtual > > > > + * address but the kernel is currently running off of the identity > > > > + * mapping so use __pa() to get a *currently* valid virtual address. > > > > + */ > > > > + early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD); > > > > > > This breaks SME on Rome and Milan when compiling with clang-13. I haven't been > > > able to figure out exactly what goes wrong. printk isn't functional at this point, > > > and interactive debug during boot on our test systems is beyond me. I can't even > > > verify that the bug is specific to clang because the draconian build system for our > > > test systems apparently is stuck pointing at gcc-4.9. > > > > > > I suspect the issue is related to relocation and/or encrypting memory, as skipping > > > the call to early_snp_set_memory_shared() if SNP isn't active masks the issue. > > > I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious > > > use of absolute addresses. > > > > > > Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the > > > moment because INIT_EX is also broken. > > > > The SEV INIT_EX was a PEBKAC issue. An SEV guest boots just fine with a clang-built > > kernel, so either it's a finnicky relocation issue or something specific to SME. > > I just built and booted 5.19-rc2 with clang-13 and SME enabled without issue: > > [ 4.118226] Memory Encryption Features active: AMD SME Phooey. > Maybe something with your kernel config? Can you send me your config? Attached. If you can't repro, I'll find someone on our end to work on this. Thanks!
On 6/14/22 11:13, Sean Christopherson wrote: > On Tue, Jun 14, 2022, Tom Lendacky wrote: >> On 6/14/22 10:43, Sean Christopherson wrote: >>> On Tue, Jun 14, 2022, Sean Christopherson wrote: >>>> s/Brijesh/Michael >>>> >>>> On Mon, Mar 07, 2022, Brijesh Singh wrote: >>>>> The encryption attribute for the .bss..decrypted section is cleared in the >>>>> initial page table build. This is because the section contains the data >>>>> that need to be shared between the guest and the hypervisor. >>>>> >>>>> When SEV-SNP is active, just clearing the encryption attribute in the >>>>> page table is not enough. The page state need to be updated in the RMP >>>>> table. >>>>> >>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> >>>>> --- >>>>> arch/x86/kernel/head64.c | 13 +++++++++++++ >>>>> 1 file changed, 13 insertions(+) >>>>> >>>>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c >>>>> index 83514b9827e6..656d2f3e2cf0 100644 >>>>> --- a/arch/x86/kernel/head64.c >>>>> +++ b/arch/x86/kernel/head64.c >>>>> @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv >>>>> if (sme_get_me_mask()) { >>>>> vaddr = (unsigned long)__start_bss_decrypted; >>>>> vaddr_end = (unsigned long)__end_bss_decrypted; >>>>> + >>>>> for (; vaddr < vaddr_end; vaddr += PMD_SIZE) { >>>>> + /* >>>>> + * On SNP, transition the page to shared in the RMP table so that >>>>> + * it is consistent with the page table attribute change. >>>>> + * >>>>> + * __start_bss_decrypted has a virtual address in the high range >>>>> + * mapping (kernel .text). PVALIDATE, by way of >>>>> + * early_snp_set_memory_shared(), requires a valid virtual >>>>> + * address but the kernel is currently running off of the identity >>>>> + * mapping so use __pa() to get a *currently* valid virtual address. >>>>> + */ >>>>> + early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD); >>>> >>>> This breaks SME on Rome and Milan when compiling with clang-13. I haven't been >>>> able to figure out exactly what goes wrong. printk isn't functional at this point, >>>> and interactive debug during boot on our test systems is beyond me. I can't even >>>> verify that the bug is specific to clang because the draconian build system for our >>>> test systems apparently is stuck pointing at gcc-4.9. >>>> >>>> I suspect the issue is related to relocation and/or encrypting memory, as skipping >>>> the call to early_snp_set_memory_shared() if SNP isn't active masks the issue. >>>> I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious >>>> use of absolute addresses. >>>> >>>> Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the >>>> moment because INIT_EX is also broken. >>> >>> The SEV INIT_EX was a PEBKAC issue. An SEV guest boots just fine with a clang-built >>> kernel, so either it's a finnicky relocation issue or something specific to SME. >> >> I just built and booted 5.19-rc2 with clang-13 and SME enabled without issue: >> >> [ 4.118226] Memory Encryption Features active: AMD SME > > Phooey. > >> Maybe something with your kernel config? Can you send me your config? > > Attached. If you can't repro, I'll find someone on our end to work on this. I was able to repro. It dies in the cc_platform_has() code, where it is trying to do an indirect jump based on the attribute (actually in the amd_cc_platform_has() which I think has been optimized in): bool cc_platform_has(enum cc_attr attr) { ffffffff81002140: 55 push %rbp ffffffff81002141: 48 89 e5 mov %rsp,%rbp switch (vendor) { ffffffff81002144: 8b 05 c6 e9 3a 01 mov 0x13ae9c6(%rip),%eax # ffffffff823b0b10 <vendor> ffffffff8100214a: 83 f8 03 cmp $0x3,%eax ffffffff8100214d: 74 25 je ffffffff81002174 <cc_platform_has+0x34> ffffffff8100214f: 83 f8 02 cmp $0x2,%eax ffffffff81002152: 74 2f je ffffffff81002183 <cc_platform_has+0x43> ffffffff81002154: 83 f8 01 cmp $0x1,%eax ffffffff81002157: 75 26 jne ffffffff8100217f <cc_platform_has+0x3f> switch (attr) { ffffffff81002159: 83 ff 05 cmp $0x5,%edi ffffffff8100215c: 77 21 ja ffffffff8100217f <cc_platform_has+0x3f> ffffffff8100215e: 89 f8 mov %edi,%eax ffffffff81002160: ff 24 c5 c0 01 00 82 jmp *-0x7dfffe40(,%rax,8) This last line is what causes the reset. I'm guessing that the jump isn't valid at this point because we are running in identity mapped mode and not with a kernel virtual address at this point. Trying to see what the difference was between your config and mine, the indirect jump lead me to check the setting of CONFIG_RETPOLINE. Your config did not have it enabled, so I set CONFIG_RETPOLINE=y, and with that, the kernel boots successfully. With retpolines, the code is completely different around here: bool cc_platform_has(enum cc_attr attr) { ffffffff81001f30: 55 push %rbp ffffffff81001f31: 48 89 e5 mov %rsp,%rbp switch (vendor) { ffffffff81001f34: 8b 05 26 8f 37 01 mov 0x1378f26(%rip),%eax # ffffffff8237ae60 <vendor> ffffffff81001f3a: 83 f8 03 cmp $0x3,%eax ffffffff81001f3d: 74 29 je ffffffff81001f68 <cc_platform_has+0x38> ffffffff81001f3f: 83 f8 02 cmp $0x2,%eax ffffffff81001f42: 74 33 je ffffffff81001f77 <cc_platform_has+0x47> ffffffff81001f44: 83 f8 01 cmp $0x1,%eax ffffffff81001f47: 75 2a jne ffffffff81001f73 <cc_platform_has+0x43> ffffffff81001f49: 31 c0 xor %eax,%eax switch (attr) { ffffffff81001f4b: 83 ff 02 cmp $0x2,%edi ffffffff81001f4e: 7f 2f jg ffffffff81001f7f <cc_platform_has+0x4f> ffffffff81001f50: 85 ff test %edi,%edi ffffffff81001f52: 74 47 je ffffffff81001f9b <cc_platform_has+0x6b> ffffffff81001f54: 83 ff 01 cmp $0x1,%edi ffffffff81001f57: 74 5b je ffffffff81001fb4 <cc_platform_has+0x84> ffffffff81001f59: 83 ff 02 cmp $0x2,%edi ffffffff81001f5c: 75 08 jne ffffffff81001f66 <cc_platform_has+0x36> return sev_status & MSR_AMD64_SEV_ENABLED; ffffffff81001f5e: 8a 05 44 3f 64 01 mov 0x1643f44(%rip),%al # ffffffff82645ea8 <sev_status> ffffffff81001f64: 24 01 and $0x1,%al case CC_VENDOR_HYPERV: return hyperv_cc_platform_has(attr); default: return false; } } ffffffff81001f66: 5d pop %rbp ffffffff81001f67: c3 ret switch (attr) { ffffffff81001f68: 83 ff 07 cmp $0x7,%edi ffffffff81001f6b: 73 06 jae ffffffff81001f73 <cc_platform_has+0x43> ffffffff81001f6d: 40 f6 c7 01 test $0x1,%dil ffffffff81001f71: eb 07 jmp ffffffff81001f7a <cc_platform_has+0x4a> ffffffff81001f73: 31 c0 xor %eax,%eax } ffffffff81001f75: 5d pop %rbp ffffffff81001f76: c3 ret return attr == CC_ATTR_GUEST_MEM_ENCRYPT; ffffffff81001f77: 83 ff 02 cmp $0x2,%edi ffffffff81001f7a: 0f 94 c0 sete %al } ffffffff81001f7d: 5d pop %rbp ffffffff81001f7e: c3 ret switch (attr) { . . . I'm not sure if there's a way to remove the jump table optimization for the arch/x86/coco/core.c file when retpolines aren't configured. Thanks, Tom > > Thanks!
On Tue, Jun 14, 2022, Tom Lendacky wrote: > On 6/14/22 11:13, Sean Christopherson wrote: > > > > > This breaks SME on Rome and Milan when compiling with clang-13. I haven't been > > > > > able to figure out exactly what goes wrong. printk isn't functional at this point, > > > > > and interactive debug during boot on our test systems is beyond me. I can't even > > > > > verify that the bug is specific to clang because the draconian build system for our > > > > > test systems apparently is stuck pointing at gcc-4.9. > > > > > > > > > > I suspect the issue is related to relocation and/or encrypting memory, as skipping > > > > > the call to early_snp_set_memory_shared() if SNP isn't active masks the issue. > > > > > I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious > > > > > use of absolute addresses. > > > > > > > > > > Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the > > > > > moment because INIT_EX is also broken. > > > > > > > > The SEV INIT_EX was a PEBKAC issue. An SEV guest boots just fine with a clang-built > > > > kernel, so either it's a finnicky relocation issue or something specific to SME. > > > > > > I just built and booted 5.19-rc2 with clang-13 and SME enabled without issue: > > > > > > [ 4.118226] Memory Encryption Features active: AMD SME > > > > Phooey. > > > > > Maybe something with your kernel config? Can you send me your config? > > > > Attached. If you can't repro, I'll find someone on our end to work on this. > > I was able to repro. It dies in the cc_platform_has() code, where it is > trying to do an indirect jump based on the attribute (actually in the > amd_cc_platform_has() which I think has been optimized in): > > bool cc_platform_has(enum cc_attr attr) ... > ffffffff81002160: ff 24 c5 c0 01 00 82 jmp *-0x7dfffe40(,%rax,8) > > This last line is what causes the reset. I'm guessing that the jump isn't > valid at this point because we are running in identity mapped mode and not > with a kernel virtual address at this point. > > Trying to see what the difference was between your config and mine, the > indirect jump lead me to check the setting of CONFIG_RETPOLINE. Your config > did not have it enabled, so I set CONFIG_RETPOLINE=y, and with that, the > kernel boots successfully. That would explain why my VMs didn't fail, I build those kernels with CONFIG_RETPOLINE=y. > With retpolines, the code is completely different around here: ... > I'm not sure if there's a way to remove the jump table optimization for > the arch/x86/coco/core.c file when retpolines aren't configured. And for post-boot I don't think we'd want to disable any such optimizations. A possibled "fix" would be to do what sme_encrypt_kernel() does and just query sev_status directly. But even that works, the fragility of the boot code is terrifying :-( I can't think of any clever solutions though. Many thanks again Tom! --- arch/x86/include/asm/sev.h | 4 ++++ arch/x86/kernel/head64.c | 10 +++++++--- arch/x86/kernel/sev.c | 16 +++++++++++----- 3 files changed, 22 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h index 19514524f0f8..701c561fdf08 100644 --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -193,6 +193,8 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) void setup_ghcb(void); void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages); +void __init __early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, + unsigned int npages); void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages); void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op); @@ -214,6 +216,8 @@ static inline void setup_ghcb(void) { } static inline void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { } static inline void __init +__early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { } +static inline void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { } static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { } static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { } diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index bd4a34100ed0..5efab0d8e49d 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -127,7 +127,9 @@ static bool __head check_la57_support(unsigned long physaddr) } #endif -static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd) +static unsigned long __head sme_postprocess_startup(struct boot_params *bp, + pmdval_t *pmd, + unsigned long physaddr) { unsigned long vaddr, vaddr_end; int i; @@ -156,7 +158,9 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv * address but the kernel is currently running off of the identity * mapping so use __pa() to get a *currently* valid virtual address. */ - early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD); + if (sev_status & MSR_AMD64_SEV_SNP_ENABLED_BIT) + __early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), + PTRS_PER_PMD); i = pmd_index(vaddr); pmd[i] -= sme_get_me_mask(); @@ -316,7 +320,7 @@ unsigned long __head __startup_64(unsigned long physaddr, */ *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask(); - return sme_postprocess_startup(bp, pmd); + return sme_postprocess_startup(bp, pmd, physaddr); } /* Wipe all early page tables except for the kernel symbol map */ diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index c05f0124c410..48966ecc520e 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -714,12 +714,9 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd pvalidate_pages(vaddr, npages, true); } -void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, - unsigned int npages) +void __init __early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, + unsigned int npages) { - if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) - return; - /* Invalidate the memory pages before they are marked shared in the RMP table. */ pvalidate_pages(vaddr, npages, false); @@ -727,6 +724,15 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr early_set_pages_state(paddr, npages, SNP_PAGE_STATE_SHARED); } +void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, + unsigned int npages) +{ + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) + return; + + __early_snp_set_memory_shared(vaddr, paddr, npages); +} + void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { unsigned long vaddr, npages; base-commit: b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 --
On 6/14/22 14:52, Sean Christopherson wrote: > On Tue, Jun 14, 2022, Tom Lendacky wrote: >> On 6/14/22 11:13, Sean Christopherson wrote: >>>>>> This breaks SME on Rome and Milan when compiling with clang-13. I haven't been >>>>>> able to figure out exactly what goes wrong. printk isn't functional at this point, >>>>>> and interactive debug during boot on our test systems is beyond me. I can't even >>>>>> verify that the bug is specific to clang because the draconian build system for our >>>>>> test systems apparently is stuck pointing at gcc-4.9. >>>>>> >>>>>> I suspect the issue is related to relocation and/or encrypting memory, as skipping >>>>>> the call to early_snp_set_memory_shared() if SNP isn't active masks the issue. >>>>>> I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious >>>>>> use of absolute addresses. >>>>>> >>>>>> Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the >>>>>> moment because INIT_EX is also broken. >>>>> > >> I'm not sure if there's a way to remove the jump table optimization for >> the arch/x86/coco/core.c file when retpolines aren't configured. > > And for post-boot I don't think we'd want to disable any such optimizations. > > A possibled "fix" would be to do what sme_encrypt_kernel() does and just query > sev_status directly. But even that works, the fragility of the boot code is > terrifying :-( I can't think of any clever solutions though. I worry that another use of cc_platform_has() could creep in at some point and cause the same issue. Not sure how bad it would be, performance-wise, to remove the jump table optimization for arch/x86/coco/core.c. I guess we can wait for Boris to get back and chime in. > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c > index bd4a34100ed0..5efab0d8e49d 100644 > --- a/arch/x86/kernel/head64.c > +++ b/arch/x86/kernel/head64.c > @@ -127,7 +127,9 @@ static bool __head check_la57_support(unsigned long physaddr) > } > #endif > > -static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd) > +static unsigned long __head sme_postprocess_startup(struct boot_params *bp, > + pmdval_t *pmd, > + unsigned long physaddr) I noticed that you added the physaddr parameter but never use it... Thanks, Tom > { > unsigned long vaddr, vaddr_end; > int i; > @@ -156,7 +158,9 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv > * address but the kernel is currently running off of the identity > * mapping so use __pa() to get a *currently* valid virtual address. > */ > - early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD); > + if (sev_status & MSR_AMD64_SEV_SNP_ENABLED_BIT) > + __early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), > + PTRS_PER_PMD); > > i = pmd_index(vaddr); > pmd[i] -= sme_get_me_mask(); > @@ -316,7 +320,7 @@ unsigned long __head __startup_64(unsigned long physaddr, > */ > *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask(); > > - return sme_postprocess_startup(bp, pmd); > + return sme_postprocess_startup(bp, pmd, physaddr); > } > > /* Wipe all early page tables except for the kernel symbol map */
On Thu, Jun 16, 2022, Tom Lendacky wrote: > On 6/14/22 14:52, Sean Christopherson wrote: > > On Tue, Jun 14, 2022, Tom Lendacky wrote: > > > On 6/14/22 11:13, Sean Christopherson wrote: > > > > > > > This breaks SME on Rome and Milan when compiling with clang-13. I haven't been > > > > > > > able to figure out exactly what goes wrong. printk isn't functional at this point, > > > > > > > and interactive debug during boot on our test systems is beyond me. I can't even > > > > > > > verify that the bug is specific to clang because the draconian build system for our > > > > > > > test systems apparently is stuck pointing at gcc-4.9. > > > > > > > > > > > > > > I suspect the issue is related to relocation and/or encrypting memory, as skipping > > > > > > > the call to early_snp_set_memory_shared() if SNP isn't active masks the issue. > > > > > > > I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious > > > > > > > use of absolute addresses. > > > > > > > > > > > > > > Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the > > > > > > > moment because INIT_EX is also broken. > > > > > > > > > > > > I'm not sure if there's a way to remove the jump table optimization for > > > the arch/x86/coco/core.c file when retpolines aren't configured. > > > > And for post-boot I don't think we'd want to disable any such optimizations. > > > > A possibled "fix" would be to do what sme_encrypt_kernel() does and just query > > sev_status directly. But even that works, the fragility of the boot code is > > terrifying :-( I can't think of any clever solutions though. > > I worry that another use of cc_platform_has() could creep in at some point > and cause the same issue. Not sure how bad it would be, performance-wise, to > remove the jump table optimization for arch/x86/coco/core.c. One thought would be to initialize "vendor" to a bogus value, disallow calls to cc_set_vendor() until after the kernel as gotten to a safe point, and then WARN (or panic?) if cc_platform_has() is called before "vendor" is explicitly set. New calls can still get in, but they'll be much easier to detect and less likely to escape initial testing. diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c index 49b44f881484..803220cd34a6 100644 --- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -13,7 +13,11 @@ #include <asm/coco.h> #include <asm/processor.h> -static enum cc_vendor vendor __ro_after_init; +/* + * Initialize the vendor to garbage to detect usage of cc_platform_has() before + * the vendor has been set. + */ +static enum cc_vendor vendor = CC_NR_VENDORS __ro_after_init; static u64 cc_mask __ro_after_init; static bool intel_cc_platform_has(enum cc_attr attr) @@ -90,7 +94,10 @@ bool cc_platform_has(enum cc_attr attr) return intel_cc_platform_has(attr); case CC_VENDOR_HYPERV: return hyperv_cc_platform_has(attr); + case CC_VENDOR_NONE: + return false; default: + WARN_ONCE(1, "blah blah blah"); return false; } } diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h index 3d98c3a60d34..adfd2fbce7ac 100644 --- a/arch/x86/include/asm/coco.h +++ b/arch/x86/include/asm/coco.h @@ -9,6 +9,7 @@ enum cc_vendor { CC_VENDOR_AMD, CC_VENDOR_HYPERV, CC_VENDOR_INTEL, + CC_NR_VENDORS, }; void cc_set_vendor(enum cc_vendor v); > I guess we can wait for Boris to get back and chime in. > > > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c > > index bd4a34100ed0..5efab0d8e49d 100644 > > --- a/arch/x86/kernel/head64.c > > +++ b/arch/x86/kernel/head64.c > > @@ -127,7 +127,9 @@ static bool __head check_la57_support(unsigned long physaddr) > > } > > #endif > > > > -static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd) > > +static unsigned long __head sme_postprocess_startup(struct boot_params *bp, > > + pmdval_t *pmd, > > + unsigned long physaddr) > > I noticed that you added the physaddr parameter but never use it... Likely just garbage on my end, I was trying various ideas.
On Thu, Jun 16, 2022 at 04:41:05PM +0000, Sean Christopherson wrote: > > I worry that another use of cc_platform_has() could creep in at some point > > and cause the same issue. Not sure how bad it would be, performance-wise, to > > remove the jump table optimization for arch/x86/coco/core.c. Is there a gcc switch for that? > One thought would be to initialize "vendor" to a bogus value, disallow calls to > cc_set_vendor() until after the kernel as gotten to a safe point, and then WARN > (or panic?) if cc_platform_has() is called before "vendor" is explicitly set. > New calls can still get in, but they'll be much easier to detect and less likely > to escape initial testing. The invalid vendor thing makes sense but I don't think it'll help in this case. We set vendor in sme_enable() which comes before the __startup_64 -> sme_postprocess_startup path you're hitting. We could do only the aspect of checking whether it hasn't been set yet and warn then, in order to make the usage more robust...
On Fri, Jul 01, 2022, Borislav Petkov wrote: > On Thu, Jun 16, 2022 at 04:41:05PM +0000, Sean Christopherson wrote: > > > I worry that another use of cc_platform_has() could creep in at some point > > > and cause the same issue. Not sure how bad it would be, performance-wise, to > > > remove the jump table optimization for arch/x86/coco/core.c. > > Is there a gcc switch for that? I believe -fno-jump-tables will do the trick. That also reminds me exactly why CONFIG_RETPOLINE=y isn't broken, jump tables are disabled when retpolines are enabled[*]. [*] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952 > > One thought would be to initialize "vendor" to a bogus value, disallow calls to > > cc_set_vendor() until after the kernel as gotten to a safe point, and then WARN > > (or panic?) if cc_platform_has() is called before "vendor" is explicitly set. > > New calls can still get in, but they'll be much easier to detect and less likely > > to escape initial testing. > > The invalid vendor thing makes sense but I don't think it'll help in > this case. > > We set vendor in sme_enable() which comes before the > > __startup_64 -> sme_postprocess_startup > > path you're hitting. Right, but that's easily solved, no? E.g. diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c index e8f7953fda83..ed3118f5bf62 100644 --- a/arch/x86/mm/mem_encrypt_amd.c +++ b/arch/x86/mm/mem_encrypt_amd.c @@ -487,6 +487,8 @@ void __init sme_early_init(void) if (!sme_me_mask) return; + cc_set_vendor(CC_VENDOR_AMD); + early_pmd_flags = __sme_set(early_pmd_flags); __supported_pte_mask = __sme_set(__supported_pte_mask); diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c index f415498d3175..6b1c60032400 100644 --- a/arch/x86/mm/mem_encrypt_identity.c +++ b/arch/x86/mm/mem_encrypt_identity.c @@ -611,7 +611,6 @@ void __init sme_enable(struct boot_params *bp) out: if (sme_me_mask) { physical_mask &= ~sme_me_mask; - cc_set_vendor(CC_VENDOR_AMD); cc_set_mask(sme_me_mask); } } And disallow cc_set_vendor() before x86_64_start_kernel(), then fix any fallout. > We could do only the aspect of checking whether it hasn't been set yet > and warn then, in order to make the usage more robust... > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 83514b9827e6..656d2f3e2cf0 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv if (sme_get_me_mask()) { vaddr = (unsigned long)__start_bss_decrypted; vaddr_end = (unsigned long)__end_bss_decrypted; + for (; vaddr < vaddr_end; vaddr += PMD_SIZE) { + /* + * On SNP, transition the page to shared in the RMP table so that + * it is consistent with the page table attribute change. + * + * __start_bss_decrypted has a virtual address in the high range + * mapping (kernel .text). PVALIDATE, by way of + * early_snp_set_memory_shared(), requires a valid virtual + * address but the kernel is currently running off of the identity + * mapping so use __pa() to get a *currently* valid virtual address. + */ + early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD); + i = pmd_index(vaddr); pmd[i] -= sme_get_me_mask(); }
The encryption attribute for the .bss..decrypted section is cleared in the initial page table build. This is because the section contains the data that need to be shared between the guest and the hypervisor. When SEV-SNP is active, just clearing the encryption attribute in the page table is not enough. The page state need to be updated in the RMP table. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> --- arch/x86/kernel/head64.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)