Message ID | 20211008180453.462291-9-brijesh.singh@amd.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Add AMD Secure Nested Paging (SEV-SNP) Guest Support | expand |
On Fri, Oct 08, 2021 at 01:04:19PM -0500, Brijesh Singh wrote: > From: Michael Roth <michael.roth@amd.com> > > Generally access to MSR_AMD64_SEV is only safe if the 0x8000001F CPUID > leaf indicates SEV support. With SEV-SNP, CPUID responses from the > hypervisor are not considered trustworthy, particularly for 0x8000001F. > SEV-SNP provides a firmware-validated CPUID table to use as an > alternative, but prior to checking MSR_AMD64_SEV there are no > guarantees that this is even an SEV-SNP guest. > > Rather than relying on these CPUID values early on, allow SEV-ES and > SEV-SNP guests to instead use a cpuid instruction to trigger a #VC and > have it cache MSR_AMD64_SEV in sev_status, since it is known to be safe > to access MSR_AMD64_SEV if a #VC has triggered. > > Signed-off-by: Michael Roth <michael.roth@amd.com> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > --- > arch/x86/kernel/sev-shared.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c > index 8ee27d07c1cd..2796c524d174 100644 > --- a/arch/x86/kernel/sev-shared.c > +++ b/arch/x86/kernel/sev-shared.c > @@ -191,6 +191,20 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code) > if (exit_code != SVM_EXIT_CPUID) > goto fail; > > + /* > + * A #VC implies that either SEV-ES or SEV-SNP are enabled, so the SEV > + * MSR is also available. Go ahead and initialize sev_status here to > + * allow SEV features to be checked without relying solely on the SEV > + * cpuid bit to indicate whether it is safe to do so. > + */ > + if (!sev_status) { > + unsigned long lo, hi; > + > + asm volatile("rdmsr" : "=a" (lo), "=d" (hi) > + : "c" (MSR_AMD64_SEV)); > + sev_status = (hi << 32) | lo; > + } > + > sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX)); > VMGEXIT(); > val = sev_es_rd_ghcb_msr(); > -- Ok, you guys are killing me. ;-\ How is bolting some pretty much unrelated code into the early #VC handler not a hack? Do you not see it? So sme_enable() is reading MSR_AMD64_SEV and setting up everything there, including sev_status. If a SNP guest does not trust CPUID, why can't you attempt to read that MSR there, even if CPUID has lied to the guest? And not just slap it somewhere just because it works?
On Mon, Oct 18, 2021 at 04:29:07PM +0200, Borislav Petkov wrote: > On Fri, Oct 08, 2021 at 01:04:19PM -0500, Brijesh Singh wrote: > > From: Michael Roth <michael.roth@amd.com> > > > > Generally access to MSR_AMD64_SEV is only safe if the 0x8000001F CPUID > > leaf indicates SEV support. With SEV-SNP, CPUID responses from the > > hypervisor are not considered trustworthy, particularly for 0x8000001F. > > SEV-SNP provides a firmware-validated CPUID table to use as an > > alternative, but prior to checking MSR_AMD64_SEV there are no > > guarantees that this is even an SEV-SNP guest. > > > > Rather than relying on these CPUID values early on, allow SEV-ES and > > SEV-SNP guests to instead use a cpuid instruction to trigger a #VC and > > have it cache MSR_AMD64_SEV in sev_status, since it is known to be safe > > to access MSR_AMD64_SEV if a #VC has triggered. > > > > Signed-off-by: Michael Roth <michael.roth@amd.com> > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > > --- > > arch/x86/kernel/sev-shared.c | 14 ++++++++++++++ > > 1 file changed, 14 insertions(+) > > > > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c > > index 8ee27d07c1cd..2796c524d174 100644 > > --- a/arch/x86/kernel/sev-shared.c > > +++ b/arch/x86/kernel/sev-shared.c > > @@ -191,6 +191,20 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code) > > if (exit_code != SVM_EXIT_CPUID) > > goto fail; > > > > + /* > > + * A #VC implies that either SEV-ES or SEV-SNP are enabled, so the SEV > > + * MSR is also available. Go ahead and initialize sev_status here to > > + * allow SEV features to be checked without relying solely on the SEV > > + * cpuid bit to indicate whether it is safe to do so. > > + */ > > + if (!sev_status) { > > + unsigned long lo, hi; > > + > > + asm volatile("rdmsr" : "=a" (lo), "=d" (hi) > > + : "c" (MSR_AMD64_SEV)); > > + sev_status = (hi << 32) | lo; > > + } > > + > > sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX)); > > VMGEXIT(); > > val = sev_es_rd_ghcb_msr(); > > -- > > Ok, you guys are killing me. ;-\ > > How is bolting some pretty much unrelated code into the early #VC > handler not a hack? Do you not see it? This was the result of my proposal in v5: > More specifically, the general protocol to determine SNP is enabled > seems > to be: > > 1) check cpuid 0x8000001f to determine if SEV bit is enabled and SEV > MSR is available > 2) check the SEV MSR to see if SEV-SNP bit is set > > but the conundrum here is the CPUID page is only valid if SNP is > enabled, otherwise it can be garbage. So the code to set up the page > skips those checks initially, and relies on the expectation that UEFI, > or whatever the initial guest blob was, will only provide a CC_BLOB if > it already determined SNP is enabled. > > It's still possible something goes awry and the kernel gets handed a > bogus CC_BLOB even though SNP isn't actually enabled. In this case the > cpuid values could be bogus as well, but the guest will fail > attestation then and no secrets should be exposed. > > There is one thing that could tighten up the check a bit though. Some > bits of SEV-ES code will use the generation of a #VC as an indicator > of SEV-ES support, which implies SEV MSR is available without relying > on hypervisor-provided CPUID bits. I could add a one-time check in > the cpuid #VC to check SEV MSR for SNP bit, but it would likely > involve another static __ro_after_init variable store state. If that > seems worthwhile I can look into that more as well. Yes, the skipping of checks above sounds weird: why don't you simply keep the checks order: SEV, -ES, -SNP and then parse CPUID. It'll fail at attestation eventually, but you'll have the usual flow like with the rest of the SEV- feature picking apart. https://lore.kernel.org/lkml/YS3+saDefHwkYwny@zn.tnic/ I'd thought you didn't like the previous approach of having snp_cpuid_init() defer the CPUID/MSR checks until sme_enable() sets up sev_status later on, then failing the boot retroactively if SNP bit isn't set but CPUID table was advertised. So I added those checks in snp_cpuid_init(), along with the additional #VC-based indicator of SEV-ES/SEV-SNP support as an additional sanity check of what EFI firmware was providing, since I thought that was the key concern here. Now I'm realizing that perhaps your suggestion was to actually defer the entire CPUID page setup until after sme_enable(). Is that correct? > > So sme_enable() is reading MSR_AMD64_SEV and setting up everything > there, including sev_status. If a SNP guest does not trust CPUID, why > can't you attempt to read that MSR there, even if CPUID has lied to the > guest? If CPUID has lied, that would result in a #GP, rather than a controlled termination in the various checkers/callers. The latter is easier to debug. Additionally, #VC is arguably a better indicator of SEV MSR availability for SEV-ES/SEV-SNP guests, since it is only generated by ES/SNP hardware and doesn't rely directly on hypervisor/EFI-provided CPUID values. It doesn't work for SEV guests, but I don't think it's a bad idea to allow SEV-ES/SEV-SNP guests to initialize sev_status in #VC handler to make use of the added assurance. Is it just the way it's currently implemented as something cpuid-table-specific that's at issue, or are you opposed to doing so in general? Thanks, Mike > > And not just slap it somewhere just because it works? > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7Cmichael.roth%40amd.com%7C462c7481ae414f7706a808d99243a615%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637701641625364120%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=6MViA5KCFEgSA2fijEx3Dg05btIEAjw55bFYRKL0P6o%3D&reserved=0
On Mon, Oct 18, 2021 at 01:40:03PM -0500, Michael Roth wrote: > If CPUID has lied, that would result in a #GP, rather than a controlled > termination in the various checkers/callers. The latter is easier to > debug. > > Additionally, #VC is arguably a better indicator of SEV MSR availability > for SEV-ES/SEV-SNP guests, since it is only generated by ES/SNP hardware > and doesn't rely directly on hypervisor/EFI-provided CPUID values. It > doesn't work for SEV guests, but I don't think it's a bad idea to allow > SEV-ES/SEV-SNP guests to initialize sev_status in #VC handler to make > use of the added assurance. Ok, let's take a step back and analyze what we're trying to solve first. So I'm looking at sme_enable(): 1. Code checks SME/SEV support leaf. HV lies and says there's none. So guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor won't be able to give confidentiality to its users => users go away or do unencrypted like now. Problem is solved by political and economical pressure. 2. Check SEV and SME bit. HV lies here. Oh well, same as the above. 3. HV lies about 1. and 2. but says that SME/SEV is supported. Guest attempts to read the MSR Guest explodes due to the #GP. The same political/economical pressure thing happens. If the MSR is really there, we've landed at the place where we read the SEV MSR. Moment of truth - SEV/SNP guests have a communication protocol which is independent from the HV and all good. Now, which case am I missing here which justifies the need to do those acrobatics of causing #VCs just to detect the SEV MSR? Thx.
On Mon, Oct 18, 2021 at 09:18:13PM +0200, Borislav Petkov wrote: > On Mon, Oct 18, 2021 at 01:40:03PM -0500, Michael Roth wrote: > > If CPUID has lied, that would result in a #GP, rather than a controlled > > termination in the various checkers/callers. The latter is easier to > > debug. > > > > Additionally, #VC is arguably a better indicator of SEV MSR availability > > for SEV-ES/SEV-SNP guests, since it is only generated by ES/SNP hardware > > and doesn't rely directly on hypervisor/EFI-provided CPUID values. It > > doesn't work for SEV guests, but I don't think it's a bad idea to allow > > SEV-ES/SEV-SNP guests to initialize sev_status in #VC handler to make > > use of the added assurance. > [Sorry for the wall of text, just trying to work through everything.] > Ok, let's take a step back and analyze what we're trying to solve first. > So I'm looking at sme_enable(): I'm not sure if this is pertaining to using the CPUID table prior to sme_enable(), or just the #VC-based SEV MSR read. The following comments assume the former. If that assumption is wrong you can basically ignore the rest of this email :) [The #VC-based SEV MSR read is not necessary for anything in sme_enable(), it's simply a way to determine whether the guest is an SNP guest, without any reliance on CPUID, which seemed useful in the context of doing some additional sanity checks against the SNP CPUID table and determining that it's appropriate to use it early on (rather than just trust that this is an SNP guest by virtue of the CC blob being present, and then failing later once sme_enable() checks for the SNP feature bits through the normal mechanism, as was done in v5).] > > 1. Code checks SME/SEV support leaf. HV lies and says there's none. So > guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor > won't be able to give confidentiality to its users => users go away or > do unencrypted like now. > > Problem is solved by political and economical pressure. > > 2. Check SEV and SME bit. HV lies here. Oh well, same as the above. I'd be worried about the possibility that, through some additional exploits or failures in the attestation flow, a guest owner was tricked into booting unencrypted on a compromised host and exposing their secrets. Their attestation process might even do some additional CPUID sanity checks, which would at the point be via the SNP CPUID table and look legitimate, unaware that the kernel didn't actually use the SNP CPUID table until after 0x8000001F was parsed (if we were to only initialize it after/as-part-of sme_enable()). Fortunately in this scenario I think the guest kernel actually would fail to boot due to the SNP hardware unconditionally treating code/page tables as encrypted pages. I tested some of these scenarios just to check, but not all, and I still don't feel confident enough about it to say that there's not some way to exploit this by someone who is more clever/persistant than me. > > 3. HV lies about 1. and 2. but says that SME/SEV is supported. > > Guest attempts to read the MSR Guest explodes due to the #GP. The same > political/economical pressure thing happens. That's seems likely, but maybe some future hardware bug, or some other exploit, makes it possible to intercept that MSR read? I don't know, but if that particular branch of execution can be made less likely by utilizing SNP CPUID validation I think it makes sense to make use of it. > > If the MSR is really there, we've landed at the place where we read the > SEV MSR. Moment of truth - SEV/SNP guests have a communication protocol > which is independent from the HV and all good. At which point we then switch to using the CPUID table? But at that point all the previous CPUID checks, both SEV-related/non-SEV-related, are now possibly not consistent with what's in the CPUID table. Do we then revalidate? Even a non-malicious hypervisor might provide inconsistent values between the two sources due to bugs, or SNP validation suppressing certain feature bits that hypervisor otherwise exposes, etc. Now all the code after sme_enable() can potentially take unexpected execution paths, where post-sme_enable() code makes assumptions about pre-sme_enable() checks that may no longer hold true. Also, it would be useful from an attestation perspective that the CPUID bits visible to userspace correspond to what the kernel used during boot, which wouldn't necessarily be the case if hypervisor-provided values were used during early boot and potentially put the kernel into some unexpected state that could persist beyond the point of attestation. Code-wise, thanks in large part to your suggestions, it really isn't all that much more complicated to hook in the CPUID table lookup in the #VC handlers (which are already needed anyway for SEV-ES) early on so all these checks are against the same trusted (or more-trusted at least) CPUID source. > > Now, which case am I missing here which justifies the need to do those > acrobatics of causing #VCs just to detect the SEV MSR? There are a few more places where cpuid is utilized prior to sme_enable(): # In boot/compressed paging_prepare(): ecx = cpuid(7, 0) # SNP-verified against host values # check ecx for LA57 # In boot/compressed and kernel proper verify_cpu(): eax, ebx, ecx, edx = cpuid(0, 0) # SNP-verified against host values # check eax for range > 0 # check ebx, ecx, edx for "AuthenticAMD" or "GenuineIntel" if_amd: edx = cpuid(1, 0) # SNP-verified against host values # check edx feature bits against REQUIRED_MASK0 (PAE|FPU|PSE|etc.) eax = cpuid(0x80000001, 0) # SNP-verified against host values # check eax against REQUIRED_MASK1 (LM|3DNOW) edx = cpuid(1, 0) # SNP-verified against host values # check eax against SSE_MASK # if not set, try to force it on via MSR_K7_HWCR if this is an AMD CPU # if forcing fails, report no_longmode available if_intel: # completely different stuff It's possible that various lies about the values checked for in REQUIRED_MASK0/REQUIRED_MASK1, LA57 enablement, etc., can be audited in similar fashion as you've done above to find nothing concerning, but what about 5 years from now? And these are all checks/configuration that can put the kernel in unexpected states that persist beyond the point of attestation, where we really need to care about the possible effects. If SNP CPUID validation isn't utilized until after-the-fact, we'd end up not utilizing it for some of the more 'interesting' CPUID bits. It's also worth noting that TDX guards against most of this through CPUID virtualization, where hardware/microcode provides similar validation for these sorts of CPUID bits in early boot. It's only because the SEV-SNP CPUID 'virtualization' lives in the guest code that we have to deal with the additional complexity of initializing the CPUID table early on. But if both platforms are capable of providing similar assurances then it seems worthwhile to pursue that. > acrobatics of causing #VCs just to detect the SEV MSR? The CPUID calls in snp_cpuid_init() weren't added specifically to induce the #VC-based SEV MSR read, they were added only because I thought the gist of your earlier suggestions were to do more validation against the CPUID table advertised by EFI rather than the v5 approach of deferring them till later after sev_status gets set by sme_enable(), and then failing later if turned out that SNP CPUID feature bit wasn't set by sme_enable(). I thought this was motivated by a desire to be more paranoid about what EFI provides, so once I had the cpuid checks added in snp_cpuid_init() it seemed like a logical step to further sanity-check the SNP CPUID bit using a mechanism that was completely independent of the CPUID table. I'm not dead set on that at all however, it was only added based on me [mis-]interpreting your comments as a desire to be less trusting of EFI as to whether this is an SNP guest or not. But I also don't think it's a good idea to not utilize the CPUID table until after sme_enable(), based on the above reasons. What if we simply add a check in sme_enable() that terminates the guest if the cc_blob/cpuid table is provided, but the CPUID/MSR checks in sme_enable() determine that this isn't an SNP guest? That would be similar to the v5 approach, but in a less roundabout way, and then the cpuid/MSR checks could be dropped from snp_cpuid_init(). If we did decide that it is useful to use #VC-based initialization of sev_status there, it would all be self-contained there (but again, that's a separate thing that I don't have a strong opinion on). Thanks, Mike > > Thx. > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7Cmichael.roth%40amd.com%7Cddb8c27d71794244176308d9926c094d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637701815061371715%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=LUArcQqb%2F0WFlworDbZOClXhgYqEGje364fpBycixUg%3D&reserved=0
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > [Sorry for the wall of text, just trying to work through everything.] And I'm going to respond in a couple of mails just for my own sanity. > I'm not sure if this is pertaining to using the CPUID table prior to > sme_enable(), or just the #VC-based SEV MSR read. The following comments > assume the former. If that assumption is wrong you can basically ignore > the rest of this email :) This is pertaining to me wanting to show you that the design of this SNP support needs to be sane and maintainable and every function needs to make sense not only now but in the future. In this particular example, we should set sev_status *once*, *before* anything accesses it so that it is prepared when something needs it. Not do a #VC and go, "oh, btw, is sev_status set? No? Ok, lemme set it." which basically means our design is seriously lacking. And I had suggested a similar thing for TDX and tglx was 100% right in shooting it down because we do properly designed things - not, get stuff in so that vendor is happy and then, once the vendor programmers have disappeared to do their next enablement task, the maintainers get to mop up and maintain it forever. Because this mopping up doesn't scale - trust me. > [The #VC-based SEV MSR read is not necessary for anything in sme_enable(), > it's simply a way to determine whether the guest is an SNP guest, without > any reliance on CPUID, which seemed useful in the context of doing some > additional sanity checks against the SNP CPUID table and determining that > it's appropriate to use it early on (rather than just trust that this is an > SNP guest by virtue of the CC blob being present, and then failing later > once sme_enable() checks for the SNP feature bits through the normal > mechanism, as was done in v5).] So you need to make up your mind here design-wise, what you wanna do. The proper thing to do would be, to detect *everything*, detect whether this is an SNP guest, yadda yadda, everything your code is going to need later on, and then be done with it. Then you continue with the boot and now your other code queries everything that has been detected up til now and uses it. End of mail 1.
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > > 1. Code checks SME/SEV support leaf. HV lies and says there's none. So > > guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor > > won't be able to give confidentiality to its users => users go away or > > do unencrypted like now. > > > > Problem is solved by political and economical pressure. > > > > 2. Check SEV and SME bit. HV lies here. Oh well, same as the above. > > I'd be worried about the possibility that, through some additional exploits > or failures in the attestation flow, Well, that puts forward an important question: how do you verify *reliably* that this is an SNP guest? - attestation? - CPUID? - anything else? I don't see this written down anywhere. Because this assumption will guide the design in the kernel. > a guest owner was tricked into booting unencrypted on a compromised > host and exposing their secrets. Their attestation process might even > do some additional CPUID sanity checks, which would at the point > be via the SNP CPUID table and look legitimate, unaware that the > kernel didn't actually use the SNP CPUID table until after 0x8000001F > was parsed (if we were to only initialize it after/as-part-of > sme_enable()). So what happens with that guest owner later? How is she to notice that she booted unencrypted? > Fortunately in this scenario I think the guest kernel actually would fail to > boot due to the SNP hardware unconditionally treating code/page tables as > encrypted pages. I tested some of these scenarios just to check, but not > all, and I still don't feel confident enough about it to say that there's > not some way to exploit this by someone who is more clever/persistant than > me. All this design needs to be preceded with: "We protect against cases A, B and C and not against D, E, etc." So that it is clear to all parties involved what we're working with and what we're protecting against and what we're *not* protecting against. End of mail 2, more later.
On Wed, Oct 20, 2021 at 08:01:07PM +0200, Borislav Petkov wrote: > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > > [Sorry for the wall of text, just trying to work through everything.] > > And I'm going to respond in a couple of mails just for my own sanity. > > > I'm not sure if this is pertaining to using the CPUID table prior to > > sme_enable(), or just the #VC-based SEV MSR read. The following comments > > assume the former. If that assumption is wrong you can basically ignore > > the rest of this email :) > > This is pertaining to me wanting to show you that the design of this SNP > support needs to be sane and maintainable and every function needs to > make sense not only now but in the future. Absolutely. > > In this particular example, we should set sev_status *once*, *before* > anything accesses it so that it is prepared when something needs it. Not > do a #VC and go, "oh, btw, is sev_status set? No? Ok, lemme set it." > which basically means our design is seriously lacking. Yes, taking a step back there are some things that could probably be improved upon there. currently: - boot kernel initializes sev_status in set_sev_encryption_mask() - run-time kernel initializes sev_status in sme_enable() with this series the following are introduced: - boot kernel initializes sev_status on-demand in sev_snp_enabled() - initially used by snp_cpuid_init_boot(), which happens before set_sev_encryption_mask() - run-time kernel initializes sev_status on-demand via #VC handler - initially used by snp_cpuid_init(), which happens before sme_enable() Fortunately, all the code makes use of sev_status to get at the SEV MSR bits, so breaking the appropriate bits out of sme_enable() into an earlier sev_init() routine that's the exclusive writer of sev_status sounds like a promising approach. It makes sense to do it immediately after the first #VC handler is set up, so CPUID is available, and since that's where SNP CPUID table initialization would need to happen if it's to be made available in #VC handler. It may even be similar enough between boot/compressed and run-time kernel that it could be a shared routine in sev-shared.c. But then again it also sounds like the appropriate place to move the snp_cpuid_init*() calls, and locating the cc_blob, and since there's differences there it might make sense to keep the boot/compressed and kernel proper sev_init() routines separate to avoid #ifdeffery). Not to get ahead of myself though. Just seems like a good starting point for how to consolidate the various users. > > And I had suggested a similar thing for TDX and tglx was 100% right in > shooting it down because we do properly designed things - not, get stuff > in so that vendor is happy and then, once the vendor programmers have > disappeared to do their next enablement task, the maintainers get to mop > up and maintain it forever. > > Because this mopping up doesn't scale - trust me. Got it, and my apologies if I've given you that impression as it's certainly not my intent. (though I'm sure you've heard that before.) > > > [The #VC-based SEV MSR read is not necessary for anything in sme_enable(), > > it's simply a way to determine whether the guest is an SNP guest, without > > any reliance on CPUID, which seemed useful in the context of doing some > > additional sanity checks against the SNP CPUID table and determining that > > it's appropriate to use it early on (rather than just trust that this is an > > SNP guest by virtue of the CC blob being present, and then failing later > > once sme_enable() checks for the SNP feature bits through the normal > > mechanism, as was done in v5).] > > So you need to make up your mind here design-wise, what you wanna do. > > The proper thing to do would be, to detect *everything*, detect whether > this is an SNP guest, yadda yadda, everything your code is going to need > later on, and then be done with it. > > Then you continue with the boot and now your other code queries > everything that has been detected up til now and uses it. Agreed, if we need to check SEV MSR early for the purposes of SNP it makes sense to move the overall SEV feature detection code earlier as well. I should have looked into that aspect more closely before introducing the changes. > > End of mail 1. > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7Cmichael.roth%40amd.com%7C1f46689b40da4a700a6308d993f3993a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637703496776826912%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Kdi9h%2FTzuzoLn64BRsRMLWHkew14BxZHR28QsORsSxs%3D&reserved=0
On Wed, Oct 20, 2021 at 08:08:39PM +0200, Borislav Petkov wrote: > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > > > 1. Code checks SME/SEV support leaf. HV lies and says there's none. So > > > guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor > > > won't be able to give confidentiality to its users => users go away or > > > do unencrypted like now. > > > > > > Problem is solved by political and economical pressure. > > > > > > 2. Check SEV and SME bit. HV lies here. Oh well, same as the above. > > > > I'd be worried about the possibility that, through some additional exploits > > or failures in the attestation flow, > > Well, that puts forward an important question: how do you verify > *reliably* that this is an SNP guest? > > - attestation? > > - CPUID? > > - anything else? > > I don't see this written down anywhere. Because this assumption will > guide the design in the kernel. According to the APM at least, (Rev 3.37, 15.34.10, "SEV_STATUS MSR"), the SEV MSR is the appropriate source for guests to use. This is what is used in the EFI code as well. So that seems to be the right way to make the initial determination. There's a dependency there on the SEV CPUID bit however, since setting the bit to 0 would generally result in a guest skipping the SEV MSR read and assuming 0. So for SNP it would be more reliable to make use of the CPUID table at that point, since it's less-susceptible to manipulation, or do the #VC-based SEV MSR read (or both). > > > a guest owner was tricked into booting unencrypted on a compromised > > host and exposing their secrets. Their attestation process might even > > do some additional CPUID sanity checks, which would at the point > > be via the SNP CPUID table and look legitimate, unaware that the > > kernel didn't actually use the SNP CPUID table until after 0x8000001F > > was parsed (if we were to only initialize it after/as-part-of > > sme_enable()). > > So what happens with that guest owner later? > > How is she to notice that she booted unencrypted? Fully-unencrypted should result in a crash due to the reasons below. But there may exist some carefully crafted outside influences that could goad the guest into, perhaps, not marking certain pages as private. The best that can be done to prevent that is to audit/harden all the code in the boot stack so that it is less susceptible to that kind of outside manipulation (via mechanisms like SEV-ES, SNP page validation, SNP CPUID table, SNP restricted injection, etc.) Then of course that boot stack needs to be part of the attestation process to provide any meaningful assurances about the resulting guest state. Outside of the boot stack the guest owner might take some extra precautions. Perhaps custom some kernel driver to verify encryption/validated status of guest pages, some checks against the CPUID table to verify it contains sane values, but not really worth speculating on that aspect as it will be ultimately dependent on how the cloud vendor decides to handle things after boot. > > > Fortunately in this scenario I think the guest kernel actually would fail to > > boot due to the SNP hardware unconditionally treating code/page tables as > > encrypted pages. I tested some of these scenarios just to check, but not > > all, and I still don't feel confident enough about it to say that there's > > not some way to exploit this by someone who is more clever/persistant than > > me. > > All this design needs to be preceded with: "We protect against cases A, > B and C and not against D, E, etc." > > So that it is clear to all parties involved what we're working with and > what we're protecting against and what we're *not* protecting against. That would indeed be useful. Perhaps as a nice big comment in sme_enable() and/or the proposed sev_init() so that those invariants can be maintained, or updated in sync with future changes. I'll look into that for the next spin and check with Brijesh on the details. > > End of mail 2, more later. > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7Cmichael.roth%40amd.com%7C70ce657823a441516fc808d993f4a402%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637703501243595370%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=nIqCXolZUNWTV6eBfLscXRfQDWJZk5fwBMghKVbIeaw%3D&reserved=0
On Wed, Oct 20, 2021 at 07:35:35PM -0500, Michael Roth wrote: > Fortunately, all the code makes use of sev_status to get at the SEV MSR > bits, so breaking the appropriate bits out of sme_enable() into an earlier > sev_init() routine that's the exclusive writer of sev_status sounds like a > promising approach. Ack. > It makes sense to do it immediately after the first #VC handler is set > up, so CPUID is available, and since that's where SNP CPUID table > initialization would need to happen if it's to be made available in > #VC handler. Right, and you can do all your init/CPUID prep there. > It may even be similar enough between boot/compressed and run-time kernel > that it could be a shared routine in sev-shared.c. Uuh, bonus points! :-) > But then again it also sounds like the appropriate place to move the > snp_cpuid_init*() calls, and locating the cc_blob, and since there's > differences there it might make sense to keep the boot/compressed and > kernel proper sev_init() routines separate to avoid #ifdeffery). > > Not to get ahead of myself though. Just seems like a good starting point > for how to consolidate the various users. I like how you're thinking. :) > Got it, and my apologies if I've given you that impression as it's > certainly not my intent. (though I'm sure you've heard that before.) Nothing to apologize - all good. > Agreed, if we need to check SEV MSR early for the purposes of SNP it makes > sense to move the overall SEV feature detection code earlier as well. I > should have looked into that aspect more closely before introducing the > changes. Thx.
On Wed, Oct 20, 2021 at 09:05:42PM -0500, Michael Roth wrote: > According to the APM at least, (Rev 3.37, 15.34.10, "SEV_STATUS MSR"), the > SEV MSR is the appropriate source for guests to use. This is what is used > in the EFI code as well. So that seems to be the right way to make the > initial determination. Yap. > There's a dependency there on the SEV CPUID bit however, since setting the > bit to 0 would generally result in a guest skipping the SEV MSR read and > assuming 0. So for SNP it would be more reliable to make use of the CPUID > table at that point, since it's less-susceptible to manipulation, or do the > #VC-based SEV MSR read (or both). So the CPUID page is supplied by the firmware, right? Then, you parse it and see that the CPUID bit is 1, then you start using the SEV_STATUS MSR and all good. If there *is* a CPUID page but that bit is 0, then you can safely assume that something is playing tricks on ya so you simply refuse booting. > Fully-unencrypted should result in a crash due to the reasons below. Crash is a good thing in confidential computing. :) > But there may exist some carefully crafted outside influences that could > goad the guest into, perhaps, not marking certain pages as private. The > best that can be done to prevent that is to audit/harden all the code in the > boot stack so that it is less susceptible to that kind of outside > manipulation (via mechanisms like SEV-ES, SNP page validation, SNP CPUID > table, SNP restricted injection, etc.) So to me I wonder why would one use anything *else* but an SNP guest. We all know that those previous technologies were just the stepping stones towards SNP. > Then of course that boot stack needs to be part of the attestation process > to provide any meaningful assurances about the resulting guest state. > > Outside of the boot stack the guest owner might take some extra precautions. > Perhaps custom some kernel driver to verify encryption/validated status of > guest pages, some checks against the CPUID table to verify it contains sane > values, but not really worth speculating on that aspect as it will be > ultimately dependent on how the cloud vendor decides to handle things after > boot. Well, I've always advocated having a best-practices writeup somewhere goes a long way to explain this technology to people and how to get their feet wet. And there you can give hints how such verification could look like in detail... > That would indeed be useful. Perhaps as a nice big comment in sme_enable() > and/or the proposed sev_init() so that those invariants can be maintained, > or updated in sync with future changes. I'll look into that for the next > spin and check with Brijesh on the details. There is Documentation/x86/amd-memory-encryption.rst, for example.
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > At which point we then switch to using the CPUID table? But at that > point all the previous CPUID checks, both SEV-related/non-SEV-related, > are now possibly not consistent with what's in the CPUID table. Do we > then revalidate? Well, that's a tough question. That's basically the same question as, does Linux support heterogeneous cores and can it handle hardware features which get enabled after boot. The perfect example is, late microcode loading which changes CPUID bits and adds new functionality. And the answer to that is, well, hard. You need to decide this on a case-by-case basis. But isn't it that the SNP CPUID page will be parsed early enough anyway so that kernel proper will see only SNP CPUID info and init properly using that? > Even a non-malicious hypervisor might provide inconsistent values > between the two sources due to bugs, or SNP validation suppressing > certain feature bits that hypervisor otherwise exposes, etc. There's also migration, lemme point to a very recent example: https://lore.kernel.org/r/20211021104744.24126-1-jane.malalane@citrix.com which is exactly what you say - a non-malicious HV taking care of its migration pool. So how do you handle that? > Now all the code after sme_enable() can potentially take unexpected > execution paths, where post-sme_enable() code makes assumptions about > pre-sme_enable() checks that may no longer hold true. So as I said above, if you parse SNP CPUID page early enough, you don't have to worry about feature rediscovery. Early enough means, before identify_boot_cpu().
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > The CPUID calls in snp_cpuid_init() weren't added specifically to induce > the #VC-based SEV MSR read, they were added only because I thought the > gist of your earlier suggestions were to do more validation against the > CPUID table advertised by EFI Well, if EFI is providing us with the CPUID table, who verified it? The attestation process? Is it signed with the AMD platform key? Because if we can verify the firmware is ok, then we can trust the CPUID page, right?
* Borislav Petkov (bp@alien8.de) wrote: > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > > At which point we then switch to using the CPUID table? But at that > > point all the previous CPUID checks, both SEV-related/non-SEV-related, > > are now possibly not consistent with what's in the CPUID table. Do we > > then revalidate? > > Well, that's a tough question. That's basically the same question as, > does Linux support heterogeneous cores and can it handle hardware > features which get enabled after boot. The perfect example is, late > microcode loading which changes CPUID bits and adds new functionality. > > And the answer to that is, well, hard. You need to decide this on a > case-by-case basis. I can imagine a malicious hypervisor trying to return different cpuid answers to different threads or even the same thread at different times. > But isn't it that the SNP CPUID page will be parsed early enough anyway > so that kernel proper will see only SNP CPUID info and init properly > using that? > > > Even a non-malicious hypervisor might provide inconsistent values > > between the two sources due to bugs, or SNP validation suppressing > > certain feature bits that hypervisor otherwise exposes, etc. > > There's also migration, lemme point to a very recent example: > > https://lore.kernel.org/r/20211021104744.24126-1-jane.malalane@citrix.com Ewww. > which is exactly what you say - a non-malicious HV taking care of its > migration pool. So how do you handle that? Well, the spec (AMD 56860 SEV spec) says: 'If firmware encounters a CPUID function that is in the standard or extended ranges, then the firmware performs a check to ensure that the provided output would not lead to an insecure guest state' so I take that 'firmware' to be the PSP; that wording doesn't say that it checks that the CPUID is identical, just that it 'would not lead to an insecure guest' - so a hypervisor could hide any 'no longer affected by' flag for all the CPUs in it's migration pool and the firmware shouldn't complain; so it should be OK to pessimise. Dave > > Now all the code after sme_enable() can potentially take unexpected > > execution paths, where post-sme_enable() code makes assumptions about > > pre-sme_enable() checks that may no longer hold true. > > So as I said above, if you parse SNP CPUID page early enough, you don't > have to worry about feature rediscovery. Early enough means, before > identify_boot_cpu(). > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette >
On Thu, Oct 21, 2021 at 04:56:09PM +0100, Dr. David Alan Gilbert wrote: > I can imagine a malicious hypervisor trying to return different cpuid > answers to different threads or even the same thread at different times. Haha, I guess that will fail not because of SEV* but because of the kernel not really being able to handle heterogeneous CPUIDs. > Well, the spec (AMD 56860 SEV spec) says: > > 'If firmware encounters a CPUID function that is in the standard or extended ranges, then the > firmware performs a check to ensure that the provided output would not lead to an insecure guest > state' > > so I take that 'firmware' to be the PSP; that wording doesn't say that > it checks that the CPUID is identical, just that it 'would not lead to > an insecure guest' - so a hypervisor could hide any 'no longer affected > by' flag for all the CPUs in it's migration pool and the firmware > shouldn't complain; so it should be OK to pessimise. AFAIU this, I think this would depend on "[t]he policy used by the firmware to assess CPUID function output can be found in [PPR]." So if the HV sets the "no longer affected by" flag but the firmware deems this set flag as insecure, I'm assuming the firmare will clear it when it returns the CPUID leafs. I guess I need to go find that policy...
* Borislav Petkov (bp@alien8.de) wrote: > On Thu, Oct 21, 2021 at 04:56:09PM +0100, Dr. David Alan Gilbert wrote: > > I can imagine a malicious hypervisor trying to return different cpuid > > answers to different threads or even the same thread at different times. > > Haha, I guess that will fail not because of SEV* but because of the > kernel not really being able to handle heterogeneous CPUIDs. My worry is if it fails cleanly or fails in a way an evil hypervisor can exploit. > > Well, the spec (AMD 56860 SEV spec) says: > > > > 'If firmware encounters a CPUID function that is in the standard or extended ranges, then the > > firmware performs a check to ensure that the provided output would not lead to an insecure guest > > state' > > > > so I take that 'firmware' to be the PSP; that wording doesn't say that > > it checks that the CPUID is identical, just that it 'would not lead to > > an insecure guest' - so a hypervisor could hide any 'no longer affected > > by' flag for all the CPUs in it's migration pool and the firmware > > shouldn't complain; so it should be OK to pessimise. > > AFAIU this, I think this would depend on "[t]he policy used by the > firmware to assess CPUID function output can be found in [PPR]." > > So if the HV sets the "no longer affected by" flag but the firmware > deems this set flag as insecure, I'm assuming the firmare will clear > it when it returns the CPUID leafs. I guess I need to go find that > policy... <digs - ppr_B1_pub_1 55898 rev 0.50 > OK, so that bit is 8...21 Eax ext2eax bit 6 page 1-109 then 2.1.5.3 CPUID policy enforcement shows 8...21 EAX as 'bitmask' 'bits set in the GuestVal must also be set in HostVal. This is often applied to feature fields where each bit indicates support for a feature' So that's right isn't it? Dave > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette >
On Thu, Oct 21, 2021 at 06:12:53PM +0100, Dr. David Alan Gilbert wrote: > OK, so that bit is 8...21 Eax ext2eax bit 6 page 1-109 > > then 2.1.5.3 CPUID policy enforcement shows 8...21 EAX as > 'bitmask' > 'bits set in the GuestVal must also be set in HostVal. > This is often applied to feature fields where each bit indicates > support for a feature' > > So that's right isn't it? Yap, AFAIRC, it would fail the check if: (GuestVal & HostVal) != GuestVal and GuestVal is "the CPUID result value created by the hypervisor that it wants to give to the guest". Let's say it clears bit 6 there. Then HostVal comes in which is "the actual CPUID result value specified in this PPR" and there the guest catches the HV lying its *ss off. :-)
* Borislav Petkov (bp@alien8.de) wrote: > On Thu, Oct 21, 2021 at 06:12:53PM +0100, Dr. David Alan Gilbert wrote: > > OK, so that bit is 8...21 Eax ext2eax bit 6 page 1-109 > > > > then 2.1.5.3 CPUID policy enforcement shows 8...21 EAX as > > 'bitmask' > > 'bits set in the GuestVal must also be set in HostVal. > > This is often applied to feature fields where each bit indicates > > support for a feature' > > > > So that's right isn't it? > > Yap, AFAIRC, it would fail the check if: > > (GuestVal & HostVal) != GuestVal > > and GuestVal is "the CPUID result value created by the hypervisor that > it wants to give to the guest". Let's say it clears bit 6 there. ^^^^^^^ > Then HostVal comes in which is "the actual CPUID result value specified > in this PPR" and there the guest catches the HV lying its *ss off. > > :-) Hang on, I think it's perfectly fine for it to clear that bit - it just gets caught if it *sets* it (i.e. claims to be a chip unaffected by the bug). i.e. if guestval=0 then (GustVal & whatever) == GuestVal fine ? Dave > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette >
On Thu, Oct 21, 2021 at 06:47:50PM +0100, Dr. David Alan Gilbert wrote: > Hang on, I think it's perfectly fine for it to clear that bit - it just > gets caught if it *sets* it (i.e. claims to be a chip unaffected by the > bug). > > i.e. if guestval=0 then (GustVal & whatever) == GuestVal > fine > > ? Bah, ofc. The name of the bit is NullSelectorClearsBase - so when it is clear, we will note we're affected, as that patch does: + /* + * CPUID bit above wasn't set. If this kernel is still running + * as a HV guest, then the HV has decided not to advertize + * that CPUID bit for whatever reason. For example, one + * member of the migration pool might be vulnerable. Which + * means, the bug is present: set the BUG flag and return. + */ + if (cpu_has(c, X86_FEATURE_HYPERVISOR)) { + set_cpu_bug(c, X86_BUG_NULL_SEG); + return; + } I have managed to flip the meaning in my mind. Ok, that makes more sense. Thx.
On Thu, Oct 21, 2021 at 04:51:06PM +0200, Borislav Petkov wrote: > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > > The CPUID calls in snp_cpuid_init() weren't added specifically to induce > > the #VC-based SEV MSR read, they were added only because I thought the > > gist of your earlier suggestions were to do more validation against the > > CPUID table advertised by EFI > > Well, if EFI is providing us with the CPUID table, who verified it? The > attestation process? Is it signed with the AMD platform key? For CPUID table pages, the only thing that's assured/attested to by firmware is that: 1) it is present at the expected guest physical address (that address is generally baked into the EFI firmware, which *is* attested to) 2) its contents have been validated by the PSP against the current host CPUID capabilities as defined by the AMD PPR (Publication #55898), Section 2.1.5.3, "CPUID Policy Enforcement" 3) it is encrypted with the guest key 4) it is in a validated state at launch The actual contents of the CPUID table are *not* attested to, so in theory it can still be manipulated by a malicious hypervisor as part of the initial SNP_LAUNCH_UPDATE firmware commands that provides the initial plain-text encoding of the CPUID table that is provided to the PSP via SNP_LAUNCH_UPDATE. It's also not signed in any way (apparently there were some security reasons for that decision, though I don't know the full details). [A guest owner can still validate their CPUID values against known good ones as part of their attestation flow, but that is not part of the attestation report as reported by SNP firmware. (So long as there is some care taken to ensure the source of the CPUID values visible to userspace/guest attestion process are the same as what was used by the boot stack: i.e. EFI/bootloader/kernel all use the CPUID page at that same initial address, or in cases where a copy is used, that copy is placed in encrypted/private/validated guest memory so it can't be tampered with during boot.] So, while it's more difficult to do, and the scope of influence is reduced, there are still some games that can be played to mess with boot via manipulation of the initial CPUID table values, so long as they are within the constraints set by the CPUID enforcement policy defined in the PPR. Unfortunately, the presence of the SEV/SEV-ES/SEV-SNP bits in 0x8000001F, EAX, are not enforced by PSP. The only thing enforced there is that the hypervisor cannot advertise bits that aren't supported by hardware. So no matter how much the boot stack is trusted, the CPUID table does not inherit that trust, and even values that we *know* should be true should be verified rather than assumed. But I think there are a couple approaches for verifying this is an SNP guest that are robust against this sort of scenario. You've touched on some of them in your other replies, so I'll respond there. > > Because if we can verify the firmware is ok, then we can trust the CPUID > page, right? > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7CMichael.Roth%40amd.com%7C155dd6f54f3e4de017a908d994a236a5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637704246794699901%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=U%2BS%2B%2F8%2BX8zLvPQGWvsOb7o6sKBz1MOZqU%2BVLKHiwugY%3D&reserved=0
On Thu, Oct 21, 2021 at 04:48:16PM +0200, Borislav Petkov wrote: > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > > At which point we then switch to using the CPUID table? But at that > > point all the previous CPUID checks, both SEV-related/non-SEV-related, > > are now possibly not consistent with what's in the CPUID table. Do we > > then revalidate? > > Well, that's a tough question. That's basically the same question as, > does Linux support heterogeneous cores and can it handle hardware > features which get enabled after boot. The perfect example is, late > microcode loading which changes CPUID bits and adds new functionality. > > And the answer to that is, well, hard. You need to decide this on a > case-by-case basis. > > But isn't it that the SNP CPUID page will be parsed early enough anyway > so that kernel proper will see only SNP CPUID info and init properly > using that? At the time I wrote that I thought you were suggesting moving the SNP CPUID table initialization to where sme_enable() is in current upstream, so it seemed worth mentioning, but since the idea was actually to move all the sev_status initialization in sme_enable() earlier in the code to where SNP CPUID table init needs to happen (before first cpuid calls are made), I this scenario is avoided. > > > Even a non-malicious hypervisor might provide inconsistent values > > between the two sources due to bugs, or SNP validation suppressing > > certain feature bits that hypervisor otherwise exposes, etc. > > There's also migration, lemme point to a very recent example: > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fr%2F20211021104744.24126-1-jane.malalane%40citrix.com&data=04%7C01%7Cmichael.roth%40amd.com%7C4aaa998fcd134c8d054608d994a1d1aa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637704245057316093%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=OKO9o3YzKwRkyWPpam2%2Fxn4aRSMKtPEnZjn05g81SP8%3D&reserved=0 > > which is exactly what you say - a non-malicious HV taking care of its > migration pool. So how do you handle that? I concur with David's assessment on that solution being compatible with CPUID enforcement policy. But it's certainly something to consider more generally. Fortunately I think I misspoke earlier, I thought there was a case or 2 where bits were suppressed, rather than causing a validation failure, but looking back through the PPR I doesn't seem like that's actually the case. Which is good, since that would indeed be painful to deal with in the context of migration.
On Thu, Oct 21, 2021 at 04:39:31PM +0200, Borislav Petkov wrote: > On Wed, Oct 20, 2021 at 09:05:42PM -0500, Michael Roth wrote: > > According to the APM at least, (Rev 3.37, 15.34.10, "SEV_STATUS MSR"), the > > SEV MSR is the appropriate source for guests to use. This is what is used > > in the EFI code as well. So that seems to be the right way to make the > > initial determination. > > Yap. > > > There's a dependency there on the SEV CPUID bit however, since setting the > > bit to 0 would generally result in a guest skipping the SEV MSR read and > > assuming 0. So for SNP it would be more reliable to make use of the CPUID > > table at that point, since it's less-susceptible to manipulation, or do the > > #VC-based SEV MSR read (or both). > > So the CPUID page is supplied by the firmware, right? Yes. > > Then, you parse it and see that the CPUID bit is 1, then you start using > the SEV_STATUS MSR and all good. > > If there *is* a CPUID page but that bit is 0, then you can safely assume > that something is playing tricks on ya so you simply refuse booting. I think that's a good way to deal with this. I was going to suggest we could assume the presence of SEV status MSR by virtue of EFI/bootloader/etc having provided a cc_blob, and just read it right away to confirm this is SNP. But with your approach we could basically just set up the table early, based on the presence of the cc_blob, and do all the checks in sme_enable() in the same order as with SEV/SEV-ES, then just have additional sanity checks against the CPUID/MSR response values to ensure the SNP bits are present for the cases where a cpuid table / cc_blob are provided. I'll work on implementing things in this way and see how it goes. > > > Fully-unencrypted should result in a crash due to the reasons below. > > Crash is a good thing in confidential computing. :) > > > But there may exist some carefully crafted outside influences that could > > goad the guest into, perhaps, not marking certain pages as private. The > > best that can be done to prevent that is to audit/harden all the code in the > > boot stack so that it is less susceptible to that kind of outside > > manipulation (via mechanisms like SEV-ES, SNP page validation, SNP CPUID > > table, SNP restricted injection, etc.) > > So to me I wonder why would one use anything *else* but an SNP guest. We > all know that those previous technologies were just the stepping stones > towards SNP. Yah, I think ultimately that's where things are headed. > > > Then of course that boot stack needs to be part of the attestation process > > to provide any meaningful assurances about the resulting guest state. > > > > Outside of the boot stack the guest owner might take some extra precautions. > > Perhaps custom some kernel driver to verify encryption/validated status of > > guest pages, some checks against the CPUID table to verify it contains sane > > values, but not really worth speculating on that aspect as it will be > > ultimately dependent on how the cloud vendor decides to handle things after > > boot. > > Well, I've always advocated having a best-practices writeup somewhere > goes a long way to explain this technology to people and how to get > their feet wet. And there you can give hints how such verification could > look like in detail... Our security team is working on some initial reference designs / tooling for attestation. It'll eventually make it's way to here: https://github.com/AMDESE/sev-guest but it's still mostly an internal effort so nothing there ATM. But hopefully that will fill in some of these gaps. But I agree some an accompanying best practices document to highlight some of these considerations is also something that should be considered, I'll need to check to see if there's anything like that in the works already. > > > That would indeed be useful. Perhaps as a nice big comment in sme_enable() > > and/or the proposed sev_init() so that those invariants can be maintained, > > or updated in sync with future changes. I'll look into that for the next > > spin and check with Brijesh on the details. > > There is Documentation/x86/amd-memory-encryption.rst, for example. Makes sense, will work with Brijesh on this. Thanks! -Mike > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7Cmichael.roth%40amd.com%7C966f1e67d4704901d29a08d994a099e1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637704239855683221%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=730WoYycYnjabC4igLcViZsEokSrcMkJ9oXvZso4ULQ%3D&reserved=0
On Thu, Oct 21, 2021 at 03:41:49PM -0500, Michael Roth wrote: > On Thu, Oct 21, 2021 at 04:51:06PM +0200, Borislav Petkov wrote: > > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > > > The CPUID calls in snp_cpuid_init() weren't added specifically to induce > > > the #VC-based SEV MSR read, they were added only because I thought the > > > gist of your earlier suggestions were to do more validation against the > > > CPUID table advertised by EFI > > > > Well, if EFI is providing us with the CPUID table, who verified it? The > > attestation process? Is it signed with the AMD platform key? > > For CPUID table pages, the only thing that's assured/attested to by firmware > is that: > > 1) it is present at the expected guest physical address (that address > is generally baked into the EFI firmware, which *is* attested to) > 2) its contents have been validated by the PSP against the current host > CPUID capabilities as defined by the AMD PPR (Publication #55898), > Section 2.1.5.3, "CPUID Policy Enforcement" > 3) it is encrypted with the guest key > 4) it is in a validated state at launch > > The actual contents of the CPUID table are *not* attested to, Why? > so in theory it can still be manipulated by a malicious hypervisor as > part of the initial SNP_LAUNCH_UPDATE firmware commands that provides > the initial plain-text encoding of the CPUID table that is provided > to the PSP via SNP_LAUNCH_UPDATE. It's also not signed in any way > (apparently there were some security reasons for that decision, though > I don't know the full details). So this sounds like an unnecessary complication. I'm sure there are reasons to do it this way but my simple thinking would simply want the CPUID page to be read-only and signed so that the guest can trust it unconditionally. > [A guest owner can still validate their CPUID values against known good > ones as part of their attestation flow, but that is not part of the > attestation report as reported by SNP firmware. (So long as there is some > care taken to ensure the source of the CPUID values visible to > userspace/guest attestion process are the same as what was used by the boot > stack: i.e. EFI/bootloader/kernel all use the CPUID page at that same > initial address, or in cases where a copy is used, that copy is placed in > encrypted/private/validated guest memory so it can't be tampered with during > boot.] This sounds like the good practices advice to guest owners would be, "Hey, I just booted your SNP guest but for full trust, you should go and verify the CPUID page's contents." "And if I were you, I wouldn't want to run any verification of CPUID pages' contents on the same guest because it itself hasn't been verified yet." It all sounds weird. > So, while it's more difficult to do, and the scope of influence is reduced, > there are still some games that can be played to mess with boot via > manipulation of the initial CPUID table values, so long as they are within > the constraints set by the CPUID enforcement policy defined in the PPR. > > Unfortunately, the presence of the SEV/SEV-ES/SEV-SNP bits in 0x8000001F, > EAX, are not enforced by PSP. The only thing enforced there is that the > hypervisor cannot advertise bits that aren't supported by hardware. So > no matter how much the boot stack is trusted, the CPUID table does not > inherit that trust, and even values that we *know* should be true should be > verified rather than assumed. > > But I think there are a couple approaches for verifying this is an SNP > guest that are robust against this sort of scenario. You've touched on > some of them in your other replies, so I'll respond there. Yah, I guess the kernel can do good enough verification and then the full thing needs to be done by the guest owner and in *some* userspace - not necessarily on the currently booted, unverified guest - but somewhere, where you have maximal flexibility. IMHO.
On Mon, Oct 25, 2021 at 01:04:10PM +0200, Borislav Petkov wrote: > On Thu, Oct 21, 2021 at 03:41:49PM -0500, Michael Roth wrote: > > On Thu, Oct 21, 2021 at 04:51:06PM +0200, Borislav Petkov wrote: > > > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote: > > > > The CPUID calls in snp_cpuid_init() weren't added specifically to induce > > > > the #VC-based SEV MSR read, they were added only because I thought the > > > > gist of your earlier suggestions were to do more validation against the > > > > CPUID table advertised by EFI > > > > > > Well, if EFI is providing us with the CPUID table, who verified it? The > > > attestation process? Is it signed with the AMD platform key? > > > > For CPUID table pages, the only thing that's assured/attested to by firmware > > is that: > > > > 1) it is present at the expected guest physical address (that address > > is generally baked into the EFI firmware, which *is* attested to) > > 2) its contents have been validated by the PSP against the current host > > CPUID capabilities as defined by the AMD PPR (Publication #55898), > > Section 2.1.5.3, "CPUID Policy Enforcement" > > 3) it is encrypted with the guest key > > 4) it is in a validated state at launch > > > > The actual contents of the CPUID table are *not* attested to, > > Why? As counter-intuitive as it sounds, it actually doesn't buy us if the CPUID table is part of the PSP attestation report, since: - the boot stack is attested to, and if the boot stack isn't careful to use the CPUID table at all times, then attesting CPUID table after boot doesn't provide any assurance that the boot wasn't manipulated by CPUID - given the boot stack must take these precautions, guest-specific attestation code is just as capable of attesting the CPUID table contents/values, since it has the same view of the CPUID values that were used during boot. So leaving it to the guest owner to attest it provides some flexibility to guest owners to implement it as they see fit, whereas making it part of the attestation report means that the guest needs the exact contents of the CPUID page for a particular guest configuration so it can be incorporated into the measurement they are expecting, which would likely require some tooling provided by the cloud vendor, since every different guest configuration, or even changes like the ordering in which entries are placed in the table, would affect measurement, so it's not something that could be easily surmised separately with minimal involvement from a cloud vendor. And even if the cloud vendor provided a simple way to export the table contents for measurement, can you really trust it? If you have to audit individual entries to be sure there's nothing fishy, why not just incorporate those checks into the guest owner's attestation flow and leave the vendor out of it completely? So not including it in the measurement meshes well with the overall SEV-SNP approach of reducing the cloud vendor's involvement in the overall attestation process. > > > so in theory it can still be manipulated by a malicious hypervisor as > > part of the initial SNP_LAUNCH_UPDATE firmware commands that provides > > the initial plain-text encoding of the CPUID table that is provided > > to the PSP via SNP_LAUNCH_UPDATE. It's also not signed in any way > > (apparently there were some security reasons for that decision, though > > I don't know the full details). > > So this sounds like an unnecessary complication. I'm sure there are > reasons to do it this way but my simple thinking would simply want the > CPUID page to be read-only and signed so that the guest can trust it > unconditionally. The thing here is that it's not just a specific CPUID page that's valid for all guests for a particular host. Booting a guest with additional vCPUs changes the contents, different CPU models/flags changes the contents, etc. So it needs to be generated for each specific guest configuration, and can't just be a read-only page. Some sort of signature that indicates the PSP's stamp of approval on a particular CPUID page would be nice, but we do sort of have this in the sense that CPUID page 'address' is part of measurement, and can only contain values that were blessed by the PSP. The problem then becomes ensuring that only that address it used for CPUID lookups, and that it's contents weren't manipulated in a way where it's 'valid' as far as the PSP is concerned, but still not the 'expected' values for a particular guest (which is where the attestation mentioned above would come into play). > > > [A guest owner can still validate their CPUID values against known good > > ones as part of their attestation flow, but that is not part of the > > attestation report as reported by SNP firmware. (So long as there is some > > care taken to ensure the source of the CPUID values visible to > > userspace/guest attestion process are the same as what was used by the boot > > stack: i.e. EFI/bootloader/kernel all use the CPUID page at that same > > initial address, or in cases where a copy is used, that copy is placed in > > encrypted/private/validated guest memory so it can't be tampered with during > > boot.] > > This sounds like the good practices advice to guest owners would be, > "Hey, I just booted your SNP guest but for full trust, you should go and > verify the CPUID page's contents." > > "And if I were you, I wouldn't want to run any verification of CPUID > pages' contents on the same guest because it itself hasn't been verified > yet." > > It all sounds weird. Yes, understandably so. But the only way to avoid that sort of weirdness in general is for *all* guest state to be measured, all pages, all registers, etc. Baking that directly into the SEV-SNP attestation report would be a non-starter for most since computing the measurement for all that state independently would require lots of additional inputs from cloud vendor (who we don't necessarily trust in the first place), and constant updates of measurement values since they would change with every guest configuration change, every different starting TSC offset, different, maybe the order in which vCPUs were onlined, stuff that the kernel prints to log buffers, etc. But, if a guest owner wants to attempt clever ways to account for some/all of that in their attestation flow, they are welcome to try. That's sort of the idea behind SNP attestation vs. SEV. Things like page validation/encryption, cpuid enforcement, etc., reduce some of the variables/possibilities guest owners need to account for during attestation to make the process more secure/tenable, but they don't rule out all possibilities, just as Trusted Boot doesn't necessarily mean you can fully trust your OS state immediately afer boot; there are still outside influences at play, and the boot stack should guard against them wherever possible. > > > So, while it's more difficult to do, and the scope of influence is reduced, > > there are still some games that can be played to mess with boot via > > manipulation of the initial CPUID table values, so long as they are within > > the constraints set by the CPUID enforcement policy defined in the PPR. > > > > Unfortunately, the presence of the SEV/SEV-ES/SEV-SNP bits in 0x8000001F, > > EAX, are not enforced by PSP. The only thing enforced there is that the > > hypervisor cannot advertise bits that aren't supported by hardware. So > > no matter how much the boot stack is trusted, the CPUID table does not > > inherit that trust, and even values that we *know* should be true should be > > verified rather than assumed. > > > > But I think there are a couple approaches for verifying this is an SNP > > guest that are robust against this sort of scenario. You've touched on > > some of them in your other replies, so I'll respond there. > > Yah, I guess the kernel can do good enough verification and then the > full thing needs to be done by the guest owner and in *some* userspace > - not necessarily on the currently booted, unverified guest - but > somewhere, where you have maximal flexibility. Exactly, moving attestation into the guest allows for more of the these unexpected states to be accounted for at whatever level of paranoia a guest owner sees fit, while still allowing firmware to provide some basic assurances via attestation report and various features to reduce common attack vectors during/after boot. > > IMHO. > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7CMichael.Roth%40amd.com%7Cd9f9c20a37ce4a4b0e0608d997a72f6a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637707566641580997%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=OmfuUZfcf3bkC%2FmSsiInQ1vScK5Onu1lHWAUH3o%2FUmE%3D&reserved=0
On Mon, Oct 25, 2021 at 11:35:18AM -0500, Michael Roth wrote: > As counter-intuitive as it sounds, it actually doesn't buy us if the CPUID > table is part of the PSP attestation report, since: Thanks for taking the time to explain in detail - I think I know now what's going on, and David explained some additional stuff to me yesterday. So, to cut to the chase: - yeah, ok, I guess guest owner attestation is what should happen. - as to the boot detection, I think you should do in sme_enable(), in pseudo: bool snp_guest_detected; if (CPUID page address) { read SEV_STATUS; snp_guest_detected = SEV_STATUS & MSR_AMD64_SEV_SNP_ENABLED; } /* old SME/SEV detection path */ read 0x8000_001F_EAX and look at bits SME and SEV, yadda yadda. if (snp_guest_detected && (!SME || !SEV)) /* * HV is lying to me, do something there, dunno what. I guess we can * continue booting unencrypted so that the guest owner knows that * detection has failed and maybe the HV didn't want us to force SNP. * This way, attestation will fail and the user will know why. * Or something like that. */ /* normal feature detection continues. */ How does that sound?
On Wed, Oct 27, 2021 at 01:17:11PM +0200, Borislav Petkov wrote: > On Mon, Oct 25, 2021 at 11:35:18AM -0500, Michael Roth wrote: > > As counter-intuitive as it sounds, it actually doesn't buy us if the CPUID > > table is part of the PSP attestation report, since: > > Thanks for taking the time to explain in detail - I think I know now > what's going on, and David explained some additional stuff to me > yesterday. > > So, to cut to the chase: > > - yeah, ok, I guess guest owner attestation is what should happen. > > - as to the boot detection, I think you should do in sme_enable(), in > pseudo: > > bool snp_guest_detected; > > if (CPUID page address) { > read SEV_STATUS; > > snp_guest_detected = SEV_STATUS & MSR_AMD64_SEV_SNP_ENABLED; > } > > /* old SME/SEV detection path */ > read 0x8000_001F_EAX and look at bits SME and SEV, yadda yadda. > > if (snp_guest_detected && (!SME || !SEV)) > /* > * HV is lying to me, do something there, dunno what. I guess we can > * continue booting unencrypted so that the guest owner knows that > * detection has failed and maybe the HV didn't want us to force SNP. > * This way, attestation will fail and the user will know why. > * Or something like that. > */ > > > /* normal feature detection continues. */ > > How does that sound? That seems promising. I've been testing a similar approach in conjunction with moving sme_enable() to after the initial #VC handler is set up and things seem to work out pretty nicely. boot/compressed is a little less straightforward since the sme_enable() equivalent is set_sev_encryption_mask() which sets sev_status and is written in assembly, whereas the SNP-specific bits we're adding relies on C code that handles stuff like scanning EFI config table are in C, so probably worthwhile to see if everything can be redone in C. But then there's get_sev_encryption_bit(), which needs to be in assembly since it needs to be called from 32-bit entry path as well, but that doesn't actually rely on anything set by set_sev_encryption_mask(), so it seems like it should be okay to split set_sev_encryption_mask() out into a separate C routine. Will work on implementing/testing that approach, but if you or Joerg are aware of any showstoppers there just let me know. Thanks! -Mike > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7Cmichael.roth%40amd.com%7C72940826a93b49882ffa08d9993b5390%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637709302358641670%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2BoUx7zP3RA57CwGG2q5IkUkrYQZiOL9ZoLxvIVTq%2BDY%3D&reserved=0
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c index 8ee27d07c1cd..2796c524d174 100644 --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -191,6 +191,20 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code) if (exit_code != SVM_EXIT_CPUID) goto fail; + /* + * A #VC implies that either SEV-ES or SEV-SNP are enabled, so the SEV + * MSR is also available. Go ahead and initialize sev_status here to + * allow SEV features to be checked without relying solely on the SEV + * cpuid bit to indicate whether it is safe to do so. + */ + if (!sev_status) { + unsigned long lo, hi; + + asm volatile("rdmsr" : "=a" (lo), "=d" (hi) + : "c" (MSR_AMD64_SEV)); + sev_status = (hi << 32) | lo; + } + sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX)); VMGEXIT(); val = sev_es_rd_ghcb_msr();