Message ID | 20240621134041.3170480-5-michael.roth@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | SEV-SNP: Add KVM support for attestation and KVM_EXIT_COCO |
On Fri, Jun 21, 2024, Michael Roth wrote:
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index ecfa25b505e7..2eea9828d9aa 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the
>  primary storage for certain register types. Therefore, the kernel may use the
>  values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
>
> +::
> +
> +    /* KVM_EXIT_COCO */
> +    struct kvm_exit_coco {
> +    #define KVM_EXIT_COCO_REQ_CERTS 0
> +    #define KVM_EXIT_COCO_MAX 1
> +        __u8 nr;
> +        __u8 pad0[7];
> +        union {
> +            struct {
> +                __u64 gfn;
> +                __u32 npages;
> +    #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1
> +    #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31)

Unless I'm mistaken, these error codes are defined by the GHCB, which means the
values matter, i.e. aren't arbitrary KVM-defined values.

I forget exactly what we discussed in PUCK, but for the error codes, I think KVM
should either define its own values that are completely disconnected from any
"hardware" spec, or KVM should very explicitly #define all hardware values and have
the semantics of "ret" be vendor specific. A hybrid approach doesn't really work,
e.g. KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC isn't used anywhere and looks quite odd.

My vote is for vendor specific error codes, because unlike having a common user
exit reason+struct, I don't think arch-neutral error codes will minimize KVM's ABI,
I think it'll do the exact opposite. The only thing we need to require is that
'0' == success.

E.g. I think we can end up with something like:

  static int snp_complete_req_certs(struct kvm_vcpu *vcpu)
  {
      struct vcpu_svm *svm = to_svm(vcpu);
      struct vmcb_control_area *control = &svm->vmcb->control;

      if (vcpu->run->coco.req_certs.ret) {
          if (vcpu->run->coco.req_certs.ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
              vcpu->arch.regs[VCPU_REGS_RBX] = vcpu->run->coco.req_certs.npages;

          ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
                                  SNP_GUEST_ERR(vcpu->run->coco.req_certs.ret, 0));
          return 1;
      }

      return snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
  }

> +                __u32 ret;
> +            } req_certs;
> +        };
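
The control flow of the completion handler proposed above can be tried outside the kernel. What follows is a minimal standalone sketch under stated assumptions, not the kernel code: `struct mock_vcpu`, `struct mock_req_certs`, `complete_req_certs()` and `snp_guest_err()` are hypothetical stand-ins, and the packing of the VMM error into the upper 32 bits of SW_EXITINFO2 is assumed for illustration. Only the two `SNP_GUEST_VMM_ERR_*` values are taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Error values as quoted in this thread. */
#define SNP_GUEST_VMM_ERR_INVALID_LEN 1u
#define SNP_GUEST_VMM_ERR_BUSY        2u

/* Hypothetical stand-in for the req_certs part of struct kvm_exit_coco. */
struct mock_req_certs {
    uint64_t gfn;
    uint32_t npages;
    uint32_t ret;            /* 0 == success, otherwise a vendor error code */
};

/* Hypothetical stand-in for the vCPU state the handler touches. */
struct mock_vcpu {
    struct mock_req_certs req_certs;
    uint64_t rbx;            /* guest RBX */
    uint64_t sw_exit_info_2; /* GHCB SW_EXITINFO2 */
    int guest_req_forwarded; /* set when the request continues to firmware */
};

/* Assumed packing: VMM error in bits 63:32, firmware error in bits 31:0. */
static uint64_t snp_guest_err(uint32_t vmm_err, uint32_t fw_err)
{
    return ((uint64_t)vmm_err << 32) | fw_err;
}

static int complete_req_certs(struct mock_vcpu *vcpu)
{
    uint32_t ret = vcpu->req_certs.ret;

    if (ret) {
        /* Only INVALID_LEN needs guest state: report the required page count. */
        if (ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
            vcpu->rbx = vcpu->req_certs.npages;

        /* Any other non-zero code is passed through to the guest untouched. */
        vcpu->sw_exit_info_2 = snp_guest_err(ret, 0);
        return 1;
    }

    /* Success: the real handler would continue with the guest request. */
    vcpu->guest_req_forwarded = 1;
    return 1;
}
```

In this model the only code the handler has to understand is INVALID_LEN; a value such as BUSY reaches the guest without any special handling.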
On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote:
> On Fri, Jun 21, 2024, Michael Roth wrote:
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index ecfa25b505e7..2eea9828d9aa 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the
> >  primary storage for certain register types. Therefore, the kernel may use the
> >  values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> >
> > +::
> > +
> > +    /* KVM_EXIT_COCO */
> > +    struct kvm_exit_coco {
> > +    #define KVM_EXIT_COCO_REQ_CERTS 0
> > +    #define KVM_EXIT_COCO_MAX 1
> > +        __u8 nr;
> > +        __u8 pad0[7];
> > +        union {
> > +            struct {
> > +                __u64 gfn;
> > +                __u32 npages;
> > +    #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1
> > +    #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31)
>
> Unless I'm mistaken, these error codes are defined by the GHCB, which means the
> values matter, i.e. aren't arbitrary KVM-defined values.

They do happen to coincide with the GHCB-defined values:

  /*
   * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
   * a GENERIC error code such that it won't ever conflict with GHCB-defined
   * errors if any get added in the future.
   */
  #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
  #define SNP_GUEST_VMM_ERR_BUSY          2
  #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)

and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are
defined/documented without any reliance on the GHCB spec and are purely
KVM-defined. I just didn't really see any reason to pick different
numerical values since it seems like purposely obfuscating things for
no real reason. But the code itself doesn't rely on them being the same
as the spec defines, so we are free to define these however we'd like as
far as the KVM API goes.

>
> I forget exactly what we discussed in PUCK, but for the error codes, I think KVM
> should either define its own values that are completely disconnected from any
> "hardware" spec, or KVM should very explicitly #define all hardware values and have

I'd gotten the impression that option 1) is what we were sort of leaning
toward, and that's the approach taken here.

> the semantics of "ret" be vendor specific. A hybrid approach doesn't really work,
> e.g. KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC isn't used anywhere and looks quite odd.

This is a catch-all error for userspace to set if any issues are
encountered that don't map to any other KVM_EXIT_COCO_REQ_CERTS_ERR_*
cases (like INVALID_LEN). It's defined purely for KVM/userspace and not
based on any other spec.

>
> My vote is for vendor specific error codes, because unlike having a common user
> exit reason+struct, I don't think arch-neutral error codes will minimize KVM's ABI,
> I think it'll do the exact opposite. The only thing we need to require is that
> '0' == success.

I think this makes sense if we think of using KVM_EXIT_COCO mainly as an
interface for GHCB/GHCI interactions, but now that we're leveraging
KVM_HC_MAP_GPA_RANGE for page-state change requests, and TDX is planning
to do the same, it doesn't really seem likely that exposing those
definitions to userspace at that level will reduce ABI.

For instance this is purely just a KVM interface to request a
certificate blob from userspace, which is a side-note as far as all the
GHCB-defined definitions KVM needs to deal with regarding handling GHCB
extended/non-extended guest requests. And KVM itself might have its own
requirements on top for what it needs from userspace, and those
requirements might be separate from these vendor specs.

And if we expose things selectively to keep the ABI small, it's a bit
awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs
a way to indicate success/fail/ENOMEM. Which we have with
(assuming 0==success):

  #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN         1
  #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC             (1 << 31)

But the GHCB also defines other values like:

  #define SNP_GUEST_VMM_ERR_BUSY          2

which don't make much sense to handle on the userspace side and doesn't
really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event,
which is a separate/self-contained thing from the general guest request
protocol. So would we expose that as ABI or not? If not then we end up
with this weird splitting of code. And if yes, then we have to sort of
give userspace a way to discover whenever new error codes are added to
the GHCB spec, because KVM needs to understand these values too and
users might be running on older kernels where only the currently-defined
error codes are understood.

E.g. if we started off implementing KVM_EXIT_COCO_REQ_CERTS without a
way to request a larger buffer from the guest, and it wasn't until later
on that SNP_GUEST_VMM_ERR_INVALID_LEN was added, we'd probably need a
capability bit or something to see if KVM supports requesting larger
page sizes from the guest. Otherwise userspace might just set it because
the spec says it's valid, but it won't work as expected because KVM
hasn't implemented that.

I guess technically we could reason about this particular one based on
which GHCB protocol version was set via KVM_SEV_INIT2, but what if
KVM itself was adding that functionality separately from the spec, and
now we got this intermingling of specs.

>
> E.g. I think we can end up with something like:
>
>   static int snp_complete_req_certs(struct kvm_vcpu *vcpu)
>   {
>       struct vcpu_svm *svm = to_svm(vcpu);
>       struct vmcb_control_area *control = &svm->vmcb->control;
>
>       if (vcpu->run->coco.req_certs.ret) {
>           if (vcpu->run->coco.req_certs.ret == SNP_GUEST_VMM_ERR_INVALID_LEN)

I'm not opposed to this approach, but just deciding which of:

  #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
  #define SNP_GUEST_VMM_ERR_BUSY          2
  #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)

should be exposed to userspace based on how we've defined the
KVM_EXIT_COCO_REQ_CERTS already seems like an unnecessary dilemma
versus just defining exactly what's needed and documenting that
in the KVM API.

If we anticipate needing to expose big chunks of GHCB/GHCI to
userspace for other reasons or future extensions of KVM_EXIT_COCO_*
then I definitely see the rationale to avoid duplication. But with
KVM_HC_MAP_GPA_RANGE case covered, I don't see any major reason to
think this will ever end up being the case.

It seems more likely this will just be KVM's handy place to handle "Hey
userspace, I need you to handle some CoCo-related stuff for me" and
it's really KVM that's driving those requirements vs. any particular
spec.

For instance, the certificate-fetching in the first place is only
handled by userspace because that's how KVM community decided to
handle it, not some general spec-driven requirement to handle these
sorts of things in userspace. Similarly for the KVM_HC_MAP_GPA_RANGE
that we originally considered this interface to handle: the fact that
userspace handles those requests is mainly a KVM/gmem design decision.

And like the KVM_HC_MAP_GPA_RANGE case, maybe we find there are cases
where a common KVM-defined event type can handle the requirements of
multiple specs with a common interface API, without exposing any
particular vendor definitions.

So based on that I sort of think giving KVM more flexibility on how it
wants to implement/document specific KVM_EXIT_COCO event types will
ultimately result in cleaner and more manageable ABI.

-Mike

>               vcpu->arch.regs[VCPU_REGS_RBX] = vcpu->run->coco.req_certs.npages;
>
>           ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
>                                   SNP_GUEST_ERR(vcpu->run->coco.req_certs.ret, 0));
>           return 1;
>       }
>
>       return snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
>   }
>
> > +                __u32 ret;
> > +            } req_certs;
> > +        };
>
On Wed, Jun 26, 2024, Michael Roth wrote:
> On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote:
> > On Fri, Jun 21, 2024, Michael Roth wrote:
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index ecfa25b505e7..2eea9828d9aa 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the
> > >  primary storage for certain register types. Therefore, the kernel may use the
> > >  values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> > >
> > > +::
> > > +
> > > +    /* KVM_EXIT_COCO */
> > > +    struct kvm_exit_coco {
> > > +    #define KVM_EXIT_COCO_REQ_CERTS 0
> > > +    #define KVM_EXIT_COCO_MAX 1
> > > +        __u8 nr;
> > > +        __u8 pad0[7];
> > > +        union {
> > > +            struct {
> > > +                __u64 gfn;
> > > +                __u32 npages;
> > > +    #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1
> > > +    #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31)
> >
> > Unless I'm mistaken, these error codes are defined by the GHCB, which means the
> > values matter, i.e. aren't arbitrary KVM-defined values.
>
> They do happen to coincide with the GHCB-defined values:
>
>   /*
>    * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
>    * a GENERIC error code such that it won't ever conflict with GHCB-defined
>    * errors if any get added in the future.
>    */
>   #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
>   #define SNP_GUEST_VMM_ERR_BUSY          2
>   #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)
>
> and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are
> defined/documented without any reliance on the GHCB spec and are purely
> KVM-defined. I just didn't really see any reason to pick different
> numerical values since it seems like purposely obfuscating things for

For SNP. For other vendors, the numbers look bizarre, e.g. why bit 31? And the
fact that it appears to be a mask is even more odd.

> no real reason. But the code itself doesn't rely on them being the same
> as the spec defines, so we are free to define these however we'd like as
> far as the KVM API goes.

> > I forget exactly what we discussed in PUCK, but for the error codes, I think KVM
> > should either define its own values that are completely disconnected from any
> > "hardware" spec, or KVM should very explicitly #define all hardware values and have
>
> I'd gotten the impression that option 1) is what we were sort of leaning
> toward, and that's the approach taken here.

> And if we expose things selectively to keep the ABI small, it's a bit
> awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs
> a way to indicate success/fail/ENOMEM. Which we have with
> (assuming 0==success):
>
>   #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN         1
>   #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC             (1 << 31)
>
> But the GHCB also defines other values like:
>
>   #define SNP_GUEST_VMM_ERR_BUSY          2
>
> which don't make much sense to handle on the userspace side and doesn't

Why not? If userspace is waiting on a cert update for whatever reason, why can't
it signal "busy" to the guest?

> really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event,
> which is a separate/self-contained thing from the general guest request
> protocol. So would we expose that as ABI or not? If not then we end up
> with this weird splitting of code. And if yes, then we have to sort of
> give userspace a way to discover whenever new error codes are added to
> the GHCB spec, because KVM needs to understand these values too and

Not necessarily. So long as KVM doesn't need to manipulate guest state, e.g. to
set RBX (or whatever reg it is) for ERR_INVALID_LEN, then KVM doesn't need to
care/know about the error codes. E.g. userspace could signal VMM_BUSY and KVM
would happily pass that to the guest.

> users might be running on older kernels where only the currently-defined
> error codes are understood.
>
> E.g. if we started off implementing KVM_EXIT_COCO_REQ_CERTS without a
> way to request a larger buffer from the guest, and it wasn't until later
> on that SNP_GUEST_VMM_ERR_INVALID_LEN was added, we'd probably need a
> capability bit or something to see if KVM supports requesting larger

We'd need that regardless, no? Even if some other architecture added an error
code for invalid length, KVM would need to reject that for SNP because KVM couldn't
translate ERR_INVALID_LEN into an SNP error code. And when a KVM comes along
that does support that error code, KVM would need a way to advertise support.

But if KVM simply forwards error codes, then KVM only needs to advertise support
if KVM reacts to the error code.

As mentioned in the previous version, ideally userspace would need to set guest
regs for INVALID_LEN case, but I don't see a sane/reasonable way to do that.

> page sizes from the guest. Otherwise userspace might just set it because
> the spec says it's valid, but it won't work as expected because KVM
> hasn't implemented that.
>
> I guess technically we could reason about this particular one based on
> which GHCB protocol version was set via KVM_SEV_INIT2, but what if
> KVM itself was adding that functionality separately from the spec, and
> now we got this intermingling of specs.

How would KVM do that?

> > E.g. I think we can end up with something like:
> >
> >   static int snp_complete_req_certs(struct kvm_vcpu *vcpu)
> >   {
> >       struct vcpu_svm *svm = to_svm(vcpu);
> >       struct vmcb_control_area *control = &svm->vmcb->control;
> >
> >       if (vcpu->run->coco.req_certs.ret) {
> >           if (vcpu->run->coco.req_certs.ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
>
> I'm not opposed to this approach, but just deciding which of:
>
>   #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
>   #define SNP_GUEST_VMM_ERR_BUSY          2
>   #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)
>
> should be exposed to userspace based on how we've defined the
> KVM_EXIT_COCO_REQ_CERTS already seems like an unnecessary dilemma
> versus just defining exactly what's needed and documenting that
> in the KVM API.

But that's not what your code does. It exposes gunk that isn't necessary
(ERR_GENERIC), and then doesn't enforce anything on the backend because
snp_complete_req_certs() interprets any non-zero "return" value as a "generic"
error. If we actually want to maintain extensibility, then KVM needs to enforce
inputs.

And if we do that, then it doesn't really matter whether KVM defines arbitrary
error codes or reuses the GHCB codes, e.g. it'll either be:

      if (vcpu->run->coco.req_certs.ret) {
          if (vcpu->run->coco.req_certs.ret != KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN)
              return -EINVAL;

          vcpu->arch.regs[VCPU_REGS_RBX] = vcpu->run->coco.req_certs.npages;
          ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
                                  SNP_GUEST_ERR(vcpu->run->coco.req_certs.ret, 0));
          return 1;
      }

versus:

      if (vcpu->run->coco.req_certs.ret) {
          if (vcpu->run->coco.req_certs.ret != SNP_GUEST_VMM_ERR_INVALID_LEN)
              return -EINVAL;

          vcpu->arch.regs[VCPU_REGS_RBX] = vcpu->run->coco.req_certs.npages;
          ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
                                  SNP_GUEST_ERR(vcpu->run->coco.req_certs.ret, 0));
          return 1;
      }

(with variations depending on whether or not KVM allows SNP_GUEST_VMM_ERR_BUSY).
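
The "enforce inputs" point, including the BUSY variation mentioned above, can be illustrated with a small allow-list check. This is a hedged sketch rather than proposed kernel code: `check_req_certs_ret()` and its `allow_busy` parameter are hypothetical names, and the decision to reject every value outside the allow-list with `-EINVAL` mirrors the two snippets above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Error values as quoted in this thread. */
#define SNP_GUEST_VMM_ERR_INVALID_LEN 1u
#define SNP_GUEST_VMM_ERR_BUSY        2u

/*
 * Hypothetical helper: accept only the return codes the interface documents.
 * Returns 0 when `ret` may be acted upon, -EINVAL otherwise.
 */
static int check_req_certs_ret(uint32_t ret, bool allow_busy)
{
    switch (ret) {
    case 0:                             /* success */
    case SNP_GUEST_VMM_ERR_INVALID_LEN: /* userspace needs a larger buffer */
        return 0;
    case SNP_GUEST_VMM_ERR_BUSY:        /* only if the ABI chooses to allow it */
        return allow_busy ? 0 : -EINVAL;
    default:                            /* undocumented values are rejected */
        return -EINVAL;
    }
}
```

Rejecting unknown values up front is what leaves room to assign meaning to them later; a handler that treats every non-zero value as a generic error cannot do that without changing behaviour for existing userspace.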
> If we anticipate needing to expose big chunks of GHCB/GHCI to
> userspace for other reasons or future extensions of KVM_EXIT_COCO_*
> then I definitely see the rationale to avoid duplication. But with
> KVM_HC_MAP_GPA_RANGE case covered, I don't see any major reason to
> think this will ever end up being the case.
>
> It seems more likely this will just be KVM's handy place to handle "Hey
> userspace, I need you to handle some CoCo-related stuff for me" and
> it's really KVM that's driving those requirements vs. any particular
> spec.
>
> For instance, the certificate-fetching in the first place is only
> handled by userspace because that's how KVM community decided to
> handle it, not some general spec-driven requirement to handle these
> sorts of things in userspace. Similarly for the KVM_HC_MAP_GPA_RANGE
> that we originally considered this interface to handle: the fact that
> userspace handles those requests is mainly a KVM/gmem design decision.
>
> And like the KVM_HC_MAP_GPA_RANGE case, maybe we find there are cases
> where a common KVM-defined event type can handle the requirements of
> multiple specs with a common interface API, without exposing any
> particular vendor definitions.
>
> So based on that I sort of think giving KVM more flexibility on how it
> wants to implement/document specific KVM_EXIT_COCO event types will
> ultimately result in cleaner and more manageable ABI.

I don't disagree, I'm just not seeing how regurgitating the GHCB error codes
provides flexibility. As above, unless KVM is super restrictive about which
error codes can be returned, KVM has zero flexibility.

Reusing exit reasons and whatnot, e.g. for KVM_HC_MAP_GPA_RANGE, is all about
reducing copy+paste and not having to deal with 14^W15 different standards. Any
ABI flexibility gained is a nice bonus. If we think there's actually a chance
that a different vendor can use KVM_EXIT_COCO_REQ_CERTS and userspace won't
end up with wildly different implementations, then yeah, let's define generic
return codes.

But if we're just going to end up with a bunch of vendor error codes redefined
by KVM, I don't see the point.

Another way to approach this would be to use the existing errno values, i.e.
EINVAL and EBUSY in this case. The upside is we don't have to define custom
return codes. The downside is that KVM needs to translate (though if we actually
expect vendors to reuse KVM_EXIT_COCO_REQ_CERTS, odds are good at least one
vendor will need to translate, i.e. won't be able to use
KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN verbatim like SNP).
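
The errno alternative implies a translation step on the vendor side. A minimal sketch of what that might look like for SNP follows. It rests on assumptions not settled in this thread: that userspace reports a positive errno value, that EINVAL is the value paired with INVALID_LEN and EBUSY with BUSY, and that anything else is rejected. The helper name `errno_to_snp_vmm_err()` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Error values as quoted in this thread. */
#define SNP_GUEST_VMM_ERR_INVALID_LEN 1u
#define SNP_GUEST_VMM_ERR_BUSY        2u

/*
 * Hypothetical translation from a userspace-supplied errno value to the
 * SNP VMM error code. Stores the code in *vmm_err and returns 0, or returns
 * -EINVAL (leaving *vmm_err untouched) for errno values with no mapping.
 */
static int errno_to_snp_vmm_err(int err, uint32_t *vmm_err)
{
    switch (err) {
    case 0:
        *vmm_err = 0;
        return 0;
    case EINVAL: /* assumed pairing: buffer too small */
        *vmm_err = SNP_GUEST_VMM_ERR_INVALID_LEN;
        return 0;
    case EBUSY:  /* assumed pairing: try again later */
        *vmm_err = SNP_GUEST_VMM_ERR_BUSY;
        return 0;
    default:
        return -EINVAL;
    }
}
```

Another vendor would supply its own table of the same shape, which is the translation cost described above.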
On Fri, Jun 28, 2024 at 01:08:19PM -0700, Sean Christopherson wrote:
> On Wed, Jun 26, 2024, Michael Roth wrote:
> > On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote:
> > > On Fri, Jun 21, 2024, Michael Roth wrote:
> > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > index ecfa25b505e7..2eea9828d9aa 100644
> > > > --- a/Documentation/virt/kvm/api.rst
> > > > +++ b/Documentation/virt/kvm/api.rst
> > > > @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the
> > > >  primary storage for certain register types. Therefore, the kernel may use the
> > > >  values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> > > >
> > > > +::
> > > > +
> > > > +    /* KVM_EXIT_COCO */
> > > > +    struct kvm_exit_coco {
> > > > +    #define KVM_EXIT_COCO_REQ_CERTS 0
> > > > +    #define KVM_EXIT_COCO_MAX 1
> > > > +        __u8 nr;
> > > > +        __u8 pad0[7];
> > > > +        union {
> > > > +            struct {
> > > > +                __u64 gfn;
> > > > +                __u32 npages;
> > > > +    #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1
> > > > +    #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31)
> > >
> > > Unless I'm mistaken, these error codes are defined by the GHCB, which means the
> > > values matter, i.e. aren't arbitrary KVM-defined values.
> >
> > They do happen to coincide with the GHCB-defined values:
> >
> >   /*
> >    * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
> >    * a GENERIC error code such that it won't ever conflict with GHCB-defined
> >    * errors if any get added in the future.
> >    */
> >   #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
> >   #define SNP_GUEST_VMM_ERR_BUSY          2
> >   #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)
> >
> > and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are
> > defined/documented without any reliance on the GHCB spec and are purely
> > KVM-defined. I just didn't really see any reason to pick different
> > numerical values since it seems like purposely obfuscating things for
>
> For SNP. For other vendors, the numbers look bizarre, e.g. why bit 31? And the
> fact that it appears to be a mask is even more odd.

That's fair. Values 1 and 2 made sense so just re-use, but that results
in an awkward value for _GENERIC that's not really necessary for the KVM
side.

>
> > no real reason. But the code itself doesn't rely on them being the same
> > as the spec defines, so we are free to define these however we'd like as
> > far as the KVM API goes.
>
> > > I forget exactly what we discussed in PUCK, but for the error codes, I think KVM
> > > should either define its own values that are completely disconnected from any
> > > "hardware" spec, or KVM should very explicitly #define all hardware values and have
> >
> > I'd gotten the impression that option 1) is what we were sort of leaning
> > toward, and that's the approach taken here.
>
> > And if we expose things selectively to keep the ABI small, it's a bit
> > awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs
> > a way to indicate success/fail/ENOMEM. Which we have with
> > (assuming 0==success):
> >
> >   #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN         1
> >   #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC             (1 << 31)
> >
> > But the GHCB also defines other values like:
> >
> >   #define SNP_GUEST_VMM_ERR_BUSY          2
> >
> > which don't make much sense to handle on the userspace side and doesn't
>
> Why not? If userspace is waiting on a cert update for whatever reason, why can't
> it signal "busy" to the guest?

My thinking was that userspace is free to take its time and doesn't need
to report delays back to KVM. But it would reduce the potential for
soft-lockups in the guest, so it might make sense to work that into the
API.

But more to original point, there could be something added in the future
that really has nothing to do with anything involving KVM<->userspace
interaction and so would make no sense to expose to userspace.
Unfortunately I picked a bad example. :)

>
> > really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event,
> > which is a separate/self-contained thing from the general guest request
> > protocol. So would we expose that as ABI or not? If not then we end up
> > with this weird splitting of code. And if yes, then we have to sort of
> > give userspace a way to discover whenever new error codes are added to
> > the GHCB spec, because KVM needs to understand these values too and
>
> Not necessarily. So long as KVM doesn't need to manipulate guest state, e.g. to
> set RBX (or whatever reg it is) for ERR_INVALID_LEN, then KVM doesn't need to
> care/know about the error codes. E.g. userspace could signal VMM_BUSY and KVM
> would happily pass that to the guest.

But given we already have an exception to that where KVM does need to
intervene for certain error codes like ERR_INVALID_LEN that require
modifying guest state, it doesn't seem like a good starting position
to have to hope that it doesn't happen again.

It just doesn't seem necessary to put ourselves in a situation where
we'd need to be concerned by that at all. If the KVM API is a separate
and fairly self-contained thing then these decisions are set in stone
until we want to change it and not dictated/modified by changes to
anything external without our explicit consideration.

I know the certs thing is GHCB-specific atm, but when the certs used
to live inside the kernel the KVM_EXIT_* wasn't needed at all, so
that's why I see this as more of a KVM interface thing rather than
a GHCB one. And maybe eventually some other CoCo implementation also
needs some interface for fetching certificates/blobs from userspace
and is able to re-use it still because it's not too SNP-specific
and the behavior isn't dictated by the GHCB spec (e.g.
ERR_INVALID_LEN might result in some other state needing to be
modified in their case rather than what the GHCB dictates.)

>
> > users might be running on older kernels where only the currently-defined
> > error codes are understood.
> >
> > E.g. if we started off implementing KVM_EXIT_COCO_REQ_CERTS without a
> > way to request a larger buffer from the guest, and it wasn't until later
> > on that SNP_GUEST_VMM_ERR_INVALID_LEN was added, we'd probably need a
> > capability bit or something to see if KVM supports requesting larger
>
> We'd need that regardless, no? Even if some other architecture added an error
> code for invalid length, KVM would need to reject that for SNP because KVM couldn't
> translate ERR_INVALID_LEN into an SNP error code. And when a KVM comes along
> that does support that error code, KVM would need a way to advertise support.

But in that case it would be immediately obvious that if they extended
KVM_EXIT_COCO_REQ_CERTS (or whatever) they'd need to be aware that other
architectures are already using it and make the appropriate
accommodations to make those extensions discoverable.

>
> But if KVM simply forwards error codes, then KVM only needs to advertise support
> if KVM reacts to the error code.

Forwards them where though? There's not really any reason that userspace
needs to be cognizant of the fact that error codes are being passed to
the guest. It needs to tell KVM either:

  a) success: here's the cert blob
  b) error: I need more space
  c) error: I'm busy (potentially)
  d) error: something bad on my end, handle it as you will

Being able to mediate all the architecture-specific details on the
backend without complicating the front-end we expose to userspace gives
more flexibility with how we handle compatibility stuff between
architectures. And we'd still only need to advertise what the interface
explicitly requires userspace to be aware of.

>
> As mentioned in the previous version, ideally userspace would need to set guest
> regs for INVALID_LEN case, but I don't see a sane/reasonable way to do that.

We had KVM_EXIT_VMGEXIT previously, where userspace had direct access to
the GHCB. I think TDX had similar. But that went away when we unified
under KVM_HC_MAP_GPA_RANGE. No question that, in that case, it made
sense to lean heavily on GHCB-defined values/handling. But I thought
KVM_EXIT_COCO was an attempt to capitalize on the KVM_HC_MAP_GPA_RANGE
success story and further move toward providing more potential for
common APIs for other CoCo stuff.

>
> > page sizes from the guest. Otherwise userspace might just set it because
> > the spec says it's valid, but it won't work as expected because KVM
> > hasn't implemented that.
> >
> > I guess technically we could reason about this particular one based on
> > which GHCB protocol version was set via KVM_SEV_INIT2, but what if
> > KVM itself was adding that functionality separately from the spec, and
> > now we got this intermingling of specs.
>
> How would KVM do that?

Hmm, good question. I don't think I was talking specifically about KVM
adding *_INVALID_LEN support outside of the GHCB spec at that point,
but I'm not sure I had a good example in mind. I certainly don't atm =\

>
> > > E.g. I think we can end up with something like:
> > >
> > >   static int snp_complete_req_certs(struct kvm_vcpu *vcpu)
> > >   {
> > >       struct vcpu_svm *svm = to_svm(vcpu);
> > >       struct vmcb_control_area *control = &svm->vmcb->control;
> > >
> > >       if (vcpu->run->coco.req_certs.ret) {
> > >           if (vcpu->run->coco.req_certs.ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
> >
> > I'm not opposed to this approach, but just deciding which of:
> >
> >   #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
> >   #define SNP_GUEST_VMM_ERR_BUSY          2
> >   #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)
> >
> > should be exposed to userspace based on how we've defined the
> > KVM_EXIT_COCO_REQ_CERTS already seems like an unnecessary dilemma
> > versus just defining exactly what's needed and documenting that
> > in the KVM API.
>
> But that's not what your code does. It exposes gunk that isn't necessary
> (ERR_GENERIC), and then doesn't enforce anything on the backend because

Agreed that I should be explicitly enforcing that only the defined
error codes should be getting returned by userspace. I think that's more
of a bug on my part rather than a consequence of design choices though.

> snp_complete_req_certs() interprets any non-zero "return" value as a "generic"
> error. If we actually want to maintain extensibility, then KVM needs to enforce
> inputs.
>
> And if we do that, then it doesn't really matter whether KVM defines arbitrary
> error codes or reuses the GHCB codes, e.g. it'll either be:

For instance, the GHCB spec mentions:

  A SW_EXITINFO2 value of 0 indicates a successful completion of the SNP
  Guest Request.

and goes on to document *_INVALID_LEN and *_ERR_BUSY as other possible
values.

...but it doesn't really define anything for "unable to fetch certs",
and that's certainly a situation userspace might hit if it got
deleted/renamed/etc. It's trivial for KVM to just make up some specific
or generic error to handle this case with the current approach.

If we took the stance that userspace should just return what the GHCB
says, then the GHCB protocol does document a more generic set of
hypervisor-specific error codes that can be set in SW_EXITINFO2 when
SW_EXITINFO1 is set to '2':

  Table 7: Invalid GHCB Reason Codes

  Value          Description
  0x0001         The GHCB address was not registered (SEV-SNP)
  0x0002         The GHCB Usage value was not valid
  0x0003         The SW_SCRATCH field was not valid / could not be mapped
  0x0004         The required input field(s) for the NAE event were not
                 marked valid in the GHCB VALID_BITMAP field
  0x0005         The NAE event input was not valid (e.g., an invalid
                 SW_EXITINFO1 value for the AP Jump Table NAE event)
  0x0006         The NAE event was not valid
  0x0007-0xffff  Reserved
  0x10000+       Available for hypervisor specific reason codes.

But in order for userspace to set them, we need to expose the notion of
SW_EXITINFO1 and SW_EXITINFO2 so it can set the appropriate values
directly. But even then it's weird because lower 32-bits of SW_EXITINFO2
correspond to the fw_err passed back by SNP firmware, only KVM can set
that. So you can't just point to the GHCB spec, you need to point to the
GHCB spec, expose bits and pieces of GHCB-defined values as ABI, and
then layer KVM-specific stuff on top like 'KVM will set the fw_err value
so these bits are actually off-limits for you', or potentially only give
them the upper 32-bits to work with and inform them to ignore the lower
32-bits mentioned in the GHCB spec.

That sort of illustrates my concerns with this approach. It's
unpredictable what parts of the GHCB spec are/aren't applicable for
these particular interactions between KVM<->userspace, or how much
relying on that sort of approach will complicate interfaces/documentation
that might otherwise be much simpler if KVM gives itself the leeway to
define them in the manner that is most convenient to KVM.
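
The "off-limits bits" concern can be made concrete with a small sketch of how a 64-bit SW_EXITINFO2 value would have to be assembled if userspace were allowed to supply it directly. It relies only on the split described above (firmware error in the lower 32 bits, the remainder available to the VMM); the mask names and the helpers `compose_sw_exitinfo2()`, `sw_exitinfo2_vmm_err()` and `sw_exitinfo2_fw_err()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks for the split described in the message above. */
#define SW_EXITINFO2_FW_ERR_MASK  0x00000000ffffffffULL /* owned by KVM/firmware */
#define SW_EXITINFO2_VMM_ERR_MASK 0xffffffff00000000ULL /* usable by the VMM */

/*
 * Combine a userspace-supplied value with the firmware error. Whatever
 * userspace put in the lower 32 bits is discarded, since only KVM knows
 * the fw_err returned by the SNP firmware.
 */
static uint64_t compose_sw_exitinfo2(uint64_t from_userspace, uint32_t fw_err)
{
    return (from_userspace & SW_EXITINFO2_VMM_ERR_MASK) | fw_err;
}

static uint32_t sw_exitinfo2_vmm_err(uint64_t val)
{
    return (uint32_t)(val >> 32);
}

static uint32_t sw_exitinfo2_fw_err(uint64_t val)
{
    return (uint32_t)(val & SW_EXITINFO2_FW_ERR_MASK);
}
```

Userspace would need to be told that half of the field it writes is silently ignored, which is the kind of KVM-specific layering on top of the spec that the message above argues against.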
>
> 	if (vcpu->run->coco.req_certs.ret) {
> 		if (vcpu->run->coco.req_certs.ret != KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN)
> 			return -EINVAL;
>
> 		vcpu->arch.regs[VCPU_REGS_RBX] = vcpu->run->coco.req_certs.npages;
> 		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
> 					SNP_GUEST_ERR(vcpu->run->coco.req_certs.ret, 0));
> 		return 1;
> 	}
>
> versus:
>
> 	if (vcpu->run->coco.req_certs.ret) {
> 		if (vcpu->run->coco.req_certs.ret != SNP_GUEST_VMM_ERR_INVALID_LEN)
> 			return -EINVAL;
>
> 		vcpu->arch.regs[VCPU_REGS_RBX] = vcpu->run->coco.req_certs.npages;
> 		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
> 					SNP_GUEST_ERR(vcpu->run->coco.req_certs.ret, 0));
> 		return 1;
> 	}
>
> (with variations depending on whether or not KVM allows SNP_GUEST_VMM_ERR_BUSY).
>
> > If we anticipate needing to expose big chunks of GHCB/GHCI to
> > userspace for other reasons or future extensions of KVM_EXIT_COCO_*
> > then I definitely see the rationale to avoid duplication. But with
> > KVM_HC_MAP_GPA_RANGE case covered, I don't see any major reason to
> > think this will ever end up being the case.
> >
> > It seems more likely this will just be KVM's handy place to handle "Hey
> > userspace, I need you to handle some CoCo-related stuff for me" and
> > it's really KVM that's driving those requirements vs. any particular
> > spec.
> >
> > For instance, the certificate-fetching in the first place is only
> > handled by userspace because that's how KVM community decided to
> > handle it, not some general spec-driven requirement to handle these
> > sorts of things in userspace. Similarly for the KVM_HC_MAP_GPA_RANGE
> > that we originally considered this interface to handle: the fact that
> > userspace handles those requests is mainly a KVM/gmem design decision.
> >
> > And like the KVM_HC_MAP_GPA_RANGE case, maybe we find there are cases
> > where a common KVM-defined event type can handle the requirements of
> > multiple specs with a common interface API, without exposing any
> > particular vendor definitions.
> >
> > So based on that I sort of think giving KVM more flexibility on how it
> > wants to implement/document specific KVM_EXIT_COCO event types will
> > ultimately result in cleaner and more manageable ABI.
>
> I don't disagree, I'm just not seeing how regurgitating the GHCB error codes
> provides flexibility. As above, unless KVM is super restrictive about which
> error codes can be returned, KVM has zero flexibility.

I tried to give a better example above of why I think leaning too
heavily on the GHCB or other specs would potentially make for
less-flexible interfaces.

>
> Reusing exit reasons and whatnot, e.g. for KVM_HC_MAP_GPA_RANGE, is all about
> reducing copy+paste and not having to deal with 14^W15 different standards. Any
> ABI flexibility gained is a nice bonus. If we think there's actually a chance
> that a different vendor can use KVM_EXIT_COCO_REQ_CERTS and userspace won't
> end up with wildly different implementations, then yeah, let's define generic
> return codes.

That's sort of my hope here. I know the certificate blob format itself
is SNP-specific, and most likely someone would need to massage it or
extend it for other applications, but at least it's not 100% guaranteed
to be useless for other archs. And if you want to take steps further
toward that goal, then maybe we can consider not even passing in the
GPAs to userspace and just have a KVM_EXIT_COCO_FETCH_BLOB interface
along with a scratch buffer somewhere to handle it or something.

>
> But if we're just going to end up with a bunch of vendor error codes redefined
> by KVM, I don't see the point.

I also have no qualms about leaning on the GHCB spec where appropriate,
or even wholesale if the need arose.
But I just don't think fetching the certificate blob would benefit much
from going down that route, and the penalty for re-definitions in this
case seems smaller than the additional uAPI complexity we'd have exposing
the sw_exitinfo1/sw_exitinfo2 fields needed to fully allow userspace to
code against the GHCB spec rather than a self-contained KVM-defined
abstraction layer.

>
> Another way to approach this would be to use the existing errno values, i.e.
> EINVAL and EBUSY in this case. The upside is we don't have to define custom
> return codes. The downside is that KVM needs to translate (though if we actually

I think I would greatly prefer that as a middle ground between the two
approaches. We wouldn't have to redefine anything, and we'd still have
the flexibility to document the meaning/handling of these error codes in
the KVM API documentation.

> expect vendors to reuse KVM_EXIT_COCO_REQ_CERTS, odds are good at least one vendor
> will need to translate, i.e. won't be able to use KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN
> verbatim like SNP).

If SNP translates right off the bat then I think that sets a good example
for others who might be fishing for an interface they can re-use.

-Mike
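As a rough illustration of the errno middle ground discussed above, the
translation step could look something like the following userspace C
sketch. It assumes that EINVAL stands in for the invalid-length case and
EBUSY for the busy case (my reading of the suggestion, not something
settled in this thread), and it rejects every other non-zero value so
that inputs are actually enforced. The `sketch_*` names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Values quoted earlier in the thread for the GHCB-side VMM errors. */
#define SKETCH_SNP_VMM_ERR_INVALID_LEN 1u
#define SKETCH_SNP_VMM_ERR_BUSY        2u

/*
 * Hypothetical translation helper: maps an errno-style value supplied by
 * userspace to a VMM error for the guest. Returns 0 when the value is
 * recognized and -EINVAL for anything outside the documented set, so
 * that undefined values cannot leak through to the guest unnoticed.
 */
static int sketch_errno_to_vmm_err(uint32_t user_ret, uint32_t *vmm_err)
{
	switch (user_ret) {
	case 0:
		*vmm_err = 0;
		return 0;
	case EINVAL:	/* assumed mapping: buffer too small */
		*vmm_err = SKETCH_SNP_VMM_ERR_INVALID_LEN;
		return 0;
	case EBUSY:	/* assumed mapping: try again later */
		*vmm_err = SKETCH_SNP_VMM_ERR_BUSY;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Another vendor would supply its own switch with the same inputs, which
is the sense in which the translation keeps the userspace-facing values
vendor-neutral.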
On 6/29/2024 8:36 AM, Michael Roth wrote: > On Fri, Jun 28, 2024 at 01:08:19PM -0700, Sean Christopherson wrote: >> On Wed, Jun 26, 2024, Michael Roth wrote: >>> On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote: >>>> On Fri, Jun 21, 2024, Michael Roth wrote: >>>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst >>>>> index ecfa25b505e7..2eea9828d9aa 100644 >>>>> --- a/Documentation/virt/kvm/api.rst >>>>> +++ b/Documentation/virt/kvm/api.rst >>>>> @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the >>>>> primary storage for certain register types. Therefore, the kernel may use the >>>>> values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set. >>>>> >>>>> +:: >>>>> + >>>>> + /* KVM_EXIT_COCO */ >>>>> + struct kvm_exit_coco { >>>>> + #define KVM_EXIT_COCO_REQ_CERTS 0 >>>>> + #define KVM_EXIT_COCO_MAX 1 >>>>> + __u8 nr; >>>>> + __u8 pad0[7]; >>>>> + union { >>>>> + struct { >>>>> + __u64 gfn; >>>>> + __u32 npages; >>>>> + #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1 >>>>> + #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31) >>>> Unless I'm mistaken, these error codes are defined by the GHCB, which means the >>>> values matter, i.e. aren't arbitrary KVM-defined values. >>> They do happen to coincide with the GHCB-defined values: >>> >>> /* >>> * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define >>> * a GENERIC error code such that it won't ever conflict with GHCB-defined >>> * errors if any get added in the future. >>> */ >>> #define SNP_GUEST_VMM_ERR_INVALID_LEN 1 >>> #define SNP_GUEST_VMM_ERR_BUSY 2 >>> #define SNP_GUEST_VMM_ERR_GENERIC BIT(31) >>> >>> and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are >>> defined/documented without any reliance on the GHCB spec and are purely >>> KVM-defined. 
I just didn't really see any reason to pick different >>> numerical values since it seems like purposely obfuscating things for >> For SNP. For other vendors, the numbers look bizarre, e.g. why bit 31? And the >> fact that it appears to be a mask is even more odd. > That's fair. Values 1 and 2 made sense so just re-use, but that results > in a awkward value for _GENERIC that's not really necessary for the KVM > side. > >>> no real reason. But the code itself doesn't rely on them being the same >>> as the spec defines, so we are free to define these however we'd like as >>> far as the KVM API goes. >>>> I forget exactly what we discussed in PUCK, but for the error codes, I think KVM >>>> should either define it's own values that are completely disconnected from any >>>> "harware" spec, or KVM should very explicitly #define all hardware values and have >>> I'd gotten the impression that option 1) is what we were sort of leaning >>> toward, and that's the approach taken here. >>> And if we expose things selectively to keep the ABI small, it's a bit >>> awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs >>> a way to indicate success/fail/ENOMEM. Which we have with >>> (assuming 0==success): >>> >>> #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1 >>> #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31) >>> >>> But the GHCB also defines other values like: >>> >>> #define SNP_GUEST_VMM_ERR_BUSY 2 >>> >>> which don't make much sense to handle on the userspace side and doesn't >> Why not? If userspace is waiting on a cert update for whatever reason, why can't >> it signal "busy" to the guest? > My thinking was that userspace is free to take it's time and doesn't need > to report delays back to KVM. But it would reduce the potential for > soft-lockups in the guest, so it might make sense to work that into the > API. 
> > But more to original point, there could be something added in the future > that really has nothing to do with anything involving KVM<->userspace > interaction and so would make no sense to expose to userspace. > Unfortunately I picked a bad example. :) > >>> really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event, >>> which is a separate/self-contained thing from the general guest request >>> protocol. So would we expose that as ABI or not? If not then we end up >>> with this weird splitting of code. And if yes, then we have to sort of >>> give userspace a way to discover whenever new error codes are added to >>> the GHCB spec, because KVM needs to understand these value too and >> Not necessarily. So long as KVM doesn't need to manipulate guest state, e.g. to >> set RBX (or whatever reg it is) for ERR_INVALID_LEN, then KVM doesn't need to >> care/know about the error codes. E.g. userspace could signal VMM_BUSY and KVM >> would happily pass that to the guest. > But given we already have an exception to that where KVM does need to > intervene for certain errors codes like ERR_INVALID_LEN that require > modifying guest state, it doesn't seem like a good starting position > to have to hope that it doesn't happen again. > > It just doesn't seem necessary to put ourselves in a situation where > we'd need to be concerned by that at all. If the KVM API is a separate > and fairly self-contained thing then these decisions are set in stone > until we want to change it and not dictated/modified by changes to > anything external without our explicit consideration. > > I know the certs things is GHCB-specific atm, but when the certs used > to live inside the kernel the KVM_EXIT_* wasn't needed at all, so > that's why I see this as more of a KVM interface thing rather than > a GHCB one. 
And maybe eventually some other CoCo implementation also
> needs some interface for fetching certificates/blobs from userspace
> and is able to re-use it still because it's not too SNP-specific
> and the behavior isn't dictated by the GHCB spec (e.g.
> ERR_INVALID_LEN might result in some other state needing to be
> modified in their case rather than what the GHCB dictates.)

TDX GHCI does have a similar PV interface for a TDX guest to get a quote,
i.e., TDG.VP.VMCALL<GetQuote>. This GetQuote PV interface is designed to
invoke a request to generate a TD-Quote, signed by a service hosting the
TD-Quoting Enclave operating in the host environment, for a TD Report
passed as a parameter by the TD.
The request will be forwarded to userspace for handling.

So like GHCB, TDX needs to pass a shared buffer to userspace, which is
specified by GPA and size (4K aligned), get the error code from
userspace, and forward the error code to the guest.

But there are some differences from the GHCB interface.
1. TDG.VP.VMCALL<GetQuote> is a doorbell-like interface used to queue a
   request, i.e., it is an asynchronous request. The error code represents
   the status of request queuing, *not* the status of TD Quote generation.
2. Besides the error code returned by userspace for the GetQuote
   interface, the GHCI spec defines a "Status Code" field in the header of
   the shared buffer.
   The "Status Code" field is also updated by the VMM during the real
   handling of getting the quote (after TDG.VP.VMCALL<GetQuote> has
   returned to the guest).
   After TDG.VP.VMCALL<GetQuote> returns to the TD guest, the TD guest can
   poll the "Status Code" field to check whether the processing is
   in-flight, succeeded or failed.
   Since the real handling of getting the quote happens in userspace, and
   it will interact directly with the guest, TDX has to expose
   TDX-specific error codes to userspace to update the result of quote
   generation.

Currently, TDX is about to add a new TDX-specific KVM exit reason, i.e.,
KVM_EXIT_TDX_GET_QUOTE, and its related data structure, based on a
previous discussion:
https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
For the error code returned by userspace, KVM simply forwards the error
code to the guest without further translation or handling.

I am neutral on having a common KVM exit reason to handle both GHCB for
REQ_CERTS and GHCI for GET_QUOTE. But for the error code, can we use
vendor-specific error codes if KVM cares about the error code returned by
userspace in the vendor-specific complete_userspace_io callback?

BTW, here is the plan for the 4 hypercalls that need to exit to userspace
in the TDX basic support series:
TDG.VP.VMCALL<SetupEventNotifyInterrupt>
  - Add a new KVM exit reason KVM_EXIT_TDX_SETUP_EVENT_NOTIFY
TDG.VP.VMCALL<GetQuote>
  - Add a new KVM exit reason KVM_EXIT_TDX_GET_QUOTE
TDG.VP.VMCALL<MapGPA>
  - Reuse KVM_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
TDG.VP.VMCALL<ReportFatalError>
  - Reuse KVM_EXIT_SYSTEM_EVENT but add a new type
    KVM_SYSTEM_EVENT_TDX_FATAL_ERROR
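The two levels of status described above (queuing result versus the
"Status Code" polled in the shared buffer) can be sketched in plain C as
follows. This is only a model under stated assumptions: the header layout
and the status values are written as I understand them from the GHCI spec
and should be verified against it, and every `sketch_*` / `SKETCH_*` name
is hypothetical rather than an existing KVM or TDX definition.

```c
#include <stdint.h>

/* Assumed Status Code values; verify against the GHCI spec before use. */
#define SKETCH_GET_QUOTE_SUCCESS   0x0ULL
#define SKETCH_GET_QUOTE_IN_FLIGHT 0xffffffffffffffffULL
#define SKETCH_GET_QUOTE_ERROR     0x8000000000000000ULL

/* Assumed view of the header at the start of the shared buffer. */
struct sketch_quote_hdr {
	uint64_t version;
	uint64_t status;	/* the "Status Code" the TD guest polls */
	uint32_t in_len;
	uint32_t out_len;
};

enum sketch_poll { SKETCH_PENDING, SKETCH_DONE, SKETCH_FAILED };

/*
 * Level 1: the value returned to the guest from the hypercall. It only
 * says whether the request was queued, not whether a quote exists yet.
 */
static int sketch_queue_request(struct sketch_quote_hdr *hdr)
{
	hdr->status = SKETCH_GET_QUOTE_IN_FLIGHT;
	return 0;	/* queued successfully */
}

/*
 * Level 2: how a guest might classify the status code that userspace
 * later writes into the shared buffer. A real guest would need to read
 * the shared memory with appropriate ordering; that is omitted here.
 */
static enum sketch_poll sketch_poll_status(const struct sketch_quote_hdr *hdr)
{
	if (hdr->status == SKETCH_GET_QUOTE_IN_FLIGHT)
		return SKETCH_PENDING;
	if (hdr->status == SKETCH_GET_QUOTE_SUCCESS)
		return SKETCH_DONE;
	return SKETCH_FAILED;
}
```

The point of the model is that the second level is written by userspace
directly into guest-visible memory, so those values cannot be hidden
behind a KVM-defined translation the way a hypercall return code can.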
On Fri, Jul 26, 2024 at 12:15 AM Binbin Wu <binbin.wu@linux.intel.com> wrote: > > > > On 6/29/2024 8:36 AM, Michael Roth wrote: > > On Fri, Jun 28, 2024 at 01:08:19PM -0700, Sean Christopherson wrote: > >> On Wed, Jun 26, 2024, Michael Roth wrote: > >>> On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote: > >>>> On Fri, Jun 21, 2024, Michael Roth wrote: > >>>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > >>>>> index ecfa25b505e7..2eea9828d9aa 100644 > >>>>> --- a/Documentation/virt/kvm/api.rst > >>>>> +++ b/Documentation/virt/kvm/api.rst > >>>>> @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the > >>>>> primary storage for certain register types. Therefore, the kernel may use the > >>>>> values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set. > >>>>> > >>>>> +:: > >>>>> + > >>>>> + /* KVM_EXIT_COCO */ > >>>>> + struct kvm_exit_coco { > >>>>> + #define KVM_EXIT_COCO_REQ_CERTS 0 > >>>>> + #define KVM_EXIT_COCO_MAX 1 > >>>>> + __u8 nr; > >>>>> + __u8 pad0[7]; > >>>>> + union { > >>>>> + struct { > >>>>> + __u64 gfn; > >>>>> + __u32 npages; > >>>>> + #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1 > >>>>> + #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31) > >>>> Unless I'm mistaken, these error codes are defined by the GHCB, which means the > >>>> values matter, i.e. aren't arbitrary KVM-defined values. > >>> They do happen to coincide with the GHCB-defined values: > >>> > >>> /* > >>> * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define > >>> * a GENERIC error code such that it won't ever conflict with GHCB-defined > >>> * errors if any get added in the future. 
> >>> */ > >>> #define SNP_GUEST_VMM_ERR_INVALID_LEN 1 > >>> #define SNP_GUEST_VMM_ERR_BUSY 2 > >>> #define SNP_GUEST_VMM_ERR_GENERIC BIT(31) > >>> VMM_ERR_BUSY means something very specific to the GHCB protocol, which is that a request that would normally have increased a message sequence number was not able to be sent, and the exact same message would need to be sent again, otherwise the cryptographic protocol breaks down. In the event of firmware hotloading, SNP_COMMIT, and needing to get the right version of VCEK certificate to the VM guest, we could detect a conflict and need to say VMM_ERR_BUSY2 (or something) that says try again, but the sequence number did go up, we just couldn't coordinate a non-atomic data transfer afterward to be correct. There's a different way to solve the data race without a retry, but I'm not 100% confident that we can really generalize the error space across TEEs. To support the coordination of SNP_DOWNLOAD_FIRMWARE_EX, SNP_COMMIT, and extended guest requests, user space needs to be told which TCB_VERSION certificate it needs to provide. It can be wrong if it relies on its own call to SNP_PLATFORM_STATUS. Given that userspace can't interpret the report (encrypted by VMPCK), it won't know exactly which VCEK certificate to provide given that SNP_COMMIT can happen before or after an attestation report is taken and before KVM exits to userspace for the certificates. We can extend the ccp driver to, on extended guest request, lock the command buffer, get the REPORTED_TCB, complete the request, unlock the command buffer, and return both the response and the REPORTED_TCB at the time of the request. That will give userspace enough info to give the right certificate. That would mean a more specific KVM_EXIT_COCO_REQ_EXIT message than just where to put the certs. A SEV-SNP TCB_VERSION is also platform-specific, so not particularly generalizable to "COCO". 
We could also say that extended_guest_request is inherently racy and have AMD extend the GHCB spec with a new request type that doesn't communicate with the ASP at all and instead just requests certificates for a given REPORTED_TCB. The guest VM can read an attestation report's reported_tcb field and craft this request. I don't know if we want to have an arbitrary message passing interface between guest VM and user space VMM without very specific restrictions. We ought to just have paravirtualized I/O devices then. > >>> and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are > >>> defined/documented without any reliance on the GHCB spec and are purely > >>> KVM-defined. I just didn't really see any reason to pick different > >>> numerical values since it seems like purposely obfuscating things for > >> For SNP. For other vendors, the numbers look bizarre, e.g. why bit 31? And the > >> fact that it appears to be a mask is even more odd. > > That's fair. Values 1 and 2 made sense so just re-use, but that results > > in a awkward value for _GENERIC that's not really necessary for the KVM > > side. > > > >>> no real reason. But the code itself doesn't rely on them being the same > >>> as the spec defines, so we are free to define these however we'd like as > >>> far as the KVM API goes. > >>>> I forget exactly what we discussed in PUCK, but for the error codes, I think KVM > >>>> should either define it's own values that are completely disconnected from any > >>>> "harware" spec, or KVM should very explicitly #define all hardware values and have > >>> I'd gotten the impression that option 1) is what we were sort of leaning > >>> toward, and that's the approach taken here. > >>> And if we expose things selectively to keep the ABI small, it's a bit > >>> awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs > >>> a way to indicate success/fail/ENOMEM. 
Which we have with > >>> (assuming 0==success): > >>> > >>> #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1 > >>> #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31) > >>> > >>> But the GHCB also defines other values like: > >>> > >>> #define SNP_GUEST_VMM_ERR_BUSY 2 > >>> > >>> which don't make much sense to handle on the userspace side and doesn't > >> Why not? If userspace is waiting on a cert update for whatever reason, why can't > >> it signal "busy" to the guest? > > My thinking was that userspace is free to take it's time and doesn't need > > to report delays back to KVM. But it would reduce the potential for > > soft-lockups in the guest, so it might make sense to work that into the > > API. > > > > But more to original point, there could be something added in the future > > that really has nothing to do with anything involving KVM<->userspace > > interaction and so would make no sense to expose to userspace. > > Unfortunately I picked a bad example. :) > > > >>> really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event, > >>> which is a separate/self-contained thing from the general guest request > >>> protocol. So would we expose that as ABI or not? If not then we end up > >>> with this weird splitting of code. And if yes, then we have to sort of > >>> give userspace a way to discover whenever new error codes are added to > >>> the GHCB spec, because KVM needs to understand these value too and > >> Not necessarily. So long as KVM doesn't need to manipulate guest state, e.g. to > >> set RBX (or whatever reg it is) for ERR_INVALID_LEN, then KVM doesn't need to > >> care/know about the error codes. E.g. userspace could signal VMM_BUSY and KVM > >> would happily pass that to the guest. 
> > But given we already have an exception to that where KVM does need to > > intervene for certain errors codes like ERR_INVALID_LEN that require > > modifying guest state, it doesn't seem like a good starting position > > to have to hope that it doesn't happen again. > > > > It just doesn't seem necessary to put ourselves in a situation where > > we'd need to be concerned by that at all. If the KVM API is a separate > > and fairly self-contained thing then these decisions are set in stone > > until we want to change it and not dictated/modified by changes to > > anything external without our explicit consideration. > > > > I know the certs things is GHCB-specific atm, but when the certs used > > to live inside the kernel the KVM_EXIT_* wasn't needed at all, so > > that's why I see this as more of a KVM interface thing rather than > > a GHCB one. And maybe eventually some other CoCo implementation also > > needs some interface for fetching certificates/blobs from userspace > > and is able to re-use it still because it's not too SNP-specific > > and the behavior isn't dictated by the GHCB spec (e.g. > > ERR_INVALID_LEN might result in some other state needing to be > > modified in their case rather than what the GHCB dictates.) > > TDX GHCI does have a similar PV interface for TDX guest to get quota, i.e., > TDG.VP.VMCALL<GetQuote>. This GetQuote PV interface is designed to invoke > a request to generate a TD-Quote signing by a service hosting TD-Quoting > Enclave operating in the host environment for a TD Report passed as a > parameter by the TD. > And the request will be forwarded to userspace for handling. > > So like GHCB, TDX needs to pass a shared buffer to userspace, which is > specified by GPA and size (4K aligned) and get the error code from > userspace and forward the error code to guest. > > But there are some differences from GHCB interface. > 1. TDG.VP.VMCALL<GetQuote> is a a doorbell-like interface used to queue a > request. 
I.e., it is an asynchronous request. The error code represents > the status of request queuing, *not* the status of TD Quote generation.. > 2. Besides the error code returned by userspace for GetQuote interface, the > GHCI spec defines a "Status Code" field in the header of the shared > buffer. > The "Status Code" field is also updated by VMM during the real > handling of > getting quote (after TDG.VP.VMCALL<GetQuote> returned to guest). > After the TDG.VP.VMCALL<GetQuote> returned and back to TD guest, the TD > guest can poll the "Status Code" field to check if the processing is > in-flight, succeeded or failed. > Since the real handling of getting quota is happening in userspace, and > it will interact directly with guest, for TDX, it has to expose TDX > specific error code to userspace to update the result of quote > generation. > > Currently, TDX is about to add a new TDX specific KVM exit reason, i.e., > KVM_EXIT_TDX_GET_QUOTE and its related data structure based on a previous > discussion. https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/ > For the error code returned by userspace, KVM simply forward the error code > to guest without further translation or handling. > > I am neutral to have a common KVM exit reason to handle both GHCB for > REQ_CERTS and GHCI for GET_QUOTE. But for the error code, can we uses > vendor > specific error codes if KVM cares about the error code returned by userspace > in vendor specific complete_userspace_io callback? 
> > BTW, here is the plan of 4 hypercalls needing to exit to userspace for > TDX basic support series: > TDG.VP.VMCALL<SetupEventNotifyInterrupt> > - Add a new KVM exit reason KVM_EXIT_TDX_SETUP_EVENT_NOTIFY > TDG.VP.VMCALL<GetQuote> > - Add a new KVM exit reason KVM_EXIT_TDX_GET_QUOTE > TDG.VP.VMCALL<MapGPA> > - Reuse KVM_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE > TDG.VP.VMCALL<ReportFatalError> > - Reuse KVM_EXIT_SYSTEM_EVENT but add a new type > KVM_SYSTEM_EVENT_TDX_FATAL_ERROR > > > -- -Dionna Glaze, PhD, CISSP, CCSP (she/her)
On Fri, Sep 13, 2024, Dionna Amalie Glaze wrote:
> We can extend the ccp driver to, on extended guest request, lock the
> command buffer, get the REPORTED_TCB, complete the request, unlock the
> command buffer, and return both the response and the REPORTED_TCB at
> the time of the request.

Holding a lock across an exit to userspace seems wildly unsafe.

Can you explain the race that you are trying to close, with the exact "bad" sequence
of events laid out in chronological order, and an explanation of why the race can't
be solved in userspace? I read through your previous comment[*] (which I assume
is the race you want to close?), but I couldn't quite piece together exactly what's
broken.

[*] https://lore.kernel.org/all/CAAH4kHb03Una2kcvyC3W=1ZfANBWF_7a7zsSmWhr_r9g3rCDZw@mail.gmail.com
On Mon, Oct 28, 2024 at 11:20 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 13, 2024, Dionna Amalie Glaze wrote:
> > We can extend the ccp driver to, on extended guest request, lock the
> > command buffer, get the REPORTED_TCB, complete the request, unlock the
> > command buffer, and return both the response and the REPORTED_TCB at
> > the time of the request.
>
> Holding a lock across an exit to userspace seems wildly unsafe.

I wasn't suggesting this. I was suggesting adding a special ccp symbol
that would perform two sev commands under the same lock to ensure we
know the REPORTED_TCB that was used to derive the VCEK that signs an
attestation report in the MSG_REPORT_REQ guest request. We use that
atomicity to be sure that when we exit to user space to request
certificates that we're getting the right version certificates.

>
> Can you explain the race that you are trying to close, with the exact "bad" sequence
> of events laid out in chronological order, and an explanation of why the race can't
> be solved in userspace? I read through your previous comment[*] (which I assume
> is the race you want to close?), but I couldn't quite piece together exactly what's
> broken.

1. the control plane delivers a firmware update. Current TCB version
   goes up. The machine signals that it needs new certificates before it
   can commit.
2. VM performs an extended guest request.
3. KVM exits to user space to get certificates before getting the
   report from firmware.
4. [what I understand Michael Roth was suggesting] User space grabs a
   file lock to see if it can read the cached certificates. It reads the
   certificates and releases the lock before returning to KVM.
5. the control plane delivers the certificates to the machine and
   tells it to commit. The machine grabs the certificate file lock, runs
   SNP_COMMIT, and releases the file lock. This command updates both
   COMMITTED_TCB and REPORTED_TCB.
6. KVM asks firmware to complete the MSG_REPORT_REQ request, but it's
   a different REPORTED_TCB.
7. Guest receives the wrong certificates for certifying the report it
   just received.

The fact that 4 has to release the lock before getting the attestation
report is the problem.
If we instead get the report and know what the REPORTED_TCB was when
serving that request, then we can exit to user space requesting the
certificates for the report in hand.
A concurrent update can update the reported_tcb like in the above
scenario, but it won't interfere with certificates since the machine
should have certificates for both TCB_VERSIONs to provide until the
commit is complete.

I don't think it's workable to have 1 grab the file lock and for 5 to
release it. Waiting for a service to update stale certificates should
not block user attestation requests. It would make 4's failure to get
the lock return VMM_BUSY and eventually cause attestations to time out
in sev-guest.

>
> [*] https://lore.kernel.org/all/CAAH4kHb03Una2kcvyC3W=1ZfANBWF_7a7zsSmWhr_r9g3rCDZw@mail.gmail.com
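The ordering problem in steps 3 through 7 above can be reduced to a toy
model in C. This is only a sketch of the sequence as described, with a
single integer standing in for REPORTED_TCB and hypothetical `sketch_*`
helpers standing in for the certificate fetch, SNP_COMMIT, and the
attestation report. It assumes, as stated above, that certificates for
both TCB versions remain available until the commit completes.

```c
#include <stdint.h>

/* Toy platform state; the field mirrors REPORTED_TCB from the thread. */
struct sketch_platform {
	uint64_t reported_tcb;
};

/* Userspace returns the certs matching the TCB it currently observes. */
static uint64_t sketch_fetch_certs_tcb(const struct sketch_platform *p)
{
	return p->reported_tcb;
}

/* A commit bumps REPORTED_TCB (step 5). */
static void sketch_commit(struct sketch_platform *p, uint64_t new_tcb)
{
	p->reported_tcb = new_tcb;
}

/* Firmware signs the report with the key for the current REPORTED_TCB. */
static uint64_t sketch_report_tcb(const struct sketch_platform *p)
{
	return p->reported_tcb;
}

/* Certs first, commit in between, report last. Returns 1 on mismatch. */
static int sketch_certs_then_report(struct sketch_platform *p, uint64_t new_tcb)
{
	uint64_t certs = sketch_fetch_certs_tcb(p);	/* steps 3-4 */

	sketch_commit(p, new_tcb);			/* step 5 */
	return certs != sketch_report_tcb(p);		/* steps 6-7 */
}

/* Report first, then ask for certs for that exact TCB. Returns 1 on mismatch. */
static int sketch_report_then_certs(struct sketch_platform *p, uint64_t new_tcb)
{
	uint64_t report = sketch_report_tcb(p);
	uint64_t certs;

	sketch_commit(p, new_tcb);	/* a concurrent commit is harmless here */
	certs = report;			/* certs are requested by TCB, not "current" */
	return certs != report;
}
```

In this model the first ordering mismatches whenever a commit lands
between the fetch and the report, while the second cannot mismatch
because the certificate request is keyed on the TCB already in hand.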
On Fri, Nov 01, 2024 at 01:53:26PM -0700, Dionna Amalie Glaze wrote:
> On Mon, Oct 28, 2024 at 11:20 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Sep 13, 2024, Dionna Amalie Glaze wrote:
> > > We can extend the ccp driver to, on extended guest request, lock the
> > > command buffer, get the REPORTED_TCB, complete the request, unlock the
> > > command buffer, and return both the response and the REPORTED_TCB at
> > > the time of the request.
> >
> > Holding a lock across an exit to userspace seems wildly unsafe.
>
> I wasn't suggesting this. I was suggesting adding a special ccp symbol
> that would perform two sev commands under the same lock to ensure we
> know the REPORTED_TCB that was used to derive the VCEK that signs an
> attestation report in the MSG_REPORT_REQ guest request. We use that
> atomicity to be sure that when we exit to user space to request
> certificates that we're getting the right version certificates.
>
> >
> > Can you explain the race that you are trying to close, with the exact "bad" sequence
> > of events laid out in chronological order, and an explanation of why the race can't
> > be solved in userspace? I read through your previous comment[*] (which I assume
> > is the race you want to close?), but I couldn't quite piece together exactly what's
> > broken.

Hi Dionna,

>
> 1. the control plane delivers a firmware update. Current TCB version
> goes up. The machine signals that it needs new certificates before it
> can commit.
> 2. VM performs an extended guest request.
> 3. KVM exits to user space to get certificates before getting the
> report from firmware.
> 4. [what I understand Michael Roth was suggesting] User space grabs a
> file lock to see if it can read the cached certificates. It reads the
> certificates and releases the lock before returning to KVM.
> 5. the control plane delivers the certificates to the machine and
> tells it to commit. The machine grabs the certificate file lock, runs
> SNP_COMMIT, and releases the file lock. This command updates both
> COMMITTED_TCB and REPORTED_TCB.
> 6. KVM asks firmware to complete the MSG_REPORT_REQ request, but it's
> a different REPORTED_TCB.
> 7. Guest receives the wrong certificates for certifying the report it
> just received.
>
> The fact that 4 has to release the lock before getting the attestation
> report is the problem.

We wouldn't actually release the lock before getting the attestation
report. There are more specifics on the suggested flow in the documentation
update accompanying this patch:

+    NOTE: In the case of SEV-SNP, the endorsement key used by firmware may
+    change as a result of management activities like updating SEV-SNP firmware
+    or loading new endorsement keys, so some care should be taken to keep the
+    returned certificate data in sync with the actual endorsement key in use by
+    firmware at the time the attestation request is sent to SNP firmware. The
+    recommended scheme to do this is:
+
+      - The VMM should obtain a shared or exclusive lock on the path the
+        certificate blob file resides at before reading it and returning it to
+        KVM, and continue to hold the lock until the attestation request is
+        actually sent to firmware. To facilitate this, the VMM can set the
+        ``immediate_exit`` flag of kvm_run just after supplying the certificate
+        data, and just before resuming the vCPU. This will ensure the vCPU
+        will exit again to userspace with ``-EINTR`` after it finishes fetching
+        the attestation request from firmware, at which point the VMM can
+        safely drop the file lock.
+
+      - Tools/libraries that perform updates to SNP firmware TCB values or
+        endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``,
+        ``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see
+        Documentation/virt/coco/sev-guest.rst for more details) in such a way
+        that the certificate blob needs to be updated, should similarly take an
+        exclusive lock on the certificate blob for the duration of any updates
+        to endorsement keys or the certificate blob contents to ensure that
+        VMMs using the above scheme will not return certificate blob data that
+        is out of sync with the endorsement key used by firmware.

So #5 would not be able to obtain an exclusive file lock until userspace
receives confirmation that the attestation request was processed by
firmware. At that point it will be an accurate reflection of the
attestation state associated with that particular version of the
certificates that was fetched from userspace. So at that point the
transaction is done and userspace can safely release the lock.

-Mike

> If we instead get the report and know what the REPORTED_TCB was when
> serving that request, then we can exit to user space requesting the
> certificates for the report in hand.
> A concurrent update can update the reported_tcb like in the above
> scenario, but it won't interfere with certificates since the machine
> should have certificates for both TCB_VERSIONs to provide until the
> commit is complete.
>
> I don't think it's workable to have 1 grab the file lock and for 5 to
> release it. Waiting for a service to update stale certificates should
> not block user attestation requests. It would make 4's failure to get
> the lock return VMM_BUSY and eventually cause attestations to time out
> in sev-guest.
>
> >
> > [*] https://lore.kernel.org/all/CAAH4kHb03Una2kcvyC3W=1ZfANBWF_7a7zsSmWhr_r9g3rCDZw@mail.gmail.com
>
>
>
> --
> -Dionna Glaze, PhD, CISSP, CCSP (she/her)
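
The locking scheme in the quoted documentation note can be sketched from both sides as follows. This is only an illustration, assuming flock(2) advisory locks on the certificate blob file; the helper names are hypothetical and are not part of the patch, and the KVM_RUN step is described in comments rather than performed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * VMM side (hypothetical helper): take a shared lock on the certificate
 * blob before reading it. Returns the locked fd, or -1 on failure.
 * The VMM would then copy the blob to the guest, set
 * run->immediate_exit = 1, call KVM_RUN, and keep this fd locked until
 * KVM_RUN comes back with -EINTR.
 */
static int cert_blob_lock_shared(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_SH) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Drop the lock once the attestation request has reached firmware. */
static void cert_blob_unlock(int fd)
{
	assert(fd >= 0);
	flock(fd, LOCK_UN);
	close(fd);
}

/*
 * Updater side (hypothetical helper): try to take the exclusive lock
 * without blocking, e.g. before SNP_COMMIT plus a blob rewrite.
 * Returns the locked fd, or -1 with errno set (EWOULDBLOCK while any
 * VMM still holds its shared lock).
 */
static int cert_blob_try_lock_exclusive(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

A real updater would more likely block on LOCK_EX than poll; the non-blocking variant is used here only so the conflict with an in-flight attestation is easy to observe.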
On Fri, Nov 1, 2024 at 3:04 PM Michael Roth <michael.roth@amd.com> wrote:
>
> On Fri, Nov 01, 2024 at 01:53:26PM -0700, Dionna Amalie Glaze wrote:
> > On Mon, Oct 28, 2024 at 11:20 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Fri, Sep 13, 2024, Dionna Amalie Glaze wrote:
> > > > We can extend the ccp driver to, on extended guest request, lock the
> > > > command buffer, get the REPORTED_TCB, complete the request, unlock the
> > > > command buffer, and return both the response and the REPORTED_TCB at
> > > > the time of the request.
> > >
> > > Holding a lock across an exit to userspace seems wildly unsafe.
> >
> > I wasn't suggesting this. I was suggesting adding a special ccp symbol
> > that would perform two sev commands under the same lock to ensure we
> > know the REPORTED_TCB that was used to derive the VCEK that signs an
> > attestation report in the MSG_REPORT_REQ guest request. We use that
> > atomicity to be sure that when we exit to user space to request
> > certificates that we're getting the right version certificates.
> >
> > >
> > > Can you explain the race that you are trying to close, with the exact "bad" sequence
> > > of events laid out in chronological order, and an explanation of why the race can't
> > > be solved in userspace? I read through your previous comment[*] (which I assume
> > > is the race you want to close?), but I couldn't quite piece together exactly what's
> > > broken.
>
> Hi Dionna,
>
> >
> > 1. the control plane delivers a firmware update. Current TCB version
> > goes up. The machine signals that it needs new certificates before it
> > can commit.
> > 2. VM performs an extended guest request.
> > 3. KVM exits to user space to get certificates before getting the
> > report from firmware.
> > 4. [what I understand Michael Roth was suggesting] User space grabs a
> > file lock to see if it can read the cached certificates. It reads the
> > certificates and releases the lock before returning to KVM.
> > 5. the control plane delivers the certificates to the machine and
> > tells it to commit. The machine grabs the certificate file lock, runs
> > SNP_COMMIT, and releases the file lock. This command updates both
> > COMMITTED_TCB and REPORTED_TCB.
> > 6. KVM asks firmware to complete the MSG_REPORT_REQ request, but it's
> > a different REPORTED_TCB.
> > 7. Guest receives the wrong certificates for certifying the report it
> > just received.
> >
> > The fact that 4 has to release the lock before getting the attestation
> > report is the problem.
>
> We wouldn't actually release the lock before getting the attestation
> report. There are more specifics on the suggested flow in the documentation
> update accompanying this patch:
>
> +    NOTE: In the case of SEV-SNP, the endorsement key used by firmware may
> +    change as a result of management activities like updating SEV-SNP firmware
> +    or loading new endorsement keys, so some care should be taken to keep the
> +    returned certificate data in sync with the actual endorsement key in use by
> +    firmware at the time the attestation request is sent to SNP firmware. The
> +    recommended scheme to do this is:
> +
> +      - The VMM should obtain a shared or exclusive lock on the path the
> +        certificate blob file resides at before reading it and returning it to
> +        KVM, and continue to hold the lock until the attestation request is
> +        actually sent to firmware. To facilitate this, the VMM can set the
> +        ``immediate_exit`` flag of kvm_run just after supplying the certificate
> +        data, and just before resuming the vCPU. This will ensure the vCPU
> +        will exit again to userspace with ``-EINTR`` after it finishes fetching
> +        the attestation request from firmware, at which point the VMM can
> +        safely drop the file lock.
> +
> +      - Tools/libraries that perform updates to SNP firmware TCB values or
> +        endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``,
> +        ``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see
> +        Documentation/virt/coco/sev-guest.rst for more details) in such a way
> +        that the certificate blob needs to be updated, should similarly take an
> +        exclusive lock on the certificate blob for the duration of any updates
> +        to endorsement keys or the certificate blob contents to ensure that
> +        VMMs using the above scheme will not return certificate blob data that
> +        is out of sync with the endorsement key used by firmware.
>
> So #5 would not be able to obtain an exclusive file lock until userspace
> receives confirmation that the attestation request was processed by
> firmware. At that point it will be an accurate reflection of the
> attestation state associated with that particular version of the
> certificates that was fetched from userspace. So at that point the
> transaction is done and userspace can safely release the lock.
>

Thanks for the clarification. I'll need to understand this pathway
better in our VMM to test this patch series effectively. Will get back
to you.

> -Mike
>
> > If we instead get the report and know what the REPORTED_TCB was when
> > serving that request, then we can exit to user space requesting the
> > certificates for the report in hand.
> > A concurrent update can update the reported_tcb like in the above
> > scenario, but it won't interfere with certificates since the machine
> > should have certificates for both TCB_VERSIONs to provide until the
> > commit is complete.
> >
> > I don't think it's workable to have 1 grab the file lock and for 5 to
> > release it. Waiting for a service to update stale certificates should
> > not block user attestation requests. It would make 4's failure to get
> > the lock return VMM_BUSY and eventually cause attestations to time out
> > in sev-guest.
> >
> > >
> > > [*] https://lore.kernel.org/all/CAAH4kHb03Una2kcvyC3W=1ZfANBWF_7a7zsSmWhr_r9g3rCDZw@mail.gmail.com
> >
> >
> >
> > --
> > -Dionna Glaze, PhD, CISSP, CCSP (she/her)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index ecfa25b505e7..2eea9828d9aa 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the
 primary storage for certain register types. Therefore, the kernel may use the
 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
+::
+
+		/* KVM_EXIT_COCO */
+		struct kvm_exit_coco {
+		#define KVM_EXIT_COCO_REQ_CERTS 0
+		#define KVM_EXIT_COCO_MAX 1
+			__u8 nr;
+			__u8 pad0[7];
+			union {
+				struct {
+					__u64 gfn;
+					__u32 npages;
+		#define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1
+		#define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31)
+					__u32 ret;
+				} req_certs;
+			};
+		};
+
+KVM_EXIT_COCO events are intended to handle cases where a confidential
+VM requires some action on the part of userspace, or cases where userspace
+needs to be informed of some activity relating to a confidential VM.
+
+A `kvm_exit_coco` structure is defined to encapsulate the data to be sent to
+or returned by userspace. The `nr` field defines the specific type of event
+that needs to be serviced, and that type is used as a discriminator to
+determine which union type should be used for input/output.
+
+The parameters for each of these event/union types are documented below:
+
+  - ``KVM_EXIT_COCO_REQ_CERTS``
+
+    This event provides a way to request certificate data from userspace and
+    have it written into guest memory. This is intended primarily to handle
+    attestation requests made by SEV-SNP guests (using the Extended Guest
+    Requests GHCB command as defined by the GHCB 2.0 specification for SEV-SNP
+    guests), where additional certificate data corresponding to the
+    endorsement key used by firmware to sign an attestation report can be
+    optionally provided by userspace to pass along to the guest together with
+    the firmware-provided attestation report.
+
+    In the case of ``KVM_EXIT_COCO_REQ_CERTS`` events, the `req_certs` union
+    type is used. KVM will supply in `gfn` the non-private guest page that
+    userspace should use to write the contents of certificate data. In the
+    case of SEV-SNP, the format of this certificate data is defined in the
+    GHCB 2.0 specification (see section "SNP Extended Guest Request"). KVM
+    will also supply in `npages` the number of contiguous pages available
+    for writing the certificate data into.
+
+      - If the supplied number of pages is sufficient, userspace must write
+        the certificate table blob (in the format defined by the GHCB spec)
+        into the address corresponding to `gfn` and set `ret` to 0 to indicate
+        success. If no certificate data is available, then userspace can
+        either write an empty certificate table into the address corresponding
+        to `gfn`, or it can disable ``KVM_EXIT_COCO_REQ_CERTS`` (via
+        ``KVM_CAP_EXIT_COCO``), in which case KVM will handle returning an
+        empty certificate table to the guest.
+
+      - If the number of pages supplied is not sufficient, userspace must set
+        the required number of pages in `npages` and then set `ret` to
+        ``KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN``.
+
+      - If some other error occurred, userspace must set `ret` to
+        ``KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC``.
+
+    NOTE: In the case of SEV-SNP, the endorsement key used by firmware may
+    change as a result of management activities like updating SEV-SNP firmware
+    or loading new endorsement keys, so some care should be taken to keep the
+    returned certificate data in sync with the actual endorsement key in use by
+    firmware at the time the attestation request is sent to SNP firmware. The
+    recommended scheme to do this is:
+
+      - The VMM should obtain a shared or exclusive lock on the path the
+        certificate blob file resides at before reading it and returning it to
+        KVM, and continue to hold the lock until the attestation request is
+        actually sent to firmware. To facilitate this, the VMM can set the
+        ``immediate_exit`` flag of kvm_run just after supplying the certificate
+        data, and just before resuming the vCPU. This will ensure the vCPU
+        will exit again to userspace with ``-EINTR`` after it finishes fetching
+        the attestation request from firmware, at which point the VMM can
+        safely drop the file lock.
+
+      - Tools/libraries that perform updates to SNP firmware TCB values or
+        endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``,
+        ``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see
+        Documentation/virt/coco/sev-guest.rst for more details) in such a way
+        that the certificate blob needs to be updated, should similarly take an
+        exclusive lock on the certificate blob for the duration of any updates
+        to endorsement keys or the certificate blob contents to ensure that
+        VMMs using the above scheme will not return certificate blob data that
+        is out of sync with the endorsement key used by firmware.
 
 6. Capabilities that can be enabled on vCPUs
 ============================================
@@ -8895,6 +8986,24 @@ Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
 production. The behavior and effective ABI for software-protected VMs is
 unstable.
 
+8.42 KVM_CAP_EXIT_COCO
+----------------------
+
+:Capability: KVM_CAP_EXIT_COCO
+:Architectures: x86
+:Type: vm
+
+This capability, if enabled, will cause KVM to exit to userspace with
+KVM_EXIT_COCO exit reason to process certain events related to confidential
+guests.
+
+Calling KVM_CHECK_EXTENSION for this capability will return a bitmask of
+KVM_EXIT_COCO event types that can be configured to exit to userspace.
+
+The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
+of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
+the event types whose corresponding bit is in the argument.
+
 9. Known KVM API problems
 =========================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cef323c801f2..4b90208f9df0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1429,6 +1429,7 @@ struct kvm_arch {
 	struct kvm_x86_msr_filter __rcu *msr_filter;
 
 	u32 hypercall_exit_enabled;
+	u64 coco_exit_enabled;
 
 	/* Guest can access the SGX PROVISIONKEY. */
 	bool sgx_provisioning_allowed;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6968eadd418..94c3a82b02c7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -125,6 +125,8 @@ static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
 #define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
 				    KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
 
+#define KVM_EXIT_COCO_VALID_MASK 0
+
 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
 static void process_nmi(struct kvm_vcpu *vcpu);
 static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
@@ -4826,6 +4828,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VM_TYPES:
 		r = kvm_caps.supported_vm_types;
 		break;
+	case KVM_CAP_EXIT_COCO:
+		r = KVM_EXIT_COCO_VALID_MASK;
+		break;
 	default:
 		break;
 	}
@@ -6748,6 +6753,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_EXIT_COCO:
+		if (cap->args[0] & ~KVM_EXIT_COCO_VALID_MASK) {
+			r = -EINVAL;
+			break;
+		}
+		kvm->arch.coco_exit_enabled = cap->args[0];
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e5af8c692dc0..8a3a76679224 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -135,6 +135,22 @@ struct kvm_xen_exit {
 	} u;
 };
 
+struct kvm_exit_coco {
+#define KVM_EXIT_COCO_REQ_CERTS 0
+#define KVM_EXIT_COCO_MAX 1
+	__u8 nr;
+	__u8 pad0[7];
+	union {
+		struct {
+			__u64 gfn;
+			__u32 npages;
+#define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1
+#define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31)
+			__u32 ret;
+		} req_certs;
+	};
+};
+
 #define KVM_S390_GET_SKEYS_NONE 1
 #define KVM_S390_SKEYS_MAX 1048576
 
@@ -178,6 +194,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_NOTIFY 37
 #define KVM_EXIT_LOONGARCH_IOCSR 38
 #define KVM_EXIT_MEMORY_FAULT 39
+#define KVM_EXIT_COCO 40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -433,6 +450,8 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
+		/* KVM_EXIT_COCO */
+		struct kvm_exit_coco coco;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -918,6 +937,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
 #define KVM_CAP_PRE_FAULT_MEMORY 236
+#define KVM_CAP_EXIT_COCO 237
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
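
For reviewers who want to picture the userspace side of ``KVM_EXIT_COCO_REQ_CERTS``, the three outcomes listed in the api.rst text could be handled roughly as below. This is a hedged sketch, not code from the series: the struct and constants are a local mirror of this version of the patch (the review comments suggest the error codes may still change), `handle_req_certs()` and `COCO_PAGE_SIZE` are hypothetical, and translating `gfn` to a host pointer is VMM-specific and left to the caller. Rejecting an empty blob is a choice made here for simplicity; the patch text allows writing an empty certificate table instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed uAPI; not present in released headers. */
#define KVM_EXIT_COCO_REQ_CERTS 0
#define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1
/* Unsigned here to avoid shifting into the sign bit of an int. */
#define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1u << 31)

struct kvm_exit_coco {
	uint8_t nr;
	uint8_t pad0[7];
	union {
		struct {
			uint64_t gfn;
			uint32_t npages;
			uint32_t ret;
		} req_certs;
	};
};

static_assert(sizeof(struct kvm_exit_coco) == 24,
	      "layout should match the struct in the patch");

#define COCO_PAGE_SIZE 4096u	/* assumption: 4 KiB guest pages */

/*
 * Hypothetical VMM handler. `dst` is the host mapping of the guest
 * buffer that starts at coco->req_certs.gfn; `blob` is a certificate
 * table already formatted per the GHCB spec by the caller.
 */
static void handle_req_certs(struct kvm_exit_coco *coco, void *dst,
			     const void *blob, size_t blob_len)
{
	size_t needed;

	if (coco->nr != KVM_EXIT_COCO_REQ_CERTS || !dst || !blob ||
	    blob_len == 0) {
		coco->req_certs.ret = KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC;
		return;
	}

	/* Round the blob size up to whole pages. */
	needed = (blob_len + COCO_PAGE_SIZE - 1) / COCO_PAGE_SIZE;
	if (needed > coco->req_certs.npages) {
		/* Tell the guest how many pages it must supply. */
		coco->req_certs.npages = (uint32_t)needed;
		coco->req_certs.ret = KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN;
		return;
	}

	memcpy(dst, blob, blob_len);
	coco->req_certs.ret = 0;
}
```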
Confidential VMs have a number of additional requirements on the host
side which might involve interactions with userspace. One such case is
with SEV-SNP guests, where the host can optionally provide a certificate
table to the guest when it issues an attestation request to firmware
(see GHCB 2.0 specification regarding "SNP Extended Guest Requests").
This certificate table can then be used to verify the endorsement key
used by firmware to sign the attestation report.

While it is possible for guests to obtain the certificates through other
means, handling it via the host provides more flexibility in being able
to keep the certificate data in sync with the endorsement key throughout
host-side operations that might result in the endorsement key changing.

In the case of KVM, userspace will be responsible for fetching the
certificate table and keeping it in sync with any modifications to the
endorsement key. Define a new KVM_EXIT_* event where userspace is
provided with the GPA of the buffer the guest has provided as part of
the attestation request so that userspace can write the certificate data
into it.

Since there is potential for additional CoCo-related events in the
future, introduce this in the form of a more general KVM_EXIT_COCO exit
type that handles multiple sub-types, similarly to KVM_EXIT_HYPERCALL,
and then define a KVM_EXIT_COCO_REQ_CERTS sub-type to handle the actual
certificate-fetching mentioned above. Also introduce a KVM_CAP_EXIT_COCO
capability to enable/disable individual sub-types, similarly to
KVM_CAP_EXIT_HYPERCALL.

Since actual support for KVM_EXIT_COCO_REQ_CERTS will be enabled in a
subsequent patch, don't yet allow it to be enabled.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/kvm/api.rst  | 109 ++++++++++++++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/x86.c              |  13 ++++
 include/uapi/linux/kvm.h        |  20 ++++++
 4 files changed, 143 insertions(+)
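
The capability negotiation described in the commit message (KVM_CHECK_EXTENSION returns a bitmask, KVM_ENABLE_CAP accepts a subset of it) can be modelled in a few lines. This is a standalone sketch rather than the kernel code: `struct coco_state`, `coco_enable_cap()`, `coco_exit_wanted()` and `coco_event_bit()` are hypothetical names, and the mapping of event number `nr` to bit `1 << nr` is an assumption that the patch does not spell out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define KVM_EXIT_COCO_REQ_CERTS 0
#define KVM_EXIT_COCO_MAX 1

/* Assumption: event type `nr` is represented by bit `nr` of the mask. */
static inline uint64_t coco_event_bit(unsigned int nr)
{
	assert(nr < 64);
	return UINT64_C(1) << nr;
}

/* Stand-in for the per-VM field added to struct kvm_arch. */
struct coco_state {
	uint64_t coco_exit_enabled;
};

/*
 * Mirrors the KVM_ENABLE_CAP check in the patch: any bit outside the
 * advertised mask is rejected, otherwise the request is stored as-is.
 */
static int coco_enable_cap(struct coco_state *s, uint64_t valid_mask,
			   uint64_t requested)
{
	if (requested & ~valid_mask)
		return -EINVAL;
	s->coco_exit_enabled = requested;
	return 0;
}

/* Would a given event type be forwarded to userspace? */
static int coco_exit_wanted(const struct coco_state *s, unsigned int nr)
{
	return nr < KVM_EXIT_COCO_MAX &&
	       (s->coco_exit_enabled & coco_event_bit(nr)) != 0;
}
```

With the valid mask set to 0, as in this patch, every non-zero request fails with -EINVAL, which is how the commit keeps KVM_EXIT_COCO_REQ_CERTS disabled until the follow-up patch.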