Message ID | 20210707183616.5620-39-brijesh.singh@amd.com (mailing list archive)
---|---
State | New
Series | Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
On Wed, Jul 07, 2021, Brijesh Singh wrote:
> Version 2 of GHCB specification added the support two SNP Guest Request
> Message NAE event. The events allows for an SEV-SNP guest to make request
> to the SEV-SNP firmware through hypervisor using the SNP_GUEST_REQUEST
> API define in the SEV-SNP firmware specification.

IIUC, this snippet in the spec means KVM can't restrict what requests are made
by the guests. If so, that makes it difficult to detect/ratelimit a misbehaving
guest, and also limits our options if there are firmware issues (hopefully there
aren't). E.g. ratelimiting a guest after KVM has explicitly requested it to
migrate is not exactly desirable.

  The hypervisor cannot alter the messages without detection nor read the
  plaintext of the messages.

> The SNP_GUEST_REQUEST requires two unique pages, one page for the request
> and one page for the response. The response page need to be in the firmware
> state. The GHCB specification says that both the pages need to be in the
> hypervisor state but before executing the SEV-SNP command the response page
> need to be in the firmware state.

...

> Now that KVM supports all the VMGEXIT NAEs required for the base SEV-SNP
> feature, set the hypervisor feature to advertise it.

It would be helpful if this changelog listed the Guest Requests that are
required for "base" SNP, e.g. to provide some insight as to why we care about
guest requests.

>  static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -1618,6 +1631,12 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	if (rc)
>  		goto e_free_context;
>
> +	/* Used for rate limiting SNP guest message request, use the default settings */
> +	ratelimit_default_init(&sev->snp_guest_msg_rs);

Is this exposed to userspace in any way? This feels very much like a knob that
needs to be configurable per-VM.

Also, what are the estimated latencies of a guest request? If the worst case
latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a
whole lot.

> +static void snp_handle_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
> +				     gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request data = {};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_sev_info *sev;
> +	int rc, err = 0;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = -ENODEV;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> +		pr_info_ratelimited("svm: too many guest message requests\n");
> +		rc = -EAGAIN;

What guarantee do we have that the guest actually understands -EAGAIN? Ditto
for -EINVAL returned by snp_build_guest_buf(). AFAICT, our options are to
return one of the error codes defined in "Table 95. Status Codes for
SNP_GUEST_REQUEST" of the firmware ABI, kill the guest, or ratelimit the guest
without returning control to the guest.

> +		goto e_fail;
> +	}
> +
> +	rc = snp_build_guest_buf(svm, &data, req_gpa, resp_gpa);
> +	if (rc)
> +		goto e_fail;
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	mutex_lock(&kvm->lock);

Question on the VMPCK sequences. The firmware ABI says:

  Each guest has four VMPCKs ... Each message contains a sequence number per
  VMPCK. The sequence number is incremented with each message sent. Messages
  sent by the guest to the firmware and by the firmware to the guest must be
  delivered in order. If not, the firmware will reject subsequent messages ...

Does that mean there are four independent sequences, i.e. four streams the
guest can use "concurrently", or does it mean the overall freshness/integrity
check is composed from four VMPCK sequences, all of which must be correct for
the message to be valid?

If it's the latter, then a traditional mutex isn't really necessary because the
guest must implement its own serialization, e.g. its own mutex or whatever, to
ensure there is at most one request in-flight at any given time. And on the KVM
side it means KVM can simply reject requests if there is already an in-flight
request. It might also give us more/better options for ratelimiting?

> +	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> +	if (rc) {
> +		mutex_unlock(&kvm->lock);

I suspect you reused this pattern from other, more complex code, but here it's
overkill. E.g.

	if (!rc)
		rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
	else if (err)
		rc = err;

	mutex_unlock(&kvm->lock);

	ghcb_set_sw_exit_info_2(ghcb, rc);

> +		/* If we have a firmware error code then use it. */
> +		if (err)
> +			rc = err;
> +
> +		goto e_fail;
> +	}
> +
> +	/* Copy the response after the firmware returns success. */
> +	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
> +
> +	mutex_unlock(&kvm->lock);
> +
> +e_fail:
> +	ghcb_set_sw_exit_info_2(ghcb, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
> +					 gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request req = {};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	unsigned long data_npages;
> +	struct kvm_sev_info *sev;
> +	unsigned long err;
> +	u64 data_gpa;
> +	int rc;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = -ENODEV;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> +		pr_info_ratelimited("svm: too many guest message requests\n");
> +		rc = -EAGAIN;
> +		goto e_fail;
> +	}
> +
> +	if (!sev->snp_certs_data) {
> +		pr_err("svm: certs data memory is not allocated\n");
> +		rc = -EFAULT;

Another instance where the kernel's error numbers will not suffice.

> +		goto e_fail;
> +	}
> +
> +	data_gpa = ghcb_get_rax(ghcb);
> +	data_npages = ghcb_get_rbx(ghcb);
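For illustration, the error-handling restructuring suggested above can be sketched outside the kernel. The stub globals and helper names below are hypothetical stand-ins for sev_issue_cmd() and kvm_write_guest(), just to show how the flow collapses: copy the response only on success, prefer the firmware error code when one is available, and set the exit code exactly once.

```c
#include <assert.h>

/* Hypothetical stand-ins for the KVM/PSP helpers in the patch. */
static int fake_issue_cmd_rc;   /* simulated sev_issue_cmd() return */
static int fake_issue_cmd_err;  /* simulated firmware error code */
static int fake_write_guest_rc; /* simulated kvm_write_guest() return */

static int sev_issue_cmd_stub(int *err)
{
	*err = fake_issue_cmd_err;
	return fake_issue_cmd_rc;
}

static int kvm_write_guest_stub(void)
{
	return fake_write_guest_rc;
}

/*
 * Sketch of the suggested flow: issue the command, copy the response
 * only on success, use the firmware error code when the command fails
 * and one is available, and report the result exactly once.
 */
static int handle_guest_request(void)
{
	int rc, err = 0;

	/* lock would be taken here */
	rc = sev_issue_cmd_stub(&err);
	if (!rc)
		rc = kvm_write_guest_stub();
	else if (err)
		rc = err;
	/* lock would be released here */

	return rc; /* value that would go to ghcb_set_sw_exit_info_2() */
}
```

The single unlock point is the whole improvement: no duplicated mutex_unlock() on the error path.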
On 7/19/21 5:50 PM, Sean Christopherson wrote:

...

> IIUC, this snippet in the spec means KVM can't restrict what requests are made
> by the guests. If so, that makes it difficult to detect/ratelimit a misbehaving
> guest, and also limits our options if there are firmware issues (hopefully there
> aren't). E.g. ratelimiting a guest after KVM has explicitly requested it to
> migrate is not exactly desirable.

The guest message page contains a message header followed by the encrypted
payload. So, technically KVM can peek into the message header format to
determine the message request type. If needed, we can ratelimit based on the
message type. In the current series we don't support migration etc so I
decided to ratelimit unconditionally.

...

>> Now that KVM supports all the VMGEXIT NAEs required for the base SEV-SNP
>> feature, set the hypervisor feature to advertise it.
>
> It would helpful if this changelog listed the Guest Requests that are required
> for "base" SNP, e.g. to provide some insight as to why we care about guest
> requests.

Sure, I'll add more.

>>  static int snp_bind_asid(struct kvm *kvm, int *error)
>> @@ -1618,6 +1631,12 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>  	if (rc)
>>  		goto e_free_context;
>>
>> +	/* Used for rate limiting SNP guest message request, use the default settings */
>> +	ratelimit_default_init(&sev->snp_guest_msg_rs);
>
> Is this exposed to userspace in any way? This feels very much like a knob that
> needs to be configurable per-VM.

It's not exposed to the userspace and I am not sure if userspace cares about
this knob.

> Also, what are the estimated latencies of a guest request? If the worst case
> latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a
> whole lot.

The latency will depend on what else is going on in the system at the time the
request comes to the hypervisor. Access to the PSP is serialized so other
parallel PSP command execution will contribute to the latency.

...

>> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
>> +		pr_info_ratelimited("svm: too many guest message requests\n");
>> +		rc = -EAGAIN;
>
> What guarantee do we have that the guest actually understands -EAGAIN? Ditto
> for -EINVAL returned by snp_build_guest_buf(). AFAICT, our options are to return
> one of the error codes defined in "Table 95. Status Codes for SNP_GUEST_REQUEST"
> of the firmware ABI, kill the guest, or ratelimit the guest without returning
> control to the guest.

Yes, let me look into passing one of the status codes defined in the spec.

>> +	rc = snp_build_guest_buf(svm, &data, req_gpa, resp_gpa);
>> +	if (rc)
>> +		goto e_fail;
>> +
>> +	sev = &to_kvm_svm(kvm)->sev_info;
>> +
>> +	mutex_lock(&kvm->lock);
>
> Question on the VMPCK sequences. The firmware ABI says:
>
>   Each guest has four VMPCKs ... Each message contains a sequence number per
>   VMPCK. The sequence number is incremented with each message sent. Messages
>   sent by the guest to the firmware and by the firmware to the guest must be
>   delivered in order. If not, the firmware will reject subsequent messages ...
>
> Does that mean there are four independent sequences, i.e. four streams the guest
> can use "concurrently", or does it mean the overall freshness/integrity check is
> composed from four VMPCK sequences, all of which must be correct for the message
> to be valid?

There are four independent sequence counters and in theory the guest can use
them concurrently. But the access to the PSP must be serialized. Currently, the
guest driver uses the VMPCK0 key to communicate with the PSP.

> If it's the latter, then a traditional mutex isn't really necessary because the
> guest must implement its own serialization, e.g. its own mutex or whatever, to
> ensure there is at most one request in-flight at any given time.

The guest driver uses its own serialization to ensure that there is *exactly*
one request in-flight. The mutex used here is to protect KVM's internal
firmware response buffer.

> And on the KVM side it means KVM can simply reject requests if there is already
> an in-flight request. It might also give us more/better options for
> ratelimiting?

I don't think we should be running into this scenario unless there is a bug in
the guest kernel. The guest kernel support and CCP driver both ensure that the
request to the PSP is serialized. In normal operation we may see 1 to 2 guest
requests for the entire guest lifetime. I am thinking the first request may be
for the attestation report and the second may be to derive keys etc. It may
change slightly when we add the migration command; I have not looked into it in
great detail yet.

thanks
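The header "peek" described above can be sketched in userspace. The struct layout follows the SNP_GUEST_REQUEST message header described in the SEV-SNP firmware ABI as used by the Linux sev-guest driver, but treat the exact offsets and the message-type values below as assumptions for illustration, not KVM code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plaintext header that precedes the encrypted payload in the guest
 * message page; layout per the SEV-SNP firmware ABI (assumed here).
 */
struct snp_guest_msg_hdr {
	uint8_t  authtag[32];
	uint64_t msg_seqno;
	uint8_t  rsvd1[8];
	uint8_t  algo;
	uint8_t  hdr_version;
	uint16_t hdr_sz;
	uint8_t  msg_type;
	uint8_t  msg_version;
	uint16_t msg_sz;
	uint32_t rsvd2;
	uint8_t  msg_vmpck;
	uint8_t  rsvd3[35];
};

/* Hypothetical message-type values, for illustration only. */
enum { MSG_CPUID_REQ = 1, MSG_KEY_REQ = 3, MSG_REPORT_REQ = 5, MSG_EXPORT_REQ = 7 };

/* Only forward request types this hypervisor explicitly supports. */
static int hv_allows_msg_type(const struct snp_guest_msg_hdr *hdr)
{
	switch (hdr->msg_type) {
	case MSG_CPUID_REQ:
	case MSG_KEY_REQ:
	case MSG_REPORT_REQ:
		return 1;
	default:	/* e.g. migration/export requests are rejected */
		return 0;
	}
}
```

Because only the payload is encrypted, a filter like this (or a per-type ratelimit) can run before the request ever reaches the PSP.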
On Tue, Jul 20, 2021, Brijesh Singh wrote:
>
> On 7/19/21 5:50 PM, Sean Christopherson wrote:
> ...
> >
> > IIUC, this snippet in the spec means KVM can't restrict what requests are made
> > by the guests. If so, that makes it difficult to detect/ratelimit a misbehaving
> > guest, and also limits our options if there are firmware issues (hopefully there
> > aren't). E.g. ratelimiting a guest after KVM has explicitly requested it to
> > migrate is not exactly desirable.
>
> The guest message page contains a message header followed by the encrypted
> payload. So, technically KVM can peek into the message header format to
> determine the message request type. If needed, we can ratelimit based on the
> message type.

Ah, I got confused by this code in snp_build_guest_buf():

	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);

I was thinking that setting the C-bit meant the memory was guest private, but
that's setting the C-bit for the HPA, which is correct since KVM installs guest
memory with C-bit=1 in the NPT, i.e. encrypts shared memory with the host key.

Tangentially related question, is it correct to say that the host can _read_
memory from a page that is assigned=1, but has asid=0? I.e. KVM can read the
response page in order to copy it into the guest, even though it is a firmware
page?

	/* Copy the response after the firmware returns success. */
	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);

> In the current series we don't support migration etc so I decided to
> ratelimit unconditionally.

Since KVM can peek at the request header, KVM should flat out disallow requests
that KVM doesn't explicitly support. E.g. migration requests should not be sent
to the PSP.

One concern though: How does the guest query what requests are supported? This
snippet implies there's some form of enumeration:

  Note: This guest message may be removed in future versions as it is redundant
  with the CPUID page in SNP_LAUNCH_UPDATE (see Section 8.14).

But all I can find is a "Message Version" in "Table 94. Message Type Encodings",
which implies that request support is all or nothing for a given version. That
would be rather unfortunate as KVM has no way to tell the guest that something
is unsupported :-(

> > Is this exposed to userspace in any way? This feels very much like a knob that
> > needs to be configurable per-VM.
>
> It's not exposed to the userspace and I am not sure if userspace care about
> this knob.

Userspace definitely cares, otherwise the system would need to be rebooted just
to tune the ratelimiting. And userspace may want to disable ratelimiting
entirely, e.g. if the entire system is dedicated to a single VM.

> > Also, what are the estimated latencies of a guest request? If the worst case
> > latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a
> > whole lot.
>
> The latency will depend on what else is going in the system at the time the
> request comes to the hypervisor. Access to the PSP is serialized so other
> parallel PSP command execution will contribute to the latency.

I get that it will be variable, but what are some ballpark latencies? E.g.
what's the latency of the slowest command without PSP contention?

> > Question on the VMPCK sequences. The firmware ABI says:
> >
> >   Each guest has four VMPCKs ... Each message contains a sequence number per
> >   VMPCK. The sequence number is incremented with each message sent. Messages
> >   sent by the guest to the firmware and by the firmware to the guest must be
> >   delivered in order. If not, the firmware will reject subsequent messages ...
> >
> > Does that mean there are four independent sequences, i.e. four streams the guest
> > can use "concurrently", or does it mean the overall freshness/integrity check is
> > composed from four VMPCK sequences, all of which must be correct for the message
> > to be valid?
>
> There are four independent sequence counter and in theory guest can use them
> concurrently. But the access to the PSP must be serialized.

Technically that's not required from the guest's perspective, correct? The guest
only cares about the sequence numbers for a given VMPCK, e.g. it can have one
in-flight request per VMPCK and expect that to work, even without fully
serializing its own requests.

Out of curiosity, why 4 VMPCKs? It seems completely arbitrary.

> Currently, the guest driver uses the VMPCK0 key to communicate with the PSP.
>
> > If it's the latter, then a traditional mutex isn't really necessary because the
> > guest must implement its own serialization, e.g. its own mutex or whatever, to
> > ensure there is at most one request in-flight at any given time.
>
> The guest driver uses the its own serialization to ensure that there is
> *exactly* one request in-flight.

But KVM can't rely on that because it doesn't control the guest, e.g. it may be
running a non-Linux guest.

> The mutex used here is to protect the KVM's internal firmware response
> buffer.

Ya, where I was going with my question was that if the guest was architecturally
restricted to a single in-flight request, then KVM could do something like this
instead of taking kvm->lock (bad pseudocode):

	if (test_and_set(sev->guest_request)) {
		rc = AEAD_OFLOW;
		goto fail;
	}

	<do request>

	clear_bit(...)

I.e. multiple in-flight requests can't work because the guest can guarantee
ordering between vCPUs. But, because the guest can theoretically have up to four
in-flight requests, it's not that simple.

The reason I'm going down this path is that taking kvm->lock inside vcpu->mutex
violates KVM's locking rules, i.e. is susceptible to deadlocks. Per
kvm/locking.rst,

  - kvm->lock is taken outside vcpu->mutex

That means a different mutex is needed to protect the guest request pages.

> > And on the KVM side it means KVM can simply reject requests if there is
> > already an in-flight request. It might also give us more/better options
> > for ratelimiting?
>
> I don't think we should be running into this scenario unless there is a bug
> in the guest kernel. The guest kernel support and CCP driver both ensure
> that request to the PSP is serialized.

Again, what the Linux kernel does is irrelevant. What matters is what is
architecturally allowed.

> In normal operation we may see 1 to 2 guest requests for the entire guest
> lifetime. I am thinking first request maybe for the attestation report and
> second maybe to derive keys etc. It may change slightly when we add the
> migration command; I have not looked into a great detail yet.
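The "bad pseudocode" above can be made concrete with a userspace sketch; a C11 atomic_flag stands in for KVM's test_and_set_bit()/clear_bit(), and the AEAD_OFLOW value here is a placeholder, not the spec's encoding.

```c
#include <assert.h>
#include <stdatomic.h>

/* Placeholder for the firmware status code the guest would see. */
#define AEAD_OFLOW 3

/* Per-VM "request in flight" flag, analogous to sev->guest_request. */
static atomic_flag guest_request_busy = ATOMIC_FLAG_INIT;

/*
 * Reject a request outright if another one is already in flight,
 * instead of serializing callers on a mutex. Returns the value that
 * would be written to sw_exit_info_2.
 */
static int try_guest_request(void)
{
	int rc;

	if (atomic_flag_test_and_set(&guest_request_busy))
		return AEAD_OFLOW;	/* already an in-flight request */

	rc = 0;				/* <do request> */

	atomic_flag_clear(&guest_request_busy);
	return rc;
}
```

As the thread notes, this only works if the architecture limits the guest to one in-flight request; with up to four VMPCK streams, a rejected concurrent request must map to a status code the guest is prepared to retry.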
On 7/20/21 11:28 AM, Sean Christopherson wrote:
>
> Ah, I got confused by this code in snp_build_guest_buf():
>
> 	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
>
> I was thinking that setting the C-bit meant the memory was guest private, but
> that's setting the C-bit for the HPA, which is correct since KVM installs guest
> memory with C-bit=1 in the NPT, i.e. encrypts shared memory with the host key.
>
> Tangentially related question, is it correct to say that the host can _read_
> memory from a page that is assigned=1, but has asid=0? I.e. KVM can read the
> response page in order to copy it into the guest, even though it is a firmware
> page?

Yes. The firmware page means that x86 cannot write to it; the read is still
allowed.

> 	/* Copy the response after the firmware returns success. */
> 	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
>
>> In the current series we don't support migration etc so I decided to
>> ratelimit unconditionally.
>
> Since KVM can peek at the request header, KVM should flat out disallow requests
> that KVM doesn't explicitly support. E.g. migration requests should not be sent
> to the PSP.

That is acceptable.

> One concern though: How does the guest query what requests are supported? This
> snippet implies there's some form of enumeration:
>
>   Note: This guest message may be removed in future versions as it is redundant
>   with the CPUID page in SNP_LAUNCH_UPDATE (see Section 8.14).
>
> But all I can find is a "Message Version" in "Table 94. Message Type Encodings",
> which implies that request support is all or nothing for a given version. That
> would be rather unfortunate as KVM has no way to tell the guest that something
> is unsupported :-(

The firmware supports all the commands listed in the spec. The HV support is
always going to be behind what a firmware or hardware is capable of doing. As
far as the spec is concerned, it says:

  The firmware checks that MSG_TYPE is a valid message type. The firmware then
  checks that MSG_SIZE is large enough to hold the indicated message type at
  the indicated message version. If not, the firmware returns INVALID_PARAM.

So, a hypervisor could potentially send INVALID_PARAM to indicate to the guest
that a message type is not supported.

>>> Is this exposed to userspace in any way? This feels very much like a knob that
>>> needs to be configurable per-VM.
>>
>> It's not exposed to the userspace and I am not sure if userspace care about
>> this knob.
>
> Userspace definitely cares, otherwise the system would need to be rebooted just
> to tune the ratelimiting. And userspace may want to disable ratelimiting
> entirely, e.g. if the entire system is dedicated to a single VM.

Ok.

>>> Also, what are the estimated latencies of a guest request? If the worst case
>>> latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a
>>> whole lot.
>>
>> The latency will depend on what else is going in the system at the time the
>> request comes to the hypervisor. Access to the PSP is serialized so other
>> parallel PSP command execution will contribute to the latency.
>
> I get that it will be variable, but what are some ballpark latencies? E.g.
> what's the latency of the slowest command without PSP contention?

In my single VM, I am seeing latency of a guest request to be around ~35ms.

>>> Question on the VMPCK sequences. The firmware ABI says:
>>>
>>>   Each guest has four VMPCKs ... Each message contains a sequence number per
>>>   VMPCK. The sequence number is incremented with each message sent. Messages
>>>   sent by the guest to the firmware and by the firmware to the guest must be
>>>   delivered in order. If not, the firmware will reject subsequent messages ...
>>>
>>> Does that mean there are four independent sequences, i.e. four streams the guest
>>> can use "concurrently", or does it mean the overall freshness/integrity check is
>>> composed from four VMPCK sequences, all of which must be correct for the message
>>> to be valid?
>>
>> There are four independent sequence counter and in theory guest can use them
>> concurrently. But the access to the PSP must be serialized.
>
> Technically that's not required from the guest's perspective, correct?

Correct.

> The guest only cares about the sequence numbers for a given VMPCK, e.g. it can
> have one in-flight request per VMPCK and expect that to work, even without
> fully serializing its own requests.
>
> Out of curiosity, why 4 VMPCKs? It seems completely arbitrary.

I believe the thought process was that by providing 4 keys it can provide
flexibility for each VMPL level to use a different key (if they wish). The
firmware does not care about the VMPL level during the guest request handling,
it just wants to know which key is used for encrypting the payload so that it
can decrypt and provide the response for it.

>> Currently, the guest driver uses the VMPCK0 key to communicate with the PSP.
>>
>>> If it's the latter, then a traditional mutex isn't really necessary because the
>>> guest must implement its own serialization, e.g. its own mutex or whatever, to
>>> ensure there is at most one request in-flight at any given time.
>>
>> The guest driver uses the its own serialization to ensure that there is
>> *exactly* one request in-flight.
>
> But KVM can't rely on that because it doesn't control the guest, e.g. it may be
> running a non-Linux guest.

Yes, KVM should not rely on it. I mentioned that mainly because you said that
the guest must implement its own serialization. In the case of KVM, the CCP
driver ensures that the command sent to the PSP is serialized.

>> The mutex used here is to protect the KVM's internal firmware response
>> buffer.
>
> Ya, where I was going with my question was that if the guest was architecturally
> restricted to a single in-flight request, then KVM could do something like this
> instead of taking kvm->lock (bad pseudocode):
>
> 	if (test_and_set(sev->guest_request)) {
> 		rc = AEAD_OFLOW;
> 		goto fail;
> 	}
>
> 	<do request>
>
> 	clear_bit(...)
>
> I.e. multiple in-flight requests can't work because the guest can guarantee
> ordering between vCPUs. But, because the guest can theoretically have up to four
> in-flight requests, it's not that simple.
>
> The reason I'm going down this path is that taking kvm->lock inside vcpu->mutex
> violates KVM's locking rules, i.e. is susceptible to deadlocks. Per
> kvm/locking.rst,
>
>   - kvm->lock is taken outside vcpu->mutex
>
> That means a different mutex is needed to protect the guest request pages.

Ah, I see your point on the locking. Architecturally, a guest can issue
multiple requests in parallel. It sounds like having a separate lock to protect
the guest request pages makes sense.

-Brijesh
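The agreed direction, a per-VM lock dedicated to the firmware response buffer rather than kvm->lock, can be sketched in userspace with pthreads standing in for kernel mutexes. The struct and the name guest_req_mutex are hypothetical illustrations, not the eventual KVM field names.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/*
 * Sketch: protect only KVM's internal firmware response buffer with a
 * dedicated per-VM mutex (hypothetically guest_req_mutex) so the
 * handler never takes the VM-wide lock inside vcpu->mutex.
 */
struct sev_info_sketch {
	pthread_mutex_t guest_req_mutex;	/* dedicated lock */
	char resp_page[4096];			/* stands in for snp_resp_page */
};

static int handle_request(struct sev_info_sketch *sev, const char *req,
			  char *resp_out)
{
	pthread_mutex_lock(&sev->guest_req_mutex);

	/* "firmware" writes the response into the shared buffer ... */
	strncpy(sev->resp_page, req, sizeof(sev->resp_page) - 1);
	/* ... and the hypervisor copies it out to the guest response page */
	memcpy(resp_out, sev->resp_page, sizeof(sev->resp_page));

	pthread_mutex_unlock(&sev->guest_req_mutex);
	return 0;
}
```

Scoping the lock to the shared response buffer keeps the critical section to the firmware round-trip plus the copy-out, and sidesteps the kvm->lock ordering rule entirely.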
On Tue, Jul 20, 2021, Brijesh Singh wrote:
>
> On 7/20/21 11:28 AM, Sean Christopherson wrote:
> > Out of curiosity, why 4 VMPCKs? It seems completely arbitrary.
>
> I believe the thought process was by providing 4 keys it can provide
> flexibility for each VMPL levels to use a different keys (if they wish). The
> firmware does not care about the vmpl level during the guest request
> handling, it just want to know which key is used for encrypting the payload
> so that he can decrypt and provide the response for it.

Ah, I forgot about VMPLs. That makes sense. Thanks!
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 53a60edc810e..4cb4c1d7e444 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -18,6 +18,8 @@
 #include <linux/processor.h>
 #include <linux/trace_events.h>
 #include <linux/sev.h>
+#include <linux/kvm_host.h>
+#include <linux/sev-guest.h>
 
 #include <asm/fpu/internal.h>
 #include <asm/trapnr.h>
@@ -1534,6 +1536,7 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	struct sev_data_snp_gctx_create data = {};
 	void *context;
 	int rc;
@@ -1543,14 +1546,24 @@ static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (!context)
 		return NULL;
 
-	data.gctx_paddr = __psp_pa(context);
-	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
-	if (rc) {
+	/* Allocate a firmware buffer used during the guest command handling. */
+	sev->snp_resp_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!sev->snp_resp_page) {
 		snp_free_firmware_page(context);
 		return NULL;
 	}
 
+	data.gctx_paddr = __psp_pa(context);
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+	if (rc)
+		goto e_free;
+
 	return context;
+
+e_free:
+	snp_free_firmware_page(context);
+	snp_free_firmware_page(sev->snp_resp_page);
+	return NULL;
 }
 
 static int snp_bind_asid(struct kvm *kvm, int *error)
@@ -1618,6 +1631,12 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (rc)
 		goto e_free_context;
 
+	/* Used for rate limiting SNP guest message request, use the default settings */
+	ratelimit_default_init(&sev->snp_guest_msg_rs);
+
+	/* Allocate memory used for the certs data in SNP guest request */
+	sev->snp_certs_data = kmalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
+
 	return 0;
 
 e_free_context:
@@ -2218,6 +2237,9 @@ static int snp_decommission_context(struct kvm *kvm)
 	snp_free_firmware_page(sev->snp_context);
 	sev->snp_context = NULL;
 
+	/* Free the response page. */
+	snp_free_firmware_page(sev->snp_resp_page);
+
 	return 0;
 }
 
@@ -2268,6 +2290,9 @@ void sev_vm_destroy(struct kvm *kvm)
 		sev_unbind_asid(kvm, sev->handle);
 	}
 
+	kfree(sev->snp_certs_data);
+
 	sev_asid_free(sev);
 }
 
@@ -2663,6 +2688,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FT:
 	case SVM_VMGEXIT_PSC:
+	case SVM_VMGEXIT_GUEST_REQUEST:
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
 		break;
 	default:
 		goto vmgexit_err;
@@ -3053,6 +3080,181 @@ static unsigned long snp_handle_psc(struct vcpu_svm *svm, struct ghcb *ghcb)
 	return rc ? map_to_psc_vmgexit_code(rc) : 0;
 }
 
+static int snp_build_guest_buf(struct vcpu_svm *svm, struct sev_data_snp_guest_request *data,
+			       gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	kvm_pfn_t req_pfn, resp_pfn;
+	struct kvm_sev_info *sev;
+
+	if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE)) {
+		pr_err_ratelimited("svm: guest request (%#llx) or response (%#llx) is not page aligned\n",
+				   req_gpa, resp_gpa);
+		return -EINVAL;
+	}
+
+	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+	if (is_error_noslot_pfn(req_pfn)) {
+		pr_err_ratelimited("svm: guest request invalid gpa=%#llx\n", req_gpa);
+		return -EINVAL;
+	}
+
+	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+	if (is_error_noslot_pfn(resp_pfn)) {
+		pr_err_ratelimited("svm: guest response invalid gpa=%#llx\n", resp_gpa);
+		return -EINVAL;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+	data->res_paddr = __psp_pa(sev->snp_resp_page);
+
+	return 0;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
+				     gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request data = {};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_sev_info *sev;
+	int rc, err = 0;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = -ENODEV;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
+		pr_info_ratelimited("svm: too many guest message requests\n");
+		rc = -EAGAIN;
+		goto e_fail;
+	}
+
+	rc = snp_build_guest_buf(svm, &data, req_gpa, resp_gpa);
+	if (rc)
+		goto e_fail;
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	mutex_lock(&kvm->lock);
+
+	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+	if (rc) {
+		mutex_unlock(&kvm->lock);
+
+		/* If we have a firmware error code then use it. */
+		if (err)
+			rc = err;
+
+		goto e_fail;
+	}
+
+	/* Copy the response after the firmware returns success. */
+	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
+
+	mutex_unlock(&kvm->lock);
+
+e_fail:
+	ghcb_set_sw_exit_info_2(ghcb, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
+					 gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request req = {};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long data_npages;
+	struct kvm_sev_info *sev;
+	unsigned long err;
+	u64 data_gpa;
+	int rc;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = -ENODEV;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
+		pr_info_ratelimited("svm: too many guest message requests\n");
+		rc = -EAGAIN;
+		goto e_fail;
+	}
+
+	if (!sev->snp_certs_data) {
+		pr_err("svm: certs data memory is not allocated\n");
+		rc = -EFAULT;
+		goto e_fail;
+	}
+
+	data_gpa = ghcb_get_rax(ghcb);
+	data_npages = ghcb_get_rbx(ghcb);
+
+	if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+		pr_err_ratelimited("svm: certs data GPA is not page aligned (%#llx)\n", data_gpa);
+		rc = -EINVAL;
+		goto e_fail;
+	}
+
+	/* Verify that requested blob will fit in our intermediate buffer */
+	if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
+		rc = -EINVAL;
+		goto e_fail;
+	}
+
+	rc = snp_build_guest_buf(svm, &req, req_gpa, resp_gpa);
+	if (rc)
+		goto e_fail;
+
+	mutex_lock(&kvm->lock);
+	rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
+					 &data_npages, &err);
+	if (rc) {
+		mutex_unlock(&kvm->lock);
+
+		/*
+		 * If buffer length is small then return the expected
+		 * length in rbx.
+		 */
+		if (err == SNP_GUEST_REQ_INVALID_LEN) {
+			vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
+			ghcb_set_sw_exit_info_2(ghcb, err);
+			return;
+		}
+
+		/* If we have a firmware error code then use it. */
+		if (err)
+			rc = (int)err;
+
+		goto e_fail;
+	}
+
+	/* Copy the response after the firmware returns success. */
+	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
+
+	mutex_unlock(&kvm->lock);
+
+	if (rc)
+		goto e_fail;
+
+	/* Copy the certificate blob in the guest memory */
+	if (data_npages &&
+	    kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
+		rc = -EFAULT;
+
+e_fail:
+	ghcb_set_sw_exit_info_2(ghcb, rc);
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3306,6 +3508,21 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ghcb_set_sw_exit_info_2(ghcb, rc);
 		break;
 	}
+	case SVM_VMGEXIT_GUEST_REQUEST: {
+		snp_handle_guest_request(svm, ghcb, control->exit_info_1, control->exit_info_2);
+
+		ret = 1;
+		break;
+	}
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
+		snp_handle_ext_guest_request(svm,
+					     ghcb,
+					     control->exit_info_1,
+					     control->exit_info_2);
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ccdaaa4e1fb1..9fcfc0a51737 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -18,6 +18,7 @@
 #include <linux/kvm_types.h>
 #include <linux/kvm_host.h>
 #include <linux/bits.h>
+#include <linux/ratelimit.h>
 
 #include <asm/svm.h>
 #include <asm/sev-common.h>
@@ -68,6 +69,9 @@ struct kvm_sev_info {
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	void *snp_context;      /* SNP guest context page */
+	void *snp_resp_page;    /* SNP guest response page */
+	struct ratelimit_state snp_guest_msg_rs; /* Rate limit the SNP guest message */
+	void *snp_certs_data;
 };
 
 struct kvm_svm {
@@ -550,7 +554,7 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 #define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
-#define GHCB_HV_FT_SUPPORTED	0
+#define GHCB_HV_FT_SUPPORTED	GHCB_HV_FT_SNP
 
 extern unsigned int max_sev_asid;
Version 2 of the GHCB specification added support for two SNP Guest Request
Message NAE events. The events allow an SEV-SNP guest to make requests to the
SEV-SNP firmware through the hypervisor using the SNP_GUEST_REQUEST API defined
in the SEV-SNP firmware specification.

The SNP_GUEST_REQUEST requires two unique pages, one page for the request and
one page for the response. The response page needs to be in the firmware state.
The GHCB specification says that both pages need to be in the hypervisor state,
but before executing the SEV-SNP command the response page needs to be
transitioned to the firmware state.

The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the difference
of an additional certificate blob that can be passed through the SNP_SET_CONFIG
ioctl defined in the CCP driver. The CCP driver exposes
snp_guest_ext_guest_request(), which is used by KVM to get both the report and
the additional data at once.

In order to minimize the page state transitions during the command handling,
pre-allocate a firmware page on guest creation. Use the pre-allocated firmware
page to complete the command execution and copy the result into the guest
response page.

Ratelimit the handling of the SNP_GUEST_REQUEST NAE to avoid the possibility of
a guest mounting a denial of service attack against the SNP firmware.

Now that KVM supports all the VMGEXIT NAEs required for the base SEV-SNP
feature, set the hypervisor feature to advertise it.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 223 ++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h |   6 +-
 2 files changed, 225 insertions(+), 4 deletions(-)