
[Part2,RFC,v4,38/40] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

Message ID 20210707183616.5620-39-brijesh.singh@amd.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Series Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

Commit Message

Brijesh Singh July 7, 2021, 6:36 p.m. UTC
Version 2 of the GHCB specification added support for two SNP Guest
Request Message NAE events. The events allow an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

SNP_GUEST_REQUEST requires two unique pages, one page for the request
and one page for the response. The GHCB specification says that both
pages need to be in the hypervisor state, but before executing the
SEV-SNP command the response page needs to be transitioned to the
firmware state.

SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
difference that it also returns an additional certificate blob, which can
be provided through the SNP_SET_CONFIG ioctl defined in the CCP driver.
The CCP driver exposes snp_guest_ext_guest_request(), which is used by
KVM to get both the report and the additional data at once.

In order to minimize the page state transitions during command handling,
pre-allocate a firmware page at guest creation. Use the pre-allocated
firmware page to complete the command execution and copy the result into
the guest's response page.

Rate limit the handling of the SNP_GUEST_REQUEST NAE events to avoid the
possibility of a guest mounting a denial-of-service attack against the
SNP firmware.

Now that KVM supports all the VMGEXIT NAEs required for the base SEV-SNP
feature, set the hypervisor feature to advertise it.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 223 ++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h |   6 +-
 2 files changed, 225 insertions(+), 4 deletions(-)

Comments

Sean Christopherson July 19, 2021, 10:50 p.m. UTC | #1
On Wed, Jul 07, 2021, Brijesh Singh wrote:
> Version 2 of GHCB specification added the support two SNP Guest Request
> Message NAE event. The events allows for an SEV-SNP guest to make request
> to the SEV-SNP firmware through hypervisor using the SNP_GUEST_REQUEST
> API define in the SEV-SNP firmware specification.

IIUC, this snippet in the spec means KVM can't restrict what requests are made
by the guests.  If so, that makes it difficult to detect/ratelimit a misbehaving
guest, and also limits our options if there are firmware issues (hopefully there
aren't).  E.g. ratelimiting a guest after KVM has explicitly requested it to
migrate is not exactly desirable.

  The hypervisor cannot alter the messages without detection nor read the
  plaintext of the messages.

> The SNP_GUEST_REQUEST requires two unique pages, one page for the request
> and one page for the response. The response page need to be in the firmware
> state. The GHCB specification says that both the pages need to be in the
> hypervisor state but before executing the SEV-SNP command the response page
> need to be in the firmware state.
 
...

> Now that KVM supports all the VMGEXIT NAEs required for the base SEV-SNP
> feature, set the hypervisor feature to advertise it.

It would be helpful if this changelog listed the Guest Requests that are required
for "base" SNP, e.g. to provide some insight as to why we care about guest
requests.

>  static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -1618,6 +1631,12 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	if (rc)
>  		goto e_free_context;
>  
> +	/* Used for rate limiting SNP guest message request, use the default settings */
> +	ratelimit_default_init(&sev->snp_guest_msg_rs);

Is this exposed to userspace in any way?  This feels very much like a knob that
needs to be configurable per-VM.

Also, what are the estimated latencies of a guest request?  If the worst case
latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a whole
lot.

> +static void snp_handle_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
> +				     gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request data = {};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_sev_info *sev;
> +	int rc, err = 0;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = -ENODEV;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> +		pr_info_ratelimited("svm: too many guest message requests\n");
> +		rc = -EAGAIN;

What guarantee do we have that the guest actually understands -EAGAIN?  Ditto
for -EINVAL returned by snp_build_guest_buf().  AFAICT, our options are to return
one of the error codes defined in "Table 95. Status Codes for SNP_GUEST_REQUEST"
of the firmware ABI, kill the guest, or ratelimit the guest without returning
control to the guest.

> +		goto e_fail;
> +	}
> +
> +	rc = snp_build_guest_buf(svm, &data, req_gpa, resp_gpa);
> +	if (rc)
> +		goto e_fail;
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	mutex_lock(&kvm->lock);

Question on the VMPCK sequences.  The firmware ABI says:

   Each guest has four VMPCKs ... Each message contains a sequence number per
   VMPCK. The sequence number is incremented with each message sent. Messages
   sent by the guest to the firmware and by the firmware to the guest must be
   delivered in order. If not, the firmware will reject subsequent messages ...

Does that mean there are four independent sequences, i.e. four streams the guest
can use "concurrently", or does it mean the overall freshess/integrity check is
composed from four VMPCK sequences, all of which must be correct for the message
to be valid?

If it's the latter, then a traditional mutex isn't really necessary because the
guest must implement its own serialization, e.g. its own mutex or whatever, to
ensure there is at most one request in-flight at any given time.  And on the KVM
side it means KVM can simply reject requests if there is already an in-flight
request.  It might also give us more/better options for ratelimiting?

> +	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> +	if (rc) {
> +		mutex_unlock(&kvm->lock);

I suspect you reused this pattern from other, more complex code, but here it's
overkill.  E.g.

	if (!rc)
		rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
	else if (err)
		rc = err;

	mutex_unlock(&kvm->lock);

	ghcb_set_sw_exit_info_2(ghcb, rc);

> +		/* If we have a firmware error code then use it. */
> +		if (err)
> +			rc = err;
> +
> +		goto e_fail;
> +	}
> +
> +	/* Copy the response after the firmware returns success. */
> +	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
> +
> +	mutex_unlock(&kvm->lock);
> +
> +e_fail:
> +	ghcb_set_sw_exit_info_2(ghcb, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
> +					 gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request req = {};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	unsigned long data_npages;
> +	struct kvm_sev_info *sev;
> +	unsigned long err;
> +	u64 data_gpa;
> +	int rc;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = -ENODEV;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> +		pr_info_ratelimited("svm: too many guest message requests\n");
> +		rc = -EAGAIN;
> +		goto e_fail;
> +	}
> +
> +	if (!sev->snp_certs_data) {
> +		pr_err("svm: certs data memory is not allocated\n");
> +		rc = -EFAULT;

Another instance where the kernel's error numbers will not suffice.

> +		goto e_fail;
> +	}
> +
> +	data_gpa = ghcb_get_rax(ghcb);
> +	data_npages = ghcb_get_rbx(ghcb);
Brijesh Singh July 20, 2021, 2:37 p.m. UTC | #2
On 7/19/21 5:50 PM, Sean Christopherson wrote:
...
> 
> IIUC, this snippet in the spec means KVM can't restrict what requests are made
> by the guests.  If so, that makes it difficult to detect/ratelimit a misbehaving
> guest, and also limits our options if there are firmware issues (hopefully there
> aren't).  E.g. ratelimiting a guest after KVM has explicitly requested it to
> migrate is not exactly desirable.
> 

The guest message page contains a message header followed by the 
encrypted payload. So, technically KVM can peek into the message header 
format to determine the message request type. If needed, we can 
ratelimit based on the message type.
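
Something along these lines, as a rough sketch (the trimmed-down header
layout below is my paraphrase of the firmware ABI's message header table,
not a final in-kernel structure):

	/*
	 * Illustrative only: just enough of the SNP guest message header
	 * to reach MSG_TYPE.  Field names/offsets are assumptions taken
	 * from the firmware ABI, not final definitions.
	 */
	struct snp_guest_msg_hdr_peek {
		u8	authtag[32];
		u64	msg_seqno;
		u8	rsvd1[8];
		u8	algo;
		u8	hdr_version;
		u16	hdr_sz;
		u8	msg_type;
		u8	msg_version;
		u16	msg_sz;
		u32	rsvd2;
		u8	msg_vmpck;
	} __packed;

	static int snp_peek_msg_type(struct kvm *kvm, gpa_t req_gpa, u8 *msg_type)
	{
		struct snp_guest_msg_hdr_peek hdr;

		/* The header is plaintext; only the payload is encrypted. */
		if (kvm_read_guest(kvm, req_gpa, &hdr, sizeof(hdr)))
			return -EFAULT;

		*msg_type = hdr.msg_type;
		return 0;
	}

The handler could then refuse (or ratelimit) message types that KVM does
not want to forward to the PSP.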

In the current series we don't support migration etc so I decided to 
ratelimit unconditionally.

...
> 
>> Now that KVM supports all the VMGEXIT NAEs required for the base SEV-SNP
>> feature, set the hypervisor feature to advertise it.
> 
> It would helpful if this changelog listed the Guest Requests that are required
> for "base" SNP, e.g. to provide some insight as to why we care about guest
> requests.
> 

Sure, I'll add more.


>>   static int snp_bind_asid(struct kvm *kvm, int *error)
>> @@ -1618,6 +1631,12 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>   	if (rc)
>>   		goto e_free_context;
>>   
>> +	/* Used for rate limiting SNP guest message request, use the default settings */
>> +	ratelimit_default_init(&sev->snp_guest_msg_rs);
> 
> Is this exposed to userspace in any way?  This feels very much like a knob that
> needs to be configurable per-VM.
> 

It's not exposed to userspace, and I am not sure userspace cares about
this knob.


> Also, what are the estimated latencies of a guest request?  If the worst case
> latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a whole
> lot.
> 

The latency will depend on what else is going on in the system at the time
the request comes to the hypervisor. Access to the PSP is serialized, so
other parallel PSP command executions will contribute to the latency.

...
>> +
>> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
>> +		pr_info_ratelimited("svm: too many guest message requests\n");
>> +		rc = -EAGAIN;
> 
> What guarantee do we have that the guest actually understands -EAGAIN?  Ditto
> for -EINVAL returned by snp_build_guest_buf().  AFAICT, our options are to return
> one of the error codes defined in "Table 95. Status Codes for SNP_GUEST_REQUEST"
> of the firmware ABI, kill the guest, or ratelimit the guest without returning
> control to the guest.
> 

Yes, let me look into passing one of the status codes defined in the spec.

>> +		goto e_fail;
>> +	}
>> +
>> +	rc = snp_build_guest_buf(svm, &data, req_gpa, resp_gpa);
>> +	if (rc)
>> +		goto e_fail;
>> +
>> +	sev = &to_kvm_svm(kvm)->sev_info;
>> +
>> +	mutex_lock(&kvm->lock);
> 
> Question on the VMPCK sequences.  The firmware ABI says:
> 
>     Each guest has four VMPCKs ... Each message contains a sequence number per
>     VMPCK. The sequence number is incremented with each message sent. Messages
>     sent by the guest to the firmware and by the firmware to the guest must be
>     delivered in order. If not, the firmware will reject subsequent messages ...
> 
> Does that mean there are four independent sequences, i.e. four streams the guest
> can use "concurrently", or does it mean the overall freshess/integrity check is
> composed from four VMPCK sequences, all of which must be correct for the message
> to be valid?
> 

There are four independent sequence counters, and in theory the guest can
use them concurrently. But access to the PSP must be serialized.
Currently, the guest driver uses the VMPCK0 key to communicate with the PSP.


> If it's the latter, then a traditional mutex isn't really necessary because the
> guest must implement its own serialization, e.g. it's own mutex or whatever, to
> ensure there is at most one request in-flight at any given time.  

The guest driver uses its own serialization to ensure that there is
*exactly* one request in-flight.

The mutex used here is to protect KVM's internal firmware response
buffer.


> And on the KVM side it means KVM can simply reject requests if there is
> already an in-flight request.  It might also give us more/better options
> for ratelimiting?
> 

I don't think we should be running into this scenario unless there is a 
bug in the guest kernel. The guest kernel support and the CCP driver both
ensure that requests to the PSP are serialized.

In normal operation we may see 1 to 2 guest requests for the entire
guest lifetime. I am thinking the first request may be for the attestation
report and the second may be to derive keys etc. It may change slightly when
we add the migration command; I have not looked into it in great detail yet.

thanks
Sean Christopherson July 20, 2021, 4:28 p.m. UTC | #3
On Tue, Jul 20, 2021, Brijesh Singh wrote:
> 
> On 7/19/21 5:50 PM, Sean Christopherson wrote:
> ...
> > 
> > IIUC, this snippet in the spec means KVM can't restrict what requests are made
> > by the guests.  If so, that makes it difficult to detect/ratelimit a misbehaving
> > guest, and also limits our options if there are firmware issues (hopefully there
> > aren't).  E.g. ratelimiting a guest after KVM has explicitly requested it to
> > migrate is not exactly desirable.
> > 
> 
> The guest message page contains a message header followed by the encrypted
> payload. So, technically KVM can peek into the message header format to
> determine the message request type. If needed, we can ratelimit based on the
> message type.

Ah, I got confused by this code in snp_build_guest_buf():

	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);

I was thinking that setting the C-bit meant the memory was guest private, but
that's setting the C-bit for the HPA, which is correct since KVM installs guest
memory with C-bit=1 in the NPT, i.e. encrypts shared memory with the host key.

Tangentially related question, is it correct to say that the host can _read_ memory
from a page that is assigned=1, but has asid=0?  I.e. KVM can read the response
page in order to copy it into the guest, even though it is a firmware page?

	/* Copy the response after the firmware returns success. */
	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);

> In the current series we don't support migration etc so I decided to
> ratelimit unconditionally.

Since KVM can peek at the request header, KVM should flat out disallow requests
that KVM doesn't explicitly support.  E.g. migration requests should not be sent
to the PSP.

One concern though: How does the guest query what requests are supported?  This
snippet implies there's some form of enumeration:

  Note: This guest message may be removed in future versions as it is redundant
  with the CPUID page in SNP_LAUNCH_UPDATE (see Section 8.14).

But all I can find is a "Message Version" in "Table 94. Message Type Encodings",
which implies that request support is all or nothing for a given version.  That
would be rather unfortunate as KVM has no way to tell the guest that something
is unsupported :-(

> > Is this exposed to userspace in any way?  This feels very much like a knob that
> > needs to be configurable per-VM.
> 
> It's not exposed to the userspace and I am not sure if userspace care about
> this knob.

Userspace definitely cares, otherwise the system would need to be rebooted just to
tune the ratelimiting.  And userspace may want to disable ratelimiting entirely,
e.g. if the entire system is dedicated to a single VM.
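
Rough shape of what I have in mind, purely as a sketch (the command name,
the struct, and the "enabled" field are all hypothetical):

	/* Hypothetical userspace ABI, for illustration only. */
	struct kvm_sev_snp_guest_req_limit {
		__u32 interval_ms;	/* 0 == ratelimiting disabled */
		__u32 burst;
	};

	static int snp_set_guest_req_limit(struct kvm *kvm, struct kvm_sev_cmd *argp)
	{
		struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
		struct kvm_sev_snp_guest_req_limit params;

		if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
				   sizeof(params)))
			return -EFAULT;

		/* snp_guest_msg_rs_enabled would be a new field next to snp_guest_msg_rs. */
		sev->snp_guest_msg_rs_enabled = !!params.interval_ms;
		if (sev->snp_guest_msg_rs_enabled)
			ratelimit_state_init(&sev->snp_guest_msg_rs,
					     msecs_to_jiffies(params.interval_ms),
					     params.burst);
		return 0;
	}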

> > Also, what are the estimated latencies of a guest request?  If the worst case
> > latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a whole
> > lot.
> > 
> 
> The latency will depend on what else is going in the system at the time the
> request comes to the hypervisor. Access to the PSP is serialized so other
> parallel PSP command execution will contribute to the latency.

I get that it will be variable, but what are some ballpark latencies?  E.g. what's
the latency of the slowest command without PSP contention?

> > Question on the VMPCK sequences.  The firmware ABI says:
> > 
> >     Each guest has four VMPCKs ... Each message contains a sequence number per
> >     VMPCK. The sequence number is incremented with each message sent. Messages
> >     sent by the guest to the firmware and by the firmware to the guest must be
> >     delivered in order. If not, the firmware will reject subsequent messages ...
> > 
> > Does that mean there are four independent sequences, i.e. four streams the guest
> > can use "concurrently", or does it mean the overall freshess/integrity check is
> > composed from four VMPCK sequences, all of which must be correct for the message
> > to be valid?
> > 
> 
> There are four independent sequence counter and in theory guest can use them
> concurrently. But the access to the PSP must be serialized.

Technically that's not required from the guest's perspective, correct?  The guest
only cares about the sequence numbers for a given VMPCK, e.g. it can have one
in-flight request per VMPCK and expect that to work, even without fully serializing
its own requests.

Out of curiosity, why 4 VMPCKs?  It seems completely arbitrary.

> Currently, the guest driver uses the VMPCK0 key to communicate with the PSP.
> 
> 
> > If it's the latter, then a traditional mutex isn't really necessary because the
> > guest must implement its own serialization, e.g. it's own mutex or whatever, to
> > ensure there is at most one request in-flight at any given time.
> 
> The guest driver uses the its own serialization to ensure that there is
> *exactly* one request in-flight.

But KVM can't rely on that because it doesn't control the guest, e.g. it may be
running a non-Linux guest.

> The mutex used here is to protect the KVM's internal firmware response
> buffer.

Ya, where I was going with my question was that if the guest was architecturally
restricted to a single in-flight request, then KVM could do something like this
instead of taking kvm->lock (bad pseudocode):

	if (test_and_set(sev->guest_request)) {
		rc = AEAD_OFLOW;
		goto fail;
	}

	<do request>

	clear_bit(...)

I.e. multiple in-flight requests can't work because the guest can guarantee
ordering between vCPUs.  But, because the guest can theoretically have up to four
in-flight requests, it's not that simple.

The reason I'm going down this path is that taking kvm->lock inside vcpu->mutex
violates KVM's locking rules, i.e. is susceptible to deadlocks.  Per kvm/locking.rst,

  - kvm->lock is taken outside vcpu->mutex

That means a different mutex is needed to protect the guest request pages.
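
E.g. something like this (untested sketch, field name made up):

	/* In struct kvm_sev_info, a dedicated lock for the guest request buffers: */
	struct mutex guest_req_lock;	/* initialized in snp_launch_start() */

	/* In snp_handle_guest_request(), instead of taking kvm->lock: */
	mutex_lock(&sev->guest_req_lock);

	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
	if (!rc)
		rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
	else if (err)
		rc = err;

	mutex_unlock(&sev->guest_req_lock);

	ghcb_set_sw_exit_info_2(ghcb, rc);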

	
> > And on the KVM side it means KVM can simpy reject requests if there is
> > already an in-flight request.  It might also give us more/better options
> > for ratelimiting?
> 
> I don't think we should be running into this scenario unless there is a bug
> in the guest kernel. The guest kernel support and CCP driver both ensure
> that request to the PSP is serialized.

Again, what the Linux kernel does is irrelevant.  What matters is what is
architecturally allowed.

> In normal operation we may see 1 to 2 quest requests for the entire guest
> lifetime. I am thinking first request maybe for the attestation report and
> second maybe to derive keys etc. It may change slightly when we add the
> migration command; I have not looked into a great detail yet.
Brijesh Singh July 20, 2021, 6:21 p.m. UTC | #4
On 7/20/21 11:28 AM, Sean Christopherson wrote:

> 
> Ah, I got confused by this code in snp_build_guest_buf():
> 
> 	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> 
> I was thinking that setting the C-bit meant the memory was guest private, but
> that's setting the C-bit for the HPA, which is correct since KVM installs guest
> memory with C-bit=1 in the NPT, i.e. encrypts shared memory with the host key.
> 
> Tangetially related question, is it correct to say that the host can _read_ memory
> from a page that is assigned=1, but has asid=0?  I.e. KVM can read the response
> page in order to copy it into the guest, even though it is a firmware page?
> 

Yes. A firmware page means that x86 cannot write to it; reads are
still allowed.


> 	/* Copy the response after the firmware returns success. */
> 	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
> 
>> In the current series we don't support migration etc so I decided to
>> ratelimit unconditionally.
> 
> Since KVM can peek at the request header, KVM should flat out disallow requests
> that KVM doesn't explicitly support.  E.g. migration requests should not be sent
> to the PSP.
> 

That is acceptable.


> One concern though: How does the guest query what requests are supported?  This
> snippet implies there's some form of enumeration:
> 
>    Note: This guest message may be removed in future versions as it is redundant
>    with the CPUID page in SNP_LAUNCH_UPDATE (see Section 8.14).
> 
> But all I can find is a "Message Version" in "Table 94. Message Type Encodings",
> which implies that request support is all or nothing for a given version.  That
> would be rather unfortunate as KVM has no way to tell the guest that something
> is unsupported :-(
> 

The firmware supports all the commands listed in the spec. The HV
support is always going to be behind what the firmware or hardware is
capable of doing. As far as the spec is concerned, it says:

   The firmware checks that MSG_TYPE is a valid message type. The
   firmware then checks that MSG_SIZE is large enough to hold the
   indicated message type at the indicated message version. If
   not, the firmware returns INVALID_PARAM.

So, a hypervisor could potentially return INVALID_PARAM to indicate to
the guest that a message type is not supported.
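
Roughly like this in the guest request handler (sketch only:
snp_msg_type_supported() is a hypothetical helper that checks the
plaintext message header, and the exact status value/placement in
SW_EXITINFO2 still needs to be confirmed against the spec):

	/* Reject message types KVM does not support forwarding to the PSP. */
	if (!snp_msg_type_supported(msg_type)) {
		/* SEV_RET_INVALID_PARAM is from the firmware status code table. */
		ghcb_set_sw_exit_info_2(ghcb, SEV_RET_INVALID_PARAM);
		return;
	}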


>>> Is this exposed to userspace in any way?  This feels very much like a knob that
>>> needs to be configurable per-VM.
>>
>> It's not exposed to the userspace and I am not sure if userspace care about
>> this knob.
> 
> Userspace definitely cares, otherwise the system would need to be rebooted just to
> tune the ratelimiting.  And userspace may want to disable ratelimiting entirely,
> e.g. if the entire system is dedicated to a single VM.

Ok.

> 
>>> Also, what are the estimated latencies of a guest request?  If the worst case
>>> latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a whole
>>> lot.
>>>
>>
>> The latency will depend on what else is going in the system at the time the
>> request comes to the hypervisor. Access to the PSP is serialized so other
>> parallel PSP command execution will contribute to the latency.
> 
> I get that it will be variable, but what are some ballpark latencies?  E.g. what's
> the latency of the slowest command without PSP contention?
> 

With a single VM, I am seeing a guest request latency of around 35ms.

>>> Question on the VMPCK sequences.  The firmware ABI says:
>>>
>>>      Each guest has four VMPCKs ... Each message contains a sequence number per
>>>      VMPCK. The sequence number is incremented with each message sent. Messages
>>>      sent by the guest to the firmware and by the firmware to the guest must be
>>>      delivered in order. If not, the firmware will reject subsequent messages ...
>>>
>>> Does that mean there are four independent sequences, i.e. four streams the guest
>>> can use "concurrently", or does it mean the overall freshess/integrity check is
>>> composed from four VMPCK sequences, all of which must be correct for the message
>>> to be valid?
>>>
>>
>> There are four independent sequence counter and in theory guest can use them
>> concurrently. But the access to the PSP must be serialized.
> 
> Technically that's not required from the guest's perspective, correct?  

Correct.

> The guest only cares about the sequence numbers for a given VMPCK, e.g. it
> can have one in-flight request per VMPCK and expect that to work, even
> without fully serializing its own requests.
> 
> Out of curiosity, why 4 VMPCKs?  It seems completely arbitrary.
> 

I believe the thought process was that by providing 4 keys, each VMPL
level has the flexibility to use a different key (if they wish). The
firmware does not care about the VMPL level during guest request handling;
it just wants to know which key was used to encrypt the payload so that it
can decrypt it and provide the response.


>> Currently, the guest driver uses the VMPCK0 key to communicate with the PSP.
>>
>>
>>> If it's the latter, then a traditional mutex isn't really necessary because the
>>> guest must implement its own serialization, e.g. it's own mutex or whatever, to
>>> ensure there is at most one request in-flight at any given time.
>>
>> The guest driver uses the its own serialization to ensure that there is
>> *exactly* one request in-flight.
> 
> But KVM can't rely on that because it doesn't control the guest, e.g. it may be
> running a non-Linux guest.
>

Yes, KVM should not rely on it. I mentioned that mainly because you said
that the guest must implement its own serialization. In the case of KVM,
the CCP driver ensures that commands sent to the PSP are serialized.


>> The mutex used here is to protect the KVM's internal firmware response
>> buffer.
> 
> Ya, where I was going with my question was that if the guest was architecturally
> restricted to a single in-flight request, then KVM could do something like this
> instead of taking kvm->lock (bad pseudocode):
> 
> 	if (test_and_set(sev->guest_request)) {
> 		rc = AEAD_OFLOW;
> 		goto fail;
> 	}
> 
> 	<do request>
> 
> 	clear_bit(...)
> 
> I.e. multiple in-flight requests can't work because the guest can guarantee
> ordering between vCPUs.  But, because the guest can theoretically have up to four
> in-flight requests, it's not that simple.
> 
> The reason I'm going down this path is that taking kvm->lock inside vcpu->mutex
> violates KVM's locking rules, i.e. is susceptibl to deadlocks.  Per kvm/locking.rst,
> 
>    - kvm->lock is taken outside vcpu->mutex
> 
> That means a different mutex is needed to protect the guest request pages.
> 

Ah, I see your point on the locking. Architecturally, a guest can
issue multiple requests in parallel. It sounds like having a separate 
lock to protect the guest request pages makes sense.


-Brijesh
Sean Christopherson July 20, 2021, 10:09 p.m. UTC | #5
On Tue, Jul 20, 2021, Brijesh Singh wrote:
> 
> On 7/20/21 11:28 AM, Sean Christopherson wrote:
> > Out of curiosity, why 4 VMPCKs?  It seems completely arbitrary.
> > 
> 
> I believe the thought process was by providing 4 keys it can provide
> flexibility for each VMPL levels to use a different keys (if they wish). The
> firmware does not care about the vmpl level during the guest request
> handling, it just want to know which key is used for encrypting the payload
> so that he can decrypt and provide the  response for it.

Ah, I forgot about VMPLs.  That makes sense.

Thanks!

Patch

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 53a60edc810e..4cb4c1d7e444 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -18,6 +18,8 @@ 
 #include <linux/processor.h>
 #include <linux/trace_events.h>
 #include <linux/sev.h>
+#include <linux/kvm_host.h>
+#include <linux/sev-guest.h>
 #include <asm/fpu/internal.h>
 
 #include <asm/trapnr.h>
@@ -1534,6 +1536,7 @@  static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	struct sev_data_snp_gctx_create data = {};
 	void *context;
 	int rc;
@@ -1543,14 +1546,24 @@  static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (!context)
 		return NULL;
 
-	data.gctx_paddr = __psp_pa(context);
-	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
-	if (rc) {
+	/* Allocate a firmware buffer used during the guest command handling. */
+	sev->snp_resp_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!sev->snp_resp_page) {
 		snp_free_firmware_page(context);
 		return NULL;
 	}
 
+	data.gctx_paddr = __psp_pa(context);
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+	if (rc)
+		goto e_free;
+
 	return context;
+
+e_free:
+	snp_free_firmware_page(context);
+	snp_free_firmware_page(sev->snp_resp_page);
+	return NULL;
 }
 
 static int snp_bind_asid(struct kvm *kvm, int *error)
@@ -1618,6 +1631,12 @@  static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (rc)
 		goto e_free_context;
 
+	/* Used for rate limiting SNP guest message request, use the default settings */
+	ratelimit_default_init(&sev->snp_guest_msg_rs);
+
+	/* Allocate memory used for the certs data in SNP guest request */
+	sev->snp_certs_data = kmalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
+
 	return 0;
 
 e_free_context:
@@ -2218,6 +2237,9 @@  static int snp_decommission_context(struct kvm *kvm)
 	snp_free_firmware_page(sev->snp_context);
 	sev->snp_context = NULL;
 
+	/* Free the response page. */
+	snp_free_firmware_page(sev->snp_resp_page);
+
 	return 0;
 }
 
@@ -2268,6 +2290,9 @@  void sev_vm_destroy(struct kvm *kvm)
 		sev_unbind_asid(kvm, sev->handle);
 	}
 
+
+	kfree(sev->snp_certs_data);
+
 	sev_asid_free(sev);
 }
 
@@ -2663,6 +2688,8 @@  static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FT:
 	case SVM_VMGEXIT_PSC:
+	case SVM_VMGEXIT_GUEST_REQUEST:
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
 		break;
 	default:
 		goto vmgexit_err;
@@ -3053,6 +3080,181 @@  static unsigned long snp_handle_psc(struct vcpu_svm *svm, struct ghcb *ghcb)
 	return rc ? map_to_psc_vmgexit_code(rc) : 0;
 }
 
+static int snp_build_guest_buf(struct vcpu_svm *svm, struct sev_data_snp_guest_request *data,
+			       gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	kvm_pfn_t req_pfn, resp_pfn;
+	struct kvm_sev_info *sev;
+
+	if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE)) {
+		pr_err_ratelimited("svm: guest request (%#llx) or response (%#llx) is not page aligned\n",
+			req_gpa, resp_gpa);
+		return -EINVAL;
+	}
+
+	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+	if (is_error_noslot_pfn(req_pfn)) {
+		pr_err_ratelimited("svm: guest request invalid gpa=%#llx\n", req_gpa);
+		return -EINVAL;
+	}
+
+	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+	if (is_error_noslot_pfn(resp_pfn)) {
+		pr_err_ratelimited("svm: guest response invalid gpa=%#llx\n", resp_gpa);
+		return -EINVAL;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+	data->res_paddr = __psp_pa(sev->snp_resp_page);
+
+	return 0;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
+				     gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request data = {};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_sev_info *sev;
+	int rc, err = 0;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = -ENODEV;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
+		pr_info_ratelimited("svm: too many guest message requests\n");
+		rc = -EAGAIN;
+		goto e_fail;
+	}
+
+	rc = snp_build_guest_buf(svm, &data, req_gpa, resp_gpa);
+	if (rc)
+		goto e_fail;
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	mutex_lock(&kvm->lock);
+
+	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+	if (rc) {
+		mutex_unlock(&kvm->lock);
+
+		/* If we have a firmware error code then use it. */
+		if (err)
+			rc = err;
+
+		goto e_fail;
+	}
+
+	/* Copy the response after the firmware returns success. */
+	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
+
+	mutex_unlock(&kvm->lock);
+
+e_fail:
+	ghcb_set_sw_exit_info_2(ghcb, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
+					 gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request req = {};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long data_npages;
+	struct kvm_sev_info *sev;
+	unsigned long err;
+	u64 data_gpa;
+	int rc;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = -ENODEV;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
+		pr_info_ratelimited("svm: too many guest message requests\n");
+		rc = -EAGAIN;
+		goto e_fail;
+	}
+
+	if (!sev->snp_certs_data) {
+		pr_err("svm: certs data memory is not allocated\n");
+		rc = -EFAULT;
+		goto e_fail;
+	}
+
+	data_gpa = ghcb_get_rax(ghcb);
+	data_npages = ghcb_get_rbx(ghcb);
+
+	if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+		pr_err_ratelimited("svm: certs data GPA is not page aligned (%#llx)\n", data_gpa);
+		rc = -EINVAL;
+		goto e_fail;
+	}
+
+	/* Verify that requested blob will fit in our intermediate buffer */
+	if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
+		rc = -EINVAL;
+		goto e_fail;
+	}
+
+	rc = snp_build_guest_buf(svm, &req, req_gpa, resp_gpa);
+	if (rc)
+		goto e_fail;
+
+	mutex_lock(&kvm->lock);
+	rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
+					 &data_npages, &err);
+	if (rc) {
+		mutex_unlock(&kvm->lock);
+
+		/*
+		 * If buffer length is small then return the expected
+		 * length in rbx.
+		 */
+		if (err == SNP_GUEST_REQ_INVALID_LEN) {
+			vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
+			ghcb_set_sw_exit_info_2(ghcb, err);
+			return;
+		}
+
+		/* If we have a firmware error code then use it. */
+		if (err)
+			rc = (int)err;
+
+		goto e_fail;
+	}
+
+	/* Copy the response after the firmware returns success. */
+	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
+
+	mutex_unlock(&kvm->lock);
+
+	if (rc)
+		goto e_fail;
+
+	/* Copy the certificate blob in the guest memory */
+	if (data_npages &&
+	    kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
+		rc = -EFAULT;
+
+e_fail:
+	ghcb_set_sw_exit_info_2(ghcb, rc);
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3306,6 +3508,21 @@  int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ghcb_set_sw_exit_info_2(ghcb, rc);
 		break;
 	}
+	case SVM_VMGEXIT_GUEST_REQUEST: {
+		snp_handle_guest_request(svm, ghcb, control->exit_info_1, control->exit_info_2);
+
+		ret = 1;
+		break;
+	}
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
+		snp_handle_ext_guest_request(svm,
+					     ghcb,
+					     control->exit_info_1,
+					     control->exit_info_2);
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ccdaaa4e1fb1..9fcfc0a51737 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -18,6 +18,7 @@ 
 #include <linux/kvm_types.h>
 #include <linux/kvm_host.h>
 #include <linux/bits.h>
+#include <linux/ratelimit.h>
 
 #include <asm/svm.h>
 #include <asm/sev-common.h>
@@ -68,6 +69,9 @@  struct kvm_sev_info {
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	void *snp_context;      /* SNP guest context page */
+	void *snp_resp_page;	/* SNP guest response page */
+	struct ratelimit_state snp_guest_msg_rs; /* Rate limit the SNP guest message */
+	void *snp_certs_data;
 };
 
 struct kvm_svm {
@@ -550,7 +554,7 @@  void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 #define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
-#define GHCB_HV_FT_SUPPORTED	0
+#define GHCB_HV_FT_SUPPORTED	GHCB_HV_FT_SNP
 
 extern unsigned int max_sev_asid;