Message ID | 20240510015822.503071-2-michael.roth@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support | expand |
On Thu, May 09, 2024, Michael Roth wrote: > The intended logic when handling #NPFs with the RMP bit set (31) is to > first check to see if the #NPF requires a shared<->private transition > and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT > get forwarded on to userspace before proceeding with any handling of > other potential RMP fault conditions like needing to PSMASH the RMP > entry/etc (which will be done later if the guest still re-faults after > the KVM_EXIT_MEMORY_FAULT is processed by userspace). > > The determination of whether any userspace handling of > KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code > of kvm_mmu_page_fault(). However, the current code misinterprets the > return code, expecting 0 to indicate a userspace exit rather than less > than 0 (-EFAULT). This leads to the following unexpected behavior: > > - for KVM_EXIT_MEMORY_FAULTs resulting for implicit shared->private > conversions, warnings get printed from sev_handle_rmp_fault() > because it does not expect to be called for GPAs where > KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't > generally do this, but it is allowed and should be handled > similarly to private->shared conversions rather than triggering any > sort of warnings > > - if gmem support for 2MB folios is enabled (via currently out-of-tree > code), implicit shared<->private conversions will always result in > a PSMASH being attempted, even if it's not actually needed to > resolve the RMP fault. This doesn't cause any harm, but results in a > needless PSMASH and zapping of the sPTE > > Resolve these issues by calling sev_handle_rmp_fault() only when > kvm_mmu_page_fault()'s return code is greater than or equal to 0, > indicating a KVM_MEMORY_EXIT_FAULT/-EFAULT isn't needed. While here, > simplify the code slightly and fix up the associated comments for better > clarity. > > Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults") > > Signed-off-by: Michael Roth <michael.roth@amd.com> > --- > arch/x86/kvm/svm/svm.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index 426ad49325d7..9431ce74c7d4 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu) > svm->vmcb->control.insn_len); > > /* > - * rc == 0 indicates a userspace exit is needed to handle page > - * transitions, so do that first before updating the RMP table. > + * rc < 0 indicates a userspace exit may be needed to handle page > + * attribute updates, so deal with that first before handling other > + * potential RMP fault conditions. > */ > - if (error_code & PFERR_GUEST_RMP_MASK) { > - if (rc == 0) > - return rc; > + if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK) This isn't correct either. A return of '0' also indiciates "exit to userspace", it just doesn't happen with SNP because '0' is returned only when KVM attempts emulation, and that too gets short-circuited by svm_check_emulate_instruction(). And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return values overload is ubiquitous enough that it should be relatively self-explanatory. Or if you prefer to keep a comment, drop the part that specifically calls out attributes updates, because that incorrectly implies that's the _only_ reason why KVM checks the return. But my vote is to drop the comment, because it essentially becomes "don't proceed to step 2 if step 1 failed", which kind of makes the reader go "well, yeah".
On Fri, May 10, 2024 at 06:58:45AM -0700, Sean Christopherson wrote: > On Thu, May 09, 2024, Michael Roth wrote: > > The intended logic when handling #NPFs with the RMP bit set (31) is to > > first check to see if the #NPF requires a shared<->private transition > > and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT > > get forwarded on to userspace before proceeding with any handling of > > other potential RMP fault conditions like needing to PSMASH the RMP > > entry/etc (which will be done later if the guest still re-faults after > > the KVM_EXIT_MEMORY_FAULT is processed by userspace). > > > > The determination of whether any userspace handling of > > KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code > > of kvm_mmu_page_fault(). However, the current code misinterprets the > > return code, expecting 0 to indicate a userspace exit rather than less > > than 0 (-EFAULT). This leads to the following unexpected behavior: > > > > - for KVM_EXIT_MEMORY_FAULTs resulting for implicit shared->private > > conversions, warnings get printed from sev_handle_rmp_fault() > > because it does not expect to be called for GPAs where > > KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't > > generally do this, but it is allowed and should be handled > > similarly to private->shared conversions rather than triggering any > > sort of warnings > > > > - if gmem support for 2MB folios is enabled (via currently out-of-tree > > code), implicit shared<->private conversions will always result in > > a PSMASH being attempted, even if it's not actually needed to > > resolve the RMP fault. This doesn't cause any harm, but results in a > > needless PSMASH and zapping of the sPTE > > > > Resolve these issues by calling sev_handle_rmp_fault() only when > > kvm_mmu_page_fault()'s return code is greater than or equal to 0, > > indicating a KVM_MEMORY_EXIT_FAULT/-EFAULT isn't needed. While here, > > simplify the code slightly and fix up the associated comments for better > > clarity. > > > > Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults") > > > > Signed-off-by: Michael Roth <michael.roth@amd.com> > > --- > > arch/x86/kvm/svm/svm.c | 10 ++++------ > > 1 file changed, 4 insertions(+), 6 deletions(-) > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > index 426ad49325d7..9431ce74c7d4 100644 > > --- a/arch/x86/kvm/svm/svm.c > > +++ b/arch/x86/kvm/svm/svm.c > > @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu) > > svm->vmcb->control.insn_len); > > > > /* > > - * rc == 0 indicates a userspace exit is needed to handle page > > - * transitions, so do that first before updating the RMP table. > > + * rc < 0 indicates a userspace exit may be needed to handle page > > + * attribute updates, so deal with that first before handling other > > + * potential RMP fault conditions. > > */ > > - if (error_code & PFERR_GUEST_RMP_MASK) { > > - if (rc == 0) > > - return rc; > > + if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK) > > This isn't correct either. A return of '0' also indiciates "exit to userspace", > it just doesn't happen with SNP because '0' is returned only when KVM attempts > emulation, and that too gets short-circuited by svm_check_emulate_instruction(). > > And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return > values overload is ubiquitous enough that it should be relatively self-explanatory. > > Or if you prefer to keep a comment, drop the part that specifically calls out > attributes updates, because that incorrectly implies that's the _only_ reason > why KVM checks the return. But my vote is to drop the comment, because it > essentially becomes "don't proceed to step 2 if step 1 failed", which kind of > makes the reader go "well, yeah". Ok, I think I was just paranoid after missing this. I've gone ahead and dropped the comment, and hopefully it's now drilled into my head enough that it's obvious to me now as well :) I've also changed the logic to skip the extra RMP handling for rc==0 as well (should that ever arise for any future reason): https://github.com/mdroth/linux/commit/0a0ba0d7f7571a31f0abc68acc51f24c2a14a8cf Thanks! -Mike
On 5/10/24 15:58, Sean Christopherson wrote: > On Thu, May 09, 2024, Michael Roth wrote: >> The intended logic when handling #NPFs with the RMP bit set (31) is to >> first check to see if the #NPF requires a shared<->private transition >> and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT >> get forwarded on to userspace before proceeding with any handling of >> other potential RMP fault conditions like needing to PSMASH the RMP >> entry/etc (which will be done later if the guest still re-faults after >> the KVM_EXIT_MEMORY_FAULT is processed by userspace). >> >> The determination of whether any userspace handling of >> KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code >> of kvm_mmu_page_fault(). However, the current code misinterprets the >> return code, expecting 0 to indicate a userspace exit rather than less >> than 0 (-EFAULT). This leads to the following unexpected behavior: >> >> - for KVM_EXIT_MEMORY_FAULTs resulting for implicit shared->private >> conversions, warnings get printed from sev_handle_rmp_fault() >> because it does not expect to be called for GPAs where >> KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't >> generally do this, but it is allowed and should be handled >> similarly to private->shared conversions rather than triggering any >> sort of warnings >> >> - if gmem support for 2MB folios is enabled (via currently out-of-tree >> code), implicit shared<->private conversions will always result in >> a PSMASH being attempted, even if it's not actually needed to >> resolve the RMP fault. This doesn't cause any harm, but results in a >> needless PSMASH and zapping of the sPTE >> >> Resolve these issues by calling sev_handle_rmp_fault() only when >> kvm_mmu_page_fault()'s return code is greater than or equal to 0, >> indicating a KVM_MEMORY_EXIT_FAULT/-EFAULT isn't needed. While here, >> simplify the code slightly and fix up the associated comments for better >> clarity. >> >> Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults") >> >> Signed-off-by: Michael Roth <michael.roth@amd.com> >> --- >> arch/x86/kvm/svm/svm.c | 10 ++++------ >> 1 file changed, 4 insertions(+), 6 deletions(-) >> >> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c >> index 426ad49325d7..9431ce74c7d4 100644 >> --- a/arch/x86/kvm/svm/svm.c >> +++ b/arch/x86/kvm/svm/svm.c >> @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu) >> svm->vmcb->control.insn_len); >> >> /* >> - * rc == 0 indicates a userspace exit is needed to handle page >> - * transitions, so do that first before updating the RMP table. >> + * rc < 0 indicates a userspace exit may be needed to handle page >> + * attribute updates, so deal with that first before handling other >> + * potential RMP fault conditions. >> */ >> - if (error_code & PFERR_GUEST_RMP_MASK) { >> - if (rc == 0) >> - return rc; >> + if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK) > > This isn't correct either. A return of '0' also indiciates "exit to userspace", > it just doesn't happen with SNP because '0' is returned only when KVM attempts > emulation, and that too gets short-circuited by svm_check_emulate_instruction(). > > And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return > values overload is ubiquitous enough that it should be relatively self-explanatory. > > Or if you prefer to keep a comment, drop the part that specifically calls out > attributes updates, because that incorrectly implies that's the _only_ reason > why KVM checks the return. But my vote is to drop the comment, because it > essentially becomes "don't proceed to step 2 if step 1 failed", which kind of > makes the reader go "well, yeah". So IIUC you're suggesting diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 426ad49325d7..c39eaeb21981 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -2068,16 +2068,11 @@ static int npf_interception(struct kvm_vcpu *vcpu) static_cpu_has(X86_FEATURE_DECODEASSISTS) ? svm->vmcb->control.insn_bytes : NULL, svm->vmcb->control.insn_len); + if (rc <= 0) + return rc; - /* - * rc == 0 indicates a userspace exit is needed to handle page - * transitions, so do that first before updating the RMP table. - */ - if (error_code & PFERR_GUEST_RMP_MASK) { - if (rc == 0) - return rc; + if (error_code & PFERR_GUEST_RMP_MASK) sev_handle_rmp_fault(vcpu, fault_address, error_code); - } return rc; } ? So, we're... a bit tight for 6.10 to include SNP and that is an understatement. My plan is to merge it for 6.11, but do so immediately after the merge window ends. In other words, it is a delay in terms of release but not in terms of time. I don't want QEMU and kvm-unit-tests work to be delayed any further, in particular. Once we sort out the loose ends of patches 21-23, you could send it as a pull request. Paolo
On Fri, May 10, 2024 at 06:01:52PM +0200, Paolo Bonzini wrote: > On 5/10/24 15:58, Sean Christopherson wrote: > > On Thu, May 09, 2024, Michael Roth wrote: > > > The intended logic when handling #NPFs with the RMP bit set (31) is to > > > first check to see if the #NPF requires a shared<->private transition > > > and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT > > > get forwarded on to userspace before proceeding with any handling of > > > other potential RMP fault conditions like needing to PSMASH the RMP > > > entry/etc (which will be done later if the guest still re-faults after > > > the KVM_EXIT_MEMORY_FAULT is processed by userspace). > > > > > > The determination of whether any userspace handling of > > > KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code > > > of kvm_mmu_page_fault(). However, the current code misinterprets the > > > return code, expecting 0 to indicate a userspace exit rather than less > > > than 0 (-EFAULT). This leads to the following unexpected behavior: > > > > > > - for KVM_EXIT_MEMORY_FAULTs resulting for implicit shared->private > > > conversions, warnings get printed from sev_handle_rmp_fault() > > > because it does not expect to be called for GPAs where > > > KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't > > > generally do this, but it is allowed and should be handled > > > similarly to private->shared conversions rather than triggering any > > > sort of warnings > > > > > > - if gmem support for 2MB folios is enabled (via currently out-of-tree > > > code), implicit shared<->private conversions will always result in > > > a PSMASH being attempted, even if it's not actually needed to > > > resolve the RMP fault. This doesn't cause any harm, but results in a > > > needless PSMASH and zapping of the sPTE > > > > > > Resolve these issues by calling sev_handle_rmp_fault() only when > > > kvm_mmu_page_fault()'s return code is greater than or equal to 0, > > > indicating a KVM_MEMORY_EXIT_FAULT/-EFAULT isn't needed. While here, > > > simplify the code slightly and fix up the associated comments for better > > > clarity. > > > > > > Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults") > > > > > > Signed-off-by: Michael Roth <michael.roth@amd.com> > > > --- > > > arch/x86/kvm/svm/svm.c | 10 ++++------ > > > 1 file changed, 4 insertions(+), 6 deletions(-) > > > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > > index 426ad49325d7..9431ce74c7d4 100644 > > > --- a/arch/x86/kvm/svm/svm.c > > > +++ b/arch/x86/kvm/svm/svm.c > > > @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu) > > > svm->vmcb->control.insn_len); > > > /* > > > - * rc == 0 indicates a userspace exit is needed to handle page > > > - * transitions, so do that first before updating the RMP table. > > > + * rc < 0 indicates a userspace exit may be needed to handle page > > > + * attribute updates, so deal with that first before handling other > > > + * potential RMP fault conditions. > > > */ > > > - if (error_code & PFERR_GUEST_RMP_MASK) { > > > - if (rc == 0) > > > - return rc; > > > + if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK) > > > > This isn't correct either. A return of '0' also indiciates "exit to userspace", > > it just doesn't happen with SNP because '0' is returned only when KVM attempts > > emulation, and that too gets short-circuited by svm_check_emulate_instruction(). > > > > And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return > > values overload is ubiquitous enough that it should be relatively self-explanatory. > > > > Or if you prefer to keep a comment, drop the part that specifically calls out > > attributes updates, because that incorrectly implies that's the _only_ reason > > why KVM checks the return. But my vote is to drop the comment, because it > > essentially becomes "don't proceed to step 2 if step 1 failed", which kind of > > makes the reader go "well, yeah". > > So IIUC you're suggesting > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index 426ad49325d7..c39eaeb21981 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -2068,16 +2068,11 @@ static int npf_interception(struct kvm_vcpu *vcpu) > static_cpu_has(X86_FEATURE_DECODEASSISTS) ? > svm->vmcb->control.insn_bytes : NULL, > svm->vmcb->control.insn_len); > + if (rc <= 0) > + return rc; > - /* > - * rc == 0 indicates a userspace exit is needed to handle page > - * transitions, so do that first before updating the RMP table. > - */ > - if (error_code & PFERR_GUEST_RMP_MASK) { > - if (rc == 0) > - return rc; > + if (error_code & PFERR_GUEST_RMP_MASK) > sev_handle_rmp_fault(vcpu, fault_address, error_code); > - } > return rc; > } > > ? > > So, we're... a bit tight for 6.10 to include SNP and that is an > understatement. My plan is to merge it for 6.11, but do so > immediately after the merge window ends. In other words, it > is a delay in terms of release but not in terms of time. I > don't want QEMU and kvm-unit-tests work to be delayed any > further, in particular. That's unfortunate, I'd thought from the PUCK call that we still had some time to stabilize things before merge window. But whatever you think is best. > > Once we sort out the loose ends of patches 21-23, you could send > it as a pull request. Ok, as a pull request against kvm/next, or kvm/queue? Thanks, Mike > > Paolo > >
On 5/10/24 18:37, Michael Roth wrote: >> So, we're... a bit tight for 6.10 to include SNP and that is an >> understatement. My plan is to merge it for 6.11, but do so >> immediately after the merge window ends. In other words, it >> is a delay in terms of release but not in terms of time. I >> don't want QEMU and kvm-unit-tests work to be delayed any >> further, in particular. > > That's unfortunate, I'd thought from the PUCK call that we still had > some time to stabilize things before merge window. But whatever you > think is best. Well, the merge window starts next sunday, doesn't it? If there's an -rc8 I agree there's some leeway, but that is not too likely. >> Once we sort out the loose ends of patches 21-23, you could send >> it as a pull request. > Ok, as a pull request against kvm/next, or kvm/queue? Against kvm/next. Paolo
On Fri, May 10, 2024 at 6:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > Well, the merge window starts next sunday, doesn't it? If there's an > -rc8 I agree there's some leeway, but that is not too likely. > > >> Once we sort out the loose ends of patches 21-23, you could send > >> it as a pull request. > > Ok, as a pull request against kvm/next, or kvm/queue? > > Against kvm/next. Ah no, only kvm/queue has the preparatory hooks - they make no sense without something that uses them. kvm/queue is ready now. Also, please send the pull request "QEMU style", i.e. with patches as replies. If there's an -rc8, I'll probably pull it on Thursday morning. Paolo
On May 10, 2024 6:59:37 PM GMT+02:00, Paolo Bonzini <pbonzini@redhat.com> wrote:
>Well, the merge window starts next sunday, doesn't it? If there's an -rc8 I agree there's some leeway, but that is not too likely.
Nah, the merge window just opened yesterday.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 426ad49325d7..9431ce74c7d4 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu) svm->vmcb->control.insn_len); /* - * rc == 0 indicates a userspace exit is needed to handle page - * transitions, so do that first before updating the RMP table. + * rc < 0 indicates a userspace exit may be needed to handle page + * attribute updates, so deal with that first before handling other + * potential RMP fault conditions. */ - if (error_code & PFERR_GUEST_RMP_MASK) { - if (rc == 0) - return rc; + if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK) sev_handle_rmp_fault(vcpu, fault_address, error_code); - } return rc; }
The intended logic when handling #NPFs with the RMP bit set (31) is to first check to see if the #NPF requires a shared<->private transition and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT get forwarded on to userspace before proceeding with any handling of other potential RMP fault conditions like needing to PSMASH the RMP entry/etc (which will be done later if the guest still re-faults after the KVM_EXIT_MEMORY_FAULT is processed by userspace). The determination of whether any userspace handling of KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code of kvm_mmu_page_fault(). However, the current code misinterprets the return code, expecting 0 to indicate a userspace exit rather than less than 0 (-EFAULT). This leads to the following unexpected behavior: - for KVM_EXIT_MEMORY_FAULTs resulting for implicit shared->private conversions, warnings get printed from sev_handle_rmp_fault() because it does not expect to be called for GPAs where KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't generally do this, but it is allowed and should be handled similarly to private->shared conversions rather than triggering any sort of warnings - if gmem support for 2MB folios is enabled (via currently out-of-tree code), implicit shared<->private conversions will always result in a PSMASH being attempted, even if it's not actually needed to resolve the RMP fault. This doesn't cause any harm, but results in a needless PSMASH and zapping of the sPTE Resolve these issues by calling sev_handle_rmp_fault() only when kvm_mmu_page_fault()'s return code is greater than or equal to 0, indicating a KVM_MEMORY_EXIT_FAULT/-EFAULT isn't needed. While here, simplify the code slightly and fix up the associated comments for better clarity. Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults") Signed-off-by: Michael Roth <michael.roth@amd.com> --- arch/x86/kvm/svm/svm.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)