Message ID | 20210707183616.5620-26-brijesh.singh@amd.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Headers | show |
Series | Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support | expand |
On Wed, Jul 07, 2021, Brijesh Singh wrote: > The guest pages of the SEV-SNP VM maybe added as a private page in the > RMP entry (assigned bit is set). The guest private pages must be > transitioned to the hypervisor state before its freed. Isn't this patch needed much earlier in the series, i.e. when the first RMPUPDATE usage goes in? > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > --- > arch/x86/kvm/svm/sev.c | 39 +++++++++++++++++++++++++++++++++++++++ > 1 file changed, 39 insertions(+) > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 1f0635ac9ff9..4468995dd209 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -1940,6 +1940,45 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range) > static void __unregister_enc_region_locked(struct kvm *kvm, > struct enc_region *region) > { > + struct rmpupdate val = {}; > + unsigned long i, pfn; > + struct rmpentry *e; > + int level, rc; > + > + /* > + * The guest memory pages are assigned in the RMP table. Unassign it > + * before releasing the memory. > + */ > + if (sev_snp_guest(kvm)) { > + for (i = 0; i < region->npages; i++) { > + pfn = page_to_pfn(region->pages[i]); > + > + if (need_resched()) > + schedule(); This can simply be "cond_resched();" > + > + e = snp_lookup_page_in_rmptable(region->pages[i], &level); > + if (unlikely(!e)) > + continue; > + > + /* If its not a guest assigned page then skip it. */ > + if (!rmpentry_assigned(e)) > + continue; > + > + /* Is the page part of a 2MB RMP entry? */ > + if (level == PG_LEVEL_2M) { > + val.pagesize = RMP_PG_SIZE_2M; > + pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1); > + } else { > + val.pagesize = RMP_PG_SIZE_4K; This raises yet more questions (for me) as to the interaction between Page-Size and Hyperivsor-Owned flags in the RMP. It also raises questions on the correctness of zeroing the RMP entry if KVM_SEV_SNP_LAUNCH_START (in the previous patch). > + } > + > + /* Transition the page to hypervisor owned. */ > + rc = rmpupdate(pfn_to_page(pfn), &val); > + if (rc) > + pr_err("Failed to release pfn 0x%lx ret=%d\n", pfn, rc); This is not robust, e.g. KVM will unpin the memory and release it back to the kernel with a stale RMP entry. Shouldn't this be a WARN+leak situation? > + } > + } > + > sev_unpin_memory(kvm, region->pages, region->npages); > list_del(®ion->list); > kfree(region); > -- > 2.17.1 >
On 7/16/21 3:09 PM, Sean Christopherson wrote: > On Wed, Jul 07, 2021, Brijesh Singh wrote: >> The guest pages of the SEV-SNP VM maybe added as a private page in the >> RMP entry (assigned bit is set). The guest private pages must be >> transitioned to the hypervisor state before its freed. > Isn't this patch needed much earlier in the series, i.e. when the first RMPUPDATE > usage goes in? Yes, the first RMPUPDATE usage is in the LAUNCH_UPDATE patch and this should be squashed in that patch. >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> >> --- >> arch/x86/kvm/svm/sev.c | 39 +++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 39 insertions(+) >> >> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c >> index 1f0635ac9ff9..4468995dd209 100644 >> --- a/arch/x86/kvm/svm/sev.c >> +++ b/arch/x86/kvm/svm/sev.c >> @@ -1940,6 +1940,45 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range) >> static void __unregister_enc_region_locked(struct kvm *kvm, >> struct enc_region *region) >> { >> + struct rmpupdate val = {}; >> + unsigned long i, pfn; >> + struct rmpentry *e; >> + int level, rc; >> + >> + /* >> + * The guest memory pages are assigned in the RMP table. Unassign it >> + * before releasing the memory. >> + */ >> + if (sev_snp_guest(kvm)) { >> + for (i = 0; i < region->npages; i++) { >> + pfn = page_to_pfn(region->pages[i]); >> + >> + if (need_resched()) >> + schedule(); > This can simply be "cond_resched();" Yes. > >> + >> + e = snp_lookup_page_in_rmptable(region->pages[i], &level); >> + if (unlikely(!e)) >> + continue; >> + >> + /* If its not a guest assigned page then skip it. */ >> + if (!rmpentry_assigned(e)) >> + continue; >> + >> + /* Is the page part of a 2MB RMP entry? */ >> + if (level == PG_LEVEL_2M) { >> + val.pagesize = RMP_PG_SIZE_2M; >> + pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1); >> + } else { >> + val.pagesize = RMP_PG_SIZE_4K; > This raises yet more questions (for me) as to the interaction between Page-Size > and Hyperivsor-Owned flags in the RMP. It also raises questions on the correctness > of zeroing the RMP entry if KVM_SEV_SNP_LAUNCH_START (in the previous patch). I assume you mean the LAUNCH_UPDATE because that's when we need to perform the RMPUPDATE. The hypervisor owned means all zero in the RMP entry. >> + } >> + >> + /* Transition the page to hypervisor owned. */ >> + rc = rmpupdate(pfn_to_page(pfn), &val); >> + if (rc) >> + pr_err("Failed to release pfn 0x%lx ret=%d\n", pfn, rc); > This is not robust, e.g. KVM will unpin the memory and release it back to the > kernel with a stale RMP entry. Shouldn't this be a WARN+leak situation? Yes. Maybe we should increase the page refcount to ensure that this page is not reused after the process is terminated ? >> + } >> + } >> + >> sev_unpin_memory(kvm, region->pages, region->npages); >> list_del(®ion->list); >> kfree(region); >> -- >> 2.17.1 >>
On Fri, Jul 16, 2021, Brijesh Singh wrote: > > On 7/16/21 3:09 PM, Sean Christopherson wrote: > > On Wed, Jul 07, 2021, Brijesh Singh wrote: > >> + e = snp_lookup_page_in_rmptable(region->pages[i], &level); > >> + if (unlikely(!e)) > >> + continue; > >> + > >> + /* If its not a guest assigned page then skip it. */ > >> + if (!rmpentry_assigned(e)) > >> + continue; > >> + > >> + /* Is the page part of a 2MB RMP entry? */ > >> + if (level == PG_LEVEL_2M) { > >> + val.pagesize = RMP_PG_SIZE_2M; > >> + pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1); > >> + } else { > >> + val.pagesize = RMP_PG_SIZE_4K; > > This raises yet more questions (for me) as to the interaction between Page-Size > > and Hyperivsor-Owned flags in the RMP. It also raises questions on the correctness > > of zeroing the RMP entry if KVM_SEV_SNP_LAUNCH_START (in the previous patch). > > I assume you mean the LAUNCH_UPDATE because that's when we need to > perform the RMPUPDATE. Doh, yes. > The hypervisor owned means all zero in the RMP entry. Figured out where I went wrong after reading the RMPUDPATE pseudocode. RMPUPDATE takes the page size as a parameter even though it unconditionally zeros the page size flag in the RMP entry for unassigned pages. A wrapper around rmpupdate() would definitely help, e.g. (though level might need to be an "int" to avoid a bunch of casts). int rmp_make_shared(u64 pfn, enum pg_level level); Wrappers for "private" and "firmware" would probably be helpful too. And if you do that, I think you can bury both "struct rmpupdate", rmpupdate(), and X86_TO_RMP_PG_LEVEL() in arch/x86/kernel/sev.c. snp_set_rmptable_state() might need some refactoring to avoid three booleans, but I guess maybe that could be an exception? Not sure. Anyways, was thinking something like: int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid); int rmp_make_firmware(u64 pfn); It would consolidate a bit of code, and more importantly it would give visual cues to the reader, e.g. it's easy to overlook "val = {0}" meaning "make shared". Side topic, what happens if a firmware entry is configured with page_size=1? And one architectural question: what prevents a malicious VMM from punching a 4k shared page into a 2mb private page? E.g. rmpupdate(1 << 20, [private, 2mb]); rmpupdate(1 << 20 + 4096, [shared, 4kb]); I don't see any checks in the pseudocode that will detect this, and presumably the whole point of a 2mb private RMP entry is to not have to go walk the individual 4kb entries on a private access. NEW_RMP = READ_MEM.o [NEW_RMP_PTR] IF ((NEW_RMP.PAGE_SIZE == 2MB) && (SYSTEM_PA[20:12] != 0)) <-- not taken, 4kb entry EAX = FAIL_INPUT EXIT IF (!NEW_RMP.ASSIGNED && (NEW_RMP.IMMUTABLE || (NEW_RMP.ASID != 0)) <-- not taken, new entry valid EAX = FAIL_INPUT EXIT RMP_ENTRY_PA = RMP_BASE + 0x4000 + (SYSTEM_PA / 0x1000) * 16 IF (RMP_ENTRY_PA > RMP_END) EAX = FAIL_INPUT EXIT // System address must have an RMP entry OLD_RMP = READ_MEM_PA.o [RMP_ENTRY_PA] IF (OLD_RMP.IMMUTABLE) <-- passes, private entry not immutable EAX = FAIL_PERMISSION EXIT IF (NEW_RMP.PAGE_SIZE == 4KB) IF ((SYSTEM_PA[20:12] == 0) && (OLD_RMP.PAGE_SIZE == 2MB)) <- not taken, PA[12] == 1 EAX = FAIL_OVERLAP EXIT ELSE IF (Any 4KB RMP entry with (RMP.ASSIGNED == 1) exists in 2MB region) EAX = FAIL_OVERLAP EXIT ELSE FOR (I = 1; I < 512, I++) { temp_RMP = 0 temp_RMP.ASSIGNED = NEW_RMP.ASSIGNED WRITE_MEM.o [RMP_ENTRY_PA + I * 16] = temp_RMP; }
On 7/16/21 7:46 PM, Sean Christopherson wrote: > takes the page size as a parameter even though it unconditionally zeros the page > size flag in the RMP entry for unassigned pages. > > A wrapper around rmpupdate() would definitely help, e.g. (though level might need > to be an "int" to avoid a bunch of casts). > > int rmp_make_shared(u64 pfn, enum pg_level level); > > Wrappers for "private" and "firmware" would probably be helpful too. And if you > do that, I think you can bury both "struct rmpupdate", rmpupdate(), and > X86_TO_RMP_PG_LEVEL() in arch/x86/kernel/sev.c. snp_set_rmptable_state() might > need some refactoring to avoid three booleans, but I guess maybe that could be > an exception? Not sure. Anyways, was thinking something like: > > int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid); > int rmp_make_firmware(u64 pfn); > > It would consolidate a bit of code, and more importantly it would give visual > cues to the reader, e.g. it's easy to overlook "val = {0}" meaning "make shared". Okay, I will add helper to make things easier. One case where we will need to directly call the rmpupdate() is during the LAUNCH_UPDATE command. In that case the page is private and its immutable bit is also set. This is because the firmware makes change to the page, and we are required to set the immutable bit before the call. > > Side topic, what happens if a firmware entry is configured with page_size=1? Its not any different from the guest requesting a page private with the page_size=1. Some firmware commands require the page_size=0, and others can work with page_size=1 or page_size=0. > > And one architectural question: what prevents a malicious VMM from punching a 4k > shared page into a 2mb private page? E.g. > > rmpupdate(1 << 20, [private, 2mb]); > rmpupdate(1 << 20 + 4096, [shared, 4kb]); > > I don't see any checks in the pseudocode that will detect this, and presumably the > whole point of a 2mb private RMP entry is to not have to go walk the individual > 4kb entries on a private access. I believe pseudo-code is not meant to be exactly accurate and comprehensive, but it is intended to summarize the HW behavior and explain what can cause the different fault cases. In the real design we may have a separate checks to catch the above issue. I just tested on the hardware to ensure that HW correctly detects the above error condition. However, in this case we are missingĀ a significant check (at least the check that the 2M region is not already assigned). I have raised the concern with the hardware team to look into updating the APM. thank you so much for the bringing this up. thanks
On Mon, Jul 19, 2021, Brijesh Singh wrote: > > On 7/16/21 7:46 PM, Sean Christopherson wrote: > > > takes the page size as a parameter even though it unconditionally zeros the page > > size flag in the RMP entry for unassigned pages. > > > > A wrapper around rmpupdate() would definitely help, e.g. (though level might need > > to be an "int" to avoid a bunch of casts). > > > > int rmp_make_shared(u64 pfn, enum pg_level level); > > > > Wrappers for "private" and "firmware" would probably be helpful too. And if you > > do that, I think you can bury both "struct rmpupdate", rmpupdate(), and > > X86_TO_RMP_PG_LEVEL() in arch/x86/kernel/sev.c. snp_set_rmptable_state() might > > need some refactoring to avoid three booleans, but I guess maybe that could be > > an exception? Not sure. Anyways, was thinking something like: > > > > int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid); > > int rmp_make_firmware(u64 pfn); > > > > It would consolidate a bit of code, and more importantly it would give visual > > cues to the reader, e.g. it's easy to overlook "val = {0}" meaning "make shared". > > Okay, I will add helper to make things easier. One case where we will > need to directly call the rmpupdate() is during the LAUNCH_UPDATE > command. In that case the page is private and its immutable bit is also > set. This is because the firmware makes change to the page, and we are > required to set the immutable bit before the call. Or do "int rmp_make_firmware(u64 pfn, bool immutable)"? > > And one architectural question: what prevents a malicious VMM from punching a 4k > > shared page into a 2mb private page? E.g. > > > > rmpupdate(1 << 20, [private, 2mb]); > > rmpupdate(1 << 20 + 4096, [shared, 4kb]); > > > > I don't see any checks in the pseudocode that will detect this, and presumably the > > whole point of a 2mb private RMP entry is to not have to go walk the individual > > 4kb entries on a private access. > > I believe pseudo-code is not meant to be exactly accurate and > comprehensive, but it is intended to summarize the HW behavior and > explain what can cause the different fault cases. In the real design we > may have a separate checks to catch the above issue. I just tested on > the hardware to ensure that HW correctly detects the above error > condition. However, in this case we are missing a significant check (at > least the check that the 2M region is not already assigned). I have > raised the concern with the hardware team to look into updating the APM. Thanks! While you have their ear, please emphasive the importance of the pseudocode for us software folks. It's perfectly ok to omit or gloss over microarchitectural details, but ISA pseudocode is often the source of truth for behavior that is architecturally visible.
On 7/19/21 12:18 PM, Sean Christopherson wrote: >> >> Okay, I will add helper to make things easier. One case where we will >> need to directly call the rmpupdate() is during the LAUNCH_UPDATE >> command. In that case the page is private and its immutable bit is also >> set. This is because the firmware makes change to the page, and we are >> required to set the immutable bit before the call. > > Or do "int rmp_make_firmware(u64 pfn, bool immutable)"? > That's not what we need. We need 'rmp_make_private() + immutable' all in one RMPUPDATE. Here is the snippet from SNP_LAUNCH_UPDATE. + /* Transition the page state to pre-guest */ + memset(&e, 0, sizeof(e)); + e.assigned = 1; + e.gpa = gpa; + e.asid = sev_get_asid(kvm); + e.immutable = true; + e.pagesize = X86_TO_RMP_PG_LEVEL(level); + ret = rmpupdate(inpages[i], &e); thanks
On Mon, Jul 19, 2021, Brijesh Singh wrote: > > On 7/19/21 12:18 PM, Sean Christopherson wrote: > > > > > > Okay, I will add helper to make things easier. One case where we will > > > need to directly call the rmpupdate() is during the LAUNCH_UPDATE > > > command. In that case the page is private and its immutable bit is also > > > set. This is because the firmware makes change to the page, and we are > > > required to set the immutable bit before the call. > > > > Or do "int rmp_make_firmware(u64 pfn, bool immutable)"? > > That's not what we need. > > We need 'rmp_make_private() + immutable' all in one RMPUPDATE. Here is the > snippet from SNP_LAUNCH_UPDATE. Ah, not firmwrare, gotcha. But we can still use a helper, e.g. an inner double-underscore helper, __rmp_make_private().
On Mon, Jul 19, 2021, Sean Christopherson wrote: > On Mon, Jul 19, 2021, Brijesh Singh wrote: > > > > On 7/19/21 12:18 PM, Sean Christopherson wrote: > > > > > > > > Okay, I will add helper to make things easier. One case where we will > > > > need to directly call the rmpupdate() is during the LAUNCH_UPDATE > > > > command. In that case the page is private and its immutable bit is also > > > > set. This is because the firmware makes change to the page, and we are > > > > required to set the immutable bit before the call. > > > > > > Or do "int rmp_make_firmware(u64 pfn, bool immutable)"? > > > > That's not what we need. > > > > We need 'rmp_make_private() + immutable' all in one RMPUPDATE. Here is the > > snippet from SNP_LAUNCH_UPDATE. > > Ah, not firmwrare, gotcha. But we can still use a helper, e.g. an inner > double-underscore helper, __rmp_make_private(). Hmm, looking at it again, I think I also got confused by the comment for the VMSA page: /* Transition the VMSA page to a firmware state. */ e.assigned = 1; e.immutable = 1; e.asid = sev->asid; e.gpa = -1; e.pagesize = RMP_PG_SIZE_4K; Unlike __snp_alloc_firmware_pages() in the CCP code, the VMSA is associated with the guest's ASID, just not a GPA. I.e. the VMSA is more of a specialized guest private page, as opposed to a dedicated firmware page. I.e. a __rmp_make_private() and/or rmp_make_private_immutable() definitely seems like a good idea.
On 7/19/21 2:03 PM, Sean Christopherson wrote: > On Mon, Jul 19, 2021, Brijesh Singh wrote: >> >> On 7/19/21 12:18 PM, Sean Christopherson wrote: >>>> >>>> Okay, I will add helper to make things easier. One case where we will >>>> need to directly call the rmpupdate() is during the LAUNCH_UPDATE >>>> command. In that case the page is private and its immutable bit is also >>>> set. This is because the firmware makes change to the page, and we are >>>> required to set the immutable bit before the call. >>> >>> Or do "int rmp_make_firmware(u64 pfn, bool immutable)"? >> >> That's not what we need. >> >> We need 'rmp_make_private() + immutable' all in one RMPUPDATE. Here is the >> snippet from SNP_LAUNCH_UPDATE. > > Ah, not firmwrare, gotcha. But we can still use a helper, e.g. an inner > double-underscore helper, __rmp_make_private(). > In that case we are basically passing the all the fields defined in the 'struct rmpupdate' as individual arguments. How about something like this: * core kernel exports the rmpupdate() * the include/linux/sev.h header file defines the helper functions int rmp_make_private(u64 pfn, u64 gpa, int psize, int asid) int rmp_make_firmware(u64 pfn, int psize); int rmp_make_shared(u64 pfn, int psize); In most of the case above 3 helpers are good. If driver finds that the above helper does not fit its need (such as SNP_LAUNCH_UPDATE) then call the rmpupdate() without going through the helper. -Brijesh
On Mon, Jul 19, 2021, Brijesh Singh wrote: > > On 7/19/21 2:03 PM, Sean Christopherson wrote: > > On Mon, Jul 19, 2021, Brijesh Singh wrote: > > Ah, not firmwrare, gotcha. But we can still use a helper, e.g. an inner > > double-underscore helper, __rmp_make_private(). > > In that case we are basically passing the all the fields defined in the > 'struct rmpupdate' as individual arguments. Yes, but (a) not _all_ fields, (b) it would allow hiding "struct rmpupdate", and (c) this is much friendlier to readers: __rmp_make_private(pfn, gpa, PG_LEVEL_4K, svm->asid, true); than: rmpupdate(&rmpupdate); For the former, I can see in a single line of code that KVM is creating a 4k private, immutable guest page. With the latter, I need to go hunt down all code that modifies rmpupdate to understand what the code is doing. > How about something like this: > > * core kernel exports the rmpupdate() > * the include/linux/sev.h header file defines the helper functions > > int rmp_make_private(u64 pfn, u64 gpa, int psize, int asid) I think we'll want s/psize/level, i.e. make it more obvious clear that the input is PG_LEVEL_*.
On 7/20/21 11:40 AM, Sean Christopherson wrote: > On Mon, Jul 19, 2021, Brijesh Singh wrote: >> >> On 7/19/21 2:03 PM, Sean Christopherson wrote: >>> On Mon, Jul 19, 2021, Brijesh Singh wrote: >>> Ah, not firmwrare, gotcha. But we can still use a helper, e.g. an inner >>> double-underscore helper, __rmp_make_private(). >> >> In that case we are basically passing the all the fields defined in the >> 'struct rmpupdate' as individual arguments. > > Yes, but (a) not _all_ fields, (b) it would allow hiding "struct rmpupdate", and > (c) this is much friendlier to readers: > > __rmp_make_private(pfn, gpa, PG_LEVEL_4K, svm->asid, true); > > than: > > rmpupdate(&rmpupdate); > Ok. > For the former, I can see in a single line of code that KVM is creating a 4k > private, immutable guest page. With the latter, I need to go hunt down all code > that modifies rmpupdate to understand what the code is doing. > >> How about something like this: >> >> * core kernel exports the rmpupdate() >> * the include/linux/sev.h header file defines the helper functions >> >> int rmp_make_private(u64 pfn, u64 gpa, int psize, int asid) > > I think we'll want s/psize/level, i.e. make it more obvious clear that the input > is PG_LEVEL_*. > ok, I will stick to x86 PG_LEVEL_* thanks
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 1f0635ac9ff9..4468995dd209 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1940,6 +1940,45 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range) static void __unregister_enc_region_locked(struct kvm *kvm, struct enc_region *region) { + struct rmpupdate val = {}; + unsigned long i, pfn; + struct rmpentry *e; + int level, rc; + + /* + * The guest memory pages are assigned in the RMP table. Unassign it + * before releasing the memory. + */ + if (sev_snp_guest(kvm)) { + for (i = 0; i < region->npages; i++) { + pfn = page_to_pfn(region->pages[i]); + + if (need_resched()) + schedule(); + + e = snp_lookup_page_in_rmptable(region->pages[i], &level); + if (unlikely(!e)) + continue; + + /* If its not a guest assigned page then skip it. */ + if (!rmpentry_assigned(e)) + continue; + + /* Is the page part of a 2MB RMP entry? */ + if (level == PG_LEVEL_2M) { + val.pagesize = RMP_PG_SIZE_2M; + pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1); + } else { + val.pagesize = RMP_PG_SIZE_4K; + } + + /* Transition the page to hypervisor owned. */ + rc = rmpupdate(pfn_to_page(pfn), &val); + if (rc) + pr_err("Failed to release pfn 0x%lx ret=%d\n", pfn, rc); + } + } + sev_unpin_memory(kvm, region->pages, region->npages); list_del(®ion->list); kfree(region);
The guest pages of the SEV-SNP VM maybe added as a private page in the RMP entry (assigned bit is set). The guest private pages must be transitioned to the hypervisor state before its freed. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> --- arch/x86/kvm/svm/sev.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+)