Message ID | 20231230161954.569267-16-michael.roth@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add AMD Secure Nested Paging (SEV-SNP) Initialization Support |
On 12/30/23 17:19, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> Pages are unsafe to be released back to the page-allocator, if they
> have been transitioned to firmware/guest state and can't be reclaimed
> or transitioned back to hypervisor/shared state. In this case add
> them to an internal leaked pages list to ensure that they are not freed
> or touched/accessed to cause fatal page faults.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: relocate to arch/x86/virt/svm/sev.c]
> Signed-off-by: Michael Roth <michael.roth@amd.com>

Hi, sorry I didn't respond in time to the last mail discussing previous
version in
https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
due to upcoming holidays.

I would rather avoid the approach of allocating container objects:
- it's allocating memory when effectively losing memory, a dangerous thing
- are all the callers and their context ok with GFP_KERNEL?
- GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
current process, it's probably not its fault the pages are leaked? Also the
charging can fail?
- given the benefit of having leaked pages on a list is basically just
debugging (i.e. crash dump or drgn inspection) this seems too heavy

I think it would be better and sufficient to use page->lru for order-0 and
head pages, and simply skip tail pages (possibly with adjusted warning
message for that case).

Vlastimil

<snip>

> +
> +void snp_leak_pages(u64 pfn, unsigned int npages)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +	struct leaked_page *leak;
> +
> +	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
> +
> +	spin_lock(&snp_leaked_pages_list_lock);
> +	while (npages--) {
> +		leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
> +		if (!leak)
> +			goto unlock;

Should we skip the dump_rmpentry() in such a case?

> +		leak->page = page;
> +		list_add_tail(&leak->list, &snp_leaked_pages_list);
> +		dump_rmpentry(pfn);
> +		snp_nr_leaked_pages++;
> +		pfn++;
> +		page++;
> +	}
> +unlock:
> +	spin_unlock(&snp_leaked_pages_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(snp_leak_pages);
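For background on why tail pages have to be skipped in the approach Vlastimil suggests: page->lru, page->buddy_list and page->pcp_list alias the same storage in struct page, and on tail pages the first word of that storage is repurposed for the compound_head pointer. The sketch below is an abridged paraphrase of include/linux/mm_types.h from around this kernel era, not a verbatim copy; exact members and layout vary by version.

/*
 * Abridged paraphrase of struct page (include/linux/mm_types.h, circa
 * v6.7) -- not the verbatim definition.  lru, buddy_list and pcp_list
 * share the same two words, and on tail pages the first of those words
 * holds compound_head (with bit 0 set).  Chaining a tail page onto a
 * list through buddy_list would therefore clobber compound-page
 * metadata, which is why only order-0 and head pages can be linked onto
 * snp_leaked_pages_list and tail pages must be skipped.
 */
struct page {
	unsigned long flags;
	union {
		struct {			/* Page cache, anon, buddy/pcp pages */
			union {
				struct list_head lru;
				struct list_head buddy_list;
				struct list_head pcp_list;
			};
			/* ... mapping, index, private ... */
		};
		struct {			/* Tail pages of a compound page */
			unsigned long compound_head;	/* Bit zero is set */
		};
		/* ... other users of this union elided ... */
	};
	/* ... refcount and the rest elided ... */
};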
Hello Vlastimil,

On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
> On 12/30/23 17:19, Michael Roth wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> Pages are unsafe to be released back to the page-allocator, if they
>> have been transitioned to firmware/guest state and can't be reclaimed
>> or transitioned back to hypervisor/shared state. In this case add
>> them to an internal leaked pages list to ensure that they are not freed
>> or touched/accessed to cause fatal page faults.
>>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Hi, sorry I didn't respond in time to the last mail discussing previous
> version in
> https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
> due to upcoming holidays.
>
> I would rather avoid the approach of allocating container objects:
> - it's allocating memory when effectively losing memory, a dangerous thing
> - are all the callers and their context ok with GFP_KERNEL?
> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
> current process, it's probably not its fault the pages are leaked? Also the
> charging can fail?
> - given the benefit of having leaked pages on a list is basically just
> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>
> I think it would be better and sufficient to use page->lru for order-0 and
> head pages, and simply skip tail pages (possibly with adjusted warning
> message for that case).
>
> Vlastimil
>
> <snip>

Considering the above thoughts, this is updated version of
snp_leak_pages(), looking forward to any review comments/feedback you
have on the same:

void snp_leak_pages(u64 pfn, unsigned int npages)
{
	struct page *page = pfn_to_page(pfn);

	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__,
		 pfn, pfn + npages);

	spin_lock(&snp_leaked_pages_list_lock);
	while (npages--) {
		/*
		 * Reuse the page's buddy list for chaining into the leaked
		 * pages list. This page should not be on a free list currently
		 * and is also unsafe to be added to a free list.
		 */
		if ((likely(!PageCompound(page))) || (PageCompound(page) &&
		    !PageTail(page) && compound_head(page) == page))
			/*
			 * Skip inserting tail pages of compound page as
			 * page->buddy_list of tail pages is not usable.
			 */
			list_add_tail(&page->buddy_list,
				      &snp_leaked_pages_list);
		sev_dump_rmpentry(pfn);
		snp_nr_leaked_pages++;
		pfn++;
		page++;
	}
	spin_unlock(&snp_leaked_pages_list_lock);
}

Thanks, Ashish
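As Vlastimil notes above, the main benefit of keeping leaked pages on a list is post-mortem inspection. With the buddy_list-based chaining in Ashish's version, the entries can be walked like any other page list. The helper below is purely hypothetical (nothing like it is part of the posted series) and assumes the caller holds snp_leaked_pages_list_lock; it only illustrates how the chained pages would be recovered for debugging.

/*
 * Hypothetical debugging helper, not part of the posted series: walk the
 * leaked-pages list built by snp_leak_pages() and print each entry.
 * Assumes snp_leaked_pages_list_lock is held by the caller.
 */
static void snp_dump_leaked_pages(void)
{
	struct page *page;
	unsigned long dumped = 0;

	list_for_each_entry(page, &snp_leaked_pages_list, buddy_list) {
		pr_info("leaked page: pfn 0x%lx order %u\n",
			page_to_pfn(page), compound_order(page));
		dumped++;
	}

	pr_info("%lu list entries, %lu pages recorded as leaked\n",
		dumped, snp_nr_leaked_pages);
}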
On 1/9/24 23:19, Kalra, Ashish wrote:
> Hello Vlastimil,
>
> On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
>> On 12/30/23 17:19, Michael Roth wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> Pages are unsafe to be released back to the page-allocator, if they
>>> have been transitioned to firmware/guest state and can't be reclaimed
>>> or transitioned back to hypervisor/shared state. In this case add
>>> them to an internal leaked pages list to ensure that they are not freed
>>> or touched/accessed to cause fatal page faults.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> Hi, sorry I didn't respond in time to the last mail discussing previous
>> version in
>> https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
>> due to upcoming holidays.
>>
>> I would rather avoid the approach of allocating container objects:
>> - it's allocating memory when effectively losing memory, a dangerous thing
>> - are all the callers and their context ok with GFP_KERNEL?
>> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
>> current process, it's probably not its fault the pages are leaked? Also the
>> charging can fail?
>> - given the benefit of having leaked pages on a list is basically just
>> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>>
>> I think it would be better and sufficient to use page->lru for order-0 and
>> head pages, and simply skip tail pages (possibly with adjusted warning
>> message for that case).
>>
>> Vlastimil
>>
>> <snip>
>
> Considering the above thoughts, this is updated version of
> snp_leak_pages(), looking forward to any review comments/feedback you
> have on the same:
>
> void snp_leak_pages(u64 pfn, unsigned int npages)
> {
> 	struct page *page = pfn_to_page(pfn);
>
> 	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__,
> 		 pfn, pfn + npages);
>
> 	spin_lock(&snp_leaked_pages_list_lock);
> 	while (npages--) {
> 		/*
> 		 * Reuse the page's buddy list for chaining into the leaked
> 		 * pages list. This page should not be on a free list currently
> 		 * and is also unsafe to be added to a free list.
> 		 */
> 		if ((likely(!PageCompound(page))) || (PageCompound(page) &&
> 		    !PageTail(page) && compound_head(page) == page))

This is unnecessarily paranoid wrt that compound_head(page) test, but OTOH
doesn't handle the weird case when we're leaking less than whole compound
page (if that can even happen). So I'd suggest:

	while (npages) {
		if ((likely(!PageCompound(page))) ||
		    (PageHead(page) && compound_nr(page) <= npages))
			list_add_tail(&page->buddy_list, ...)

		... (no change from yours)

		npages--;
	}

(or an equivalent for()) perhaps

> 			/*
> 			 * Skip inserting tail pages of compound page as
> 			 * page->buddy_list of tail pages is not usable.
> 			 */
> 			list_add_tail(&page->buddy_list,
> 				      &snp_leaked_pages_list);
> 		sev_dump_rmpentry(pfn);
> 		snp_nr_leaked_pages++;
> 		pfn++;
> 		page++;
> 	}
> 	spin_unlock(&snp_leaked_pages_list_lock);
> }
>
> Thanks, Ashish
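Combining Ashish's version with Vlastimil's suggested restructuring, the complete loop might look roughly as follows. This is a sketch assembled from the thread above, not necessarily the code that was eventually merged; the sev_dump_rmpentry() name is taken from Ashish's mail.

void snp_leak_pages(u64 pfn, unsigned int npages)
{
	struct page *page = pfn_to_page(pfn);

	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__,
		 pfn, pfn + npages);

	spin_lock(&snp_leaked_pages_list_lock);
	while (npages) {
		/*
		 * Reuse the page's buddy list for chaining into the leaked
		 * pages list. This page should not be on a free list
		 * currently and is also unsafe to be added to a free list.
		 *
		 * Only order-0 and head pages are chained; tail pages are
		 * skipped because their list linkage storage is not usable,
		 * and a head page is only chained when the whole compound
		 * page lies within the leaked range.
		 */
		if (likely(!PageCompound(page)) ||
		    (PageHead(page) && compound_nr(page) <= npages))
			list_add_tail(&page->buddy_list,
				      &snp_leaked_pages_list);

		sev_dump_rmpentry(pfn);
		snp_nr_leaked_pages++;
		pfn++;
		page++;
		npages--;
	}
	spin_unlock(&snp_leaked_pages_list_lock);
}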
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index d3ccb7a0c7e9..435ba9bc4510 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -264,6 +264,7 @@ void snp_dump_hva_rmpentry(unsigned long address);
 int psmash(u64 pfn);
 int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
 int rmp_make_shared(u64 pfn, enum pg_level level);
+void snp_leak_pages(u64 pfn, unsigned int npages);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -275,6 +276,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
 	return -ENODEV;
 }
 static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
 #endif
 
 #endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ee182351d93a..0f2e1ce241b5 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -60,6 +60,17 @@ static u64 probed_rmp_base, probed_rmp_size;
 static struct rmpentry *rmptable __ro_after_init;
 static u64 rmptable_max_pfn __ro_after_init;
 
+/* List of pages which are leaked and cannot be reclaimed */
+struct leaked_page {
+	struct page *page;
+	struct list_head list;
+};
+
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static unsigned long snp_nr_leaked_pages;
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"SEV-SNP: " fmt
 
@@ -476,3 +487,27 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
 	return rmpupdate(pfn, &state);
 }
 EXPORT_SYMBOL_GPL(rmp_make_shared);
+
+void snp_leak_pages(u64 pfn, unsigned int npages)
+{
+	struct page *page = pfn_to_page(pfn);
+	struct leaked_page *leak;
+
+	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
+
+	spin_lock(&snp_leaked_pages_list_lock);
+	while (npages--) {
+		leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
+		if (!leak)
+			goto unlock;
+		leak->page = page;
+		list_add_tail(&leak->list, &snp_leaked_pages_list);
+		dump_rmpentry(pfn);
+		snp_nr_leaked_pages++;
+		pfn++;
+		page++;
+	}
+unlock:
+	spin_unlock(&snp_leaked_pages_list_lock);
+}
+EXPORT_SYMBOL_GPL(snp_leak_pages);
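For context on how an interface like this is meant to be used: snp_leak_pages() is intended for error paths where a page that was transitioned to firmware/guest state cannot be transitioned back to hypervisor/shared state and therefore must never be returned to the page allocator. The caller below is purely hypothetical (the real call sites live in the SNP host and firmware-driver code, not in this patch); it only illustrates that pattern using the rmp_make_shared() declaration added above.

/*
 * Hypothetical caller sketch -- not part of this patch.  Illustrates the
 * intended usage pattern: if a firmware/guest-owned page cannot be
 * transitioned back to hypervisor/shared state, leak it rather than
 * returning it to the page allocator.
 */
static void snp_free_firmware_page_example(struct page *page)
{
	u64 pfn = page_to_pfn(page);

	if (rmp_make_shared(pfn, PG_LEVEL_4K)) {
		/* Reclaim failed: record the page as leaked, never free it. */
		snp_leak_pages(pfn, 1);
		return;
	}

	__free_page(page);
}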