Message ID | 20231017202505.340906-5-rick.p.edgecombe@intel.com
---|---
State | New
Series | Handle set_memory_XXcrypted() errors
On Tue, Oct 17, 2023 at 01:24:59PM -0700, Rick Edgecombe wrote:
> On TDX it is possible for the untrusted host to cause
> set_memory_encrypted() or set_memory_decrypted() to fail such that an
> error is returned and the resulting memory is shared. Callers need to
> take care to handle these errors to avoid returning decrypted (shared)
> memory to the page allocator, which could lead to functional or
> security issues.
>
> Swiotlb could free decrypted/shared pages if set_memory_decrypted()
> fails. Use the recently added free_decrypted_pages() to avoid this.
>
> In swiotlb_exit(), check for set_memory_encrypted() errors manually,
> because the pages are not necessarily going to the page allocator.

Whatever recently introduced it didn't make it to my mailbox. Please
always CC everyone on every patch in a series, everything else is
impossible to review.
On Wed, 2023-10-18 at 06:43 +0200, Christoph Hellwig wrote:
> On Tue, Oct 17, 2023 at 01:24:59PM -0700, Rick Edgecombe wrote:
> > On TDX it is possible for the untrusted host to cause
> > set_memory_encrypted() or set_memory_decrypted() to fail such that
> > an error is returned and the resulting memory is shared. Callers
> > need to take care to handle these errors to avoid returning
> > decrypted (shared) memory to the page allocator, which could lead
> > to functional or security issues.
> >
> > Swiotlb could free decrypted/shared pages if set_memory_decrypted()
> > fails. Use the recently added free_decrypted_pages() to avoid this.
> >
> > In swiotlb_exit(), check for set_memory_encrypted() errors
> > manually, because the pages are not necessarily going to the page
> > allocator.
>
> Whatever recently introduced it didn't make it to my mailbox. Please
> always CC everyone on every patch in a series, everything else is
> impossible to review.

Ok. The series touches a bunch of set_memory() callers all over, so I
was trying to manage the CC list to something reasonable. I tried to
give a summary in each commit, but I guess it wasn't in depth enough.

Here is the lore link, if you haven't already found it:

https://lore.kernel.org/lkml/20231017202505.340906-1-rick.p.edgecombe@intel.com/
On Tue, 17 Oct 2023 13:24:59 -0700
Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:

> On TDX it is possible for the untrusted host to cause
> set_memory_encrypted() or set_memory_decrypted() to fail such that an
> error is returned and the resulting memory is shared. Callers need to
> take care to handle these errors to avoid returning decrypted (shared)
> memory to the page allocator, which could lead to functional or
> security issues.
>
> Swiotlb could free decrypted/shared pages if set_memory_decrypted()
> fails. Use the recently added free_decrypted_pages() to avoid this.
>
> In swiotlb_exit(), check for set_memory_encrypted() errors manually,
> because the pages are not necessarily going to the page allocator.
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: iommu@lists.linux.dev
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  kernel/dma/swiotlb.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 394494a6b1f3..ad06786c4f98 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -524,6 +524,7 @@ void __init swiotlb_exit(void)
>  	unsigned long tbl_vaddr;
>  	size_t tbl_size, slots_size;
>  	unsigned int area_order;
> +	int ret;
>
>  	if (swiotlb_force_bounce)
>  		return;
> @@ -536,17 +537,19 @@ void __init swiotlb_exit(void)
>  	tbl_size = PAGE_ALIGN(mem->end - mem->start);
>  	slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
>
> -	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> +	ret = set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
>  	if (mem->late_alloc) {
>  		area_order = get_order(array_size(sizeof(*mem->areas),
>  			mem->nareas));
>  		free_pages((unsigned long)mem->areas, area_order);
> -		free_pages(tbl_vaddr, get_order(tbl_size));
> +		if (!ret)
> +			free_pages(tbl_vaddr, get_order(tbl_size));
>  		free_pages((unsigned long)mem->slots, get_order(slots_size));
>  	} else {
>  		memblock_free_late(__pa(mem->areas),
>  			array_size(sizeof(*mem->areas), mem->nareas));
> -		memblock_free_late(mem->start, tbl_size);
> +		if (!ret)
> +			memblock_free_late(mem->start, tbl_size);
>  		memblock_free_late(__pa(mem->slots), slots_size);
>  	}
>
> @@ -581,7 +584,7 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes)
>  		return page;
>
>  error:
> -	__free_pages(page, order);
> +	free_decrypted_pages((unsigned long)vaddr, order);
>  	return NULL;
>  }

I admit I'm not familiar with the encryption/decryption API, but if a
__free_pages() is not sufficient here, then it is quite confusing. The
error label is reached only if set_memory_decrypted() returns non-zero.
My naive expectation is that the memory is *not* decrypted in that
case and does not require special treatment. Is this assumption wrong?

OTOH I believe there is a bug in the logic. The subsequent
__free_pages() in swiotlb_alloc_tlb() would have to be changed to a
free_decrypted_pages(). However, I'm proposing a different approach to
address the latter issue here:

https://lore.kernel.org/linux-iommu/20231026095123.222-1-petrtesarik@huaweicloud.com/T/

Petr T
On Tue, 2023-10-31 at 11:43 +0100, Petr Tesařík wrote:
> I admit I'm not familiar with the encryption/decryption API, but if a
> __free_pages() is not sufficient here, then it is quite confusing.
> The error label is reached only if set_memory_decrypted() returns
> non-zero. My naive expectation is that the memory is *not* decrypted
> in that case and does not require special treatment. Is this
> assumption wrong?

Yea, the memory can still be decrypted, or partially decrypted. On
x86, all the set_memory() calls can fail part way through the work,
and they don't roll back the changes they had made up to that point. I
was thinking about trying to change this, but that is the current
behavior.

But in TDX's case set_memory_decrypted() can be fully successful and
just return an error. And there aren't any plans to fix the TDX case
(which has special VMM work that the kernel doesn't have control
over). So instead, the plan is to WARN about it and count on the
caller to handle the error properly:

https://lore.kernel.org/lkml/20231027214744.1742056-1-rick.p.edgecombe@intel.com/

> OTOH I believe there is a bug in the logic. The subsequent
> __free_pages() in swiotlb_alloc_tlb() would have to be changed to a
> free_decrypted_pages(). However, I'm proposing a different approach
> to address the latter issue here:
>
> https://lore.kernel.org/linux-iommu/20231026095123.222-1-petrtesarik@huaweicloud.com/T/

Oh, yes, that makes sense. I was planning to send a patch to just leak
the pages if set_memory_decrypted() fails, after my v2 linked above is
accepted. It could have a different label than the phys_limit check
error path added in your linked patch, so that case would still free
the perfectly fine encrypted pages.
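[To make the leak-on-failure plan above concrete, here is a minimal
sketch. It assumes only what the thread states: set_memory_decrypted()
can return an error while leaving some pages shared with the host. The
function name and structure are hypothetical, not taken from the actual
patches.]

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/set_memory.h>

/*
 * Hypothetical sketch: if set_memory_decrypted() fails, the pages may
 * be partially or fully shared with the untrusted host, so leak them
 * instead of returning them to the page allocator.
 */
static void *alloc_shared_buf_sketch(size_t bytes, gfp_t gfp)
{
	unsigned int order = get_order(bytes);
	struct page *page = alloc_pages(gfp, order);

	if (!page)
		return NULL;

	if (set_memory_decrypted((unsigned long)page_address(page),
				 PFN_UP(bytes))) {
		/* Encryption state unknown to the guest: deliberately leak. */
		return NULL;
	}

	return page_address(page);
}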
On Tue, 31 Oct 2023 15:54:52 +0000
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Tue, 2023-10-31 at 11:43 +0100, Petr Tesařík wrote:
> > I admit I'm not familiar with the encryption/decryption API, but if a
> > __free_pages() is not sufficient here, then it is quite confusing.
> > The error label is reached only if set_memory_decrypted() returns
> > non-zero. My naive expectation is that the memory is *not* decrypted
> > in that case and does not require special treatment. Is this
> > assumption wrong?
>
> Yea, the memory can still be decrypted, or partially decrypted. On
> x86, all the set_memory() calls can fail part way through the work,
> and they don't roll back the changes they had made up to that point.

Thank you for the explanation. So, after set_memory_decrypted() fails,
the pages become Schroedinger-crypted, but since their true state
cannot be observed by the guest kernel, they stay as such forever.

Sweet.

> [...]
> > OTOH I believe there is a bug in the logic. The subsequent
> > __free_pages() in swiotlb_alloc_tlb() would have to be changed to a
> > free_decrypted_pages(). However, I'm proposing a different approach
> > to address the latter issue here:
> >
> > https://lore.kernel.org/linux-iommu/20231026095123.222-1-petrtesarik@huaweicloud.com/T/
>
> Oh, yes, that makes sense. I was planning to send a patch to just leak
> the pages if set_memory_decrypted() fails, after my v2 linked above is
> accepted. It could have a different label than the phys_limit check
> error path added in your linked patch, so that case would still free
> the perfectly fine encrypted pages.

Hm, should I incorporate this knowledge into a v2 of my patch and
address both issues?

Petr T
On Tue, 2023-10-31 at 18:13 +0100, Petr Tesařík wrote:
> Thank you for the explanation. So, after set_memory_decrypted()
> fails, the pages become Schroedinger-crypted, but since their true
> state cannot be observed by the guest kernel, they stay as such
> forever.
>
> Sweet.

Yes... The untrusted host (the part of the VMM TDX is defending
against) gets to specify the return code of these operations (success
or failure). But the coco (a general term for TDX and similar from
other vendors) threat model doesn't include DoS. So the guest should
trust the return code as far as trying to not crash, but not trust it
in regards to the potential to leak data.

It's a bit to ask of the callers, but the other solution we discussed
was to panic the guest if any weirdness is observed by the VMM, in
which case the callers would never see the error. And of course
panicking the kernel is Bad. So that is how we arrived at this request
of the callers. Appreciate the effort to handle it on that side.

> Hm, should I incorporate this knowledge into a v2 of my patch and
> address both issues?

That sounds good to me! Feel free to CC me if you would like, and I
can scrutinize it for this particular issue.
Hi,

On Tue, 31 Oct 2023 17:29:25 +0000
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Tue, 2023-10-31 at 18:13 +0100, Petr Tesařík wrote:
> > Thank you for the explanation. So, after set_memory_decrypted()
> > fails, the pages become Schroedinger-crypted, but since their true
> > state cannot be observed by the guest kernel, they stay as such
> > forever.
> >
> > Sweet.
>
> Yes... The untrusted host (the part of the VMM TDX is defending
> against) gets to specify the return code of these operations (success
> or failure). But the coco (a general term for TDX and similar from
> other vendors) threat model doesn't include DoS. So the guest should
> trust the return code as far as trying to not crash, but not trust it
> in regards to the potential to leak data.
>
> It's a bit to ask of the callers, but the other solution we discussed
> was to panic the guest if any weirdness is observed by the VMM, in
> which case the callers would never see the error. And of course
> panicking the kernel is Bad. So that is how we arrived at this
> request of the callers. Appreciate the effort to handle it on that
> side.
>
> > Hm, should I incorporate this knowledge into a v2 of my patch and
> > address both issues?
>
> That sounds good to me! Feel free to CC me if you would like, and I
> can scrutinize it for this particular issue.

I'm sorry I missed that free_decrypted_pages() is added by the very
same series, so I cannot use it just yet. I can open-code it and let
you convert the code to the new function. You may then also want to
convert another open-coded instance further down in swiotlb_free_tlb().

In any case, there is an interdependency between the two patches, so
we should agree in which order to apply them.

Petr T
On Wed, 2023-11-01 at 07:27 +0100, Petr Tesařík wrote:
> I'm sorry I missed that free_decrypted_pages() is added by the very
> same series, so I cannot use it just yet. I can open-code it and let
> you convert the code to the new function. You may then also want to
> convert another open-coded instance further down in
> swiotlb_free_tlb().
>
> In any case, there is an interdependency between the two patches, so
> we should agree in which order to apply them.

Open coding in the callers is the current plan (see an explanation
after the "---" in the v1 of that patch[0] if you are interested).
There might not be any interdependency between the warning and swiotlb
changes, but I can double check if you CC me.

[0] https://lore.kernel.org/lkml/20231024234829.1443125-1-rick.p.edgecombe@intel.com/
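[The open-coding Petr mentions would presumably look roughly like the
sketch below at each free site. This only approximates the
free_decrypted_pages() semantics described in this thread (re-encrypt
before freeing, leak on failure); the helper name is made up for
illustration and is not the series' actual implementation.]

#include <linux/gfp.h>
#include <linux/printk.h>
#include <linux/set_memory.h>

/*
 * Illustrative open-coding of the "re-encrypt before free, leak on
 * failure" rule discussed in this thread.
 */
static void free_shared_buf_sketch(unsigned long addr, unsigned int order)
{
	if (set_memory_encrypted(addr, 1 << order)) {
		/* May still be shared with the host: leak rather than free. */
		pr_warn("failed to re-encrypt pages, leaking them\n");
		return;
	}
	free_pages(addr, order);
}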
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 394494a6b1f3..ad06786c4f98 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -524,6 +524,7 @@ void __init swiotlb_exit(void)
 	unsigned long tbl_vaddr;
 	size_t tbl_size, slots_size;
 	unsigned int area_order;
+	int ret;
 
 	if (swiotlb_force_bounce)
 		return;
@@ -536,17 +537,19 @@ void __init swiotlb_exit(void)
 	tbl_size = PAGE_ALIGN(mem->end - mem->start);
 	slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
 
-	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
+	ret = set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
 	if (mem->late_alloc) {
 		area_order = get_order(array_size(sizeof(*mem->areas),
 			mem->nareas));
 		free_pages((unsigned long)mem->areas, area_order);
-		free_pages(tbl_vaddr, get_order(tbl_size));
+		if (!ret)
+			free_pages(tbl_vaddr, get_order(tbl_size));
 		free_pages((unsigned long)mem->slots, get_order(slots_size));
 	} else {
 		memblock_free_late(__pa(mem->areas),
 			array_size(sizeof(*mem->areas), mem->nareas));
-		memblock_free_late(mem->start, tbl_size);
+		if (!ret)
+			memblock_free_late(mem->start, tbl_size);
 		memblock_free_late(__pa(mem->slots), slots_size);
 	}
 
@@ -581,7 +584,7 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes)
 		return page;
 
 error:
-	__free_pages(page, order);
+	free_decrypted_pages((unsigned long)vaddr, order);
 	return NULL;
 }
On TDX it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or
security issues.

Swiotlb could free decrypted/shared pages if set_memory_decrypted()
fails. Use the recently added free_decrypted_pages() to avoid this.

In swiotlb_exit(), check for set_memory_encrypted() errors manually,
because the pages are not necessarily going to the page allocator.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 kernel/dma/swiotlb.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)