Message ID | 20210209152816.15792-5-julien@xen.org (mailing list archive) |
---|---|
State | Superseded |
Series | xen/iommu: Collection of bug fixes for IOMMU teardown |
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 09 February 2021 15:28
> To: xen-devel@lists.xenproject.org
> Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> Subject: [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
>
> From: Julien Grall <jgrall@amazon.com>
>
> The new IOMMU page-tables allocator will release the pages when
> relinquish the domain resources. However, this is not sufficient when
> the domain is dying because nothing prevents page-table to be allocated.
>
> iommu_alloc_pgtable() is now checking if the domain is dying before
> adding the page in the list. We are relying on &hd->arch.pgtables.lock
> to synchronize d->is_dying.
>
> Take the opportunity to check in arch_iommu_domain_destroy() that all
> that page tables have been freed.
>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> Reviewed-by: Paul Durrant <paul@xen.org>
On 09.02.2021 16:28, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
>
> The new IOMMU page-tables allocator will release the pages when
> relinquish the domain resources. However, this is not sufficient when
> the domain is dying because nothing prevents page-table to be allocated.
>
> iommu_alloc_pgtable() is now checking if the domain is dying before
> adding the page in the list. We are relying on &hd->arch.pgtables.lock
> to synchronize d->is_dying.

As said in reply to an earlier patch, I think suppressing (really:
ignoring) new mappings would be better. You could utilize the same
lock, but you'd need to duplicate the checking in
{amd,intel}_iommu_map_page().

I'm not entirely certain about the case of unmap requests: It may be
possible to also suppress/ignore them, but this may require some
further thought.

Apart from this, just in case we settle on your current approach, a
few spelling nits:

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)
>
>  void arch_iommu_domain_destroy(struct domain *d)
>  {
> +    /*
> +     * There should be not page-tables left allocated by the time the

... should be no ...

> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
> +     * called unconditionally, so pgtables may be unitialized.

uninitialized

> @@ -303,9 +317,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>      unmap_domain_page(p);
>
>      spin_lock(&hd->arch.pgtables.lock);
> -    page_list_add(pg, &hd->arch.pgtables.list);
> +    /*
> +     * The IOMMU page-tables are freed when relinquishing the domain, but
> +     * nothing prevent allocation to happen afterwards. There is no valid

prevents

> +     * reasons to continue to update the IOMMU page-tables while the

reason

> +     * domain is dying.
> +     *
> +     * So prevent page-table allocation when the domain is dying.
> +     *
> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.

rely

Jan
On 09.02.2021 16:28, Julien Grall wrote:
> @@ -303,9 +317,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>      unmap_domain_page(p);
>
>      spin_lock(&hd->arch.pgtables.lock);
> -    page_list_add(pg, &hd->arch.pgtables.list);
> +    /*
> +     * The IOMMU page-tables are freed when relinquishing the domain, but
> +     * nothing prevent allocation to happen afterwards. There is no valid
> +     * reasons to continue to update the IOMMU page-tables while the
> +     * domain is dying.
> +     *
> +     * So prevent page-table allocation when the domain is dying.
> +     *
> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
> +     */
> +    if ( likely(!d->is_dying) )
> +    {
> +        alive = true;
> +        page_list_add(pg, &hd->arch.pgtables.list);
> +    }
>      spin_unlock(&hd->arch.pgtables.lock);
>
> +    if ( unlikely(!alive) )
> +    {
> +        free_domheap_page(pg);
> +        pg = NULL;
> +    }
> +
>      return pg;
>  }

There's a pretty clear downside to this approach compared to that of
ignoring maps (and perhaps also unmaps) for dying domains: The caller
here will (hopefully) recognize and propagate an error. Additionally
(considering the situation patch 5 fixes) ignoring unmaps may provide
quite a bit of a performance gain for domain destruction - we don't
need every individual page unmapped from the page tables.

Jan
On 10/02/2021 14:32, Jan Beulich wrote:
> On 09.02.2021 16:28, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The new IOMMU page-tables allocator will release the pages when
>> relinquish the domain resources. However, this is not sufficient when
>> the domain is dying because nothing prevents page-table to be allocated.
>>
>> iommu_alloc_pgtable() is now checking if the domain is dying before
>> adding the page in the list. We are relying on &hd->arch.pgtables.lock
>> to synchronize d->is_dying.
>
> As said in reply to an earlier patch, I think suppressing
> (really: ignoring) new mappings would be better.

This is exactly what I suggested in v1 but you wrote:

"Ignoring requests there seems fragile to me. Paul - what are your
thoughts about bailing early from hvm_add_ioreq_gfn() when the
domain is dying?"

Are you now saying that the following snippet would be fine:

if ( d->is_dying )
    return 0;

> You could
> utilize the same lock, but you'd need to duplicate the
> checking in {amd,intel}_iommu_map_page().
>
> I'm not entirely certain about the case of unmap requests:
> It may be possible to also suppress/ignore them, but this
> may require some further thought.

I think the unmap part is quite risky with d->is_dying because the PCI
devices may not be quiesced and may still be assigned to the domain.

>
> Apart from this, just in case we settle on your current
> approach, a few spelling nits:
>
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)
>>
>>  void arch_iommu_domain_destroy(struct domain *d)
>>  {
>> +    /*
>> +     * There should be not page-tables left allocated by the time the
>
> ... should be no ...
>
>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>> +     * called unconditionally, so pgtables may be unitialized.
>
> uninitialized
>
>> @@ -303,9 +317,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>      unmap_domain_page(p);
>>
>>      spin_lock(&hd->arch.pgtables.lock);
>> -    page_list_add(pg, &hd->arch.pgtables.list);
>> +    /*
>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>> +     * nothing prevent allocation to happen afterwards. There is no valid
>
> prevents
>
>> +     * reasons to continue to update the IOMMU page-tables while the
>
> reason
>
>> +     * domain is dying.
>> +     *
>> +     * So prevent page-table allocation when the domain is dying.
>> +     *
>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>
> rely
>
> Jan
>
On 10.02.2021 16:04, Julien Grall wrote:
> On 10/02/2021 14:32, Jan Beulich wrote:
>> On 09.02.2021 16:28, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> The new IOMMU page-tables allocator will release the pages when
>>> relinquish the domain resources. However, this is not sufficient when
>>> the domain is dying because nothing prevents page-table to be allocated.
>>>
>>> iommu_alloc_pgtable() is now checking if the domain is dying before
>>> adding the page in the list. We are relying on &hd->arch.pgtables.lock
>>> to synchronize d->is_dying.
>>
>> As said in reply to an earlier patch, I think suppressing
>> (really: ignoring) new mappings would be better.
>
> This is exactly what I suggested in v1 but you wrote:
>
> "Ignoring requests there seems fragile to me. Paul - what are your
> thoughts about bailing early from hvm_add_ioreq_gfn() when the
> domain is dying?"

Was this on the thread of this patch? I didn't find such a
reply of mine. I need more context here because you name
hvm_add_ioreq_gfn() above, while I refer to iommu_map()
(and downwards the call stack).

> Are you now saying that the following snippet would be fine:
>
> if ( d->is_dying )
>     return 0;

In {amd,intel}_iommu_map_page(), after the lock was acquired
and with it suitably released, yes. And if that's what you
suggested, then I'm sorry - I don't think I can see anything
fragile there anymore.

>> You could
>> utilize the same lock, but you'd need to duplicate the
>> checking in {amd,intel}_iommu_map_page().
>>
>> I'm not entirely certain about the case of unmap requests:
>> It may be possible to also suppress/ignore them, but this
>> may require some further thought.
>
> I think the unmap part is quite risky with d->is_dying because the PCI
> devices may not be quiesced and may still be assigned to the domain.

Hmm, yes, good point. Of course upon first unmap with is_dying
observed set we could zap the root page table, but I don't
suppose that's something we want to do for 4.15.

Jan
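For concreteness, the pattern Jan describes - take the lock that synchronizes d->is_dying, then silently ignore a map request for a dying domain - can be modelled outside Xen. The sketch below is a minimal, self-contained userspace model, not the Xen sources: a pthread mutex stands in for hd->arch.pgtables.lock and a two-field struct for struct domain.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct domain {
    bool is_dying;
    pthread_mutex_t pgtables_lock;   /* stands in for hd->arch.pgtables.lock */
};

/* Models the early bail discussed for {amd,intel}_iommu_map_page(). */
static int iommu_map_page(struct domain *d)
{
    pthread_mutex_lock(&d->pgtables_lock);

    if ( d->is_dying )
    {
        /* Ignored, not failed: the caller sees success. */
        pthread_mutex_unlock(&d->pgtables_lock);
        return 0;
    }

    /* ... the real functions would update the page tables here ... */

    pthread_mutex_unlock(&d->pgtables_lock);
    return 0;
}

int main(void)
{
    struct domain d = {
        .is_dying = true,
        .pgtables_lock = PTHREAD_MUTEX_INITIALIZER,
    };

    printf("map on dying domain -> %d (silently ignored)\n",
           iommu_map_page(&d));
    return 0;
}

Returning 0 rather than an error is the crux of the "ignore" approach: callers on the teardown path see success instead of a failure they would otherwise have to propagate.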
Hi Jan,

On 10/02/2021 16:12, Jan Beulich wrote:
> On 10.02.2021 16:04, Julien Grall wrote:
>> On 10/02/2021 14:32, Jan Beulich wrote:
>>> On 09.02.2021 16:28, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> The new IOMMU page-tables allocator will release the pages when
>>>> relinquish the domain resources. However, this is not sufficient when
>>>> the domain is dying because nothing prevents page-table to be allocated.
>>>>
>>>> iommu_alloc_pgtable() is now checking if the domain is dying before
>>>> adding the page in the list. We are relying on &hd->arch.pgtables.lock
>>>> to synchronize d->is_dying.
>>>
>>> As said in reply to an earlier patch, I think suppressing
>>> (really: ignoring) new mappings would be better.
>>
>> This is exactly what I suggested in v1 but you wrote:
>>
>> "Ignoring requests there seems fragile to me. Paul - what are your
>> thoughts about bailing early from hvm_add_ioreq_gfn() when the
>> domain is dying?"
>
> Was this on the thread of this patch? I didn't find such a
> reply of mine. I need more context here because you name
> hvm_add_ioreq_gfn() above, while I refer to iommu_map()
> (and downwards the call stack).

See [1].

>
>> Are you now saying that the following snippet would be fine:
>>
>> if ( d->is_dying )
>>     return 0;
>
> In {amd,intel}_iommu_map_page(), after the lock was acquired
> and with it suitably released, yes. And if that's what you
> suggested, then I'm sorry - I don't think I can see anything
> fragile there anymore.

Duplicating the check sounds good to me.

>
>>> You could
>>> utilize the same lock, but you'd need to duplicate the
>>> checking in {amd,intel}_iommu_map_page().
>>>
>>> I'm not entirely certain about the case of unmap requests:
>>> It may be possible to also suppress/ignore them, but this
>>> may require some further thought.
>>
>> I think the unmap part is quite risky with d->is_dying because the PCI
>> devices may not be quiesced and may still be assigned to the domain.
>
> Hmm, yes, good point. Of course upon first unmap with is_dying
> observed set we could zap the root page table, but I don't
> suppose that's something we want to do for 4.15.

We would still need to zap the root page table in the relinquish path.
So I am not sure what benefits it would give us to zap the page tables
on the first iommu_unmap() after the domain dies.

Cheers,

[1] https://lore.kernel.org/xen-devel/f21f1f61-5213-55a8-320c-43e5fe80100f@suse.com/
On 17.02.2021 12:49, Julien Grall wrote:
> On 10/02/2021 16:12, Jan Beulich wrote:
>> On 10.02.2021 16:04, Julien Grall wrote:
>>> Are you now saying that the following snippet would be fine:
>>>
>>> if ( d->is_dying )
>>>     return 0;
>>
>> In {amd,intel}_iommu_map_page(), after the lock was acquired
>> and with it suitably released, yes. And if that's what you
>> suggested, then I'm sorry - I don't think I can see anything
>> fragile there anymore.
>
> Duplicating the check sounds good to me.

The checks in said functions are mandatory, and I didn't really have
any duplication in mind. But yes, iommu_map() could have an early
(but racy) check, if so wanted.

>>>> You could
>>>> utilize the same lock, but you'd need to duplicate the
>>>> checking in {amd,intel}_iommu_map_page().
>>>>
>>>> I'm not entirely certain about the case of unmap requests:
>>>> It may be possible to also suppress/ignore them, but this
>>>> may require some further thought.
>>>
>>> I think the unmap part is quite risky with d->is_dying because the PCI
>>> devices may not be quiesced and may still be assigned to the domain.
>>
>> Hmm, yes, good point. Of course upon first unmap with is_dying
>> observed set we could zap the root page table, but I don't
>> suppose that's something we want to do for 4.15.
>
> We would still need to zap the root page table in the relinquish path.
> So I am not sure what benefits it would give us to zap the page tables
> on the first iommu_unmap() after the domain dies.

I guess we think of different aspects of the zapping - what I'm
suggesting here is for the effect of no translations remaining active
anymore. I'm not after freeing the memory at this point; that'll have
to happen on the relinquish path, as you say.

The problem with allowing individual unmaps to proceed (unlike your
plan for maps) is that these, too, can trigger allocations (when a
large page needs shattering).

Jan
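Jan's closing point - that unmaps, too, can trigger allocations - comes from superpage shattering. The following is a simplified model, not the Xen page-table code: removing one small mapping from an entry that maps a whole superpage requires allocating a fresh lower-level table for the mappings that remain, which is precisely the allocation a dying-domain check in iommu_alloc_pgtable() would now refuse.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define PTES_PER_TABLE 512

struct pte {
    bool present;
    bool superpage;        /* one entry mapping the whole large-page range */
    struct pte *table;     /* lower-level table, once shattered */
};

/* Unmapping one small page under a superpage forces a table allocation. */
static int unmap_one(struct pte *e, unsigned int idx)
{
    if ( e->superpage )
    {
        /* The shatter step: the allocation that fails for dying domains. */
        struct pte *l1 = calloc(PTES_PER_TABLE, sizeof(*l1));
        unsigned int i;

        if ( !l1 )
            return -1;
        for ( i = 0; i < PTES_PER_TABLE; i++ )
            l1[i].present = true;   /* the remaining mappings survive */
        e->superpage = false;
        e->table = l1;
    }

    e->table[idx].present = false;  /* the actual unmap */
    return 0;
}

int main(void)
{
    struct pte e = { .present = true, .superpage = true };

    printf("unmap -> %d, shattered: %s\n",
           unmap_one(&e, 7), e.table ? "yes" : "no");
    free(e.table);
    return 0;
}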
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index cea1032b3d02..82d770107a47 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)
 
 void arch_iommu_domain_destroy(struct domain *d)
 {
+    /*
+     * There should be not page-tables left allocated by the time the
+     * domain is destroyed. Note that arch_iommu_domain_destroy() is
+     * called unconditionally, so pgtables may be unitialized.
+     */
+    ASSERT(dom_iommu(d)->platform_ops == NULL ||
+           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
 }
 
 static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
@@ -267,6 +274,12 @@ int iommu_free_pgtables(struct domain *d)
     struct page_info *pg;
     unsigned int done = 0;
 
+    if ( !is_iommu_enabled(d) )
+        return 0;
+
+    /* After this barrier no new page allocations can occur. */
+    spin_barrier(&hd->arch.pgtables.lock);
+
     while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
     {
         free_domheap_page(pg);
@@ -284,6 +297,7 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
     unsigned int memflags = 0;
     struct page_info *pg;
     void *p;
+    bool alive = false;
 
 #ifdef CONFIG_NUMA
     if ( hd->node != NUMA_NO_NODE )
@@ -303,9 +317,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
     unmap_domain_page(p);
 
     spin_lock(&hd->arch.pgtables.lock);
-    page_list_add(pg, &hd->arch.pgtables.list);
+    /*
+     * The IOMMU page-tables are freed when relinquishing the domain, but
+     * nothing prevent allocation to happen afterwards. There is no valid
+     * reasons to continue to update the IOMMU page-tables while the
+     * domain is dying.
+     *
+     * So prevent page-table allocation when the domain is dying.
+     *
+     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
+     */
+    if ( likely(!d->is_dying) )
+    {
+        alive = true;
+        page_list_add(pg, &hd->arch.pgtables.list);
+    }
     spin_unlock(&hd->arch.pgtables.lock);
 
+    if ( unlikely(!alive) )
+    {
+        free_domheap_page(pg);
+        pg = NULL;
+    }
+
     return pg;
 }
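A note on the spin_barrier() call added to iommu_free_pgtables() above: it is conceptually an acquire immediately followed by a release of hd->arch.pgtables.lock, so once it returns, any iommu_alloc_pgtable() that raced with domain death has either queued its page before the list is drained or observed d->is_dying and freed the page itself. A minimal model of the idiom, assuming a pthread mutex in place of a Xen spinlock:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t pgtables_lock = PTHREAD_MUTEX_INITIALIZER;
static bool is_dying;

/*
 * The barrier idiom: acquire and release once. Afterwards, every critical
 * section that began before this call has completed, and every later one
 * starts fresh - and, is_dying having been set first, will observe it.
 */
static void lock_barrier(pthread_mutex_t *m)
{
    pthread_mutex_lock(m);
    pthread_mutex_unlock(m);
}

int main(void)
{
    is_dying = true;             /* set during domain death, before freeing */
    lock_barrier(&pgtables_lock);

    /* From here on it is safe to drain the page-table list: no allocator
     * can still be about to add a page to it. */
    printf("barrier passed, is_dying=%d; list may be drained\n", is_dying);
    return 0;
}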