Message ID | 98553b89-6296-9e4c-4677-9201cd7cdeef@suse.com (mailing list archive) |
---|---|
State | Superseded |
Series | IOMMU: superpage support when not sharing pagetables |
On Mon, Apr 25, 2022 at 10:43:45AM +0200, Jan Beulich wrote:
> When a page table ends up with all contiguous entries (including all
> identical attributes), it can be replaced by a superpage entry at the
> next higher level. The page table itself can then be scheduled for
> freeing.
>
> The adjustment to LEVEL_MASK is merely to avoid leaving a latent trap
> for whenever we (and obviously hardware) start supporting 512G mappings.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Like on the AMD side, I wonder whether you can get away with only
doing a cache flush for the last (highest level) PTE, as the lower
ones won't be reachable anyway, as the page-table is freed.

Then the flush could be done outside of the locked region.

The rest LGTM.

Thanks, Roger.
On 11.05.2022 13:08, Roger Pau Monné wrote:
> On Mon, Apr 25, 2022 at 10:43:45AM +0200, Jan Beulich wrote:
>> When a page table ends up with all contiguous entries (including all
>> identical attributes), it can be replaced by a superpage entry at the
>> next higher level. The page table itself can then be scheduled for
>> freeing.
>>
>> The adjustment to LEVEL_MASK is merely to avoid leaving a latent trap
>> for whenever we (and obviously hardware) start supporting 512G mappings.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>
> Like on the AMD side, I wonder whether you can get away with only

FTAOD I take it you mean "like on the all-empty side", as on AMD we
don't need to do any cache flushing?

> doing a cache flush for the last (highest level) PTE, as the lower
> ones won't be reachable anyway, as the page-table is freed.

But that freeing will happen only later, with a TLB flush in between.
Until then we would better make sure the IOMMU sees what was written,
even if it reading a stale value _should_ be benign.

Jan

> Then the flush could be done outside of the locked region.
>
> The rest LGTM.
>
> Thanks, Roger.
>
On Wed, May 18, 2022 at 12:44:06PM +0200, Jan Beulich wrote:
> On 11.05.2022 13:08, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:43:45AM +0200, Jan Beulich wrote:
> >> When a page table ends up with all contiguous entries (including all
> >> identical attributes), it can be replaced by a superpage entry at the
> >> next higher level. The page table itself can then be scheduled for
> >> freeing.
> >>
> >> The adjustment to LEVEL_MASK is merely to avoid leaving a latent trap
> >> for whenever we (and obviously hardware) start supporting 512G mappings.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> >
> > Like on the AMD side, I wonder whether you can get away with only
>
> FTAOD I take it you mean "like on the all-empty side", as on AMD we
> don't need to do any cache flushing?

Heh, yes, sorry.

> > doing a cache flush for the last (highest level) PTE, as the lower
> > ones won't be reachable anyway, as the page-table is freed.
>
> But that freeing will happen only later, with a TLB flush in between.
> Until then we would better make sure the IOMMU sees what was written,
> even if it reading a stale value _should_ be benign.

Hm, but when doing the TLB flush the paging structures will already be
fully updated, and the top level visible entry will have its cache
flushed, so the lower ones would never be reachable AFAICT.

Thanks, Roger.
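To make the point under discussion concrete, here is a rough sketch of the alternative Roger is suggesting (hypothetical code, not anything posted in the thread; it merely reuses the identifiers from the diff below and keeps each loop iteration otherwise as in the patch). Only the entry written last stays reachable once the lower-level tables have been queued for freeing, so only its cache flush is kept, issued after the mapping lock is dropped; Jan's concern is the window before the later TLB flush, during which the IOMMU could still read the not-yet-flushed intermediate entries.

    struct dma_pte *last_pte = NULL;

    while ( pt_update_contig_markers(&page->val,
                                     address_level_offset(dfn_to_daddr(dfn), level),
                                     level,
                                     (hd->platform_ops->page_sizes &
                                      (1UL << level_to_offset_bits(level + 1))
                                      ? PTE_kind_leaf : PTE_kind_table)) )
    {
        struct page_info *pg = maddr_to_page(pg_maddr);

        unmap_vtd_domain_page(page);

        new.val &= ~(LEVEL_MASK << level_to_offset_bits(level));
        dma_set_pte_superpage(new);

        pg_maddr = addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), ++level,
                                          flush_flags, false);
        BUG_ON(pg_maddr < PAGE_SIZE);

        page = map_vtd_domain_page(pg_maddr);
        pte = &page[address_level_offset(dfn_to_daddr(dfn), level)];
        *pte = new;
        last_pte = pte;            /* cache flush deferred for this entry */

        *flush_flags |= IOMMU_FLUSHF_modified | IOMMU_FLUSHF_all;
        iommu_queue_free_pgtable(hd, pg);
    }

    spin_unlock(&hd->arch.mapping_lock);

    /* Single flush for the only entry left reachable, outside the lock. */
    if ( last_pte )
        iommu_sync_cache(last_pte, sizeof(*last_pte));

Note that the function's trailing unmap_vtd_domain_page(page) would then have to stay after this flush, since last_pte points into that mapping.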
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2216,14 +2216,35 @@ static int __must_check cf_check intel_i
      * While the (ab)use of PTE_kind_table here allows to save some work in
      * the function, the main motivation for it is that it avoids a so far
      * unexplained hang during boot (while preparing Dom0) on a Westmere
-     * based laptop.
+     * based laptop. This also has the intended effect of terminating the
+     * loop when super pages aren't supported anymore at the next level.
      */
-    pt_update_contig_markers(&page->val,
-                             address_level_offset(dfn_to_daddr(dfn), level),
-                             level,
-                             (hd->platform_ops->page_sizes &
-                              (1UL << level_to_offset_bits(level + 1))
-                              ? PTE_kind_leaf : PTE_kind_table));
+    while ( pt_update_contig_markers(&page->val,
+                                     address_level_offset(dfn_to_daddr(dfn), level),
+                                     level,
+                                     (hd->platform_ops->page_sizes &
+                                      (1UL << level_to_offset_bits(level + 1))
+                                      ? PTE_kind_leaf : PTE_kind_table)) )
+    {
+        struct page_info *pg = maddr_to_page(pg_maddr);
+
+        unmap_vtd_domain_page(page);
+
+        new.val &= ~(LEVEL_MASK << level_to_offset_bits(level));
+        dma_set_pte_superpage(new);
+
+        pg_maddr = addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), ++level,
+                                          flush_flags, false);
+        BUG_ON(pg_maddr < PAGE_SIZE);
+
+        page = map_vtd_domain_page(pg_maddr);
+        pte = &page[address_level_offset(dfn_to_daddr(dfn), level)];
+        *pte = new;
+        iommu_sync_cache(pte, sizeof(*pte));
+
+        *flush_flags |= IOMMU_FLUSHF_modified | IOMMU_FLUSHF_all;
+        iommu_queue_free_pgtable(hd, pg);
+    }
 
     spin_unlock(&hd->arch.mapping_lock);
     unmap_vtd_domain_page(page);
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -232,7 +232,7 @@ struct context_entry {
 
 /* page table handling */
 #define LEVEL_STRIDE       (9)
-#define LEVEL_MASK         ((1 << LEVEL_STRIDE) - 1)
+#define LEVEL_MASK         (PTE_NUM - 1UL)
 #define PTE_NUM            (1 << LEVEL_STRIDE)
 #define level_to_agaw(val) ((val) - 2)
 #define agaw_to_level(val) ((val) + 2)
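As an aside on the iommu.h hunk: the 1UL matters because the new loop uses the macro as LEVEL_MASK << level_to_offset_bits(level). A small stand-alone illustration of the latent trap mentioned in the commit message (ordinary user-space C, not Xen code; the level_to_offset_bits() formula is assumed to follow the usual VT-d layout of 12 + 9 * (level - 1)):

#include <stdio.h>

#define LEVEL_STRIDE            9
#define PTE_NUM                 (1 << LEVEL_STRIDE)
/* Assumed layout: level 1 -> bit 12, level 2 -> 21, level 3 -> 30, level 4 -> 39. */
#define level_to_offset_bits(l) (12 + ((l) - 1) * LEVEL_STRIDE)

int main(void)
{
    unsigned int level = 4;    /* a level-4 entry would map a 512G region */

    /*
     * Old definition: ((1 << LEVEL_STRIDE) - 1) is a plain int, so shifting
     * it by level_to_offset_bits(4) == 39 bits would be undefined behaviour
     * on a 32-bit int -- the latent trap the commit message refers to.
     */

    /*
     * New definition: the 1UL promotes the mask to unsigned long, so on an
     * LP64 build (as used by x86-64 Xen) the shift stays in range.
     */
    unsigned long mask = (PTE_NUM - 1UL) << level_to_offset_bits(level);

    printf("address bits cleared for a level-%u entry: %#lx\n", level, mask);
    return 0;
}

With the old, plain-int mask, a hypothetical level-4 (512G) pass through the new.val &= ~(LEVEL_MASK << level_to_offset_bits(level)) line in the first hunk would hit exactly that out-of-range shift.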