
[v4,17/21] AMD/IOMMU: replace all-contiguous page tables by superpage mappings

Message ID e0e2d865-5ac9-d7ac-c763-f4b99b699224@suse.com (mailing list archive)
State Superseded
Series IOMMU: superpage support when not sharing pagetables

Commit Message

Jan Beulich April 25, 2022, 8:43 a.m. UTC
When a page table ends up with all contiguous entries (including all
identical attributes), it can be replaced by a superpage entry at the
next higher level. The page table itself can then be scheduled for
freeing.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Unlike the freeing of all-empty page tables, this causes quite a bit of
back and forth for PV domains, due to their mapping/unmapping of pages
when they get converted to/from being page tables. It may therefore be
worth considering delaying re-coalescing a little, to avoid doing so
when the superpage would otherwise get split again pretty soon. But I
think this is better left to a separate change anyway.

Of course this could also be helped by more "aware" kernel-side
behavior: kernels could avoid immediately mapping freed page tables
writable again, in anticipation of re-using that same page for another
page table elsewhere.
---
v4: Re-base over changes earlier in the series.
v3: New.
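
For illustration, the condition being detected: a page table can be
coalesced into a superpage when all of its 512 entries are present,
map machine-contiguous frames with natural alignment, and carry
identical attributes. The patch tracks this incrementally via
pt_update_contig_markers(); the naive re-scan below is only a
self-contained sketch of that invariant, using simplified stand-in
types rather than Xen's actual PTE layout:

#include <stdbool.h>
#include <stdint.h>

#define PTE_PER_TABLE_SIZE 512

struct example_pte {         /* hypothetical, simplified PTE */
    uint64_t mfn;            /* machine frame number mapped */
    bool pr, iw, ir;         /* present / writable / readable */
};

static bool
table_is_coalescible(const struct example_pte t[PTE_PER_TABLE_SIZE],
                     unsigned int level)
{
    /* Each entry of a level-N table covers 512^(N-1) base pages. */
    uint64_t step = 1ULL << (9 * (level - 1));
    unsigned int i;

    /* The resulting superpage must be naturally aligned. */
    if ( t[0].mfn & (PTE_PER_TABLE_SIZE * step - 1) )
        return false;

    for ( i = 0; i < PTE_PER_TABLE_SIZE; ++i )
        if ( !t[i].pr ||
             t[i].iw != t[0].iw || t[i].ir != t[0].ir ||
             t[i].mfn != t[0].mfn + i * step )
            return false;

    return true;  /* replaceable by one superpage entry one level up */
}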

Comments

Roger Pau Monné May 10, 2022, 3:31 p.m. UTC | #1
On Mon, Apr 25, 2022 at 10:43:16AM +0200, Jan Beulich wrote:
> When a page table ends up with all contiguous entries (including all
> identical attributes), it can be replaced by a superpage entry at the
> next higher level. The page table itself can then be scheduled for
> freeing.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Unlike the freeing of all-empty page tables, this causes quite a bit of
> back and forth for PV domains, due to their mapping/unmapping of pages
> when they get converted to/from being page tables. It may therefore be
> worth considering delaying re-coalescing a little, to avoid doing so
> when the superpage would otherwise get split again pretty soon. But I
> think this is better left to a separate change anyway.
> 
> Of course this could also be helped by more "aware" kernel-side
> behavior: kernels could avoid immediately mapping freed page tables
> writable again, in anticipation of re-using that same page for another
> page table elsewhere.
> ---
> v4: Re-base over changes earlier in the series.
> v3: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -81,7 +81,8 @@ static union amd_iommu_pte set_iommu_pte
>                                                   unsigned long dfn,
>                                                   unsigned long next_mfn,
>                                                   unsigned int level,
> -                                                 bool iw, bool ir)
> +                                                 bool iw, bool ir,
> +                                                 bool *contig)
>  {
>      union amd_iommu_pte *table, *pde, old;
>  
> @@ -94,11 +95,15 @@ static union amd_iommu_pte set_iommu_pte
>           old.iw != iw || old.ir != ir )
>      {
>          set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> -        pt_update_contig_markers(&table->raw, pfn_to_pde_idx(dfn, level),
> -                                 level, PTE_kind_leaf);
> +        *contig = pt_update_contig_markers(&table->raw,
> +                                           pfn_to_pde_idx(dfn, level),
> +                                           level, PTE_kind_leaf);
>      }
>      else
> +    {
>          old.pr = false; /* signal "no change" to the caller */
> +        *contig = false;

So we assume that any caller getting contig == true must have acted
and coalesced the page table?

Might be worth a comment, to note that the function assumes that a
previous return of contig == true will have coalesced the page table
and hence a "no change" PTE write is not expected to happen on a
contig page table.

Thanks, Roger.
Jan Beulich May 18, 2022, 10:40 a.m. UTC | #2
On 10.05.2022 17:31, Roger Pau Monné wrote:
> On Mon, Apr 25, 2022 at 10:43:16AM +0200, Jan Beulich wrote:
>> @@ -94,11 +95,15 @@ static union amd_iommu_pte set_iommu_pte
>>           old.iw != iw || old.ir != ir )
>>      {
>>          set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
>> -        pt_update_contig_markers(&table->raw, pfn_to_pde_idx(dfn, level),
>> -                                 level, PTE_kind_leaf);
>> +        *contig = pt_update_contig_markers(&table->raw,
>> +                                           pfn_to_pde_idx(dfn, level),
>> +                                           level, PTE_kind_leaf);
>>      }
>>      else
>> +    {
>>          old.pr = false; /* signal "no change" to the caller */
>> +        *contig = false;
> 
> So we assume that any caller getting contig == true must have acted
> and coalesced the page table?

Yes, except that I wouldn't use "must", but "would". It's not a
requirement after all; functionality-wise, all will be fine without
re-coalescing.

> Might be worth a comment, to note that the function assumes that a
> previous return of contig == true will have coalesced the page table
> and hence a "no change" PTE write is not expected to happen on a
> contig page table.

I'm not convinced, as there's effectively only one caller,
amd_iommu_map_page(). I also don't see why "no change" would be a
problem. "No change" can't result in a fully contiguous page table
if the page table wasn't fully contiguous already before (at which
point it would have been replaced by a superpage).

Jan
Roger Pau Monné May 20, 2022, 10:35 a.m. UTC | #3
On Wed, May 18, 2022 at 12:40:59PM +0200, Jan Beulich wrote:
> On 10.05.2022 17:31, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:43:16AM +0200, Jan Beulich wrote:
> >> @@ -94,11 +95,15 @@ static union amd_iommu_pte set_iommu_pte
> >>           old.iw != iw || old.ir != ir )
> >>      {
> >>          set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> >> -        pt_update_contig_markers(&table->raw, pfn_to_pde_idx(dfn, level),
> >> -                                 level, PTE_kind_leaf);
> >> +        *contig = pt_update_contig_markers(&table->raw,
> >> +                                           pfn_to_pde_idx(dfn, level),
> >> +                                           level, PTE_kind_leaf);
> >>      }
> >>      else
> >> +    {
> >>          old.pr = false; /* signal "no change" to the caller */
> >> +        *contig = false;
> > 
> > So we assume that any caller getting contig == true must have acted
> > and coalesced the page table?
> 
> Yes, except that I wouldn't use "must", but "would". It's not a
> requirement after all; functionality-wise, all will be fine without
> re-coalescing.
> 
> > Might be worth a comment, to note that the function assumes that a
> > previous return of contig == true will have coalesced the page table
> > and hence a "no change" PTE write is not expected to happen on a
> > contig page table.
> 
> I'm not convinced, as there's effectively only one caller,
> amd_iommu_map_page(). I also don't see why "no change" would be a
> problem. "No change" can't result in a fully contiguous page table
> if the page table wasn't fully contiguous already before (at which
> point it would have been replaced by a superpage).

Right, I agree, it's just that I would have preferred the result from
set_iommu_pte_present() to be consistent, i.e. repeated calls to it
using the same PTE should set contig to the same value.  Anyway,
that's not relevant to any current callers, so:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
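
To make the consistency point concrete: with the patch as-is, two
identical calls for the same PTE can report different contig values,
because only a PTE-modifying write consults pt_update_contig_markers()
while the "no change" path forces contig to false. A self-contained
model of the two paths (simplified stand-ins, not the real function):

#include <stdbool.h>
#include <stdio.h>

/* Pretend the marker bookkeeping says the table just became contiguous. */
static bool table_became_contig = true;

static void model_set_pte(bool pte_changes, bool *contig)
{
    if ( pte_changes )
        *contig = table_became_contig;  /* pt_update_contig_markers() path */
    else
        *contig = false;                /* "no change" path */
}

int main(void)
{
    bool contig;

    model_set_pte(true, &contig);                  /* first write changes the PTE */
    printf("first call:  contig = %d\n", contig);  /* may print 1 */

    model_set_pte(false, &contig);                 /* identical repeat: no change */
    printf("second call: contig = %d\n", contig);  /* prints 0 */

    return 0;
}

As the thread concludes, this asymmetry is harmless for the one
existing caller, since a "no change" write can only hit a table that
is not (any longer) fully contiguous.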

Patch

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -81,7 +81,8 @@  static union amd_iommu_pte set_iommu_pte
                                                  unsigned long dfn,
                                                  unsigned long next_mfn,
                                                  unsigned int level,
-                                                 bool iw, bool ir)
+                                                 bool iw, bool ir,
+                                                 bool *contig)
 {
     union amd_iommu_pte *table, *pde, old;
 
@@ -94,11 +95,15 @@  static union amd_iommu_pte set_iommu_pte
          old.iw != iw || old.ir != ir )
     {
         set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
-        pt_update_contig_markers(&table->raw, pfn_to_pde_idx(dfn, level),
-                                 level, PTE_kind_leaf);
+        *contig = pt_update_contig_markers(&table->raw,
+                                           pfn_to_pde_idx(dfn, level),
+                                           level, PTE_kind_leaf);
     }
     else
+    {
         old.pr = false; /* signal "no change" to the caller */
+        *contig = false;
+    }
 
     unmap_domain_page(table);
 
@@ -407,6 +412,7 @@  int cf_check amd_iommu_map_page(
 {
     struct domain_iommu *hd = dom_iommu(d);
     unsigned int level = (IOMMUF_order(flags) / PTE_PER_TABLE_SHIFT) + 1;
+    bool contig;
     int rc;
     unsigned long pt_mfn = 0;
     union amd_iommu_pte old;
@@ -447,8 +453,26 @@  int cf_check amd_iommu_map_page(
 
     /* Install mapping */
     old = set_iommu_pte_present(pt_mfn, dfn_x(dfn), mfn_x(mfn), level,
-                                (flags & IOMMUF_writable),
-                                (flags & IOMMUF_readable));
+                                flags & IOMMUF_writable,
+                                flags & IOMMUF_readable, &contig);
+
+    while ( unlikely(contig) && ++level < hd->arch.amd.paging_mode )
+    {
+        struct page_info *pg = mfn_to_page(_mfn(pt_mfn));
+        unsigned long next_mfn;
+
+        if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, flush_flags,
+                                false) )
+            BUG();
+        BUG_ON(!pt_mfn);
+
+        next_mfn = mfn_x(mfn) & (~0UL << (PTE_PER_TABLE_SHIFT * (level - 1)));
+        set_iommu_pte_present(pt_mfn, dfn_x(dfn), next_mfn, level,
+                              flags & IOMMUF_writable,
+                              flags & IOMMUF_readable, &contig);
+        *flush_flags |= IOMMU_FLUSHF_modified | IOMMU_FLUSHF_all;
+        iommu_queue_free_pgtable(hd, pg);
+    }
 
     spin_unlock(&hd->arch.mapping_lock);
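
For reference, the masking in the loop above derives the superpage's
MFN by aligning the mapped MFN down to the boundary covered by the new
level; with AMD IOMMU tables PTE_PER_TABLE_SHIFT is 9 (512 entries per
table), so a level-N entry covers 512^(N-1) base pages. A standalone
sketch of just that arithmetic, with an illustrative MFN value:

#include <stdio.h>

#define PTE_PER_TABLE_SHIFT 9  /* 512 PTEs per table level */

int main(void)
{
    unsigned long mfn = 0x12345678UL;  /* arbitrary example value */
    unsigned int level;

    for ( level = 2; level <= 3; ++level )
    {
        /* Same expression as in the patch: align down to the
         * superpage boundary of the (new, higher) level. */
        unsigned long next_mfn =
            mfn & (~0UL << (PTE_PER_TABLE_SHIFT * (level - 1)));

        printf("level %u: next_mfn = %#lx (spans %lu pages)\n",
               level, next_mfn,
               1UL << (PTE_PER_TABLE_SHIFT * (level - 1)));
    }

    return 0;
}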