[v6,01/12] IOMMU/x86: support freeing of pagetables

Message ID	24eb0b99-c2c4-08aa-740d-df94d2505599@suse.com (mailing list archive)
State	Superseded
Headers	Return-Path: <xen-devel-bounces@lists.xenproject.org>
	Errors-To: xen-devel-bounces@lists.xenproject.org
	Precedence: list
	Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
	Message-ID: <24eb0b99-c2c4-08aa-740d-df94d2505599@suse.com>
	Date: Thu, 9 Jun 2022 12:16:38 +0200
	User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.10.0
	Subject: [PATCH v6 01/12] IOMMU/x86: support freeing of pagetables
	Content-Language: en-US
	From: Jan Beulich <jbeulich@suse.com>
	To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
	Cc: Andrew Cooper <andrew.cooper3@citrix.com>, Paul Durrant <paul@xen.org>, Roger Pau Monné <roger.pau@citrix.com>, Wei Liu <wl@xen.org>
	References: <e873e30c-7a04-a8a3-2fe5-0dda30e654fe@suse.com>
	In-Reply-To: <e873e30c-7a04-a8a3-2fe5-0dda30e654fe@suse.com>
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 7bit
	MIME-Version: 1.0
Series	IOMMU: superpage support when not sharing pagetables
	[v6,00/12] IOMMU: superpage support when not sharing pagetables
	[v6,01/12] IOMMU/x86: support freeing of pagetables
	[v6,02/12] IOMMU/x86: new command line option to suppress use of superpage mappings
	[v6,03/12] AMD/IOMMU: allow use of superpage mappings
	[v6,04/12] VT-d: allow use of superpage mappings
	[v6,05/12] x86: introduce helper for recording degree of contiguity in page tables
	[v6,06/12] IOMMU/x86: prefill newly allocate page tables
	[v6,07/12] AMD/IOMMU: free all-empty page tables
	[v6,08/12] VT-d: free all-empty page tables
	[v6,09/12] AMD/IOMMU: replace all-contiguous page tables by superpage mappings
	[v6,10/12] VT-d: replace all-contiguous page tables by superpage mappings
	[v6,11/12] IOMMU/x86: add perf counters for page table splitting / coalescing
	[v6,12/12] VT-d: fold dma_pte_clear_one() into its only caller

Commit Message

Jan Beulich June 9, 2022, 10:16 a.m. UTC

For vendor specific code to support superpages we need to be able to
deal with a superpage mapping replacing an intermediate page table (or
hierarchy thereof). Consequently an iommu_alloc_pgtable() counterpart is
needed to free individual page tables while a domain is still alive.
Since the freeing needs to be deferred until after a suitable IOTLB
flush was performed, released page tables get queued for processing by a
tasklet.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I was considering whether to use a softirq-tasklet instead. This would
have the benefit of avoiding extra scheduling operations, but come with
the risk of the freeing happening prematurely because of a
process_pending_softirqs() somewhere.
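
As an illustration of the queue-then-free scheme from the commit message, here is a minimal single-queue sketch in plain C. It is not the Xen implementation: `struct page`, `struct free_queue`, `queue_free()` and `drain()` are hypothetical stand-ins, and the sketch simply assumes that `drain()` is only invoked once the relevant IOTLB flush has completed, which is what deferring to a tasklet is meant to achieve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page_info. */
struct page {
    struct page *next;
    bool freed;
};

/* Hypothetical single queue; the patch keeps one such list per CPU. */
struct free_queue {
    struct page *head, *tail;
};

/* Queue a page table for later freeing; nothing is released here. */
static void queue_free(struct free_queue *q, struct page *pg)
{
    pg->next = NULL;
    if ( q->tail )
        q->tail->next = pg;
    else
        q->head = pg;
    q->tail = pg;
}

/*
 * Deferred handler. Assumed to run only after the IOTLB flush, so that
 * the hardware can no longer walk the queued tables.
 * Returns the number of pages released.
 */
static unsigned int drain(struct free_queue *q)
{
    unsigned int n = 0;
    struct page *pg;

    while ( (pg = q->head) != NULL )
    {
        q->head = pg->next;
        pg->next = NULL;
        pg->freed = true;   /* models free_domheap_page() */
        ++n;
    }
    q->tail = NULL;

    return n;
}
```

The point of the split is that a page queued via `queue_free()` stays allocated until `drain()` runs. A softirq-based variant would risk `drain()` running early, from within an unrelated softirq processing point.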
---
v6: Extend comment on the use of process_pending_softirqs().
v5: Fix CPU_UP_PREPARE for BIGMEM. Schedule tasklet in CPU_DOWN_FAILED
    when list is not empty. Skip all processing in CPU_DEAD when list is
    empty.
v4: Change type of iommu_queue_free_pgtable()'s 1st parameter. Re-base.
v3: Call process_pending_softirqs() from free_queued_pgtables().
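
The CPU hotplug handling referred to in the v5 notes can be pictured with a small state model. This is only a sketch under simplifying assumptions: per-CPU lists are reduced to a counter, the tasklet to a `scheduled` flag, and `cpu_event()`, `struct cpu_state` and the `self` parameter (the CPU processing the notification) are hypothetical names, not Xen interfaces.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

enum action { UP_PREPARE, DOWN_PREPARE, DOWN_FAILED, DEAD };

struct cpu_state {
    unsigned int queued;   /* pages waiting to be freed on this CPU */
    bool scheduled;        /* whether the freeing handler is scheduled */
};

static struct cpu_state cpus[NR_CPUS];

/* 'cpu' is the CPU changing state, 'self' the CPU handling the event. */
static void cpu_event(enum action act, unsigned int cpu, unsigned int self)
{
    struct cpu_state *s = &cpus[cpu];

    switch ( act )
    {
    case DOWN_PREPARE:
        s->scheduled = false;            /* handler is torn down */
        break;

    case DEAD:
        if ( s->queued )                 /* skip everything when empty */
        {
            cpus[self].queued += s->queued;   /* hand pages to 'self' */
            s->queued = 0;
            cpus[self].scheduled = true;
        }
        break;

    case UP_PREPARE:
        s->queued = 0;                   /* fresh, empty list */
        /* fall through */
    case DOWN_FAILED:
        if ( s->queued )                 /* re-arm only if work remains */
            s->scheduled = true;
        break;
    }
}
```

In this model a failed offlining attempt re-arms the handler for pages that were queued before `DOWN_PREPARE`, while a successful one moves them to the CPU that observes `DEAD`.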

Comments

Roger Pau Monné June 28, 2022, 12:39 p.m. UTC | #1

On Thu, Jun 09, 2022 at 12:16:38PM +0200, Jan Beulich wrote:
> For vendor specific code to support superpages we need to be able to
> deal with a superpage mapping replacing an intermediate page table (or
> hierarchy thereof). Consequently an iommu_alloc_pgtable() counterpart is
> needed to free individual page tables while a domain is still alive.
> Since the freeing needs to be deferred until after a suitable IOTLB
> flush was performed, released page tables get queued for processing by a
> tasklet.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I was considering whether to use a softirq-tasklet instead. This would
> have the benefit of avoiding extra scheduling operations, but come with
> the risk of the freeing happening prematurely because of a
> process_pending_softirqs() somewhere.
> ---
> v6: Extend comment on the use of process_pending_softirqs().
> v5: Fix CPU_UP_PREPARE for BIGMEM. Schedule tasklet in CPU_DOWN_FAILED
>     when list is not empty. Skip all processing in CPU_DEAD when list is
>     empty.
> v4: Change type of iommu_queue_free_pgtable()'s 1st parameter. Re-base.
> v3: Call process_pending_softirqs() from free_queued_pgtables().
> 
> --- a/xen/arch/x86/include/asm/iommu.h
> +++ b/xen/arch/x86/include/asm/iommu.h
> @@ -147,6 +147,7 @@ void iommu_free_domid(domid_t domid, uns
>  int __must_check iommu_free_pgtables(struct domain *d);
>  struct domain_iommu;
>  struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd);
> +void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *pg);
>  
>  #endif /* !__ARCH_X86_IOMMU_H__ */
>  /*
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -12,6 +12,7 @@
>   * this program; If not, see <http://www.gnu.org/licenses/>.
>   */
>  
> +#include <xen/cpu.h>
>  #include <xen/sched.h>
>  #include <xen/iocap.h>
>  #include <xen/iommu.h>
> @@ -551,6 +552,103 @@ struct page_info *iommu_alloc_pgtable(st
>      return pg;
>  }
>  
> +/*
> + * Intermediate page tables which get replaced by large pages may only be
> + * freed after a suitable IOTLB flush. Hence such pages get queued on a
> + * per-CPU list, with a per-CPU tasklet processing the list on the assumption
> + * that the necessary IOTLB flush will have occurred by the time tasklets get
> + * to run. (List and tasklet being per-CPU has the benefit of accesses not
> + * requiring any locking.)
> + */
> +static DEFINE_PER_CPU(struct page_list_head, free_pgt_list);
> +static DEFINE_PER_CPU(struct tasklet, free_pgt_tasklet);
> +
> +static void free_queued_pgtables(void *arg)

I think this is missing a cf_check attribute?



> +{
> +    struct page_list_head *list = arg;
> +    struct page_info *pg;
> +    unsigned int done = 0;

Might be worth adding an:

ASSERT(list == &this_cpu(free_pgt_list));

To make sure tasklets are never executed on the wrong CPU.

Apart from that:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
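
The ownership check suggested in this review can be sketched outside of Xen as follows. All names are hypothetical: `free_lists[]` stands in for the per-CPU `free_pgt_list`, `current_cpu` for `smp_processor_id()`, and plain `assert()` for Xen's `ASSERT()`.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

struct list { unsigned int count; };

static struct list free_lists[NR_CPUS];
static unsigned int current_cpu;   /* stand-in for smp_processor_id() */

/* True when 'arg' is the list owned by the CPU running the handler. */
static bool owns_list(const void *arg)
{
    return arg == &free_lists[current_cpu];
}

/* Handler that refuses to touch another CPU's list. */
static unsigned int handler(void *arg)
{
    struct list *list = arg;
    unsigned int n;

    assert(owns_list(arg));   /* lock-free access is only safe on the owner */

    n = list->count;
    list->count = 0;

    return n;
}
```

Because the lists are accessed without locking, the check documents (and in debug builds enforces) the assumption that a handler only ever sees the list of the CPU it runs on.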

Jan Beulich July 5, 2022, 8:33 a.m. UTC | #2

On 28.06.2022 14:39, Roger Pau Monné wrote:
> On Thu, Jun 09, 2022 at 12:16:38PM +0200, Jan Beulich wrote:
>> For vendor specific code to support superpages we need to be able to
>> deal with a superpage mapping replacing an intermediate page table (or
>> hierarchy thereof). Consequently an iommu_alloc_pgtable() counterpart is
>> needed to free individual page tables while a domain is still alive.
>> Since the freeing needs to be deferred until after a suitable IOTLB
>> flush was performed, released page tables get queued for processing by a
>> tasklet.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> I was considering whether to use a softirq-tasklet instead. This would
>> have the benefit of avoiding extra scheduling operations, but come with
>> the risk of the freeing happening prematurely because of a
>> process_pending_softirqs() somewhere.
>> ---
>> v6: Extend comment on the use of process_pending_softirqs().
>> v5: Fix CPU_UP_PREPARE for BIGMEM. Schedule tasklet in CPU_DOWN_FAILED
>>     when list is not empty. Skip all processing in CPU_DEAD when list is
>>     empty.
>> v4: Change type of iommu_queue_free_pgtable()'s 1st parameter. Re-base.
>> v3: Call process_pending_softirqs() from free_queued_pgtables().
>>
>> --- a/xen/arch/x86/include/asm/iommu.h
>> +++ b/xen/arch/x86/include/asm/iommu.h
>> @@ -147,6 +147,7 @@ void iommu_free_domid(domid_t domid, uns
>>  int __must_check iommu_free_pgtables(struct domain *d);
>>  struct domain_iommu;
>>  struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd);
>> +void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *pg);
>>  
>>  #endif /* !__ARCH_X86_IOMMU_H__ */
>>  /*
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -12,6 +12,7 @@
>>   * this program; If not, see <http://www.gnu.org/licenses/>.
>>   */
>>  
>> +#include <xen/cpu.h>
>>  #include <xen/sched.h>
>>  #include <xen/iocap.h>
>>  #include <xen/iommu.h>
>> @@ -551,6 +552,103 @@ struct page_info *iommu_alloc_pgtable(st
>>      return pg;
>>  }
>>  
>> +/*
>> + * Intermediate page tables which get replaced by large pages may only be
>> + * freed after a suitable IOTLB flush. Hence such pages get queued on a
>> + * per-CPU list, with a per-CPU tasklet processing the list on the assumption
>> + * that the necessary IOTLB flush will have occurred by the time tasklets get
>> + * to run. (List and tasklet being per-CPU has the benefit of accesses not
>> + * requiring any locking.)
>> + */
>> +static DEFINE_PER_CPU(struct page_list_head, free_pgt_list);
>> +static DEFINE_PER_CPU(struct tasklet, free_pgt_tasklet);
>> +
>> +static void free_queued_pgtables(void *arg)
> 
> I think this is missing a cf_check attribute?

Oh, indeed - thanks for spotting. We're still lacking compiler detection
of such issues.

>> +{
>> +    struct page_list_head *list = arg;
>> +    struct page_info *pg;
>> +    unsigned int done = 0;
> 
> Might be worth adding an:
> 
> ASSERT(list == &this_cpu(free_pgt_list));
> 
> To make sure tasklets are never executed on the wrong CPU.

Sure, added.

> Apart from that:
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

Jan
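
For readers unfamiliar with the annotation discussed above, the following sketch shows the general shape of the pattern. It rests on assumptions: the macro is taken to expand to a compiler attribute when control-flow enforcement is built in and to nothing otherwise, and `HAVE_CET_IBT`, `handler_fn`, `count_call()` and `run()` are hypothetical names chosen for the example.

```c
#include <assert.h>

/*
 * Assumption: the annotation disappears unless the build opts in.
 * HAVE_CET_IBT is a made-up switch for this sketch.
 */
#ifdef HAVE_CET_IBT
# define cf_check __attribute__((__cf_check__))
#else
# define cf_check
#endif

typedef void handler_fn(void *arg);

/* A function meant to be reached through a pointer carries the marker. */
static void cf_check count_call(void *arg)
{
    ++*(unsigned int *)arg;
}

/* Indirect call site: the callee is only known at run time. */
static void run(handler_fn *fn, void *arg)
{
    fn(arg);
}
```

A tasklet handler is invoked through a function pointer in the same way as `count_call()` here, which is why it needs the annotation. Nothing in this sketch diagnoses a missing marker.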

--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -147,6 +147,7 @@  void iommu_free_domid(domid_t domid, uns
 int __must_check iommu_free_pgtables(struct domain *d);
 struct domain_iommu;
 struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd);
+void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *pg);
 
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -12,6 +12,7 @@ 
  * this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/cpu.h>
 #include <xen/sched.h>
 #include <xen/iocap.h>
 #include <xen/iommu.h>
@@ -551,6 +552,103 @@  struct page_info *iommu_alloc_pgtable(st
     return pg;
 }
 
+/*
+ * Intermediate page tables which get replaced by large pages may only be
+ * freed after a suitable IOTLB flush. Hence such pages get queued on a
+ * per-CPU list, with a per-CPU tasklet processing the list on the assumption
+ * that the necessary IOTLB flush will have occurred by the time tasklets get
+ * to run. (List and tasklet being per-CPU has the benefit of accesses not
+ * requiring any locking.)
+ */
+static DEFINE_PER_CPU(struct page_list_head, free_pgt_list);
+static DEFINE_PER_CPU(struct tasklet, free_pgt_tasklet);
+
+static void cf_check free_queued_pgtables(void *arg)
+{
+    struct page_list_head *list = arg;
+    struct page_info *pg;
+    unsigned int done = 0;
+
+    while ( (pg = page_list_remove_head(list)) )
+    {
+        free_domheap_page(pg);
+
+        /*
+         * Just to be on the safe side, check for processing softirqs every
+         * once in a while.  Generally it is expected that parties queuing
+         * pages for freeing will find a need for preemption before too many
+         * pages can be queued.  Granularity of checking is somewhat arbitrary.
+         */
+        if ( !(++done & 0x1ff) )
+            process_pending_softirqs();
+    }
+}
+
+void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *pg)
+{
+    unsigned int cpu = smp_processor_id();
+
+    spin_lock(&hd->arch.pgtables.lock);
+    page_list_del(pg, &hd->arch.pgtables.list);
+    spin_unlock(&hd->arch.pgtables.lock);
+
+    page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu));
+
+    tasklet_schedule(&per_cpu(free_pgt_tasklet, cpu));
+}
+
+static int cf_check cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned int cpu = (unsigned long)hcpu;
+    struct page_list_head *list = &per_cpu(free_pgt_list, cpu);
+    struct tasklet *tasklet = &per_cpu(free_pgt_tasklet, cpu);
+
+    switch ( action )
+    {
+    case CPU_DOWN_PREPARE:
+        tasklet_kill(tasklet);
+        break;
+
+    case CPU_DEAD:
+        if ( !page_list_empty(list) )
+        {
+            page_list_splice(list, &this_cpu(free_pgt_list));
+            INIT_PAGE_LIST_HEAD(list);
+            tasklet_schedule(&this_cpu(free_pgt_tasklet));
+        }
+        break;
+
+    case CPU_UP_PREPARE:
+        INIT_PAGE_LIST_HEAD(list);
+        fallthrough;
+    case CPU_DOWN_FAILED:
+        tasklet_init(tasklet, free_queued_pgtables, list);
+        if ( !page_list_empty(list) )
+            tasklet_schedule(tasklet);
+        break;
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback,
+};
+
+static int __init cf_check bsp_init(void)
+{
+    if ( iommu_enabled )
+    {
+        cpu_callback(&cpu_nfb, CPU_UP_PREPARE,
+                     (void *)(unsigned long)smp_processor_id());
+        register_cpu_notifier(&cpu_nfb);
+    }
+
+    return 0;
+}
+presmp_initcall(bsp_init);
+
 bool arch_iommu_use_permitted(const struct domain *d)
 {
     /*
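
The `!(++done & 0x1ff)` test in free_queued_pgtables() is true once every 512 freed pages, because 0x1ff masks the low nine bits of the counter. A small stand-alone sketch of that arithmetic follows; `count_checks()` is a hypothetical helper that counts how often the check would fire and is not part of the patch.

```c
#include <assert.h>

/* Number of times the periodic check fires while freeing 'pages' pages. */
static unsigned int count_checks(unsigned int pages)
{
    unsigned int done = 0, checks = 0;

    while ( pages-- )
        if ( !(++done & 0x1ff) )   /* low 9 bits zero: multiple of 512 */
            ++checks;

    return checks;
}
```

For example, freeing 511 pages never reaches the check, 512 pages reach it once, and 1024 pages reach it twice.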
