[v4,2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation

Message ID 1473668175-3088-2-git-send-email-dongli.zhang@oracle.com (mailing list archive)
State New, archived

Commit Message

Dongli Zhang Sept. 12, 2016, 8:16 a.m. UTC
This patch implemented parts of TODO left in commit id
a902c12ee45fc9389eb8fe54eeddaf267a555c58. It moved TLB-flush filtering out
into populate_physmap. Because of the TLB-flush in alloc_heap_pages, it is
very slow to create a guest with more than 100GB of memory on a host with
100+ cpus.

This patch introduced a "MEMF_no_tlbflush" bit to memflags to indicate
whether TLB-flush should be done in alloc_heap_pages or its caller
populate_physmap. Once this bit is set in memflags, alloc_heap_pages will
skip the TLB-flush. Using this bit after the vm is created might lead to a
security issue: it would make pages accessible to guest B while guest A
may still have a cached mapping to them.

Therefore, this patch also introduced an "is_ever_unpaused" field to struct
domain to indicate whether this domain has ever been unpaused by the
hypervisor. MEMF_no_tlbflush can be set only during the vm creation phase,
while is_ever_unpaused is still false, i.e. before this domain gets
unpaused for the first time.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v3:
  * Set the flag to true in domain_unpause_by_systemcontroller when
    unpausing the guest domain for the first time.
  * Use true/false for all bool_t variables.
  * Add unlikely to optimize "if statement".
  * Correct comment style.

Changed since v2:
  * Limit this optimization to domain creation time.

---
 xen/common/domain.c     | 11 +++++++++++
 xen/common/memory.c     | 34 ++++++++++++++++++++++++++++++++++
 xen/common/page_alloc.c |  3 ++-
 xen/include/xen/mm.h    |  2 ++
 xen/include/xen/sched.h |  3 +++
 5 files changed, 52 insertions(+), 1 deletion(-)

Comments

Dario Faggioli Sept. 14, 2016, 4:52 p.m. UTC | #1
On Mon, 2016-09-12 at 16:16 +0800, Dongli Zhang wrote:
> This patch implemented parts of TODO left in commit id
> a902c12ee45fc9389eb8fe54eeddaf267a555c58. 
>
We usually put both the (not necessarily full) hash and the subject
line of the commit in here.

> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>

> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index a8804e4..7be1bee 100644
> @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>      if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
>          goto fail;
>  
> +    d->is_ever_unpaused = false;
> +
>
I'd go for something like "first_unpaused" or "creation_finished", but
if maintainers are happy with this one already, I'm fine too.

> @@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>  {
>      int old, new, prev = d->controller_pause_count;
>  
> +    /*
> +     * Set is_ever_unpaused to true when this domain gets unpaused for the
> +     * first time. We record this information here to help populate_physmap
> +     * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
> +     * is allowed to be set by populate_physmap only during vm creation.
> +     */

"We record this information here for populate_physmap to figure out
 that the domain has already been unpaused, after finishing being
 created. That's because we're allowed to set MEMF_no_tlbflush only
 during VM creation."

Or, de-focusing the unpausing even more:

"We record this information here for populate_physmap to figure out
 that the domain has finished being created. In fact, we're only
 allowed to set the MEMF_no_tlbflush flag during VM creation."

I.e., the important thing is not really the unpausing (that's where we
found it handy to put the check), it's the fact that something should
only happen at creation time and why (see below).

> +    if ( unlikely(!d->is_ever_unpaused) )
> +        d->is_ever_unpaused = true;
> +
>      do
>      {
>          old = prev;

> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index cc0f69e..f3a733b 100644
> @@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a)
>                              max_order(curr_d)) )
>          return;
>  
> +    /*
> +     * MEMF_no_tlbflush can be set only during vm creation phase when
> +     * is_ever_unpaused is still false before this domain gets unpaused for
> +     * the first time.
> +     */
>
What about, 'citing' from the changelog:

"With MEMF_no_tlbflush set, alloc_heap_pages() will ignore TLB-flushes.
 After VM creation, this is a security issue (it can make pages
 accessible to guest B, when guest A may still have a cached mapping
 to them). So we do this only during domain creation, when the domain
 itself has not yet been unpaused for the first time."

> +    if ( unlikely(!d->is_ever_unpaused) )
> +        a->memflags |= MEMF_no_tlbflush;
> +
>      for ( i = a->nr_done; i < a->nr_extents; i++ )
>      {
>          if ( i != a->nr_done && hypercall_preempt_check() )

> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 2f9c15f..7fe8841 100644
> @@ -474,6 +474,9 @@ struct domain
>          unsigned int guest_request_enabled       : 1;
>          unsigned int guest_request_sync          : 1;
>      } monitor;
> +
> +    /* set to true the first time this domain gets unpaused. */
>
I think it's relevant to say _when_ that is. What about:

/*
 * Set to true at the very end of domain creation, when the domain is 
 * unpaused for the first time by the systemcontroller.
 */

(not 100% happy about the "by the systemcontroller" part... but that's
the idea.)

> +    bool_t is_ever_unpaused;
>
As said by Jan already --here and elsewhere-- new code should use
'bool'.

Regards,
Dario
Jan Beulich Sept. 15, 2016, 8:39 a.m. UTC | #2
>>> On 12.09.16 at 10:16, <dongli.zhang@oracle.com> wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>      if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
>          goto fail;
>  
> +    d->is_ever_unpaused = false;

This is not needed - struct domain starts out as all zeros anyway.

> @@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>  {
>      int old, new, prev = d->controller_pause_count;
>  
> +    /*
> +     * Set is_ever_unpaused to true when this domain gets unpaused for the
> +     * first time. We record this information here to help populate_physmap
> +     * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
> +     * is allowed to be set by populate_physmap only during vm creation.
> +     */
> +    if ( unlikely(!d->is_ever_unpaused) )
> +        d->is_ever_unpaused = true;

As mentioned before, the conditional is pointless. And just like Dario,
I dislike the name of the field. How about "has_run", "was_unpaused",
or "is_alive"? Or even better, how about combining this with the
is_shutting_down and is_shut_down into an enum? That latter variant
would presumably better be a patch of its own, though.
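
For illustration only, here is a minimal sketch of the two alternatives
raised above: a plain flag that is stored unconditionally, and a combined
lifecycle enum. It is not taken from the Xen tree. The names
domain_sketch, domain_lifecycle, note_unpause_flag, note_unpause_enum and
still_creating are all hypothetical, and the struct is a stand-in that
only models the fields under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lifecycle states; names are illustrative, not Xen's. */
enum domain_lifecycle {
    DOMAIN_CREATING = 0,      /* a zero-filled struct starts out here */
    DOMAIN_RUNNING,           /* unpaused at least once */
    DOMAIN_SHUTTING_DOWN,
    DOMAIN_SHUT_DOWN,
};

/* Stand-in for the relevant part of struct domain (assumption). */
struct domain_sketch {
    bool was_unpaused;                 /* variant 1: plain flag */
    enum domain_lifecycle lifecycle;   /* variant 2: combined enum */
};

/* Variant 1: unconditional store, no "if ( !flag )" test is needed. */
static void note_unpause_flag(struct domain_sketch *d)
{
    d->was_unpaused = true;
}

/* Variant 2: only the CREATING -> RUNNING transition is taken here. */
static void note_unpause_enum(struct domain_sketch *d)
{
    if ( d->lifecycle == DOMAIN_CREATING )
        d->lifecycle = DOMAIN_RUNNING;
}

/* What populate_physmap would ask: is the domain still being created? */
static bool still_creating(const struct domain_sketch *d)
{
    assert(d != NULL);
    return d->lifecycle == DOMAIN_CREATING;
}
```

With the enum, a zero-initialised structure is already in the "creating"
state, so no explicit initialisation is needed in domain_create() either.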

> @@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a)
>                              max_order(curr_d)) )
>          return;
>  
> +    /*
> +     * MEMF_no_tlbflush can be set only during vm creation phase when
> +     * is_ever_unpaused is still false before this domain gets unpaused for
> +     * the first time.
> +     */
> +    if ( unlikely(!d->is_ever_unpaused) )
> +        a->memflags |= MEMF_no_tlbflush;

So you no longer mean to expose this to the caller?

> @@ -214,6 +224,20 @@ static void populate_physmap(struct memop_args *a)
>                      goto out;
>                  }
>  
> +                if ( unlikely(!d->is_ever_unpaused) )

Please check MEMF_no_tlbflush here instead.

> +                {
> +                    for ( j = 0; j < (1U << a->extent_order); j++ )
> +                    {
> +                        if ( page_needs_tlbflush(&page[j], need_tlbflush,
> +                                                 tlbflush_timestamp,
> +                                                 tlbflush_current_time()) )
> +                        {
> +                            need_tlbflush = true;
> +                            tlbflush_timestamp = page[j].tlbflush_timestamp;
> +                        }
> +                    }
> +                }
> +
>                  mfn = page_to_mfn(page);
>              }
>  
> @@ -232,6 +256,16 @@ static void populate_physmap(struct memop_args *a)
>      }
>  
>  out:
> +    if ( need_tlbflush )
> +    {
> +        cpumask_t mask = cpu_online_map;
> +        tlbflush_filter(mask, tlbflush_timestamp);

Blank line between declarations and statements please. Also,
considering this repeats what gets done in page_alloc.c, I think
it should also be factored out into a function. And along those
lines I think the other abstraction should then also go further
and take care of the updating of need_tlbflush and
tlbflush_timestamp.

Jan
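
A rough sketch of the factoring suggested in the review above, assuming
stand-in types so that it compiles outside the tree. The helper names
(accumulate_tlbflush_sketch, filtered_flush_sketch, populate_sketch), the
struct page_sketch type and the flushes_issued counter are hypothetical.
page_needs_flush is a simplified stand-in for page_needs_tlbflush() whose
body is an assumption, and the actual flush is replaced by a counter. The
caller keys its decision off the memflags bit rather than the domain
field, as requested for the populate_physmap loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the bit position added by the patch; the name is a stand-in. */
#define MEMF_NO_TLBFLUSH_SKETCH (1U << 6)

/* Stand-in for struct page_info: only the timestamp matters here. */
struct page_sketch { uint32_t tlbflush_timestamp; };

/* Counts simulated flushes in place of flush_tlb_mask(). */
static unsigned int flushes_issued;

/* Simplified stand-in for page_needs_tlbflush() (assumed semantics). */
static bool page_needs_flush(const struct page_sketch *pg, bool need,
                             uint32_t stamp, uint32_t now)
{
    return pg->tlbflush_timestamp != 0 &&
           pg->tlbflush_timestamp <= now &&
           (!need || pg->tlbflush_timestamp > stamp);
}

/* Helper 1: fold one page into the running need/timestamp state. */
static void accumulate_tlbflush_sketch(bool *need,
                                       const struct page_sketch *pg,
                                       uint32_t *stamp, uint32_t now)
{
    if ( page_needs_flush(pg, *need, *stamp, now) )
    {
        *need = true;
        *stamp = pg->tlbflush_timestamp;
    }
}

/* Helper 2: issue at most one flush for the whole batch. */
static void filtered_flush_sketch(bool need, uint32_t stamp)
{
    (void)stamp; /* real code would hand this to tlbflush_filter() */

    if ( need )
        flushes_issued++;
}

/* Caller side: accumulate per page, flush once at the end. */
static void populate_sketch(const struct page_sketch *pages,
                            unsigned int nr, unsigned int memflags,
                            uint32_t now)
{
    bool need = false;
    uint32_t stamp = 0;
    unsigned int i;

    for ( i = 0; i < nr; i++ )
        if ( memflags & MEMF_NO_TLBFLUSH_SKETCH )
            accumulate_tlbflush_sketch(&need, &pages[i], &stamp, now);

    /* Without the flag the allocator flushes itself; nothing to do. */
    filtered_flush_sketch(need, stamp);
}
```

Both alloc_heap_pages and populate_physmap could then share the two
helpers, which is the duplication the review comment points at.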

Patch

diff --git a/xen/common/domain.c b/xen/common/domain.c
index a8804e4..7be1bee 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -303,6 +303,8 @@  struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
     if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
         goto fail;
 
+    d->is_ever_unpaused = false;
+
     if ( domcr_flags & DOMCRF_hvm )
         d->guest_type = guest_type_hvm;
     else if ( domcr_flags & DOMCRF_pvh )
@@ -1004,6 +1006,15 @@  int domain_unpause_by_systemcontroller(struct domain *d)
 {
     int old, new, prev = d->controller_pause_count;
 
+    /*
+     * Set is_ever_unpaused to true when this domain gets unpaused for the
+     * first time. We record this information here to help populate_physmap
+     * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
+     * is allowed to be set by populate_physmap only during vm creation.
+     */
+    if ( unlikely(!d->is_ever_unpaused) )
+        d->is_ever_unpaused = true;
+
     do
     {
         old = prev;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index cc0f69e..f3a733b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -141,6 +141,8 @@  static void populate_physmap(struct memop_args *a)
     unsigned int i, j;
     xen_pfn_t gpfn, mfn;
     struct domain *d = a->domain, *curr_d = current->domain;
+    bool_t need_tlbflush = false;
+    uint32_t tlbflush_timestamp = 0;
 
     if ( !guest_handle_subrange_okay(a->extent_list, a->nr_done,
                                      a->nr_extents-1) )
@@ -150,6 +152,14 @@  static void populate_physmap(struct memop_args *a)
                             max_order(curr_d)) )
         return;
 
+    /*
+     * MEMF_no_tlbflush can be set only during vm creation phase when
+     * is_ever_unpaused is still false before this domain gets unpaused for
+     * the first time.
+     */
+    if ( unlikely(!d->is_ever_unpaused) )
+        a->memflags |= MEMF_no_tlbflush;
+
     for ( i = a->nr_done; i < a->nr_extents; i++ )
     {
         if ( i != a->nr_done && hypercall_preempt_check() )
@@ -214,6 +224,20 @@  static void populate_physmap(struct memop_args *a)
                     goto out;
                 }
 
+                if ( unlikely(!d->is_ever_unpaused) )
+                {
+                    for ( j = 0; j < (1U << a->extent_order); j++ )
+                    {
+                        if ( page_needs_tlbflush(&page[j], need_tlbflush,
+                                                 tlbflush_timestamp,
+                                                 tlbflush_current_time()) )
+                        {
+                            need_tlbflush = true;
+                            tlbflush_timestamp = page[j].tlbflush_timestamp;
+                        }
+                    }
+                }
+
                 mfn = page_to_mfn(page);
             }
 
@@ -232,6 +256,16 @@  static void populate_physmap(struct memop_args *a)
     }
 
 out:
+    if ( need_tlbflush )
+    {
+        cpumask_t mask = cpu_online_map;
+        tlbflush_filter(mask, tlbflush_timestamp);
+        if ( !cpumask_empty(&mask) )
+        {
+            perfc_incr(need_flush_tlb_flush);
+            flush_tlb_mask(&mask);
+        }
+    }
     a->nr_done = i;
 }
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 5b93a01..04ca26a 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -827,7 +827,8 @@  static struct page_info *alloc_heap_pages(
         BUG_ON(pg[i].count_info != PGC_state_free);
         pg[i].count_info = PGC_state_inuse;
 
-        if ( page_needs_tlbflush(&pg[i], need_tlbflush,
+        if ( !(memflags & MEMF_no_tlbflush) &&
+             page_needs_tlbflush(&pg[i], need_tlbflush,
                                  tlbflush_timestamp,
                                  tlbflush_current_time()) )
         {
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 766559d..04b10e9 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -221,6 +221,8 @@  struct npfec {
 #define  MEMF_exact_node  (1U<<_MEMF_exact_node)
 #define _MEMF_no_owner    5
 #define  MEMF_no_owner    (1U<<_MEMF_no_owner)
+#define _MEMF_no_tlbflush 6
+#define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
 #define _MEMF_node        8
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2f9c15f..7fe8841 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -474,6 +474,9 @@  struct domain
         unsigned int guest_request_enabled       : 1;
         unsigned int guest_request_sync          : 1;
     } monitor;
+
+    /* set to true the first time this domain gets unpaused. */
+    bool_t is_ever_unpaused;
 };
 
 /* Protect updates/reads (resp.) of domain_list and domain_hash. */