Message ID | 1473668175-3088-2-git-send-email-dongli.zhang@oracle.com (mailing list archive) |
---|---|
State | New, archived |
On Mon, 2016-09-12 at 16:16 +0800, Dongli Zhang wrote:
> This patch implemented parts of TODO left in commit id
> a902c12ee45fc9389eb8fe54eeddaf267a555c58.

We usually put both the (not necessarily full) hash and the subject line
of the commit in here.

> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index a8804e4..7be1bee 100644
> @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>      if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
>          goto fail;
>
> +    d->is_ever_unpaused = false;
> +

I'd go for something like "first_unpaused" or "creation_finished", but
if maintainers are happy with this one already, I'm fine too.

> @@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>  {
>      int old, new, prev = d->controller_pause_count;
>
> +    /*
> +     * Set is_ever_unpaused to true when this domain gets unpaused for the
> +     * first time. We record this information here to help populate_physmap
> +     * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
> +     * is allowed to be set by populate_physmap only during vm creation.
> +     */

"We record this information here for populate_physmap to figure out
that the domain has already been unpaused, after finishing being
created. That's because we're allowed to set MEMF_no_tlbflush only
during VM creation."

Or, de-focusing the unpausing even more:

"We record this information here for populate_physmap to figure out
that the domain has finished being created. In fact, we're only allowed
to set the MEMF_no_tlbflush flag during VM creation."

I.e., the important thing is not really the unpausing (that's where we
found it handy to put the check), it's the fact that something should
only happen at creation time and why (see below).

> +    if ( unlikely(!d->is_ever_unpaused) )
> +        d->is_ever_unpaused = true;
> +
>      do
>      {
>          old = prev;
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index cc0f69e..f3a733b 100644
> @@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a)
>                              max_order(curr_d)) )
>          return;
>
> +    /*
> +     * MEMF_no_tlbflush can be set only during vm creation phase when
> +     * is_ever_unpaused is still false before this domain gets unpaused for
> +     * the first time.
> +     */

What about, 'citing' from the changelog:

"With MEMF_no_tlbflush set, alloc_heap_pages() will ignore TLB-flushes.
After VM creation, this is a security issue (it can make pages
accessible to guest B, when guest A may still have a cached mapping to
them). So we do this only during domain creation, when the domain
itself has not yet been unpaused for the first time."

> +    if ( unlikely(!d->is_ever_unpaused) )
> +        a->memflags |= MEMF_no_tlbflush;
> +
>      for ( i = a->nr_done; i < a->nr_extents; i++ )
>      {
>          if ( i != a->nr_done && hypercall_preempt_check() )
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 2f9c15f..7fe8841 100644
> @@ -474,6 +474,9 @@ struct domain
>          unsigned int guest_request_enabled       : 1;
>          unsigned int guest_request_sync          : 1;
>      } monitor;
> +
> +    /* set to true the first time this domain gets unpaused. */

I think it's relevant to say _when_ that is. What about:

    /*
     * Set to true at the very end of domain creation, when the domain is
     * unpaused for the first time by the systemcontroller.
     */

(not 100% happy about the "by the systemcontroller" part... but that's
the idea.)

> +    bool_t is_ever_unpaused;

As said by Jan already --here and elsewhere-- new code should use
'bool'.

Regards,
Dario
>>> On 12.09.16 at 10:16, <dongli.zhang@oracle.com> wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>      if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
>          goto fail;
>
> +    d->is_ever_unpaused = false;

This is not needed - struct domain starts out as all zeros anyway.

> @@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>  {
>      int old, new, prev = d->controller_pause_count;
>
> +    /*
> +     * Set is_ever_unpaused to true when this domain gets unpaused for the
> +     * first time. We record this information here to help populate_physmap
> +     * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
> +     * is allowed to be set by populate_physmap only during vm creation.
> +     */
> +    if ( unlikely(!d->is_ever_unpaused) )
> +        d->is_ever_unpaused = true;

As mentioned before, the conditional is pointless. And just like Dario,
I dislike the name of the field. How about "has_run", "was_unpaused", or
"is_alive"? Or even better, how about combining this with the
is_shutting_down and is_shut_down into an enum? For that latter variant,
that would presumably better be a patch on its own then.

> @@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a)
>                              max_order(curr_d)) )
>          return;
>
> +    /*
> +     * MEMF_no_tlbflush can be set only during vm creation phase when
> +     * is_ever_unpaused is still false before this domain gets unpaused for
> +     * the first time.
> +     */
> +    if ( unlikely(!d->is_ever_unpaused) )
> +        a->memflags |= MEMF_no_tlbflush;

So you no longer mean to expose this to the caller?

> @@ -214,6 +224,20 @@ static void populate_physmap(struct memop_args *a)
>                      goto out;
>                  }
>
> +                if ( unlikely(!d->is_ever_unpaused) )

Please check MEMF_no_tlbflush here instead.

> +                {
> +                    for ( j = 0; j < (1U << a->extent_order); j++ )
> +                    {
> +                        if ( page_needs_tlbflush(&page[j], need_tlbflush,
> +                                                 tlbflush_timestamp,
> +                                                 tlbflush_current_time()) )
> +                        {
> +                            need_tlbflush = true;
> +                            tlbflush_timestamp = page[j].tlbflush_timestamp;
> +                        }
> +                    }
> +                }
> +
>                  mfn = page_to_mfn(page);
>              }
>
> @@ -232,6 +256,16 @@ static void populate_physmap(struct memop_args *a)
>      }
>
>  out:
> +    if ( need_tlbflush )
> +    {
> +        cpumask_t mask = cpu_online_map;
> +        tlbflush_filter(mask, tlbflush_timestamp);

Blank line between declarations and statements please. Also, considering
this repeats what gets done in page_alloc.c, I think it should also be
factored out into a function. And along those lines I think the other
abstraction should then also go further and take care of the updating
of need_tlbflush and tlbflush_timestamp.

Jan
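Jan's last point (factor the filtered flush into a function, and let a second helper own the updating of need_tlbflush and tlbflush_timestamp) could look roughly like the sketch below. This is a self-contained toy model, not Xen code: the helper names accumulate_tlbflush and filtered_flush_tlb_mask, struct toy_page, and the per-CPU arrays are all assumptions made for illustration, and the "page needs a flush" test is a simplified stand-in for page_needs_tlbflush().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 4u

/* Toy stand-in for struct page_info: only the fields this sketch needs. */
struct toy_page {
    bool     need_tlbflush;       /* a stale TLB entry may still map the page */
    uint32_t tlbflush_timestamp;  /* toy clock value when the page was freed */
};

/* Toy per-CPU state: clock value of each CPU's most recent TLB flush. */
static uint32_t toy_cpu_last_flush[TOY_NR_CPUS];
static uint32_t toy_current_time;

/*
 * Hypothetical helper 1: fold one page into the caller's running state,
 * so callers no longer update need_tlbflush/tlbflush_timestamp by hand.
 * The newest timestamp seen so far is the one that has to be covered.
 */
static void accumulate_tlbflush(bool *need_tlbflush,
                                const struct toy_page *pg,
                                uint32_t *tlbflush_timestamp)
{
    assert(need_tlbflush != NULL && pg != NULL && tlbflush_timestamp != NULL);

    if ( pg->need_tlbflush &&
         (!*need_tlbflush || pg->tlbflush_timestamp > *tlbflush_timestamp) )
    {
        *need_tlbflush = true;
        *tlbflush_timestamp = pg->tlbflush_timestamp;
    }
}

/*
 * Hypothetical helper 2: flush only the CPUs that have not flushed since
 * the given timestamp. Returns the bitmask of CPUs that were flushed.
 */
static uint32_t filtered_flush_tlb_mask(uint32_t tlbflush_timestamp)
{
    uint32_t mask = 0;
    unsigned int cpu;

    /* Filter step: keep CPUs whose last flush is not newer than the stamp. */
    for ( cpu = 0; cpu < TOY_NR_CPUS; cpu++ )
        if ( toy_cpu_last_flush[cpu] <= tlbflush_timestamp )
            mask |= 1u << cpu;

    /* Flush step: advance the toy clock once and stamp the flushed CPUs. */
    if ( mask != 0 )
    {
        toy_current_time++;
        for ( cpu = 0; cpu < TOY_NR_CPUS; cpu++ )
            if ( mask & (1u << cpu) )
                toy_cpu_last_flush[cpu] = toy_current_time;
    }

    return mask;
}
```

With helpers of this shape, both the allocator and populate_physmap would call the first one per page and the second one once at the end, so the open-coded mask handling would live in a single place.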
diff --git a/xen/common/domain.c b/xen/common/domain.c
index a8804e4..7be1bee 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
     if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
         goto fail;
 
+    d->is_ever_unpaused = false;
+
     if ( domcr_flags & DOMCRF_hvm )
         d->guest_type = guest_type_hvm;
     else if ( domcr_flags & DOMCRF_pvh )
@@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d)
 {
     int old, new, prev = d->controller_pause_count;
 
+    /*
+     * Set is_ever_unpaused to true when this domain gets unpaused for the
+     * first time. We record this information here to help populate_physmap
+     * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
+     * is allowed to be set by populate_physmap only during vm creation.
+     */
+    if ( unlikely(!d->is_ever_unpaused) )
+        d->is_ever_unpaused = true;
+
     do
     {
         old = prev;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index cc0f69e..f3a733b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -141,6 +141,8 @@ static void populate_physmap(struct memop_args *a)
     unsigned int i, j;
     xen_pfn_t gpfn, mfn;
     struct domain *d = a->domain, *curr_d = current->domain;
+    bool_t need_tlbflush = false;
+    uint32_t tlbflush_timestamp = 0;
 
     if ( !guest_handle_subrange_okay(a->extent_list, a->nr_done,
                                      a->nr_extents-1) )
@@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a)
                             max_order(curr_d)) )
         return;
 
+    /*
+     * MEMF_no_tlbflush can be set only during vm creation phase when
+     * is_ever_unpaused is still false before this domain gets unpaused for
+     * the first time.
+     */
+    if ( unlikely(!d->is_ever_unpaused) )
+        a->memflags |= MEMF_no_tlbflush;
+
     for ( i = a->nr_done; i < a->nr_extents; i++ )
     {
         if ( i != a->nr_done && hypercall_preempt_check() )
@@ -214,6 +224,20 @@ static void populate_physmap(struct memop_args *a)
                     goto out;
                 }
 
+                if ( unlikely(!d->is_ever_unpaused) )
+                {
+                    for ( j = 0; j < (1U << a->extent_order); j++ )
+                    {
+                        if ( page_needs_tlbflush(&page[j], need_tlbflush,
+                                                 tlbflush_timestamp,
+                                                 tlbflush_current_time()) )
+                        {
+                            need_tlbflush = true;
+                            tlbflush_timestamp = page[j].tlbflush_timestamp;
+                        }
+                    }
+                }
+
                 mfn = page_to_mfn(page);
             }
 
@@ -232,6 +256,16 @@ static void populate_physmap(struct memop_args *a)
     }
 
 out:
+    if ( need_tlbflush )
+    {
+        cpumask_t mask = cpu_online_map;
+        tlbflush_filter(mask, tlbflush_timestamp);
+        if ( !cpumask_empty(&mask) )
+        {
+            perfc_incr(need_flush_tlb_flush);
+            flush_tlb_mask(&mask);
+        }
+    }
     a->nr_done = i;
 }
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 5b93a01..04ca26a 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -827,7 +827,8 @@ static struct page_info *alloc_heap_pages(
         BUG_ON(pg[i].count_info != PGC_state_free);
         pg[i].count_info = PGC_state_inuse;
 
-        if ( page_needs_tlbflush(&pg[i], need_tlbflush,
+        if ( !(memflags & MEMF_no_tlbflush) &&
+             page_needs_tlbflush(&pg[i], need_tlbflush,
                                  tlbflush_timestamp,
                                  tlbflush_current_time()) )
         {
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 766559d..04b10e9 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -221,6 +221,8 @@ struct npfec {
 #define  MEMF_exact_node  (1U<<_MEMF_exact_node)
 #define _MEMF_no_owner    5
 #define  MEMF_no_owner    (1U<<_MEMF_no_owner)
+#define _MEMF_no_tlbflush 6
+#define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
 #define _MEMF_node        8
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2f9c15f..7fe8841 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -474,6 +474,9 @@ struct domain
         unsigned int guest_request_enabled       : 1;
         unsigned int guest_request_sync          : 1;
     } monitor;
+
+    /* set to true the first time this domain gets unpaused. */
+    bool_t is_ever_unpaused;
 };
 
 /* Protect updates/reads (resp.) of domain_list and domain_hash. */
This patch implemented parts of TODO left in commit id
a902c12ee45fc9389eb8fe54eeddaf267a555c58. It moved TLB-flush filtering out into
populate_physmap. Because of TLB-flush in alloc_heap_pages, it's very slow to
create a guest with memory size of more than 100GB on host with 100+ cpus.

This patch introduced a "MEMF_no_tlbflush" bit to memflags to indicate whether
TLB-flush should be done in alloc_heap_pages or its caller
populate_physmap. Once this bit is set in memflags, alloc_heap_pages will
ignore TLB-flush. To use this bit after vm is created might lead to security
issue, that is, this would make pages accessible to the guest B, when guest A
may still have a cached mapping to them.

Therefore, this patch also introduced a "is_ever_unpaused" field to struct
domain to indicate whether this domain has ever got unpaused by hypervisor.
MEMF_no_tlbflush can be set only during vm creation phase when
is_ever_unpaused is still false before this domain gets unpaused for the first
time.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v3:
  * Set the flag to true in domain_unpause_by_systemcontroller when
    unpausing the guest domain for the first time.
  * Use true/false for all bool_t variables.
  * Add unlikely to optimize "if statement".
  * Correct comment style.

Changed since v2:
  * Limit this optimization to domain creation time.

---
 xen/common/domain.c     | 11 +++++++++++
 xen/common/memory.c     | 34 ++++++++++++++++++++++++++++++++++
 xen/common/page_alloc.c |  3 ++-
 xen/include/xen/mm.h    |  2 ++
 xen/include/xen/sched.h |  3 +++
 5 files changed, 52 insertions(+), 1 deletion(-)
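The effect described in the commit message can be illustrated with a small stand-alone model. This is only a sketch under loud assumptions: struct toy_domain, toy_alloc_extent, toy_populate_physmap and toy_flush_count are invented for the illustration, every allocated extent is assumed to need a flush, and the flush itself is reduced to a counter. Only the MEMF_no_tlbflush bit position and the is_ever_unpaused gating are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MEMF_no_tlbflush (1u << 6)   /* bit position as defined in the patch */

/* Toy stand-in for struct domain: only the field this sketch needs. */
struct toy_domain {
    bool is_ever_unpaused;           /* false until the first unpause */
};

static unsigned int toy_flush_count; /* number of simulated TLB flushes */

static void toy_flush_tlb(void)
{
    toy_flush_count++;
}

/*
 * Toy allocator: flushes on every allocation unless the caller has taken
 * over responsibility by passing MEMF_no_tlbflush.
 * Assumption: every extent needs a flush (the real code filters by timestamp).
 */
static void toy_alloc_extent(unsigned int memflags)
{
    if ( !(memflags & MEMF_no_tlbflush) )
        toy_flush_tlb();
}

/* Toy caller: defers the flush only while the domain is still being built. */
static void toy_populate_physmap(const struct toy_domain *d,
                                 unsigned int nr_extents,
                                 unsigned int memflags)
{
    bool need_tlbflush = false;
    unsigned int i;

    assert(d != NULL);

    /* The guest has never run, so deferring the flush is safe here. */
    if ( !d->is_ever_unpaused )
        memflags |= MEMF_no_tlbflush;

    for ( i = 0; i < nr_extents; i++ )
    {
        toy_alloc_extent(memflags);

        /* Test the flag, not the domain state, to decide who flushes. */
        if ( memflags & MEMF_no_tlbflush )
            need_tlbflush = true;
    }

    /* One flush for the whole batch instead of one per extent. */
    if ( need_tlbflush )
        toy_flush_tlb();
}
```

In this model, populating 1000 extents for a domain that has never been unpaused costs one flush, while the same call for a running domain costs 1000, which is the gap the patch targets for large guests on hosts with many CPUs.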