| Message ID | 45c03833059b0ce3f52c02693a2eb649356adf3e.1580755007.git.hongyxia@amazon.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | x86/domain_page: implement pure per-vCPU mapping infrastructure |
Thanks, I welcome effort to make Xen more scalable. :-) On Mon, Feb 03, 2020 at 06:36:53PM +0000, Hongyan Xia wrote: > Rewrite the mapcache to be purely per-vCPU instead of partly per-vCPU > and partly per-domain. > > This patch is needed to address performance issues when we start relying > on the mapcache, e.g., when we do not have a direct map. Currently, the > per-domain lock on the mapcache is a bottleneck for multicore, causing > performance degradation and even functional regressions. This patch > makes the mapping structure per-vCPU and completely lockless. > When I see "regression", I think of something that was working before but not anymore. I don't think that's the case for the following two things. I would just classify them as bug and/or improvement. > Functional regression: > > When a domain is run on more than 64 cores, FreeBSD 10 panicks frequently > due to occasional simultaneous set_singleshot_timer hypercalls from too > many cores. Some cores will be blocked waiting on map_domain_page, > eventually failing to set a timer in the future. FreeBSD cannot handle > this and panicks. This was fixed in later FreeBSD releases by handling > -ETIME, but still the degradation in timer performance is a big issue. > > Performance regression: > > Many benchmarks see a performance drop when having a large core count. > I have done a Geekbench on a 32-vCPU guest. > > perf drop old new > single 0.04% 0.18% > multi 2.43% 0.08% > > Removing the per-domain lock in the mapcache brings the multi-core > performance almost identical to using the direct map for mappings. > > There should be room for futher optimisations, but this already > improves over the old mapcache by a lot. > > Note that entries in the maphash will occupy inuse slots. With 16 slots > per vCPU and a maphash capacity of 8, we only have another 8 available, > which is not enough for nested page table walks. We need to increase the > number of slots in config.h. > > Signed-off-by: Hongyan Xia <hongyxia@amazon.com> As far as I can tell all vcpus still share the same per-domain mapcache region. The difference is now the region is divided into subregion for each vcpu to use. > --- > xen/arch/x86/domain.c | 5 +- > xen/arch/x86/domain_page.c | 229 +++++++++++------------------------ > xen/include/asm-x86/config.h | 2 +- > xen/include/asm-x86/domain.h | 30 +---- > 4 files changed, 80 insertions(+), 186 deletions(-) > > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c > index f53ae5ff86..a278aa4678 100644 > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -445,6 +445,9 @@ void arch_vcpu_destroy(struct vcpu *v) > xfree(v->arch.msrs); > v->arch.msrs = NULL; > > + xfree(v->arch.pv.mapcache); > + v->arch.pv.mapcache = NULL; > + Keep in mind that accessing the union {pv, hvm} before knowing the exact variant is dangerous. Because mapcache is initialised for PV only, so it should be freed for PV only. I think you need to put it to pv_vcpu_destroy. If you don't want to do that because of asymmetry with mapcache_vcpu_init, you may want to invent mapcache_vcpu_destroy for your purpose. (I will need to pull this patch to a branch to see the final incarnation before I review further) Wei.
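For reference, the division Wei describes (one per-domain mapcache VA region carved into a fixed-size subregion per vCPU) corresponds to the index arithmetic used throughout the patch. The sketch below is illustrative only; the constant and field names are the ones introduced by this series.

    /*
     * Layout of the shared region (illustrative):
     *
     *   MAPCACHE_VIRT_START
     *     vCPU 0: global slots [0, MAPCACHE_VCPU_ENTRIES)
     *     vCPU 1: global slots [MAPCACHE_VCPU_ENTRIES, 2 * MAPCACHE_VCPU_ENTRIES)
     *     ...
     *
     * so a vCPU-local slot index translates to a global PTE index and VA as:
     */
    glb_idx = idx + v->vcpu_id * MAPCACHE_VCPU_ENTRIES;
    va      = (void *)MAPCACHE_VIRT_START + pfn_to_paddr(glb_idx); /* glb_idx * PAGE_SIZE */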
On Mon, Feb 03, 2020 at 06:36:53PM +0000, Hongyan Xia wrote: [...] > diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c > index dd32712d2f..52971d2ecc 100644 > --- a/xen/arch/x86/domain_page.c > +++ b/xen/arch/x86/domain_page.c > @@ -69,12 +69,11 @@ void __init mapcache_override_current(struct vcpu *v) > > void *map_domain_page(mfn_t mfn) > { > - unsigned long flags; > - unsigned int idx, i; > + unsigned long flags, *phashmfn; > + unsigned int idx, glb_idx, *phashidx, ohashidx; > struct vcpu *v; > - struct mapcache_domain *dcache; > struct mapcache_vcpu *vcache; > - struct vcpu_maphash_entry *hashent; > + void *ret; > > #ifdef NDEBUG > if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) > @@ -82,104 +81,59 @@ void *map_domain_page(mfn_t mfn) > #endif > > v = mapcache_current_vcpu(); > - if ( !v || !is_pv_vcpu(v) ) > + if ( !v || !is_pv_vcpu(v) || !v->arch.pv.mapcache ) > return mfn_to_virt(mfn_x(mfn)); > > - dcache = &v->domain->arch.pv.mapcache; > - vcache = &v->arch.pv.mapcache; > - if ( !dcache->inuse ) > - return mfn_to_virt(mfn_x(mfn)); > + vcache = v->arch.pv.mapcache; > + phashmfn = &vcache->hash_mfn[MAPHASH_HASHFN(mfn_x(mfn))]; > + phashidx = &vcache->hash_idx[MAPHASH_HASHFN(mfn_x(mfn))]; > > perfc_incr(map_domain_page_count); > > local_irq_save(flags); > > - hashent = &vcache->hash[MAPHASH_HASHFN(mfn_x(mfn))]; > - if ( hashent->mfn == mfn_x(mfn) ) > + ohashidx = *phashidx; This is a bit redundant because you never rewrite *phashidx until the last minute. I think ohashidx can be removed, but it's up to you. > + if ( *phashmfn != mfn_x(mfn) ) > { > - idx = hashent->idx; > - ASSERT(idx < dcache->entries); > - hashent->refcnt++; > - ASSERT(hashent->refcnt); > - ASSERT(mfn_eq(l1e_get_mfn(MAPCACHE_L1ENT(idx)), mfn)); > - goto out; > - } > + idx = find_first_zero_bit(vcache->inuse, MAPCACHE_VCPU_ENTRIES); > + BUG_ON(idx >= MAPCACHE_VCPU_ENTRIES); > > - spin_lock(&dcache->lock); > + ASSERT(vcache->refcnt[idx] == 0); > + __set_bit(idx, vcache->inuse); > > - /* Has some other CPU caused a wrap? We must flush if so. */ [...] > void unmap_domain_page(const void *ptr) > { > - unsigned int idx; > + unsigned int idx, glb_idx; > struct vcpu *v; > - struct mapcache_domain *dcache; > - unsigned long va = (unsigned long)ptr, mfn, flags; > - struct vcpu_maphash_entry *hashent; > + struct mapcache_vcpu *vcache; > + unsigned long va = (unsigned long)ptr, mfn, hashmfn, flags; > > if ( va >= DIRECTMAP_VIRT_START ) > return; > @@ -189,73 +143,21 @@ void unmap_domain_page(const void *ptr) > v = mapcache_current_vcpu(); > ASSERT(v && is_pv_vcpu(v)); > > - dcache = &v->domain->arch.pv.mapcache; > - ASSERT(dcache->inuse); > - > - idx = PFN_DOWN(va - MAPCACHE_VIRT_START); > - mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx)); > - hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)]; > + vcache = v->arch.pv.mapcache; > + ASSERT(vcache); > > + glb_idx = PFN_DOWN(va - MAPCACHE_VIRT_START); > + idx = glb_idx - v->vcpu_id * MAPCACHE_VCPU_ENTRIES; Add a blank line here please. > local_irq_save(flags); > > - if ( hashent->idx == idx ) > - { > - ASSERT(hashent->mfn == mfn); > - ASSERT(hashent->refcnt); > - hashent->refcnt--; > - } > - else if ( !hashent->refcnt ) > - { > - if ( hashent->idx != MAPHASHENT_NOTINUSE ) > - { > - /* /First/, zap the PTE. */ > - ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(hashent->idx)) == > - hashent->mfn); > - l1e_write(&MAPCACHE_L1ENT(hashent->idx), l1e_empty()); > - /* /Second/, mark as garbage. 
*/ > - set_bit(hashent->idx, dcache->garbage); > - } > - > - /* Add newly-freed mapping to the maphash. */ > - hashent->mfn = mfn; > - hashent->idx = idx; > - } > - else > - { > - /* /First/, zap the PTE. */ > - l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty()); > - /* /Second/, mark as garbage. */ > - set_bit(idx, dcache->garbage); > - } > - > - local_irq_restore(flags); > -} > - > -int mapcache_domain_init(struct domain *d) > -{ > - struct mapcache_domain *dcache = &d->arch.pv.mapcache; > - unsigned int bitmap_pages; > + mfn = l1e_get_pfn(MAPCACHE_L1ENT(glb_idx)); > + hashmfn = vcache->hash_mfn[MAPHASH_HASHFN(mfn)]; > > - ASSERT(is_pv_domain(d)); > + vcache->refcnt[idx]--; > + if ( hashmfn != mfn && !vcache->refcnt[idx] ) > + __clear_bit(idx, vcache->inuse); > > -#ifdef NDEBUG > - if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) > - return 0; > -#endif Would be great if some form or shape of this check can be added to mapcache_vcpu_init such that you don't unnecessarily initialise data structures that will never be used. > - > - BUILD_BUG_ON(MAPCACHE_VIRT_END + PAGE_SIZE * (3 + > - 2 * PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long))) > > - MAPCACHE_VIRT_START + (PERDOMAIN_SLOT_MBYTES << 20)); > - bitmap_pages = PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long)); > - dcache->inuse = (void *)MAPCACHE_VIRT_END + PAGE_SIZE; > - dcache->garbage = dcache->inuse + > - (bitmap_pages + 1) * PAGE_SIZE / sizeof(long); > - > - spin_lock_init(&dcache->lock); > - > - return create_perdomain_mapping(d, (unsigned long)dcache->inuse, > - 2 * bitmap_pages + 1, > - NIL(l1_pgentry_t *), NULL); > + local_irq_restore(flags); > } > [...] > diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h > index d0cfbb70a8..4b2217151b 100644 > --- a/xen/include/asm-x86/config.h > +++ b/xen/include/asm-x86/config.h > @@ -296,7 +296,7 @@ extern unsigned long xen_phys_start; > (GDT_VIRT_START(v) + (64*1024)) > > /* map_domain_page() map cache. The second per-domain-mapping sub-area. */ > -#define MAPCACHE_VCPU_ENTRIES (CONFIG_PAGING_LEVELS * CONFIG_PAGING_LEVELS) > +#define MAPCACHE_VCPU_ENTRIES 32 > #define MAPCACHE_ENTRIES (MAX_VIRT_CPUS * MAPCACHE_VCPU_ENTRIES) > #define MAPCACHE_VIRT_START PERDOMAIN_VIRT_SLOT(1) > #define MAPCACHE_VIRT_END (MAPCACHE_VIRT_START + \ > diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h > index a3ae5d9a20..367bba7110 100644 > --- a/xen/include/asm-x86/domain.h > +++ b/xen/include/asm-x86/domain.h > @@ -40,35 +40,17 @@ struct trap_bounce { > #define MAPHASH_HASHFN(pfn) ((pfn) & (MAPHASH_ENTRIES-1)) > #define MAPHASHENT_NOTINUSE ((u32)~0U) > struct mapcache_vcpu { > - /* Shadow of mapcache_domain.epoch. */ > - unsigned int shadow_epoch; > - > - /* Lock-free per-VCPU hash of recently-used mappings. */ > - struct vcpu_maphash_entry { > - unsigned long mfn; > - uint32_t idx; > - uint32_t refcnt; > - } hash[MAPHASH_ENTRIES]; > + unsigned long hash_mfn[MAPHASH_ENTRIES]; > + uint32_t hash_idx[MAPHASH_ENTRIES]; > + > + uint8_t refcnt[MAPCACHE_VCPU_ENTRIES]; What's the reason for breaking vcpu_maphash_entry into three distinct arrays? Would it make the code shorter/cleaner if you make everything MAPCACHE_VCPU_ENTRIES? A vcpu now only needs to manage its own subregion after all. > + unsigned long inuse[BITS_TO_LONGS(MAPCACHE_VCPU_ENTRIES)]; > }; > > struct mapcache_domain { > - /* The number of array entries, and a cursor into the array. */ > unsigned int entries; > - unsigned int cursor; > - > - /* Protects map_domain_page(). 
*/ > - spinlock_t lock; > - > - /* Garbage mappings are flushed from TLBs in batches called 'epochs'. */ > - unsigned int epoch; > - u32 tlbflush_timestamp; > - > - /* Which mappings are in use, and which are garbage to reap next epoch? */ > - unsigned long *inuse; > - unsigned long *garbage; > }; > > -int mapcache_domain_init(struct domain *); > int mapcache_vcpu_init(struct vcpu *); > void mapcache_override_current(struct vcpu *); > > @@ -473,7 +455,7 @@ struct arch_domain > struct pv_vcpu > { > /* map_domain_page() mapping cache. */ > - struct mapcache_vcpu mapcache; > + struct mapcache_vcpu *mapcache; And why do you need to change this to a pointer? Is it now very large and causing issue(s)? Wei. > > unsigned int vgc_flags; > > -- > 2.17.1 >
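A minimal sketch of what Wei's last suggestion could look like: short-circuiting mapcache_vcpu_init() when the direct map covers all of RAM, so that no per-vCPU state is allocated. The placement is an assumption; the condition itself is the one this patch removes from mapcache_domain_init().

    /* Early in mapcache_vcpu_init(), after the is_pv_vcpu() check (hypothetical placement): */
    #ifdef NDEBUG
        /*
         * Skip allocating v->arch.pv.mapcache when every MFN is reachable
         * through the direct map; map_domain_page() already falls back to
         * mfn_to_virt() when v->arch.pv.mapcache is NULL.
         */
        if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
            return 0;
    #endif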
On 03/02/2020 18:36, Hongyan Xia wrote: > Rewrite the mapcache to be purely per-vCPU instead of partly per-vCPU > and partly per-domain. > > This patch is needed to address performance issues when we start relying > on the mapcache, e.g., when we do not have a direct map. Currently, the > per-domain lock on the mapcache is a bottleneck for multicore, causing > performance degradation and even functional regressions. Do you mean that this patch causes a regression, or that removing the directmap causes a regression? The rest of the commit message is very confusing to follow. > This patch > makes the mapping structure per-vCPU and completely lockless. > > Functional regression: > > When a domain is run on more than 64 cores, FreeBSD 10 panicks frequently > due to occasional simultaneous set_singleshot_timer hypercalls from too > many cores. Some cores will be blocked waiting on map_domain_page, > eventually failing to set a timer in the future. FreeBSD cannot handle > this and panicks. This was fixed in later FreeBSD releases by handling > -ETIME, but still the degradation in timer performance is a big issue. > > Performance regression: > > Many benchmarks see a performance drop when having a large core count. > I have done a Geekbench on a 32-vCPU guest. > > perf drop old new > single 0.04% 0.18% > multi 2.43% 0.08% > > Removing the per-domain lock in the mapcache brings the multi-core > performance almost identical to using the direct map for mappings. > > There should be room for futher optimisations, but this already > improves over the old mapcache by a lot. > > Note that entries in the maphash will occupy inuse slots. With 16 slots > per vCPU and a maphash capacity of 8, we only have another 8 available, > which is not enough for nested page table walks. We need to increase the > number of slots in config.h. I'm afraid that I don't follow what you're trying to say here. The number of slots should either be fine, or we've got a pre-exiting bug. ~Andrew
On Tue, 2020-02-04 at 14:00 +0000, Wei Liu wrote: > On Mon, Feb 03, 2020 at 06:36:53PM +0000, Hongyan Xia wrote: > [...] > > diff --git a/xen/arch/x86/domain_page.c > > b/xen/arch/x86/domain_page.c > > index dd32712d2f..52971d2ecc 100644 > > --- a/xen/arch/x86/domain_page.c > > +++ b/xen/arch/x86/domain_page.c > > @@ -69,12 +69,11 @@ void __init mapcache_override_current(struct > > vcpu *v) > > > > void *map_domain_page(mfn_t mfn) > > { > > - unsigned long flags; > > - unsigned int idx, i; > > + unsigned long flags, *phashmfn; > > + unsigned int idx, glb_idx, *phashidx, ohashidx; > > struct vcpu *v; > > - struct mapcache_domain *dcache; > > struct mapcache_vcpu *vcache; > > - struct vcpu_maphash_entry *hashent; > > + void *ret; > > > > #ifdef NDEBUG > > if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) > > @@ -82,104 +81,59 @@ void *map_domain_page(mfn_t mfn) > > #endif > > > > v = mapcache_current_vcpu(); > > - if ( !v || !is_pv_vcpu(v) ) > > + if ( !v || !is_pv_vcpu(v) || !v->arch.pv.mapcache ) > > return mfn_to_virt(mfn_x(mfn)); > > > > - dcache = &v->domain->arch.pv.mapcache; > > - vcache = &v->arch.pv.mapcache; > > - if ( !dcache->inuse ) > > - return mfn_to_virt(mfn_x(mfn)); > > + vcache = v->arch.pv.mapcache; > > + phashmfn = &vcache->hash_mfn[MAPHASH_HASHFN(mfn_x(mfn))]; > > + phashidx = &vcache->hash_idx[MAPHASH_HASHFN(mfn_x(mfn))]; > > > > perfc_incr(map_domain_page_count); > > > > local_irq_save(flags); > > > > - hashent = &vcache->hash[MAPHASH_HASHFN(mfn_x(mfn))]; > > - if ( hashent->mfn == mfn_x(mfn) ) > > + ohashidx = *phashidx; > > This is a bit redundant because you never rewrite *phashidx until the > last minute. I think ohashidx can be removed, but it's up to you. I actually just want to tell the compiler that ohashidx does not change and it does not have to re-read phashidx every time, in case it attempts to do so. Functionally speaking, I agree that removing it does not change anything. > > > + if ( *phashmfn != mfn_x(mfn) ) > > { > > - idx = hashent->idx; > > - ASSERT(idx < dcache->entries); > > - hashent->refcnt++; > > - ASSERT(hashent->refcnt); > > - ASSERT(mfn_eq(l1e_get_mfn(MAPCACHE_L1ENT(idx)), mfn)); > > - goto out; > > - } > > + idx = find_first_zero_bit(vcache->inuse, > > MAPCACHE_VCPU_ENTRIES); > > + BUG_ON(idx >= MAPCACHE_VCPU_ENTRIES); > > > > - spin_lock(&dcache->lock); > > + ASSERT(vcache->refcnt[idx] == 0); > > + __set_bit(idx, vcache->inuse); > > > > - /* Has some other CPU caused a wrap? We must flush if so. */ > > [...] > > void unmap_domain_page(const void *ptr) > > { > > - unsigned int idx; > > + unsigned int idx, glb_idx; > > struct vcpu *v; > > - struct mapcache_domain *dcache; > > - unsigned long va = (unsigned long)ptr, mfn, flags; > > - struct vcpu_maphash_entry *hashent; > > + struct mapcache_vcpu *vcache; > > + unsigned long va = (unsigned long)ptr, mfn, hashmfn, flags; > > > > if ( va >= DIRECTMAP_VIRT_START ) > > return; > > @@ -189,73 +143,21 @@ void unmap_domain_page(const void *ptr) > > v = mapcache_current_vcpu(); > > ASSERT(v && is_pv_vcpu(v)); > > > > - dcache = &v->domain->arch.pv.mapcache; > > - ASSERT(dcache->inuse); > > - > > - idx = PFN_DOWN(va - MAPCACHE_VIRT_START); > > - mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx)); > > - hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)]; > > + vcache = v->arch.pv.mapcache; > > + ASSERT(vcache); > > > > + glb_idx = PFN_DOWN(va - MAPCACHE_VIRT_START); > > + idx = glb_idx - v->vcpu_id * MAPCACHE_VCPU_ENTRIES; > > Add a blank line here please. > > > ... 
> > - > > -int mapcache_domain_init(struct domain *d) > > -{ > > - struct mapcache_domain *dcache = &d->arch.pv.mapcache; > > - unsigned int bitmap_pages; > > + mfn = l1e_get_pfn(MAPCACHE_L1ENT(glb_idx)); > > + hashmfn = vcache->hash_mfn[MAPHASH_HASHFN(mfn)]; > > > > - ASSERT(is_pv_domain(d)); > > + vcache->refcnt[idx]--; > > + if ( hashmfn != mfn && !vcache->refcnt[idx] ) > > + __clear_bit(idx, vcache->inuse); > > > > -#ifdef NDEBUG > > - if ( !mem_hotplug && max_page <= > > PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) > > - return 0; > > -#endif > > Would be great if some form or shape of this check can be added to > mapcache_vcpu_init such that you don't unnecessarily initialise data > structures that will never be used. Good idea. > > ... > > diff --git a/xen/include/asm-x86/config.h b/xen/include/asm- > > x86/config.h > > index d0cfbb70a8..4b2217151b 100644 > > --- a/xen/include/asm-x86/config.h > > +++ b/xen/include/asm-x86/config.h > > @@ -296,7 +296,7 @@ extern unsigned long xen_phys_start; > > (GDT_VIRT_START(v) + (64*1024)) > > > > /* map_domain_page() map cache. The second per-domain-mapping sub- > > area. */ > > -#define MAPCACHE_VCPU_ENTRIES (CONFIG_PAGING_LEVELS * > > CONFIG_PAGING_LEVELS) > > +#define MAPCACHE_VCPU_ENTRIES 32 > > #define MAPCACHE_ENTRIES (MAX_VIRT_CPUS * > > MAPCACHE_VCPU_ENTRIES) > > #define MAPCACHE_VIRT_START PERDOMAIN_VIRT_SLOT(1) > > #define MAPCACHE_VIRT_END (MAPCACHE_VIRT_START + \ > > diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm- > > x86/domain.h > > index a3ae5d9a20..367bba7110 100644 > > --- a/xen/include/asm-x86/domain.h > > +++ b/xen/include/asm-x86/domain.h > > @@ -40,35 +40,17 @@ struct trap_bounce { > > #define MAPHASH_HASHFN(pfn) ((pfn) & (MAPHASH_ENTRIES-1)) > > #define MAPHASHENT_NOTINUSE ((u32)~0U) > > struct mapcache_vcpu { > > - /* Shadow of mapcache_domain.epoch. */ > > - unsigned int shadow_epoch; > > - > > - /* Lock-free per-VCPU hash of recently-used mappings. */ > > - struct vcpu_maphash_entry { > > - unsigned long mfn; > > - uint32_t idx; > > - uint32_t refcnt; > > - } hash[MAPHASH_ENTRIES]; > > + unsigned long hash_mfn[MAPHASH_ENTRIES]; > > + uint32_t hash_idx[MAPHASH_ENTRIES]; > > + > > + uint8_t refcnt[MAPCACHE_VCPU_ENTRIES]; > > What's the reason for breaking vcpu_maphash_entry into three distinct > arrays? > > Would it make the code shorter/cleaner if you make everything > MAPCACHE_VCPU_ENTRIES? A vcpu now only needs to manage its own > subregion > after all. In this design I cannot. MAPCACHE_VCPU_ENTRIES tells how many slots a vcpu can use, and MAPHASH_ENTRIES tells how many entries there are in the per-vcpu hash table (the true cache). Maphash will not immediately clear the inuse bit when the refcnt drops to 0, so the worst case is that even if you unmap everything, you still only have MAPCACHE_VCPU_ENTRIES - MAPHASH_ENTRIES free slots, so MAPCACHE_VCPU_ENTRIES has to be larger than MAPHASH_ENTRIES. Also I refcnt all slots, not just entries in the maphash. hash_mfn and hash_idx should be able to combine into a single structure, but I don't like the 25% wasted memory padding when you have unsigned long followed by an int in a struct... > > + unsigned long inuse[BITS_TO_LONGS(MAPCACHE_VCPU_ENTRIES)]; > > }; > > > > struct mapcache_domain { > > - /* The number of array entries, and a cursor into the array. > > */ > > unsigned int entries; > > - unsigned int cursor; > > - > > - /* Protects map_domain_page(). 
*/ > > - spinlock_t lock; > > - > > - /* Garbage mappings are flushed from TLBs in batches called > > 'epochs'. */ > > - unsigned int epoch; > > - u32 tlbflush_timestamp; > > - > > - /* Which mappings are in use, and which are garbage to reap > > next epoch? */ > > - unsigned long *inuse; > > - unsigned long *garbage; > > }; > > > > -int mapcache_domain_init(struct domain *); > > int mapcache_vcpu_init(struct vcpu *); > > void mapcache_override_current(struct vcpu *); > > > > @@ -473,7 +455,7 @@ struct arch_domain > > struct pv_vcpu > > { > > /* map_domain_page() mapping cache. */ > > - struct mapcache_vcpu mapcache; > > + struct mapcache_vcpu *mapcache; > > And why do you need to change this to a pointer? Is it now very large > and causing issue(s)? Yes. It is now pushing struct vcpu close to the size of a page, which we cannot exceed. Also, changing it into a pointer allows us to experiment with larger caches later. Hongyan
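The padding argument above, in concrete terms (illustrative only; assumes x86_64 alignment rules and MAPHASH_ENTRIES == 8, and the struct name is made up for the comparison):

    /* Array-of-structs: 8 + 4 bytes of payload padded to 16 bytes per entry. */
    struct maphash_entry_aos {
        unsigned long mfn;   /* 8 bytes */
        uint32_t idx;        /* 4 bytes + 4 bytes of tail padding */
    };                       /* sizeof == 16, i.e. 25% padding */

    /* Split arrays, as in this patch: same information, no padding. */
    unsigned long hash_mfn[8];   /* 8 * 8 = 64 bytes */
    uint32_t      hash_idx[8];   /* 8 * 4 = 32 bytes; 96 bytes total vs 128 bytes */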
On Tue, 2020-02-04 at 14:17 +0000, Andrew Cooper wrote: > On 03/02/2020 18:36, Hongyan Xia wrote: > > Rewrite the mapcache to be purely per-vCPU instead of partly per- > > vCPU > > and partly per-domain. > > > > This patch is needed to address performance issues when we start > > relying > > on the mapcache, e.g., when we do not have a direct map. Currently, > > the > > per-domain lock on the mapcache is a bottleneck for multicore, > > causing > > performance degradation and even functional regressions. > > Do you mean that this patch causes a regression, or that removing the > directmap causes a regression? > > The rest of the commit message is very confusing to follow. Once the direct map is gone, using the existing mapcache implementation in map_domain_page causes these problems. Even if the direct map is still there, currently some guests on debug build rely on the mapcache, which will see similar problems when the vCPU count is high. I can reword the commit message to make it clearer. > > This patch > > makes the mapping structure per-vCPU and completely lockless. > > > > Functional regression: > > > > When a domain is run on more than 64 cores, FreeBSD 10 panicks > > frequently > > due to occasional simultaneous set_singleshot_timer hypercalls from > > too > > many cores. Some cores will be blocked waiting on map_domain_page, > > eventually failing to set a timer in the future. FreeBSD cannot > > handle > > this and panicks. This was fixed in later FreeBSD releases by > > handling > > -ETIME, but still the degradation in timer performance is a big > > issue. > > > > Performance regression: > > > > Many benchmarks see a performance drop when having a large core > > count. > > I have done a Geekbench on a 32-vCPU guest. > > > > perf drop old new > > single 0.04% 0.18% > > multi 2.43% 0.08% > > > > Removing the per-domain lock in the mapcache brings the multi-core > > performance almost identical to using the direct map for mappings. > > > > There should be room for futher optimisations, but this already > > improves over the old mapcache by a lot. > > > > Note that entries in the maphash will occupy inuse slots. With 16 > > slots > > per vCPU and a maphash capacity of 8, we only have another 8 > > available, > > which is not enough for nested page table walks. We need to > > increase the > > number of slots in config.h. > > I'm afraid that I don't follow what you're trying to say here. The > number of slots should either be fine, or we've got a pre-exiting > bug. The mapcache design is now different. The slots now have to include spaces for the maphash, which was not the case before. Hongyan
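Spelling out the slot accounting behind the config.h change (numbers assume 4-level paging and a maphash capacity of 8, as stated in the commit message; the BUILD_BUG_ON is an illustrative statement of the invariant, written so that it trips when the free slots are insufficient):

    /*
     * Old:  MAPCACHE_VCPU_ENTRIES = CONFIG_PAGING_LEVELS^2   = 16
     *       worst-case free slots = 16 - MAPHASH_ENTRIES (8) =  8
     *       a nested page table walk may need up to 4 * 4    = 16  -> too few
     * New:  MAPCACHE_VCPU_ENTRIES                            = 32
     *       worst-case free slots = 32 - 8                   = 24  -> enough
     */
    BUILD_BUG_ON(MAPCACHE_VCPU_ENTRIES - MAPHASH_ENTRIES <
                 CONFIG_PAGING_LEVELS * CONFIG_PAGING_LEVELS);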
On Tue, 2020-02-04 at 12:08 +0000, Wei Liu wrote: > Thanks, I welcome effort to make Xen more scalable. :-) > > On Mon, Feb 03, 2020 at 06:36:53PM +0000, Hongyan Xia wrote: > > Rewrite the mapcache to be purely per-vCPU instead of partly per- > > vCPU > > and partly per-domain. > > > > This patch is needed to address performance issues when we start > > relying > > on the mapcache, e.g., when we do not have a direct map. Currently, > > the > > per-domain lock on the mapcache is a bottleneck for multicore, > > causing > > performance degradation and even functional regressions. This patch > > makes the mapping structure per-vCPU and completely lockless. > > > > When I see "regression", I think of something that was working before > but not anymore. I don't think that's the case for the following two > things. > > I would just classify them as bug and/or improvement. We probably have different definitions for "regression"... but I can reword. > > Functional regression: > > > > When a domain is run on more than 64 cores, FreeBSD 10 panicks > > frequently > > due to occasional simultaneous set_singleshot_timer hypercalls from > > too > > many cores. Some cores will be blocked waiting on map_domain_page, > > eventually failing to set a timer in the future. FreeBSD cannot > > handle > > this and panicks. This was fixed in later FreeBSD releases by > > handling > > -ETIME, but still the degradation in timer performance is a big > > issue. > > > > Performance regression: > > > > Many benchmarks see a performance drop when having a large core > > count. > > I have done a Geekbench on a 32-vCPU guest. > > > > perf drop old new > > single 0.04% 0.18% > > multi 2.43% 0.08% > > > > Removing the per-domain lock in the mapcache brings the multi-core > > performance almost identical to using the direct map for mappings. > > > > There should be room for futher optimisations, but this already > > improves over the old mapcache by a lot. > > > > Note that entries in the maphash will occupy inuse slots. With 16 > > slots > > per vCPU and a maphash capacity of 8, we only have another 8 > > available, > > which is not enough for nested page table walks. We need to > > increase the > > number of slots in config.h. > > > > Signed-off-by: Hongyan Xia <hongyxia@amazon.com> > > > As far as I can tell all vcpus still share the same per-domain > mapcache > region. The difference is now the region is divided into subregion > for > each vcpu to use. You are right. We have a per-domain VA range and we have create_perdomain_mappings which can be reused nicely. The patch divides the regions into vCPUs. > > > --- > > xen/arch/x86/domain.c | 5 +- > > xen/arch/x86/domain_page.c | 229 +++++++++++------------------ > > ------ > > xen/include/asm-x86/config.h | 2 +- > > xen/include/asm-x86/domain.h | 30 +---- > > 4 files changed, 80 insertions(+), 186 deletions(-) > > > > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c > > index f53ae5ff86..a278aa4678 100644 > > --- a/xen/arch/x86/domain.c > > +++ b/xen/arch/x86/domain.c > > @@ -445,6 +445,9 @@ void arch_vcpu_destroy(struct vcpu *v) > > xfree(v->arch.msrs); > > v->arch.msrs = NULL; > > > > + xfree(v->arch.pv.mapcache); > > + v->arch.pv.mapcache = NULL; > > + > > Keep in mind that accessing the union {pv, hvm} before knowing the > exact variant is dangerous. > > Because mapcache is initialised for PV only, so it should be freed > for > PV only. I think you need to put it to pv_vcpu_destroy. 
> > If you don't want to do that because of asymmetry with > mapcache_vcpu_init, you may want to invent mapcache_vcpu_destroy for > your purpose. Ah right this is a problem. I was working on a tree where everyone has a mapcache, which is not true for current upstream when I cherry- picked. Will fix. Hongyan
On Tue, Feb 04, 2020 at 04:02:38PM +0000, Xia, Hongyan wrote: [...] > > Keep in mind that accessing the union {pv, hvm} before knowing the > > exact variant is dangerous. > > > > Because mapcache is initialised for PV only, so it should be freed > > for > > PV only. I think you need to put it to pv_vcpu_destroy. > > > > If you don't want to do that because of asymmetry with > > mapcache_vcpu_init, you may want to invent mapcache_vcpu_destroy for > > your purpose. > > Ah right this is a problem. I was working on a tree where everyone has > a mapcache, which is not true for current upstream when I cherry- > picked. Will fix. > If the future direction is for both hvm and pv to have a mapcache (?) then I would certainly recommend inventing a mapcache_vcpu_destroy. Wei. > Hongyan
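A minimal sketch of the mapcache_vcpu_destroy() being suggested here (hypothetical helper; the name, guard and call site are assumptions drawn from this subthread, not from the posted patch):

    /* Hypothetical helper, symmetric with mapcache_vcpu_init(). */
    void mapcache_vcpu_destroy(struct vcpu *v)
    {
        if ( !is_pv_vcpu(v) )
            return;

        /* Only touch the pv side of the union once we know the vCPU is PV. */
        xfree(v->arch.pv.mapcache);
        v->arch.pv.mapcache = NULL;
    }

    /* ... called from arch_vcpu_destroy() (or pv_vcpu_destroy()) in place of the bare xfree(). */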
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index f53ae5ff86..a278aa4678 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -445,6 +445,9 @@ void arch_vcpu_destroy(struct vcpu *v) xfree(v->arch.msrs); v->arch.msrs = NULL; + xfree(v->arch.pv.mapcache); + v->arch.pv.mapcache = NULL; + if ( is_hvm_vcpu(v) ) hvm_vcpu_destroy(v); else @@ -633,8 +636,6 @@ int arch_domain_create(struct domain *d, } else if ( is_pv_domain(d) ) { - mapcache_domain_init(d); - if ( (rc = pv_domain_initialise(d)) != 0 ) goto fail; } diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c index dd32712d2f..52971d2ecc 100644 --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c @@ -69,12 +69,11 @@ void __init mapcache_override_current(struct vcpu *v) void *map_domain_page(mfn_t mfn) { - unsigned long flags; - unsigned int idx, i; + unsigned long flags, *phashmfn; + unsigned int idx, glb_idx, *phashidx, ohashidx; struct vcpu *v; - struct mapcache_domain *dcache; struct mapcache_vcpu *vcache; - struct vcpu_maphash_entry *hashent; + void *ret; #ifdef NDEBUG if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) @@ -82,104 +81,59 @@ void *map_domain_page(mfn_t mfn) #endif v = mapcache_current_vcpu(); - if ( !v || !is_pv_vcpu(v) ) + if ( !v || !is_pv_vcpu(v) || !v->arch.pv.mapcache ) return mfn_to_virt(mfn_x(mfn)); - dcache = &v->domain->arch.pv.mapcache; - vcache = &v->arch.pv.mapcache; - if ( !dcache->inuse ) - return mfn_to_virt(mfn_x(mfn)); + vcache = v->arch.pv.mapcache; + phashmfn = &vcache->hash_mfn[MAPHASH_HASHFN(mfn_x(mfn))]; + phashidx = &vcache->hash_idx[MAPHASH_HASHFN(mfn_x(mfn))]; perfc_incr(map_domain_page_count); local_irq_save(flags); - hashent = &vcache->hash[MAPHASH_HASHFN(mfn_x(mfn))]; - if ( hashent->mfn == mfn_x(mfn) ) + ohashidx = *phashidx; + if ( *phashmfn != mfn_x(mfn) ) { - idx = hashent->idx; - ASSERT(idx < dcache->entries); - hashent->refcnt++; - ASSERT(hashent->refcnt); - ASSERT(mfn_eq(l1e_get_mfn(MAPCACHE_L1ENT(idx)), mfn)); - goto out; - } + idx = find_first_zero_bit(vcache->inuse, MAPCACHE_VCPU_ENTRIES); + BUG_ON(idx >= MAPCACHE_VCPU_ENTRIES); - spin_lock(&dcache->lock); + ASSERT(vcache->refcnt[idx] == 0); + __set_bit(idx, vcache->inuse); - /* Has some other CPU caused a wrap? We must flush if so. */ - if ( unlikely(dcache->epoch != vcache->shadow_epoch) ) - { - vcache->shadow_epoch = dcache->epoch; - if ( NEED_FLUSH(this_cpu(tlbflush_time), dcache->tlbflush_timestamp) ) - { - perfc_incr(domain_page_tlb_flush); - flush_tlb_local(); - } - } + glb_idx = idx + v->vcpu_id * MAPCACHE_VCPU_ENTRIES; + l1e_write(&MAPCACHE_L1ENT(glb_idx), + l1e_from_mfn(mfn, __PAGE_HYPERVISOR_RW)); + ret = (void *)MAPCACHE_VIRT_START + pfn_to_paddr(glb_idx); + flush_tlb_one_local(ret); + + if ( ohashidx != MAPHASHENT_NOTINUSE && !vcache->refcnt[ohashidx] ) + __clear_bit(ohashidx, vcache->inuse); - idx = find_next_zero_bit(dcache->inuse, dcache->entries, dcache->cursor); - if ( unlikely(idx >= dcache->entries) ) + *phashmfn = mfn_x(mfn); + *phashidx = idx; + } + else { - unsigned long accum = 0, prev = 0; - - /* /First/, clean the garbage map and update the inuse list. */ - for ( i = 0; i < BITS_TO_LONGS(dcache->entries); i++ ) - { - accum |= prev; - dcache->inuse[i] &= ~xchg(&dcache->garbage[i], 0); - prev = ~dcache->inuse[i]; - } - - if ( accum | (prev & BITMAP_LAST_WORD_MASK(dcache->entries)) ) - idx = find_first_zero_bit(dcache->inuse, dcache->entries); - else - { - /* Replace a hash entry instead. 
*/ - i = MAPHASH_HASHFN(mfn_x(mfn)); - do { - hashent = &vcache->hash[i]; - if ( hashent->idx != MAPHASHENT_NOTINUSE && !hashent->refcnt ) - { - idx = hashent->idx; - ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(idx)) == hashent->mfn); - l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty()); - hashent->idx = MAPHASHENT_NOTINUSE; - hashent->mfn = ~0UL; - break; - } - if ( ++i == MAPHASH_ENTRIES ) - i = 0; - } while ( i != MAPHASH_HASHFN(mfn_x(mfn)) ); - } - BUG_ON(idx >= dcache->entries); - - /* /Second/, flush TLBs. */ - perfc_incr(domain_page_tlb_flush); - flush_tlb_local(); - vcache->shadow_epoch = ++dcache->epoch; - dcache->tlbflush_timestamp = tlbflush_current_time(); + idx = ohashidx; + glb_idx = idx + v->vcpu_id * MAPCACHE_VCPU_ENTRIES; + ret = (void *)MAPCACHE_VIRT_START + pfn_to_paddr(glb_idx); } - set_bit(idx, dcache->inuse); - dcache->cursor = idx + 1; - - spin_unlock(&dcache->lock); - - l1e_write(&MAPCACHE_L1ENT(idx), l1e_from_mfn(mfn, __PAGE_HYPERVISOR_RW)); + vcache->refcnt[idx]++; + ASSERT(vcache->refcnt[idx]); + ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(glb_idx)) == mfn_x(mfn)); - out: local_irq_restore(flags); - return (void *)MAPCACHE_VIRT_START + pfn_to_paddr(idx); + return ret; } void unmap_domain_page(const void *ptr) { - unsigned int idx; + unsigned int idx, glb_idx; struct vcpu *v; - struct mapcache_domain *dcache; - unsigned long va = (unsigned long)ptr, mfn, flags; - struct vcpu_maphash_entry *hashent; + struct mapcache_vcpu *vcache; + unsigned long va = (unsigned long)ptr, mfn, hashmfn, flags; if ( va >= DIRECTMAP_VIRT_START ) return; @@ -189,73 +143,21 @@ void unmap_domain_page(const void *ptr) v = mapcache_current_vcpu(); ASSERT(v && is_pv_vcpu(v)); - dcache = &v->domain->arch.pv.mapcache; - ASSERT(dcache->inuse); - - idx = PFN_DOWN(va - MAPCACHE_VIRT_START); - mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx)); - hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)]; + vcache = v->arch.pv.mapcache; + ASSERT(vcache); + glb_idx = PFN_DOWN(va - MAPCACHE_VIRT_START); + idx = glb_idx - v->vcpu_id * MAPCACHE_VCPU_ENTRIES; local_irq_save(flags); - if ( hashent->idx == idx ) - { - ASSERT(hashent->mfn == mfn); - ASSERT(hashent->refcnt); - hashent->refcnt--; - } - else if ( !hashent->refcnt ) - { - if ( hashent->idx != MAPHASHENT_NOTINUSE ) - { - /* /First/, zap the PTE. */ - ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(hashent->idx)) == - hashent->mfn); - l1e_write(&MAPCACHE_L1ENT(hashent->idx), l1e_empty()); - /* /Second/, mark as garbage. */ - set_bit(hashent->idx, dcache->garbage); - } - - /* Add newly-freed mapping to the maphash. */ - hashent->mfn = mfn; - hashent->idx = idx; - } - else - { - /* /First/, zap the PTE. */ - l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty()); - /* /Second/, mark as garbage. 
*/ - set_bit(idx, dcache->garbage); - } - - local_irq_restore(flags); -} - -int mapcache_domain_init(struct domain *d) -{ - struct mapcache_domain *dcache = &d->arch.pv.mapcache; - unsigned int bitmap_pages; + mfn = l1e_get_pfn(MAPCACHE_L1ENT(glb_idx)); + hashmfn = vcache->hash_mfn[MAPHASH_HASHFN(mfn)]; - ASSERT(is_pv_domain(d)); + vcache->refcnt[idx]--; + if ( hashmfn != mfn && !vcache->refcnt[idx] ) + __clear_bit(idx, vcache->inuse); -#ifdef NDEBUG - if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) - return 0; -#endif - - BUILD_BUG_ON(MAPCACHE_VIRT_END + PAGE_SIZE * (3 + - 2 * PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long))) > - MAPCACHE_VIRT_START + (PERDOMAIN_SLOT_MBYTES << 20)); - bitmap_pages = PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long)); - dcache->inuse = (void *)MAPCACHE_VIRT_END + PAGE_SIZE; - dcache->garbage = dcache->inuse + - (bitmap_pages + 1) * PAGE_SIZE / sizeof(long); - - spin_lock_init(&dcache->lock); - - return create_perdomain_mapping(d, (unsigned long)dcache->inuse, - 2 * bitmap_pages + 1, - NIL(l1_pgentry_t *), NULL); + local_irq_restore(flags); } int mapcache_vcpu_init(struct vcpu *v) @@ -264,39 +166,48 @@ int mapcache_vcpu_init(struct vcpu *v) struct mapcache_domain *dcache = &d->arch.pv.mapcache; unsigned long i; unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES; - unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long)); - if ( !is_pv_vcpu(v) || !dcache->inuse ) + if ( !is_pv_vcpu(v) ) return 0; + BUILD_BUG_ON(MAPCACHE_VIRT_END > ARG_XLAT_VIRT_START); + if ( ents > dcache->entries ) { + int rc; + + ASSERT(ents * PAGE_SIZE <= (PERDOMAIN_SLOT_MBYTES << 20)); + /* Populate page tables. */ - int rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents, + rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents, NIL(l1_pgentry_t *), NULL); - /* Populate bit maps. */ - if ( !rc ) - rc = create_perdomain_mapping(d, (unsigned long)dcache->inuse, - nr, NULL, NIL(struct page_info *)); - if ( !rc ) - rc = create_perdomain_mapping(d, (unsigned long)dcache->garbage, - nr, NULL, NIL(struct page_info *)); - if ( rc ) return rc; dcache->entries = ents; } - /* Mark all maphash entries as not in use. */ BUILD_BUG_ON(MAPHASHENT_NOTINUSE < MAPCACHE_ENTRIES); + /* MAPHASH_ENTRIES has to be power-of-two to make hashing work. */ + BUILD_BUG_ON(MAPHASH_ENTRIES & (MAPHASH_ENTRIES - 1)); + /* + * Since entries in the maphash also occupy inuse slots, we have to make + * sure MAPCACHE_VCPU_ENTRIES is large enough to accommodate both the + * maphash and a nested page table walk. + */ + BUILD_BUG_ON(MAPCACHE_VCPU_ENTRIES - MAPHASH_ENTRIES >= + CONFIG_PAGING_LEVELS * CONFIG_PAGING_LEVELS); + + v->arch.pv.mapcache = xzalloc(struct mapcache_vcpu); + if ( !v->arch.pv.mapcache ) + return -ENOMEM; + + /* Mark all maphash entries as not in use. */ for ( i = 0; i < MAPHASH_ENTRIES; i++ ) { - struct vcpu_maphash_entry *hashent = &v->arch.pv.mapcache.hash[i]; - - hashent->mfn = ~0UL; /* never valid to map */ - hashent->idx = MAPHASHENT_NOTINUSE; + v->arch.pv.mapcache->hash_mfn[i] = ~0UL; + v->arch.pv.mapcache->hash_idx[i] = MAPHASHENT_NOTINUSE; } return 0; diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h index d0cfbb70a8..4b2217151b 100644 --- a/xen/include/asm-x86/config.h +++ b/xen/include/asm-x86/config.h @@ -296,7 +296,7 @@ extern unsigned long xen_phys_start; (GDT_VIRT_START(v) + (64*1024)) /* map_domain_page() map cache. The second per-domain-mapping sub-area. 
*/ -#define MAPCACHE_VCPU_ENTRIES (CONFIG_PAGING_LEVELS * CONFIG_PAGING_LEVELS) +#define MAPCACHE_VCPU_ENTRIES 32 #define MAPCACHE_ENTRIES (MAX_VIRT_CPUS * MAPCACHE_VCPU_ENTRIES) #define MAPCACHE_VIRT_START PERDOMAIN_VIRT_SLOT(1) #define MAPCACHE_VIRT_END (MAPCACHE_VIRT_START + \ diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index a3ae5d9a20..367bba7110 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -40,35 +40,17 @@ struct trap_bounce { #define MAPHASH_HASHFN(pfn) ((pfn) & (MAPHASH_ENTRIES-1)) #define MAPHASHENT_NOTINUSE ((u32)~0U) struct mapcache_vcpu { - /* Shadow of mapcache_domain.epoch. */ - unsigned int shadow_epoch; - - /* Lock-free per-VCPU hash of recently-used mappings. */ - struct vcpu_maphash_entry { - unsigned long mfn; - uint32_t idx; - uint32_t refcnt; - } hash[MAPHASH_ENTRIES]; + unsigned long hash_mfn[MAPHASH_ENTRIES]; + uint32_t hash_idx[MAPHASH_ENTRIES]; + + uint8_t refcnt[MAPCACHE_VCPU_ENTRIES]; + unsigned long inuse[BITS_TO_LONGS(MAPCACHE_VCPU_ENTRIES)]; }; struct mapcache_domain { - /* The number of array entries, and a cursor into the array. */ unsigned int entries; - unsigned int cursor; - - /* Protects map_domain_page(). */ - spinlock_t lock; - - /* Garbage mappings are flushed from TLBs in batches called 'epochs'. */ - unsigned int epoch; - u32 tlbflush_timestamp; - - /* Which mappings are in use, and which are garbage to reap next epoch? */ - unsigned long *inuse; - unsigned long *garbage; }; -int mapcache_domain_init(struct domain *); int mapcache_vcpu_init(struct vcpu *); void mapcache_override_current(struct vcpu *); @@ -473,7 +455,7 @@ struct arch_domain struct pv_vcpu { /* map_domain_page() mapping cache. */ - struct mapcache_vcpu mapcache; + struct mapcache_vcpu *mapcache; unsigned int vgc_flags;
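Two small notes for readers following the diff above: MAPHASH_HASHFN() is a plain power-of-two mask, hence the new BUILD_BUG_ON on MAPHASH_ENTRIES, and the external contract of the API is untouched, since the rework is entirely internal to map_domain_page()/unmap_domain_page(). Both points are illustrated below (the MFN value is arbitrary).

    /* With MAPHASH_ENTRIES == 8, the hash is just the low bits of the MFN: */
    /*   MAPHASH_HASHFN(0x12345) == (0x12345 & 7) == 5 */

    /* Caller pattern, unchanged by this series: */
    void *p = map_domain_page(mfn);   /* takes a refcnt on a per-vCPU slot    */
    /* ... access the frame through p ... */
    unmap_domain_page(p);             /* drops the refcnt; slot may be reused */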
Rewrite the mapcache to be purely per-vCPU instead of partly per-vCPU and partly per-domain.

This patch is needed to address performance issues when we start relying on the mapcache, e.g., when we do not have a direct map. Currently, the per-domain lock on the mapcache is a bottleneck for multicore, causing performance degradation and even functional regressions. This patch makes the mapping structure per-vCPU and completely lockless.

Functional regression:

When a domain is run on more than 64 cores, FreeBSD 10 panics frequently due to occasional simultaneous set_singleshot_timer hypercalls from too many cores. Some cores will be blocked waiting on map_domain_page, eventually failing to set a timer in the future. FreeBSD cannot handle this and panics. This was fixed in later FreeBSD releases by handling -ETIME, but still the degradation in timer performance is a big issue.

Performance regression:

Many benchmarks see a performance drop with a large core count. I have done a Geekbench run on a 32-vCPU guest.

    perf drop    old      new
    single       0.04%    0.18%
    multi        2.43%    0.08%

Removing the per-domain lock in the mapcache brings multi-core performance almost to parity with using the direct map for mappings.

There should be room for further optimisations, but this already improves over the old mapcache by a lot.

Note that entries in the maphash will occupy inuse slots. With 16 slots per vCPU and a maphash capacity of 8, we only have another 8 available, which is not enough for nested page table walks. We need to increase the number of slots in config.h.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
 xen/arch/x86/domain.c        |   5 +-
 xen/arch/x86/domain_page.c   | 229 +++++++++++------------------------
 xen/include/asm-x86/config.h |   2 +-
 xen/include/asm-x86/domain.h |  30 +----
 4 files changed, 80 insertions(+), 186 deletions(-)
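To make the "completely lockless" claim concrete, this is roughly how the slot-allocation path compares before and after (condensed from the diff above, not a literal quote of either version):

    /* Old: every cache miss serialises all vCPUs on the per-domain lock. */
    spin_lock(&dcache->lock);
    idx = find_next_zero_bit(dcache->inuse, dcache->entries, dcache->cursor);
    /* ... possible epoch handling and global TLB flush ... */
    spin_unlock(&dcache->lock);

    /* New: only local interrupts are disabled; the bitmap is private to the vCPU. */
    local_irq_save(flags);
    idx = find_first_zero_bit(vcache->inuse, MAPCACHE_VCPU_ENTRIES);
    __set_bit(idx, vcache->inuse);
    local_irq_restore(flags);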