| Message ID | 20190329150934.17694-45-jgross@suse.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | xen: add core scheduling support |
On 29/03/2019 15:09, Juergen Gross wrote:
> Make sure the number of vcpus is always a multiple of the scheduling
> granularity. Note that we don't support a scheduling granularity above
> one on ARM.

I'm afraid that I don't think this is a clever move. In turn, this
brings into question the approach to idle handling.

Firstly, with a proposed socket granularity, this would be 128 on some
systems which exist today. Furthermore, consider the case where
cpupool0 has a granularity of 1, and a second pool has a granularity of
2. A domain can be created with an odd number of vcpus and operate in
pool0 fine, but can't now be moved to pool1.

If at all possible, I think it would be better to try and reuse the idle
cpus for holes like this. Seeing as you've been playing with this code
a lot, what is your assessment?

~Andrew
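The numbers in this objection follow directly from rounding a domain's vcpu count up to the pool's scheduling granularity. A minimal stand-alone illustration of that arithmetic is below; the helper name round_up_vcpus and the sample granularities are only examples for this sketch, not Xen code:

```c
#include <stdio.h>

/* Hypothetical user-space stand-in for the rounding the patch applies:
 * round a requested vcpu count up to a multiple of the pool's
 * scheduling granularity. */
static unsigned int round_up_vcpus(unsigned int n_vcpus, unsigned int gran)
{
    return ((n_vcpus + gran - 1) / gran) * gran;
}

int main(void)
{
    /* A 1-vcpu domain under a socket-wide granularity of 128. */
    printf("%u\n", round_up_vcpus(1, 128));   /* 128 */

    /* A 3-vcpu domain fits a pool with granularity 1 as-is ... */
    printf("%u\n", round_up_vcpus(3, 1));     /* 3 */
    /* ... but would need a fourth (dummy) vcpu under granularity 2. */
    printf("%u\n", round_up_vcpus(3, 2));     /* 4 */

    return 0;
}
```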
On 01/04/2019 10:50, Andrew Cooper wrote:
> On 29/03/2019 15:09, Juergen Gross wrote:
>> Make sure the number of vcpus is always a multiple of the scheduling
>> granularity. Note that we don't support a scheduling granularity above
>> one on ARM.
>
> I'm afraid that I don't think this is a clever move. In turn, this
> brings into question the approach to idle handling.
>
> Firstly, with a proposed socket granularity, this would be 128 on some
> systems which exist today. Furthermore, consider the case where
> cpupool0 has a granularity of 1, and a second pool has a granularity of
> 2. A domain can be created with an odd number of vcpus and operate in
> pool0 fine, but can't now be moved to pool1.

For now granularity is the same for all pools, but I plan to enhance
that in future.

The answer to that problem might be either to allow for later addition
of dummy vcpus (e.g. by sizing only the vcpu pointer array to the needed
number), or to really disallow moving such a domain between pools.

> If at all possible, I think it would be better to try and reuse the idle
> cpus for holes like this. Seeing as you've been playing with this code
> a lot, what is your assessment?

This would be rather complicated. I'd either need to switch vcpus
dynamically in schedule items, or I'd need to special case the idle
vcpus in _lots_ of places.


Juergen
On 01/04/2019 11:47, Juergen Gross wrote:
> On 01/04/2019 10:50, Andrew Cooper wrote:
>> On 29/03/2019 15:09, Juergen Gross wrote:
>>> Make sure the number of vcpus is always a multiple of the scheduling
>>> granularity. Note that we don't support a scheduling granularity above
>>> one on ARM.
>>
>> I'm afraid that I don't think this is a clever move. In turn, this
>> brings into question the approach to idle handling.
>>
>> Firstly, with a proposed socket granularity, this would be 128 on some
>> systems which exist today. Furthermore, consider the case where
>> cpupool0 has a granularity of 1, and a second pool has a granularity of
>> 2. A domain can be created with an odd number of vcpus and operate in
>> pool0 fine, but can't now be moved to pool1.
>
> For now granularity is the same for all pools, but I plan to enhance
> that in future.
>
> The answer to that problem might be either to allow for later addition
> of dummy vcpus (e.g. by sizing only the vcpu pointer array to the needed
> number), or to really disallow moving such a domain between pools.
>
>> If at all possible, I think it would be better to try and reuse the idle
>> cpus for holes like this. Seeing as you've been playing with this code
>> a lot, what is your assessment?
>
> This would be rather complicated. I'd either need to switch vcpus
> dynamically in schedule items, or I'd need to special case the idle
> vcpus in _lots_ of places.

I have thought more about this and maybe I have found a way to make it
less intrusive than I thought in the beginning. I'll give it a try...


Juergen
```diff
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 77b5646424..76a81dd4a9 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -258,6 +258,7 @@ unsigned int __init dom0_max_vcpus(void)
         max_vcpus = opt_dom0_max_vcpus_min;
     if ( opt_dom0_max_vcpus_max < max_vcpus )
         max_vcpus = opt_dom0_max_vcpus_max;
+    max_vcpus = sched_max_vcpus(max_vcpus);
     limit = dom0_pvh ? HVM_MAX_VCPUS : MAX_VIRT_CPUS;
     if ( max_vcpus > limit )
         max_vcpus = limit;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index b448d20d40..d338a2204c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -290,6 +290,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    config->max_vcpus = sched_max_vcpus(config->max_vcpus);
     if ( config->max_vcpus < 1 )
     {
         dprintk(XENLOG_INFO, "No vCPUS\n");
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index ccde1ba706..80837a2a5e 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -542,6 +542,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     {
         unsigned int i, max = op->u.max_vcpus.max;
 
+        max = sched_max_vcpus(max);
         ret = -EINVAL;
         if ( (d == current->domain) ||    /* no domain_pause() */
              (max != d->max_vcpus) )      /* max_vcpus set up in createdomain */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 52a1abfca9..314a453a60 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -490,6 +490,11 @@ extern struct vcpu *idle_vcpu[NR_CPUS];
 
 extern unsigned int sched_granularity;
 
+static inline unsigned int sched_max_vcpus(unsigned int n_vcpus)
+{
+    return DIV_ROUND_UP(n_vcpus, sched_granularity) * sched_granularity;
+}
+
 static inline bool is_system_domain(const struct domain *d)
 {
     return d->domain_id >= DOMID_FIRST_RESERVED;
```
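A minimal user-space sketch of the new sched_max_vcpus() helper, with DIV_ROUND_UP() and sched_granularity mocked locally (the granularity value of 2 is just an example), shows how requested vcpu counts are rounded up:

```c
#include <stdio.h>

/* Local mock of Xen's DIV_ROUND_UP() macro and of the global
 * scheduling granularity (set at boot in the real series). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
static unsigned int sched_granularity = 2;   /* example value */

static unsigned int sched_max_vcpus(unsigned int n_vcpus)
{
    return DIV_ROUND_UP(n_vcpus, sched_granularity) * sched_granularity;
}

int main(void)
{
    /* With a granularity of 2, odd requests are rounded up:
     * 1 -> 2, 2 -> 2, 3 -> 4, 4 -> 4, 5 -> 6. */
    for ( unsigned int v = 1; v <= 5; v++ )
        printf("sched_max_vcpus(%u) = %u\n", v, sched_max_vcpus(v));

    return 0;
}
```

In the patch itself the same computation is applied in dom0_max_vcpus(), sanitise_domain_config() and the XEN_DOMCTL_max_vcpus handler, so every path that establishes a domain's vcpu count sees the rounded-up value.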
Make sure the number of vcpus is always a multiple of the scheduling
granularity. Note that we don't support a scheduling granularity above
one on ARM.

Signed-off-by: Juergen Gross <jgross@suse.com>

```
---
 xen/arch/x86/dom0_build.c | 1 +
 xen/common/domain.c       | 1 +
 xen/common/domctl.c       | 1 +
 xen/include/xen/sched.h   | 5 +++++
 4 files changed, 8 insertions(+)
```