Message ID | 1473373615-51427-6-git-send-email-srinivas.pandruvada@linux.intel.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:

> +
> +DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
> +static DEFINE_MUTEX(itmt_update_mutex);
> +
> +static unsigned int zero;
> +static unsigned int one = 1;

Please stick that close to the ctl_table so it gets obvious why we need
this. My first reaction to this was: WTF!

> +/*
> + * Boolean to control whether we want to move processes to cpu capable
> + * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
> + * Technology 3.0.
> + *
> + * It can be set via /proc/sys/kernel/sched_itmt_enabled
> + */
> +unsigned int __read_mostly sysctl_sched_itmt_enabled;

Please put the sysctl into a separate patch and document the sysctl at the
proper place.

> +
> +/*
> + * The pstate_driver calls set_sched_itmt to indicate if the system
> + * is ITMT capable.
> + */
> +static bool __read_mostly sched_itmt_capable;
> +
> +int arch_asym_cpu_priority(int cpu)
> +{
> +        return per_cpu(sched_core_priority, cpu);
> +}
> +

The ordering of your functions is completely random and makes no sense
whatsoever.

> +/* Called with itmt_update_mutex lock held */
> +static void enable_sched_itmt(bool enable_itmt)
> +{
> +        sysctl_sched_itmt_enabled = enable_itmt;
> +        x86_topology_update = true;
> +        rebuild_sched_domains();

So you rebuild that even if the state of the sysctl did not change.

> +}
> +
> +static int sched_itmt_update_handler(struct ctl_table *table, int write,
> +                                     void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +        int ret;
> +
> +        mutex_lock(&itmt_update_mutex);
> +
> +        ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +
> +        if (ret || !write) {
> +                mutex_unlock(&itmt_update_mutex);
> +                return ret;
> +        }
> +
> +        enable_sched_itmt(sysctl_sched_itmt_enabled);

So you hand in sysctl_sched_itmt_enabled so that enable_sched_itmt() can
store that value in sysctl_sched_itmt_enabled. Brilliant stuff that.

> +static struct ctl_table_header *itmt_sysctl_header;
> +
> +/*
> + * The boot code will find out the max boost frequency
> + * and call this function to set a priority proportional
> + * to the max boost frequency. CPU with higher boost
> + * frequency will receive higher priority.
> + */
> +void sched_set_itmt_core_prio(int prio, int core_cpu)
> +{
> +        int cpu, i = 1;
> +
> +        for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
> +                int smt_prio;
> +
> +                /*
> +                 * Discount the priority of sibling so that we don't
> +                 * pack all loads to the same core before using other cores.

-EMAKESNOSENSE

Is this a winter or summer sale where we get a discount? 10% or 50% or what?

The math here is:

        smt_prio = prio * num_siblings / sibling_idx;

The purpose of this is to ensure that the siblings are moved to the end of
the priority chain and therefore only used when all other high priority
cpus are out of capacity.

> +                 */
> +                smt_prio = prio * smp_num_siblings / i;
> +                i++;
> +                per_cpu(sched_core_priority, cpu) = smt_prio;
> +        }
> +}
> +
> +/*
> + * During boot up, boot code will detect if the system
> + * is ITMT capable and call set_sched_itmt.

Groan. In the comment above the declaration of sched_itmt_capable you
write:

> + * The pstate_driver calls set_sched_itmt to indicate if the system
> + * is ITMT capable.

So what is calling this? Boot code or pstate driver or something else?

> + *
> + * This should be called after sched_set_itmt_core_prio
> + * has been called to set the cpus' priorities.

should? So the call order is optional and this is just a recommendation.

> + *
> + * This function should be called without cpu hot plug lock
> + * as we need to acquire the lock to rebuild sched domains
> + * later.

Ditto. Must not be called with cpu hotplug lock held.

> + */
> +void set_sched_itmt(bool itmt_capable)

This function can be called more than once and it can be called to
enable and disable the feature, right?

So again: instead of incoherent comments above the function, describe the
calling conventions and what the function does properly.

> +{
> +        mutex_lock(&itmt_update_mutex);
> +
> +        if (itmt_capable != sched_itmt_capable) {

If you use a goto out_unlock here, then you spare one indentation level and
the annoying line breaks.

> +
> +                if (itmt_capable) {
> +                        itmt_sysctl_header =
> +                                register_sysctl_table(itmt_root_table);

What if this fails? Error handling is overrated ...

> +                        /*
> +                         * ITMT capability automatically enables ITMT
> +                         * scheduling for client systems (single node).

Why? Just because it can and just because something thinks that it's a good
idea? There are server systems with single node as well....

> +                         */
> +                        if (topology_num_packages() == 1)
> +                                sysctl_sched_itmt_enabled = 1;
> +                } else {
> +                        if (itmt_sysctl_header)
> +                                unregister_sysctl_table(itmt_sysctl_header);

And what clears itmt_sysctl_header?

> +                        sysctl_sched_itmt_enabled = 0;
> +                }
> +
> +                sched_itmt_capable = itmt_capable;
> +                x86_topology_update = true;
> +                rebuild_sched_domains();

So another place where you call that unconditionally. If the sysctl is
disabled then this is completely pointless.

What about a single function which is called from the sysctl write and from
this code, which does the update and rebuild only if something has changed?

> +        }
> +
> +        mutex_unlock(&itmt_update_mutex);
> +}
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 737b9edf..17f3ac7 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -109,7 +109,6 @@ static bool logical_packages_frozen __read_mostly;
>  /* Maximum number of SMT threads on any online core */
>  int __max_smt_threads __read_mostly;
>
> -unsigned int __read_mostly sysctl_sched_itmt_enabled;

So if you wouldn't have added this in the first place then you would not
have to remove it.

Thanks,

        tglx
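For readers following along, the priority "discount" tglx spells out above can be checked with a small stand-alone program. The per-core priorities and the sibling count below are made-up example values, not numbers taken from the patch or from real hardware:

```c
/*
 * Stand-alone illustration (not kernel code) of the sibling priority
 * discount discussed above: smt_prio = prio * num_siblings / sibling_idx.
 * The core priorities and SMT width are hypothetical example values.
 */
#include <stdio.h>

int main(void)
{
        int core_prio[] = { 39, 37, 35 };   /* hypothetical per-core boost priorities */
        int num_siblings = 2;               /* SMT2, i.e. two threads per core */

        for (int c = 0; c < 3; c++) {
                for (int idx = 1; idx <= num_siblings; idx++) {
                        int smt_prio = core_prio[c] * num_siblings / idx;

                        printf("core %d sibling %d -> prio %d\n", c, idx, smt_prio);
                }
        }
        return 0;
}
```

The output shows every first thread (78, 74, 70) ranking above every second thread (39, 37, 35), so the scheduler fills one thread per core before it starts packing SMT siblings, which is the "end of the priority chain" behaviour described above.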
On Sat, 2016-09-10 at 18:21 +0200, Thomas Gleixner wrote:
> On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:
> > +
> > +DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
> > +static DEFINE_MUTEX(itmt_update_mutex);
> > +
> > +static unsigned int zero;
> > +static unsigned int one = 1;
> Please stick that close to the ctl_table so it gets obvious why we need
> this. My first reaction to this was: WTF!

Will do.

> > +/*
> > + * Boolean to control whether we want to move processes to cpu capable
> > + * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
> > + * Technology 3.0.
> > + *
> > + * It can be set via /proc/sys/kernel/sched_itmt_enabled
> > + */
> > +unsigned int __read_mostly sysctl_sched_itmt_enabled;
> Please put the sysctl into a separate patch and document the sysctl at the
> proper place.

Okay, I can move the sysctl related code to a separate patch.

> > +
> > +/*
> > + * The pstate_driver calls set_sched_itmt to indicate if the system
> > + * is ITMT capable.
> > + */
> > +static bool __read_mostly sched_itmt_capable;
> > +
> > +int arch_asym_cpu_priority(int cpu)
> > +{
> > +        return per_cpu(sched_core_priority, cpu);
> > +}
> > +
> The ordering of your functions is completely random and makes no sense
> whatsoever.

Okay, I'll put it at a more logical place.

> > +/* Called with itmt_update_mutex lock held */
> > +static void enable_sched_itmt(bool enable_itmt)
> > +{
> > +        sysctl_sched_itmt_enabled = enable_itmt;
> > +        x86_topology_update = true;
> > +        rebuild_sched_domains();
> So you rebuild that even if the state of the sysctl did not change.

Will check for a change in the updated patch.

> > +}
> > +
> > +static int sched_itmt_update_handler(struct ctl_table *table, int write,
> > +                                     void __user *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +        int ret;
> > +
> > +        mutex_lock(&itmt_update_mutex);
> > +
> > +        ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > +
> > +        if (ret || !write) {
> > +                mutex_unlock(&itmt_update_mutex);
> > +                return ret;
> > +        }
> > +
> > +        enable_sched_itmt(sysctl_sched_itmt_enabled);
> So you hand in sysctl_sched_itmt_enabled so that enable_sched_itmt() can
> store that value in sysctl_sched_itmt_enabled. Brilliant stuff that.

Will merge enable_sched_itmt into sched_itmt_update_handler.

> > +static struct ctl_table_header *itmt_sysctl_header;
> > +
> > +/*
> > + * The boot code will find out the max boost frequency
> > + * and call this function to set a priority proportional
> > + * to the max boost frequency. CPU with higher boost
> > + * frequency will receive higher priority.
> > + */
> > +void sched_set_itmt_core_prio(int prio, int core_cpu)
> > +{
> > +        int cpu, i = 1;
> > +
> > +        for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
> > +                int smt_prio;
> > +
> > +                /*
> > +                 * Discount the priority of sibling so that we don't
> > +                 * pack all loads to the same core before using other cores.
> -EMAKESNOSENSE
>
> Is this a winter or summer sale where we get a discount? 10% or 50% or what?
>
> The math here is:
>
>         smt_prio = prio * num_siblings / sibling_idx;
>
> The purpose of this is to ensure that the siblings are moved to the end of
> the priority chain and therefore only used when all other high priority
> cpus are out of capacity.

Will update the comment here.

> > +                 */
> > +                smt_prio = prio * smp_num_siblings / i;
> > +                i++;
> > +                per_cpu(sched_core_priority, cpu) = smt_prio;
> > +        }
> > +}
> > +
> > +/*
> > + * During boot up, boot code will detect if the system
> > + * is ITMT capable and call set_sched_itmt.
> Groan. In the comment above the declaration of sched_itmt_capable you
> write:
>
> > + * The pstate_driver calls set_sched_itmt to indicate if the system
> > + * is ITMT capable.
> So what is calling this? Boot code or pstate driver or something else?

Will just say pstate driver.

> > + *
> > + * This should be called after sched_set_itmt_core_prio
> > + * has been called to set the cpus' priorities.
> should? So the call order is optional and this is just a recommendation.
>
> > + *
> > + * This function should be called without cpu hot plug lock
> > + * as we need to acquire the lock to rebuild sched domains
> > + * later.
> Ditto. Must not be called with cpu hotplug lock held.

Okay.

> > + */
> > +void set_sched_itmt(bool itmt_capable)
> This function can be called more than once and it can be called to
> enable and disable the feature, right?
> So again: instead of incoherent comments above the function, describe the
> calling conventions and what the function does properly.

Will add that.

> > +{
> > +        mutex_lock(&itmt_update_mutex);
> > +
> > +        if (itmt_capable != sched_itmt_capable) {
> If you use a goto out_unlock here, then you spare one indentation level and
> the annoying line breaks.

Will do.

> > +
> > +                if (itmt_capable) {
> > +                        itmt_sysctl_header =
> > +                                register_sysctl_table(itmt_root_table);
> What if this fails? Error handling is overrated ...

Will add code to disable ITMT completely if we fail to register the sysctl
table.

> > +                        /*
> > +                         * ITMT capability automatically enables ITMT
> > +                         * scheduling for client systems (single node).
> Why? Just because it can and just because something thinks that it's a good
> idea? There are server systems with single node as well....

Using a single node as the decision criterion is a compromise based on
guessing the likely usage scenario. On a lightly loaded client system this
feature will help to boost performance, but a very busy server system is
unlikely to operate in turbo mode, so the feature would not help much there.
So in a previous discussion thread with Ingo and Peter Z, we decided to use
the number of nodes as the criterion for turning this feature on by default.

> > +                         */
> > +                        if (topology_num_packages() == 1)
> > +                                sysctl_sched_itmt_enabled = 1;
> > +                } else {
> > +                        if (itmt_sysctl_header)
> > +                                unregister_sysctl_table(itmt_sysctl_header);
> And what clears itmt_sysctl_header?
>
> > +                        sysctl_sched_itmt_enabled = 0;
> > +                }
> > +
> > +                sched_itmt_capable = itmt_capable;
> > +                x86_topology_update = true;
> > +                rebuild_sched_domains();
> So another place where you call that unconditionally. If the sysctl is
> disabled then this is completely pointless.

Okay, will only rebuild the sched domains here when
sysctl_sched_itmt_enabled is true.

> What about a single function which is called from the sysctl write and from
> this code, which does the update and rebuild only if something has changed?

Sure.

> > +        }
> > +
> > +        mutex_unlock(&itmt_update_mutex);
> > +}
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 737b9edf..17f3ac7 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -109,7 +109,6 @@ static bool logical_packages_frozen __read_mostly;
> >  /* Maximum number of SMT threads on any online core */
> >  int __max_smt_threads __read_mostly;
> >
> > -unsigned int __read_mostly sysctl_sched_itmt_enabled;
> So if you wouldn't have added this in the first place then you would not
> have to remove it.
>
> Thanks,
>
>         tglx
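The single update path tglx asks for could look roughly like the sketch below. This is only an illustration of the idea; the helper name `sched_itmt_update` and the `sched_itmt_active` tracking variable are invented for the example and are not taken from the eventual follow-up patch:

```c
/* State the sched domains were last built with (hypothetical name). */
static bool sched_itmt_active;

/*
 * Hypothetical single update path, called with itmt_update_mutex held from
 * both the sysctl write handler and set_sched_itmt(): recompute the
 * effective state and rebuild the sched domains only if it changed.
 */
static void sched_itmt_update(void)
{
        bool active = sched_itmt_capable && sysctl_sched_itmt_enabled;

        if (active == sched_itmt_active)
                return;                 /* nothing changed, skip the rebuild */

        sched_itmt_active = active;
        x86_topology_update = true;
        rebuild_sched_domains();
}
```

The point is that both callers funnel through one place that compares the newly computed state against what the sched domains were last built with, so `rebuild_sched_domains()` only runs on a real change.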
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2a1f0ce..6dfb97d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -927,6 +927,15 @@ config SCHED_MC
          making when dealing with multi-core CPU chips at a cost of slightly
          increased overhead in some places. If unsure say N here.
 
+config SCHED_ITMT
+        bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
+        depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
+        ---help---
+          ITMT enabled scheduler support improves the CPU scheduler's decision
+          to move tasks to cpu core that can be boosted to a higher frequency
+          than others. It will have better performance at a cost of slightly
+          increased overhead in task migrations. If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config UP_LATE_INIT
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 8d6df77..ac86a0b 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -150,7 +150,25 @@ void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 extern bool x86_topology_update;
 
 #ifdef CONFIG_SCHED_ITMT
+#include <asm/percpu.h>
+
+DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
 extern unsigned int __read_mostly sysctl_sched_itmt_enabled;
+
+/* Interface to set priority of a cpu */
+void sched_set_itmt_core_prio(int prio, int core_cpu);
+
+/* Interface to notify scheduler that system supports ITMT */
+void set_sched_itmt(bool support_itmt);
+
+#else /* CONFIG_SCHED_ITMT */
+
+static inline void set_sched_itmt(bool support_itmt)
+{
+}
+static inline void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+}
 #endif /* CONFIG_SCHED_ITMT */
 
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0503f5b..2008335 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_EFI)            += sysfb_efi.o
 
 obj-$(CONFIG_PERF_EVENTS)    += perf_regs.o
 obj-$(CONFIG_TRACING)        += tracepoint.o
+obj-$(CONFIG_SCHED_ITMT)     += itmt.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
new file mode 100644
index 0000000..93f9316
--- /dev/null
+++ b/arch/x86/kernel/itmt.c
@@ -0,0 +1,164 @@
+/*
+ * itmt.c: Functions and data structures for enabling
+ *         scheduler to favor scheduling on cores that
+ *         can be boosted to a higher frequency using
+ *         Intel Turbo Boost Max Technology 3.0
+ *
+ * (C) Copyright 2016 Intel Corporation
+ * Author: Tim Chen <tim.c.chen@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/sched.h>
+#include <linux/cpumask.h>
+#include <linux/cpuset.h>
+#include <asm/mutex.h>
+#include <linux/sched.h>
+#include <linux/sysctl.h>
+#include <linux/nodemask.h>
+
+DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+static DEFINE_MUTEX(itmt_update_mutex);
+
+static unsigned int zero;
+static unsigned int one = 1;
+
+/*
+ * Boolean to control whether we want to move processes to cpu capable
+ * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
+ * Technology 3.0.
+ *
+ * It can be set via /proc/sys/kernel/sched_itmt_enabled
+ */
+unsigned int __read_mostly sysctl_sched_itmt_enabled;
+
+/*
+ * The pstate_driver calls set_sched_itmt to indicate if the system
+ * is ITMT capable.
+ */
+static bool __read_mostly sched_itmt_capable;
+
+int arch_asym_cpu_priority(int cpu)
+{
+        return per_cpu(sched_core_priority, cpu);
+}
+
+/* Called with itmt_update_mutex lock held */
+static void enable_sched_itmt(bool enable_itmt)
+{
+        sysctl_sched_itmt_enabled = enable_itmt;
+        x86_topology_update = true;
+        rebuild_sched_domains();
+}
+
+static int sched_itmt_update_handler(struct ctl_table *table, int write,
+                                     void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+        int ret;
+
+        mutex_lock(&itmt_update_mutex);
+
+        ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+        if (ret || !write) {
+                mutex_unlock(&itmt_update_mutex);
+                return ret;
+        }
+
+        enable_sched_itmt(sysctl_sched_itmt_enabled);
+
+        mutex_unlock(&itmt_update_mutex);
+
+        return ret;
+}
+
+static struct ctl_table itmt_kern_table[] = {
+        {
+                .procname = "sched_itmt_enabled",
+                .data = &sysctl_sched_itmt_enabled,
+                .maxlen = sizeof(unsigned int),
+                .mode = 0644,
+                .proc_handler = sched_itmt_update_handler,
+                .extra1 = &zero,
+                .extra2 = &one,
+        },
+        {}
+};
+
+static struct ctl_table itmt_root_table[] = {
+        {
+                .procname = "kernel",
+                .mode = 0555,
+                .child = itmt_kern_table,
+        },
+        {}
+};
+
+static struct ctl_table_header *itmt_sysctl_header;
+
+/*
+ * The boot code will find out the max boost frequency
+ * and call this function to set a priority proportional
+ * to the max boost frequency. CPU with higher boost
+ * frequency will receive higher priority.
+ */
+void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+        int cpu, i = 1;
+
+        for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
+                int smt_prio;
+
+                /*
+                 * Discount the priority of sibling so that we don't
+                 * pack all loads to the same core before using other cores.
+                 */
+                smt_prio = prio * smp_num_siblings / i;
+                i++;
+                per_cpu(sched_core_priority, cpu) = smt_prio;
+        }
+}
+
+/*
+ * During boot up, boot code will detect if the system
+ * is ITMT capable and call set_sched_itmt.
+ *
+ * This should be called after sched_set_itmt_core_prio
+ * has been called to set the cpus' priorities.
+ *
+ * This function should be called without cpu hot plug lock
+ * as we need to acquire the lock to rebuild sched domains
+ * later.
+ */
+void set_sched_itmt(bool itmt_capable)
+{
+        mutex_lock(&itmt_update_mutex);
+
+        if (itmt_capable != sched_itmt_capable) {
+
+                if (itmt_capable) {
+                        itmt_sysctl_header =
+                                register_sysctl_table(itmt_root_table);
+                        /*
+                         * ITMT capability automatically enables ITMT
+                         * scheduling for client systems (single node).
+                         */
+                        if (topology_num_packages() == 1)
+                                sysctl_sched_itmt_enabled = 1;
+                } else {
+                        if (itmt_sysctl_header)
+                                unregister_sysctl_table(itmt_sysctl_header);
+                        sysctl_sched_itmt_enabled = 0;
+                }
+
+                sched_itmt_capable = itmt_capable;
+                x86_topology_update = true;
+                rebuild_sched_domains();
+        }
+
+        mutex_unlock(&itmt_update_mutex);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 737b9edf..17f3ac7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -109,7 +109,6 @@ static bool logical_packages_frozen __read_mostly;
 /* Maximum number of SMT threads on any online core */
 int __max_smt_threads __read_mostly;
 
-unsigned int __read_mostly sysctl_sched_itmt_enabled;
 /* Flag to indicate if a complete sched domain rebuild is required */
 bool x86_topology_update;
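Based on the review feedback above, the reworked set_sched_itmt() would presumably gain the goto out_unlock pattern, clear itmt_sysctl_header, and bail out when the sysctl registration fails. The following is only a hypothetical sketch of that shape, reusing the sched_itmt_update() helper sketched earlier in this thread; it is not the actual v2 patch:

```c
/*
 * Hypothetical rework of set_sched_itmt() reflecting the review comments:
 * goto out_unlock instead of nesting, clear itmt_sysctl_header after
 * unregistering, and leave ITMT disabled if the sysctl table cannot be
 * registered. Illustrative only, not the real follow-up patch.
 */
void set_sched_itmt(bool itmt_capable)
{
        mutex_lock(&itmt_update_mutex);

        if (itmt_capable == sched_itmt_capable)
                goto out_unlock;                /* no state change, nothing to do */

        if (itmt_capable) {
                itmt_sysctl_header = register_sysctl_table(itmt_root_table);
                if (!itmt_sysctl_header)
                        goto out_unlock;        /* registration failed: keep ITMT off */
                if (topology_num_packages() == 1)
                        sysctl_sched_itmt_enabled = 1;
        } else {
                if (itmt_sysctl_header) {
                        unregister_sysctl_table(itmt_sysctl_header);
                        itmt_sysctl_header = NULL;      /* answers "what clears it?" */
                }
                sysctl_sched_itmt_enabled = 0;
        }

        sched_itmt_capable = itmt_capable;
        sched_itmt_update();    /* rebuild sched domains only on a real change */

out_unlock:
        mutex_unlock(&itmt_update_mutex);
}
```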