Message ID | 20230719092838.2302-1-yangyicong@huawei.com
---|---
Series | sched/fair: Scan cluster before scanning LLC in wake-up path
Hi Peter, a gentle ping for this. Any further comment? Thanks.

On 2023/7/19 17:28, Yicong Yang wrote:
From: Yicong Yang <yangyicong@hisilicon.com>

This is the follow-up work to support the cluster scheduler. Previously we
added a cluster level to the scheduler for both ARM64 [1] and x86 [2] to
support load balancing between clusters, which brings more memory bandwidth
and reduces cache contention. This patchset, on the other hand, takes care of
the wake-up path by trying CPUs within the same cluster first, before scanning
the whole LLC, to benefit tasks that communicate with each other.

[1] 778c558f49a2 ("sched: Add cluster scheduler level in core and related Kconfig for ARM64")
[2] 66558b730f25 ("sched: Add cluster scheduler level for x86")

Since we're using sd->groups->flags to determine a cluster, the core should
ensure the flags are set correctly on domain generation. This is done by [*].

[*] https://lore.kernel.org/all/20230713013133.2314153-1-yu.c.chen@intel.com/

Change since v8:
- Peter found cpus_share_lowest_cache() weird, so fall back to
  cpus_share_resources() as suggested in v4
- Use sd->groups->flags to find the cluster when scanning, saving one per-cpu pointer
- Fix sched_cluster_active being enabled incorrectly on domain degeneration
- Use sched_cluster_active to avoid repeated checks on non-cluster machines, per Gautham
Link: https://lore.kernel.org/all/20230530070253.33306-1-yangyicong@huawei.com/

Change since v7:
- Optimize by choosing prev_cpu/recent_used_cpu when possible after failing to
  find an idle CPU in the cluster/LLC. Thanks Chen Yu for testing on Jacobsville
Link: https://lore.kernel.org/all/20220915073423.25535-1-yangyicong@huawei.com/

Change for RESEND:
- Collect tag from Chen Yu and rebase on the latest tip/sched/core. Thanks.
Link: https://lore.kernel.org/lkml/20220822073610.27205-1-yangyicong@huawei.com/

Change since v6:
- Rebase on 6.0-rc1
Link: https://lore.kernel.org/lkml/20220726074758.46686-1-yangyicong@huawei.com/

Change since v5:
- Improve patch 2 according to Peter's suggestion:
  - use sched_cluster_active to indicate whether the cluster level is active
  - consider the SMT case and use wrap iteration when scanning the cluster
- Add Vincent's tag
Thanks.
Link: https://lore.kernel.org/lkml/20220720081150.22167-1-yangyicong@hisilicon.com/

Change since v4:
- Rename cpus_share_resources to cpus_share_lowest_cache to be more informative, per Tim
- Return -1 when nr == 0 in scan_cluster(), per Abel
Thanks!
Link: https://lore.kernel.org/lkml/20220609120622.47724-1-yangyicong@hisilicon.com/

Change since v3:
- Fix compile error when !CONFIG_SCHED_CLUSTER, reported by the lkp test.
Link: https://lore.kernel.org/lkml/20220608095758.60504-1-yangyicong@hisilicon.com/

Change since v2:
- Leverage SIS_PROP to suspend redundant scanning when the LLC is overloaded
- Remove the ping-pong suppression
- Address the comment from Tim, thanks.
Link: https://lore.kernel.org/lkml/20220126080947.4529-1-yangyicong@hisilicon.com/

Change since v1:
- Regain the performance data based on v5.17-rc1
- Rename cpus_share_cluster to cpus_share_resources per Vincent and Gautham, thanks!
Link: https://lore.kernel.org/lkml/20211215041149.73171-1-yangyicong@hisilicon.com/

Barry Song (2):
  sched: Add cpus_share_resources API
  sched/fair: Scan cluster before scanning LLC in wake-up path

 include/linux/sched/sd_flags.h |  7 ++++
 include/linux/sched/topology.h |  8 ++++-
 kernel/sched/core.c            | 12 +++++++
 kernel/sched/fair.c            | 59 +++++++++++++++++++++++++++++++---
 kernel/sched/sched.h           |  2 ++
 kernel/sched/topology.c        | 25 ++++++++++++++
 6 files changed, 107 insertions(+), 6 deletions(-)
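To make the scan order described in the cover letter concrete, below is a minimal userspace sketch of the idea, not the kernel implementation from this series: `struct cpu_info`, `scan_for_idle()` and the hard-coded topology are hypothetical stand-ins for the real sched_domain hierarchy and the sd->groups->flags checks, and `cpus_share_resources()` here only models the intent of the API added in patch 1.

```c
/*
 * Illustrative model only: at wake-up, look for an idle CPU in the
 * waker's cluster (CPUs sharing a low-level cache) before falling back
 * to the rest of the LLC. The topology and all names here are
 * hypothetical; they do not mirror the actual kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

struct cpu_info {
	int cluster_id;	/* CPUs sharing the lowest-level (e.g. L2) cache */
	int llc_id;	/* CPUs sharing the last-level cache */
	bool idle;
};

/* One LLC (id 0) split into two 4-CPU clusters. */
static struct cpu_info cpus[NR_CPUS] = {
	{0, 0, false}, {0, 0, false}, {0, 0, true},  {0, 0, false},
	{1, 0, false}, {1, 0, false}, {1, 0, false}, {1, 0, true},
};

/* Models the intent of cpus_share_resources(): same cluster? */
static bool cpus_share_resources(int a, int b)
{
	return cpus[a].cluster_id == cpus[b].cluster_id;
}

/* Scan the cluster of @target first, then the remaining CPUs in its LLC. */
static int scan_for_idle(int target)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus_share_resources(cpu, target) && cpus[cpu].idle)
			return cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus[cpu].llc_id == cpus[target].llc_id &&
		    !cpus_share_resources(cpu, target) && cpus[cpu].idle)
			return cpu;

	return -1;
}

int main(void)
{
	/* A task woken from CPU 0 prefers idle CPU 2 (same cluster) over CPU 7. */
	printf("waker on CPU 0 -> picked CPU %d\n", scan_for_idle(0));
	return 0;
}
```

Running the sketch picks CPU 2 rather than CPU 7, illustrating why communicating tasks benefit: the chosen CPU shares the cluster-level cache with the waker, so data produced by the waker is more likely to still be hot when the wakee runs.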