Message ID | 20241108054831.2094883-3-costa.shul@redhat.com |
---|---|
State | New |
Series | [RFC,v1] blk-mq: isolate CPUs from hctx |
On Fri, Nov 08, 2024 at 07:48:30AM +0200, Costa Shulyupin wrote:
> The housekeeping CPU masks, set up by the "isolcpus" and "nohz_full"
> boot command line options, are used at boot time to exclude selected
> CPUs from running some kernel housekeeping subsystems to minimize
> disturbance to latency sensitive userspace applications such as DPDK.
> This options can only be changed with a reboot. This is a problem for
> containerized workloads running on OpenShift/Kubernetes where a
> mix of low latency and "normal" workloads can be created/destroyed
> dynamically and the number of CPUs allocated to each workload is often
> not known at boot time.
>
> Cgroups allow configuring isolated_cpus at runtime.
> However, blk-mq may still use managed interrupts on the
> newly isolated CPUs.
>
> Rebuild hctx->cpumask considering isolated CPUs to avoid
> managed interrupts on those CPUs and reclaim non-isolated ones.

As far as I understand, this doesn't address the issue that the drivers
also need to be aware of isolcpus mask changes. Even though the cpumask
is updated in the block layer, the driver doesn't know about it and still
runs on the isolated CPUs.
Hello.

On Fri, Nov 08, 2024 at 07:48:30AM GMT, Costa Shulyupin <costa.shul@redhat.com> wrote:
> Cgroups allow configuring isolated_cpus at runtime.
> However, blk-mq may still use managed interrupts on the
> newly isolated CPUs.
>
> Rebuild hctx->cpumask considering isolated CPUs to avoid
> managed interrupts on those CPUs and reclaim non-isolated ones.
>
> The patch is based on
> isolation: Exclude dynamically isolated CPUs from housekeeping masks:
> https://lore.kernel.org/lkml/20240821142312.236970-1-longman@redhat.com/

Even based on that, this seems incomplete to me: the CPUs that are part
of the isolcpus mask at boot time won't be excluded by this, will they?
IOW, isolating CPUs from blk_mq_hw_ctx would only be possible via cpuset
but not "statically" through the cmdline option, or would it?

Thanks,
Michal

(-Cc: lizefan.x@bytedance.com)
On 11/15/24 10:45 AM, Michal Koutný wrote:
> Hello.
>
> On Fri, Nov 08, 2024 at 07:48:30AM GMT, Costa Shulyupin <costa.shul@redhat.com> wrote:
>> Cgroups allow configuring isolated_cpus at runtime.
>> However, blk-mq may still use managed interrupts on the
>> newly isolated CPUs.
>>
>> Rebuild hctx->cpumask considering isolated CPUs to avoid
>> managed interrupts on those CPUs and reclaim non-isolated ones.
>>
>> The patch is based on
>> isolation: Exclude dynamically isolated CPUs from housekeeping masks:
>> https://lore.kernel.org/lkml/20240821142312.236970-1-longman@redhat.com/
> Even based on that this seems incomplete to me the CPUs that are part of
> isolcpus mask on boot time won't be excluded from this?
> IOW, isolating CPUs from blk_mq_hw_ctx would only be possible via cpuset
> but not "statically" throught the cmdline option, or would it?

The cpuset code has already been changed to take note of the statically
isolated CPUs and include them in its consideration. They are recorded in
the boot_hk_cpus mask. This relies on the fact that most users will set
both the nohz_full and isolcpus boot parameters. If only nohz_full is set
without isolcpus, they will not be recorded.

Cheers,
Longman
Hello Michal.

Isolation of CPUs from blk_mq_hw_ctx during boot is already handled in
this call hierarchy:

    ...
    nvme_probe()
    nvme_alloc_admin_tag_set()
    blk_mq_alloc_queue()
    blk_mq_init_allocated_queue()
    blk_mq_map_swqueue()

blk_mq_map_swqueue() performs:

    for_each_cpu(cpu, hctx->cpumask) {
        if (cpu_is_isolated(cpu))
            cpumask_clear_cpu(cpu, hctx->cpumask);
    }

    static inline bool cpu_is_isolated(int cpu)
    {
        return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
               !housekeeping_test_cpu(cpu, HK_TYPE_TICK) ||
               cpuset_cpu_is_isolated(cpu);
    }

cpu_is_isolated() was introduced by commit 3232e7aad11e5.

Thanks,
Costa

On Fri, 15 Nov 2024 at 17:45, Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello.
>
> On Fri, Nov 08, 2024 at 07:48:30AM GMT, Costa Shulyupin <costa.shul@redhat.com> wrote:
> > Cgroups allow configuring isolated_cpus at runtime.
> > However, blk-mq may still use managed interrupts on the
> > newly isolated CPUs.
> >
> > Rebuild hctx->cpumask considering isolated CPUs to avoid
> > managed interrupts on those CPUs and reclaim non-isolated ones.
> >
> > The patch is based on
> > isolation: Exclude dynamically isolated CPUs from housekeeping masks:
> > https://lore.kernel.org/lkml/20240821142312.236970-1-longman@redhat.com/
>
> Even based on that this seems incomplete to me the CPUs that are part of
> isolcpus mask on boot time won't be excluded from this?
> IOW, isolating CPUs from blk_mq_hw_ctx would only be possible via cpuset
> but not "statically" throught the cmdline option, or would it?
>
> Thanks,
> Michal
>
> (-Cc: lizefan.x@bytedance.com)
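The boot-time filtering described above can be modeled in userspace as a minimal sketch, assuming a toy 64-bit uint64_t cpumask in place of the kernel's struct cpumask. The names toy_cpumask_t, struct toy_isolation_state, toy_cpu_is_isolated() and toy_strip_isolated() are hypothetical; they only mirror the shape of the cpu_is_isolated() helper and the for_each_cpu() loop quoted in the message, not any real kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy cpumask (hypothetical): bit N set means CPU N is in the mask. */
typedef uint64_t toy_cpumask_t;

#define TOY_NR_CPUS 64

static bool toy_test_cpu(unsigned int cpu, toy_cpumask_t mask)
{
	assert(cpu < TOY_NR_CPUS);
	return (mask >> cpu) & 1u;
}

/* Hypothetical stand-ins for the housekeeping masks and cpuset state. */
struct toy_isolation_state {
	toy_cpumask_t hk_domain;       /* housekeeping CPUs for HK_TYPE_DOMAIN */
	toy_cpumask_t hk_tick;         /* housekeeping CPUs for HK_TYPE_TICK */
	toy_cpumask_t cpuset_isolated; /* CPUs isolated at runtime via cpuset */
};

/* Same three-way OR as the quoted cpu_is_isolated(). */
static bool toy_cpu_is_isolated(const struct toy_isolation_state *st,
				unsigned int cpu)
{
	return !toy_test_cpu(cpu, st->hk_domain) ||
	       !toy_test_cpu(cpu, st->hk_tick) ||
	       toy_test_cpu(cpu, st->cpuset_isolated);
}

/* Like the quoted loop: clear every isolated CPU from one hctx mask. */
static toy_cpumask_t toy_strip_isolated(const struct toy_isolation_state *st,
					toy_cpumask_t hctx_mask)
{
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_test_cpu(cpu, hctx_mask) && toy_cpu_is_isolated(st, cpu))
			hctx_mask &= ~((toy_cpumask_t)1 << cpu);
	}
	return hctx_mask;
}
```

For example, with CPU 2 outside the domain housekeeping mask, CPU 3 outside the tick mask and CPU 5 isolated through cpuset, toy_strip_isolated() turns a mask covering CPUs 0-7 (0xFF) into 0xD3.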
On 11/15/24 3:25 PM, Costa Shulyupin wrote:
> Hello Michal.
>
> Isolation of CPUs from blk_mq_hw_ctx during boot is already handled on
> call hierarchy:
> ...
> nvme_probe()
> nvme_alloc_admin_tag_set()
> blk_mq_alloc_queue()
> blk_mq_init_allocated_queue()
> blk_mq_map_swqueue()
>
> blk_mq_map_swqueue() performs:
> for_each_cpu(cpu, hctx->cpumask) {
>     if (cpu_is_isolated(cpu))
>         cpumask_clear_cpu(cpu, hctx->cpumask);
> }
>
> static inline bool cpu_is_isolated(int cpu)
> {
>     return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
>            !housekeeping_test_cpu(cpu, HK_TYPE_TICK) ||
>            cpuset_cpu_is_isolated(cpu);
> }

cpuset_cpu_is_isolated() can be removed once the cpumasks can be changed
dynamically.

Cheers,
Longman
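Longman's remark can be illustrated with a toy model. Assuming, as a simplification, that the dynamic update removes runtime-isolated CPUs from the domain housekeeping mask itself, every CPU flagged by the cpuset term is already flagged by the first housekeeping test, so the third term of cpu_is_isolated() adds nothing. All names below (toy_exclude_isolated(), toy_checks_agree() and the rest) are hypothetical, and which housekeeping types the real series updates is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 64

typedef uint64_t toy_cpumask_t; /* hypothetical toy cpumask */

static bool toy_test_cpu(unsigned int cpu, toy_cpumask_t mask)
{
	assert(cpu < TOY_NR_CPUS);
	return (mask >> cpu) & 1u;
}

/* Three-term check, shaped like the quoted cpu_is_isolated(). */
static bool toy_isolated_with_cpuset(toy_cpumask_t hk_domain,
				     toy_cpumask_t hk_tick,
				     toy_cpumask_t cpuset_isolated,
				     unsigned int cpu)
{
	return !toy_test_cpu(cpu, hk_domain) || !toy_test_cpu(cpu, hk_tick) ||
	       toy_test_cpu(cpu, cpuset_isolated);
}

/* Two-term check: what is left once the cpuset term is dropped. */
static bool toy_isolated_hk_only(toy_cpumask_t hk_domain,
				 toy_cpumask_t hk_tick, unsigned int cpu)
{
	return !toy_test_cpu(cpu, hk_domain) || !toy_test_cpu(cpu, hk_tick);
}

/* Hypothetical dynamic update: drop runtime-isolated CPUs from hk_domain. */
static toy_cpumask_t toy_exclude_isolated(toy_cpumask_t hk_domain,
					  toy_cpumask_t cpuset_isolated)
{
	return hk_domain & ~cpuset_isolated;
}

/* True if the two-term and three-term checks agree on every CPU. */
static bool toy_checks_agree(toy_cpumask_t hk_domain, toy_cpumask_t hk_tick,
			     toy_cpumask_t cpuset_isolated)
{
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_isolated_with_cpuset(hk_domain, hk_tick,
					     cpuset_isolated, cpu) !=
		    toy_isolated_hk_only(hk_domain, hk_tick, cpu))
			return false;
	}
	return true;
}
```

With static masks only, a CPU isolated solely through cpuset (CPU 5 in the test) makes the two checks disagree; after toy_exclude_isolated() has folded it into the domain mask, they agree on every CPU.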
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 12ee37986331..d5786b953d17 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4145,6 +4145,36 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	}
 }
 
+/**
+ * blk_mq_isolate_cpus() - rebuild hctx->cpumask considering isolated CPUs
+ * to avoid managed interrupts on those CPUs.
+ */
+
+void blk_mq_isolate_cpus(const struct cpumask *isolcpus)
+{
+	struct class_dev_iter iter;
+	struct device *dev;
+
+	class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
+	while ((dev = class_dev_iter_next(&iter))) {
+		struct request_queue *q = bdev_get_queue(dev_to_bdev(dev));
+		struct blk_mq_hw_ctx *hctx;
+		unsigned long i;
+
+		if (!queue_is_mq(q))
+			continue;
+
+		blk_mq_map_swqueue(q);
+		/*
+		 * Postcondition:
+		 * cpumask must not intersect with isolated CPUs.
+		 */
+		queue_for_each_hw_ctx(q, hctx, i)
+			WARN_ON_ONCE(cpumask_intersects(hctx->cpumask, isolcpus));
+	}
+	class_dev_iter_exit(&iter);
+}
+
 /*
  * Caller needs to ensure that we're either frozen/quiesced, or that
  * the queue isn't live yet.
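The control flow of blk_mq_isolate_cpus() above (walk every disk, skip non-multiqueue queues, remap, then check the postcondition) can be sketched in userspace. This is a simplified model under stated assumptions: struct toy_queue, toy_remap(), toy_remap_noop() and toy_isolate_cpus() are hypothetical, the remap callback stands in for re-running blk_mq_map_swqueue(), and violations are counted instead of triggering WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_cpumask_t; /* hypothetical toy cpumask */

#define TOY_MAX_HCTX 4

/* Hypothetical stand-in for a request_queue and its hardware contexts. */
struct toy_queue {
	bool is_mq;                            /* plays the role of queue_is_mq() */
	size_t nr_hctx;                        /* valid entries in hctx_mask[] */
	toy_cpumask_t hctx_mask[TOY_MAX_HCTX];
};

/* Simplified remap: just drops isolated CPUs from every hctx mask. */
static void toy_remap(struct toy_queue *q, toy_cpumask_t isolcpus)
{
	for (size_t i = 0; i < q->nr_hctx; i++)
		q->hctx_mask[i] &= ~isolcpus;
}

/* Deliberately broken remap that changes nothing, to show the check firing. */
static void toy_remap_noop(struct toy_queue *q, toy_cpumask_t isolcpus)
{
	(void)q;
	(void)isolcpus;
}

/*
 * Walk all queues, skip non-mq ones, remap each, then verify that no hctx
 * mask intersects isolcpus. Returns the number of offending hctx masks,
 * where the patch would instead hit WARN_ON_ONCE().
 */
static unsigned int toy_isolate_cpus(struct toy_queue *queues, size_t nr_queues,
				     toy_cpumask_t isolcpus,
				     void (*remap)(struct toy_queue *, toy_cpumask_t))
{
	unsigned int violations = 0;

	for (size_t n = 0; n < nr_queues; n++) {
		struct toy_queue *q = &queues[n];

		if (!q->is_mq)
			continue;

		assert(q->nr_hctx <= TOY_MAX_HCTX);
		remap(q, isolcpus);

		/* Postcondition: no hctx mask may intersect the isolated set. */
		for (size_t i = 0; i < q->nr_hctx; i++) {
			if (q->hctx_mask[i] & isolcpus)
				violations++;
		}
	}
	return violations;
}
```

Passing the deliberately broken toy_remap_noop() shows the postcondition check catching an hctx mask that still covers isolated CPUs; a queue marked as non-multiqueue is skipped and keeps its mask unchanged.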
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2035fad3131f..a1f57b5ad46d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -924,6 +924,7 @@ void blk_freeze_queue_start_non_owner(struct request_queue *q);
 
 void blk_mq_map_queues(struct blk_mq_queue_map *qmap);
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
+void blk_mq_isolate_cpus(const struct cpumask *isolcpus);
 
 void blk_mq_quiesce_queue_nowait(struct request_queue *q);
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5066397899c9..cad17f3f3315 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -41,6 +41,7 @@
 #include <linux/sched/isolation.h>
 #include <linux/wait.h>
 #include <linux/workqueue.h>
+#include <linux/blk-mq.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt) "%s:%d: %s " fmt, __FILE__, __LINE__, __func__
@@ -1317,6 +1318,7 @@ static void update_isolation_cpumasks(bool isolcpus_updated)
 		return;
 	ret = housekeeping_exlude_isolcpus(isolated_cpus, HOUSEKEEPING_FLAGS);
 	WARN_ON_ONCE((ret < 0) && (ret != -EOPNOTSUPP));
+	blk_mq_isolate_cpus(isolated_cpus);
 }
 
 /**
The housekeeping CPU masks, set up by the "isolcpus" and "nohz_full"
boot command line options, are used at boot time to exclude selected
CPUs from running some kernel housekeeping subsystems to minimize
disturbance to latency sensitive userspace applications such as DPDK.
These options can only be changed with a reboot. This is a problem for
containerized workloads running on OpenShift/Kubernetes where a
mix of low latency and "normal" workloads can be created/destroyed
dynamically and the number of CPUs allocated to each workload is often
not known at boot time.

Cgroups allow configuring isolated_cpus at runtime.
However, blk-mq may still use managed interrupts on the
newly isolated CPUs.

Rebuild hctx->cpumask considering isolated CPUs to avoid
managed interrupts on those CPUs and reclaim non-isolated ones.

The patch is based on
isolation: Exclude dynamically isolated CPUs from housekeeping masks:
https://lore.kernel.org/lkml/20240821142312.236970-1-longman@redhat.com/

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
---
 block/blk-mq.c         | 30 ++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  1 +
 kernel/cgroup/cpuset.c |  2 ++
 3 files changed, 33 insertions(+)
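The phrase "reclaim non-isolated ones" can be illustrated with a small model: if the hctx masks are recomputed from a fixed CPU-to-queue map on every update, rather than having bits cleared in place, a CPU that leaves the isolated set comes back automatically on the next rebuild. This sketch assumes a hypothetical 8-CPU, 2-queue map (toy_mq_map) and a hypothetical toy_rebuild() helper, and ignores details such as hardware contexts left with no CPUs; it is not the actual blk_mq_map_swqueue() code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 8
#define TOY_NR_HCTX 2

typedef uint64_t toy_cpumask_t; /* hypothetical toy cpumask */

/*
 * Hypothetical CPU -> hardware queue map, standing in for the per-CPU
 * queue map consulted when hctx masks are rebuilt. Fixed at setup time.
 */
static const unsigned int toy_mq_map[TOY_NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/*
 * Recompute every hctx mask from scratch, skipping isolated CPUs.
 * Because nothing is carried over from the previous masks, a CPU that is
 * no longer in 'isolated' is put back into its queue's mask.
 */
static void toy_rebuild(toy_cpumask_t hctx_mask[TOY_NR_HCTX],
			toy_cpumask_t isolated)
{
	for (unsigned int h = 0; h < TOY_NR_HCTX; h++)
		hctx_mask[h] = 0;

	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		toy_cpumask_t bit = (toy_cpumask_t)1 << cpu;

		assert(toy_mq_map[cpu] < TOY_NR_HCTX);
		if (!(isolated & bit))
			hctx_mask[toy_mq_map[cpu]] |= bit;
	}
}
```

Isolating CPU 3 shrinks queue 0's mask from 0x0F to 0x07; rebuilding again with an empty isolated set restores 0x0F, which an in-place "clear the bit" approach alone could not do.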