Message ID | 20230608114326.27649-1-hejunhao3@huawei.com |
---|---|
State | New, archived |
Series | drivers/perf: hisi: Don't migrate perf to the CPU going to teardown |
On 2023/6/8 19:43, Junhao He wrote:
> [...]
> Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
> Signed-off-by: Junhao He <hejunhao3@huawei.com>

Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
On Thu, Jun 08, 2023 at 07:43:26PM +0800, Junhao He wrote:
> [...]
> Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
> Signed-off-by: Junhao He <hejunhao3@huawei.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

I assume that Will can pick this up.

I did a quick check, and all other perf drivers seem to do the right thing
here, either using cpumask_any_but(), or generating a temporary mask with
the cpu being offlined removed.

Mark.
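Mark's second option is to build a temporary mask with the dying CPU removed and then take its first CPU. A small user-space sketch can illustrate it. This is a toy model, not kernel code. The mask is a plain `unsigned long` with one bit per CPU, and `TOY_NR_CPUS`, `toy_mask_first()` and `toy_first_excluding()` are hypothetical names. In the kernel, the equivalent would operate on a `struct cpumask`, for example via `cpumask_andnot()` with `cpumask_of(cpu)`, or via `cpumask_clear_cpu()` on a copy.

```c
#include <assert.h>

/* Hypothetical toy model: bit N of an unsigned long means "CPU N is online".
 * TOY_NR_CPUS plays the role of nr_cpu_ids; a result >= it means "no CPU". */
#define TOY_NR_CPUS 8u

/* Models cpumask_first(): lowest set bit, or TOY_NR_CPUS if the mask is empty. */
static unsigned int toy_mask_first(unsigned long mask)
{
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/* Temporary-mask approach: clear the dying CPU from a copy of the online
 * mask, then pick the first CPU that remains. */
static unsigned int toy_first_excluding(unsigned long online, unsigned int cpu)
{
	unsigned long tmp;

	assert(cpu < TOY_NR_CPUS);	/* the hotplug core passes a valid CPU */
	tmp = online & ~(1UL << cpu);	/* like cpumask_clear_cpu() on a copy */
	return toy_mask_first(tmp);
}
```

Take an online mask of `0xF` (CPUs 0-3) with CPU 0 going down: `toy_first_excluding(0xF, 0)` returns 1. If CPU 0 is the only online CPU, it returns `TOY_NR_CPUS`, which mirrors the `target >= nr_cpu_ids` check in the driver. In the kernel, a temporary `struct cpumask` needs storage of its own. That is one reason `cpumask_any_but()` tends to be the simpler choice.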
On Thu, 8 Jun 2023 19:43:26 +0800 Junhao He <hejunhao3@huawei.com> wrote:
> [...]
> Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
> Signed-off-by: Junhao He <hejunhao3@huawei.com>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
On Thu, 8 Jun 2023 19:43:26 +0800, Junhao He wrote:
> [...]

Applied to will (for-next/perf), thanks!

[1/1] drivers/perf: hisi: Don't migrate perf to the CPU going to teardown
      https://git.kernel.org/will/c/7a6a9f1c5a0a

Cheers,
diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
index 0bc8dc36aff5..14f8b4b03337 100644
--- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
+++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
@@ -683,7 +683,7 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	pcie_pmu->on_cpu = -1;
 	/* Choose a new CPU from all online cpus. */
-	target = cpumask_first(cpu_online_mask);
+	target = cpumask_any_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids) {
 		pci_err(pcie_pmu->pdev, "There is no CPU to set\n");
 		return 0;
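The diff shows only the middle of `hisi_pcie_pmu_offline_cpu()`. The sketch below is a rough user-space model of the whole flow after the fix, and it rests on two assumptions that the hunk does not show:

* The callback first returns early when the dying CPU does not own the PMU.
* A successful pick is followed by a context migration and an update of `on_cpu`.

`struct toy_pmu`, `toy_mask_any_but()` and `toy_offline_cpu()` are hypothetical names, and the migration step appears only as a comment.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for nr_cpu_ids in this toy model. */
#define TOY_NR_CPUS 8u

/* Hypothetical stand-in for struct hisi_pcie_pmu: only the field used here. */
struct toy_pmu {
	int on_cpu;	/* CPU that currently owns the PMU context, or -1 */
};

/* Models cpumask_any_but(): lowest set bit other than 'but', or TOY_NR_CPUS. */
static unsigned int toy_mask_any_but(unsigned long mask, unsigned int but)
{
	assert(but < TOY_NR_CPUS);
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (cpu != but && (mask & (1UL << cpu)))
			return cpu;
	return TOY_NR_CPUS;
}

/* Hypothetical model of the offline callback after the fix. 'online' still
 * contains 'cpu', as cpu_online_mask does during the teardown callback. */
static int toy_offline_cpu(struct toy_pmu *pmu, unsigned long online,
			   unsigned int cpu)
{
	unsigned int target;

	/* Assumption: nothing to do if another CPU owns this PMU. */
	if (pmu->on_cpu != (int)cpu)
		return 0;

	pmu->on_cpu = -1;
	/* Choose a new CPU from all online CPUs, excluding the dying one. */
	target = toy_mask_any_but(online, cpu);
	if (target >= TOY_NR_CPUS) {
		fprintf(stderr, "There is no CPU to set\n");
		return 0;
	}

	/* Assumption: the real driver migrates the perf context here. */
	pmu->on_cpu = (int)target;
	return 0;
}
```

Call `toy_offline_cpu()` with `on_cpu == 0`, an online mask of `0xF` and `cpu == 0`, and ownership moves to CPU 1. The pre-fix `cpumask_first()` choice would have selected CPU 0 again.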
The driver needs to migrate the perf context if the CPU it is currently
using is going to be torn down. When the cpuhp::teardown() callback is
called, cpu_online_mask has not been updated yet and still includes the CPU
going down. The current implementation may therefore migrate the context
to the CPU being torn down, which leads to the calltrace below:

...
[ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
[ 368.113699][ T932] Call trace:
[ 368.116834][ T932] __switch_to+0x7c/0xbc
[ 368.120924][ T932] __schedule+0x338/0x6f0
[ 368.125098][ T932] schedule+0x50/0xe0
[ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
[ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
[ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
[ 368.144573][ T932] mutex_lock+0x50/0x60
[ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
[ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
[ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
[ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
[ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
[ 368.175099][ T932] kthread+0x108/0x13c
[ 368.179012][ T932] ret_from_fork+0x10/0x18
...

Use cpumask_any_but() to select an online CPU other than the one going
down, which fixes this issue.

Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
---
 drivers/perf/hisilicon/hisi_pcie_pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
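The core of the bug is that the online mask still contains the dying CPU during teardown, so `cpumask_first()` can hand that very CPU back. The toy model below is a user-space sketch, not kernel code, and it contrasts the two selections on a plain bitmask. `TOY_NR_CPUS`, `toy_mask_first()` and `toy_mask_any_but()` are hypothetical stand-ins for `nr_cpu_ids`, `cpumask_first()` and `cpumask_any_but()`. The kernel helper only promises some CPU in the mask other than the excluded one (or a value `>= nr_cpu_ids`), whereas this toy always returns the lowest such CPU.

```c
#include <assert.h>

/* Hypothetical toy model: bit N set means "CPU N is in the online mask".
 * TOY_NR_CPUS stands in for nr_cpu_ids; results >= it mean "no CPU". */
#define TOY_NR_CPUS 8u

/* Pre-fix selection, modelling cpumask_first(): lowest online CPU, which
 * may be the CPU currently being torn down. */
static unsigned int toy_mask_first(unsigned long mask)
{
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/* Post-fix selection, modelling cpumask_any_but(): an online CPU other
 * than 'but' (here always the lowest one), or TOY_NR_CPUS if none. */
static unsigned int toy_mask_any_but(unsigned long mask, unsigned int but)
{
	assert(but < TOY_NR_CPUS);
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (cpu != but && (mask & (1UL << cpu)))
			return cpu;
	return TOY_NR_CPUS;
}
```

Suppose CPUs 0-3 are online (`0xF`) and CPU 0 is going down. `toy_mask_first(0xF)` returns 0, the CPU being torn down, while `toy_mask_any_but(0xF, 0)` returns 1. When the dying CPU is the last one online, `toy_mask_any_but(0x1, 0)` returns `TOY_NR_CPUS`. That corresponds to the driver's "There is no CPU to set" path.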