Message ID | 20190812210200.13653-1-daniel.m.jordan@oracle.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Herbert Xu |
Series | [v2] padata: validate cpumask without removed CPU during offline |
On Mon, Aug 12, 2019 at 05:02:00PM -0400, Daniel Jordan wrote:
> __padata_remove_cpu clears the offlined CPU from the usable masks after
> padata_alloc_pd has initialized pd->cpu, which means pd->cpu could be
> initialized to this CPU, causing padata to wait indefinitely for the
> next job in padata_get_next.
>
> Make the usable masks reflect the offline CPU when they're established
> in padata_setup_cpumasks so pd->cpu is initialized properly.
>
> Fixes: 6fc4dbcf0276 ("padata: Replace delayed timer with immediate workqueue in padata_reorder")
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: linux-crypto@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>
> Hi, one more edge case. All combinations of CPUs among
> parallel_cpumask, serial_cpumask, and CPU hotplug have now been tested
> in a 4-CPU VM, and an 8-CPU VM has run with random combinations of these
> settings for over an hour.
>
>  kernel/padata.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)

If we modify patch 2/2 by calling this after cpu_online_mask
has been updated then this problem should go away because we
can then remove the cpumask_clear_cpu calls.

Cheers,
On 8/21/19 11:51 PM, Herbert Xu wrote:
> On Mon, Aug 12, 2019 at 05:02:00PM -0400, Daniel Jordan wrote:
>> __padata_remove_cpu clears the offlined CPU from the usable masks after
>> padata_alloc_pd has initialized pd->cpu, which means pd->cpu could be
>> initialized to this CPU, causing padata to wait indefinitely for the
>> next job in padata_get_next.
>>
>> Make the usable masks reflect the offline CPU when they're established
>> in padata_setup_cpumasks so pd->cpu is initialized properly.
>>
>> Fixes: 6fc4dbcf0276 ("padata: Replace delayed timer with immediate workqueue in padata_reorder")
>> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>> Cc: Herbert Xu <herbert@gondor.apana.org.au>
>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>> Cc: linux-crypto@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>
>> Hi, one more edge case. All combinations of CPUs among
>> parallel_cpumask, serial_cpumask, and CPU hotplug have now been tested
>> in a 4-CPU VM, and an 8-CPU VM has run with random combinations of these
>> settings for over an hour.
>>
>>  kernel/padata.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> If we modify patch 2/2 by calling this after cpu_online_mask
> has been updated then this problem should go away because we
> can then remove the cpumask_clear_cpu calls.

Yep, agreed.
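The ordering point in this exchange can be checked with a small user-space model. The sketch below is not kernel code: `mask_t`, `usable_with_temp_mask` and `usable_after_online_update` are hypothetical names, and a plain bitmask stands in for a cpumask. It assumes only what the thread states, namely that the usable mask is the requested mask restricted to online CPUs.

```c
#include <assert.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is a member. */
typedef unsigned int mask_t;

/*
 * Approach in this v2 patch (modelled): @cpu is still set in @online,
 * so it is stripped from a temporary copy of the requested mask first.
 */
static mask_t usable_with_temp_mask(mask_t requested, mask_t online, int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	mask_t tmp = requested & ~(1u << cpu);

	return tmp & online;
}

/*
 * Ordering suggested in the reply (modelled): the online mask has already
 * dropped @cpu, so intersecting with it needs no manual clearing.
 */
static mask_t usable_after_online_update(mask_t requested, mask_t online_updated)
{
	return requested & online_updated;
}
```

In this model both functions produce the same usable mask whenever `online_updated` equals `online` with the departing CPU cleared, which is why the explicit `cpumask_clear_cpu` calls would become unnecessary under the suggested ordering.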
diff --git a/kernel/padata.c b/kernel/padata.c
index 01460ea1d160..c1002ac4720c 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -702,17 +702,27 @@ static int __padata_remove_cpu(struct padata_instance *pinst, int cpu)
 	struct parallel_data *pd = NULL;
 
 	if (cpumask_test_cpu(cpu, cpu_online_mask)) {
+		cpumask_var_t pcpu, cbcpu;
+
 		__padata_stop(pinst);
 
-		pd = padata_alloc_pd(pinst, pinst->cpumask.pcpu,
-				     pinst->cpumask.cbcpu);
+		/*
+		 * padata_alloc_pd uses cpu_online_mask to get the usable
+		 * masks, but @cpu hasn't been removed from it yet, so use
+		 * temporary masks that exclude @cpu so the usable masks show
+		 * @cpu as offline for pd->cpu's initialization.
+		 */
+		cpumask_copy(pcpu, pinst->cpumask.pcpu);
+		cpumask_copy(cbcpu, pinst->cpumask.cbcpu);
+		cpumask_clear_cpu(cpu, cbcpu);
+		cpumask_clear_cpu(cpu, pcpu);
+
+		pd = padata_alloc_pd(pinst, pcpu, cbcpu);
 		if (!pd)
 			return -ENOMEM;
 
 		padata_replace(pinst, pd);
 
-		cpumask_clear_cpu(cpu, pd->cpumask.cbcpu);
-		cpumask_clear_cpu(cpu, pd->cpumask.pcpu);
 		if (padata_validate_cpumask(pinst, pd->cpumask.pcpu) &&
 		    padata_validate_cpumask(pinst, pd->cpumask.cbcpu))
 			__padata_start(pinst);
__padata_remove_cpu clears the offlined CPU from the usable masks after
padata_alloc_pd has initialized pd->cpu, which means pd->cpu could be
initialized to this CPU, causing padata to wait indefinitely for the
next job in padata_get_next.

Make the usable masks reflect the offline CPU when they're established
in padata_setup_cpumasks so pd->cpu is initialized properly.

Fixes: 6fc4dbcf0276 ("padata: Replace delayed timer with immediate workqueue in padata_reorder")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---

Hi, one more edge case. All combinations of CPUs among
parallel_cpumask, serial_cpumask, and CPU hotplug have now been tested
in a 4-CPU VM, and an 8-CPU VM has run with random combinations of these
settings for over an hour.

 kernel/padata.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
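To make the failure mode in the changelog concrete, here is a minimal user-space sketch of the two orderings. It is an illustration only, not the padata implementation: `mask_t`, `first_cpu`, `pick_then_clear` and `clear_then_pick` are hypothetical names, and it assumes that pd->cpu is taken as the first CPU of the usable parallel mask.

```c
#include <assert.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is a member. */
typedef unsigned int mask_t;

static mask_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	return 1u << cpu;
}

/* Lowest-numbered CPU in the mask, or -1 if the mask is empty. */
static int first_cpu(mask_t m)
{
	for (int cpu = 0; cpu < 32; cpu++)
		if (m & cpu_bit(cpu))
			return cpu;
	return -1;
}

/* Old order: choose pd->cpu first, clear the offlining CPU afterwards. */
static int pick_then_clear(mask_t requested, mask_t online, int cpu)
{
	mask_t usable = requested & online;	/* @cpu is still online here */
	int pd_cpu = first_cpu(usable);		/* so it can be chosen */

	usable &= ~cpu_bit(cpu);		/* cleared only afterwards */
	/* pd_cpu may now name a CPU that is absent from the final mask. */
	return pd_cpu;
}

/* Patched order: exclude the offlining CPU up front, then choose pd->cpu. */
static int clear_then_pick(mask_t requested, mask_t online, int cpu)
{
	mask_t tmp = requested & ~cpu_bit(cpu);

	return first_cpu(tmp & online);
}
```

With a requested mask of CPUs 0 and 1, all four CPUs still marked online, and CPU 0 going offline, `pick_then_clear` returns 0 (the departing CPU) while `clear_then_pick` returns 1. In the model, the first result corresponds to the stale pd->cpu that the changelog describes.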