Message ID | 20230824084206.22844-1-qiang.zhang1211@gmail.com |
---|---|
State | Accepted |
Commit | ed763051f63059bdb6d728c0890993eab8833feb |
Series | rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() |
Sorry for the repeat sending.

> Currently, maxcpu is set by traversing the online CPUs.  However, if
> rcutorture.onoff_holdoff is set to zero and onoff_interval is set
> non-zero, CPUs with larger CPU IDs can already be offline by the time
> maxcpu is set.  Those CPUs can then never be offloaded or deoffloaded,
> even after they come back online.
>
> This commit therefore uses for_each_possible_cpu() instead of
> for_each_online_cpu() in rcu_nocb_toggle().
>
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> ---
>  kernel/rcu/rcutorture.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index a58372bdf0c1..b75d0fe558ce 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg)
>  	VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started");
>  	while (!rcu_inkernel_boot_has_ended())
>  		schedule_timeout_interruptible(HZ / 10);
> -	for_each_online_cpu(cpu)
> +	for_each_possible_cpu(cpu)
>  		maxcpu = cpu;
>  	WARN_ON(maxcpu < 0);
>  	if (toggle_interval > ULONG_MAX)
> --
> 2.17.1
On Thu, Aug 24, 2023 at 04:42:06PM +0800, Zqiang wrote:
> Currently, maxcpu is set by traversing the online CPUs.  However, if
> rcutorture.onoff_holdoff is set to zero and onoff_interval is set
> non-zero, CPUs with larger CPU IDs can already be offline by the time
> maxcpu is set.  Those CPUs can then never be offloaded or deoffloaded,
> even after they come back online.
>
> This commit therefore uses for_each_possible_cpu() instead of
> for_each_online_cpu() in rcu_nocb_toggle().
>
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> ---
> [ . . . ]
> @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg)
>  	VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started");
>  	while (!rcu_inkernel_boot_has_ended())
>  		schedule_timeout_interruptible(HZ / 10);
> -	for_each_online_cpu(cpu)
> +	for_each_possible_cpu(cpu)

Last I checked, bad things could happen if the code attempted to
nocb_toggle a CPU that had not yet come online.  Has that changed?

							Thanx, Paul

>  	maxcpu = cpu;
>  	WARN_ON(maxcpu < 0);
>  	if (toggle_interval > ULONG_MAX)
> --
> 2.17.1
> On Thu, Aug 24, 2023 at 04:42:06PM +0800, Zqiang wrote:
> [ . . . ]
> > -	for_each_online_cpu(cpu)
> > +	for_each_possible_cpu(cpu)
>
> Last I checked, bad things could happen if the code attempted to
> nocb_toggle a CPU that had not yet come online.  Has that changed?

For example, suppose there are 8 online CPUs in the system and CPU 7 goes
offline before the traversal of the online CPUs sets maxcpu.  That leaves
maxcpu at 6, so the nocb_toggle testing misses CPU 7 even after it comes
back online.

Note that even with for_each_online_cpu(), the situation you describe can
still happen: a CPU can go offline after maxcpu is set but before we
toggle it.

Thanks
Zqiang
On Fri, Aug 25, 2023 at 10:28:37AM +0800, Z qiang wrote:
> > Last I checked, bad things could happen if the code attempted to
> > nocb_toggle a CPU that had not yet come online.  Has that changed?
>
> For example, suppose there are 8 online CPUs in the system and CPU 7 goes
> offline before the traversal of the online CPUs sets maxcpu.  That leaves
> maxcpu at 6, so the nocb_toggle testing misses CPU 7 even after it comes
> back online.
>
> Note that even with for_each_online_cpu(), the situation you describe can
> still happen: a CPU can go offline after maxcpu is set but before we
> toggle it.

Suppose we have a system whose possible CPUs are 0, 1, 2, and 3.  However,
only 0 and 1 are present in this system, and until some manual action is
taken, only 0 and 1 will ever be online.  (Yes, this really can happen!)
In that state, won't toggling CPUs 2 and 3 result in failures?

							Thanx, Paul
> On Fri, Aug 25, 2023 at 10:28:37AM +0800, Z qiang wrote:
> [ . . . ]
>
> Suppose we have a system whose possible CPUs are 0, 1, 2, and 3.  However,
> only 0 and 1 are present in this system, and until some manual action is
> taken, only 0 and 1 will ever be online.  (Yes, this really can happen!)
> In that state, won't toggling CPUs 2 and 3 result in failures?

Agreed.  As long as rcutorture.onoff_interval is enabled, toggling a CPU
can fail and print the "NOCB: Cannot CB-offload offline CPU" message
regardless of whether maxcpu is set from the online CPUs or from the
possible CPUs, but failures due to offline CPUs are acceptable.  At least
this way the toggling of CPU 7 is not missed once CPU 7 comes back online.

Would it be better to use for_each_present_cpu()?

Thanks
Zqiang
On Sat, Aug 26, 2023 at 02:13:39PM +0800, Z qiang wrote:
> > Suppose we have a system whose possible CPUs are 0, 1, 2, and 3.  However,
> > only 0 and 1 are present in this system, and until some manual action is
> > taken, only 0 and 1 will ever be online.  (Yes, this really can happen!)
> > In that state, won't toggling CPUs 2 and 3 result in failures?
>
> Agreed.  As long as rcutorture.onoff_interval is enabled, toggling a CPU
> can fail and print the "NOCB: Cannot CB-offload offline CPU" message
> regardless of whether maxcpu is set from the online CPUs or from the
> possible CPUs, but failures due to offline CPUs are acceptable.  At least
> this way the toggling of CPU 7 is not missed once CPU 7 comes back online.
>
> Would it be better to use for_each_present_cpu()?

The problem we face is that RCU and rcutorture have no reasonable way
of knowing when the boot-time CPU bringup has completed.  If there was a
way of knowing that, then my approach would be to make rcutorture react
to a holdoff of zero by waiting for all the CPUs to come online.

Failing that, for_each_present_cpu() with a holdoff of zero will likely
get us transient failures between the time rcutorture starts and the
time the last CPU comes online.

Or is there now a way for in-kernel code to know when boot-time CPU
onlining has completed?

							Thanx, Paul
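For context, the distinction being drawn in this exchange is between the
kernel's three CPU masks: cpu_possible_mask (every CPU ID that could ever
exist, fixed at boot), cpu_present_mask (CPUs physically present), and
cpu_online_mask (CPUs currently online).  The following is a minimal
sketch that dumps all three; the helper name is invented for illustration
and is not part of the patch under discussion.

#include <linux/cpumask.h>
#include <linux/printk.h>

/* Illustrative helper only: print the three CPU masks as ID ranges. */
static void dump_cpu_masks(void)
{
	/* All CPUs that could ever exist on this system (fixed at boot). */
	pr_info("possible CPUs: %*pbl\n", cpumask_pr_args(cpu_possible_mask));
	/* CPUs physically present; a subset of the possible CPUs. */
	pr_info("present CPUs:  %*pbl\n", cpumask_pr_args(cpu_present_mask));
	/* CPUs currently online; a subset of the present CPUs. */
	pr_info("online CPUs:   %*pbl\n", cpumask_pr_args(cpu_online_mask));
}

On the system Paul describes, this would report possible CPUs 0-3 but
present and online CPUs only 0-1.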
On Sat, Aug 26, 2023 at 06:06:20AM -0700, Paul E. McKenney wrote:
> On Sat, Aug 26, 2023 at 02:13:39PM +0800, Z qiang wrote:
> > Would it be better to use for_each_present_cpu()?
>
> The problem we face is that RCU and rcutorture have no reasonable way
> of knowing when the boot-time CPU bringup has completed.  If there was a
> way of knowing that, then my approach would be to make rcutorture react
> to a holdoff of zero by waiting for all the CPUs to come online.
>
> Failing that, for_each_present_cpu() with a holdoff of zero will likely
> get us transient failures between the time rcutorture starts and the
> time the last CPU comes online.
>
> Or is there now a way for in-kernel code to know when boot-time CPU
> onlining has completed?

We don't need to wait for all CPUs to be online, though.  Toggling
already handles failures due to offline CPUs well, and since toggling
happens concurrently with offlining in rcutorture, we already see lots
of failures reported in the logs.

Thanks.
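The tolerated-failure behavior can be pictured with a rough sketch in the
style of kernel/rcu/rcutorture.c (which already includes the necessary RCU
headers).  This is not the actual rcutorture toggle loop and the helper
name is invented; it only assumes, as described in the thread, that
rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() return an error rather
than misbehave when handed an offline CPU.

/*
 * Sketch only, not the rcutorture implementation: attempt to toggle the
 * callback-offload state of one CPU and treat an error (for example,
 * because the CPU is currently offline) as a tolerated failure.
 */
static void try_toggle_one_cpu(int cpu, bool offload)
{
	int ret;

	ret = offload ? rcu_nocb_cpu_offload(cpu) : rcu_nocb_cpu_deoffload(cpu);
	if (ret)
		pr_info("nocb toggle of CPU %d failed (%d), likely offline\n",
			cpu, ret);
}

With that behavior, letting maxcpu cover the possible CPUs is safe: an
attempt on a CPU that never comes online simply shows up as one more
logged failure.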
> On Aug 28, 2023, at 11:17 AM, Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Sat, Aug 26, 2023 at 06:06:20AM -0700, Paul E. McKenney wrote:
>> Or is there now a way for in-kernel code to know when boot-time CPU
>> onlining has completed?
>
> We don't need to wait for all CPUs to be online, though.  Toggling
> already handles failures due to offline CPUs well, and since toggling
> happens concurrently with offlining in rcutorture, we already see lots
> of failures reported in the logs.

I think the issue is that the loop later in the function does not try to
toggle CPUs that came online too late.  So it does not test offloading on
all CPUs just because maxcpu got updated too late.

One fix could be to periodically check in the loop whether a new CPU at
maxcpu + 1 ever got onlined.  If it did, update maxcpu.

Thanks.
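A minimal sketch of that alternative, purely for illustration: this is not
what was eventually merged (the accepted patch simply computes maxcpu once
over the possible CPUs), and the helper name and its exact placement
inside rcu_nocb_toggle()'s loop are assumptions.

#include <linux/cpumask.h>

/*
 * Hypothetical helper sketching the suggestion above: while the toggle
 * loop runs, opportunistically grow maxcpu whenever the next-higher CPU
 * has come online in the meantime.
 */
static void rcu_nocb_maybe_grow_maxcpu(int *maxcpu)
{
	while (*maxcpu + 1 < (int)nr_cpu_ids && cpu_online(*maxcpu + 1))
		(*maxcpu)++;
}

The toggle loop would then call this on each iteration before picking a
CPU to offload or deoffload, at the cost of the extra bookkeeping
questioned below.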
On Mon, Aug 28, 2023 at 05:51:09PM -0400, Joel Fernandes wrote:
> I think the issue is that the loop later in the function does not try to
> toggle CPUs that came online too late.  So it does not test offloading on
> all CPUs just because maxcpu got updated too late.

Right, and therefore for_each_possible_cpu() or for_each_present_cpu()
should be fine to iterate, since it's OK to try to toggle an offline CPU.

> One fix could be to periodically check in the loop whether a new CPU at
> maxcpu + 1 ever got onlined.  If it did, update maxcpu.

Is it worth the complication though?

Thanks.
> On Aug 29, 2023, at 5:24 AM, Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Mon, Aug 28, 2023 at 05:51:09PM -0400, Joel Fernandes wrote:
>> I think the issue is that the loop later in the function does not try to
>> toggle CPUs that came online too late.
>
> Right, and therefore for_each_possible_cpu() or for_each_present_cpu()
> should be fine to iterate, since it's OK to try to toggle an offline CPU.

Ah, I see what you mean, sounds good.

>> One fix could be to periodically check in the loop whether a new CPU at
>> maxcpu + 1 ever got onlined.  If it did, update maxcpu.
>
> Is it worth the complication though?

Probably not, and so your suggestion sounds fine.  Thanks!

- Joel
On Tue, Aug 29, 2023 at 11:24:24AM +0200, Frederic Weisbecker wrote:
> On Mon, Aug 28, 2023 at 05:51:09PM -0400, Joel Fernandes wrote:
> > I think the issue is that the loop later in the function does not try to
> > toggle CPUs that came online too late.  So it does not test offloading on
> > all CPUs just because maxcpu got updated too late.
>
> Right, and therefore for_each_possible_cpu() or for_each_present_cpu()
> should be fine to iterate, since it's OK to try to toggle an offline CPU.

OK, so I will accept the original patch, which did just that.  Thank you!

I recently got burned by lack of workqueues on a not-yet-onlined CPU,
hence my questions here.  ;-)

							Thanx, Paul
On Tue, Aug 29, 2023 at 05:38:54AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 29, 2023 at 11:24:24AM +0200, Frederic Weisbecker wrote:
> > Right, and therefore for_each_possible_cpu() or for_each_present_cpu()
> > should be fine to iterate, since it's OK to try to toggle an offline CPU.
>
> OK, so I will accept the original patch, which did just that.

Which I finally did.  I took the liberty of adding Frederic's
Reviewed-by, but please let me know if this is in any way inappropriate.

							Thanx, Paul
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index a58372bdf0c1..b75d0fe558ce 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg)
 	VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started");
 	while (!rcu_inkernel_boot_has_ended())
 		schedule_timeout_interruptible(HZ / 10);
-	for_each_online_cpu(cpu)
+	for_each_possible_cpu(cpu)
 		maxcpu = cpu;
 	WARN_ON(maxcpu < 0);
 	if (toggle_interval > ULONG_MAX)
Currently, maxcpu is set by traversing the online CPUs.  However, if
rcutorture.onoff_holdoff is set to zero and onoff_interval is set
non-zero, CPUs with larger CPU IDs can already be offline by the time
maxcpu is set.  Those CPUs can then never be offloaded or deoffloaded,
even after they come back online.

This commit therefore uses for_each_possible_cpu() instead of
for_each_online_cpu() in rcu_nocb_toggle().

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
 kernel/rcu/rcutorture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)