Message ID | 20230225033421.3323164-1-joel@joelfernandes.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [RFC,v2] rcu: Add a minimum time for marking boot as completed | expand |
On 2/24/23 19:34, Joel Fernandes (Google) wrote: > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 2429b5e3184b..0943139fdf01 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -5085,6 +5085,10 @@ > rcutorture.verbose= [KNL] > Enable additional printk() statements. > > + rcupdate.boot_end_delay= [KNL] > + Minimum time that must elapse before the boot > + sequence can be marked as completed. + in milliseconds. > + > rcupdate.rcu_cpu_stall_ftrace_dump= [KNL] > Dump ftrace buffer after reporting RCU CPU > stall warning.
On Sat, Feb 25, 2023 at 03:34:21AM +0000, Joel Fernandes (Google) wrote: > On many systems, a great deal of boot happens after the kernel thinks the boot > has completed. It is difficult to determine if the system has really booted > from the kernel side. Is this only about kernel booting? What makes it hard to determine if the system has really booted? Thanks.
On Sun, Feb 26, 2023 at 2:45 PM Frederic Weisbecker <frederic@kernel.org> wrote: > > On Sat, Feb 25, 2023 at 03:34:21AM +0000, Joel Fernandes (Google) wrote: > > On many systems, a great deal of boot happens after the kernel thinks the boot > > has completed. It is difficult to determine if the system has really booted > > from the kernel side. > > Is this only about kernel booting? What makes it hard to determine if the > system has really booted? Yes, I should probably clarify in the change log. It is more than the kernel booting, it is all OS userspace code that runs as well during which the CPU load is high (init spawning new programs such as shells, daemons setup etc). The kernel does not know when that will complete per-se. A timer based approach is not ideal, but it does not hurt either IMHO. In other words, it is a "something better than nothing" patch which reduces risk of slowness. Thanks, - Joel
> From: Joel Fernandes (Google) <joel@joelfernandes.org> > Sent: Saturday, February 25, 2023 11:34 AM > To: linux-kernel@vger.kernel.org > Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > rcu@vger.kernel.org > Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > completed > > On many systems, a great deal of boot happens after the kernel thinks the > boot has completed. It is difficult to determine if the system has really > booted from the kernel side. Some features like lazy-RCU can risk slowing > down boot time if, say, a callback has been added that the boot > synchronously depends on. > > Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > stay expedited for as long as the system is still booting. > > For these reasons, this commit adds a config option > 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > rcupdate.boot_end_delay. > > By default, this value is 20s. A system designer can choose to specify a value > here to keep RCU from marking boot completion. The boot sequence will not > be marked ended until at least boot_end_delay milliseconds have passed. Hi Joel, Just some thoughts on the default value of 20s, correct me if I'm wrong :-). Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the real-time latency than the overall OS boot time? If so, we might make rcupdate.boot_end_delay = 0 as the default value (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? -Qiuxu > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > --- > v1->v2: > Update some comments and description. > ...
> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > >> >> From: Joel Fernandes (Google) <joel@joelfernandes.org> >> Sent: Saturday, February 25, 2023 11:34 AM >> To: linux-kernel@vger.kernel.org >> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker >> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- >> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; >> rcu@vger.kernel.org >> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as >> completed >> >> On many systems, a great deal of boot happens after the kernel thinks the >> boot has completed. It is difficult to determine if the system has really >> booted from the kernel side. Some features like lazy-RCU can risk slowing >> down boot time if, say, a callback has been added that the boot >> synchronously depends on. >> >> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to >> stay expedited for as long as the system is still booting. >> >> For these reasons, this commit adds a config option >> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter >> rcupdate.boot_end_delay. >> >> By default, this value is 20s. A system designer can choose to specify a value >> here to keep RCU from marking boot completion. The boot sequence will not >> be marked ended until at least boot_end_delay milliseconds have passed. > > Hi Joel, > > Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > > Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > real-time latency than the overall OS boot time? But every system has to boot, even an RT system. > > If so, we might make rcupdate.boot_end_delay = 0 as the default value > (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? Could you measure how much time your RT system takes to boot before the application runs? I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. Thanks, - Joel > > -Qiuxu > >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >> --- >> v1->v2: >> Update some comments and description. >> ... >
On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: > > > > On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > > > >> > >> From: Joel Fernandes (Google) <joel@joelfernandes.org> > >> Sent: Saturday, February 25, 2023 11:34 AM > >> To: linux-kernel@vger.kernel.org > >> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > >> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > >> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > >> rcu@vger.kernel.org > >> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > >> completed > >> > >> On many systems, a great deal of boot happens after the kernel thinks the > >> boot has completed. It is difficult to determine if the system has really > >> booted from the kernel side. Some features like lazy-RCU can risk slowing > >> down boot time if, say, a callback has been added that the boot > >> synchronously depends on. > >> > >> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > >> stay expedited for as long as the system is still booting. > >> > >> For these reasons, this commit adds a config option > >> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > >> rcupdate.boot_end_delay. > >> > >> By default, this value is 20s. A system designer can choose to specify a value > >> here to keep RCU from marking boot completion. The boot sequence will not > >> be marked ended until at least boot_end_delay milliseconds have passed. > > > > Hi Joel, > > > > Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > > > > Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > > real-time latency than the overall OS boot time? > > But every system has to boot, even an RT system. > > > > > If so, we might make rcupdate.boot_end_delay = 0 as the default value > > (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > > Could you measure how much time your RT system takes to boot before the application runs? > > I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. Provide a /sys location that the userspace code writes to when it is ready? Different systems with different hardware and software configurations are going to take different amounts of time to boot, correct? Thanx, Paul > Thanks, > > - Joel > > > > > > -Qiuxu > > > >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > >> --- > >> v1->v2: > >> Update some comments and description. > >> ... > >
On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: > > > > > > > On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > > > > > > >> > > >> From: Joel Fernandes (Google) <joel@joelfernandes.org> > > >> Sent: Saturday, February 25, 2023 11:34 AM > > >> To: linux-kernel@vger.kernel.org > > >> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > > >> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > > >> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > > >> rcu@vger.kernel.org > > >> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > > >> completed > > >> > > >> On many systems, a great deal of boot happens after the kernel thinks the > > >> boot has completed. It is difficult to determine if the system has really > > >> booted from the kernel side. Some features like lazy-RCU can risk slowing > > >> down boot time if, say, a callback has been added that the boot > > >> synchronously depends on. > > >> > > >> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > > >> stay expedited for as long as the system is still booting. > > >> > > >> For these reasons, this commit adds a config option > > >> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > > >> rcupdate.boot_end_delay. > > >> > > >> By default, this value is 20s. A system designer can choose to specify a value > > >> here to keep RCU from marking boot completion. The boot sequence will not > > >> be marked ended until at least boot_end_delay milliseconds have passed. > > > > > > Hi Joel, > > > > > > Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > > > > > > Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > > > real-time latency than the overall OS boot time? > > > > But every system has to boot, even an RT system. > > > > > > > > If so, we might make rcupdate.boot_end_delay = 0 as the default value > > > (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > > > > Could you measure how much time your RT system takes to boot before the application runs? > > > > I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. > > Provide a /sys location that the userspace code writes to when it > is ready? Different systems with different hardware and software > configurations are going to take different amounts of time to boot, > correct? I could add a sysfs node, but I still wanted this patch as well because I am wary of systems where yet more userspace changes are required. I feel the kernel should itself be able to do this. Yes, it is possible the system completes "booting" at a different time than what the kernel thinks. But it does that anyway (even without this patch), so I am not seeing a good reason to not do this in the kernel. It is also only a minimum cap, so if the in-kernel boot takes too long, then the patch will have no effect. Thoughts? thanks, - Joel
On Mon, Feb 27, 2023 at 10:16:51AM -0500, Joel Fernandes wrote: > On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: > > > > > > > > > > On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > > > > > > > > > >> > > > >> From: Joel Fernandes (Google) <joel@joelfernandes.org> > > > >> Sent: Saturday, February 25, 2023 11:34 AM > > > >> To: linux-kernel@vger.kernel.org > > > >> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > > > >> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > > > >> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > > > >> rcu@vger.kernel.org > > > >> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > > > >> completed > > > >> > > > >> On many systems, a great deal of boot happens after the kernel thinks the > > > >> boot has completed. It is difficult to determine if the system has really > > > >> booted from the kernel side. Some features like lazy-RCU can risk slowing > > > >> down boot time if, say, a callback has been added that the boot > > > >> synchronously depends on. > > > >> > > > >> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > > > >> stay expedited for as long as the system is still booting. > > > >> > > > >> For these reasons, this commit adds a config option > > > >> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > > > >> rcupdate.boot_end_delay. > > > >> > > > >> By default, this value is 20s. A system designer can choose to specify a value > > > >> here to keep RCU from marking boot completion. The boot sequence will not > > > >> be marked ended until at least boot_end_delay milliseconds have passed. > > > > > > > > Hi Joel, > > > > > > > > Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > > > > > > > > Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > > > > real-time latency than the overall OS boot time? > > > > > > But every system has to boot, even an RT system. > > > > > > > > > > > If so, we might make rcupdate.boot_end_delay = 0 as the default value > > > > (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > > > > > > Could you measure how much time your RT system takes to boot before the application runs? > > > > > > I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. > > > > Provide a /sys location that the userspace code writes to when it > > is ready? Different systems with different hardware and software > > configurations are going to take different amounts of time to boot, > > correct? > > I could add a sysfs node, but I still wanted this patch as well > because I am wary of systems where yet more userspace changes are > required. I feel the kernel should itself be able to do this. Yes, it > is possible the system completes "booting" at a different time than > what the kernel thinks. But it does that anyway (even without this > patch), so I am not seeing a good reason to not do this in the kernel. > It is also only a minimum cap, so if the in-kernel boot takes too > long, then the patch will have no effect. > > Thoughts? > Why "rcu_boot_ended" is not enough? As i see right after that an "init" process or shell or panic is going to be invoked by the kernel. It basically indicates that a kernel is fully functional. Or an idea to wait even further? Until all kernel modules are loaded by user space. Thanks! -- Uladzislau Rezki
> On Feb 27, 2023, at 1:06 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > On Mon, Feb 27, 2023 at 10:16:51AM -0500, Joel Fernandes wrote: >>> On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@kernel.org> wrote: >>> >>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: >>>> >>>> >>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: >>>>> >>>>> >>>>>> >>>>>> From: Joel Fernandes (Google) <joel@joelfernandes.org> >>>>>> Sent: Saturday, February 25, 2023 11:34 AM >>>>>> To: linux-kernel@vger.kernel.org >>>>>> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker >>>>>> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- >>>>>> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; >>>>>> rcu@vger.kernel.org >>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as >>>>>> completed >>>>>> >>>>>> On many systems, a great deal of boot happens after the kernel thinks the >>>>>> boot has completed. It is difficult to determine if the system has really >>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing >>>>>> down boot time if, say, a callback has been added that the boot >>>>>> synchronously depends on. >>>>>> >>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to >>>>>> stay expedited for as long as the system is still booting. >>>>>> >>>>>> For these reasons, this commit adds a config option >>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter >>>>>> rcupdate.boot_end_delay. >>>>>> >>>>>> By default, this value is 20s. A system designer can choose to specify a value >>>>>> here to keep RCU from marking boot completion. The boot sequence will not >>>>>> be marked ended until at least boot_end_delay milliseconds have passed. >>>>> >>>>> Hi Joel, >>>>> >>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-). >>>>> >>>>> Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the >>>>> real-time latency than the overall OS boot time? >>>> >>>> But every system has to boot, even an RT system. >>>> >>>>> >>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value >>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? >>>> >>>> Could you measure how much time your RT system takes to boot before the application runs? >>>> >>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. >>> >>> Provide a /sys location that the userspace code writes to when it >>> is ready? Different systems with different hardware and software >>> configurations are going to take different amounts of time to boot, >>> correct? >> >> I could add a sysfs node, but I still wanted this patch as well >> because I am wary of systems where yet more userspace changes are >> required. I feel the kernel should itself be able to do this. Yes, it >> is possible the system completes "booting" at a different time than >> what the kernel thinks. But it does that anyway (even without this >> patch), so I am not seeing a good reason to not do this in the kernel. >> It is also only a minimum cap, so if the in-kernel boot takes too >> long, then the patch will have no effect. >> >> Thoughts? >> > Why "rcu_boot_ended" is not enough? As i see right after that an "init" > process or shell or panic is going to be invoked by the kernel. It basically > indicates that a kernel is fully functional. > > Or an idea to wait even further? Until all kernel modules are loaded by > user space. I mentioned in commit message it is daemons, userspace initialization etc. There is a lot of userspace booting up as well and using the kernel while doing so. So, It does not make sense to me to mark kernel as booted too early. And no harm in adding some builtin kernel hysteresis. What am I missing? Thanks, - Joel (And Steven please dont kick my butt for the lack of wrapping, I am working on fixing it :P). > > Thanks! > > -- > Uladzislau Rezki
On Mon, Feb 27, 2023 at 01:15:47PM -0500, Joel Fernandes wrote: > > > > On Feb 27, 2023, at 1:06 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Mon, Feb 27, 2023 at 10:16:51AM -0500, Joel Fernandes wrote: > >>> On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@kernel.org> wrote: > >>> > >>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: > >>>> > >>>> > >>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > >>>>> > >>>>> > >>>>>> > >>>>>> From: Joel Fernandes (Google) <joel@joelfernandes.org> > >>>>>> Sent: Saturday, February 25, 2023 11:34 AM > >>>>>> To: linux-kernel@vger.kernel.org > >>>>>> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > >>>>>> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > >>>>>> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > >>>>>> rcu@vger.kernel.org > >>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > >>>>>> completed > >>>>>> > >>>>>> On many systems, a great deal of boot happens after the kernel thinks the > >>>>>> boot has completed. It is difficult to determine if the system has really > >>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing > >>>>>> down boot time if, say, a callback has been added that the boot > >>>>>> synchronously depends on. > >>>>>> > >>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > >>>>>> stay expedited for as long as the system is still booting. > >>>>>> > >>>>>> For these reasons, this commit adds a config option > >>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > >>>>>> rcupdate.boot_end_delay. > >>>>>> > >>>>>> By default, this value is 20s. A system designer can choose to specify a value > >>>>>> here to keep RCU from marking boot completion. The boot sequence will not > >>>>>> be marked ended until at least boot_end_delay milliseconds have passed. > >>>>> > >>>>> Hi Joel, > >>>>> > >>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > >>>>> > >>>>> Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > >>>>> real-time latency than the overall OS boot time? > >>>> > >>>> But every system has to boot, even an RT system. > >>>> > >>>>> > >>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value > >>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > >>>> > >>>> Could you measure how much time your RT system takes to boot before the application runs? > >>>> > >>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. > >>> > >>> Provide a /sys location that the userspace code writes to when it > >>> is ready? Different systems with different hardware and software > >>> configurations are going to take different amounts of time to boot, > >>> correct? > >> > >> I could add a sysfs node, but I still wanted this patch as well > >> because I am wary of systems where yet more userspace changes are > >> required. I feel the kernel should itself be able to do this. Yes, it > >> is possible the system completes "booting" at a different time than > >> what the kernel thinks. But it does that anyway (even without this > >> patch), so I am not seeing a good reason to not do this in the kernel. > >> It is also only a minimum cap, so if the in-kernel boot takes too > >> long, then the patch will have no effect. > >> > >> Thoughts? > >> > > Why "rcu_boot_ended" is not enough? As i see right after that an "init" > > process or shell or panic is going to be invoked by the kernel. It basically > > indicates that a kernel is fully functional. > > > > Or an idea to wait even further? Until all kernel modules are loaded by > > user space. > > I mentioned in commit message it is daemons, userspace initialization etc. There is a lot of userspace booting up as well and using the kernel while doing so. > > So, It does not make sense to me to mark kernel as booted too early. And no harm in adding some builtin kernel hysteresis. What am I missing? > Than it is up to user space to decide when it is ready in terms of "boot completed". -- Uladzislau Rezki
> On Feb 27, 2023, at 1:20 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > On Mon, Feb 27, 2023 at 01:15:47PM -0500, Joel Fernandes wrote: >> >> >>>> On Feb 27, 2023, at 1:06 PM, Uladzislau Rezki <urezki@gmail.com> wrote: >>> >>> On Mon, Feb 27, 2023 at 10:16:51AM -0500, Joel Fernandes wrote: >>>>> On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@kernel.org> wrote: >>>>> >>>>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: >>>>>> >>>>>> >>>>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> From: Joel Fernandes (Google) <joel@joelfernandes.org> >>>>>>>> Sent: Saturday, February 25, 2023 11:34 AM >>>>>>>> To: linux-kernel@vger.kernel.org >>>>>>>> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker >>>>>>>> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- >>>>>>>> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; >>>>>>>> rcu@vger.kernel.org >>>>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as >>>>>>>> completed >>>>>>>> >>>>>>>> On many systems, a great deal of boot happens after the kernel thinks the >>>>>>>> boot has completed. It is difficult to determine if the system has really >>>>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing >>>>>>>> down boot time if, say, a callback has been added that the boot >>>>>>>> synchronously depends on. >>>>>>>> >>>>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to >>>>>>>> stay expedited for as long as the system is still booting. >>>>>>>> >>>>>>>> For these reasons, this commit adds a config option >>>>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter >>>>>>>> rcupdate.boot_end_delay. >>>>>>>> >>>>>>>> By default, this value is 20s. A system designer can choose to specify a value >>>>>>>> here to keep RCU from marking boot completion. The boot sequence will not >>>>>>>> be marked ended until at least boot_end_delay milliseconds have passed. >>>>>>> >>>>>>> Hi Joel, >>>>>>> >>>>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-). >>>>>>> >>>>>>> Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the >>>>>>> real-time latency than the overall OS boot time? >>>>>> >>>>>> But every system has to boot, even an RT system. >>>>>> >>>>>>> >>>>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value >>>>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? >>>>>> >>>>>> Could you measure how much time your RT system takes to boot before the application runs? >>>>>> >>>>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. >>>>> >>>>> Provide a /sys location that the userspace code writes to when it >>>>> is ready? Different systems with different hardware and software >>>>> configurations are going to take different amounts of time to boot, >>>>> correct? >>>> >>>> I could add a sysfs node, but I still wanted this patch as well >>>> because I am wary of systems where yet more userspace changes are >>>> required. I feel the kernel should itself be able to do this. Yes, it >>>> is possible the system completes "booting" at a different time than >>>> what the kernel thinks. But it does that anyway (even without this >>>> patch), so I am not seeing a good reason to not do this in the kernel. >>>> It is also only a minimum cap, so if the in-kernel boot takes too >>>> long, then the patch will have no effect. >>>> >>>> Thoughts? >>>> >>> Why "rcu_boot_ended" is not enough? As i see right after that an "init" >>> process or shell or panic is going to be invoked by the kernel. It basically >>> indicates that a kernel is fully functional. >>> >>> Or an idea to wait even further? Until all kernel modules are loaded by >>> user space. >> >> I mentioned in commit message it is daemons, userspace initialization etc. There is a lot of userspace booting up as well and using the kernel while doing so. >> >> So, It does not make sense to me to mark kernel as booted too early. And no harm in adding some builtin kernel hysteresis. What am I missing? >> > Than it is up to user space to decide when it is ready in terms of "boot completed". I dont know if you caught up with the other threads. See replies from Paul and my reply to that. Also what you are proposing can be more harmful. If user space has a bug and does not notify the kernel that boot completed, then the boot can stay incomplete forever. The idea with this patch is to make things better, not worse. Thanks, - Joel > > -- > Uladzislau Rezki
On Mon, Feb 27, 2023 at 01:27:20PM -0500, Joel Fernandes wrote: > > > > On Feb 27, 2023, at 1:20 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Mon, Feb 27, 2023 at 01:15:47PM -0500, Joel Fernandes wrote: > >> > >> > >>>> On Feb 27, 2023, at 1:06 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > >>> > >>> On Mon, Feb 27, 2023 at 10:16:51AM -0500, Joel Fernandes wrote: > >>>>> On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@kernel.org> wrote: > >>>>> > >>>>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: > >>>>>> > >>>>>> > >>>>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> From: Joel Fernandes (Google) <joel@joelfernandes.org> > >>>>>>>> Sent: Saturday, February 25, 2023 11:34 AM > >>>>>>>> To: linux-kernel@vger.kernel.org > >>>>>>>> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > >>>>>>>> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > >>>>>>>> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > >>>>>>>> rcu@vger.kernel.org > >>>>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > >>>>>>>> completed > >>>>>>>> > >>>>>>>> On many systems, a great deal of boot happens after the kernel thinks the > >>>>>>>> boot has completed. It is difficult to determine if the system has really > >>>>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing > >>>>>>>> down boot time if, say, a callback has been added that the boot > >>>>>>>> synchronously depends on. > >>>>>>>> > >>>>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > >>>>>>>> stay expedited for as long as the system is still booting. > >>>>>>>> > >>>>>>>> For these reasons, this commit adds a config option > >>>>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > >>>>>>>> rcupdate.boot_end_delay. > >>>>>>>> > >>>>>>>> By default, this value is 20s. A system designer can choose to specify a value > >>>>>>>> here to keep RCU from marking boot completion. The boot sequence will not > >>>>>>>> be marked ended until at least boot_end_delay milliseconds have passed. > >>>>>>> > >>>>>>> Hi Joel, > >>>>>>> > >>>>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > >>>>>>> > >>>>>>> Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > >>>>>>> real-time latency than the overall OS boot time? > >>>>>> > >>>>>> But every system has to boot, even an RT system. > >>>>>> > >>>>>>> > >>>>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value > >>>>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > >>>>>> > >>>>>> Could you measure how much time your RT system takes to boot before the application runs? > >>>>>> > >>>>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. > >>>>> > >>>>> Provide a /sys location that the userspace code writes to when it > >>>>> is ready? Different systems with different hardware and software > >>>>> configurations are going to take different amounts of time to boot, > >>>>> correct? > >>>> > >>>> I could add a sysfs node, but I still wanted this patch as well > >>>> because I am wary of systems where yet more userspace changes are > >>>> required. I feel the kernel should itself be able to do this. Yes, it > >>>> is possible the system completes "booting" at a different time than > >>>> what the kernel thinks. But it does that anyway (even without this > >>>> patch), so I am not seeing a good reason to not do this in the kernel. > >>>> It is also only a minimum cap, so if the in-kernel boot takes too > >>>> long, then the patch will have no effect. > >>>> > >>>> Thoughts? > >>>> > >>> Why "rcu_boot_ended" is not enough? As i see right after that an "init" > >>> process or shell or panic is going to be invoked by the kernel. It basically > >>> indicates that a kernel is fully functional. > >>> > >>> Or an idea to wait even further? Until all kernel modules are loaded by > >>> user space. > >> > >> I mentioned in commit message it is daemons, userspace initialization etc. There is a lot of userspace booting up as well and using the kernel while doing so. > >> > >> So, It does not make sense to me to mark kernel as booted too early. And no harm in adding some builtin kernel hysteresis. What am I missing? > >> > > Than it is up to user space to decide when it is ready in terms of "boot completed". > > I dont know if you caught up with the other threads. See replies from Paul and my reply to that. > > Also what you are proposing can be more harmful. If user space has a bug and does not notify the kernel that boot completed, then the boot can stay incomplete forever. The idea with this patch is to make things better, not worse. > I saw that Paul proposed to have a sysfs attribute using which you can send a notification. IMHO, to me this patch does not provide a clear correlation between what is a boot complete and when it occurs. A boot complete is a synchronous event whereas the patch thinks that after some interval a "boot" is completed. We can imply that after, say 100 seconds an initialization of user space is done. Maybe 100 seconds then? :) -- Uladzislau Rezki
On Mon, Feb 27, 2023 at 1:57 PM Uladzislau Rezki <urezki@gmail.com> wrote: > > On Mon, Feb 27, 2023 at 01:27:20PM -0500, Joel Fernandes wrote: > > > > > > > On Feb 27, 2023, at 1:20 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > On Mon, Feb 27, 2023 at 01:15:47PM -0500, Joel Fernandes wrote: > > >> > > >> > > >>>> On Feb 27, 2023, at 1:06 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > >>> > > >>> On Mon, Feb 27, 2023 at 10:16:51AM -0500, Joel Fernandes wrote: > > >>>>> On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > >>>>> > > >>>>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: > > >>>>>> > > >>>>>> > > >>>>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > >>>>>>> > > >>>>>>> > > >>>>>>>> > > >>>>>>>> From: Joel Fernandes (Google) <joel@joelfernandes.org> > > >>>>>>>> Sent: Saturday, February 25, 2023 11:34 AM > > >>>>>>>> To: linux-kernel@vger.kernel.org > > >>>>>>>> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > > >>>>>>>> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > > >>>>>>>> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > > >>>>>>>> rcu@vger.kernel.org > > >>>>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > > >>>>>>>> completed > > >>>>>>>> > > >>>>>>>> On many systems, a great deal of boot happens after the kernel thinks the > > >>>>>>>> boot has completed. It is difficult to determine if the system has really > > >>>>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing > > >>>>>>>> down boot time if, say, a callback has been added that the boot > > >>>>>>>> synchronously depends on. > > >>>>>>>> > > >>>>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > > >>>>>>>> stay expedited for as long as the system is still booting. > > >>>>>>>> > > >>>>>>>> For these reasons, this commit adds a config option > > >>>>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > > >>>>>>>> rcupdate.boot_end_delay. > > >>>>>>>> > > >>>>>>>> By default, this value is 20s. A system designer can choose to specify a value > > >>>>>>>> here to keep RCU from marking boot completion. The boot sequence will not > > >>>>>>>> be marked ended until at least boot_end_delay milliseconds have passed. > > >>>>>>> > > >>>>>>> Hi Joel, > > >>>>>>> > > >>>>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > > >>>>>>> > > >>>>>>> Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > > >>>>>>> real-time latency than the overall OS boot time? > > >>>>>> > > >>>>>> But every system has to boot, even an RT system. > > >>>>>> > > >>>>>>> > > >>>>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value > > >>>>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > > >>>>>> > > >>>>>> Could you measure how much time your RT system takes to boot before the application runs? > > >>>>>> > > >>>>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. > > >>>>> > > >>>>> Provide a /sys location that the userspace code writes to when it > > >>>>> is ready? Different systems with different hardware and software > > >>>>> configurations are going to take different amounts of time to boot, > > >>>>> correct? > > >>>> > > >>>> I could add a sysfs node, but I still wanted this patch as well > > >>>> because I am wary of systems where yet more userspace changes are > > >>>> required. I feel the kernel should itself be able to do this. Yes, it > > >>>> is possible the system completes "booting" at a different time than > > >>>> what the kernel thinks. But it does that anyway (even without this > > >>>> patch), so I am not seeing a good reason to not do this in the kernel. > > >>>> It is also only a minimum cap, so if the in-kernel boot takes too > > >>>> long, then the patch will have no effect. > > >>>> > > >>>> Thoughts? > > >>>> > > >>> Why "rcu_boot_ended" is not enough? As i see right after that an "init" > > >>> process or shell or panic is going to be invoked by the kernel. It basically > > >>> indicates that a kernel is fully functional. > > >>> > > >>> Or an idea to wait even further? Until all kernel modules are loaded by > > >>> user space. > > >> > > >> I mentioned in commit message it is daemons, userspace initialization etc. There is a lot of userspace booting up as well and using the kernel while doing so. > > >> > > >> So, It does not make sense to me to mark kernel as booted too early. And no harm in adding some builtin kernel hysteresis. What am I missing? > > >> > > > Than it is up to user space to decide when it is ready in terms of "boot completed". > > > > I dont know if you caught up with the other threads. See replies from Paul and my reply to that. > > > > Also what you are proposing can be more harmful. If user space has a bug and does not notify the kernel that boot completed, then the boot can stay incomplete forever. The idea with this patch is to make things better, not worse. > > > I saw that Paul proposed to have a sysfs attribute using which you can > send a notification. Maybe I am missing something but how will a sysfs node on its own work really? 1. delete kernel marking itself boot completed -- and then sysfs marks it completed? 2. delete kernel marking itself boot completed -- and then sysfs marks it completed, if sysfs does not come in in N seconds, then kernel marks as completed? #1 is a no go, that just means a bug waiting to happen if userspace forgets to write to sysfs. #2 is just an extension of this patch. So I can add a sysfs node on top of this. And we can make the minimum time as a long period of time, as you noted below: > IMHO, to me this patch does not provide a clear correlation between what > is a boot complete and when it occurs. A boot complete is a synchronous > event whereas the patch thinks that after some interval a "boot" is completed. But that is exactly how the kernel code is now without this patch, so it is already broken in that sense, I am not really breaking it more ;-) > We can imply that after, say 100 seconds an initialization of user space > is done. Maybe 100 seconds then? :) Yes I am Ok with that. So are you suggesting we change the default to 100 seconds and then add a sysfs node to mark as boot done whenever userspace notifies? Thanks, - Joel
On Mon, Feb 27, 2023 at 02:10:30PM -0500, Joel Fernandes wrote: > On Mon, Feb 27, 2023 at 1:57 PM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Mon, Feb 27, 2023 at 01:27:20PM -0500, Joel Fernandes wrote: > > > > > > > > > > On Feb 27, 2023, at 1:20 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > On Mon, Feb 27, 2023 at 01:15:47PM -0500, Joel Fernandes wrote: > > > >> > > > >> > > > >>>> On Feb 27, 2023, at 1:06 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > > >>> > > > >>> On Mon, Feb 27, 2023 at 10:16:51AM -0500, Joel Fernandes wrote: > > > >>>>> On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > >>>>> > > > >>>>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: > > > >>>>>> > > > >>>>>> > > > >>>>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > >>>>>>> > > > >>>>>>> > > > >>>>>>>> > > > >>>>>>>> From: Joel Fernandes (Google) <joel@joelfernandes.org> > > > >>>>>>>> Sent: Saturday, February 25, 2023 11:34 AM > > > >>>>>>>> To: linux-kernel@vger.kernel.org > > > >>>>>>>> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > > > >>>>>>>> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > > > >>>>>>>> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > > > >>>>>>>> rcu@vger.kernel.org > > > >>>>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > > > >>>>>>>> completed > > > >>>>>>>> > > > >>>>>>>> On many systems, a great deal of boot happens after the kernel thinks the > > > >>>>>>>> boot has completed. It is difficult to determine if the system has really > > > >>>>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing > > > >>>>>>>> down boot time if, say, a callback has been added that the boot > > > >>>>>>>> synchronously depends on. > > > >>>>>>>> > > > >>>>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > > > >>>>>>>> stay expedited for as long as the system is still booting. > > > >>>>>>>> > > > >>>>>>>> For these reasons, this commit adds a config option > > > >>>>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > > > >>>>>>>> rcupdate.boot_end_delay. > > > >>>>>>>> > > > >>>>>>>> By default, this value is 20s. A system designer can choose to specify a value > > > >>>>>>>> here to keep RCU from marking boot completion. The boot sequence will not > > > >>>>>>>> be marked ended until at least boot_end_delay milliseconds have passed. > > > >>>>>>> > > > >>>>>>> Hi Joel, > > > >>>>>>> > > > >>>>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > > > >>>>>>> > > > >>>>>>> Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > > > >>>>>>> real-time latency than the overall OS boot time? > > > >>>>>> > > > >>>>>> But every system has to boot, even an RT system. > > > >>>>>> > > > >>>>>>> > > > >>>>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value > > > >>>>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > > > >>>>>> > > > >>>>>> Could you measure how much time your RT system takes to boot before the application runs? > > > >>>>>> > > > >>>>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. > > > >>>>> > > > >>>>> Provide a /sys location that the userspace code writes to when it > > > >>>>> is ready? Different systems with different hardware and software > > > >>>>> configurations are going to take different amounts of time to boot, > > > >>>>> correct? > > > >>>> > > > >>>> I could add a sysfs node, but I still wanted this patch as well > > > >>>> because I am wary of systems where yet more userspace changes are > > > >>>> required. I feel the kernel should itself be able to do this. Yes, it > > > >>>> is possible the system completes "booting" at a different time than > > > >>>> what the kernel thinks. But it does that anyway (even without this > > > >>>> patch), so I am not seeing a good reason to not do this in the kernel. > > > >>>> It is also only a minimum cap, so if the in-kernel boot takes too > > > >>>> long, then the patch will have no effect. > > > >>>> > > > >>>> Thoughts? > > > >>>> > > > >>> Why "rcu_boot_ended" is not enough? As i see right after that an "init" > > > >>> process or shell or panic is going to be invoked by the kernel. It basically > > > >>> indicates that a kernel is fully functional. > > > >>> > > > >>> Or an idea to wait even further? Until all kernel modules are loaded by > > > >>> user space. > > > >> > > > >> I mentioned in commit message it is daemons, userspace initialization etc. There is a lot of userspace booting up as well and using the kernel while doing so. > > > >> > > > >> So, It does not make sense to me to mark kernel as booted too early. And no harm in adding some builtin kernel hysteresis. What am I missing? > > > >> > > > > Than it is up to user space to decide when it is ready in terms of "boot completed". > > > > > > I dont know if you caught up with the other threads. See replies from Paul and my reply to that. > > > > > > Also what you are proposing can be more harmful. If user space has a bug and does not notify the kernel that boot completed, then the boot can stay incomplete forever. The idea with this patch is to make things better, not worse. > > > > > I saw that Paul proposed to have a sysfs attribute using which you can > > send a notification. > > Maybe I am missing something but how will a sysfs node on its own work really? > > 1. delete kernel marking itself boot completed -- and then sysfs > marks it completed? > > 2. delete kernel marking itself boot completed -- and then sysfs > marks it completed, if sysfs does not come in in N seconds, then > kernel marks as completed? > > #1 is a no go, that just means a bug waiting to happen if userspace > forgets to write to sysfs. > > #2 is just an extension of this patch. So I can add a sysfs node on > top of this. And we can make the minimum time as a long period of > time, as you noted below: > > > IMHO, to me this patch does not provide a clear correlation between what > > is a boot complete and when it occurs. A boot complete is a synchronous > > event whereas the patch thinks that after some interval a "boot" is completed. > > But that is exactly how the kernel code is now without this patch, so > it is already broken in that sense, I am not really breaking it more > ;-) > > > We can imply that after, say 100 seconds an initialization of user space > > is done. Maybe 100 seconds then? :) > > Yes I am Ok with that. So are you suggesting we change the default to > 100 seconds and then add a sysfs node to mark as boot done whenever > userspace notifies? The combination of sysfs manipulated by userspace and a kernel failsafe makes sense to me. Especially if by default triggering the failsafe splats. That way, bugs where userspace fails to update the sysfs file get caught. The non-default silent-failsafe mode is also useful to allow some power savings in advance of userspace getting the sysfs updating in place. And of course the default splatting setup can be used in internal testing with the release software being more tolerant of userspace foibles. Thanx, Paul
On Mon, Feb 27, 2023 at 6:05 PM Paul E. McKenney <paulmck@kernel.org> wrote: [...] > > > > >>>>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: > > > > >>>>>> > > > > >>>>>> > > > > >>>>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> From: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > >>>>>>>> Sent: Saturday, February 25, 2023 11:34 AM > > > > >>>>>>>> To: linux-kernel@vger.kernel.org > > > > >>>>>>>> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > > > > >>>>>>>> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > > > > >>>>>>>> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > > > > >>>>>>>> rcu@vger.kernel.org > > > > >>>>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > > > > >>>>>>>> completed > > > > >>>>>>>> > > > > >>>>>>>> On many systems, a great deal of boot happens after the kernel thinks the > > > > >>>>>>>> boot has completed. It is difficult to determine if the system has really > > > > >>>>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing > > > > >>>>>>>> down boot time if, say, a callback has been added that the boot > > > > >>>>>>>> synchronously depends on. > > > > >>>>>>>> > > > > >>>>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > > > > >>>>>>>> stay expedited for as long as the system is still booting. > > > > >>>>>>>> > > > > >>>>>>>> For these reasons, this commit adds a config option > > > > >>>>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > > > > >>>>>>>> rcupdate.boot_end_delay. > > > > >>>>>>>> > > > > >>>>>>>> By default, this value is 20s. A system designer can choose to specify a value > > > > >>>>>>>> here to keep RCU from marking boot completion. The boot sequence will not > > > > >>>>>>>> be marked ended until at least boot_end_delay milliseconds have passed. > > > > >>>>>>> > > > > >>>>>>> Hi Joel, > > > > >>>>>>> > > > > >>>>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > > > > >>>>>>> > > > > >>>>>>> Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > > > > >>>>>>> real-time latency than the overall OS boot time? > > > > >>>>>> > > > > >>>>>> But every system has to boot, even an RT system. > > > > >>>>>> > > > > >>>>>>> > > > > >>>>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value > > > > >>>>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > > > > >>>>>> > > > > >>>>>> Could you measure how much time your RT system takes to boot before the application runs? > > > > >>>>>> > > > > >>>>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. > > > > >>>>> > > > > >>>>> Provide a /sys location that the userspace code writes to when it > > > > >>>>> is ready? Different systems with different hardware and software > > > > >>>>> configurations are going to take different amounts of time to boot, > > > > >>>>> correct? > > > > >>>> > > > > >>>> I could add a sysfs node, but I still wanted this patch as well > > > > >>>> because I am wary of systems where yet more userspace changes are > > > > >>>> required. I feel the kernel should itself be able to do this. Yes, it > > > > >>>> is possible the system completes "booting" at a different time than > > > > >>>> what the kernel thinks. But it does that anyway (even without this > > > > >>>> patch), so I am not seeing a good reason to not do this in the kernel. > > > > >>>> It is also only a minimum cap, so if the in-kernel boot takes too > > > > >>>> long, then the patch will have no effect. > > > > >>>> > > > > >>>> Thoughts? > > > > >>>> > > > > >>> Why "rcu_boot_ended" is not enough? As i see right after that an "init" > > > > >>> process or shell or panic is going to be invoked by the kernel. It basically > > > > >>> indicates that a kernel is fully functional. > > > > >>> > > > > >>> Or an idea to wait even further? Until all kernel modules are loaded by > > > > >>> user space. > > > > >> > > > > >> I mentioned in commit message it is daemons, userspace initialization etc. There is a lot of userspace booting up as well and using the kernel while doing so. > > > > >> > > > > >> So, It does not make sense to me to mark kernel as booted too early. And no harm in adding some builtin kernel hysteresis. What am I missing? > > > > >> > > > > > Than it is up to user space to decide when it is ready in terms of "boot completed". > > > > > > > > I dont know if you caught up with the other threads. See replies from Paul and my reply to that. > > > > > > > > Also what you are proposing can be more harmful. If user space has a bug and does not notify the kernel that boot completed, then the boot can stay incomplete forever. The idea with this patch is to make things better, not worse. > > > > > > > I saw that Paul proposed to have a sysfs attribute using which you can > > > send a notification. > > > > Maybe I am missing something but how will a sysfs node on its own work really? > > > > 1. delete kernel marking itself boot completed -- and then sysfs > > marks it completed? > > > > 2. delete kernel marking itself boot completed -- and then sysfs > > marks it completed, if sysfs does not come in in N seconds, then > > kernel marks as completed? > > > > #1 is a no go, that just means a bug waiting to happen if userspace > > forgets to write to sysfs. > > > > #2 is just an extension of this patch. So I can add a sysfs node on > > top of this. And we can make the minimum time as a long period of > > time, as you noted below: > > > > > IMHO, to me this patch does not provide a clear correlation between what > > > is a boot complete and when it occurs. A boot complete is a synchronous > > > event whereas the patch thinks that after some interval a "boot" is completed. > > > > But that is exactly how the kernel code is now without this patch, so > > it is already broken in that sense, I am not really breaking it more > > ;-) > > > > > We can imply that after, say 100 seconds an initialization of user space > > > is done. Maybe 100 seconds then? :) > > > > Yes I am Ok with that. So are you suggesting we change the default to > > 100 seconds and then add a sysfs node to mark as boot done whenever > > userspace notifies? > > The combination of sysfs manipulated by userspace and a kernel failsafe > makes sense to me. Especially if by default triggering the failsafe > splats. That way, bugs where userspace fails to update the sysfs file > get caught. By splat, if we could do an "info" message, that would work for me instead of a WARN_ON. I'm afraid of Android and other folks who upgrade to the new kernel only to now have to go patch userspace. So, pr_info("RCU is still in boot-mode for the next N seconds, please consider writing X to /sys/.. to avoid this message."); ? > The non-default silent-failsafe mode is also useful to allow some power > savings in advance of userspace getting the sysfs updating in place. > And of course the default splatting setup can be used in internal testing > with the release software being more tolerant of userspace foibles. Sounds good, would 100 seconds be a good fail-safe trigger value? Thanks, - Joel
On Mon, Feb 27, 2023 at 03:05:02PM -0800, Paul E. McKenney wrote: > On Mon, Feb 27, 2023 at 02:10:30PM -0500, Joel Fernandes wrote: > > The combination of sysfs manipulated by userspace and a kernel failsafe > makes sense to me. Especially if by default triggering the failsafe > splats. That way, bugs where userspace fails to update the sysfs file > get caught. > > The non-default silent-failsafe mode is also useful to allow some power > savings in advance of userspace getting the sysfs updating in place. > And of course the default splatting setup can be used in internal testing > with the release software being more tolerant of userspace foibles. I'm wondering, this is all about CONFIG_RCU_LAZY, right? Or does also expedited GP turned off a bit early or late on boot matter for anybody in practice? So shouldn't we disable lazy callbacks by default when CONFIG_RCU_LAZY=y and then turn it on with "sysctl kernel.rcu.lazy=1" only whenever userspace feels ready about it? We can still keep the current call to rcu_end_inkernel_boot(). And if suddenly disabling lazy by default is an ABI breakage we can still add CONFIG_RCU_LAZY_DEFAULT_DISABLED.
On Tue, Feb 28, 2023 at 12:40:38AM +0100, Frederic Weisbecker wrote: > On Mon, Feb 27, 2023 at 03:05:02PM -0800, Paul E. McKenney wrote: > > On Mon, Feb 27, 2023 at 02:10:30PM -0500, Joel Fernandes wrote: > > > > The combination of sysfs manipulated by userspace and a kernel failsafe > > makes sense to me. Especially if by default triggering the failsafe > > splats. That way, bugs where userspace fails to update the sysfs file > > get caught. > > > > The non-default silent-failsafe mode is also useful to allow some power > > savings in advance of userspace getting the sysfs updating in place. > > And of course the default splatting setup can be used in internal testing > > with the release software being more tolerant of userspace foibles. > > I'm wondering, this is all about CONFIG_RCU_LAZY, right? Or does also expedited > GP turned off a bit early or late on boot matter for anybody in practice? Yes, if you provide 'rcu_normal_after_boot', then after the boot ends, it switches expedited GPs to normal ones. It is the same issue for expedited, the kernel's version of what is 'boot' is much shorter than what is actually boot. This is also the case with suspend/resume's rcu_pm_notify(). See the comment: /* * On non-huge systems, use expedited RCU grace periods to make suspend * and hibernation run faster. */ There also we turn on/off both lazy and expedited. I don't see why we shouldn't do it for boot. > So shouldn't we disable lazy callbacks by default when CONFIG_RCU_LAZY=y and then > turn it on with "sysctl kernel.rcu.lazy=1" only whenever userspace feels ready > about it? We can still keep the current call to rcu_end_inkernel_boot(). Hmm IMHO that would add more knobs for not much reason honestly. We already have CONFIG_RCU_LAZY default disabled, I really don't want to add more dependency (like user enables the config and does not see laziness). Thanks! - Joel
> From: Joel Fernandes <joel@joelfernandes.org> > Sent: Monday, February 27, 2023 9:22 PM > To: Zhuo, Qiuxu <qiuxu.zhuo@intel.com> > Cc: linux-kernel@vger.kernel.org; Frederic Weisbecker <frederic@kernel.org>; > Lai Jiangshan <jiangshanlai@gmail.com>; linux-doc@vger.kernel.org; Paul E. > McKenney <paulmck@kernel.org>; rcu@vger.kernel.org > Subject: Re: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > completed > > > > > On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > > > >> > >> From: Joel Fernandes (Google) <joel@joelfernandes.org> > >> Sent: Saturday, February 25, 2023 11:34 AM > >> To: linux-kernel@vger.kernel.org > >> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic > >> Weisbecker <frederic@kernel.org>; Lai Jiangshan > >> <jiangshanlai@gmail.com>; linux- doc@vger.kernel.org; Paul E. > >> McKenney <paulmck@kernel.org>; rcu@vger.kernel.org > >> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > >> completed > >> > >> On many systems, a great deal of boot happens after the kernel thinks > >> the boot has completed. It is difficult to determine if the system > >> has really booted from the kernel side. Some features like lazy-RCU > >> can risk slowing down boot time if, say, a callback has been added > >> that the boot synchronously depends on. > >> > >> Further, it is better to boot systems which pass > >> 'rcu_normal_after_boot' to stay expedited for as long as the system is still > booting. > >> > >> For these reasons, this commit adds a config option > >> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > >> rcupdate.boot_end_delay. > >> > >> By default, this value is 20s. A system designer can choose to > >> specify a value here to keep RCU from marking boot completion. The > >> boot sequence will not be marked ended until at least boot_end_delay > milliseconds have passed. > > > > Hi Joel, > > > > Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > > > > Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > > real-time latency than the overall OS boot time? > > But every system has to boot, even an RT system. Yes, this is true. > > > > If so, we might make rcupdate.boot_end_delay = 0 as the default value > > (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > > Could you measure how much time your RT system takes to boot before the > application runs? I don't have a real-time OS environment to measure the OS boot time. I tried to measure the OS boot time of my "CentOS Stream 8" w/o and w/ Joel’s patch. My testing showed the positive result that the OS boot time was reduced by ~4.6% on my side after applying Joel’s patch. For testing details, please see the below: 1) Testing environment: OS : CentOS Stream 8 (non-RT OS) Kernel : v6.2 Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads) Qemu args : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, … 2) My OS boot time definition: The time from the start of the kernel boot to the shell command line prompt is shown from the console. [ Different people may have different OS boot time definitions. ] 3) My measurement method (very rough method): A timer in the kernel periodically prints the boot time every 100ms. As soon as the shell command line prompt is shown from the console, we record the boot time printed by the timer, then the printed boot time is the OS boot time. The console log (mixed userspace and kernel logs) looked like this: [ OK ] Started Permit User Sessions. Starting Terminate Plymouth Boot Screen... Starting Hold until boot process finishes up... [ OK ] Started Command Scheduler. [ 6.824466] input: ImExPS/2 Generic Explorer ... [ 6.884685] Boot ms 6863 ... [ 7.170920] Spectre V2 : WARNING: Unprivileged eBPF ... [ 7.173140] Spectre V2 : WARNING: Unprivileged eBPF ... [ 7.196741] Boot ms 7175 ... [ 8.236757] Boot ms 8215 CentOS Stream 8 Kernel 6.2.0-rcu+ on an x86_64 login: [ 8.340751] Boot ms 8319 [ 8.444756] Boot ms 8423 ... Then the log "login: [ 8.340751] Boot ms 8319" roughly showed the OS boot time was ~8.3s. 4) Measured OS boot time (in seconds) a) Measured 10 times w/o Joel's patch: 8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s The average OS boot time was: ~8.7s b) Measure 10 times w/ Joel's patch: 8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s The average OS boot time was: ~8.3s. The OS boot time was reduced by : 8.7 – 8.3 = 0.4 second The reduction percentage was : 0.4/8.7 * 100% = 4.6% If the testing above makes sense to you, please feel free to add Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Thanks! -Qiuxu > I can change it to default 0 essentially NOOPing it, but I would rather have a > saner default (10 seconds even), than having someone forget to tune this for > their system. > > Thanks, > > - Joel
On Tue, Feb 28, 2023 at 01:30:25AM +0000, Joel Fernandes wrote: > On Tue, Feb 28, 2023 at 12:40:38AM +0100, Frederic Weisbecker wrote: > > On Mon, Feb 27, 2023 at 03:05:02PM -0800, Paul E. McKenney wrote: > > > On Mon, Feb 27, 2023 at 02:10:30PM -0500, Joel Fernandes wrote: > > > > > > The combination of sysfs manipulated by userspace and a kernel failsafe > > > makes sense to me. Especially if by default triggering the failsafe > > > splats. That way, bugs where userspace fails to update the sysfs file > > > get caught. > > > > > > The non-default silent-failsafe mode is also useful to allow some power > > > savings in advance of userspace getting the sysfs updating in place. > > > And of course the default splatting setup can be used in internal testing > > > with the release software being more tolerant of userspace foibles. > > > > I'm wondering, this is all about CONFIG_RCU_LAZY, right? Or does also expedited > > GP turned off a bit early or late on boot matter for anybody in practice? > > Yes, if you provide 'rcu_normal_after_boot', then after the boot ends, it > switches expedited GPs to normal ones. > > It is the same issue for expedited, the kernel's version of what is 'boot' is > much shorter than what is actually boot. > > This is also the case with suspend/resume's rcu_pm_notify(). See the comment: > /* > * On non-huge systems, use expedited RCU grace periods to make suspend > * and hibernation run faster. > */ > > There also we turn on/off both lazy and expedited. I don't see why we > shouldn't do it for boot. Of course but I mean currently rcu_end_inkernel_boot() is called explicitly before the kernel calls init. From that point on, what is the source of the issue? Delaying lazy further would be enough or do we really need to delay forcing expedited as well? Or is it the reverse: delaying expedited further would matter and lazy doesn't play much role from there. It matters to know because if delaying expedited further is enough, then indeed we must delay the call to rcu_end_inkernel_boot() somehow. But if delaying expedited further doesn't matter and delaying lazy matter then it's possible that the issue is a callback that should be marked as call_rcu_hurry() and then the source of the problem is much broader. I think the confusion comes from the fact that your changelog doesn't state precisely what the problem exactly is. Also do we need to wait for the kernel boot completion? And if so what is missing from kernel boot after the current explicit call to rcu_end_inkernel_boot()? Or do we also need to wait for userspace to complete the boot? Different problems, different solutions. But in any case a countdown is not a way to go. Consider that rcu_lazy may be used by a larger audience than just chromium in the long run. You can not ask every admin to provide his own estimation per type of machine. You can't either rely on a long default value because that may have bad impact on workload assumptions launched right after boot. > > > So shouldn't we disable lazy callbacks by default when CONFIG_RCU_LAZY=y and then > > turn it on with "sysctl kernel.rcu.lazy=1" only whenever userspace feels ready > > about it? We can still keep the current call to rcu_end_inkernel_boot(). > > Hmm IMHO that would add more knobs for not much reason honestly. We already > have CONFIG_RCU_LAZY default disabled, I really don't want to add more > dependency (like user enables the config and does not see laziness). I don't know. Like I said, different problems, different solutions. Let's identify what the issue is precisely. For example can we expect that the issues on boot can be a problem also on some temporary workloads? Besides I'm currently testing a very hacky flavour of rcu_lazy and so far it shows many idle calls that would have been delayed if callbacks weren't queued as lazy. I have yet to do actual energy and performance measurements but if it happens to show improvements, I suspect distros will want a supported yet default disabled Kconfig that can be turned on on boot or later. Of course we are not there yet but things to keep in mind... Thanks.
> On Mon, Feb 27, 2023 at 1:57 PM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Mon, Feb 27, 2023 at 01:27:20PM -0500, Joel Fernandes wrote: > > > > > > > > > > On Feb 27, 2023, at 1:20 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > On Mon, Feb 27, 2023 at 01:15:47PM -0500, Joel Fernandes wrote: > > > >> > > > >> > > > >>>> On Feb 27, 2023, at 1:06 PM, Uladzislau Rezki <urezki@gmail.com> wrote: > > > >>> > > > >>> On Mon, Feb 27, 2023 at 10:16:51AM -0500, Joel Fernandes wrote: > > > >>>>> On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > >>>>> > > > >>>>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote: > > > >>>>>> > > > >>>>>> > > > >>>>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > >>>>>>> > > > >>>>>>> > > > >>>>>>>> > > > >>>>>>>> From: Joel Fernandes (Google) <joel@joelfernandes.org> > > > >>>>>>>> Sent: Saturday, February 25, 2023 11:34 AM > > > >>>>>>>> To: linux-kernel@vger.kernel.org > > > >>>>>>>> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>; Frederic Weisbecker > > > >>>>>>>> <frederic@kernel.org>; Lai Jiangshan <jiangshanlai@gmail.com>; linux- > > > >>>>>>>> doc@vger.kernel.org; Paul E. McKenney <paulmck@kernel.org>; > > > >>>>>>>> rcu@vger.kernel.org > > > >>>>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > > > >>>>>>>> completed > > > >>>>>>>> > > > >>>>>>>> On many systems, a great deal of boot happens after the kernel thinks the > > > >>>>>>>> boot has completed. It is difficult to determine if the system has really > > > >>>>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing > > > >>>>>>>> down boot time if, say, a callback has been added that the boot > > > >>>>>>>> synchronously depends on. > > > >>>>>>>> > > > >>>>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to > > > >>>>>>>> stay expedited for as long as the system is still booting. > > > >>>>>>>> > > > >>>>>>>> For these reasons, this commit adds a config option > > > >>>>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter > > > >>>>>>>> rcupdate.boot_end_delay. > > > >>>>>>>> > > > >>>>>>>> By default, this value is 20s. A system designer can choose to specify a value > > > >>>>>>>> here to keep RCU from marking boot completion. The boot sequence will not > > > >>>>>>>> be marked ended until at least boot_end_delay milliseconds have passed. > > > >>>>>>> > > > >>>>>>> Hi Joel, > > > >>>>>>> > > > >>>>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-). > > > >>>>>>> > > > >>>>>>> Does the OS with CONFIG_PREEMPT_RT=y kernel concern more about the > > > >>>>>>> real-time latency than the overall OS boot time? > > > >>>>>> > > > >>>>>> But every system has to boot, even an RT system. > > > >>>>>> > > > >>>>>>> > > > >>>>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value > > > >>>>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > > > >>>>>> > > > >>>>>> Could you measure how much time your RT system takes to boot before the application runs? > > > >>>>>> > > > >>>>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system. > > > >>>>> > > > >>>>> Provide a /sys location that the userspace code writes to when it > > > >>>>> is ready? Different systems with different hardware and software > > > >>>>> configurations are going to take different amounts of time to boot, > > > >>>>> correct? > > > >>>> > > > >>>> I could add a sysfs node, but I still wanted this patch as well > > > >>>> because I am wary of systems where yet more userspace changes are > > > >>>> required. I feel the kernel should itself be able to do this. Yes, it > > > >>>> is possible the system completes "booting" at a different time than > > > >>>> what the kernel thinks. But it does that anyway (even without this > > > >>>> patch), so I am not seeing a good reason to not do this in the kernel. > > > >>>> It is also only a minimum cap, so if the in-kernel boot takes too > > > >>>> long, then the patch will have no effect. > > > >>>> > > > >>>> Thoughts? > > > >>>> > > > >>> Why "rcu_boot_ended" is not enough? As i see right after that an "init" > > > >>> process or shell or panic is going to be invoked by the kernel. It basically > > > >>> indicates that a kernel is fully functional. > > > >>> > > > >>> Or an idea to wait even further? Until all kernel modules are loaded by > > > >>> user space. > > > >> > > > >> I mentioned in commit message it is daemons, userspace initialization etc. There is a lot of userspace booting up as well and using the kernel while doing so. > > > >> > > > >> So, It does not make sense to me to mark kernel as booted too early. And no harm in adding some builtin kernel hysteresis. What am I missing? > > > >> > > > > Than it is up to user space to decide when it is ready in terms of "boot completed". > > > > > > I dont know if you caught up with the other threads. See replies from Paul and my reply to that. > > > > > > Also what you are proposing can be more harmful. If user space has a bug and does not notify the kernel that boot completed, then the boot can stay incomplete forever. The idea with this patch is to make things better, not worse. > > > > > I saw that Paul proposed to have a sysfs attribute using which you can > > send a notification. > > Maybe I am missing something but how will a sysfs node on its own work really? > > 1. delete kernel marking itself boot completed -- and then sysfs > marks it completed? > > 2. delete kernel marking itself boot completed -- and then sysfs > marks it completed, if sysfs does not come in in N seconds, then > kernel marks as completed? > > #1 is a no go, that just means a bug waiting to happen if userspace > forgets to write to sysfs. > > #2 is just an extension of this patch. So I can add a sysfs node on > top of this. And we can make the minimum time as a long period of > time, as you noted below: > > > IMHO, to me this patch does not provide a clear correlation between what > > is a boot complete and when it occurs. A boot complete is a synchronous > > event whereas the patch thinks that after some interval a "boot" is completed. > > But that is exactly how the kernel code is now without this patch, so > it is already broken in that sense, I am not really breaking it more > ;-) > > > We can imply that after, say 100 seconds an initialization of user space > > is done. Maybe 100 seconds then? :) > > Yes I am Ok with that. So are you suggesting we change the default to > 100 seconds and then add a sysfs node to mark as boot done whenever > userspace notifies? > As i see it correctly the patch tries to address at least two issues. Fist one is about lazy callbacks. If it stucks the boot-time is degraded so you want it to be disabled during the boot. I wonder why not keeping lazy stuff to be disabled by __default__. User decides if it is needed or not? No matter if user space hangs or whatever. Or is it hard to add such runtime switcher? Second you want to have an expidited gp during a boot. I guess same here it is about boot performance. Should not it be as a separate patch with different commit message and with some data if you see a speedup? Thanks! -- Uladzislau Rezki
On Tue, Feb 28, 2023 at 1:40 AM Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > [...] > > > If so, we might make rcupdate.boot_end_delay = 0 as the default value > > > (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels? > > > > Could you measure how much time your RT system takes to boot before the > > application runs? > > I don't have a real-time OS environment to measure the OS boot time. > I tried to measure the OS boot time of my "CentOS Stream 8" w/o and > w/ Joel’s patch. > > My testing showed the positive result that the OS boot time was > reduced by ~4.6% on my side after applying Joel’s patch. Wow, this is great! I am guessing you have CONFIG_RCU_LAZY disabled, when you tested. If so, that is great news that expediting RCU for a bit longer improves boot time! Please confirm that your config had LAZY disabled. > For testing details, please see the below: > > 1) Testing environment: > OS : CentOS Stream 8 (non-RT OS) > Kernel : v6.2 > Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads) > Qemu args : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, … > > 2) My OS boot time definition: > The time from the start of the kernel boot to the shell command line > prompt is shown from the console. [ Different people may have > different OS boot time definitions. ] > > 3) My measurement method (very rough method): > A timer in the kernel periodically prints the boot time every 100ms. > As soon as the shell command line prompt is shown from the console, > we record the boot time printed by the timer, then the printed boot > time is the OS boot time. Hmm, Can you not just print the boot time from userspace using clock_gettime() and CLOCK_BOOTTIME? But yeah either way, good data! > The console log (mixed userspace and kernel logs) looked like this: > > [ OK ] Started Permit User Sessions. > Starting Terminate Plymouth Boot Screen... > Starting Hold until boot process finishes up... > [ OK ] Started Command Scheduler. > [ 6.824466] input: ImExPS/2 Generic Explorer ... > [ 6.884685] Boot ms 6863 > ... > [ 7.170920] Spectre V2 : WARNING: Unprivileged eBPF ... > [ 7.173140] Spectre V2 : WARNING: Unprivileged eBPF ... > [ 7.196741] Boot ms 7175 > ... > [ 8.236757] Boot ms 8215 > > CentOS Stream 8 > Kernel 6.2.0-rcu+ on an x86_64 > > login: [ 8.340751] Boot ms 8319 > [ 8.444756] Boot ms 8423 > ... > > Then the log "login: [ 8.340751] Boot ms 8319" roughly showed the OS boot time was ~8.3s. > > 4) Measured OS boot time (in seconds) > a) Measured 10 times w/o Joel's patch: > 8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s > The average OS boot time was: ~8.7s > > b) Measure 10 times w/ Joel's patch: > 8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s > The average OS boot time was: ~8.3s. > > The OS boot time was reduced by : 8.7 – 8.3 = 0.4 second > The reduction percentage was : 0.4/8.7 * 100% = 4.6% > > If the testing above makes sense to you, please feel free to > add > > Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Yes, it makes sense. I will add these. Thanks! - Joel > > Thanks! > -Qiuxu > > > I can change it to default 0 essentially NOOPing it, but I would rather have a > > saner default (10 seconds even), than having someone forget to tune this for > > their system. > > > > Thanks, > > > > - Joel >
Hi Frederic, On Tue, Feb 28, 2023 at 12:04:36PM +0100, Frederic Weisbecker wrote: > On Tue, Feb 28, 2023 at 01:30:25AM +0000, Joel Fernandes wrote: > > On Tue, Feb 28, 2023 at 12:40:38AM +0100, Frederic Weisbecker wrote: > > > On Mon, Feb 27, 2023 at 03:05:02PM -0800, Paul E. McKenney wrote: > > > > On Mon, Feb 27, 2023 at 02:10:30PM -0500, Joel Fernandes wrote: > > > > > > > > The combination of sysfs manipulated by userspace and a kernel failsafe > > > > makes sense to me. Especially if by default triggering the failsafe > > > > splats. That way, bugs where userspace fails to update the sysfs file > > > > get caught. > > > > > > > > The non-default silent-failsafe mode is also useful to allow some power > > > > savings in advance of userspace getting the sysfs updating in place. > > > > And of course the default splatting setup can be used in internal testing > > > > with the release software being more tolerant of userspace foibles. > > > > > > I'm wondering, this is all about CONFIG_RCU_LAZY, right? Or does also expedited > > > GP turned off a bit early or late on boot matter for anybody in practice? > > > > Yes, if you provide 'rcu_normal_after_boot', then after the boot ends, it > > switches expedited GPs to normal ones. > > > > It is the same issue for expedited, the kernel's version of what is 'boot' is > > much shorter than what is actually boot. > > > > This is also the case with suspend/resume's rcu_pm_notify(). See the comment: > > /* > > * On non-huge systems, use expedited RCU grace periods to make suspend > > * and hibernation run faster. > > */ > > > > There also we turn on/off both lazy and expedited. I don't see why we > > shouldn't do it for boot. > > Of course but I mean currently rcu_end_inkernel_boot() is called explicitly > before the kernel calls init. From that point on, what is the source of the > issue? Delaying lazy further would be enough or do we really need to delay > forcing expedited as well? Or is it the reverse: delaying expedited further > would matter and lazy doesn't play much role from there. Both should play a role. For lazy, we found callbacks that showed later in the full boot sequence (like the SCSI issue). For expedited, there is new data from Qiuxu showing 5% improvement in boot time. > It matters to know because if delaying expedited further is enough, then indeed > we must delay the call to rcu_end_inkernel_boot() somehow. But if delaying > expedited further doesn't matter and delaying lazy matter then it's possible > that the issue is a callback that should be marked as call_rcu_hurry() and then > the source of the problem is much broader. Right, and we also don't know if in the future, somebody queues a CB that slows down boot as well (say they queue a lazy CB that does a wakeup), even if currently there are not any such. As noted, that SCSI issue did show. Just to note, callbacks doing wakeups are supposed to call call_rcu_hurry(). > I think the confusion comes from the fact that your changelog doesn't state precisely > what the problem exactly is. Also do we need to wait for the kernel boot completion? > And if so what is missing from kernel boot after the current explicit call to > rcu_end_inkernel_boot()? Yes, sorry, it was more an RFC but still should have been more clear. For the v3 I'll definitely make it clear. rcu_end_inkernel_boot() is called before init is run. But the kernel cannot posibly know when init has finished running and say the system is now waiting for user login, or something. There's a considerable amount time from rcu_end_inkernel_boot() to when the system is actually "booted". That's the main issue. We could look at CPU load, but that's not ideal. Maybe wait for user input, but that sucks as well. > Or do we also need to wait for userspace to complete the boot? Different > problems, different solutions. > > But in any case a countdown is not a way to go. Consider that rcu_lazy may > be used by a larger audience than just chromium in the long run. You can not > ask every admin to provide his own estimation per type of machine. You can't > either rely on a long default value because that may have bad impact on > workload assumptions launched right after boot. Hmmm I see what you mean, so a conservative and configurable "fail-safe" timeout followed by sysctl to end the boot earlier than the timeout, should do it (something like 30 seconds IMHO sounds reasonable)? In any case, whatever way we go, we would not end the kernel boot before rcu_end_inkernel_boot() is called at least once (which is the current behavior). So it would be: low level boot + initcalls 20 sec 30 second timeout |------------------------------|-------------------------- | | old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() But it could be, if user decides: low level boot + initcalls 20 sec 10 second timeout |------------------------------|-------------------------- | | old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() via /sys/ entry. > > > So shouldn't we disable lazy callbacks by default when CONFIG_RCU_LAZY=y and then > > > turn it on with "sysctl kernel.rcu.lazy=1" only whenever userspace feels ready > > > about it? We can still keep the current call to rcu_end_inkernel_boot(). > > > > Hmm IMHO that would add more knobs for not much reason honestly. We already > > have CONFIG_RCU_LAZY default disabled, I really don't want to add more > > dependency (like user enables the config and does not see laziness). > > I don't know. Like I said, different problems, different solutions. Let's > identify what the issue is precisely. For example can we expect that the issues > on boot can be a problem also on some temporary workloads? > > Besides I'm currently testing a very hacky flavour of rcu_lazy and so far it > shows many idle calls that would have been delayed if callbacks weren't queued > as lazy. Can you provide more details? What kind of hack flavor, and what is it doing? thanks, - Joel > I have yet to do actual energy and performance measurements but if it > happens to show improvements, I suspect distros will want a supported yet > default disabled Kconfig that can be turned on on boot or later. Of course we > are not there yet but things to keep in mind... > > Thanks.
> From: Joel Fernandes <joel@joelfernandes.org> > Sent: Tuesday, February 28, 2023 10:27 PM > To: Zhuo, Qiuxu <qiuxu.zhuo@intel.com> > Cc: linux-kernel@vger.kernel.org; Frederic Weisbecker <frederic@kernel.org>; > Lai Jiangshan <jiangshanlai@gmail.com>; linux-doc@vger.kernel.org; Paul E. > McKenney <paulmck@kernel.org>; rcu@vger.kernel.org > Subject: Re: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > completed > > On Tue, Feb 28, 2023 at 1:40 AM Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > [...] > > My testing showed the positive result that the OS boot time was > > reduced by ~4.6% on my side after applying Joel’s patch. > > Wow, this is great! I am guessing you have CONFIG_RCU_LAZY disabled, > when you tested. If so, that is great news that expediting RCU for a bit longer > improves boot time! Please confirm that your config had LAZY disabled. > I confirm that CONFIG_RCU_LAZY kernel configuration was disabled in my testing. All the kernel configurations for RCU used in my testing are as below: # # RCU Subsystem # CONFIG_TREE_RCU=y CONFIG_PREEMPT_RCU=y CONFIG_RCU_EXPERT=y CONFIG_SRCU=y CONFIG_TREE_SRCU=y CONFIG_TASKS_RCU_GENERIC=y # CONFIG_FORCE_TASKS_RCU is not set CONFIG_TASKS_RCU=y # CONFIG_FORCE_TASKS_RUDE_RCU is not set # CONFIG_FORCE_TASKS_TRACE_RCU is not set CONFIG_TASKS_TRACE_RCU=y CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_NEED_SEGCBLIST=y CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANOUT_LEAF=16 # CONFIG_RCU_BOOST is not set CONFIG_RCU_BOOT_END_DELAY=20000 CONFIG_RCU_NOCB_CPU=y # CONFIG_RCU_NOCB_CPU_DEFAULT_ALL is not set # CONFIG_TASKS_TRACE_RCU_READ_MB is not set # CONFIG_RCU_LAZY is not set # end of RCU Subsystem -Qiuxu
On Tue, Feb 28, 2023 at 8:34 PM Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > From: Joel Fernandes <joel@joelfernandes.org> > > Sent: Tuesday, February 28, 2023 10:27 PM > > To: Zhuo, Qiuxu <qiuxu.zhuo@intel.com> > > Cc: linux-kernel@vger.kernel.org; Frederic Weisbecker <frederic@kernel.org>; > > Lai Jiangshan <jiangshanlai@gmail.com>; linux-doc@vger.kernel.org; Paul E. > > McKenney <paulmck@kernel.org>; rcu@vger.kernel.org > > Subject: Re: [PATCH RFC v2] rcu: Add a minimum time for marking boot as > > completed > > > > On Tue, Feb 28, 2023 at 1:40 AM Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote: > > > > > [...] > > > My testing showed the positive result that the OS boot time was > > > reduced by ~4.6% on my side after applying Joel’s patch. > > > > Wow, this is great! I am guessing you have CONFIG_RCU_LAZY disabled, > > when you tested. If so, that is great news that expediting RCU for a bit longer > > improves boot time! Please confirm that your config had LAZY disabled. > > > > I confirm that CONFIG_RCU_LAZY kernel configuration was disabled in my testing. > All the kernel configurations for RCU used in my testing are as below: Thank you! Good to know there is improvement in *existing code*, that's the power of upstream... new features fix/improve old ones sometimes. Any case, I will include all your data in v3. Thanks! - Joel > > # > # RCU Subsystem > # > CONFIG_TREE_RCU=y > CONFIG_PREEMPT_RCU=y > CONFIG_RCU_EXPERT=y > CONFIG_SRCU=y > CONFIG_TREE_SRCU=y > CONFIG_TASKS_RCU_GENERIC=y > # CONFIG_FORCE_TASKS_RCU is not set > CONFIG_TASKS_RCU=y > # CONFIG_FORCE_TASKS_RUDE_RCU is not set > # CONFIG_FORCE_TASKS_TRACE_RCU is not set > CONFIG_TASKS_TRACE_RCU=y > CONFIG_RCU_STALL_COMMON=y > CONFIG_RCU_NEED_SEGCBLIST=y > CONFIG_RCU_FANOUT=32 > CONFIG_RCU_FANOUT_LEAF=16 > # CONFIG_RCU_BOOST is not set > CONFIG_RCU_BOOT_END_DELAY=20000 > CONFIG_RCU_NOCB_CPU=y > # CONFIG_RCU_NOCB_CPU_DEFAULT_ALL is not set > # CONFIG_TASKS_TRACE_RCU_READ_MB is not set > # CONFIG_RCU_LAZY is not set > # end of RCU Subsystem > > -Qiuxu
On Tue, Feb 28, 2023 at 08:09:02PM +0000, Joel Fernandes wrote: > Hi Frederic, > > On Tue, Feb 28, 2023 at 12:04:36PM +0100, Frederic Weisbecker wrote: > Both should play a role. For lazy, we found callbacks that showed later in > the full boot sequence (like the SCSI issue). > > For expedited, there is new data from Qiuxu showing 5% improvement in boot > time. Indeed, nice one! > Right, and we also don't know if in the future, somebody queues a CB that > slows down boot as well (say they queue a lazy CB that does a wakeup), even > if currently there are not any such. As noted, that SCSI issue did show. Just > to note, callbacks doing wakeups are supposed to call call_rcu_hurry(). Sure, ideally we should have a way to debug lazy callbacks involved in wait/wake dependency against the enqueuer but that doesn't sound easy to do... > Yes, sorry, it was more an RFC but still should have been more clear. For the > v3 I'll definitely make it clear. > > rcu_end_inkernel_boot() is called before init is run. But the kernel cannot > posibly know when init has finished running and say the system is now waiting > for user login, or something. There's a considerable amount time from > rcu_end_inkernel_boot() to when the system is actually "booted". That's the > main issue. We could look at CPU load, but that's not ideal. Maybe wait for > user input, but that sucks as well. So it looks like user boot is involved in what you want to wait for. In that case only userspace can tell. > Hmmm I see what you mean, so a conservative and configurable "fail-safe" > timeout followed by sysctl to end the boot earlier than the timeout, should > do it (something like 30 seconds IMHO sounds reasonable)? In any case, > whatever way we go, we would not end the kernel boot before > rcu_end_inkernel_boot() is called at least once (which is the current > behavior). > > So it would be: > > low level boot + initcalls > 20 sec 30 second timeout > |------------------------------|-------------------------- > | | > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > > But it could be, if user decides: > low level boot + initcalls > 20 sec 10 second timeout > |------------------------------|-------------------------- > | | > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > via /sys/ entry. The problem I have with a random default timeout is that it may break sensitive workloads. If the default is 30 and say the boot only takes 5 seconds and immediately launches a latency sensitive task, this may break things in a subtle way during these 25 seconds when it usually didn't. Because expedited RCU is a hammer interrupting all non-idle CPUs. Until now forcing expedited RCU was only performed before any user code. Now it crosses the boundary so better be careful. I'd personally advocate for keeping the current call before init is launched. Then provide an end_boot_sysctl kernel boot parameter that will ignore the current call before init and let the user signal that through sysctl. > > > > So shouldn't we disable lazy callbacks by default when CONFIG_RCU_LAZY=y and then > > > > turn it on with "sysctl kernel.rcu.lazy=1" only whenever userspace feels ready > > > > about it? We can still keep the current call to rcu_end_inkernel_boot(). > > > > > > Hmm IMHO that would add more knobs for not much reason honestly. We already > > > have CONFIG_RCU_LAZY default disabled, I really don't want to add more > > > dependency (like user enables the config and does not see laziness). > > > > I don't know. Like I said, different problems, different solutions. Let's > > identify what the issue is precisely. For example can we expect that the issues > > on boot can be a problem also on some temporary workloads? > > > > Besides I'm currently testing a very hacky flavour of rcu_lazy and so far it > > shows many idle calls that would have been delayed if callbacks weren't queued > > as lazy. > > Can you provide more details? What kind of hack flavor, and what is it doing? Sorry, I meant a hacky implementation of lazy to work with !NOCB. Thanks.
On Wed, Mar 1, 2023 at 12:11 PM Frederic Weisbecker <frederic@kernel.org> wrote: [...] > > Hmmm I see what you mean, so a conservative and configurable "fail-safe" > > timeout followed by sysctl to end the boot earlier than the timeout, should > > do it (something like 30 seconds IMHO sounds reasonable)? In any case, > > whatever way we go, we would not end the kernel boot before > > rcu_end_inkernel_boot() is called at least once (which is the current > > behavior). > > > > So it would be: > > > > low level boot + initcalls > > 20 sec 30 second timeout > > |------------------------------|-------------------------- > > | | > > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > > > > But it could be, if user decides: > > low level boot + initcalls > > 20 sec 10 second timeout > > |------------------------------|-------------------------- > > | | > > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > > via /sys/ entry. > > The problem I have with a random default timeout is that it may break sensitive > workloads. If the default is 30 and say the boot only takes 5 seconds and > immediately launches a latency sensitive task, this may break things in a > subtle way during these 25 seconds when it usually didn't. Because expedited > RCU is a hammer interrupting all non-idle CPUs. > > Until now forcing expedited RCU was only performed before any user code. Now it > crosses the boundary so better be careful. I'd personally advocate for keeping > the current call before init is launched. Then provide an end_boot_sysctl kernel > boot parameter that will ignore the current call before init and let the user > signal that through sysctl. Considering that the PREEMPT-RT system benefits from it within the 8 seconds, I will go ahead make the default 15 seconds or so and make it tunable. Hopefully that will be an acceptable compromise, with sufficient documentation, changelog, and so forth... If you agree I'd appreciate your Ack on the next posting. > > > > > So shouldn't we disable lazy callbacks by default when CONFIG_RCU_LAZY=y and then > > > > > turn it on with "sysctl kernel.rcu.lazy=1" only whenever userspace feels ready > > > > > about it? We can still keep the current call to rcu_end_inkernel_boot(). > > > > > > > > Hmm IMHO that would add more knobs for not much reason honestly. We already > > > > have CONFIG_RCU_LAZY default disabled, I really don't want to add more > > > > dependency (like user enables the config and does not see laziness). > > > > > > I don't know. Like I said, different problems, different solutions. Let's > > > identify what the issue is precisely. For example can we expect that the issues > > > on boot can be a problem also on some temporary workloads? > > > > > > Besides I'm currently testing a very hacky flavour of rcu_lazy and so far it > > > shows many idle calls that would have been delayed if callbacks weren't queued > > > as lazy. > > > > Can you provide more details? What kind of hack flavor, and what is it doing? > > Sorry, I meant a hacky implementation of lazy to work with !NOCB. Interesting, I'm curious which calls those are and if they should also be call_rcu_hurry(). - Joel
On Wed, Mar 01, 2023 at 04:31:01PM -0500, Joel Fernandes wrote: > On Wed, Mar 1, 2023 at 12:11 PM Frederic Weisbecker <frederic@kernel.org> wrote: > [...] > > > Hmmm I see what you mean, so a conservative and configurable "fail-safe" > > > timeout followed by sysctl to end the boot earlier than the timeout, should > > > do it (something like 30 seconds IMHO sounds reasonable)? In any case, > > > whatever way we go, we would not end the kernel boot before > > > rcu_end_inkernel_boot() is called at least once (which is the current > > > behavior). > > > > > > So it would be: > > > > > > low level boot + initcalls > > > 20 sec 30 second timeout > > > |------------------------------|-------------------------- > > > | | > > > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > > > > > > But it could be, if user decides: > > > low level boot + initcalls > > > 20 sec 10 second timeout > > > |------------------------------|-------------------------- > > > | | > > > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > > > via /sys/ entry. > > > > The problem I have with a random default timeout is that it may break sensitive > > workloads. If the default is 30 and say the boot only takes 5 seconds and > > immediately launches a latency sensitive task, this may break things in a > > subtle way during these 25 seconds when it usually didn't. Because expedited > > RCU is a hammer interrupting all non-idle CPUs. > > > > Until now forcing expedited RCU was only performed before any user code. Now it > > crosses the boundary so better be careful. I'd personally advocate for keeping > > the current call before init is launched. Then provide an end_boot_sysctl kernel > > boot parameter that will ignore the current call before init and let the user > > signal that through sysctl. > > Considering that the PREEMPT-RT system benefits from it within the 8 > seconds, I will go ahead make the default 15 seconds or so and make it > tunable. Hopefully that will be an acceptable compromise, with > sufficient documentation, changelog, and so forth... If you agree I'd > appreciate your Ack on the next posting. Just checking on the sysfs portion of this. After all, tuning kernel boot parameters to specific systems is not all that much fun, especially when you have lots of systems. Thanx, Paul
On Wed, Mar 1, 2023 at 7:49 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Wed, Mar 01, 2023 at 04:31:01PM -0500, Joel Fernandes wrote: > > On Wed, Mar 1, 2023 at 12:11 PM Frederic Weisbecker <frederic@kernel.org> wrote: > > [...] > > > > Hmmm I see what you mean, so a conservative and configurable "fail-safe" > > > > timeout followed by sysctl to end the boot earlier than the timeout, should > > > > do it (something like 30 seconds IMHO sounds reasonable)? In any case, > > > > whatever way we go, we would not end the kernel boot before > > > > rcu_end_inkernel_boot() is called at least once (which is the current > > > > behavior). > > > > > > > > So it would be: > > > > > > > > low level boot + initcalls > > > > 20 sec 30 second timeout > > > > |------------------------------|-------------------------- > > > > | | > > > > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > > > > > > > > But it could be, if user decides: > > > > low level boot + initcalls > > > > 20 sec 10 second timeout > > > > |------------------------------|-------------------------- > > > > | | > > > > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > > > > via /sys/ entry. > > > > > > The problem I have with a random default timeout is that it may break sensitive > > > workloads. If the default is 30 and say the boot only takes 5 seconds and > > > immediately launches a latency sensitive task, this may break things in a > > > subtle way during these 25 seconds when it usually didn't. Because expedited > > > RCU is a hammer interrupting all non-idle CPUs. > > > > > > Until now forcing expedited RCU was only performed before any user code. Now it > > > crosses the boundary so better be careful. I'd personally advocate for keeping > > > the current call before init is launched. Then provide an end_boot_sysctl kernel > > > boot parameter that will ignore the current call before init and let the user > > > signal that through sysctl. > > > > Considering that the PREEMPT-RT system benefits from it within the 8 > > seconds, I will go ahead make the default 15 seconds or so and make it > > tunable. Hopefully that will be an acceptable compromise, with > > sufficient documentation, changelog, and so forth... If you agree I'd > > appreciate your Ack on the next posting. > > Just checking on the sysfs portion of this. Yes, the current plan is to add a sysfs node (likely sysctl) with the 15-second failsafe. > After all, tuning kernel > boot parameters to specific systems is not all that much fun, especially > when you have lots of systems. Are you suggesting to drop "end the boot" sysfs and just have a minimum-time approach as I initially did? - Joel
On Wed, Mar 01, 2023 at 08:08:54PM -0500, Joel Fernandes wrote: > On Wed, Mar 1, 2023 at 7:49 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Wed, Mar 01, 2023 at 04:31:01PM -0500, Joel Fernandes wrote: > > > On Wed, Mar 1, 2023 at 12:11 PM Frederic Weisbecker <frederic@kernel.org> wrote: > > > [...] > > > > > Hmmm I see what you mean, so a conservative and configurable "fail-safe" > > > > > timeout followed by sysctl to end the boot earlier than the timeout, should > > > > > do it (something like 30 seconds IMHO sounds reasonable)? In any case, > > > > > whatever way we go, we would not end the kernel boot before > > > > > rcu_end_inkernel_boot() is called at least once (which is the current > > > > > behavior). > > > > > > > > > > So it would be: > > > > > > > > > > low level boot + initcalls > > > > > 20 sec 30 second timeout > > > > > |------------------------------|-------------------------- > > > > > | | > > > > > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > > > > > > > > > > But it could be, if user decides: > > > > > low level boot + initcalls > > > > > 20 sec 10 second timeout > > > > > |------------------------------|-------------------------- > > > > > | | > > > > > old rcu_end_inkernel_boot() new rcu_end_inkernel_boot() > > > > > via /sys/ entry. > > > > > > > > The problem I have with a random default timeout is that it may break sensitive > > > > workloads. If the default is 30 and say the boot only takes 5 seconds and > > > > immediately launches a latency sensitive task, this may break things in a > > > > subtle way during these 25 seconds when it usually didn't. Because expedited > > > > RCU is a hammer interrupting all non-idle CPUs. > > > > > > > > Until now forcing expedited RCU was only performed before any user code. Now it > > > > crosses the boundary so better be careful. I'd personally advocate for keeping > > > > the current call before init is launched. Then provide an end_boot_sysctl kernel > > > > boot parameter that will ignore the current call before init and let the user > > > > signal that through sysctl. > > > > > > Considering that the PREEMPT-RT system benefits from it within the 8 > > > seconds, I will go ahead make the default 15 seconds or so and make it > > > tunable. Hopefully that will be an acceptable compromise, with > > > sufficient documentation, changelog, and so forth... If you agree I'd > > > appreciate your Ack on the next posting. > > > > Just checking on the sysfs portion of this. > > Yes, the current plan is to add a sysfs node (likely sysctl) with the > 15-second failsafe. Whew!!! Very good, then! > > After all, tuning kernel > > boot parameters to specific systems is not all that much fun, especially > > when you have lots of systems. > > Are you suggesting to drop "end the boot" sysfs and just have a > minimum-time approach as I initially did? Not at all. I was just concerned that the sysfs was getting lost in the shuffle. All good now, thank you! Thanx, Paul
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 2429b5e3184b..0943139fdf01 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5085,6 +5085,10 @@ rcutorture.verbose= [KNL] Enable additional printk() statements. + rcupdate.boot_end_delay= [KNL] + Minimum time that must elapse before the boot + sequence can be marked as completed. + rcupdate.rcu_cpu_stall_ftrace_dump= [KNL] Dump ftrace buffer after reporting RCU CPU stall warning. diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig index 9071182b1284..0d77006a948f 100644 --- a/kernel/rcu/Kconfig +++ b/kernel/rcu/Kconfig @@ -217,6 +217,18 @@ config RCU_BOOST_DELAY Accept the default if unsure. +config RCU_BOOT_END_DELAY + int "Minimum time before RCU may consider in-kernel boot as completed" + range 0 120000 + default 20000 + help + This option specifies the minimum time in milliseconds from boot start + before RCU believes boot has completed. The actual delay can be + higher than this if the kernel takes a long time to initialize + but it will never be smaller than this. + + Accept the default if unsure. + config RCU_EXP_KTHREAD bool "Perform RCU expedited work in a real-time kthread" depends on RCU_BOOST && RCU_EXPERT diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 19bf6fa3ee6a..cbdad7b46841 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -62,6 +62,10 @@ module_param(rcu_normal_after_boot, int, 0444); #endif #endif /* #ifndef CONFIG_TINY_RCU */ +/* Minimum time in ms until RCU can consider in-kernel boot as completed. */ +static int boot_end_delay = CONFIG_RCU_BOOT_END_DELAY; +module_param(boot_end_delay, int, 0444); + #ifdef CONFIG_DEBUG_LOCK_ALLOC /** * rcu_read_lock_held_common() - might we be in RCU-sched read-side critical section? @@ -225,12 +229,29 @@ void rcu_unexpedite_gp(void) EXPORT_SYMBOL_GPL(rcu_unexpedite_gp); static bool rcu_boot_ended __read_mostly; - /* - * Inform RCU of the end of the in-kernel boot sequence. + * Inform RCU of the end of the in-kernel boot sequence. The boot sequence will + * not be marked ended until at least boot_end_delay milliseconds have passed. */ +void rcu_end_inkernel_boot(void); +static void boot_rcu_work_fn(struct work_struct *work) +{ + rcu_end_inkernel_boot(); +} +static DECLARE_DELAYED_WORK(boot_rcu_work, boot_rcu_work_fn); + void rcu_end_inkernel_boot(void) { + if (boot_end_delay) { + u64 boot_ms = ktime_get_boot_fast_ns() / 1000000UL; + + if (boot_ms < boot_end_delay) { + schedule_delayed_work(&boot_rcu_work, + boot_end_delay - boot_ms); + return; + } + } + rcu_unexpedite_gp(); rcu_async_relax(); if (rcu_normal_after_boot)
On many systems, a great deal of boot happens after the kernel thinks the boot has completed. It is difficult to determine if the system has really booted from the kernel side. Some features like lazy-RCU can risk slowing down boot time if, say, a callback has been added that the boot synchronously depends on. Further, it is better to boot systems which pass 'rcu_normal_after_boot' to stay expedited for as long as the system is still booting. For these reasons, this commit adds a config option 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter rcupdate.boot_end_delay. By default, this value is 20s. A system designer can choose to specify a value here to keep RCU from marking boot completion. The boot sequence will not be marked ended until at least boot_end_delay milliseconds have passed. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- v1->v2: Update some comments and description. .../admin-guide/kernel-parameters.txt | 4 +++ cc_list | 7 ++++++ kernel/rcu/Kconfig | 12 +++++++++ kernel/rcu/update.c | 25 +++++++++++++++++-- 4 files changed, 46 insertions(+), 2 deletions(-) create mode 100644 cc_list