Message ID | 20250128000737.1264218-1-joel@joelfernandes.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() | expand |
On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) <joel@joelfernandes.org> wrote: > > The rcu_seq_done() API has a large "false-negative" windows of size > ULONG_MAX/2, where after wrap around, it is possible that it will think > that a GP has not completed if a wrap around happens and the delta is > large. > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > by making the window of false-negativity by only 3 GPs. Use this logic > for rcu_seq_done() which is a nice negative code delta and could > potentially avoid issues in the future where rcu_seq_done() was > reporting false-negatives for too long. > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > was done of all users to convince the change would work. > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> I am leaving a 60 minute overnight run of all scenarios on my personal server for further testing. thanks, - Joel > --- > kernel/rcu/rcu.h | 13 ++----------- > kernel/rcu/tree.c | 6 +++--- > 2 files changed, 5 insertions(+), 14 deletions(-) > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > index eed2951a4962..c2ca196907cb 100644 > --- a/kernel/rcu/rcu.h > +++ b/kernel/rcu/rcu.h > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > /* > * Given a snapshot from rcu_seq_snap(), determine whether or not a > - * full update-side operation has occurred. > + * full update-side operation has occurred while also handling > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > */ > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > -{ > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > -} > - > -/* > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > - * full update-side operation has occurred, but do not allow the > - * (ULONG_MAX / 2) safety-factor/guard-band. > - */ > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > { > unsigned long cur_s = READ_ONCE(*sp); > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index b77ccc55557b..835600cec9ba 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > bool poll_state_synchronize_rcu(unsigned long oldstate) > { > if (oldstate == RCU_GET_STATE_COMPLETED || > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > return true; > } > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > return true; > } > -- > 2.34.1 >
On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > <joel@joelfernandes.org> wrote: > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > that a GP has not completed if a wrap around happens and the delta is > > large. > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > by making the window of false-negativity by only 3 GPs. Use this logic > > for rcu_seq_done() which is a nice negative code delta and could > > potentially avoid issues in the future where rcu_seq_done() was > > reporting false-negatives for too long. > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > was done of all users to convince the change would work. > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > I am leaving a 60 minute overnight run of all scenarios on my personal > server for further testing. The run passed, details below: --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: Results directory: tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 thanks, - Joel > > thanks, > > - Joel > > > > > > --- > > kernel/rcu/rcu.h | 13 ++----------- > > kernel/rcu/tree.c | 6 +++--- > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > index eed2951a4962..c2ca196907cb 100644 > > --- a/kernel/rcu/rcu.h > > +++ b/kernel/rcu/rcu.h > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > /* > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > - * full update-side operation has occurred. > > + * full update-side operation has occurred while also handling > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > */ > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > -{ > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > -} > > - > > -/* > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > - * full update-side operation has occurred, but do not allow the > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > - */ > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > { > > unsigned long cur_s = READ_ONCE(*sp); > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index b77ccc55557b..835600cec9ba 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > { > > if (oldstate == RCU_GET_STATE_COMPLETED || > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > return true; > > } > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > return true; > > } > > -- > > 2.34.1 > >
On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > <joel@joelfernandes.org> wrote: > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > that a GP has not completed if a wrap around happens and the delta is > > > large. > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > for rcu_seq_done() which is a nice negative code delta and could > > > potentially avoid issues in the future where rcu_seq_done() was > > > reporting false-negatives for too long. > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > was done of all users to convince the change would work. > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > server for further testing. > > The run passed, details below: > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > Results directory: > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 Very good! How would you go about analyzing whether this is really safe vs. getting just getting lucky and not having provoked an overflow? Thanx, Paul > thanks, > > - Joel > > > > > > thanks, > > > > - Joel > > > > > > > > > > > --- > > > kernel/rcu/rcu.h | 13 ++----------- > > > kernel/rcu/tree.c | 6 +++--- > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > index eed2951a4962..c2ca196907cb 100644 > > > --- a/kernel/rcu/rcu.h > > > +++ b/kernel/rcu/rcu.h > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > /* > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > - * full update-side operation has occurred. > > > + * full update-side operation has occurred while also handling > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > */ > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > -{ > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > -} > > > - > > > -/* > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > - * full update-side operation has occurred, but do not allow the > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > - */ > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > { > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index b77ccc55557b..835600cec9ba 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > { > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > return true; > > > } > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > return true; > > > } > > > -- > > > 2.34.1 > > >
On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > <joel@joelfernandes.org> wrote: > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > that a GP has not completed if a wrap around happens and the delta is > > > > large. > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > reporting false-negatives for too long. > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > was done of all users to convince the change would work. > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > server for further testing. > > > > The run passed, details below: > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > Results directory: > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > Very good! > > How would you go about analyzing whether this is really safe vs. getting > just getting lucky and not having provoked an overflow? I would probably add a more specific test case stressing the API, or even a unit test of just the API by passing a range of sequences.. I should go ahead and do that but it sounds like you feel there is an issue with the patch? :) thanks, - Joel > > Thanx, Paul > > > thanks, > > > > - Joel > > > > > > > > > > thanks, > > > > > > - Joel > > > > > > > > > > > > > > > > --- > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > kernel/rcu/tree.c | 6 +++--- > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > index eed2951a4962..c2ca196907cb 100644 > > > > --- a/kernel/rcu/rcu.h > > > > +++ b/kernel/rcu/rcu.h > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > /* > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > - * full update-side operation has occurred. > > > > + * full update-side operation has occurred while also handling > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > */ > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > -{ > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > -} > > > > - > > > > -/* > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > - * full update-side operation has occurred, but do not allow the > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > - */ > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > { > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index b77ccc55557b..835600cec9ba 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > { > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > return true; > > > > } > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > return true; > > > > } > > > > -- > > > > 2.34.1 > > > >
On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > > <joel@joelfernandes.org> wrote: > > > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > > that a GP has not completed if a wrap around happens and the delta is > > > > > large. > > > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > > reporting false-negatives for too long. > > > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > > was done of all users to convince the change would work. > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > > server for further testing. > > > > > > The run passed, details below: > > > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > Results directory: > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > > Very good! > > > > How would you go about analyzing whether this is really safe vs. getting > > just getting lucky and not having provoked an overflow? > > I would probably add a more specific test case stressing the API, or > even a unit test of just the API by passing a range of sequences.. I > should go ahead and do that but it sounds like you feel there is an > issue with the patch? :) 2^31 (let alone 2^63) is a very large number of grace periods, and so it is hard to test grace-period sequence-number wrap. Not impossible, though... Thanx, Paul > thanks, > > - Joel > > > > > > Thanx, Paul > > > > > thanks, > > > > > > - Joel > > > > > > > > > > > > > > thanks, > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > --- > > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > > kernel/rcu/tree.c | 6 +++--- > > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > index eed2951a4962..c2ca196907cb 100644 > > > > > --- a/kernel/rcu/rcu.h > > > > > +++ b/kernel/rcu/rcu.h > > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > > > /* > > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > - * full update-side operation has occurred. > > > > > + * full update-side operation has occurred while also handling > > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > > */ > > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > > -{ > > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > > -} > > > > > - > > > > > -/* > > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > - * full update-side operation has occurred, but do not allow the > > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > > - */ > > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > > { > > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index b77ccc55557b..835600cec9ba 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > > { > > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > return true; > > > > > } > > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > return true; > > > > > } > > > > > -- > > > > > 2.34.1 > > > > >
On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > > > <joel@joelfernandes.org> wrote: > > > > > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > > > that a GP has not completed if a wrap around happens and the delta is > > > > > > large. > > > > > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > > > reporting false-negatives for too long. > > > > > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > > > was done of all users to convince the change would work. > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > > > server for further testing. > > > > > > > > The run passed, details below: > > > > > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > > Results directory: > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > > > > Very good! > > > > > > How would you go about analyzing whether this is really safe vs. getting > > > just getting lucky and not having provoked an overflow? > > > > I would probably add a more specific test case stressing the API, or > > even a unit test of just the API by passing a range of sequences.. I > > should go ahead and do that but it sounds like you feel there is an > > issue with the patch? :) > > 2^31 (let alone 2^63) is a very large number of grace periods, and > so it is hard to test grace-period sequence-number wrap. > > Not impossible, though... I realized with some simple tests this can break because of some ambiguity around whether a wraparound really happened: say *sp is 100, and s is 200. This could either be a full 32/64-bit wrap around, or it could be that *sp is not yet caught up. I guess there is no way to distinguish between the two cases. For this example, ULONG_CMP_GE will be false and ULONG_CMP_LT will be true making the new rcu_seq_done() version to be true. But if *sp is not yet caught up, that means the new rcu_seq_done() should be false, not true! Sigh, I look like a fool now. But I am also wondering what makes rcu_seq_done_exact() correct then - because it cannot really tell when a wrap around actually occurred. Hmmm... Confused, - Joel > > Thanx, Paul > > > thanks, > > > > - Joel > > > > > > > > > > Thanx, Paul > > > > > > > thanks, > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > > > kernel/rcu/tree.c | 6 +++--- > > > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > index eed2951a4962..c2ca196907cb 100644 > > > > > > --- a/kernel/rcu/rcu.h > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > > > > > /* > > > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > - * full update-side operation has occurred. > > > > > > + * full update-side operation has occurred while also handling > > > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > */ > > > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > > > -{ > > > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > > > -} > > > > > > - > > > > > > -/* > > > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > - * full update-side operation has occurred, but do not allow the > > > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > - */ > > > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > > > { > > > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > index b77ccc55557b..835600cec9ba 100644 > > > > > > --- a/kernel/rcu/tree.c > > > > > > +++ b/kernel/rcu/tree.c > > > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > > > { > > > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > return true; > > > > > > } > > > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > return true; > > > > > > } > > > > > > -- > > > > > > 2.34.1 > > > > > >
On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > > > <joel@joelfernandes.org> wrote: > > > > > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > > > that a GP has not completed if a wrap around happens and the delta is > > > > > > large. > > > > > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > > > reporting false-negatives for too long. > > > > > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > > > was done of all users to convince the change would work. > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > > > server for further testing. > > > > > > > > The run passed, details below: > > > > > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > > Results directory: > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > > > > Very good! > > > > > > How would you go about analyzing whether this is really safe vs. getting > > > just getting lucky and not having provoked an overflow? > > > > I would probably add a more specific test case stressing the API, or > > even a unit test of just the API by passing a range of sequences.. I > > should go ahead and do that but it sounds like you feel there is an > > issue with the patch? :) > > 2^31 (let alone 2^63) is a very large number of grace periods, and > so it is hard to test grace-period sequence-number wrap. > > Not impossible, though... > We could test a decent number of candidate sequences to cover different cases. Not ideal like bruteforcing, but... Another idea is to hardcode/assume ULONG_MAX as 16-bit in a unit test. thanks, - Joel > > > > > > > > Thanx, Paul > > > > > > > thanks, > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > > > kernel/rcu/tree.c | 6 +++--- > > > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > index eed2951a4962..c2ca196907cb 100644 > > > > > > --- a/kernel/rcu/rcu.h > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > > > > > /* > > > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > - * full update-side operation has occurred. > > > > > > + * full update-side operation has occurred while also handling > > > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > */ > > > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > > > -{ > > > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > > > -} > > > > > > - > > > > > > -/* > > > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > - * full update-side operation has occurred, but do not allow the > > > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > - */ > > > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > > > { > > > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > index b77ccc55557b..835600cec9ba 100644 > > > > > > --- a/kernel/rcu/tree.c > > > > > > +++ b/kernel/rcu/tree.c > > > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > > > { > > > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > return true; > > > > > > } > > > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > return true; > > > > > > } > > > > > > -- > > > > > > 2.34.1 > > > > > >
On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: > On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > > On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > > > > <joel@joelfernandes.org> wrote: > > > > > > > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > > > > that a GP has not completed if a wrap around happens and the delta is > > > > > > > large. > > > > > > > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > > > > reporting false-negatives for too long. > > > > > > > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > > > > was done of all users to convince the change would work. > > > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > > > > server for further testing. > > > > > > > > > > The run passed, details below: > > > > > > > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > > > Results directory: > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > > > > > > Very good! > > > > > > > > How would you go about analyzing whether this is really safe vs. getting > > > > just getting lucky and not having provoked an overflow? > > > > > > I would probably add a more specific test case stressing the API, or > > > even a unit test of just the API by passing a range of sequences.. I > > > should go ahead and do that but it sounds like you feel there is an > > > issue with the patch? :) > > > > 2^31 (let alone 2^63) is a very large number of grace periods, and > > so it is hard to test grace-period sequence-number wrap. > > > > Not impossible, though... > > We could test a decent number of candidate sequences to cover > different cases. Not ideal like bruteforcing, but... Another idea is > to hardcode/assume ULONG_MAX as 16-bit in a unit test. Or put the various sequence numbers into an unsigned short or even an unsigned char. One set of use cases checks to see if a given CPU's ->gp_seq has fallen too far behind the current grace period, and sets a flag to alert that CPU. Others rely on a false negative being functionally OK. Or so I believe. ;-) Thanx, Paul > thanks, > > - Joel > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > thanks, > > > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > > > > kernel/rcu/tree.c | 6 +++--- > > > > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > > index eed2951a4962..c2ca196907cb 100644 > > > > > > > --- a/kernel/rcu/rcu.h > > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > > > > > > > /* > > > > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > > - * full update-side operation has occurred. > > > > > > > + * full update-side operation has occurred while also handling > > > > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > > */ > > > > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > > > > -{ > > > > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > > > > -} > > > > > > > - > > > > > > > -/* > > > > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > > - * full update-side operation has occurred, but do not allow the > > > > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > > - */ > > > > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > > > > { > > > > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > > index b77ccc55557b..835600cec9ba 100644 > > > > > > > --- a/kernel/rcu/tree.c > > > > > > > +++ b/kernel/rcu/tree.c > > > > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > > > > { > > > > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > > return true; > > > > > > > } > > > > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > > return true; > > > > > > > } > > > > > > > -- > > > > > > > 2.34.1 > > > > > > >
> On Jan 28, 2025, at 9:21 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: >>> On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: >>> >>> On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: >>>> On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: >>>>> >>>>> On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: >>>>>> On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: >>>>>>> On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) >>>>>>> <joel@joelfernandes.org> wrote: >>>>>>>> >>>>>>>> The rcu_seq_done() API has a large "false-negative" windows of size >>>>>>>> ULONG_MAX/2, where after wrap around, it is possible that it will think >>>>>>>> that a GP has not completed if a wrap around happens and the delta is >>>>>>>> large. >>>>>>>> >>>>>>>> rcu_seq_done_exact() is more accurate avoiding this wrap around issue, >>>>>>>> by making the window of false-negativity by only 3 GPs. Use this logic >>>>>>>> for rcu_seq_done() which is a nice negative code delta and could >>>>>>>> potentially avoid issues in the future where rcu_seq_done() was >>>>>>>> reporting false-negatives for too long. >>>>>>>> >>>>>>>> rcutorture runs of all scenarios for 15 minutes passed. Code inspection >>>>>>>> was done of all users to convince the change would work. >>>>>>>> >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >>>>>>> >>>>>>> I am leaving a 60 minute overnight run of all scenarios on my personal >>>>>>> server for further testing. >>>>>> >>>>>> The run passed, details below: >>>>>> >>>>>> --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: >>>>>> Results directory: >>>>>> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 >>>>>> RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 >>>>>> SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 >>>>>> SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 >>>>>> SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 >>>>>> SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 >>>>>> SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 >>>>>> SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 >>>>>> TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] >>>>>> TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] >>>>>> TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] >>>>>> TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 >>>>>> TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 >>>>>> TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 >>>>>> TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 >>>>>> TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] >>>>>> TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 >>>>>> TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 >>>>>> TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 >>>>>> TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 >>>>>> TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] >>>>>> TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 >>>>>> --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 >>>>> >>>>> Very good! >>>>> >>>>> How would you go about analyzing whether this is really safe vs. getting >>>>> just getting lucky and not having provoked an overflow? >>>> >>>> I would probably add a more specific test case stressing the API, or >>>> even a unit test of just the API by passing a range of sequences.. I >>>> should go ahead and do that but it sounds like you feel there is an >>>> issue with the patch? :) >>> >>> 2^31 (let alone 2^63) is a very large number of grace periods, and >>> so it is hard to test grace-period sequence-number wrap. >>> >>> Not impossible, though... >> >> We could test a decent number of candidate sequences to cover >> different cases. Not ideal like bruteforcing, but... Another idea is >> to hardcode/assume ULONG_MAX as 16-bit in a unit test. > > Or put the various sequence numbers into an unsigned short or even > an unsigned char. > > One set of use cases checks to see if a given CPU's ->gp_seq has fallen > too far behind the current grace period, and sets a flag to alert > that CPU. Others rely on a false negative being functionally OK. > > Or so I believe. ;-) Thanks, I am itching to create a visualization of all eight bit combinations and the output of both API, which will be a fun exercise however I’m missing something fundamental because as I mentioned in that 100 and 200 example, the API itself cannot distinguish between a wraparound and a legitimate delay in comparison between start and a delayed end. I need to understand this better and go through the code more. ;-/ Thanks, - Joel > > Thanx, Paul > >> thanks, >> >> - Joel >> >> >> >>>> >>>>> >>>>> Thanx, Paul >>>>> >>>>>> thanks, >>>>>> >>>>>> - Joel >>>>>> >>>>>> >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> - Joel >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> --- >>>>>>>> kernel/rcu/rcu.h | 13 ++----------- >>>>>>>> kernel/rcu/tree.c | 6 +++--- >>>>>>>> 2 files changed, 5 insertions(+), 14 deletions(-) >>>>>>>> >>>>>>>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h >>>>>>>> index eed2951a4962..c2ca196907cb 100644 >>>>>>>> --- a/kernel/rcu/rcu.h >>>>>>>> +++ b/kernel/rcu/rcu.h >>>>>>>> @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) >>>>>>>> >>>>>>>> /* >>>>>>>> * Given a snapshot from rcu_seq_snap(), determine whether or not a >>>>>>>> - * full update-side operation has occurred. >>>>>>>> + * full update-side operation has occurred while also handling >>>>>>>> + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. >>>>>>>> */ >>>>>>>> static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) >>>>>>>> -{ >>>>>>>> - return ULONG_CMP_GE(READ_ONCE(*sp), s); >>>>>>>> -} >>>>>>>> - >>>>>>>> -/* >>>>>>>> - * Given a snapshot from rcu_seq_snap(), determine whether or not a >>>>>>>> - * full update-side operation has occurred, but do not allow the >>>>>>>> - * (ULONG_MAX / 2) safety-factor/guard-band. >>>>>>>> - */ >>>>>>>> -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) >>>>>>>> { >>>>>>>> unsigned long cur_s = READ_ONCE(*sp); >>>>>>>> >>>>>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c >>>>>>>> index b77ccc55557b..835600cec9ba 100644 >>>>>>>> --- a/kernel/rcu/tree.c >>>>>>>> +++ b/kernel/rcu/tree.c >>>>>>>> @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); >>>>>>>> bool poll_state_synchronize_rcu(unsigned long oldstate) >>>>>>>> { >>>>>>>> if (oldstate == RCU_GET_STATE_COMPLETED || >>>>>>>> - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { >>>>>>>> + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { >>>>>>>> smp_mb(); /* Ensure GP ends before subsequent accesses. */ >>>>>>>> return true; >>>>>>>> } >>>>>>>> @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) >>>>>>>> >>>>>>>> smp_mb(); // Order against root rcu_node structure grace-period cleanup. >>>>>>>> if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || >>>>>>>> - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || >>>>>>>> + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || >>>>>>>> rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || >>>>>>>> - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { >>>>>>>> + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { >>>>>>>> smp_mb(); /* Ensure GP ends before subsequent accesses. */ >>>>>>>> return true; >>>>>>>> } >>>>>>>> -- >>>>>>>> 2.34.1 >>>>>>>>
On Wed, Jan 29, 2025 at 06:34:53AM -0500, Joel Fernandes wrote: > > > > On Jan 28, 2025, at 9:21 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: > >>> On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > >>> > >>> On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > >>>> On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > >>>>> > >>>>> On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > >>>>>> On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > >>>>>>> On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > >>>>>>> <joel@joelfernandes.org> wrote: > >>>>>>>> > >>>>>>>> The rcu_seq_done() API has a large "false-negative" windows of size > >>>>>>>> ULONG_MAX/2, where after wrap around, it is possible that it will think > >>>>>>>> that a GP has not completed if a wrap around happens and the delta is > >>>>>>>> large. > >>>>>>>> > >>>>>>>> rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > >>>>>>>> by making the window of false-negativity by only 3 GPs. Use this logic > >>>>>>>> for rcu_seq_done() which is a nice negative code delta and could > >>>>>>>> potentially avoid issues in the future where rcu_seq_done() was > >>>>>>>> reporting false-negatives for too long. > >>>>>>>> > >>>>>>>> rcutorture runs of all scenarios for 15 minutes passed. Code inspection > >>>>>>>> was done of all users to convince the change would work. > >>>>>>>> > >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > >>>>>>> > >>>>>>> I am leaving a 60 minute overnight run of all scenarios on my personal > >>>>>>> server for further testing. > >>>>>> > >>>>>> The run passed, details below: > >>>>>> > >>>>>> --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > >>>>>> Results directory: > >>>>>> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > >>>>>> RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > >>>>>> SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > >>>>>> SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > >>>>>> SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > >>>>>> SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > >>>>>> SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > >>>>>> SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > >>>>>> TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > >>>>>> TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > >>>>>> TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > >>>>>> TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > >>>>>> TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > >>>>>> TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > >>>>>> TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > >>>>>> TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > >>>>>> TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > >>>>>> TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > >>>>>> TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > >>>>>> TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > >>>>>> TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > >>>>>> TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > >>>>>> --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > >>>>> > >>>>> Very good! > >>>>> > >>>>> How would you go about analyzing whether this is really safe vs. getting > >>>>> just getting lucky and not having provoked an overflow? > >>>> > >>>> I would probably add a more specific test case stressing the API, or > >>>> even a unit test of just the API by passing a range of sequences.. I > >>>> should go ahead and do that but it sounds like you feel there is an > >>>> issue with the patch? :) > >>> > >>> 2^31 (let alone 2^63) is a very large number of grace periods, and > >>> so it is hard to test grace-period sequence-number wrap. > >>> > >>> Not impossible, though... > >> > >> We could test a decent number of candidate sequences to cover > >> different cases. Not ideal like bruteforcing, but... Another idea is > >> to hardcode/assume ULONG_MAX as 16-bit in a unit test. > > > > Or put the various sequence numbers into an unsigned short or even > > an unsigned char. > > > > One set of use cases checks to see if a given CPU's ->gp_seq has fallen > > too far behind the current grace period, and sets a flag to alert > > that CPU. Others rely on a false negative being functionally OK. > > > > Or so I believe. ;-) > > Thanks, I am itching to create a visualization of all eight bit combinations and the output of both API, which will be a fun exercise however I’m missing something fundamental because as I mentioned in that 100 and 200 example, the API itself cannot distinguish between a wraparound and a legitimate delay in comparison between start and a delayed end. I need to understand this better and go through the code more. ;-/ The big questions are "under what conditions does it need to distinguish, and what are the consequences of failing to get this right?" Also, "what is the purpose of ->gpwrap?". Thanx, Paul > Thanks, > > - Joel > > > > > > Thanx, Paul > > > >> thanks, > >> > >> - Joel > >> > >> > >> > >>>> > >>>>> > >>>>> Thanx, Paul > >>>>> > >>>>>> thanks, > >>>>>> > >>>>>> - Joel > >>>>>> > >>>>>> > >>>>>>> > >>>>>>> thanks, > >>>>>>> > >>>>>>> - Joel > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> --- > >>>>>>>> kernel/rcu/rcu.h | 13 ++----------- > >>>>>>>> kernel/rcu/tree.c | 6 +++--- > >>>>>>>> 2 files changed, 5 insertions(+), 14 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > >>>>>>>> index eed2951a4962..c2ca196907cb 100644 > >>>>>>>> --- a/kernel/rcu/rcu.h > >>>>>>>> +++ b/kernel/rcu/rcu.h > >>>>>>>> @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > >>>>>>>> > >>>>>>>> /* > >>>>>>>> * Given a snapshot from rcu_seq_snap(), determine whether or not a > >>>>>>>> - * full update-side operation has occurred. > >>>>>>>> + * full update-side operation has occurred while also handling > >>>>>>>> + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > >>>>>>>> */ > >>>>>>>> static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > >>>>>>>> -{ > >>>>>>>> - return ULONG_CMP_GE(READ_ONCE(*sp), s); > >>>>>>>> -} > >>>>>>>> - > >>>>>>>> -/* > >>>>>>>> - * Given a snapshot from rcu_seq_snap(), determine whether or not a > >>>>>>>> - * full update-side operation has occurred, but do not allow the > >>>>>>>> - * (ULONG_MAX / 2) safety-factor/guard-band. > >>>>>>>> - */ > >>>>>>>> -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > >>>>>>>> { > >>>>>>>> unsigned long cur_s = READ_ONCE(*sp); > >>>>>>>> > >>>>>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > >>>>>>>> index b77ccc55557b..835600cec9ba 100644 > >>>>>>>> --- a/kernel/rcu/tree.c > >>>>>>>> +++ b/kernel/rcu/tree.c > >>>>>>>> @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > >>>>>>>> bool poll_state_synchronize_rcu(unsigned long oldstate) > >>>>>>>> { > >>>>>>>> if (oldstate == RCU_GET_STATE_COMPLETED || > >>>>>>>> - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > >>>>>>>> + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > >>>>>>>> smp_mb(); /* Ensure GP ends before subsequent accesses. */ > >>>>>>>> return true; > >>>>>>>> } > >>>>>>>> @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > >>>>>>>> > >>>>>>>> smp_mb(); // Order against root rcu_node structure grace-period cleanup. > >>>>>>>> if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > >>>>>>>> - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > >>>>>>>> + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > >>>>>>>> rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > >>>>>>>> - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > >>>>>>>> + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > >>>>>>>> smp_mb(); /* Ensure GP ends before subsequent accesses. */ > >>>>>>>> return true; > >>>>>>>> } > >>>>>>>> -- > >>>>>>>> 2.34.1 > >>>>>>>>
On Wed, Jan 29, 2025 at 12:25 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Wed, Jan 29, 2025 at 06:34:53AM -0500, Joel Fernandes wrote: > > > > > > > On Jan 28, 2025, at 9:21 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: > > >>> On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > >>> > > >>> On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > >>>> On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > >>>>> > > >>>>> On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > >>>>>> On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > >>>>>>> On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > >>>>>>> <joel@joelfernandes.org> wrote: > > >>>>>>>> > > >>>>>>>> The rcu_seq_done() API has a large "false-negative" windows of size > > >>>>>>>> ULONG_MAX/2, where after wrap around, it is possible that it will think > > >>>>>>>> that a GP has not completed if a wrap around happens and the delta is > > >>>>>>>> large. > > >>>>>>>> > > >>>>>>>> rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > >>>>>>>> by making the window of false-negativity by only 3 GPs. Use this logic > > >>>>>>>> for rcu_seq_done() which is a nice negative code delta and could > > >>>>>>>> potentially avoid issues in the future where rcu_seq_done() was > > >>>>>>>> reporting false-negatives for too long. > > >>>>>>>> > > >>>>>>>> rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > >>>>>>>> was done of all users to convince the change would work. > > >>>>>>>> > > >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > >>>>>>> > > >>>>>>> I am leaving a 60 minute overnight run of all scenarios on my personal > > >>>>>>> server for further testing. > > >>>>>> > > >>>>>> The run passed, details below: > > >>>>>> > > >>>>>> --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > >>>>>> Results directory: > > >>>>>> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > >>>>>> RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > >>>>>> SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > >>>>>> SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > >>>>>> SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > >>>>>> SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > >>>>>> SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > >>>>>> SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > >>>>>> TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > >>>>>> TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > >>>>>> TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > >>>>>> TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > >>>>>> TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > >>>>>> TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > >>>>>> TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > >>>>>> TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > >>>>>> TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > >>>>>> TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > >>>>>> TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > >>>>>> TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > >>>>>> TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > >>>>>> TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > >>>>>> --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > >>>>> > > >>>>> Very good! > > >>>>> > > >>>>> How would you go about analyzing whether this is really safe vs. getting > > >>>>> just getting lucky and not having provoked an overflow? > > >>>> > > >>>> I would probably add a more specific test case stressing the API, or > > >>>> even a unit test of just the API by passing a range of sequences.. I > > >>>> should go ahead and do that but it sounds like you feel there is an > > >>>> issue with the patch? :) > > >>> > > >>> 2^31 (let alone 2^63) is a very large number of grace periods, and > > >>> so it is hard to test grace-period sequence-number wrap. > > >>> > > >>> Not impossible, though... > > >> > > >> We could test a decent number of candidate sequences to cover > > >> different cases. Not ideal like bruteforcing, but... Another idea is > > >> to hardcode/assume ULONG_MAX as 16-bit in a unit test. > > > > > > Or put the various sequence numbers into an unsigned short or even > > > an unsigned char. > > > > > > One set of use cases checks to see if a given CPU's ->gp_seq has fallen > > > too far behind the current grace period, and sets a flag to alert > > > that CPU. Others rely on a false negative being functionally OK. > > > > > > Or so I believe. ;-) > > > > Thanks, I am itching to create a visualization of all eight bit combinations and the output of both API, which will be a fun exercise however I’m missing something fundamental because as I mentioned in that 100 and 200 example, the API itself cannot distinguish between a wraparound and a legitimate delay in comparison between start and a delayed end. I need to understand this better and go through the code more. ;-/ > > The big questions are "under what conditions does it need to distinguish, > and what are the consequences of failing to get this right?" Also, "what > is the purpose of ->gpwrap?". My understanding is we take a snapshot of a sequence and then check at later time if it was reached. Ok I shall explore these questions. Meanwhile I created the following visualization using 5-bit numbers: https://drive.google.com/file/d/1w2sgJga7B5dye4iH0oPZ79obaKO-e_Jn/view?usp=sharing I find as we expect, that for rcu_seq_done(), as long as the distance between the "running sequence" and the "target" is ULONG_MAX/2, it returns correct results. OTHERWISE, it can return false results. So for target value of 28 (in 5-bit world), only initial running values from 12 would return FALSE. Before number 12, everything is TRUE. This can cause both false-positives and false-negatives depending on the input, however it is possible that are no users who use it causing false-positives! So I guess it really depends on user. False positives: If the initial value is ULONG_MAX/2 away from the target, then rcu_seq_done() can return TRUE when *no wraparound* happened. False negatives: The initial value of the snapshot was *within* the ULONG_MAX/2 distance from target. A full *wrap around then happened*, and we ended where we were initially, so rcu_seq_done() should return TRUE but it returns FALSE because it doesn't know about the wrap. Now we move rcu_seq_done_exact. It has the exact same behavior except that instead of ULONG_MAX/2, the above situations are shrunk to about 10 counts from the target. So if target is 28, then the initial sequence should have been at least 18 to avoid false-positive, but again it is possible and I certain see in the code that it cannot be used this way.. (at least so far).. So all we are left with is the false-negative band of ~2.5 GPs.. About "what are the consequences of failing to get this right". I believe false-positive is unacceptable unless it is maybe debug code - that can break functionality in code, too short GPs and all that. However false-negative is OK as long as the usecase can retry later and can afford to wait. Did I get that much correct? Thanks, - Joel
On Wed, Jan 29, 2025 at 06:10:10PM -0500, Joel Fernandes wrote: > On Wed, Jan 29, 2025 at 12:25 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Wed, Jan 29, 2025 at 06:34:53AM -0500, Joel Fernandes wrote: > > > > > > > > > > On Jan 28, 2025, at 9:21 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: > > > >>> On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > >>> > > > >>> On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > > >>>> On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > >>>>> > > > >>>>> On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > >>>>>> On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > >>>>>>> On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > >>>>>>> <joel@joelfernandes.org> wrote: > > > >>>>>>>> > > > >>>>>>>> The rcu_seq_done() API has a large "false-negative" windows of size > > > >>>>>>>> ULONG_MAX/2, where after wrap around, it is possible that it will think > > > >>>>>>>> that a GP has not completed if a wrap around happens and the delta is > > > >>>>>>>> large. > > > >>>>>>>> > > > >>>>>>>> rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > >>>>>>>> by making the window of false-negativity by only 3 GPs. Use this logic > > > >>>>>>>> for rcu_seq_done() which is a nice negative code delta and could > > > >>>>>>>> potentially avoid issues in the future where rcu_seq_done() was > > > >>>>>>>> reporting false-negatives for too long. > > > >>>>>>>> > > > >>>>>>>> rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > >>>>>>>> was done of all users to convince the change would work. > > > >>>>>>>> > > > >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > >>>>>>> > > > >>>>>>> I am leaving a 60 minute overnight run of all scenarios on my personal > > > >>>>>>> server for further testing. > > > >>>>>> > > > >>>>>> The run passed, details below: > > > >>>>>> > > > >>>>>> --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > >>>>>> Results directory: > > > >>>>>> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > >>>>>> RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > >>>>>> SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > >>>>>> SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > >>>>>> SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > >>>>>> SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > >>>>>> SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > >>>>>> SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > >>>>>> TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > >>>>>> TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > >>>>>> TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > >>>>>> TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > >>>>>> TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > >>>>>> TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > >>>>>> TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > >>>>>> TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > >>>>>> TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > >>>>>> TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > >>>>>> TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > >>>>>> TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > >>>>>> TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > >>>>>> TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > >>>>>> --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > >>>>> > > > >>>>> Very good! > > > >>>>> > > > >>>>> How would you go about analyzing whether this is really safe vs. getting > > > >>>>> just getting lucky and not having provoked an overflow? > > > >>>> > > > >>>> I would probably add a more specific test case stressing the API, or > > > >>>> even a unit test of just the API by passing a range of sequences.. I > > > >>>> should go ahead and do that but it sounds like you feel there is an > > > >>>> issue with the patch? :) > > > >>> > > > >>> 2^31 (let alone 2^63) is a very large number of grace periods, and > > > >>> so it is hard to test grace-period sequence-number wrap. > > > >>> > > > >>> Not impossible, though... > > > >> > > > >> We could test a decent number of candidate sequences to cover > > > >> different cases. Not ideal like bruteforcing, but... Another idea is > > > >> to hardcode/assume ULONG_MAX as 16-bit in a unit test. > > > > > > > > Or put the various sequence numbers into an unsigned short or even > > > > an unsigned char. > > > > > > > > One set of use cases checks to see if a given CPU's ->gp_seq has fallen > > > > too far behind the current grace period, and sets a flag to alert > > > > that CPU. Others rely on a false negative being functionally OK. > > > > > > > > Or so I believe. ;-) > > > > > > Thanks, I am itching to create a visualization of all eight bit combinations and the output of both API, which will be a fun exercise however I’m missing something fundamental because as I mentioned in that 100 and 200 example, the API itself cannot distinguish between a wraparound and a legitimate delay in comparison between start and a delayed end. I need to understand this better and go through the code more. ;-/ > > > > The big questions are "under what conditions does it need to distinguish, > > and what are the consequences of failing to get this right?" Also, "what > > is the purpose of ->gpwrap?". > > My understanding is we take a snapshot of a sequence and then check at > later time if it was reached. Ok I shall explore these questions. > Meanwhile I created the following visualization using 5-bit numbers: > > https://drive.google.com/file/d/1w2sgJga7B5dye4iH0oPZ79obaKO-e_Jn/view?usp=sharing > > I find as we expect, that for rcu_seq_done(), as long as the distance > between the "running sequence" and the "target" is ULONG_MAX/2, it > returns correct results. OTHERWISE, it can return false results. So > for target value of 28 (in 5-bit world), only initial running values > from 12 would return FALSE. Before number 12, everything is TRUE. This > can cause both false-positives and false-negatives depending on the > input, however it is possible that are no users who use it causing > false-positives! So I guess it really depends on user. Your last sentence is extremely important. You need to evaluate the consequences of false positives and false negatives on a use-case-by-use-case basis. Which first requires enumerating the use cases. > False positives: > If the initial value is ULONG_MAX/2 away from the target, then > rcu_seq_done() can return TRUE when *no wraparound* happened. What exactly is the initial value and the target? My guess is that the initial value is what was returned from rcu_seq_snap(), but that is just a guess. My guess is that the target value is the value of the underlying sequence number at the time that rcu_seq_done() is invoked, but again, that is just a guess. Similar question for "away from the target", and similar guess. This might sound picky, but if you think that *I* am picky, just try the actual RCU code. ;-) > False negatives: > The initial value of the snapshot was *within* the ULONG_MAX/2 > distance from target. A full *wrap around then happened*, and we ended > where we were initially, so rcu_seq_done() should return TRUE but it > returns FALSE because it doesn't know about the wrap. By "where we were initially", one might reasonably guess that you meant that the initial value and the target are one and the same. But I suspect that you are thinking of a third value, the value of the underlying grace-period sequence number at the time of the call to rcu_seq_snap(). But you might be thinking of something else entirely. > Now we move rcu_seq_done_exact. It has the exact same behavior except > that instead of ULONG_MAX/2, the above situations are shrunk to about > 10 counts from the target. So if target is 28, then the initial > sequence should have been at least 18 to avoid false-positive, but > again it is possible and I certain see in the code that it cannot be > used this way.. (at least so far).. So all we are left with is the > false-negative band of ~2.5 GPs.. Here, I have the same questions. As you no doubt guessed. > About "what are the consequences of failing to get this right". I > believe false-positive is unacceptable unless it is maybe debug code - > that can break functionality in code, too short GPs and all that. > However false-negative is OK as long as the usecase can retry later > and can afford to wait. Did I get that much correct? Maybe? Please look at this on a per-use-case basis. Thanx, Paul
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index eed2951a4962..c2ca196907cb 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) /* * Given a snapshot from rcu_seq_snap(), determine whether or not a - * full update-side operation has occurred. + * full update-side operation has occurred while also handling + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. */ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) -{ - return ULONG_CMP_GE(READ_ONCE(*sp), s); -} - -/* - * Given a snapshot from rcu_seq_snap(), determine whether or not a - * full update-side operation has occurred, but do not allow the - * (ULONG_MAX / 2) safety-factor/guard-band. - */ -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) { unsigned long cur_s = READ_ONCE(*sp); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index b77ccc55557b..835600cec9ba 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); bool poll_state_synchronize_rcu(unsigned long oldstate) { if (oldstate == RCU_GET_STATE_COMPLETED || - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { smp_mb(); /* Ensure GP ends before subsequent accesses. */ return true; } @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) smp_mb(); // Order against root rcu_node structure grace-period cleanup. if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { smp_mb(); /* Ensure GP ends before subsequent accesses. */ return true; }
The rcu_seq_done() API has a large "false-negative" windows of size ULONG_MAX/2, where after wrap around, it is possible that it will think that a GP has not completed if a wrap around happens and the delta is large. rcu_seq_done_exact() is more accurate avoiding this wrap around issue, by making the window of false-negativity by only 3 GPs. Use this logic for rcu_seq_done() which is a nice negative code delta and could potentially avoid issues in the future where rcu_seq_done() was reporting false-negatives for too long. rcutorture runs of all scenarios for 15 minutes passed. Code inspection was done of all users to convince the change would work. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- kernel/rcu/rcu.h | 13 ++----------- kernel/rcu/tree.c | 6 +++--- 2 files changed, 5 insertions(+), 14 deletions(-)