Message ID | 20240305195720.42687-1-urezki@gmail.com
---|---
State | New
Series | [1/2] rcu: Do not release a wait-head from a GP kthread
On 3/5/2024 2:57 PM, Uladzislau Rezki (Sony) wrote:
> Fix a below race by not releasing a wait-head from the
> GP-kthread as it can lead for reusing it whereas a worker
> can still access it thus execute newly added callbacks too
> early.
>
>     CPU 0                                           CPU 1
>     -----                                           -----
>
>     // wait_tail == HEAD1
>     rcu_sr_normal_gp_cleanup() {
>         // has passed SR_MAX_USERS_WAKE_FROM_GP
>         wait_tail->next = next;
>         // done_tail = HEAD1
>         smp_store_release(&rcu_state.srs_done_tail, wait_tail);
>         queue_work() {
>             test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)
>             __queue_work()
>         }
>     }
>
>                                                     set_work_pool_and_clear_pending()
>                                                     rcu_sr_normal_gp_cleanup_work() {
>     // new GP, wait_tail == HEAD2
>     rcu_sr_normal_gp_cleanup() {
>         // executes all completion, but stop at HEAD1
>         wait_tail->next = HEAD1;
>         // done_tail = HEAD2
>         smp_store_release(&rcu_state.srs_done_tail, wait_tail);
>         queue_work() {
>             test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)
>             __queue_work()
>         }
>     }
>                                                         // done = HEAD2
>                                                         done = smp_load_acquire(&rcu_state.srs_done_tail);
>                                                         // head = HEAD1
>                                                         head = done->next;
>                                                         done->next = NULL;
>                                                         llist_for_each_safe() {
>                                                             // completes all callbacks, release HEAD1
>                                                         }
>                                                     }
>                                                     // Process second queue
>                                                     set_work_pool_and_clear_pending()
>                                                     rcu_sr_normal_gp_cleanup_work() {
>                                                         // done = HEAD2
>                                                         done = smp_load_acquire(&rcu_state.srs_done_tail);
>
>     // new GP, wait_tail == HEAD3
>     rcu_sr_normal_gp_cleanup() {
>         // Finds HEAD2 with ->next == NULL at the end
>         rcu_sr_put_wait_head(HEAD2)
>         ...
>
>     // A few more GPs later
>     rcu_sr_normal_gp_init() {
>         HEAD2 = rcu_sr_get_wait_head();
>         llist_add(HEAD2, &rcu_state.srs_next);
>                                                         // head == rcu_state.srs_next
>                                                         head = done->next;
>                                                         done->next = NULL;
>                                                         llist_for_each_safe() {
>                                                             // EXECUTE CALLBACKS TOO EARLY!!!
>                                                         }
>                                                     }
>
> Reported-by: Frederic Weisbecker <frederic@kernel.org>
> Fixes: 05a10b921000 ("rcu: Support direct wake-up of synchronize_rcu() users")
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  kernel/rcu/tree.c | 22 ++++++++--------------
>  1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 31f3a61f9c38..475647620b12 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1656,21 +1656,11 @@ static void rcu_sr_normal_gp_cleanup(void)
>  	WARN_ON_ONCE(!rcu_sr_is_wait_head(wait_tail));
>  
>  	/*
> -	 * Process (a) and (d) cases. See an illustration. Apart of
> -	 * that it handles the scenario when all clients are done,
> -	 * wait-head is released if last. The worker is not kicked.
> +	 * Process (a) and (d) cases. See an illustration.
>  	 */
>  	llist_for_each_safe(rcu, next, wait_tail->next) {
> -		if (rcu_sr_is_wait_head(rcu)) {
> -			if (!rcu->next) {
> -				rcu_sr_put_wait_head(rcu);
> -				wait_tail->next = NULL;
> -			} else {
> -				wait_tail->next = rcu;
> -			}
> -
> +		if (rcu_sr_is_wait_head(rcu))
>  			break;
> -		}
>  
>  		rcu_sr_normal_complete(rcu);
>  		// It can be last, update a next on this step.
> @@ -1684,8 +1674,12 @@ static void rcu_sr_normal_gp_cleanup(void)
>  	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
>  	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_done_tail);
>  
> -	if (wait_tail->next)
> -		queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
> +	/*
> +	 * We schedule a work in order to perform a final processing
> +	 * of outstanding users(if still left) and releasing wait-heads
> +	 * added by rcu_sr_normal_gp_init() call.
> +	 */
> +	queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
>  }

Ah, nice. So instead of allocating/freeing in the GP thread and freeing in the
worker, you allocate heads only in the GP thread and free them only in the
worker, thus essentially fixing the UAF that Frederic found.

AFAICS, this fixes the issue.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

There might be a way to prevent queuing new work as a fast-path optimization,
in case the CBs per GP will always be < SR_MAX_USERS_WAKE_FROM_GP, but I could
not find a workqueue API that helps there, and work_busy() has comments saying
not to use it.

thanks,

- Joel
On Wed, Mar 6, 2024 at 5:31 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> On 3/5/2024 2:57 PM, Uladzislau Rezki (Sony) wrote:
> > Fix a below race by not releasing a wait-head from the
> > GP-kthread as it can lead for reusing it whereas a worker
> > can still access it thus execute newly added callbacks too
> > early.
> [...]
> There might a way to prevent queuing new work as fast-path optimization, incase
> the CBs per GP will always be < SR_MAX_USERS_WAKE_FROM_GP but I could not find a
> workqueue API that helps there, and work_busy() has comments saying not to use that.

One way to do this would be to maintain a count of how many CBs are in
flight via the worker route, and then fast-path-free the thing if the
count is 0. Should I send a patch around something like that? It saves
1 more wakeup per synchronize_rcu() I think.

- Joel
On Tue, Mar 05, 2024 at 08:57:19PM +0100, Uladzislau Rezki (Sony) wrote:
> Fix a below race by not releasing a wait-head from the
> GP-kthread as it can lead for reusing it whereas a worker
> can still access it thus execute newly added callbacks too
> early.
> [...]
> Reported-by: Frederic Weisbecker <frederic@kernel.org>
> Fixes: 05a10b921000 ("rcu: Support direct wake-up of synchronize_rcu() users")
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
On 3/6/2024 5:31 PM, Joel Fernandes wrote:
> On 3/5/2024 2:57 PM, Uladzislau Rezki (Sony) wrote:
>> Fix a below race by not releasing a wait-head from the
>> GP-kthread as it can lead for reusing it whereas a worker
>> can still access it thus execute newly added callbacks too
>> early.
>> [...]
>> @@ -1684,8 +1674,12 @@ static void rcu_sr_normal_gp_cleanup(void)
>>  	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
>>  	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_done_tail);
>>  
>> -	if (wait_tail->next)
>> -		queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
>> +	/*
>> +	 * We schedule a work in order to perform a final processing
>> +	 * of outstanding users(if still left) and releasing wait-heads
>> +	 * added by rcu_sr_normal_gp_init() call.
>> +	 */
>> +	queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
>>  }

One question, why do you need to queue_work() if wait_tail->next == NULL?

AFAICS, at this stage if wait_tail->next == NULL, you are in CASE f. so the last
remaining HEAD stays? (And llist_for_each_safe() in
rcu_sr_normal_gp_cleanup_work becomes a NOOP).

Could be something like this, but maybe I missed something:

@@ -1672,7 +1674,7 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
  */
 static void rcu_sr_normal_gp_cleanup(void)
 {
-	struct llist_node *wait_tail, *next, *rcu;
+	struct llist_node *wait_tail, *next = NULL, *rcu = NULL;
 	int done = 0;
 
 	wait_tail = rcu_state.srs_wait_tail;
@@ -1707,7 +1709,8 @@ static void rcu_sr_normal_gp_cleanup(void)
 	 * of outstanding users(if still left) and releasing wait-heads
 	 * added by rcu_sr_normal_gp_init() call.
 	 */
-	queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
+	if (rcu)
+		queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
 }

 /*
On 3/7/2024 1:21 AM, Joel Fernandes wrote:
> On 3/6/2024 5:31 PM, Joel Fernandes wrote:
>> On 3/5/2024 2:57 PM, Uladzislau Rezki (Sony) wrote:
>>> Fix a below race by not releasing a wait-head from the
>>> GP-kthread as it can lead for reusing it whereas a worker
>>> can still access it thus execute newly added callbacks too
>>> early.
>>> [...]
>
> One question, why do you need to queue_work() if wait_tail->next == NULL?
>
> AFAICS, at this stage if wait_tail->next == NULL, you are in CASE f. so the last
> remaining HEAD stays? (And llist_for_each_safe() in
> rcu_sr_normal_gp_cleanup_work becomes a NOOP).
>

Never mind, sorry for spewing nonsense. You can never end up with CASE f here so
you still need the queue_work(). I think it is worth looking into how to free
the last HEAD, without queuing a work though (while not causing wreckage).

thanks,

- Joel
On Thu, Mar 07, 2024 at 01:04:25AM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 05, 2024 at 08:57:19PM +0100, Uladzislau Rezki (Sony) wrote:
> > Fix a below race by not releasing a wait-head from the
> > GP-kthread as it can lead for reusing it whereas a worker
> > can still access it thus execute newly added callbacks too
> > early.
> > [...]
> > Reported-by: Frederic Weisbecker <frederic@kernel.org>
> > Fixes: 05a10b921000 ("rcu: Support direct wake-up of synchronize_rcu() users")
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
Thank you, I will update with your review-by tag!

--
Uladzislau Rezki
On Wed, Mar 06, 2024 at 05:44:04PM -0500, Joel Fernandes wrote:
> On Wed, Mar 6, 2024 at 5:31 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > On 3/5/2024 2:57 PM, Uladzislau Rezki (Sony) wrote:
> > > Fix a below race by not releasing a wait-head from the
> > > GP-kthread as it can lead for reusing it whereas a worker
> > > can still access it thus execute newly added callbacks too
> > > early.
> > [...]
> > There might a way to prevent queuing new work as fast-path optimization, incase
> > the CBs per GP will always be < SR_MAX_USERS_WAKE_FROM_GP but I could not find a
> > workqueue API that helps there, and work_busy() has comments saying not to use that.
>
> One way to do this would be to maintain a count of how many CBs are in
> flight via the worker route, and then fast-path-free the thing if the
> count is 0. Should I send a patch around something like that? It saves
> 1 more wakeup per synchronize_rcu() I think.
>
We can release the last wait-head if we know that the worker is not
pending/running. Then we guarantee that Frederic's case is not possible.

On the other hand, it will again introduce more mess, because the idea was,
in the beginning, to start with something really simple :)

--
Uladzislau Rezki
On Thu, Mar 07, 2024 at 01:21:50AM -0500, Joel Fernandes wrote:
> On 3/6/2024 5:31 PM, Joel Fernandes wrote:
> > On 3/5/2024 2:57 PM, Uladzislau Rezki (Sony) wrote:
> >> Fix a below race by not releasing a wait-head from the
> >> GP-kthread as it can lead for reusing it whereas a worker
> >> can still access it thus execute newly added callbacks too
> >> early.
> >> [...]
>
> One question, why do you need to queue_work() if wait_tail->next == NULL?
>
> AFAICS, at this stage if wait_tail->next == NULL, you are in CASE f. so the last
> remaining HEAD stays? (And llist_for_each_safe() in
> rcu_sr_normal_gp_cleanup_work becomes a NOOP).
>
> Could be something like this, but maybe I missed something:
> [...]
> -	queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
> +	if (rcu)
> +		queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
>
Unneeded queueing happens only the first time after boot. After one cycle
we always need to queue the work to do the previous cleanups:

	if (wait_head->next)
		queue_the_work();

wait_head->next is always not NULL except the first cycle after boot.

--
Uladzislau Rezki
On Thu, Mar 07, 2024 at 02:09:38AM -0500, Joel Fernandes wrote:
> On 3/7/2024 1:21 AM, Joel Fernandes wrote:
> > On 3/6/2024 5:31 PM, Joel Fernandes wrote:
> >> On 3/5/2024 2:57 PM, Uladzislau Rezki (Sony) wrote:
> >>> Fix a below race by not releasing a wait-head from the
> >>> GP-kthread as it can lead for reusing it whereas a worker
> >>> can still access it thus execute newly added callbacks too
> >>> early.
> >>> [...]
> >
> > One question, why do you need to queue_work() if wait_tail->next == NULL?
> >
> > AFAICS, at this stage if wait_tail->next == NULL, you are in CASE f. so the last
> > remaining HEAD stays? (And llist_for_each_safe() in
> > rcu_sr_normal_gp_cleanup_work becomes a NOOP).
> >
> Never mind, sorry for spewing nonsense. You can never end up with CASE f here so
> you still need the queue_work(). I think it is worth looking into how to free
> the last HEAD, without queuing a work though (while not causing wreckage).
>
No problem at all :)

--
Uladzislau Rezki
On Wed, Mar 06, 2024 at 05:31:31PM -0500, Joel Fernandes wrote:
> On 3/5/2024 2:57 PM, Uladzislau Rezki (Sony) wrote:
> > Fix a below race by not releasing a wait-head from the
> > GP-kthread as it can lead for reusing it whereas a worker
> > can still access it thus execute newly added callbacks too
> > early.
> > [...]
>
> Ah, nice. So instead of allocating/freeing in GP thread and freeing in worker,
> you allocate heads only in GP thread and free them only in worker, thus
> essentially fixing the UAF that Frederick found.
>
> AFAICS, this fixes the issue.
>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
Thank you for the review-by!

> There might a way to prevent queuing new work as fast-path optimization, incase
> the CBs per GP will always be < SR_MAX_USERS_WAKE_FROM_GP but I could not find a
> workqueue API that helps there, and work_busy() has comments saying not to use that.
>
This is not really critical but yes, we can think of it.

--
Uladzislau Rezki
On 3/7/2024 7:57 AM, Uladzislau Rezki wrote:
> On Wed, Mar 06, 2024 at 05:31:31PM -0500, Joel Fernandes wrote:
>> On 3/5/2024 2:57 PM, Uladzislau Rezki (Sony) wrote:
>>> Fix a below race by not releasing a wait-head from the
>>> GP-kthread as it can lead to reusing it whereas a worker
>>> can still access it thus execute newly added callbacks too
>>> early.
>>>
>>> [..]
>>
>> Ah, nice. So instead of allocating/freeing in GP thread and freeing in
>> worker, you allocate heads only in GP thread and free them only in
>> worker, thus essentially fixing the UAF that Frederic found.
>>
>> AFAICS, this fixes the issue.
>>
>> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>
> Thank you for the review-by!
>
>> There might be a way to prevent queuing new work as a fast-path
>> optimization, in case the CBs per GP will always be <
>> SR_MAX_USERS_WAKE_FROM_GP, but I could not find a workqueue API that
>> helps there, and work_busy() has comments saying not to use that.
>>
> This is not really critical but yes, we can think of it.
>
Thanks, I have a patch that does that. I could not help but write it as
soon as I woke up in the morning, ;-). It passes torture and I will push
it for further review after some more testing.

thanks,

 - Joel
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 31f3a61f9c38..475647620b12 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1656,21 +1656,11 @@ static void rcu_sr_normal_gp_cleanup(void)
 	WARN_ON_ONCE(!rcu_sr_is_wait_head(wait_tail));
 
 	/*
-	 * Process (a) and (d) cases. See an illustration. Apart of
-	 * that it handles the scenario when all clients are done,
-	 * wait-head is released if last. The worker is not kicked.
+	 * Process (a) and (d) cases. See an illustration.
 	 */
 	llist_for_each_safe(rcu, next, wait_tail->next) {
-		if (rcu_sr_is_wait_head(rcu)) {
-			if (!rcu->next) {
-				rcu_sr_put_wait_head(rcu);
-				wait_tail->next = NULL;
-			} else {
-				wait_tail->next = rcu;
-			}
-
+		if (rcu_sr_is_wait_head(rcu))
 			break;
-		}
 
 		rcu_sr_normal_complete(rcu);
 		// It can be last, update a next on this step.
@@ -1684,8 +1674,12 @@ static void rcu_sr_normal_gp_cleanup(void)
 	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_done_tail);
 
-	if (wait_tail->next)
-		queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
+	/*
+	 * We schedule a work in order to perform a final processing
+	 * of outstanding users (if still left) and releasing wait-heads
+	 * added by rcu_sr_normal_gp_init() call.
+	 */
+	queue_work(system_highpri_wq, &rcu_state.srs_cleanup_work);
 }
 
 /*
Fix a below race by not releasing a wait-head from the
GP-kthread as it can lead to reusing it whereas a worker
can still access it thus execute newly added callbacks too
early.

CPU 0                                           CPU 1
-----                                           -----

// wait_tail == HEAD1
rcu_sr_normal_gp_cleanup() {
	// has passed SR_MAX_USERS_WAKE_FROM_GP
	wait_tail->next = next;
	// done_tail = HEAD1
	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
	queue_work() {
		test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))
		__queue_work()
	}
}

						set_work_pool_and_clear_pending()
						rcu_sr_normal_gp_cleanup_work() {
// new GP, wait_tail == HEAD2
rcu_sr_normal_gp_cleanup() {
	// executes all completion, but stop at HEAD1
	wait_tail->next = HEAD1;
	// done_tail = HEAD2
	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
	queue_work() {
		test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))
		__queue_work()
	}
}
						// done = HEAD2
						done = smp_load_acquire(&rcu_state.srs_done_tail);
						// head = HEAD1
						head = done->next;
						done->next = NULL;
						llist_for_each_safe() {
							// completes all callbacks, release HEAD1
						}
						}
						// Process second queue
						set_work_pool_and_clear_pending()
						rcu_sr_normal_gp_cleanup_work() {
						// done = HEAD2
						done = smp_load_acquire(&rcu_state.srs_done_tail);

// new GP, wait_tail == HEAD3
rcu_sr_normal_gp_cleanup() {
	// Finds HEAD2 with ->next == NULL at the end
	rcu_sr_put_wait_head(HEAD2)
	...

// A few more GPs later
rcu_sr_normal_gp_init() {
	HEAD2 = rcu_sr_get_wait_head();
	llist_add(HEAD2, &rcu_state.srs_next);
						// head == rcu_state.srs_next
						head = done->next;
						done->next = NULL;
						llist_for_each_safe() {
							// EXECUTE CALLBACKS TOO EARLY!!!
						}
						}

Reported-by: Frederic Weisbecker <frederic@kernel.org>
Fixes: 05a10b921000 ("rcu: Support direct wake-up of synchronize_rcu() users")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)