Message ID | 20211102124916.433836-1-nborisov@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix memory ordering between normal and ordered work functions | expand |
On Tue, Nov 02, 2021 at 02:49:16PM +0200, Nikolay Borisov wrote: > Ordered work functions aren't guaranteed to be handled by the same thread > which executed the normal work functions. The only way execution between > normal/ordered functions is synchronized is via the WORK_DONE_BIT, > unfortunately the used bitops don't guarantee any ordering whatsoever. > > This manifested as seemingly inexplicable crashes on arm64, where > async_chunk::inode is seens as non-null in async_cow_submit which causes > submit_compressed_extents to be called and crash occurs because > async_chunk::inode suddenly became NULL. The call trace was similar to: > > pc : submit_compressed_extents+0x38/0x3d0 > lr : async_cow_submit+0x50/0xd0 > sp : ffff800015d4bc20 > > <registers omitted for brevity> > > Call trace: > submit_compressed_extents+0x38/0x3d0 > async_cow_submit+0x50/0xd0 > run_ordered_work+0xc8/0x280 > btrfs_work_helper+0x98/0x250 > process_one_work+0x1f0/0x4ac > worker_thread+0x188/0x504 > kthread+0x110/0x114 > ret_from_fork+0x10/0x18 > > Fix this by adding respective barrier calls which ensure that all > accesses preceding setting of WORK_DONE_BIT are strictly ordered before > settint the flag. At the same time add a read barrier after reading of > WORK_DONE_BIT in run_ordered_work which ensures all subsequent loads > would be strictly ordered after reading the bit. This in turn ensures > are all accesses before WORK_DONE_BIT are going to be strictly ordered > before any access that can occur in ordered_func. > > Fixes: 08a9ff326418 ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue") > Reported-by: Chris Murphy <lists@colorremedies.com> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928 > Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On Tue, Nov 2, 2021 at 8:49 AM Nikolay Borisov <nborisov@suse.com> wrote: > > Ordered work functions aren't guaranteed to be handled by the same thread > which executed the normal work functions. The only way execution between > normal/ordered functions is synchronized is via the WORK_DONE_BIT, > unfortunately the used bitops don't guarantee any ordering whatsoever. > > This manifested as seemingly inexplicable crashes on arm64, where > async_chunk::inode is seens as non-null in async_cow_submit which causes > submit_compressed_extents to be called and crash occurs because > async_chunk::inode suddenly became NULL. The call trace was similar to: > > pc : submit_compressed_extents+0x38/0x3d0 > lr : async_cow_submit+0x50/0xd0 > sp : ffff800015d4bc20 > > <registers omitted for brevity> > > Call trace: > submit_compressed_extents+0x38/0x3d0 > async_cow_submit+0x50/0xd0 > run_ordered_work+0xc8/0x280 > btrfs_work_helper+0x98/0x250 > process_one_work+0x1f0/0x4ac > worker_thread+0x188/0x504 > kthread+0x110/0x114 > ret_from_fork+0x10/0x18 > > Fix this by adding respective barrier calls which ensure that all > accesses preceding setting of WORK_DONE_BIT are strictly ordered before > settint the flag. At the same time add a read barrier after reading of > WORK_DONE_BIT in run_ordered_work which ensures all subsequent loads > would be strictly ordered after reading the bit. This in turn ensures > are all accesses before WORK_DONE_BIT are going to be strictly ordered > before any access that can occur in ordered_func. > > Fixes: 08a9ff326418 ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue") > Reported-by: Chris Murphy <lists@colorremedies.com> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928 > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > --- > > David, > > IMO this is stable material. > > fs/btrfs/async-thread.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c > index 309516e6a968..d39af03b456c 100644 > --- a/fs/btrfs/async-thread.c > +++ b/fs/btrfs/async-thread.c > @@ -234,6 +234,13 @@ static void run_ordered_work(struct __btrfs_workqueue *wq, > ordered_list); > if (!test_bit(WORK_DONE_BIT, &work->flags)) > break; > + /* > + * Orders all subsequent loads after reading WORK_DONE_BIT, > + * paired with the smp_mb__before_atomic in btrfs_work_helper > + * this guarantees that the ordered function will see all > + * updates from ordinary work function. > + */ > + smp_rmb(); > > /* > * we are going to call the ordered done function, but > @@ -317,6 +324,13 @@ static void btrfs_work_helper(struct work_struct *normal_work) > thresh_exec_hook(wq); > work->func(work); > if (need_order) { > + /* > + * Ensures all memory accesses done in the work function are > + * ordered before setting the WORK_DONE_BIT.Ensuring the thread > + * which is going to executed the ordered work sees them. > + * Pairs with the smp_rmb in run_ordered_work. > + */ > + smp_mb__before_atomic(); > set_bit(WORK_DONE_BIT, &work->flags); > run_ordered_work(wq, work); > } else { > -- > 2.25.1 > Tested-by: Chris Murphy <chris@colorremedies.com>
On Tue, Nov 02, 2021 at 02:49:16PM +0200, Nikolay Borisov wrote: > Ordered work functions aren't guaranteed to be handled by the same thread > which executed the normal work functions. The only way execution between > normal/ordered functions is synchronized is via the WORK_DONE_BIT, > unfortunately the used bitops don't guarantee any ordering whatsoever. > > This manifested as seemingly inexplicable crashes on arm64, where > async_chunk::inode is seens as non-null in async_cow_submit which causes > submit_compressed_extents to be called and crash occurs because > async_chunk::inode suddenly became NULL. The call trace was similar to: > > pc : submit_compressed_extents+0x38/0x3d0 > lr : async_cow_submit+0x50/0xd0 > sp : ffff800015d4bc20 > > <registers omitted for brevity> > > Call trace: > submit_compressed_extents+0x38/0x3d0 > async_cow_submit+0x50/0xd0 > run_ordered_work+0xc8/0x280 > btrfs_work_helper+0x98/0x250 > process_one_work+0x1f0/0x4ac > worker_thread+0x188/0x504 > kthread+0x110/0x114 > ret_from_fork+0x10/0x18 > > Fix this by adding respective barrier calls which ensure that all > accesses preceding setting of WORK_DONE_BIT are strictly ordered before > settint the flag. At the same time add a read barrier after reading of > WORK_DONE_BIT in run_ordered_work which ensures all subsequent loads > would be strictly ordered after reading the bit. This in turn ensures > are all accesses before WORK_DONE_BIT are going to be strictly ordered > before any access that can occur in ordered_func. > > Fixes: 08a9ff326418 ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue") > Reported-by: Chris Murphy <lists@colorremedies.com> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928 > Signed-off-by: Nikolay Borisov <nborisov@suse.com> Added to misc-next, thanks. > --- > > David, > > IMO this is stable material. Yes. You can add the CC: stable@ tag or leave such notice so we don't spam their mailinglist.
diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 309516e6a968..d39af03b456c 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -234,6 +234,13 @@ static void run_ordered_work(struct __btrfs_workqueue *wq, ordered_list); if (!test_bit(WORK_DONE_BIT, &work->flags)) break; + /* + * Orders all subsequent loads after reading WORK_DONE_BIT, + * paired with the smp_mb__before_atomic in btrfs_work_helper + * this guarantees that the ordered function will see all + * updates from ordinary work function. + */ + smp_rmb(); /* * we are going to call the ordered done function, but @@ -317,6 +324,13 @@ static void btrfs_work_helper(struct work_struct *normal_work) thresh_exec_hook(wq); work->func(work); if (need_order) { + /* + * Ensures all memory accesses done in the work function are + * ordered before setting the WORK_DONE_BIT.Ensuring the thread + * which is going to executed the ordered work sees them. + * Pairs with the smp_rmb in run_ordered_work. + */ + smp_mb__before_atomic(); set_bit(WORK_DONE_BIT, &work->flags); run_ordered_work(wq, work); } else {
Ordered work functions aren't guaranteed to be handled by the same thread which executed the normal work functions. The only way execution between normal/ordered functions is synchronized is via the WORK_DONE_BIT, unfortunately the used bitops don't guarantee any ordering whatsoever. This manifested as seemingly inexplicable crashes on arm64, where async_chunk::inode is seens as non-null in async_cow_submit which causes submit_compressed_extents to be called and crash occurs because async_chunk::inode suddenly became NULL. The call trace was similar to: pc : submit_compressed_extents+0x38/0x3d0 lr : async_cow_submit+0x50/0xd0 sp : ffff800015d4bc20 <registers omitted for brevity> Call trace: submit_compressed_extents+0x38/0x3d0 async_cow_submit+0x50/0xd0 run_ordered_work+0xc8/0x280 btrfs_work_helper+0x98/0x250 process_one_work+0x1f0/0x4ac worker_thread+0x188/0x504 kthread+0x110/0x114 ret_from_fork+0x10/0x18 Fix this by adding respective barrier calls which ensure that all accesses preceding setting of WORK_DONE_BIT are strictly ordered before settint the flag. At the same time add a read barrier after reading of WORK_DONE_BIT in run_ordered_work which ensures all subsequent loads would be strictly ordered after reading the bit. This in turn ensures are all accesses before WORK_DONE_BIT are going to be strictly ordered before any access that can occur in ordered_func. Fixes: 08a9ff326418 ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue") Reported-by: Chris Murphy <lists@colorremedies.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928 Signed-off-by: Nikolay Borisov <nborisov@suse.com> --- David, IMO this is stable material. fs/btrfs/async-thread.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) -- 2.25.1