generic/750 : add missing _fixed_by_git_commit line to the test

Message ID 20250401022921.983259-1-s.prabhu@samsung.com (mailing list archive)
State New

Commit Message

Swarna Prabhu April 1, 2025, 2:29 a.m. UTC
Testing generic/750 on older kernels showed that more work was needed:
we reproduced a hang on v6.10-rc7 with a soak duration of 2.5 hours. On
v6.12 the original hang no longer reproduced, which led us to identify commit
2e6506e1c4ee ("mm/migrate: fix deadlock in migrate_pages_batch() on large folios")
as the fix for the deadlock that generic/750 had annotated as pending
work. Kernels older than v6.11-rc4 therefore need this commit.
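
One way to check whether a given kernel tree already carries the fix is to ask git whether 2e6506e1c4ee is an ancestor of the checked-out HEAD. The snippet below is a minimal sketch and not part of the patch; the helper name tree_has_commit and the example path are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of fstests): succeeds when commit $2 is an
# ancestor of HEAD in the git tree at $1. It fails otherwise, including
# when the tree or the commit does not exist.
tree_has_commit() {
    local tree="$1" commit="$2"
    git -C "$tree" merge-base --is-ancestor "$commit" HEAD 2>/dev/null
}

# Example (the path is an assumption):
#   tree_has_commit ~/src/linux 2e6506e1c4ee && echo "fix present"
```

Note that an ancestry check only matches the exact mainline hash. A stable tree that carries a backport of the fix under a different hash would be reported as missing it.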

Below is the kernel trace collected on v6.10-rc7 without the above
commit and with CONFIG_PROVE_LOCKING enabled:

[ 8942.920967]  ret_from_fork_asm+0x1a/0x30
[ 8942.921450]  </TASK>
[ 8942.921711] INFO: task 750:2532 blocked for more than 241 seconds.
[ 8942.922413]       Not tainted 6.10.0-rc7 #9
[ 8942.922894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8942.923770] task:750             state:D stack:0     pid:2532  tgid:2532  ppid:2349   flags:0x00004002
[ 8942.924820] Call Trace:
[ 8942.925109]  <TASK>
[ 8942.925362]  __schedule+0x465/0xe10
[ 8942.925756]  schedule+0x39/0x140
[ 8942.926114]  io_schedule+0x42/0x70
[ 8942.926493]  folio_wait_bit_common+0x10e/0x330
[ 8942.926986]  ? __pfx_wake_page_function+0x10/0x10
[ 8942.927506]  migrate_pages_batch+0x765/0xeb0
[ 8942.927986]  ? __pfx_compaction_alloc+0x10/0x10
[ 8942.928488]  ? __pfx_compaction_free+0x10/0x10
[ 8942.928983]  migrate_pages+0xbfd/0xf50
[ 8942.929377]  ? __pfx_compaction_alloc+0x10/0x10
[ 8942.929838]  ? __pfx_compaction_free+0x10/0x10
[ 8942.930553]  compact_zone+0xa4d/0x11d0
[ 8942.930936]  ? rcu_is_watching+0xd/0x40
[ 8942.931332]  compact_node+0xa9/0x120
[ 8942.931704]  sysctl_compaction_handler+0x71/0xd0
[ 8942.932177]  proc_sys_call_handler+0x1b8/0x2d0
[ 8942.932641]  vfs_write+0x281/0x530
[ 8942.932993]  ksys_write+0x67/0xf0
[ 8942.933381]  do_syscall_64+0x69/0x140
[ 8942.933822]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 8942.934415] RIP: 0033:0x7f8a460215c7
[ 8942.934843] RSP: 002b:00007fff75cf7bb0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 8942.935720] RAX: ffffffffffffffda RBX: 00007f8a45f8f740 RCX: 00007f8a460215c7
[ 8942.936550] RDX: 0000000000000002 RSI: 000055e89e3a7790 RDI: 0000000000000001
[ 8942.937405] RBP: 000055e89e3a7790 R08: 0000000000000000 R09: 0000000000000000
[ 8942.938236] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
[ 8942.939068] R13: 00007f8a4617a5c0 R14: 00007f8a46177e80 R15: 0000000000000000
[ 8942.939902]  </TASK>
[ 8942.940169] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
[ 8942.941150] INFO: lockdep is turned off.

With the commit cherry-picked onto v6.10-rc7, the test passes without
any hang or deadlock. However, with CONFIG_PROVE_LOCKING enabled we
still see the trace below in the passing case:

 BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
 turning off the locking correctness validator.
 CPU: 1 PID: 2959 Comm: kworker/u34:5 Not tainted 6.10.0-rc7+ #12
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
 Call Trace:
  <TASK>
  dump_stack_lvl+0x68/0x90
  __lock_acquire.cold+0x186/0x1b1
  lock_acquire+0xd6/0x2e0
  ? btrfs_get_alloc_profile+0x27/0x90 [btrfs]
  seqcount_lockdep_reader_access+0x70/0x90 [btrfs]
  ? btrfs_get_alloc_profile+0x27/0x90 [btrfs]
  btrfs_get_alloc_profile+0x27/0x90 [btrfs]
  btrfs_reserve_extent+0xa9/0x290 [btrfs]
  btrfs_alloc_tree_block+0xa5/0x520 [btrfs]
  ? lockdep_unlock+0x5e/0xd0
  ? __lock_acquire+0xc6f/0x1fa0
  btrfs_force_cow_block+0x111/0x5f0 [btrfs]
  btrfs_cow_block+0xcc/0x2d0 [btrfs]
  btrfs_search_slot+0x502/0xd00 [btrfs]
  ? stack_depot_save_flags+0x24/0x8a0
  btrfs_lookup_file_extent+0x48/0x70 [btrfs]
  btrfs_drop_extents+0x108/0xce0 [btrfs]
  ? _raw_spin_unlock_irqrestore+0x35/0x60
  ? __create_object+0x5e/0x90
  ? rcu_is_watching+0xd/0x40
  ? kmem_cache_alloc_noprof+0x280/0x320
  insert_reserved_file_extent+0xea/0x3a0 [btrfs]
  ? btrfs_init_block_rsv+0x51/0x60 [btrfs]
  btrfs_finish_one_ordered+0x3ea/0x840 [btrfs]
  btrfs_work_helper+0x103/0x4b0 [btrfs]
  ? lock_release+0x177/0x2e0
  process_one_work+0x21a/0x590
  ? lock_is_held_type+0xd5/0x130
  worker_thread+0x1bf/0x3c0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xdd/0x110
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2d/0x50
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 Started fstests-check.scope - [systemd-run] /usr/bin/bash -c "exit 77".
 fstests-check.scope: Deactivated successfully.
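
The two traces above are different kinds of report: the first is a hung-task warning, while the second is lockdep saying it hit a capacity limit and turned itself off. When triaging saved logs, a rough way to tell them apart is to match on the message strings quoted above. This is a minimal sketch; the helper name classify_kernel_log and the labels it prints are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: classify a saved kernel log using the two message
# strings quoted in this commit message. Prints one label on stdout.
classify_kernel_log() {
    local log="$1"
    if grep -Eq 'INFO: task .* blocked for more than [0-9]+ seconds' "$log"; then
        echo "hung-task"
    elif grep -Fq 'BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!' "$log"; then
        echo "lockdep-capacity"
    else
        echo "clean"
    fi
}
```

The hung-task pattern is checked first, so a log that contains both messages is labelled as a hang.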

Signed-off-by: Swarna Prabhu <s.prabhu@samsung.com>
---
 tests/generic/750 | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Luis Chamberlain April 1, 2025, 1:40 p.m. UTC | #1
On Tue, Apr 01, 2025 at 02:29:21AM +0000, Swarna Prabhu wrote:
> Testing generic/750 on older kernels showed that more work was needed:
> we reproduced a hang on v6.10-rc7 with a soak duration of 2.5 hours. On
> v6.12 the original hang no longer reproduced, which led us to identify commit
> 2e6506e1c4ee ("mm/migrate: fix deadlock in migrate_pages_batch() on large folios")
> as the fix for the deadlock that generic/750 had annotated as pending
> work. Kernels older than v6.11-rc4 therefore need this commit.
> 
> Below is the kernel trace collected on v6.10-rc7 without the above
> commit and with CONFIG_PROVE_LOCKING enabled:
> 
> [ 8942.920967]  ret_from_fork_asm+0x1a/0x30
> [ 8942.921450]  </TASK>
> [ 8942.921711] INFO: task 750:2532 blocked for more than 241 seconds.
> [ 8942.922413]       Not tainted 6.10.0-rc7 #9
> [ 8942.922894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 8942.923770] task:750             state:D stack:0     pid:2532  tgid:2532  ppid:2349   flags:0x00004002
> [ 8942.924820] Call Trace:
> [ 8942.925109]  <TASK>
> [ 8942.925362]  __schedule+0x465/0xe10
> [ 8942.925756]  schedule+0x39/0x140
> [ 8942.926114]  io_schedule+0x42/0x70
> [ 8942.926493]  folio_wait_bit_common+0x10e/0x330
> [ 8942.926986]  ? __pfx_wake_page_function+0x10/0x10
> [ 8942.927506]  migrate_pages_batch+0x765/0xeb0
> [ 8942.927986]  ? __pfx_compaction_alloc+0x10/0x10
> [ 8942.928488]  ? __pfx_compaction_free+0x10/0x10
> [ 8942.928983]  migrate_pages+0xbfd/0xf50
> [ 8942.929377]  ? __pfx_compaction_alloc+0x10/0x10
> [ 8942.929838]  ? __pfx_compaction_free+0x10/0x10
> [ 8942.930553]  compact_zone+0xa4d/0x11d0
> [ 8942.930936]  ? rcu_is_watching+0xd/0x40
> [ 8942.931332]  compact_node+0xa9/0x120
> [ 8942.931704]  sysctl_compaction_handler+0x71/0xd0
> [ 8942.932177]  proc_sys_call_handler+0x1b8/0x2d0
> [ 8942.932641]  vfs_write+0x281/0x530
> [ 8942.932993]  ksys_write+0x67/0xf0
> [ 8942.933381]  do_syscall_64+0x69/0x140
> [ 8942.933822]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 8942.934415] RIP: 0033:0x7f8a460215c7
> [ 8942.934843] RSP: 002b:00007fff75cf7bb0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [ 8942.935720] RAX: ffffffffffffffda RBX: 00007f8a45f8f740 RCX: 00007f8a460215c7
> [ 8942.936550] RDX: 0000000000000002 RSI: 000055e89e3a7790 RDI: 0000000000000001
> [ 8942.937405] RBP: 000055e89e3a7790 R08: 0000000000000000 R09: 0000000000000000
> [ 8942.938236] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
> [ 8942.939068] R13: 00007f8a4617a5c0 R14: 00007f8a46177e80 R15: 0000000000000000
> [ 8942.939902]  </TASK>
> [ 8942.940169] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
> [ 8942.941150] INFO: lockdep is turned off.
> 
> With the commit cherry-picked onto v6.10-rc7, the test passes without
> any hang or deadlock. However, with CONFIG_PROVE_LOCKING enabled we
> still see the trace below in the passing case:
> 
>  BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
>  turning off the locking correctness validator.
>  CPU: 1 PID: 2959 Comm: kworker/u34:5 Not tainted 6.10.0-rc7+ #12
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
>  Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x68/0x90
>   __lock_acquire.cold+0x186/0x1b1
>   lock_acquire+0xd6/0x2e0
>   ? btrfs_get_alloc_profile+0x27/0x90 [btrfs]
>   seqcount_lockdep_reader_access+0x70/0x90 [btrfs]
>   ? btrfs_get_alloc_profile+0x27/0x90 [btrfs]
>   btrfs_get_alloc_profile+0x27/0x90 [btrfs]
>   btrfs_reserve_extent+0xa9/0x290 [btrfs]
>   btrfs_alloc_tree_block+0xa5/0x520 [btrfs]
>   ? lockdep_unlock+0x5e/0xd0
>   ? __lock_acquire+0xc6f/0x1fa0
>   btrfs_force_cow_block+0x111/0x5f0 [btrfs]
>   btrfs_cow_block+0xcc/0x2d0 [btrfs]
>   btrfs_search_slot+0x502/0xd00 [btrfs]
>   ? stack_depot_save_flags+0x24/0x8a0
>   btrfs_lookup_file_extent+0x48/0x70 [btrfs]
>   btrfs_drop_extents+0x108/0xce0 [btrfs]
>   ? _raw_spin_unlock_irqrestore+0x35/0x60
>   ? __create_object+0x5e/0x90
>   ? rcu_is_watching+0xd/0x40
>   ? kmem_cache_alloc_noprof+0x280/0x320
>   insert_reserved_file_extent+0xea/0x3a0 [btrfs]
>   ? btrfs_init_block_rsv+0x51/0x60 [btrfs]
>   btrfs_finish_one_ordered+0x3ea/0x840 [btrfs]
>   btrfs_work_helper+0x103/0x4b0 [btrfs]
>   ? lock_release+0x177/0x2e0
>   process_one_work+0x21a/0x590
>   ? lock_is_held_type+0xd5/0x130
>   worker_thread+0x1bf/0x3c0
>   ? __pfx_worker_thread+0x10/0x10
>   kthread+0xdd/0x110
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x2d/0x50
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork_asm+0x1a/0x30
>   </TASK>
>  Started fstests-check.scope - [systemd-run] /usr/bin/bash -c "exit 77".
>  fstests-check.scope: Deactivated successfully.
> 
> Signed-off-by: Swarna Prabhu <s.prabhu@samsung.com>

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

  Luis
Johannes Thumshirn April 2, 2025, 5:51 a.m. UTC | #2
On 01.04.25 04:30, Swarna Prabhu wrote:
> -# We still deadlock with this test on v6.10-rc2, we need more work.
> -# but the below makes things better.
>   _fixed_by_git_commit kernel d99e3140a4d3 \
>   	"mm: turn folio_test_hugetlb into a PageType"
>   
> +#merged on v6.11-rc4
> +_fixed_by_git_commit kernel 2e6506e1c4ee \
> +    "mm/migrate: fix deadlock in migrate_pages_batch() on large folios"
> +

Do we really need the version information? It's kind of redundant when 
you have a commit hash.
Swarna Prabhu April 2, 2025, 6:29 p.m. UTC | #3
Sure, will send a v2.

On Tue, Apr 1, 2025 at 10:51 PM Johannes Thumshirn <Johannes.Thumshirn@wdc.com> wrote:

> On 01.04.25 04:30, Swarna Prabhu wrote:
> > -# We still deadlock with this test on v6.10-rc2, we need more work.
> > -# but the below makes things better.
> >   _fixed_by_git_commit kernel d99e3140a4d3 \
> >       "mm: turn folio_test_hugetlb into a PageType"
> >
> > +#merged on v6.11-rc4
> > +_fixed_by_git_commit kernel 2e6506e1c4ee \
> > +    "mm/migrate: fix deadlock in migrate_pages_batch() on large folios"
> > +
>
> Do we really need the version information? It's kind of redundant when
> you have a commit hash.
>

Patch

diff --git a/tests/generic/750 b/tests/generic/750
index a0828b50..abce6a59 100755
--- a/tests/generic/750
+++ b/tests/generic/750
@@ -26,11 +26,13 @@  _cleanup()
 _require_scratch
 _require_vm_compaction
 
-# We still deadlock with this test on v6.10-rc2, we need more work.
-# but the below makes things better.
 _fixed_by_git_commit kernel d99e3140a4d3 \
 	"mm: turn folio_test_hugetlb into a PageType"
 
+#merged on v6.11-rc4
+_fixed_by_git_commit kernel 2e6506e1c4ee \
+    "mm/migrate: fix deadlock in migrate_pages_batch() on large folios"
+
 echo "Silence is golden"
 
 _scratch_mkfs > $seqres.full 2>&1