[0/3] btrfs: zoned: fix zoned extent allocator

Message ID 20211207153549.2946602-1-naohiro.aota@wdc.com

Message

Naohiro Aota Dec. 7, 2021, 3:35 p.m. UTC
There have been several reports of hung_task on btrfs recently.

- https://github.com/naota/linux/issues/59
- https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/

The stack trace of the hang looks like this:

[  739.991925][   T25] INFO: task kworker/u4:0:7 blocked for more than 122 seconds.
[  739.994821][   T25]       Not tainted 5.16.0-rc3+ #56
[  739.996676][   T25] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  739.999600][   T25] task:kworker/u4:0    state:D stack:    0 pid:    7 ppid:     2 flags:0x00004000
[  740.002656][   T25] Workqueue: writeback wb_workfn (flush-btrfs-5)
[  740.004894][   T25] Call Trace:
[  740.006113][   T25]  <TASK>
[  740.007143][   T25]  __schedule+0x9e3/0x23b0
[  740.008657][   T25]  ? io_schedule_timeout+0x190/0x190
[  740.010529][   T25]  ? blk_start_plug_nr_ios+0x270/0x270
[  740.012385][   T25]  ? _raw_spin_unlock_irq+0x28/0x50
[  740.014163][   T25]  schedule+0xed/0x280
[  740.015567][   T25]  io_schedule+0xfb/0x180
[  740.017026][   T25]  folio_wait_bit_common+0x386/0x840
[  740.018839][   T25]  ? delete_from_page_cache+0x220/0x220
[  740.020744][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.022645][   T25]  ? filemap_range_has_page+0x210/0x210
[  740.024588][   T25]  __folio_lock+0x17/0x20
[  740.026197][   T25]  extent_write_cache_pages+0x78c/0xba0 [btrfs]
[  740.028776][   T25]  ? __extent_writepage+0x980/0x980 [btrfs]
[  740.031026][   T25]  ? __kasan_check_read+0x11/0x20
[  740.032916][   T25]  ? __lock_acquire+0x1772/0x5a10
[  740.034611][   T25]  extent_writepages+0x1e8/0x3b0 [btrfs]
[  740.036636][   T25]  ? extent_write_locked_range+0x580/0x580 [btrfs]
[  740.038828][   T25]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[  740.040929][   T25]  btrfs_writepages+0xe/0x10 [btrfs]
[  740.042879][   T25]  do_writepages+0x187/0x610
[  740.044239][   T25]  ? page_writeback_cpu_online+0x20/0x20
[  740.045810][   T25]  ? sched_clock+0x9/0x10
[  740.047015][   T25]  ? sched_clock_cpu+0x18/0x1b0
[  740.048341][   T25]  ? find_held_lock+0x3c/0x130
[  740.049649][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.051155][   T25]  ? lock_release+0x3fd/0xed0
[  740.052358][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.053775][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.055059][   T25]  __writeback_single_inode+0xd7/0xa90
[  740.056399][   T25]  writeback_sb_inodes+0x4e8/0xff0
[  740.057693][   T25]  ? sync_inode_metadata+0xe0/0xe0
[  740.058918][   T25]  ? down_read_trylock+0x45/0x50
[  740.060003][   T25]  ? trylock_super+0x1b/0xc0
[  740.061050][   T25]  __writeback_inodes_wb+0xba/0x210
[  740.062123][   T25]  wb_writeback+0x5b3/0x8b0
[  740.063075][   T25]  ? __writeback_inodes_wb+0x210/0x210
[  740.064158][   T25]  ? __local_bh_enable_ip+0xaa/0x120
[  740.065196][   T25]  ? __local_bh_enable_ip+0xaa/0x120
[  740.066180][   T25]  ? trace_hardirqs_on+0x2b/0x130
[  740.067117][   T25]  ? wb_workfn+0x2cc/0xe80
[  740.067943][   T25]  ? get_nr_dirty_inodes+0x130/0x1c0
[  740.068909][   T25]  wb_workfn+0x6f5/0xe80
[  740.069674][   T25]  ? inode_wait_for_writeback+0x40/0x40
[  740.070841][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.071834][   T25]  ? lock_acquire+0x1c1/0x4f0
[  740.072630][   T25]  ? lock_release+0xed0/0xed0
[  740.073483][   T25]  ? lock_downgrade+0x7c0/0x7c0
[  740.074374][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.075342][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.076236][   T25]  process_one_work+0x826/0x14e0
[  740.077045][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.077921][   T25]  ? pwq_dec_nr_in_flight+0x250/0x250
[  740.078842][   T25]  ? lockdep_hardirqs_off+0x99/0xe0
[  740.079699][   T25]  worker_thread+0x59b/0x1050
[  740.080478][   T25]  ? process_one_work+0x14e0/0x14e0
[  740.081306][   T25]  kthread+0x38f/0x460
[  740.081955][   T25]  ? set_kthread_struct+0x110/0x110
[  740.082769][   T25]  ret_from_fork+0x22/0x30
[  740.083476][   T25]  </TASK>
[  740.083972][   T25] INFO: task aio-dio-write-v:1459 blocked for more than 122 seconds.
[  740.085202][   T25]       Not tainted 5.16.0-rc3+ #56
[  740.085970][   T25] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  740.087238][   T25] task:aio-dio-write-v state:D stack:    0 pid: 1459 ppid:   542 flags:0x00004000
[  740.088676][   T25] Call Trace:
[  740.089269][   T25]  <TASK>
[  740.089780][   T25]  __schedule+0x9e3/0x23b0
[  740.090644][   T25]  ? io_schedule_timeout+0x190/0x190
[  740.091614][   T25]  ? blk_start_plug_nr_ios+0x270/0x270
[  740.092649][   T25]  ? _raw_spin_unlock_irq+0x28/0x50
[  740.093531][   T25]  schedule+0xed/0x280
[  740.094323][   T25]  io_schedule+0xfb/0x180
[  740.095095][   T25]  folio_wait_bit_common+0x386/0x840
[  740.095868][   T25]  ? delete_from_page_cache+0x220/0x220
[  740.096730][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.097432][   T25]  ? filemap_range_has_page+0x210/0x210
[  740.098252][   T25]  __filemap_get_folio+0x4d3/0x8f0
[  740.099016][   T25]  ? filemap_range_needs_writeback+0xb0/0xb0
[  740.099953][   T25]  ? lock_contended+0xdf0/0xdf0
[  740.100663][   T25]  pagecache_get_page+0x19/0xc0
[  740.101400][   T25]  prepare_pages+0x205/0x4c0 [btrfs]
[  740.102274][   T25]  btrfs_buffered_write+0x5e0/0x1060 [btrfs]
[  740.103402][   T25]  ? btrfs_dirty_pages+0x2c0/0x2c0 [btrfs]
[  740.104480][   T25]  ? __up_read+0x1a9/0x7b0
[  740.105260][   T25]  ? up_write+0x480/0x480
[  740.106024][   T25]  ? btrfs_file_llseek+0x600/0x600 [btrfs]
[  740.107157][   T25]  btrfs_file_write_iter+0x84e/0xfa0 [btrfs]
[  740.108239][   T25]  ? lock_downgrade+0x7c0/0x7c0
[  740.109091][   T25]  aio_write+0x314/0x820
[  740.109888][   T25]  ? cpumask_weight.constprop.0+0x40/0x40
[  740.110846][   T25]  ? kvm_sched_clock_read+0x18/0x40
[  740.111677][   T25]  ? sched_clock_cpu+0x18/0x1b0
[  740.112765][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.114113][   T25]  ? lock_release+0x3fd/0xed0
[  740.115264][   T25]  ? lock_downgrade+0x7c0/0x7c0
[  740.116505][   T25]  io_submit_one.constprop.0+0xba3/0x1a50
[  740.117894][   T25]  ? io_submit_one.constprop.0+0xba3/0x1a50
[  740.119180][   T25]  ? kvm_sched_clock_read+0x18/0x40
[  740.120036][   T25]  ? sched_clock+0x9/0x10
[  740.120764][   T25]  ? sched_clock_cpu+0x18/0x1b0
[  740.121619][   T25]  ? __x64_sys_io_getevents_time32+0x2a0/0x2a0
[  740.122628][   T25]  ? lock_release+0x3fd/0xed0
[  740.123522][   T25]  __x64_sys_io_submit+0x15d/0x2b0
[  740.124417][   T25]  ? __x64_sys_io_submit+0x15d/0x2b0
[  740.125363][   T25]  ? io_submit_one.constprop.0+0x1a50/0x1a50
[  740.126322][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.127243][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.128071][   T25]  ? lockdep_hardirqs_on+0x7e/0x100
[  740.128915][   T25]  ? syscall_enter_from_user_mode+0x25/0x80
[  740.129872][   T25]  ? trace_hardirqs_on+0x2b/0x130
[  740.130694][   T25]  do_syscall_64+0x3b/0x90
[  740.131380][   T25]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  740.132351][   T25] RIP: 0033:0x7f5c03578679
[  740.133117][   T25] RSP: 002b:00007fffa4f030e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[  740.134468][   T25] RAX: ffffffffffffffda RBX: 00007f5c034746c0 RCX: 00007f5c03578679
[  740.135749][   T25] RDX: 0000561f67b701d0 RSI: 0000000000000037 RDI: 00007f5c03676000
[  740.137151][   T25] RBP: 00007f5c03676000 R08: 0000000000000000 R09: 0000000000000000
[  740.138505][   T25] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000037
[  740.139831][   T25] R13: 0000000000000000 R14: 0000561f67b701d0 R15: 0000561f67b701d0
[  740.141263][   T25]  </TASK>
[  740.141793][   T25]
[  740.141793][   T25] Showing all locks held in the system:
[  740.143126][   T25] 3 locks held by kworker/u4:0/7:
[  740.143982][   T25]  #0: ffff888100d4d948 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x740/0x14e0
[  740.145599][   T25]  #1: ffffc9000007fda8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x770/0x14e0
[  740.147521][   T25]  #2: ffff8881771760e8 (&type->s_umount_key#30){++++}-{3:3}, at: trylock_super+0x1b/0xc0
[  740.149084][   T25] 1 lock held by khungtaskd/25:
[  740.149916][   T25]  #0: ffffffff833cee00 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x5f/0x27f
[  740.151491][   T25] 1 lock held by aio-dio-write-v/1459:
[  740.152458][   T25]  #0: ffff888172999b10 (&sb->s_type->i_mutex_key#13){++++}-{3:3}, at: btrfs_inode_lock+0x3f/0x70 [btrfs]
[  740.154487][   T25]
[  740.154920][   T25] =============================================
[  740.154920][   T25]
[  740.156347][   T25] Kernel panic - not syncing: hung_task: blocked tasks
[  740.157548][   T25] CPU: 0 PID: 25 Comm: khungtaskd Not tainted 5.16.0-rc3+ #56
[  740.158826][   T25] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
[  740.160265][   T25] Call Trace:
[  740.160788][   T25]  <TASK>
[  740.161240][   T25]  dump_stack_lvl+0x49/0x5e
[  740.162045][   T25]  dump_stack+0x10/0x12
[  740.162642][   T25]  panic+0x1e2/0x478
[  740.163330][   T25]  ? __warn_printk+0xf3/0xf3
[  740.164041][   T25]  watchdog.cold+0x118/0x137
[  740.164741][   T25]  ? reset_hung_task_detector+0x30/0x30
[  740.165537][   T25]  kthread+0x38f/0x460
[  740.166202][   T25]  ? set_kthread_struct+0x110/0x110
[  740.167022][   T25]  ret_from_fork+0x22/0x30
[  740.167782][   T25]  </TASK>
[  740.168520][   T25] Kernel Offset: disabled
[  740.169285][   T25] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---

While debugging this issue, we found some faulty behaviors in the
zoned extent allocator. They are not the root cause of the hang, as
we see a similar report on regular (non-zoned) btrfs as well. But it
looks like the early -ENOSPC is, at least, making the hang happen
more often.

So, this series fixes the faulty behaviors of the zoned extent
allocator.

Patch 1 fixes the case where allocation fails in a dedicated block group.
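
As a rough illustration of the patch 1 idea, here is a hypothetical,
simplified user-space model (the names are illustrative, not the
actual kernel code): once an allocation from the dedicated block group
fails, the allocator forgets that group and retries instead of failing
early with -ENOSPC.

#include <stddef.h>

/* Hypothetical model of the patch 1 idea; not the kernel API. */
struct block_group {
	unsigned long long free_bytes;
};

struct alloc_ctx {
	struct block_group *dedicated_bg; /* e.g. the tree-log block group */
};

static int alloc_from_bg(struct block_group *bg, unsigned long long len)
{
	if (bg->free_bytes < len)
		return -1; /* stand-in for -ENOSPC */
	bg->free_bytes -= len;
	return 0;
}

/* Returns 0 on success, 1 when the caller should retry the loop. */
static int do_allocation(struct alloc_ctx *ctx, unsigned long long len)
{
	struct block_group *bg = ctx->dedicated_bg;

	if (bg) {
		if (alloc_from_bg(bg, len) == 0)
			return 0; /* allocated from the dedicated group */
		/*
		 * The fix: unset the failing dedicated block group so the
		 * retry can pick (or create) another one instead of
		 * returning an early -ENOSPC.
		 */
		ctx->dedicated_bg = NULL;
	}
	/* Scan the remaining block groups on the next pass. */
	return 1;
}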

Patches 2 and 3 fix the chunk allocation condition for the zoned
allocator so that it does not block a possible chunk allocation.
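
The idea behind patches 2 and 3 can be sketched as below (again a
hypothetical, simplified model; the hook and field names are
illustrative, while the real changes touch fs/btrfs/extent-tree.c and
fs/btrfs/zoned.c as the diffstat shows): the extent allocator asks a
per-policy hook whether allocating a new chunk makes sense, and the
zoned variant answers based on whether a zone can still be activated.

#include <stdbool.h>

/* Hypothetical sketch of the allocator hook from patches 2 and 3. */
enum extent_allocation_policy {
	ALLOC_POLICY_CLUSTERED, /* regular btrfs */
	ALLOC_POLICY_ZONED,
};

struct fs_model {
	enum extent_allocation_policy policy;
	int activatable_zones; /* zones that can still be activated */
};

static bool can_allocate_chunk(const struct fs_model *fs)
{
	switch (fs->policy) {
	case ALLOC_POLICY_CLUSTERED:
		/* Regular btrfs: allocating a new chunk is always fine. */
		return true;
	case ALLOC_POLICY_ZONED:
		/*
		 * Zoned btrfs: only allocate a new chunk when a zone can
		 * be activated for it, so the allocator keeps looping
		 * instead of failing with an early -ENOSPC.
		 */
		return fs->activatable_zones > 0;
	}
	return false;
}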

Naohiro Aota (3):
  btrfs: zoned: unset dedicated block group on allocation failure
  btrfs: add extent allocator hook to decide to allocate chunk or not
  btrfs: zoned: fix chunk allocation condition for zoned allocator

 fs/btrfs/extent-tree.c | 58 ++++++++++++++++++++++++++++++------------
 fs/btrfs/zoned.c       |  5 ++--
 fs/btrfs/zoned.h       |  5 ++--
 3 files changed, 46 insertions(+), 22 deletions(-)

Comments

David Sterba Dec. 8, 2021, 4:18 p.m. UTC | #1
On Wed, Dec 08, 2021 at 12:35:46AM +0900, Naohiro Aota wrote:
> There have been several reports of hung_task on btrfs recently.
> 
> - https://github.com/naota/linux/issues/59
> - https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/
> 
> The stack trace of the hang looks like this:

(snip)

> While debugging this issue, we found some faulty behaviors in the
> zoned extent allocator. They are not the root cause of the hang, as
> we see a similar report on regular (non-zoned) btrfs as well. But it
> looks like the early -ENOSPC is, at least, making the hang happen
> more often.
> 
> So, this series fixes the faulty behaviors of the zoned extent
> allocator.
> 
> Patch 1 fixes the case where allocation fails in a dedicated block
> group.
> 
> Patches 2 and 3 fix the chunk allocation condition for the zoned
> allocator so that it does not block a possible chunk allocation.
> 
> Naohiro Aota (3):
>   btrfs: zoned: unset dedicated block group on allocation failure
>   btrfs: add extent allocator hook to decide to allocate chunk or not
>   btrfs: zoned: fix chunk allocation condition for zoned allocator

All seem to be relevant for 5.16-rc so I'll add them to misc-next now to
give it some testing, pull request next week. Thanks.
Shinichiro Kawasaki Dec. 29, 2021, 12:22 a.m. UTC | #2
On Dec 08, 2021 / 17:18, David Sterba wrote:
> On Wed, Dec 08, 2021 at 12:35:46AM +0900, Naohiro Aota wrote:
> > There have been several reports of hung_task on btrfs recently.
> > 
> > - https://github.com/naota/linux/issues/59
> > - https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/

(snip)

> > Naohiro Aota (3):
> >   btrfs: zoned: unset dedicated block group on allocation failure
> >   btrfs: add extent allocator hook to decide to allocate chunk or not
> >   btrfs: zoned: fix chunk allocation condition for zoned allocator
> 
> All seem to be relevant for 5.16-rc so I'll add them to misc-next now to
> give it some testing, pull request next week. Thanks.

Hello David, thank you as always for your maintainership.

When I run my test set on a zoned btrfs configuration, I keep observing
the issue that Naohiro addressed with these three patches. The patches
are not yet merged into 5.16-rc7. Can I expect them to be merged for rc8?
David Sterba Jan. 3, 2022, 7:13 p.m. UTC | #3
On Wed, Dec 29, 2021 at 12:22:31AM +0000, Shinichiro Kawasaki wrote:
> On Dec 08, 2021 / 17:18, David Sterba wrote:
> > On Wed, Dec 08, 2021 at 12:35:46AM +0900, Naohiro Aota wrote:
> > > There have been several reports of hung_task on btrfs recently.
> > > 
> > > - https://github.com/naota/linux/issues/59
> > > - https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/
> 
> (snip)
> 
> > > Naohiro Aota (3):
> > >   btrfs: zoned: unset dedicated block group on allocation failure
> > >   btrfs: add extent allocator hook to decide to allocate chunk or not
> > >   btrfs: zoned: fix chunk allocation condition for zoned allocator
> > 
> > All seem to be relevant for 5.16-rc so I'll add them to misc-next now to
> > give it some testing, pull request next week. Thanks.
> 
> Hello David, thank you as always for your maintainership.
> 
> When I run my test set on a zoned btrfs configuration, I keep observing
> the issue that Naohiro addressed with these three patches. The patches
> are not yet merged into 5.16-rc7. Can I expect them to be merged for rc8?

Sorry, I did not get to sending the pull request due to the holidays.
The timing of the 5.16 release next week is too close, so I'll tag the
patches for stable and they'll get into 5.16 later.
Shinichiro Kawasaki Jan. 5, 2022, 2 a.m. UTC | #4
On Jan 03, 2022 / 20:13, David Sterba wrote:
> On Wed, Dec 29, 2021 at 12:22:31AM +0000, Shinichiro Kawasaki wrote:
> > On Dec 08, 2021 / 17:18, David Sterba wrote:
> > > On Wed, Dec 08, 2021 at 12:35:46AM +0900, Naohiro Aota wrote:
> > > > There have been several reports of hung_task on btrfs recently.
> > > > 
> > > > - https://github.com/naota/linux/issues/59
> > > > - https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/
> > 
> > (snip)
> > 
> > > > Naohiro Aota (3):
> > > >   btrfs: zoned: unset dedicated block group on allocation failure
> > > >   btrfs: add extent allocator hook to decide to allocate chunk or not
> > > >   btrfs: zoned: fix chunk allocation condition for zoned allocator
> > > 
> > > All seem to be relevant for 5.16-rc so I'll add them to misc-next now to
> > > give it some testing, pull request next week. Thanks.
> > 
> > Hello David, thank you as always for your maintainership.
> > 
> > When I run my test set on a zoned btrfs configuration, I keep observing
> > the issue that Naohiro addressed with these three patches. The patches
> > are not yet merged into 5.16-rc7. Can I expect them to be merged for rc8?
> 
> Sorry, I did not get to sending the pull request due to the holidays.
> The timing of the 5.16 release next week is too close, so I'll tag the
> patches for stable and they'll get into 5.16 later.

Sounds good. Thank you for taking care of this.