
[1/2] ocfs2: fix missing reset j_num_trans for sync

Message ID 20230430031302.15597-1-heming.zhao@suse.com (mailing list archive)
State New, archived
Series [1/2] ocfs2: fix missing reset j_num_trans for sync

Commit Message

heming.zhao@suse.com April 30, 2023, 3:13 a.m. UTC
fstests generic cases 266, 272 and 281 trigger a hang at umount.

I use generic/266 to describe the root cause.

```
 49 _dmerror_unmount
 50 _dmerror_mount
 51
 52 echo "Compare files"
 53 md5sum $testdir/file1 | _filter_scratch
 54 md5sum $testdir/file2 | _filter_scratch
 55
 56 echo "CoW and unmount"
 57 sync
 58 _dmerror_load_error_table
 59 urk=$($XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" \
 60     -c "fdatasync" $testdir/file2 2>&1)
 61 echo $urk >> $seqres.full
 62 echo "$urk" | grep -q "error" || _fail "pwrite did not fail"
 63
 64 echo "Clean up the mess"
 65 _dmerror_unmount
```

After lines 49-50 umount and remount the ocfs2 device, the case runs
md5sum to verify the target files. Line 57 runs 'sync' before line 58
switches the dm target from dm-linear to dm-error. The case then hangs
at line 65.

md5sum triggers the jbd2 transaction pair ocfs2_start_trans() /
ocfs2_commit_trans() to do the journal work. But only
ocfs2_start_trans() increments ->j_num_trans; ocfs2_commit_trans()
never decrements it, and neither does 'sync'. So nothing resets
->j_num_trans until umount is triggered.

call flow:
```
[md5sum] //line 53 54
 vfs_read
  ocfs2_file_read_iter
   ocfs2_inode_lock_atime
    ocfs2_update_inode_atime
     + ocfs2_start_trans //atomic_inc j_num_trans
     + ...
     + ocfs2_commit_trans//no modify j_num_trans

sync //line 57. no modify j_num_trans

_dmerror_load_error_table //all write will return error after this line

_dmerror_unmount //found j_num_trans is not zero, run commit thread
               //but the underlying dev is dm-error, journaling IO
               //failed all the time and keep going to retry.
```

*** How to fix ***

Kick the commit thread in the sync path, which resets j_num_trans to 0.

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 fs/ocfs2/super.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Joseph Qi May 1, 2023, 2:07 a.m. UTC | #1
Hi,

What's the journal status in this case?
I wonder why commit thread is not working, which should flush journal
and reset j_num_trans during commit cache.

Thanks,
Joseph

On 4/30/23 11:13 AM, Heming Zhao wrote:
> fstest generic cases 266 272 281 trigger hanging issue when umount.
> 
> I use 266 to describe the root cause.
> 
> ```
>  49 _dmerror_unmount
>  50 _dmerror_mount
>  51
>  52 echo "Compare files"
>  53 md5sum $testdir/file1 | _filter_scratch
>  54 md5sum $testdir/file2 | _filter_scratch
>  55
>  56 echo "CoW and unmount"
>  57 sync
>  58 _dmerror_load_error_table
>  59 urk=$($XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" \
>  60     -c "fdatasync" $testdir/file2 2>&1)
>  61 echo $urk >> $seqres.full
>  62 echo "$urk" | grep -q "error" || _fail "pwrite did not fail"
>  63
>  64 echo "Clean up the mess"
>  65 _dmerror_unmount
> ```
> 
> After line 49 50 umount & mount ocfs2 dev, this case run md5sum to
> verify target file. Line 57 run 'sync' before line 58 changes the dm
> target from dm-linear to dm-error. This case is hanging at line 65.
> 
> The md5sum calls jbd2 trans pair: ocfs2_[start|commit]_trans to
> do journal job. But there is only ->j_num_trans+1 in ocfs2_start_trans,
> the ocfs2_commit_trans doesn't do reduction operation, 'sync' neither.
> finally no function reset ->j_num_trans until umount is triggered.
> 
> call flow:
> ```
> [md5sum] //line 53 54
>  vfs_read
>   ocfs2_file_read_iter
>    ocfs2_inode_lock_atime
>     ocfs2_update_inode_atime
>      + ocfs2_start_trans //atomic_inc j_num_trans
>      + ...
>      + ocfs2_commit_trans//no modify j_num_trans
> 
> sync //line 57. no modify j_num_trans
> 
> _dmerror_load_error_table //all write will return error after this line
> 
> _dmerror_unmount //found j_num_trans is not zero, run commit thread
>                //but the underlying dev is dm-error, journaling IO
>                //failed all the time and keep going to retry.
> ```
> 
> *** How to fix ***
> 
> kick commit thread in sync path, which can reset j_num_trans to 0.
> 
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/super.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 0b0e6a132101..bb3fa21e9b47 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -412,6 +412,9 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait)
>  			jbd2_log_wait_commit(osb->journal->j_journal,
>  					     target);
>  	}
> +	/* kick commit thread to reset journal->j_num_trans */
> +	if (atomic_read(&(osb->journal->j_num_trans)))
> +		wake_up(&osb->checkpoint_event);
>  	return 0;
>  }
>
heming.zhao@suse.com May 1, 2023, 2:26 a.m. UTC | #2
On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> Hi,
> 
> What's the journal status in this case?
The jbd2 journal is marked with JBD2_ABORT.

> I wonder why commit thread is not working, which should flush journal
> and reset j_num_trans during commit cache.
The reason is in my answer above.

Below is the dmesg log for generic/266.

```
[  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
[  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
[  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
[  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
[  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
[  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
[  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
[  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
[  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
[  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
[  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
[  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
[  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
[  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
[  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
[  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
[  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
[  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
[  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
[  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
[  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
[  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
[  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
[  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
[  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
[  596.224616] kernel: JBD2: Ignoring recovery information on journal
[  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
[  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
[  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
[  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
[  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
[  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
[  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
[  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
[  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
[  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
[  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
[  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
[  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
[  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
[  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
[  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
[  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
[  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
[  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
[  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
[  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
[  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
[  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
[  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
[  597.139758] root[5475]: run xfstest generic/266
[  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
[  597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
[  597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
[  597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
[  597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
[  597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
[  597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
[  597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
[  597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
[  597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
[  597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
[  597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
[  597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
[  597.590136] kernel: JBD2: Ignoring recovery information on journal
[  597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
[  598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[  597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
[  597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
[  598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
[  597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
[  597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
[  597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
[  597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
[  597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
[  597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
[  597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
[  597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
[  597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
[  597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
[  597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
[  597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
[  597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
[  600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
[  600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
[  600.209151] systemd[1]: Starting DBus interface for snapper...
[  600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
[  600.217370] systemd[1]: Started DBus interface for snapper.
[  600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
[  599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
[  599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
[  599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
[  599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
[  599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
[  599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
[  599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
[  599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
[  599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
[  599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
[  599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
[  599.615320] kernel: JBD2: Ignoring recovery information on journal
[  599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
[  600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[  599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
[  599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
[  599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
[  600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
[  599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
[  599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
[  599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
[  599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
[  599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
[  599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
[  599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
[  599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
[  599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
[  599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
[  599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
[  599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
[  599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
[  600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
[  600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
[  600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[  600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
[  600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
[  600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
[  600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[  600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
[  600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
[  600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
[  600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[  600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
[  600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
[  600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
[  600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
[  600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
[  600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
[  600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
[  600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
[  600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
[  600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[  600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
[  600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
[  600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[  600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
[  600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
[  600.127615] kernel: Aborting journal on device dm-0-24.
[  600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
[  600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
[  600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
[  601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
[  616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
[  620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
^C
tb-fstest1:/usr/src/linux/fs/ocfs2 #
```

Thanks,
Heming

> 
> On 4/30/23 11:13 AM, Heming Zhao wrote:
> > fstest generic cases 266 272 281 trigger hanging issue when umount.
> > 
> > I use 266 to describe the root cause.
> > 
> > ```
> >  49 _dmerror_unmount
> >  50 _dmerror_mount
> >  51
> >  52 echo "Compare files"
> >  53 md5sum $testdir/file1 | _filter_scratch
> >  54 md5sum $testdir/file2 | _filter_scratch
> >  55
> >  56 echo "CoW and unmount"
> >  57 sync
> >  58 _dmerror_load_error_table
> >  59 urk=$($XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" \
> >  60     -c "fdatasync" $testdir/file2 2>&1)
> >  61 echo $urk >> $seqres.full
> >  62 echo "$urk" | grep -q "error" || _fail "pwrite did not fail"
> >  63
> >  64 echo "Clean up the mess"
> >  65 _dmerror_unmount
> > ```
> > 
> > After line 49 50 umount & mount ocfs2 dev, this case run md5sum to
> > verify target file. Line 57 run 'sync' before line 58 changes the dm
> > target from dm-linear to dm-error. This case is hanging at line 65.
> > 
> > The md5sum calls jbd2 trans pair: ocfs2_[start|commit]_trans to
> > do journal job. But there is only ->j_num_trans+1 in ocfs2_start_trans,
> > the ocfs2_commit_trans doesn't do reduction operation, 'sync' neither.
> > finally no function reset ->j_num_trans until umount is triggered.
> > 
> > call flow:
> > ```
> > [md5sum] //line 53 54
> >  vfs_read
> >   ocfs2_file_read_iter
> >    ocfs2_inode_lock_atime
> >     ocfs2_update_inode_atime
> >      + ocfs2_start_trans //atomic_inc j_num_trans
> >      + ...
> >      + ocfs2_commit_trans//no modify j_num_trans
> > 
> > sync //line 57. no modify j_num_trans
> > 
> > _dmerror_load_error_table //all write will return error after this line
> > 
> > _dmerror_unmount //found j_num_trans is not zero, run commit thread
> >                //but the underlying dev is dm-error, journaling IO
> >                //failed all the time and keep going to retry.
> > ```
> > 
> > *** How to fix ***
> > 
> > kick commit thread in sync path, which can reset j_num_trans to 0.
> > 
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> >  fs/ocfs2/super.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> > index 0b0e6a132101..bb3fa21e9b47 100644
> > --- a/fs/ocfs2/super.c
> > +++ b/fs/ocfs2/super.c
> > @@ -412,6 +412,9 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait)
> >  			jbd2_log_wait_commit(osb->journal->j_journal,
> >  					     target);
> >  	}
> > +	/* kick commit thread to reset journal->j_num_trans */
> > +	if (atomic_read(&(osb->journal->j_num_trans)))
> > +		wake_up(&osb->checkpoint_event);
> >  	return 0;
> >  }
> >
heming.zhao@suse.com May 1, 2023, 2:48 a.m. UTC | #3
On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> Hi,
> 
> What's the journal status in this case?
> I wonder why commit thread is not working, which should flush journal
> and reset j_num_trans during commit cache.

The root cause of generic/266 is that ocfs2_sync_fs() bypasses the
commit thread and calls the jbd2 APIs directly to commit the journal.
With this design, 'sync' never resets ->j_num_trans. So my patch adds
a kick of the commit thread in the sync path.

Thanks,
Heming

> 
> 
> On 4/30/23 11:13 AM, Heming Zhao wrote:
> > fstest generic cases 266 272 281 trigger hanging issue when umount.
> > 
> > I use 266 to describe the root cause.
> > 
> > ```
> >  49 _dmerror_unmount
> >  50 _dmerror_mount
> >  51
> >  52 echo "Compare files"
> >  53 md5sum $testdir/file1 | _filter_scratch
> >  54 md5sum $testdir/file2 | _filter_scratch
> >  55
> >  56 echo "CoW and unmount"
> >  57 sync
> >  58 _dmerror_load_error_table
> >  59 urk=$($XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" \
> >  60     -c "fdatasync" $testdir/file2 2>&1)
> >  61 echo $urk >> $seqres.full
> >  62 echo "$urk" | grep -q "error" || _fail "pwrite did not fail"
> >  63
> >  64 echo "Clean up the mess"
> >  65 _dmerror_unmount
> > ```
> > 
> > After line 49 50 umount & mount ocfs2 dev, this case run md5sum to
> > verify target file. Line 57 run 'sync' before line 58 changes the dm
> > target from dm-linear to dm-error. This case is hanging at line 65.
> > 
> > The md5sum calls jbd2 trans pair: ocfs2_[start|commit]_trans to
> > do journal job. But there is only ->j_num_trans+1 in ocfs2_start_trans,
> > the ocfs2_commit_trans doesn't do reduction operation, 'sync' neither.
> > finally no function reset ->j_num_trans until umount is triggered.
> > 
> > call flow:
> > ```
> > [md5sum] //line 53 54
> >  vfs_read
> >   ocfs2_file_read_iter
> >    ocfs2_inode_lock_atime
> >     ocfs2_update_inode_atime
> >      + ocfs2_start_trans //atomic_inc j_num_trans
> >      + ...
> >      + ocfs2_commit_trans//no modify j_num_trans
> > 
> > sync //line 57. no modify j_num_trans
> > 
> > _dmerror_load_error_table //all write will return error after this line
> > 
> > _dmerror_unmount //found j_num_trans is not zero, run commit thread
> >                //but the underlying dev is dm-error, journaling IO
> >                //failed all the time and keep going to retry.
> > ```
> > 
> > *** How to fix ***
> > 
> > kick commit thread in sync path, which can reset j_num_trans to 0.
> > 
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> >  fs/ocfs2/super.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> > index 0b0e6a132101..bb3fa21e9b47 100644
> > --- a/fs/ocfs2/super.c
> > +++ b/fs/ocfs2/super.c
> > @@ -412,6 +412,9 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait)
> >  			jbd2_log_wait_commit(osb->journal->j_journal,
> >  					     target);
> >  	}
> > +	/* kick commit thread to reset journal->j_num_trans */
> > +	if (atomic_read(&(osb->journal->j_num_trans)))
> > +		wake_up(&osb->checkpoint_event);
> >  	return 0;
> >  }
> >
Joseph Qi May 1, 2023, 2:52 a.m. UTC | #4
On 5/1/23 10:26 AM, Heming Zhao wrote:
> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
>> Hi,
>>
>> What's the journal status in this case?
> jbd2 is marked with JBD2_ABORT
> 
>> I wonder why commit thread is not working, which should flush journal
>> and reset j_num_trans during commit cache.
> The reason is in my above answer.
> 
> below is the dmesg log for No. 266.
> 
> ```
> [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
> [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
> [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
> [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
> [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
> [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
> [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
> [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
> [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
> [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
> [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
> [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
> [  596.224616] kernel: JBD2: Ignoring recovery information on journal
> [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
> [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
> [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
> [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
> [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
> [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
> [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
> [  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
> [  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> [  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> [  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> [  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> [  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> [  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> [  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> [  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> [  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> [  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> [  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> [  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> [  597.139758] root[5475]: run xfstest generic/266
> [  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
> [  597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
> [  597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> [  597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> [  597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> [  597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> [  597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> [  597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> [  597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> [  597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> [  597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> [  597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
> [  597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> [  597.590136] kernel: JBD2: Ignoring recovery information on journal
> [  597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> [  598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> [  597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
> [  597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> [  598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
> [  597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
> [  597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> [  597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> [  597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> [  597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> [  597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> [  597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> [  597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> [  597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> [  597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> [  597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> [  597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
> [  597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> [  600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
> [  600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
> [  600.209151] systemd[1]: Starting DBus interface for snapper...
> [  600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
> [  600.217370] systemd[1]: Started DBus interface for snapper.
> [  600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
> [  599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> [  599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> [  599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> [  599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> [  599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> [  599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> [  599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> [  599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> [  599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> [  599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
> [  599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> [  599.615320] kernel: JBD2: Ignoring recovery information on journal
> [  599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> [  600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> [  599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
> [  599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> [  599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
> [  600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
> [  599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
> [  599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> [  599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> [  599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> [  599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> [  599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> [  599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> [  599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> [  599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> [  599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> [  599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
> [  599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> [  599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> [  600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
> [  600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> [  600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> [  600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> [  600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> [  600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> [  600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> [  600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> [  600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> [  600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> [  600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> [  600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> [  600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> [  600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
> [  600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
> [  600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
> [  600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
> [  600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
> [  600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
> [  600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
> [  600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> [  600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
> [  600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
> [  600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> [  600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> [  600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> [  600.127615] kernel: Aborting journal on device dm-0-24.
> [  600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> [  600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> [  600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> [  601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> [  616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> [  620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> ^C
> tb-fstest1:/usr/src/linux/fs/ocfs2 #
> ```

So the case is that flushing the journal in the commit thread never ends if
the journal is aborted, which then leads to a umount hang since the commit
thread won't stop, right?

If so, I don't think your change in sync_fs is the right solution. It
seems to just bypass the test case.

If the expected behavior is to let umount finish even when the journal is
aborted, it seems we have to make journal shutdown succeed in this case.

Thanks,
Joseph
heming.zhao@suse.com May 1, 2023, 3:09 a.m. UTC | #5
On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
> 
> 
> On 5/1/23 10:26 AM, Heming Zhao wrote:
> > On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> >> Hi,
> >>
> >> What's the journal status in this case?
> > jbd2 is marked with JBD2_ABORT
> > 
> >> I wonder why commit thread is not working, which should flush journal
> >> and reset j_num_trans during commit cache.
> > The reason is in my above answer.
> > 
> > below is the dmesg log for No. 266.
> > 
> > ```
> > [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> > [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> > [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> > [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> > [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> > [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> > [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> > [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> > [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> > [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> > [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> > [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> > [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
> > [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
> > [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
> > [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> > [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
> > [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
> > [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
> > [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
> > [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
> > [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
> > [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
> > [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
> > [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
> > [  596.224616] kernel: JBD2: Ignoring recovery information on journal
> > [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> > [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> > [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
> > [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> > [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
> > [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
> > [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> > [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
> > [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
> > [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> > [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
> > [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
> > [... remainder of dmesg log snipped; identical to the log quoted above ...]
> > ```
> 
> So the case is that flushing the journal in the commit thread never ends
> if the journal is aborted, which then leads to a umount hang since the
> commit thread won't stop, right?
Yes.

> 
> If so, I don't think your change in sync_fs is the right solution. It
> seems to just bypass the test case.
No.
In my view, generic/266 shouldn't trigger the jbd2 ABORT status. The 'sync'
should flush all already-issued write IOs. After sync & loading dm-error, new
write IOs fail with I/O errors, which shouldn't generate any journal IO and
shouldn't drive jbd2 into the ABORT state.

> 
> If the expected behavior is to let umount finish even when the journal is
> aborted, it seems we have to make journal shutdown succeed in this case.
That is the job of my [patch 2/2], which resolves fstests generic/347, 361, 628 and 629.

- Heming
heming.zhao@suse.com May 1, 2023, 3:17 a.m. UTC | #6
On Mon, May 01, 2023 at 11:10:06AM +0800, Heming Zhao wrote:
> On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
> > 
> > 
> > On 5/1/23 10:26 AM, Heming Zhao wrote:
> > > On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> > >> Hi,
> > >>
> > >> What's the journal status in this case?
> > > jbd2 is marked with JBD2_ABORT
> > > 
> > >> I wonder why commit thread is not working, which should flush journal
> > >> and reset j_num_trans during commit cache.
> > > The reason is in my above answer.
> > > 
> > > below is the dmesg log for No. 266.
> > > 
> > > ```
> > > [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> > > [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> > > [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> > > [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> > > [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> > > [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> > > [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> > > [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> > > [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> > > [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> > > [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> > > [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> > > [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
> > > [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
> > > [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
> > > [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> > > [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
> > > [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
> > > [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
> > > [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
> > > [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
> > > [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
> > > [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
> > > [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
> > > [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
> > > [  596.224616] kernel: JBD2: Ignoring recovery information on journal
> > > [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> > > [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> > > [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
> > > [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> > > [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
> > > [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
> > > [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> > > [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
> > > [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
> > > [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> > > [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
> > > [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
> > > [  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
> > > [  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> > > [  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> > > [  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> > > [  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> > > [  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> > > [  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> > > [  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> > > [  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> > > [  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> > > [  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> > > [  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> > > [  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> > > [  597.139758] root[5475]: run xfstest generic/266
> > > [  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
> > > [  597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
> > > [  597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> > > [  597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> > > [  597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> > > [  597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> > > [  597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> > > [  597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> > > [  597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> > > [  597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> > > [  597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> > > [  597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
> > > [  597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> > > [  597.590136] kernel: JBD2: Ignoring recovery information on journal
> > > [  597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> > > [  598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> > > [  597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
> > > [  597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> > > [  598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
> > > [  597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
> > > [  597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> > > [  597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> > > [  597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> > > [  597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> > > [  597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> > > [  597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> > > [  597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> > > [  597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> > > [  597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> > > [  597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> > > [  597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
> > > [  597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> > > [  600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
> > > [  600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
> > > [  600.209151] systemd[1]: Starting DBus interface for snapper...
> > > [  600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
> > > [  600.217370] systemd[1]: Started DBus interface for snapper.
> > > [  600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
> > > [  599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> > > [  599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> > > [  599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> > > [  599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> > > [  599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> > > [  599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> > > [  599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> > > [  599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> > > [  599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> > > [  599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
> > > [  599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> > > [  599.615320] kernel: JBD2: Ignoring recovery information on journal
> > > [  599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> > > [  600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> > > [  599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
> > > [  599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> > > [  599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
> > > [  600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
> > > [  599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
> > > [  599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> > > [  599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> > > [  599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> > > [  599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> > > [  599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> > > [  599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> > > [  599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> > > [  599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> > > [  599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> > > [  599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
> > > [  599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> > > [  599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> > > [  600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
> > > [  600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> > > [  600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> > > [  600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> > > [  600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> > > [  600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> > > [  600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> > > [  600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> > > [  600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> > > [  600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> > > [  600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> > > [  600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> > > [  600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> > > [  600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
> > > [  600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
> > > [  600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
> > > [  600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
> > > [  600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
> > > [  600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
> > > [  600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
> > > [  600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> > > [  600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
> > > [  600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
> > > [  600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> > > [  600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> > > [  600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> > > [  600.127615] kernel: Aborting journal on device dm-0-24.
> > > [  600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> > > [  600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> > > [  600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> > > [  601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> > > [  616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > [  620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> > > ^C
> > > tb-fstest1:/usr/src/linux/fs/ocfs2 #
> > > ```
> > 
> > So the case is: flushing the journal in the commit thread never ends if the
> > journal is aborted, which leads to the umount hang since the commit thread
> > won't stop, right?
> Yes.
> 
> > 
> > If so, I don't think your change in sync fs is the right solution. It
> > seems to just bypass the testcase.
> No.
> In my view, generic/266 shouldn't trigger the jbd2 ABORT status at all. The
> 'sync' should flush all already-issued write IOs. After sync & loading
> dm-error, new write IOs fail with errors, which shouldn't generate any journal
> IOs and so shouldn't push jbd2 into ABORT status.
> 
> > 
> > If the expected behavior is to let umount end even when the journal is
> > aborted, it seems we have to make journal shutdown succeed in this case.
> That is what my [patch 2/2] does; it resolves fstests generic/347, 361, 628 and 629.
> 
> - Heming

The log after applying patch [1/2] (without patch [2/2]):

```
[ 2252.964874] root[2991]: run xfstest generic/266
[ 2252.566084] unknown: run fstests generic/266 at 2023-05-01 10:58:37
[ 2253.006923] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
[ 2253.595670] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: joining the lockspace group...
[ 2253.598924] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover 1
[ 2253.598954] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: group event done 0
[ 2253.601184] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: add member 1
[ 2253.602340] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover_members 1 nodes
[ 2253.602624] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: generation 1 slots 1 1:1
[ 2253.602842] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover_directory
[ 2253.603010] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover_directory 0 in 0 new
[ 2253.603171] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover_directory 0 out 0 messages
[ 2253.603340] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover 1 generation 1 done: 0 ms
[ 2253.603619] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: join complete
[ 2253.606374] kernel: JBD2: Ignoring recovery information on journal
[ 2253.611239] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
[ 2254.020143] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[ 2253.623800] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: leaving the lockspace group...
[ 2253.624720] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: group event done 0
[ 2253.626368] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: release_lockspace final free
[ 2254.027879] ocfs2_hb_ctl[3197]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u D9AC6C30527E4F8799E9BCDAC64F2F47
[ 2253.634342] kernel: ocfs2: Unmounting device (253,32) on (node 1)
[ 2253.659701] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: joining the lockspace group...
[ 2253.668134] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: group event done 0
[ 2253.668158] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover 1
[ 2253.668583] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: add member 1
[ 2253.668792] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover_members 1 nodes
[ 2253.669003] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: generation 1 slots 1 1:1
[ 2253.669211] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover_directory
[ 2253.669670] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover_directory 0 in 0 new
[ 2253.670034] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover_directory 0 out 0 messages
[ 2253.670610] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: dlm_recover 1 generation 1 done: 4 ms
[ 2253.671041] kernel: dlm: D9AC6C30527E4F8799E9BCDAC64F2F47: join complete
[ 2255.687006] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: joining the lockspace group...
[ 2255.689981] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: group event done 0
[ 2255.689993] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover 1
[ 2255.690435] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: add member 1
[ 2255.690688] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_members 1 nodes
[ 2255.690897] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: generation 1 slots 1 1:1
[ 2255.691128] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_directory
[ 2255.691354] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_directory 0 in 0 new
[ 2255.691561] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_directory 0 out 0 messages
[ 2255.691847] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover 1 generation 1 done: 0 ms
[ 2255.692446] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: join complete
[ 2255.695546] kernel: JBD2: Ignoring recovery information on journal
[ 2255.700152] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
[ 2256.230778] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[ 2255.851082] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: leaving the lockspace group...
[ 2255.853187] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: group event done 0
[ 2255.855128] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: release_lockspace final free
[ 2256.260312] ocfs2_hb_ctl[3265]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 0DCC4C24451D49CF918C2994FD09B36B
[ 2255.871911] kernel: ocfs2: Unmounting device (254,0) on (node 1)
[ 2255.885376] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: joining the lockspace group...
[ 2255.898789] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover 1
[ 2255.898801] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: group event done 0
[ 2255.899523] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: add member 1
[ 2255.899899] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_members 1 nodes
[ 2255.900227] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: generation 1 slots 1 1:1
[ 2255.900544] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_directory
[ 2255.900875] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_directory 0 in 0 new
[ 2255.901153] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_directory 0 out 0 messages
[ 2255.901516] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover 1 generation 1 done: 0 ms
[ 2255.901944] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: join complete
[ 2255.908503] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
[ 2256.139340] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
[ 2256.151624] kernel: (xfs_io,3288,1):ocfs2_inode_lock_update:2354 ERROR: status = -5
[ 2256.152027] kernel: (xfs_io,3288,1):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[ 2256.152422] kernel: (xfs_io,3288,1):ocfs2_statfs:1652 ERROR: status = -5
[ 2256.152778] kernel: (xfs_io,3288,1):ocfs2_statfs:1684 ERROR: status = -5
[ 2256.154220] kernel: (xfs_io,3288,1):ocfs2_inode_lock_update:2354 ERROR: status = -5
[ 2256.154673] kernel: (xfs_io,3288,1):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[ 2256.154995] kernel: (xfs_io,3288,1):ocfs2_statfs:1652 ERROR: status = -5
[ 2256.155272] kernel: (xfs_io,3288,1):ocfs2_statfs:1684 ERROR: status = -5
[ 2256.157120] kernel: (xfs_io,3288,1):ocfs2_inode_lock_update:2354 ERROR: status = -5
[ 2256.157425] kernel: (xfs_io,3288,1):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[ 2256.157713] kernel: (xfs_io,3288,1):ocfs2_statfs:1652 ERROR: status = -5
[ 2256.157994] kernel: (xfs_io,3288,1):ocfs2_statfs:1684 ERROR: status = -5
[ 2256.160671] kernel: (xfs_io,3288,3):ocfs2_get_refcount_tree:358 ERROR: status = -5
[ 2256.160956] kernel: (xfs_io,3288,3):ocfs2_lock_refcount_tree:456 ERROR: status = -5
[ 2256.161205] kernel: (xfs_io,3288,3):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
[ 2256.161445] kernel: (xfs_io,3288,3):ocfs2_refcount_cow:3478 ERROR: status = -5
[ 2256.161680] kernel: (xfs_io,3288,3):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
[ 2256.161918] kernel: (xfs_io,3288,3):ocfs2_file_write_iter:2433 ERROR: status = -5
[ 2256.570033] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[ 2256.166077] kernel: (umount,3292,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
[ 2256.166494] kernel: (umount,3292,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[ 2256.166745] kernel: (umount,3292,3):ocfs2_statfs:1652 ERROR: status = -5
[ 2256.166965] kernel: (umount,3292,3):ocfs2_statfs:1684 ERROR: status = -5
[ 2256.174011] kernel: (umount,3292,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
[ 2256.174249] kernel: (umount,3292,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[ 2256.174511] kernel: (umount,3292,3):ocfs2_shutdown_local_alloc:416 ERROR: status = -5
[ 2256.175558] kernel: Buffer I/O error on dev dm-0, logical block 70163, lost sync page write
[ 2256.176960] kernel: (umount,3292,3):ocfs2_write_block:78 ERROR: status = -5
[ 2256.177256] kernel: (umount,3292,3):ocfs2_update_disk_slot:199 ERROR: status = -5
[ 2256.177498] kernel: (umount,3292,3):ocfs2_put_slot:517 ERROR: status = -5
[ 2256.178604] kernel: Buffer I/O error on dev dm-0, logical block 24, lost sync page write
[ 2256.178933] kernel: (umount,3292,2):ocfs2_write_block:78 ERROR: status = -5
[ 2256.179215] kernel: (umount,3292,2):ocfs2_journal_toggle_dirty:984 ERROR: status = -5
[ 2256.179449] kernel: (umount,3292,2):ocfs2_journal_shutdown:1052 ERROR: status = -5
[ 2256.179881] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: leaving the lockspace group...
[ 2256.181581] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: group event done 0
[ 2256.182134] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: release_lockspace final free
[ 2256.583796] ocfs2_hb_ctl[3293]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 0DCC4C24451D49CF918C2994FD09B36B
[ 2256.194870] kernel: ocfs2: Unmounting device (254,0) on (node 1)
[ 2256.202861] kernel: dm-0: detected capacity change from 20971520 to 0
[ 2256.241314] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: joining the lockspace group...
[ 2256.244424] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover 1
[ 2256.244431] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: group event done 0
[ 2256.245010] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: add member 1
[ 2256.245186] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_members 1 nodes
[ 2256.245360] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: generation 1 slots 1 1:1
[ 2256.245527] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_directory
[ 2256.245690] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_directory 0 in 0 new
[ 2256.245848] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover_directory 0 out 0 messages
[ 2256.246015] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: dlm_recover 1 generation 1 done: 0 ms
[ 2256.246295] kernel: dlm: 0DCC4C24451D49CF918C2994FD09B36B: join complete
[ 2257.367729] kernel: ocfs2: Unable to create the lockspace 0DCC4C24451D49CF918C2994FD09B36B (-17), because a ocfs2-tools program is running on this file system with the same name lockspace
[ 2257.368058] kernel: (mount.ocfs2,3306,0):ocfs2_dlm_init:3355 ERROR: status = -17
[ 2257.368264] kernel: (mount.ocfs2,3306,2):ocfs2_mount_volume:1808 ERROR: status = -17
[ 2257.368596] kernel: (mount.ocfs2,3306,2):ocfs2_fill_super:1192 ERROR: status = -17
[ 2257.800975] systemd[1]: fstests-generic-266.scope: Deactivated successfully.
[ 2257.823311] systemd[1]: fstest-test.mount: Deactivated successfully.
[ 2257.424691] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
[ 2257.426048] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
[ 2257.829022] ocfs2_hb_ctl[3323]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
[ 2257.427000] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
[ 2257.434399] kernel: ocfs2: Unmounting device (253,16) on (node 1)
[ 2337.361206] CRON[3392]: (root) CMD (run-parts /etc/cron.hourly)
[ 2337.367344] CRON[3391]: (root) CMDEND (run-parts /etc/cron.hourly)
[ 2365.960233] systemd[1]: Started Timeline of Snapper Snapshots.
[ 2365.967411] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.12' (uid=0 pid=3407 comm="/usr/lib/snapper/systemd-helper --timeline")
[ 2365.969446] systemd[1]: Starting DBus interface for snapper...
[ 2365.973698] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
[ 2365.973755] systemd[1]: Started DBus interface for snapper.
[ 2365.974893] systemd[1]: snapper-timeline.service: Deactivated successfully.
[ 2426.037566] systemd[1]: snapperd.service: Deactivated successfully.
^C
tb-fstest1:/usr/src/linux/fs/ocfs2 #
```

- Heming
Joseph Qi May 1, 2023, 3:21 a.m. UTC | #7
On 5/1/23 11:09 AM, Heming Zhao wrote:
> On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
>>
>>
>> On 5/1/23 10:26 AM, Heming Zhao wrote:
>>> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
>>>> Hi,
>>>>
>>>> What's the journal status in this case?
>>> jbd2 is marked with JBD2_ABORT.
>>>
>>>> I wonder why the commit thread is not working; it should flush the journal
>>>> and reset j_num_trans during commit cache.
>>> The reason is in my answer above.
>>>
>>> below is the dmesg log for No. 266.
>>>
>>> ```
>>> [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
>>> [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
>>> [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
>>> [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
>>> [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
>>> [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
>>> [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
>>> [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
>>> [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
>>> [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
>>> [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
>>> [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
>>> [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
>>> [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
>>> [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
>>> [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
>>> [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
>>> [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
>>> [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
>>> [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
>>> [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
>>> [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
>>> [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
>>> [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
>>> [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
>>> [  596.224616] kernel: JBD2: Ignoring recovery information on journal
>>> [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
>>> [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>> [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
>>> [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
>>> [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
>>> [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
>>> [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
>>> [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
>>> [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
>>> [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
>>> [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
>>> [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
>>> [  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
>>> [  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
>>> [  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
>>> [  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
>>> [  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
>>> [  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
>>> [  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
>>> [  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
>>> [  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
>>> [  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
>>> [  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
>>> [  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
>>> [  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
>>> [  597.139758] root[5475]: run xfstest generic/266
>>> [  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
>>> [  597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
>>> [  597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
>>> [  597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
>>> [  597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
>>> [  597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
>>> [  597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
>>> [  597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
>>> [  597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
>>> [  597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
>>> [  597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
>>> [  597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
>>> [  597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
>>> [  597.590136] kernel: JBD2: Ignoring recovery information on journal
>>> [  597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
>>> [  598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>> [  597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
>>> [  597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
>>> [  598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
>>> [  597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
>>> [  597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
>>> [  597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
>>> [  597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
>>> [  597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
>>> [  597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
>>> [  597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
>>> [  597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
>>> [  597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
>>> [  597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
>>> [  597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
>>> [  597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
>>> [  597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
>>> [  600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
>>> [  600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
>>> [  600.209151] systemd[1]: Starting DBus interface for snapper...
>>> [  600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
>>> [  600.217370] systemd[1]: Started DBus interface for snapper.
>>> [  600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
>>> [  599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
>>> [  599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
>>> [  599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
>>> [  599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
>>> [  599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
>>> [  599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
>>> [  599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
>>> [  599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
>>> [  599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
>>> [  599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
>>> [  599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
>>> [  599.615320] kernel: JBD2: Ignoring recovery information on journal
>>> [  599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
>>> [  600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>> [  599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
>>> [  599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
>>> [  599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
>>> [  600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
>>> [  599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
>>> [  599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
>>> [  599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
>>> [  599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
>>> [  599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
>>> [  599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
>>> [  599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
>>> [  599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
>>> [  599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
>>> [  599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
>>> [  599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
>>> [  599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
>>> [  599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
>>> [  600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
>>> [  600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>> [  600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>> [  600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
>>> [  600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
>>> [  600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>> [  600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>> [  600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
>>> [  600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
>>> [  600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>> [  600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>> [  600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
>>> [  600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
>>> [  600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
>>> [  600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
>>> [  600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
>>> [  600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
>>> [  600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
>>> [  600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
>>> [  600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>> [  600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>> [  600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
>>> [  600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
>>> [  600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>> [  600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
>>> [  600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
>>> [  600.127615] kernel: Aborting journal on device dm-0-24.
>>> [  600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
>>> [  600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
>>> [  600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
>>> [  601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
>>> [  616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> [  620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>> ^C
>>> tb-fstest1:/usr/src/linux/fs/ocfs2 #
>>> ```
>>
>> So the case is: flushing the journal in the commit thread will never end if
>> the journal is aborted, which then leads to a umount hang since the commit
>> thread won't stop, right?
> Yes.
> 
>>
>> If so, I don't think your change in sync fs is the right solution. It
>> seems to be just for bypassing the testcase.
> No.
> In my view, No. 266 shouldn't trigger the jbd2 ABORT status. The 'sync' should
> flush all already-issued write IOs. After sync & loading dm-error, new write IOs
> fail, which shouldn't generate any journal IOs and shouldn't drive
> jbd2 into the ABORT status.
> 

What if there is no 'sync' in this case?

>>
>> If the expected behavior is to let umount end even when the journal is aborted,
>> it seems we have to make the journal shutdown happy in this case.
> This is the job of my [patch 2/2], which resolves fstest generic No. 347 361 628 629.
> 
> - Heming
heming.zhao@suse.com May 1, 2023, 4:51 a.m. UTC | #8
On Mon, May 01, 2023 at 11:21:15AM +0800, Joseph Qi wrote:
> 
> 
> On 5/1/23 11:09 AM, Heming Zhao wrote:
> > On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 5/1/23 10:26 AM, Heming Zhao wrote:
> >>> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> >>>> Hi,
> >>>>
> >>>> What's the journal status in this case?
> >>> jbd2 is marked with JBD2_ABORT
> >>>
> >>>> I wonder why the commit thread is not working; it should flush the journal
> >>>> and reset j_num_trans during commit cache.
> >>> The reason is in my above answer.
> >>>
> >>> below is the dmesg log for No. 266.
> >>>
> >>> ```
> >>> [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>> [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>> [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>> [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>> [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>> [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>> [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>> [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>> [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>> [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>> [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>> [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>> [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
> >>> [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
> >>> [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
> >>> [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>> [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
> >>> [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
> >>> [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
> >>> [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
> >>> [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
> >>> [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
> >>> [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
> >>> [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
> >>> [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
> >>> [  596.224616] kernel: JBD2: Ignoring recovery information on journal
> >>> [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>> [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>> [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
> >>> [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>> [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
> >>> [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
> >>> [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>> [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
> >>> [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
> >>> [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>> [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
> >>> [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
> >>> [  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
> >>> [  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>> [  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>> [  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>> [  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>> [  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>> [  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>> [  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>> [  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>> [  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>> [  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>> [  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>> [  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>> [  597.139758] root[5475]: run xfstest generic/266
> >>> [  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
> >>> [  597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
> >>> [  597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>> [  597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>> [  597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>> [  597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>> [  597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>> [  597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>> [  597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>> [  597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>> [  597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>> [  597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
> >>> [  597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>> [  597.590136] kernel: JBD2: Ignoring recovery information on journal
> >>> [  597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>> [  598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>> [  597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
> >>> [  597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>> [  598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
> >>> [  597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
> >>> [  597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>> [  597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>> [  597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>> [  597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>> [  597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>> [  597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>> [  597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>> [  597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>> [  597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>> [  597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>> [  597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
> >>> [  597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>> [  600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
> >>> [  600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
> >>> [  600.209151] systemd[1]: Starting DBus interface for snapper...
> >>> [  600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
> >>> [  600.217370] systemd[1]: Started DBus interface for snapper.
> >>> [  600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
> >>> [  599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>> [  599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>> [  599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>> [  599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>> [  599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>> [  599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>> [  599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>> [  599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>> [  599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>> [  599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
> >>> [  599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>> [  599.615320] kernel: JBD2: Ignoring recovery information on journal
> >>> [  599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>> [  600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>> [  599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
> >>> [  599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>> [  599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
> >>> [  600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
> >>> [  599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
> >>> [  599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>> [  599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>> [  599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>> [  599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>> [  599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>> [  599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>> [  599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>> [  599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>> [  599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>> [  599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
> >>> [  599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>> [  599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>> [  600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
> >>> [  600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>> [  600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>> [  600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>> [  600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>> [  600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>> [  600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>> [  600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>> [  600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>> [  600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>> [  600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>> [  600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>> [  600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>> [  600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
> >>> [  600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
> >>> [  600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
> >>> [  600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
> >>> [  600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
> >>> [  600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
> >>> [  600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>> [  600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>> [  600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
> >>> [  600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
> >>> [  600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>> [  600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>> [  600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>> [  600.127615] kernel: Aborting journal on device dm-0-24.
> >>> [  600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>> [  600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>> [  600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>> [  601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>> [  616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [  620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> ^C
> >>> tb-fstest1:/usr/src/linux/fs/ocfs2 #
> >>> ```
> >>
> >> So the case is: flushing the journal in the commit thread will never end if
> >> the journal is aborted, which then leads to a umount hang since the commit
> >> thread won't stop, right?
> > Yes.
> > 
> >>
> >> If so, I don't think your change in sync fs is the right solution. It
> >> seems to be just for bypassing the testcase.
> > No.
> > In my view, No. 266 shouldn't trigger the jbd2 ABORT status. The 'sync' should
> > flush all already-issued write IOs. After sync & loading dm-error, new write IOs
> > fail, which shouldn't generate any journal IOs and shouldn't drive
> > jbd2 into the ABORT status.
> > 
> 
> What if there is no 'sync' in this case?

It is the same as the 'sync' case. Please compare:
'after [  600.079285] in the above log' vs. 'after [  207.623776] in the below log'

(The output of 'journalctl -f' with 'sync' commented out in No. 266)
```
[  204.125234] root[2136]: run xfstest generic/266
[  203.943897] unknown: run fstests generic/266 at 2023-05-01 12:40:37
[  204.179159] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
[  205.000897] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: joining the lockspace group...
[  205.006914] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover 1
[  205.006919] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: group event done 0
[  205.007771] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: add member 1
[  205.008251] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_members 1 nodes
[  205.008641] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: generation 1 slots 1 1:1
[  205.009043] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory
[  205.009305] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory 0 in 0 new
[  205.009640] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory 0 out 0 messages
[  205.010061] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover 1 generation 1 done: 0 ms
[  205.010399] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: join complete
[  205.016329] kernel: JBD2: Ignoring recovery information on journal
[  205.020398] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
[  205.213841] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[  205.218581] ocfs2_hb_ctl[2342]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F8E488BAE771460B89ABD355D42B0264
[  205.033443] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: leaving the lockspace group...
[  205.034046] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: group event done 0
[  205.034296] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: release_lockspace final free
[  205.040937] kernel: ocfs2: Unmounting device (253,32) on (node 1)
[  205.066896] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: joining the lockspace group...
[  205.077350] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover 1
[  205.077419] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: group event done 0
[  205.078070] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: add member 1
[  205.078298] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_members 1 nodes
[  205.078579] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: generation 1 slots 1 1:1
[  205.078823] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory
[  205.079081] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory 0 in 0 new
[  205.079276] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory 0 out 0 messages
[  205.079493] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover 1 generation 1 done: 0 ms
[  205.079728] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: join complete
[  207.150959] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: joining the lockspace group...
[  207.154653] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: group event done 0
[  207.154724] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover 1
[  207.155282] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: add member 1
[  207.155587] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_members 1 nodes
[  207.155935] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: generation 1 slots 1 1:1
[  207.156220] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory
[  207.156584] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory 0 in 0 new
[  207.156860] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory 0 out 0 messages
[  207.157130] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover 1 generation 1 done: 0 ms
[  207.157453] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: join complete
[  207.161303] kernel: JBD2: Ignoring recovery information on journal
[  207.165650] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
[  207.492458] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[  207.334595] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: leaving the lockspace group...
[  207.338685] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: group event done 0
[  207.342262] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: release_lockspace final free
[  207.530465] ocfs2_hb_ctl[2410]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 0304961B816D4F5CA1E793BD4655E027
[  207.365106] kernel: ocfs2: Unmounting device (254,0) on (node 1)
[  207.394907] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: joining the lockspace group...
[  207.408281] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover 1
[  207.408344] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: group event done 0
[  207.409397] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: add member 1
[  207.409826] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_members 1 nodes
[  207.410239] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: generation 1 slots 1 1:1
[  207.410699] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory
[  207.411147] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory 0 in 0 new
[  207.411579] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory 0 out 0 messages
[  207.412579] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover 1 generation 1 done: 4 ms
[  207.413093] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: join complete
[  207.420774] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
[  207.623776] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
[  207.627307] kernel: (xfs_io,2432,2):ocfs2_inode_lock_update:2354 ERROR: status = -5
[  207.627487] kernel: (xfs_io,2432,2):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[  207.627635] kernel: (xfs_io,2432,2):ocfs2_statfs:1654 ERROR: status = -5
[  207.627771] kernel: (xfs_io,2432,2):ocfs2_statfs:1686 ERROR: status = -5
[  207.628009] kernel: (xfs_io,2432,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
[  207.628275] kernel: (xfs_io,2432,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[  207.628521] kernel: (xfs_io,2432,0):ocfs2_statfs:1654 ERROR: status = -5
[  207.628786] kernel: (xfs_io,2432,0):ocfs2_statfs:1686 ERROR: status = -5
[  207.629079] kernel: (xfs_io,2432,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
[  207.629356] kernel: (xfs_io,2432,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[  207.629628] kernel: (xfs_io,2432,0):ocfs2_statfs:1654 ERROR: status = -5
[  207.629868] kernel: (xfs_io,2432,0):ocfs2_statfs:1686 ERROR: status = -5
[  207.632449] kernel: (xfs_io,2432,0):ocfs2_get_refcount_tree:358 ERROR: status = -5
[  207.632807] kernel: (xfs_io,2432,0):ocfs2_lock_refcount_tree:456 ERROR: status = -5
[  207.633186] kernel: (xfs_io,2432,0):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
[  207.633467] kernel: (xfs_io,2432,0):ocfs2_refcount_cow:3478 ERROR: status = -5
[  207.633736] kernel: (xfs_io,2432,0):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
[  207.634022] kernel: (xfs_io,2432,0):ocfs2_file_write_iter:2433 ERROR: status = -5
[  207.634437] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
[  207.634636] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
[  207.634788] kernel: Aborting journal on device dm-0-24.
[  207.634942] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
[  207.635097] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
[  207.635339] kernel: (xfs_io,2432,0):ocfs2_sync_file:201 ERROR: status = -5
[  207.638738] kernel: (umount,2436,2):ocfs2_inode_lock_update:2354 ERROR: status = -5
[  207.638939] kernel: (umount,2436,2):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[  207.639112] kernel: (umount,2436,2):ocfs2_statfs:1654 ERROR: status = -5
[  207.639281] kernel: (umount,2436,2):ocfs2_statfs:1686 ERROR: status = -5
[  207.823818] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[  207.642501] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  207.642689] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
[  208.648030] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  209.672049] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  210.696057] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  211.720064] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  212.744068] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  213.768081] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  214.792079] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  215.816079] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  216.840090] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  217.863993] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  218.888057] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  219.912099] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  220.936106] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  221.960112] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  222.984116] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  222.984857] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
[  224.008130] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  225.032123] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[  226.056075] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
^C
```

- Heming
Joseph Qi May 1, 2023, 11:44 a.m. UTC | #9
On 5/1/23 12:51 PM, Heming Zhao wrote:
> On Mon, May 01, 2023 at 11:21:15AM +0800, Joseph Qi wrote:
>>
>>
>> On 5/1/23 11:09 AM, Heming Zhao wrote:
>>> On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
>>>>
>>>>
>>>> On 5/1/23 10:26 AM, Heming Zhao wrote:
>>>>> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> What's the journal status in this case?
>>>>> jbd2 is marked with JBD2_ABORT
>>>>>
>>>>>> I wonder why commit thread is not working, which should flush journal
>>>>>> and reset j_num_trans during commit cache.
>>>>> The reason is in my above answer.
>>>>>
>>>>> below is the dmesg log for No. 266.
>>>>>
>>>>> ```
>>>>> [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
>>>>> [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
>>>>> [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
>>>>> [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
>>>>> [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
>>>>> [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
>>>>> [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
>>>>> [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
>>>>> [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
>>>>> [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
>>>>> [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
>>>>> [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
>>>>> [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
>>>>> [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
>>>>> [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
>>>>> [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
>>>>> [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
>>>>> [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
>>>>> [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
>>>>> [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
>>>>> [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
>>>>> [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
>>>>> [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
>>>>> [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
>>>>> [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
>>>>> [  596.224616] kernel: JBD2: Ignoring recovery information on journal
>>>>> [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
>>>>> [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>>>> [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
>>>>> [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
>>>>> [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
>>>>> [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
>>>>> [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
>>>>> [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
>>>>> [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
>>>>> [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
>>>>> [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
>>>>> [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
>>>>> [  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
>>>>> [  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
>>>>> [  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
>>>>> [  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
>>>>> [  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
>>>>> [  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
>>>>> [  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
>>>>> [  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
>>>>> [  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
>>>>> [  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
>>>>> [  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
>>>>> [  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
>>>>> [  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
>>>>> [  597.139758] root[5475]: run xfstest generic/266
>>>>> [  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
>>>>> [  597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
>>>>> [  597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
>>>>> [  597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
>>>>> [  597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
>>>>> [  597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
>>>>> [  597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
>>>>> [  597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
>>>>> [  597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
>>>>> [  597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
>>>>> [  597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
>>>>> [  597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
>>>>> [  597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
>>>>> [  597.590136] kernel: JBD2: Ignoring recovery information on journal
>>>>> [  597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
>>>>> [  598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>>>> [  597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
>>>>> [  597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
>>>>> [  598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
>>>>> [  597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
>>>>> [  597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
>>>>> [  597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
>>>>> [  597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
>>>>> [  597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
>>>>> [  597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
>>>>> [  597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
>>>>> [  597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
>>>>> [  597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
>>>>> [  597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
>>>>> [  597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
>>>>> [  597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
>>>>> [  597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
>>>>> [  600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
>>>>> [  600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
>>>>> [  600.209151] systemd[1]: Starting DBus interface for snapper...
>>>>> [  600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
>>>>> [  600.217370] systemd[1]: Started DBus interface for snapper.
>>>>> [  600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
>>>>> [  599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
>>>>> [  599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
>>>>> [  599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
>>>>> [  599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
>>>>> [  599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
>>>>> [  599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
>>>>> [  599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
>>>>> [  599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
>>>>> [  599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
>>>>> [  599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
>>>>> [  599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
>>>>> [  599.615320] kernel: JBD2: Ignoring recovery information on journal
>>>>> [  599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
>>>>> [  600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>>>> [  599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
>>>>> [  599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
>>>>> [  599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
>>>>> [  600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
>>>>> [  599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
>>>>> [  599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
>>>>> [  599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
>>>>> [  599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
>>>>> [  599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
>>>>> [  599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
>>>>> [  599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
>>>>> [  599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
>>>>> [  599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
>>>>> [  599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
>>>>> [  599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
>>>>> [  599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
>>>>> [  599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
>>>>> [  600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
>>>>> [  600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>>>> [  600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>>>> [  600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
>>>>> [  600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
>>>>> [  600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>>>> [  600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>>>> [  600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
>>>>> [  600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
>>>>> [  600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>>>> [  600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>>>> [  600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
>>>>> [  600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
>>>>> [  600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
>>>>> [  600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
>>>>> [  600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
>>>>> [  600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
>>>>> [  600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
>>>>> [  600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
>>>>> [  600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>>>> [  600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>>>> [  600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
>>>>> [  600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
>>>>> [  600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>>>> [  600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
>>>>> [  600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
>>>>> [  600.127615] kernel: Aborting journal on device dm-0-24.
>>>>> [  600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
>>>>> [  600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
>>>>> [  600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
>>>>> [  601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
>>>>> [  616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [  620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> ^C
>>>>> tb-fstest1:/usr/src/linux/fs/ocfs2 #
>>>>> ```
>>>>
>>>> So the case is: flushing the journal in the commit thread will never end
>>>> if the journal is aborted, which then leads to the umount hang since the
>>>> commit thread won't stop, right?
>>> Yes.
>>>
>>>>
>>>> If so, I don't think your changes in sync fs is a right solution. It
>>>> seems it just for bypassing the testcase.
>>> No.
>>> In my view, generic/266 shouldn't trigger the jbd2 ABORT status. The 'sync'
>>> should flush all already-issued write IOs. After sync & loading dm-error,
>>> new write IOs fail, which shouldn't generate any journal IOs and shouldn't
>>> push jbd2 into ABORT status.
>>>
>>
>> What if there is no 'sync' in this case?
> 
> It is the same as the 'sync' case. Compare:
> 'after [  600.079285] in above log' vs. 'after [  207.623776] in below log'
I mean your solution is to reset j_num_trans in sync fs, but in the case of
no 'sync', how do you resolve it?
heming.zhao@suse.com May 1, 2023, 12:35 p.m. UTC | #10
On Mon, May 01, 2023 at 07:44:42PM +0800, Joseph Qi wrote:
> 
> 
> On 5/1/23 12:51 PM, Heming Zhao wrote:
> > On Mon, May 01, 2023 at 11:21:15AM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 5/1/23 11:09 AM, Heming Zhao wrote:
> >>> On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
> >>>>
> >>>>
> >>>> On 5/1/23 10:26 AM, Heming Zhao wrote:
> >>>>> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> What's the journal status in this case?
> >>>>> jbd2 is marked with JBD2_ABORT
> >>>>>
> >>>>>> I wonder why commit thread is not working, which should flush journal
> >>>>>> and reset j_num_trans during commit cache.
> >>>>> The reason is in my above answer.
> >>>>>
> >>>>> below is the dmesg log for No. 266.
> >>>>>
> >>>>> ```
> >>>>> [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>>>> [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>>>> [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>>>> [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>>>> [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>>>> [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>>>> [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>>>> [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>>>> [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>>>> [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>>>> [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>>>> [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>>>> [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
> >>>>> [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
> >>>>> [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
> >>>>> [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>>>> [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
> >>>>> [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
> >>>>> [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
> >>>>> [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
> >>>>> [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
> >>>>> [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
> >>>>> [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
> >>>>> [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
> >>>>> [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
> >>>>> [  596.224616] kernel: JBD2: Ignoring recovery information on journal
> >>>>> [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>>>> [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>> [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
> >>>>> [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>>>> [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
> >>>>> [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
> >>>>> [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>>>> [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
> >>>>> [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
> >>>>> [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>>>> [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
> >>>>> [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
> >>>>> [  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
> >>>>> [  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>>>> [  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>>>> [  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>>>> [  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>>>> [  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>>>> [  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>>>> [  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>>>> [  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>>>> [  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>>>> [  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>>>> [  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>>>> [  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>>>> [  597.139758] root[5475]: run xfstest generic/266
> >>>>> [  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
> >>>>> [  597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
> >>>>> [  597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>>>> [  597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>>>> [  597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>>>> [  597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>>>> [  597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>>>> [  597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>>>> [  597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>>>> [  597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>>>> [  597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>>>> [  597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
> >>>>> [  597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>>>> [  597.590136] kernel: JBD2: Ignoring recovery information on journal
> >>>>> [  597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>>>> [  598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>> [  597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
> >>>>> [  597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>>>> [  598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
> >>>>> [  597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
> >>>>> [  597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>>>> [  597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>>>> [  597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>>>> [  597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>>>> [  597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>>>> [  597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>>>> [  597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>>>> [  597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>>>> [  597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>>>> [  597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>>>> [  597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
> >>>>> [  597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>>>> [  600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
> >>>>> [  600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
> >>>>> [  600.209151] systemd[1]: Starting DBus interface for snapper...
> >>>>> [  600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
> >>>>> [  600.217370] systemd[1]: Started DBus interface for snapper.
> >>>>> [  600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
> >>>>> [  599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>>>> [  599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>>>> [  599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>>>> [  599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>>>> [  599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>>>> [  599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>>>> [  599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>>>> [  599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>>>> [  599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>>>> [  599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
> >>>>> [  599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>>>> [  599.615320] kernel: JBD2: Ignoring recovery information on journal
> >>>>> [  599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>>>> [  600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>> [  599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
> >>>>> [  599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>>>> [  599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
> >>>>> [  600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
> >>>>> [  599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
> >>>>> [  599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>>>> [  599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>>>> [  599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>>>> [  599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>>>> [  599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>>>> [  599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>>>> [  599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>>>> [  599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>>>> [  599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>>>> [  599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
> >>>>> [  599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>>>> [  599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>>>> [  600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
> >>>>> [  600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>> [  600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>> [  600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>>>> [  600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>>>> [  600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>> [  600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>> [  600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>>>> [  600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>>>> [  600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>> [  600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>> [  600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>>>> [  600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>>>> [  600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
> >>>>> [  600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
> >>>>> [  600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
> >>>>> [  600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
> >>>>> [  600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
> >>>>> [  600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
> >>>>> [  600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>> [  600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>> [  600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
> >>>>> [  600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
> >>>>> [  600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>> [  600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>>>> [  600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>>>> [  600.127615] kernel: Aborting journal on device dm-0-24.
> >>>>> [  600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>>>> [  600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>>>> [  600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>>>> [  601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>>>> [  616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> [  620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>> ^C
> >>>>> tb-fstest1:/usr/src/linux/fs/ocfs2 #
> >>>>> ```
> >>>>
> >>>> So the case is, flushing the journal in the commit thread will never end
> >>>> if the journal is aborted, and then leads to the umount hanging since the
> >>>> commit thread won't stop, right?
> >>> Yes.
> >>>
> >>>>
> >>>> If so, I don't think your change in sync fs is the right solution. It
> >>>> seems to be just for bypassing the testcase.
> >>> No.
> >>> In my view, generic/266 shouldn't trigger the jbd2 ABORT status. The 'sync'
> >>> should tidy up all already-issued write IOs. After sync & loading dm-error,
> >>> new write IOs fail, which shouldn't generate any journal IOs and shouldn't
> >>> trigger jbd2 to enter the ABORT status.
> >>>
> >>
> >> What if there is no 'sync' in this case?
> > 
> > It is the same as the case with 'sync'. Please compare:
> > 'after [  600.079285] in the above log' vs. 'after [  207.623776] in the below log'
> I mean your solution is to reset j_num_trans in sync fs, but in case of
> no 'sync', how do you resolve it?

If I understand your question correctly, you mean: if the 'sync' line is
removed from generic/266, how is this case handled?
My answer is that this is the job of my patch [2/2]. I have tested it already.

- Heming
Joseph Qi May 2, 2023, 12:29 a.m. UTC | #11
On 5/1/23 8:35 PM, Heming Zhao wrote:
> On Mon, May 01, 2023 at 07:44:42PM +0800, Joseph Qi wrote:
>>
>>
>> On 5/1/23 12:51 PM, Heming Zhao wrote:
>>> On Mon, May 01, 2023 at 11:21:15AM +0800, Joseph Qi wrote:
>>>>
>>>>
>>>> On 5/1/23 11:09 AM, Heming Zhao wrote:
>>>>> On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
>>>>>>
>>>>>>
>>>>>> On 5/1/23 10:26 AM, Heming Zhao wrote:
>>>>>>> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> What's the journal status in this case?
>>>>>>> jbd2 is marked with JBD2_ABORT
>>>>>>>
> >>>>>>>> I wonder why the commit thread is not working; it should flush the
> >>>>>>>> journal and reset j_num_trans during commit cache.
>>>>>>> The reason is in my above answer.
>>>>>>>
>>>>>>> below is the dmesg log for No. 266.
>>>>>>>
>>>>>>> ```
>>>>>>> [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
>>>>>>> [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
>>>>>>> [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
>>>>>>> [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
>>>>>>> [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
>>>>>>> [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
>>>>>>> [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
>>>>>>> [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
>>>>>>> [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
>>>>>>> [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
>>>>>>> [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
>>>>>>> [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
>>>>>>> [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
>>>>>>> [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
>>>>>>> [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
>>>>>>> [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
>>>>>>> [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
>>>>>>> [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
>>>>>>> [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
>>>>>>> [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
>>>>>>> [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
>>>>>>> [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
>>>>>>> [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
>>>>>>> [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
>>>>>>> [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
>>>>>>> [  596.224616] kernel: JBD2: Ignoring recovery information on journal
>>>>>>> [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
>>>>>>> [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>>>>>> [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
>>>>>>> [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
>>>>>>> [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
>>>>>>> [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
>>>>>>> [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
>>>>>>> [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
>>>>>>> [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
>>>>>>> [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
>>>>>>> [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
>>>>>>> [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
>>>>>>> [  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
>>>>>>> [  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
>>>>>>> [  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
>>>>>>> [  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
>>>>>>> [  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
>>>>>>> [  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
>>>>>>> [  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
>>>>>>> [  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
>>>>>>> [  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
>>>>>>> [  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
>>>>>>> [  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
>>>>>>> [  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
>>>>>>> [  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
>>>>>>> [  597.139758] root[5475]: run xfstest generic/266
>>>>>>> [  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
> >>>>>>> [... remainder of dmesg log snipped; identical to the log quoted above ...]
>>>>>>> ```
>>>>>>
> >>>>>> So the case is, flushing the journal in the commit thread will never end
> >>>>>> if the journal is aborted, and then leads to the umount hanging since the
> >>>>>> commit thread won't stop, right?
>>>>> Yes.
>>>>>
>>>>>>
> >>>>>> If so, I don't think your change in sync fs is the right solution. It
> >>>>>> seems to be just for bypassing the testcase.
> >>>>> No.
> >>>>> In my view, generic/266 shouldn't trigger the jbd2 ABORT status. The 'sync'
> >>>>> should tidy up all already-issued write IOs. After sync & loading dm-error,
> >>>>> new write IOs fail, which shouldn't generate any journal IOs and shouldn't
> >>>>> trigger jbd2 to enter the ABORT status.
>>>>>
>>>>
>>>> What if there is no 'sync' in this case?
>>>
> >>> It is the same as the case with 'sync'. Please compare:
> >>> 'after [  600.079285] in the above log' vs. 'after [  207.623776] in the below log'
> >> I mean your solution is to reset j_num_trans in sync fs, but in case of
> >> no 'sync', how do you resolve it?
> 
> If I understand your question correctly, you mean: if the 'sync' line is
> removed from generic/266, how is this case handled?
> My answer is that this is the job of my patch [2/2]. I have tested it already.
> 
IMO, they are the same issue. 
And it looks weird to fix it in sync fs.
heming.zhao@suse.com May 2, 2023, 3:43 a.m. UTC | #12
On Tue, May 02, 2023 at 08:29:36AM +0800, Joseph Qi wrote:
> 
> 
> On 5/1/23 8:35 PM, Heming Zhao wrote:
> > On Mon, May 01, 2023 at 07:44:42PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 5/1/23 12:51 PM, Heming Zhao wrote:
> >>> On Mon, May 01, 2023 at 11:21:15AM +0800, Joseph Qi wrote:
> >>>>
> >>>>
> >>>> On 5/1/23 11:09 AM, Heming Zhao wrote:
> >>>>> On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 5/1/23 10:26 AM, Heming Zhao wrote:
> >>>>>>> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> What's the journal status in this case?
> >>>>>>> jbd2 is marked with JBD2_ABORT
> >>>>>>>
> >>>>>> I wonder why the commit thread is not working; it should flush the
> >>>>>> journal and reset j_num_trans during commit cache.
> >>>>>>> The reason is in my above answer.
> >>>>>>>
> >>>>>>> below is the dmesg log for No. 266.
> >>>>>>>
> >>>>>>> ```
> >>>>>>> [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>>>>>> [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>>>>>> [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>>>>>> [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>>>>>> [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>>>>>> [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>>>>>> [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>>>>>> [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>>>>>> [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>>>>>> [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
> >>>>>>> [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
> >>>>>>> [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
> >>>>>>> [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>>>>>> [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
> >>>>>>> [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
> >>>>>>> [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
> >>>>>>> [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
> >>>>>>> [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
> >>>>>>> [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
> >>>>>>> [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
> >>>>>>> [  596.224616] kernel: JBD2: Ignoring recovery information on journal
> >>>>>>> [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>>>> [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
> >>>>>>> [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>>>>>> [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
> >>>>>>> [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
> >>>>>>> [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>>>>>> [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
> >>>>>>> [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
> >>>>>>> [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>>>>>> [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
> >>>>>>> [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
> >>>>>>> [  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
> >>>>>>> [  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>>>>>> [  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>>>>>> [  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>>>>>> [  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>>>>>> [  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>>>>>> [  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>>>>>> [  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>>>>>> [  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>>>>>> [  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>>>>>> [  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  597.139758] root[5475]: run xfstest generic/266
> >>>>>>> [  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
> >>>>>>> [  597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
> >>>>>>> [  597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>>>>>> [  597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>>>>>> [  597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>>>>>> [  597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>>>>>> [  597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>>>>>> [  597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>>>>>> [  597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>>>>>> [  597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>>>>>> [  597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
> >>>>>>> [  597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>>>>>> [  597.590136] kernel: JBD2: Ignoring recovery information on journal
> >>>>>>> [  597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>>>> [  597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
> >>>>>>> [  597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>>>>>> [  598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
> >>>>>>> [  597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
> >>>>>>> [  597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>>>>>> [  597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>>>>>> [  597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>>>>>> [  597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>>>>>> [  597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>>>>>> [  597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>>>>>> [  597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>>>>>> [  597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>>>>>> [  597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>>>>>> [  597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>>>>>> [  600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
> >>>>>>> [  600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
> >>>>>>> [  600.209151] systemd[1]: Starting DBus interface for snapper...
> >>>>>>> [  600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
> >>>>>>> [  600.217370] systemd[1]: Started DBus interface for snapper.
> >>>>>>> [  600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
> >>>>>>> [  599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>>>>>> [  599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>>>>>> [  599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>>>>>> [  599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>>>>>> [  599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>>>>>> [  599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>>>>>> [  599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>>>>>> [  599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>>>>>> [  599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>>>>>> [  599.615320] kernel: JBD2: Ignoring recovery information on journal
> >>>>>>> [  599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>>>> [  599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
> >>>>>>> [  599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>>>>>> [  599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
> >>>>>>> [  600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
> >>>>>>> [  599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
> >>>>>>> [  599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>>>>>> [  599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>>>>>> [  599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>>>>>> [  599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>>>>>> [  599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>>>>>> [  599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>>>>>> [  599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>>>>>> [  599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>>>>>> [  599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
> >>>>>>> [  599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>>>>>> [  599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
> >>>>>>> [  600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>>>> [  600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>>>> [  600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>>>>>> [  600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>>>>>> [  600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>>>> [  600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>>>> [  600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>>>>>> [  600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>>>>>> [  600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>>>> [  600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>>>> [  600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>>>>>> [  600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>>>>>> [  600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
> >>>>>>> [  600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
> >>>>>>> [  600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
> >>>>>>> [  600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
> >>>>>>> [  600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
> >>>>>>> [  600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
> >>>>>>> [  600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>>>> [  600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>>>> [  600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
> >>>>>>> [  600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
> >>>>>>> [  600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>>>> [  600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>>>>>> [  600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>>>>>> [  600.127615] kernel: Aborting journal on device dm-0-24.
> >>>>>>> [  600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>>>>>> [  600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>>>>>> [  600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>>>>>> [  601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>>>>>> [  616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> ^C
> >>>>>>> tb-fstest1:/usr/src/linux/fs/ocfs2 #
> >>>>>>> ```
> >>>>>>
> >>>>>> So the case is: flushing the journal in the commit thread will never end if
> >>>>>> the journal is aborted, which then leads to a umount hang since the commit
> >>>>>> thread won't stop, right?
> >>>>> Yes.
> >>>>>
> >>>>>>
> >>>>>> If so, I don't think your change in sync_fs is the right solution. It
> >>>>>> seems it is just there to bypass the testcase.
> >>>>> No.
> >>>>> In my view, generic/266 shouldn't trigger the jbd2 ABORT status at all. The
> >>>>> 'sync' should tidy up all already-issued write IOs. After the sync and the
> >>>>> switch to dm-error, new write IOs fail as ordinary read/write IOs, which
> >>>>> shouldn't generate any journal IOs and shouldn't push jbd2 into ABORT status.
> >>>>>
> >>>>
> >>>> What if there is no 'sync' in this case?
> >>>
> >>> It is the same as the case with 'sync'. Please compare:
> >>> 'after [  600.079285] in the above log' vs. 'after [  207.623776] in the below log'
> >> I mean your solution resets j_num_trans in sync_fs, but in the case of
> >> no 'sync', how do you resolve it?
> > 
> > If I understand your question correctly, you mean: if the 'sync' line is
> > removed from generic/266, how is this case handled?
> > My answer: that is patch [2/2]'s job. I have tested it already.
> > 
> IMO, they are the same issue.
> And it looks weird to fix it in sync_fs.

I maintain my viewpoint that these are two issues, which need two patches.
A sync should tidy up all write IOs (including journal transactions), and the
dmesg shows that ocfs2 simply forgets to reset its internal jbd2 counter
(j_num_trans).

But patch [2/2] could, from another angle, also fix the generic/266 umount
hang. If you don't like patch [1/2], let's focus the review on patch [2/2].

- Heming

Patch

diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 0b0e6a132101..bb3fa21e9b47 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -412,6 +412,9 @@  static int ocfs2_sync_fs(struct super_block *sb, int wait)
 			jbd2_log_wait_commit(osb->journal->j_journal,
 					     target);
 	}
+	/* kick commit thread to reset journal->j_num_trans */
+	if (atomic_read(&(osb->journal->j_num_trans)))
+		wake_up(&osb->checkpoint_event);
 	return 0;
 }