ocfs2: fix defrag path triggering jbd2 ASSERT

Message ID 20230217003717.32469-1-heming.zhao@suse.com (mailing list archive)
State New, archived
Series ocfs2: fix defrag path triggering jbd2 ASSERT

Commit Message

heming.zhao@suse.com Feb. 17, 2023, 12:37 a.m. UTC
code path:

ocfs2_ioctl_move_extents
 ocfs2_move_extents
  ocfs2_defrag_extent
   __ocfs2_move_extent
    + ocfs2_journal_access_di
    + ocfs2_split_extent  // sub-paths call jbd2_journal_restart
    + ocfs2_journal_dirty // crashes on the jbd2 ASSERT

crash stacks:

PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
 #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
 #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
 #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
 #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
 #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
 #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
 #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
    [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
    RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
    RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
    R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
    R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
 #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
 #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]

Analysis

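This bug has the same root cause as commit 7f27ec978b0e ("ocfs2: call
ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()").
In this case, jbd2_journal_restart() is called by sub-paths of
ocfs2_split_extent() during defragmentation. After the restart, the handle
is attached to a new transaction, so the earlier ocfs2_journal_access_di()
no longer covers the buffer and ocfs2_journal_dirty() trips the assertion.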

How to fix

ocfs2_split_extent() handles all journal operations by itself, so the
caller does not need the journal access/dirty pair; it only needs the
journal start/stop pair. The fix is to remove the journal access/dirty
calls from __ocfs2_move_extent().
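
The ordering problem can be illustrated with a small userspace toy model.
This is only a sketch under stated assumptions: every toy_* name below is
hypothetical, the structures are not jbd2's real ones, and the point at
which the helper restarts the handle is chosen for illustration. A false
return stands in for the place where the kernel would hit the assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not jbd2's real structures. */
struct toy_handle { int tid; };        /* transaction the handle is attached to */
struct toy_buffer { int owner_tid; };  /* transaction granted write access, 0 = none */

/* Stands in for journal access: bind the buffer to the current transaction. */
static void toy_access(const struct toy_handle *h, struct toy_buffer *bh)
{
	bh->owner_tid = h->tid;
}

/* Stands in for a journal restart: the handle moves to a new transaction. */
static void toy_restart(struct toy_handle *h)
{
	h->tid++;
}

/* Stands in for journal dirty: false where the real code would assert. */
static bool toy_dirty(const struct toy_handle *h, const struct toy_buffer *bh)
{
	return bh->owner_tid == h->tid;
}

/* Stands in for the split helper: restarts, then journals its own buffer. */
static void toy_split(struct toy_handle *h, struct toy_buffer *bh)
{
	toy_restart(h);
	toy_access(h, bh);
	/* Access and dirty happen under the same transaction, so this holds. */
	assert(toy_dirty(h, bh));
}

/* Old ordering: the caller wraps the helper in its own access/dirty pair. */
bool toy_move_extent_old(void)
{
	struct toy_handle h = { .tid = 1 };
	struct toy_buffer root = { 0 }, leaf = { 0 };

	toy_access(&h, &root);       /* root bound to transaction 1 */
	toy_split(&h, &leaf);        /* handle now on transaction 2 */
	return toy_dirty(&h, &root); /* stale binding: the check fails */
}

/* New ordering: the caller leaves all journaling to the helper. */
bool toy_move_extent_new(void)
{
	struct toy_handle h = { .tid = 1 };
	struct toy_buffer root = { 0 };

	toy_split(&h, &root);
	/* Demonstration check only: the binding matches the live transaction. */
	return toy_dirty(&h, &root);
}
```

In this model toy_move_extent_old() returns false and
toy_move_extent_new() returns true, which mirrors why dropping the
caller-side pair is enough.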

The discussion for this patch:
https://oss.oracle.com/pipermail/ocfs2-devel/2023-February/000647.html

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
v1 -> v2:
- no code changes.
- change patch subject from "ocfs2: fix J_ASSERT_JH in defragment path"
  to "ocfs2: fix defrag path triggering jbd2 ASSERT"
- rewrite/polish commit log

v1: https://oss.oracle.com/pipermail/ocfs2-devel/2022-May/000101.html

---
 fs/ocfs2/move_extents.c | 10 ----------
 1 file changed, 10 deletions(-)

Comments

Joseph Qi Feb. 19, 2023, 11:12 a.m. UTC | #1
On 2/17/23 8:37 AM, Heming Zhao wrote:
> code path:
> 
> ocfs2_ioctl_move_extents
>  ocfs2_move_extents
>   ocfs2_defrag_extent
>    __ocfs2_move_extent
>     + ocfs2_journal_access_di
>     + ocfs2_split_extent  // sub-paths call jbd2_journal_restart
>     + ocfs2_journal_dirty // crashes on the jbd2 ASSERT
> 
> crash stacks:
> 
> PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
>  #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
>  #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
>  #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
>  #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
>  #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
>  #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
>  #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
>     [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
>     RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
>     RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
>     RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
>     RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
>     R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
>     R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
>  #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
>  #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]
> 
> Analysis
> 
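> This bug has the same root cause as commit 7f27ec978b0e ("ocfs2: call
> ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()").
> In this case, jbd2_journal_restart() is called by sub-paths of
> ocfs2_split_extent() during defragmentation. After the restart, the handle
> is attached to a new transaction, so the earlier ocfs2_journal_access_di()
> no longer covers the buffer and ocfs2_journal_dirty() trips the assertion.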
> 
> How to fix
> 
> ocfs2_split_extent() handles all journal operations by itself, so the
> caller does not need the journal access/dirty pair; it only needs the
> journal start/stop pair. The fix is to remove the journal access/dirty
> calls from __ocfs2_move_extent().
> 
> The discussion for this patch:
> https://oss.oracle.com/pipermail/ocfs2-devel/2023-February/000647.html
> 
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>

Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

> ---
> v1 -> v2:
> - no code changes.
> - change patch subject from "ocfs2: fix J_ASSERT_JH in defragment path"
>   to "ocfs2: fix defrag path triggering jbd2 ASSERT"
> - rewrite/polish commit log
> 
> v1: https://oss.oracle.com/pipermail/ocfs2-devel/2022-May/000101.html
> 
> ---
>  fs/ocfs2/move_extents.c | 10 ----------
>  1 file changed, 10 deletions(-)
> 
> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
> index 192cad0662d8..6251748c695b 100644
> --- a/fs/ocfs2/move_extents.c
> +++ b/fs/ocfs2/move_extents.c
> @@ -105,14 +105,6 @@ static int __ocfs2_move_extent(handle_t *handle,
>  	 */
>  	replace_rec.e_flags = ext_flags & ~OCFS2_EXT_REFCOUNTED;
>  
> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode),
> -				      context->et.et_root_bh,
> -				      OCFS2_JOURNAL_ACCESS_WRITE);
> -	if (ret) {
> -		mlog_errno(ret);
> -		goto out;
> -	}
> -
>  	ret = ocfs2_split_extent(handle, &context->et, path, index,
>  				 &replace_rec, context->meta_ac,
>  				 &context->dealloc);
> @@ -121,8 +113,6 @@ static int __ocfs2_move_extent(handle_t *handle,
>  		goto out;
>  	}
>  
> -	ocfs2_journal_dirty(handle, context->et.et_root_bh);
> -
>  	context->new_phys_cpos = new_p_cpos;
>  
>  	/*

Patch

diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
index 192cad0662d8..6251748c695b 100644
--- a/fs/ocfs2/move_extents.c
+++ b/fs/ocfs2/move_extents.c
@@ -105,14 +105,6 @@ static int __ocfs2_move_extent(handle_t *handle,
 	 */
 	replace_rec.e_flags = ext_flags & ~OCFS2_EXT_REFCOUNTED;
 
-	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode),
-				      context->et.et_root_bh,
-				      OCFS2_JOURNAL_ACCESS_WRITE);
-	if (ret) {
-		mlog_errno(ret);
-		goto out;
-	}
-
 	ret = ocfs2_split_extent(handle, &context->et, path, index,
 				 &replace_rec, context->meta_ac,
 				 &context->dealloc);
@@ -121,8 +113,6 @@ static int __ocfs2_move_extent(handle_t *handle,
 		goto out;
 	}
 
-	ocfs2_journal_dirty(handle, context->et.et_root_bh);
-
 	context->new_phys_cpos = new_p_cpos;
 
 	/*