diff mbox series

[1/2] ocfs2: fix jbd2 assertion in defragment path

Message ID 20220521101416.29793-1-heming.zhao@suse.com (mailing list archive)
State New, archived
Headers show
Series [1/2] ocfs2: fix jbd2 assertion in defragment path | expand

Commit Message

heming.zhao@suse.com May 21, 2022, 10:14 a.m. UTC
defragfs triggered jbd2 report ASSERT when working.

code path:

ocfs2_ioctl_move_extents
 ocfs2_move_extents
  __ocfs2_move_extents_range
   ocfs2_defrag_extent
    __ocfs2_move_extent
     + ocfs2_journal_access_di
     + ocfs2_split_extent  //[1]
     + ocfs2_journal_dirty //crash

[1]: ocfs2_split_extent called ocfs2_extend_trans, which committed
dirty buffers then restarted the transaction. The changed transaction
triggered jbd2 ASSERT.

crash stacks:

PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
 #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
 #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
 #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
 #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
 #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
 #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
 #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
    [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
    RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
    RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
    R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
    R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
 #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
 #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]

The fix method is simple, ocfs2_split_extent includes three paths. all
the paths already have journal access/dirty pair. We only need to
remove journal access/dirty from __ocfs2_move_extent.

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 fs/ocfs2/move_extents.c | 10 ----------
 1 file changed, 10 deletions(-)

Comments

Joseph Qi June 2, 2022, 9:34 a.m. UTC | #1
Hi,

Sorry for the late response since I was busy on other things...

On 5/21/22 6:14 PM, Heming Zhao wrote:
> defragfs triggered jbd2 report ASSERT when working.
> 
> code path:
> 
> ocfs2_ioctl_move_extents
>  ocfs2_move_extents
>   __ocfs2_move_extents_range
>    ocfs2_defrag_extent
>     __ocfs2_move_extent
>      + ocfs2_journal_access_di
>      + ocfs2_split_extent  //[1]
>      + ocfs2_journal_dirty //crash
> 
> [1]: ocfs2_split_extent called ocfs2_extend_trans, which committed
> dirty buffers then restarted the transaction. The changed transaction
> triggered jbd2 ASSERT.
> 
> crash stacks:
> 
> PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
>  #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
>  #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
>  #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
>  #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
>  #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
>  #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
>  #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
>     [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
>     RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
>     RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
>     RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
>     RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
>     R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
>     R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
>  #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
>  #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]
> 
> The fix method is simple, ocfs2_split_extent includes three paths. all
> the paths already have journal access/dirty pair. We only need to
> remove journal access/dirty from __ocfs2_move_extent.
> 

I am not sure what you mean by "all three paths have journal access/
dirty pair".

It seems we can't do it in your way, as journal access has different
ocfs2_trigger with different offset, e.g. dinode is different from
extent block.

Or we may reserve enough credits to resolve this issue? 

Thanks,
Joseph
heming.zhao@suse.com June 4, 2022, 12:08 a.m. UTC | #2
On Thu, Jun 02, 2022 at 05:34:18PM +0800, Joseph Qi wrote:
> Hi,
> 
> Sorry for the late response since I was busy on other things...
> 
> On 5/21/22 6:14 PM, Heming Zhao wrote:
> > defragfs triggered jbd2 report ASSERT when working.
> > 
> > code path:
> > 
> > ocfs2_ioctl_move_extents
> >  ocfs2_move_extents
> >   __ocfs2_move_extents_range
> >    ocfs2_defrag_extent
> >     __ocfs2_move_extent
> >      + ocfs2_journal_access_di
> >      + ocfs2_split_extent  //[1]
> >      + ocfs2_journal_dirty //crash
> > 
> > [1]: ocfs2_split_extent called ocfs2_extend_trans, which committed
> > dirty buffers then restarted the transaction. The changed transaction
> > triggered jbd2 ASSERT.
> > 
> > crash stacks:
> > 
> > PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
> >  #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
> >  #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
> >  #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
> >  #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
> >  #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
> >  #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
> >  #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
> >     [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
> >     RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
> >     RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
> >     RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
> >     RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
> >     R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
> >     R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
> >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >  #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
> >  #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
> >  #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]
> > 
> > The fix method is simple, ocfs2_split_extent includes three paths. all
> > the paths already have journal access/dirty pair. We only need to
> > remove journal access/dirty from __ocfs2_move_extent.
> > 
> 
> I am not sure what you mean by "all three paths have journal access/
> dirty pair".

I am a newbie for ocfs2 & jbd2, I can't make sure this [1/2] patch is correct.
Below is my analysis.

All three paths (below 1.[123]):

__ocfs2_move_extent
 + ocfs2_journal_access_di
 |
 + ocfs2_split_extent           //[1]
 |  + ocfs2_replace_extent_rec  //[1.1]
 |  + ocfs2_split_and_insert    //[1.2]
 |  + ocfs2_try_to_merge_extent //[1.3]
 |
 + + ocfs2_journal_dirty       //crash

All three paths have itselves journal access/dirty pair.
So the access/dirty pair in caller __ocfs2_move_extent is unnecessary.

> 
> It seems we can't do it in your way, as journal access has different
> ocfs2_trigger with different offset, e.g. dinode is different from
> extent block.

There are 3 ocfs2_split_extent callers:
 fs/ocfs2/alloc.c <<ocfs2_change_extent_flag>>
 fs/ocfs2/move_extents.c <<__ocfs2_move_extent>>
 fs/ocfs2/refcounttree.c <<ocfs2_clear_ext_refcount>>

We can see, except defrag flow (caller __ocfs2_move_extent), other two
don't use jbd2 access/dirty pair around ocfs2_split_extent.

In my view, in defrag code flow, the struct ocfs2_triggers is changed many
times. The outermost ocfs2_triggers (belongs to __ocfs2_move_extent) must be
overwritten in sub-routine ocfs2_split_extent.

In another perspective, why ocfs2_change_extent_flag & ocfs2_clear_ext_refcount
don't use journal access/dirty pair to wrap ocfs2_split_extent? 

> 
> Or we may reserve enough credits to resolve this issue? 
> 

In my view, credits is not the key for this crash issue. The crash is
triggered by transaction changed:

crash code:

```jbd2_journal_dirty_metadata()

if (data_race(jh->b_transaction != transaction &&
		jh->b_next_transaction != transaction)) {
	spin_lock(&jh->b_state_lock);
	J_ASSERT_JH(jh, jh->b_transaction == transaction ||
		jh->b_next_transaction == transaction);
	spin_unlock(&jh->b_state_lock);
}
```

why transaction is changed?

ocfs2_split_extent
 ocfs2_split_and_insert
  ocfs2_do_insert_extent
    ocfs2_insert_path
       ocfs2_extend_trans //change transaction, and pls note this func comments

Thanks,
Heming
Joseph Qi June 10, 2022, 7:46 a.m. UTC | #3
On 6/4/22 8:08 AM, Heming Zhao wrote:
> On Thu, Jun 02, 2022 at 05:34:18PM +0800, Joseph Qi wrote:
>> Hi,
>>
>> Sorry for the late response since I was busy on other things...
>>
>> On 5/21/22 6:14 PM, Heming Zhao wrote:
>>> defragfs triggered jbd2 report ASSERT when working.
>>>
>>> code path:
>>>
>>> ocfs2_ioctl_move_extents
>>>  ocfs2_move_extents
>>>   __ocfs2_move_extents_range
>>>    ocfs2_defrag_extent
>>>     __ocfs2_move_extent
>>>      + ocfs2_journal_access_di
>>>      + ocfs2_split_extent  //[1]
>>>      + ocfs2_journal_dirty //crash
>>>
>>> [1]: ocfs2_split_extent called ocfs2_extend_trans, which committed
>>> dirty buffers then restarted the transaction. The changed transaction
>>> triggered jbd2 ASSERT.
>>>
>>> crash stacks:
>>>
>>> PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
>>>  #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
>>>  #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
>>>  #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
>>>  #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
>>>  #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
>>>  #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
>>>  #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
>>>     [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
>>>     RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
>>>     RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
>>>     RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
>>>     RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
>>>     R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
>>>     R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
>>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>  #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
>>>  #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
>>>  #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]
>>>
>>> The fix method is simple, ocfs2_split_extent includes three paths. all
>>> the paths already have journal access/dirty pair. We only need to
>>> remove journal access/dirty from __ocfs2_move_extent.
>>>
>>
>> I am not sure what you mean by "all three paths have journal access/
>> dirty pair".
> 
> I am a newbie for ocfs2 & jbd2, I can't make sure this [1/2] patch is correct.
> Below is my analysis.
> 
> All three paths (below 1.[123]):
> 
> __ocfs2_move_extent
>  + ocfs2_journal_access_di
>  |
>  + ocfs2_split_extent           //[1]
>  |  + ocfs2_replace_extent_rec  //[1.1]
>  |  + ocfs2_split_and_insert    //[1.2]
>  |  + ocfs2_try_to_merge_extent //[1.3]
>  |
>  + + ocfs2_journal_dirty       //crash
> 
> All three paths have itselves journal access/dirty pair.
> So the access/dirty pair in caller __ocfs2_move_extent is unnecessary.
> 

ocfs2_journal_access_xxx() is to notify jbd2 to do a specific operation
to a bh. I said in my last mail, they are for different bh.

>>
>> It seems we can't do it in your way, as journal access has different
>> ocfs2_trigger with different offset, e.g. dinode is different from
>> extent block.
> 
> There are 3 ocfs2_split_extent callers:
>  fs/ocfs2/alloc.c <<ocfs2_change_extent_flag>>
>  fs/ocfs2/move_extents.c <<__ocfs2_move_extent>>
>  fs/ocfs2/refcounttree.c <<ocfs2_clear_ext_refcount>>
> 
> We can see, except defrag flow (caller __ocfs2_move_extent), other two
> don't use jbd2 access/dirty pair around ocfs2_split_extent.
> 

Not exactly, you have to take look the whole flow.

Thanks,
Joseph
diff mbox series

Patch

diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
index 192cad0662d8..6251748c695b 100644
--- a/fs/ocfs2/move_extents.c
+++ b/fs/ocfs2/move_extents.c
@@ -105,14 +105,6 @@  static int __ocfs2_move_extent(handle_t *handle,
 	 */
 	replace_rec.e_flags = ext_flags & ~OCFS2_EXT_REFCOUNTED;
 
-	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode),
-				      context->et.et_root_bh,
-				      OCFS2_JOURNAL_ACCESS_WRITE);
-	if (ret) {
-		mlog_errno(ret);
-		goto out;
-	}
-
 	ret = ocfs2_split_extent(handle, &context->et, path, index,
 				 &replace_rec, context->meta_ac,
 				 &context->dealloc);
@@ -121,8 +113,6 @@  static int __ocfs2_move_extent(handle_t *handle,
 		goto out;
 	}
 
-	ocfs2_journal_dirty(handle, context->et.et_root_bh);
-
 	context->new_phys_cpos = new_p_cpos;
 
 	/*