diff mbox series

[2/2] ocfs2: add error handling path when jbd2 enter ABORT status

Message ID 20230430031302.15597-2-heming.zhao@suse.com (mailing list archive)
State New, archived
Headers show
Series [1/2] ocfs2: fix missing reset j_num_trans for sync | expand

Commit Message

heming.zhao@suse.com April 30, 2023, 3:13 a.m. UTC
fstest generic cases 347 361 628 629 trigger a same issue:
When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
journal.

This commit gives ocfs2 ability to handle jbd2 ABORT case.

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 fs/ocfs2/alloc.c      | 10 ++++++----
 fs/ocfs2/journal.c    |  5 +++++
 fs/ocfs2/localalloc.c |  3 +++
 3 files changed, 14 insertions(+), 4 deletions(-)

Comments

heming.zhao@suse.com April 30, 2023, 3:16 a.m. UTC | #1
More info for this patch, maybe I should write below info in commit log.

 From the comment of __ocfs2_abort(), there is another way to handle

journal ABORT: panic.
And fstest generic test NO. 648 will trigger
  ocfs2_abort then make kernel
panic. I don't like the panic style, ocfs2
  should elegantly handle journal
abnormal case.

Thanks,
Heming

On 4/30/23 11:13 AM, Heming Zhao wrote:
> fstest generic cases 347 361 628 629 trigger a same issue:
> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
> journal.
> 
> This commit gives ocfs2 ability to handle jbd2 ABORT case.
> 
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>   fs/ocfs2/alloc.c      | 10 ++++++----
>   fs/ocfs2/journal.c    |  5 +++++
>   fs/ocfs2/localalloc.c |  3 +++
>   3 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
heming.zhao@suse.com April 30, 2023, 3:22 a.m. UTC | #2
(sorry, my last mail format is by Thunderbird, resend by neomutt.)

----
More info for this patch, maybe I should write below info in commit log.

From the comment of __ocfs2_abort(), there is another way to handle
journal ABORT: panic.
And fstest generic test NO. 648 will trigger ocfs2_abort then make
kernel panic. I don't like the panic style, ocfs2 should elegantly
handle journal abnormal case.

Thanks,
Heming

On Sun, Apr 30, 2023 at 11:13:02AM +0800, Heming Zhao wrote:
> fstest generic cases 347 361 628 629 trigger a same issue:
> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
> journal.
> 
> This commit gives ocfs2 ability to handle jbd2 ABORT case.
> 
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/alloc.c      | 10 ++++++----
>  fs/ocfs2/journal.c    |  5 +++++
>  fs/ocfs2/localalloc.c |  3 +++
>  3 files changed, 14 insertions(+), 4 deletions(-)
>
Joseph Qi May 4, 2023, 2:27 a.m. UTC | #3
On 4/30/23 11:13 AM, Heming Zhao wrote:
> fstest generic cases 347 361 628 629 trigger a same issue:
> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
> journal.

What's the end user impact?

JBD2_ABORT indicates a fatal error happens, either in jounral layer or
filesystem. And we should not commit any further transactions.
It seems that we may unify the behavior like:

if (is_journal_aborted(journal))
	return -EROFS;

Thanks,
Joseph

> 
> This commit gives ocfs2 ability to handle jbd2 ABORT case.
> 
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/alloc.c      | 10 ++++++----
>  fs/ocfs2/journal.c    |  5 +++++
>  fs/ocfs2/localalloc.c |  3 +++
>  3 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> index 51c93929a146..d90961a1c433 100644
> --- a/fs/ocfs2/alloc.c
> +++ b/fs/ocfs2/alloc.c
> @@ -6308,11 +6308,13 @@ void ocfs2_truncate_log_shutdown(struct ocfs2_super *osb)
>  
>  	if (tl_inode) {
>  		cancel_delayed_work(&osb->osb_truncate_log_wq);
> -		flush_workqueue(osb->ocfs2_wq);
> +		if (!is_journal_aborted(osb->journal->j_journal)) {
> +			flush_workqueue(osb->ocfs2_wq);
>  
> -		status = ocfs2_flush_truncate_log(osb);
> -		if (status < 0)
> -			mlog_errno(status);
> +			status = ocfs2_flush_truncate_log(osb);
> +			if (status < 0)
> +				mlog_errno(status);
> +		}
>  
>  		brelse(osb->osb_tl_bh);
>  		iput(osb->osb_tl_inode);
> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> index 25d8072ccfce..2798067a2f66 100644
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -312,11 +312,16 @@ static int ocfs2_commit_cache(struct ocfs2_super *osb)
>  	status = jbd2_journal_flush(journal->j_journal, 0);
>  	jbd2_journal_unlock_updates(journal->j_journal);
>  	if (status < 0) {
> +		if (is_journal_aborted(journal->j_journal)) {
> +			ocfs2_error(osb->sb, "jbd2 status: ABORT.\n");
> +			goto reset;
> +		}
>  		up_write(&journal->j_trans_barrier);
>  		mlog_errno(status);
>  		goto finally;
>  	}
>  
> +reset:
>  	ocfs2_inc_trans_id(journal);
>  
>  	flushed = atomic_read(&journal->j_num_trans);
> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> index c4426d12a2ad..e2e3400717b0 100644
> --- a/fs/ocfs2/localalloc.c
> +++ b/fs/ocfs2/localalloc.c
> @@ -378,6 +378,9 @@ void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb)
>  	if (osb->ocfs2_wq)
>  		flush_workqueue(osb->ocfs2_wq);
>  
> +	if (is_journal_aborted(osb->journal->j_journal))
> +		goto out;
> +
>  	if (osb->local_alloc_state == OCFS2_LA_UNUSED)
>  		goto out;
>
heming.zhao@suse.com May 4, 2023, 6:21 a.m. UTC | #4
On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
> 
> 
> On 4/30/23 11:13 AM, Heming Zhao wrote:
> > fstest generic cases 347 361 628 629 trigger a same issue:
> > When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
> > journal.
> 
> What's the end user impact?

There are two impacts:

1. this issue makes unmount command hanging.

No one likes or accepts filesystem unmount failure when underlying devices meets
error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount
successfully.

2. for developers to verify their patch doesn't make any regression.

fstest is a famious & important fs testsuite.
(Yes, I know ocfs2 has itself testsuite ocfs2_test.)

Current status, there are many test cases (about 9 in my env) cause fstest
hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed,
gfs2 only has 1 hanging case.

In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the
testsuite. Long-term target is to make ocfs2 to pass all the testsuite.

On kernel 6.2.9-1, the fstest 'quick' test group result:
(the result based on my two patches [1/2] & [2/2])
```
./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648

Failures: generic/003 generic/013 generic/015 generic/040 generic/041
generic/062 generic/082 generic/104 generic/107 generic/228 generic/244
generic/266 generic/272 generic/277 generic/281 generic/322 generic/329
generic/331 generic/336 generic/343 generic/376 generic/379 generic/380
generic/383 generic/384 generic/385 generic/386 generic/400 generic/410
generic/424 generic/441 generic/448 generic/449 generic/471 generic/479
generic/493 generic/495 generic/510 generic/537 generic/552 generic/563
generic/578 generic/594 generic/607 generic/620 generic/628 generic/630
generic/636 generic/702
Failed 49 of 568 tests
```

> 
> JBD2_ABORT indicates a fatal error happens, either in jounral layer or
> filesystem. And we should not commit any further transactions.
> It seems that we may unify the behavior like:
> 
> if (is_journal_aborted(journal))
> 	return -EROFS;
> 
> 
IIUC, do you mean add above code in ocfs2_commit_cache or in
ocfs2_commit_thread?
1>
If in ocfs2_commit_cache, this fix doesn't take any effect for unmount
failure(hanging).
When return -EROFS, the caller ocfs2_commit_thread just print error message and
keep going to retry.
2>
If in ocfs2_commit_thread, it makes commit thread exit/stop without notifying
other ocfs2 paths. Later wake_up(&osb->checkpoint_event) won't take effect,
ocfs2 will enter chaose. 

In my view, there is missing a place to set fs RO. So [2/2] in
ocfs2_commit_cache, uses ocfs2_error to mark RO before returning status (<0
value). The ocfs2_error is the right place to mark RO.

- Heming

> > 
> > This commit gives ocfs2 ability to handle jbd2 ABORT case.
> > 
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> >  fs/ocfs2/alloc.c      | 10 ++++++----
> >  fs/ocfs2/journal.c    |  5 +++++
> >  fs/ocfs2/localalloc.c |  3 +++
> >  3 files changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> > index 51c93929a146..d90961a1c433 100644
> > --- a/fs/ocfs2/alloc.c
> > +++ b/fs/ocfs2/alloc.c
> > @@ -6308,11 +6308,13 @@ void ocfs2_truncate_log_shutdown(struct ocfs2_super *osb)
> >  
> >  	if (tl_inode) {
> >  		cancel_delayed_work(&osb->osb_truncate_log_wq);
> > -		flush_workqueue(osb->ocfs2_wq);
> > +		if (!is_journal_aborted(osb->journal->j_journal)) {
> > +			flush_workqueue(osb->ocfs2_wq);
> >  
> > -		status = ocfs2_flush_truncate_log(osb);
> > -		if (status < 0)
> > -			mlog_errno(status);
> > +			status = ocfs2_flush_truncate_log(osb);
> > +			if (status < 0)
> > +				mlog_errno(status);
> > +		}
> >  
> >  		brelse(osb->osb_tl_bh);
> >  		iput(osb->osb_tl_inode);
> > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> > index 25d8072ccfce..2798067a2f66 100644
> > --- a/fs/ocfs2/journal.c
> > +++ b/fs/ocfs2/journal.c
> > @@ -312,11 +312,16 @@ static int ocfs2_commit_cache(struct ocfs2_super *osb)
> >  	status = jbd2_journal_flush(journal->j_journal, 0);
> >  	jbd2_journal_unlock_updates(journal->j_journal);
> >  	if (status < 0) {
> > +		if (is_journal_aborted(journal->j_journal)) {
> > +			ocfs2_error(osb->sb, "jbd2 status: ABORT.\n");
> > +			goto reset;
> > +		}
> >  		up_write(&journal->j_trans_barrier);
> >  		mlog_errno(status);
> >  		goto finally;
> >  	}
> >  
> > +reset:
> >  	ocfs2_inc_trans_id(journal);
> >  
> >  	flushed = atomic_read(&journal->j_num_trans);
> > diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> > index c4426d12a2ad..e2e3400717b0 100644
> > --- a/fs/ocfs2/localalloc.c
> > +++ b/fs/ocfs2/localalloc.c
> > @@ -378,6 +378,9 @@ void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb)
> >  	if (osb->ocfs2_wq)
> >  		flush_workqueue(osb->ocfs2_wq);
> >  
> > +	if (is_journal_aborted(osb->journal->j_journal))
> > +		goto out;
> > +
> >  	if (osb->local_alloc_state == OCFS2_LA_UNUSED)
> >  		goto out;
> >
Joseph Qi May 4, 2023, 7:34 a.m. UTC | #5
On 5/4/23 2:21 PM, Heming Zhao wrote:
> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
>>
>>
>> On 4/30/23 11:13 AM, Heming Zhao wrote:
>>> fstest generic cases 347 361 628 629 trigger a same issue:
>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
>>> journal.
>>
>> What's the end user impact?
> 
> There are two impacts:
> 
> 1. this issue makes unmount command hanging.
> 
> No one likes or accepts filesystem unmount failure when underlying devices meets
> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount
> successfully.
> 

So umount hang should be in the patch description, right?

> 2. for developers to verify their patch doesn't make any regression.
> 
> fstest is a famious & important fs testsuite.
> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
> 
> Current status, there are many test cases (about 9 in my env) cause fstest
> hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed,
> gfs2 only has 1 hanging case.
> 
> In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the
> testsuite. Long-term target is to make ocfs2 to pass all the testsuite.
> 
> On kernel 6.2.9-1, the fstest 'quick' test group result:
> (the result based on my two patches [1/2] & [2/2])
> ```
> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
> 
> Failures: generic/003 generic/013 generic/015 generic/040 generic/041
> generic/062 generic/082 generic/104 generic/107 generic/228 generic/244
> generic/266 generic/272 generic/277 generic/281 generic/322 generic/329
> generic/331 generic/336 generic/343 generic/376 generic/379 generic/380
> generic/383 generic/384 generic/385 generic/386 generic/400 generic/410
> generic/424 generic/441 generic/448 generic/449 generic/471 generic/479
> generic/493 generic/495 generic/510 generic/537 generic/552 generic/563
> generic/578 generic/594 generic/607 generic/620 generic/628 generic/630
> generic/636 generic/702
> Failed 49 of 568 tests
> ```
> 
>>
>> JBD2_ABORT indicates a fatal error happens, either in jounral layer or
>> filesystem. And we should not commit any further transactions.
>> It seems that we may unify the behavior like:
>>
>> if (is_journal_aborted(journal))
>> 	return -EROFS;
>>
>>
> IIUC, do you mean add above code in ocfs2_commit_cache or in
> ocfs2_commit_thread?

Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
Actually we've handled this case before, but just limit the print. But
it seems not enough now.
BTW, the basic rule here is, we don't want to change journal to prevent
other nodes corrupting the filesystem.

> 1>
> If in ocfs2_commit_cache, this fix doesn't take any effect for unmount
> failure(hanging).
> When return -EROFS, the caller ocfs2_commit_thread just print error message and
> keep going to retry.
> 2>
> If in ocfs2_commit_thread, it makes commit thread exit/stop without notifying
> other ocfs2 paths. Later wake_up(&osb->checkpoint_event) won't take effect,
> ocfs2 will enter chaose. 
> 
> In my view, there is missing a place to set fs RO. So [2/2] in
> ocfs2_commit_cache, uses ocfs2_error to mark RO before returning status (<0
> value). The ocfs2_error is the right place to mark RO.
> 
> - Heming
> 
>>>
>>> This commit gives ocfs2 ability to handle jbd2 ABORT case.
>>>
>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>> ---
>>>  fs/ocfs2/alloc.c      | 10 ++++++----
>>>  fs/ocfs2/journal.c    |  5 +++++
>>>  fs/ocfs2/localalloc.c |  3 +++
>>>  3 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>>> index 51c93929a146..d90961a1c433 100644
>>> --- a/fs/ocfs2/alloc.c
>>> +++ b/fs/ocfs2/alloc.c
>>> @@ -6308,11 +6308,13 @@ void ocfs2_truncate_log_shutdown(struct ocfs2_super *osb)
>>>  
>>>  	if (tl_inode) {
>>>  		cancel_delayed_work(&osb->osb_truncate_log_wq);
>>> -		flush_workqueue(osb->ocfs2_wq);
>>> +		if (!is_journal_aborted(osb->journal->j_journal)) {
>>> +			flush_workqueue(osb->ocfs2_wq);
>>>  
>>> -		status = ocfs2_flush_truncate_log(osb);
>>> -		if (status < 0)
>>> -			mlog_errno(status);
>>> +			status = ocfs2_flush_truncate_log(osb);
>>> +			if (status < 0)
>>> +				mlog_errno(status);
>>> +		}
>>>  
>>>  		brelse(osb->osb_tl_bh);
>>>  		iput(osb->osb_tl_inode);
>>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>>> index 25d8072ccfce..2798067a2f66 100644
>>> --- a/fs/ocfs2/journal.c
>>> +++ b/fs/ocfs2/journal.c
>>> @@ -312,11 +312,16 @@ static int ocfs2_commit_cache(struct ocfs2_super *osb)
>>>  	status = jbd2_journal_flush(journal->j_journal, 0);
>>>  	jbd2_journal_unlock_updates(journal->j_journal);
>>>  	if (status < 0) {
>>> +		if (is_journal_aborted(journal->j_journal)) {
>>> +			ocfs2_error(osb->sb, "jbd2 status: ABORT.\n");
>>> +			goto reset;
>>> +		}
>>>  		up_write(&journal->j_trans_barrier);
>>>  		mlog_errno(status);
>>>  		goto finally;
>>>  	}
>>>  
>>> +reset:
>>>  	ocfs2_inc_trans_id(journal);
>>>  
>>>  	flushed = atomic_read(&journal->j_num_trans);
>>> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
>>> index c4426d12a2ad..e2e3400717b0 100644
>>> --- a/fs/ocfs2/localalloc.c
>>> +++ b/fs/ocfs2/localalloc.c
>>> @@ -378,6 +378,9 @@ void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb)
>>>  	if (osb->ocfs2_wq)
>>>  		flush_workqueue(osb->ocfs2_wq);
>>>  
>>> +	if (is_journal_aborted(osb->journal->j_journal))
>>> +		goto out;
>>> +
>>>  	if (osb->local_alloc_state == OCFS2_LA_UNUSED)
>>>  		goto out;
>>>
heming.zhao@suse.com May 4, 2023, 8:02 a.m. UTC | #6
On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
> 
> 
> On 5/4/23 2:21 PM, Heming Zhao wrote:
> > On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 4/30/23 11:13 AM, Heming Zhao wrote:
> >>> fstest generic cases 347 361 628 629 trigger a same issue:
> >>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
> >>> journal.
> >>
> >> What's the end user impact?
> > 
> > There are two impacts:
> > 
> > 1. this issue makes unmount command hanging.
> > 
> > No one likes or accepts filesystem unmount failure when underlying devices meets
> > error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount
> > successfully.
> > 
> 
> So umount hang should be in the patch description, right?

OK.

> 
> > 2. for developers to verify their patch doesn't make any regression.
> > 
> > fstest is a famious & important fs testsuite.
> > (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
> > 
> > Current status, there are many test cases (about 9 in my env) cause fstest
> > hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed,
> > gfs2 only has 1 hanging case.
> > 
> > In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the
> > testsuite. Long-term target is to make ocfs2 to pass all the testsuite.
> > 
> > On kernel 6.2.9-1, the fstest 'quick' test group result:
> > (the result based on my two patches [1/2] & [2/2])
> > ```
> > ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
> > 
> > Failures: generic/003 generic/013 generic/015 generic/040 generic/041
> > generic/062 generic/082 generic/104 generic/107 generic/228 generic/244
> > generic/266 generic/272 generic/277 generic/281 generic/322 generic/329
> > generic/331 generic/336 generic/343 generic/376 generic/379 generic/380
> > generic/383 generic/384 generic/385 generic/386 generic/400 generic/410
> > generic/424 generic/441 generic/448 generic/449 generic/471 generic/479
> > generic/493 generic/495 generic/510 generic/537 generic/552 generic/563
> > generic/578 generic/594 generic/607 generic/620 generic/628 generic/630
> > generic/636 generic/702
> > Failed 49 of 568 tests
> > ```
> > 
> >>
> >> JBD2_ABORT indicates a fatal error happens, either in jounral layer or
> >> filesystem. And we should not commit any further transactions.
> >> It seems that we may unify the behavior like:
> >>
> >> if (is_journal_aborted(journal))
> >> 	return -EROFS;
> >>
> >>
> > IIUC, do you mean add above code in ocfs2_commit_cache or in
> > ocfs2_commit_thread?
> 
> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
> Actually we've handled this case before, but just limit the print. But
> it seems not enough now.

I wrote in my previous mail. Follow your idea, The code should be:

if (is_journal_aborted(journal)) {
	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is important.
	return -EROFS;
}

Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new
IOs will trigger IO error then trigger RO status. This flow is wrong, we should
mark RO as early as possible when JBD ABORT happens. In my view, the best place
is my [2/2] patch code which in ocfs2_commit_cache().

> BTW, the basic rule here is, we don't want to change journal to prevent
> other nodes corrupting the filesystem.

If my memory is correct, every node has itself special journal partition.
If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, 
No other node could corrupting the fs.

> 
> > 1>
> > If in ocfs2_commit_cache, this fix doesn't take any effect for unmount
> > failure(hanging).
> > When return -EROFS, the caller ocfs2_commit_thread just print error message and
> > keep going to retry.
> > 2>
> > If in ocfs2_commit_thread, it makes commit thread exit/stop without notifying
> > other ocfs2 paths. Later wake_up(&osb->checkpoint_event) won't take effect,
> > ocfs2 will enter chaose. 
> > 
> > In my view, there is missing a place to set fs RO. So [2/2] in
> > ocfs2_commit_cache, uses ocfs2_error to mark RO before returning status (<0
> > value). The ocfs2_error is the right place to mark RO.
> > 
> > - Heming
> > 
> >>>
> >>> This commit gives ocfs2 ability to handle jbd2 ABORT case.
> >>>
> >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> >>> ---
> >>>  fs/ocfs2/alloc.c      | 10 ++++++----
> >>>  fs/ocfs2/journal.c    |  5 +++++
> >>>  fs/ocfs2/localalloc.c |  3 +++
> >>>  3 files changed, 14 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> >>> index 51c93929a146..d90961a1c433 100644
> >>> --- a/fs/ocfs2/alloc.c
> >>> +++ b/fs/ocfs2/alloc.c
> >>> @@ -6308,11 +6308,13 @@ void ocfs2_truncate_log_shutdown(struct ocfs2_super *osb)
> >>>  
> >>>  	if (tl_inode) {
> >>>  		cancel_delayed_work(&osb->osb_truncate_log_wq);
> >>> -		flush_workqueue(osb->ocfs2_wq);
> >>> +		if (!is_journal_aborted(osb->journal->j_journal)) {
> >>> +			flush_workqueue(osb->ocfs2_wq);
> >>>  
> >>> -		status = ocfs2_flush_truncate_log(osb);
> >>> -		if (status < 0)
> >>> -			mlog_errno(status);
> >>> +			status = ocfs2_flush_truncate_log(osb);
> >>> +			if (status < 0)
> >>> +				mlog_errno(status);
> >>> +		}
> >>>  
> >>>  		brelse(osb->osb_tl_bh);
> >>>  		iput(osb->osb_tl_inode);
> >>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> >>> index 25d8072ccfce..2798067a2f66 100644
> >>> --- a/fs/ocfs2/journal.c
> >>> +++ b/fs/ocfs2/journal.c
> >>> @@ -312,11 +312,16 @@ static int ocfs2_commit_cache(struct ocfs2_super *osb)
> >>>  	status = jbd2_journal_flush(journal->j_journal, 0);
> >>>  	jbd2_journal_unlock_updates(journal->j_journal);
> >>>  	if (status < 0) {
> >>> +		if (is_journal_aborted(journal->j_journal)) {
> >>> +			ocfs2_error(osb->sb, "jbd2 status: ABORT.\n");
> >>> +			goto reset;
> >>> +		}
> >>>  		up_write(&journal->j_trans_barrier);
> >>>  		mlog_errno(status);
> >>>  		goto finally;
> >>>  	}
> >>>  
> >>> +reset:
> >>>  	ocfs2_inc_trans_id(journal);
> >>>  
> >>>  	flushed = atomic_read(&journal->j_num_trans);
> >>> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> >>> index c4426d12a2ad..e2e3400717b0 100644
> >>> --- a/fs/ocfs2/localalloc.c
> >>> +++ b/fs/ocfs2/localalloc.c
> >>> @@ -378,6 +378,9 @@ void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb)
> >>>  	if (osb->ocfs2_wq)
> >>>  		flush_workqueue(osb->ocfs2_wq);
> >>>  
> >>> +	if (is_journal_aborted(osb->journal->j_journal))
> >>> +		goto out;
> >>> +
> >>>  	if (osb->local_alloc_state == OCFS2_LA_UNUSED)
> >>>  		goto out;
> >>>
Joseph Qi May 4, 2023, 9:41 a.m. UTC | #7
On 5/4/23 4:02 PM, Heming Zhao wrote:
> On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
>>
>>
>> On 5/4/23 2:21 PM, Heming Zhao wrote:
>>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
>>>>
>>>>
>>>> On 4/30/23 11:13 AM, Heming Zhao wrote:
>>>>> fstest generic cases 347 361 628 629 trigger a same issue:
>>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
>>>>> journal.
>>>>
>>>> What's the end user impact?
>>>
>>> There are two impacts:
>>>
>>> 1. this issue makes unmount command hanging.
>>>
>>> No one likes or accepts filesystem unmount failure when underlying devices meets
>>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount
>>> successfully.
>>>
>>
>> So umount hang should be in the patch description, right?
> 
> OK.
> 
>>
>>> 2. for developers to verify their patch doesn't make any regression.
>>>
>>> fstest is a famious & important fs testsuite.
>>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
>>>
>>> Current status, there are many test cases (about 9 in my env) cause fstest
>>> hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed,
>>> gfs2 only has 1 hanging case.
>>>
>>> In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the
>>> testsuite. Long-term target is to make ocfs2 to pass all the testsuite.
>>>
>>> On kernel 6.2.9-1, the fstest 'quick' test group result:
>>> (the result based on my two patches [1/2] & [2/2])
>>> ```
>>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
>>>
>>> Failures: generic/003 generic/013 generic/015 generic/040 generic/041
>>> generic/062 generic/082 generic/104 generic/107 generic/228 generic/244
>>> generic/266 generic/272 generic/277 generic/281 generic/322 generic/329
>>> generic/331 generic/336 generic/343 generic/376 generic/379 generic/380
>>> generic/383 generic/384 generic/385 generic/386 generic/400 generic/410
>>> generic/424 generic/441 generic/448 generic/449 generic/471 generic/479
>>> generic/493 generic/495 generic/510 generic/537 generic/552 generic/563
>>> generic/578 generic/594 generic/607 generic/620 generic/628 generic/630
>>> generic/636 generic/702
>>> Failed 49 of 568 tests
>>> ```
>>>
>>>>
>>>> JBD2_ABORT indicates a fatal error happens, either in jounral layer or
>>>> filesystem. And we should not commit any further transactions.
>>>> It seems that we may unify the behavior like:
>>>>
>>>> if (is_journal_aborted(journal))
>>>> 	return -EROFS;
>>>>
>>>>
>>> IIUC, do you mean add above code in ocfs2_commit_cache or in
>>> ocfs2_commit_thread?
>>
>> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
>> Actually we've handled this case before, but just limit the print. But
>> it seems not enough now.
> 
> I wrote in my previous mail. Follow your idea, The code should be:
> 
> if (is_journal_aborted(journal)) {
> 	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is important.
> 	return -EROFS;
> }
> 
> Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
> Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new
> IOs will trigger IO error then trigger RO status. This flow is wrong, we should
> mark RO as early as possible when JBD ABORT happens. In my view, the best place
> is my [2/2] patch code which in ocfs2_commit_cache().
> 

Agree, but ocfs2_abort() is more appropriate here, see
ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it
has to change to read-only accordingly.

>> BTW, the basic rule here is, we don't want to change journal to prevent
>> other nodes corrupting the filesystem.
> 
> If my memory is correct, every node has itself special journal partition.
> If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, 
> No other node could corrupting the fs.
> 
I don't think it's a good idea to modify journal even flush fails. Why
not catch EROFS in commit thread and break? Now we can only expect
umount continues with error, right?
heming.zhao@suse.com May 4, 2023, 4:20 p.m. UTC | #8
On Thu, May 04, 2023 at 05:41:29PM +0800, Joseph Qi wrote:
> 
> 
> On 5/4/23 4:02 PM, Heming Zhao wrote:
> > On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 5/4/23 2:21 PM, Heming Zhao wrote:
> >>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
> >>>>
> >>>>
> >>>> On 4/30/23 11:13 AM, Heming Zhao wrote:
> >>>>> fstest generic cases 347 361 628 629 trigger a same issue:
> >>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
> >>>>> journal.
> >>>>
> >>>> What's the end user impact?
> >>>
> >>> There are two impacts:
> >>>
> >>> 1. this issue makes unmount command hanging.
> >>>
> >>> No one likes or accepts filesystem unmount failure when underlying devices meets
> >>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount
> >>> successfully.
> >>>
> >>
> >> So umount hang should be in the patch description, right?
> > 
> > OK.
> > 
> >>
> >>> 2. for developers to verify their patch doesn't make any regression.
> >>>
> >>> fstest is a famious & important fs testsuite.
> >>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
> >>>
> >>> Current status, there are many test cases (about 9 in my env) cause fstest
> >>> hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed,
> >>> gfs2 only has 1 hanging case.
> >>>
> >>> In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the
> >>> testsuite. Long-term target is to make ocfs2 to pass all the testsuite.
> >>>
> >>> On kernel 6.2.9-1, the fstest 'quick' test group result:
> >>> (the result based on my two patches [1/2] & [2/2])
> >>> ```
> >>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
> >>>
> >>> Failures: generic/003 generic/013 generic/015 generic/040 generic/041
> >>> generic/062 generic/082 generic/104 generic/107 generic/228 generic/244
> >>> generic/266 generic/272 generic/277 generic/281 generic/322 generic/329
> >>> generic/331 generic/336 generic/343 generic/376 generic/379 generic/380
> >>> generic/383 generic/384 generic/385 generic/386 generic/400 generic/410
> >>> generic/424 generic/441 generic/448 generic/449 generic/471 generic/479
> >>> generic/493 generic/495 generic/510 generic/537 generic/552 generic/563
> >>> generic/578 generic/594 generic/607 generic/620 generic/628 generic/630
> >>> generic/636 generic/702
> >>> Failed 49 of 568 tests
> >>> ```
> >>>
> >>>>
> >>>> JBD2_ABORT indicates a fatal error happens, either in jounral layer or
> >>>> filesystem. And we should not commit any further transactions.
> >>>> It seems that we may unify the behavior like:
> >>>>
> >>>> if (is_journal_aborted(journal))
> >>>> 	return -EROFS;
> >>>>
> >>>>
> >>> IIUC, do you mean add above code in ocfs2_commit_cache or in
> >>> ocfs2_commit_thread?
> >>
> >> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
> >> Actually we've handled this case before, but just limit the print. But
> >> it seems not enough now.
> > 
> > I wrote in my previous mail. Follow your idea, The code should be:
> > 
> > if (is_journal_aborted(journal)) {
> > 	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is important.
> > 	return -EROFS;
> > }
> > 
> > Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
> > Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new
> > IOs will trigger IO error then trigger RO status. This flow is wrong, we should
> > mark RO as early as possible when JBD ABORT happens. In my view, the best place
> > is my [2/2] patch code which in ocfs2_commit_cache().
> > 
> 
> Agree, but ocfs2_abort() is more appropriate here, see
> ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it
> has to change to read-only accordingly.

You are right. I ever used ocfs2_abort() then got panic. :)
So I changed to ocfs2_error(). I also sent two mails in 2023-4-30 (the first
mail has mess format by thunderbird). I copy the mail content here:
```
More info for this patch, maybe I should write below info in commit log.

From the comment of __ocfs2_abort(), there is another way to handle
journal ABORT: panic.
And fstest generic test NO. 648 will trigger ocfs2_abort then make
kernel panic. I don't like the panic style, ocfs2 should elegantly
handle journal abnormal case.
```

fstest NO. 648 could trigger ocfs2_abort from ocfs2_start_trans(). So you could
find my previous mail, I ran the fstest with parameter '-e generic/648', which
means fstest excludes NO. 648.

The only different between ocfs2_abort and ocfs2_error is set
OCFS2_MOUNT_ERRORS_PANIC, which could trigger panic. If you think panic is
brute, I suggest to use ocfs2_error.

> 
> >> BTW, the basic rule here is, we don't want to change journal to prevent
> >> other nodes corrupting the filesystem.
> > 
> > If my memory is correct, every node has itself special journal partition.
> > If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, 
> > No other node could corrupting the fs.
> > 
> I don't think it's a good idea to modify journal even flush fails. Why
> not catch EROFS in commit thread and break? Now we can only expect
> umount continues with error, right?
> 

With your idea, I spent more than two hours for coding & testing. The result is
my patch [2/2] is correct. Please believe my patch [2/2].

In patch [2/2], there is 'goto reset' which will resolve below three issues:

ocfs2_commit_cache
 + ... ...
reset:  //goto label
 + ocfs2_inc_trans_id(journal); //<1.1>
 + atomic_set(&journal->j_num_trans, 0); //<2>
 + ocfs2_wake_downconvert_thread(osb); //<3>
 + wake_up(&journal->j_checkpointed); //<1.2>
 
1> trigger hanging

umount
 ... ...
  ocfs2_clear_inode
   ocfs2_checkpoint_inode(inode)
    + ocfs2_ci_fully_checkpointed() //1.1: check '->j_trans_id'
    + wait_event(osb->journal->j_checkpointed, ...)
       + 1.2: will wait forever, if commit thread exits.

2> trigger BUG_ON()

ocfs2_journal_shutdown
 + BUG_ON(atomic_read(&(osb->journal->j_num_trans)) != 0);

3> trigger dlm lock convert blocking

if there is pending downconvert lock, no one kick it.

At last, you could set up a fstest env and verify my writing.

- Heming
Joseph Qi May 5, 2023, 3:42 a.m. UTC | #9
On 5/5/23 12:20 AM, Heming Zhao wrote:
> On Thu, May 04, 2023 at 05:41:29PM +0800, Joseph Qi wrote:
>>
>>
>> On 5/4/23 4:02 PM, Heming Zhao wrote:
>>> On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
>>>>
>>>>
>>>> On 5/4/23 2:21 PM, Heming Zhao wrote:
>>>>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
>>>>>>
>>>>>>
>>>>>> On 4/30/23 11:13 AM, Heming Zhao wrote:
>>>>>>> fstest generic cases 347 361 628 629 trigger a same issue:
>>>>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
>>>>>>> journal.
>>>>>>
>>>>>> What's the end user impact?
>>>>>
>>>>> There are two impacts:
>>>>>
>>>>> 1. this issue makes unmount command hanging.
>>>>>
>>>>> No one likes or accepts filesystem unmount failure when underlying devices meets
>>>>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount
>>>>> successfully.
>>>>>
>>>>
>>>> So umount hang should be in the patch description, right?
>>>
>>> OK.
>>>
>>>>
>>>>> 2. for developers to verify their patch doesn't make any regression.
>>>>>
>>>>> fstest is a famious & important fs testsuite.
>>>>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
>>>>>
>>>>> Current status, there are many test cases (about 9 in my env) cause fstest
>>>>> hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed,
>>>>> gfs2 only has 1 hanging case.
>>>>>
>>>>> In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the
>>>>> testsuite. Long-term target is to make ocfs2 to pass all the testsuite.
>>>>>
>>>>> On kernel 6.2.9-1, the fstest 'quick' test group result:
>>>>> (the result based on my two patches [1/2] & [2/2])
>>>>> ```
>>>>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
>>>>>
>>>>> Failures: generic/003 generic/013 generic/015 generic/040 generic/041
>>>>> generic/062 generic/082 generic/104 generic/107 generic/228 generic/244
>>>>> generic/266 generic/272 generic/277 generic/281 generic/322 generic/329
>>>>> generic/331 generic/336 generic/343 generic/376 generic/379 generic/380
>>>>> generic/383 generic/384 generic/385 generic/386 generic/400 generic/410
>>>>> generic/424 generic/441 generic/448 generic/449 generic/471 generic/479
>>>>> generic/493 generic/495 generic/510 generic/537 generic/552 generic/563
>>>>> generic/578 generic/594 generic/607 generic/620 generic/628 generic/630
>>>>> generic/636 generic/702
>>>>> Failed 49 of 568 tests
>>>>> ```
>>>>>
>>>>>>
>>>>>> JBD2_ABORT indicates a fatal error happens, either in jounral layer or
>>>>>> filesystem. And we should not commit any further transactions.
>>>>>> It seems that we may unify the behavior like:
>>>>>>
>>>>>> if (is_journal_aborted(journal))
>>>>>> 	return -EROFS;
>>>>>>
>>>>>>
>>>>> IIUC, do you mean add above code in ocfs2_commit_cache or in
>>>>> ocfs2_commit_thread?
>>>>
>>>> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
>>>> Actually we've handled this case before, but just limit the print. But
>>>> it seems not enough now.
>>>
>>> I wrote in my previous mail. Follow your idea, The code should be:
>>>
>>> if (is_journal_aborted(journal)) {
>>> 	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is important.
>>> 	return -EROFS;
>>> }
>>>
>>> Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
>>> Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new
>>> IOs will trigger IO error then trigger RO status. This flow is wrong, we should
>>> mark RO as early as possible when JBD ABORT happens. In my view, the best place
>>> is my [2/2] patch code which in ocfs2_commit_cache().
>>>
>>
>> Agree, but ocfs2_abort() is more appropriate here, see
>> ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it
>> has to change to read-only accordingly.
> 
> You are right. I ever used ocfs2_abort() then got panic. :)
> So I changed to ocfs2_error(). I also sent two mails in 2023-4-30 (the first
> mail has mess format by thunderbird). I copy the mail content here:
> ```
> More info for this patch, maybe I should write below info in commit log.
> 
> From the comment of __ocfs2_abort(), there is another way to handle
> journal ABORT: panic.
> And fstest generic test NO. 648 will trigger ocfs2_abort then make
> kernel panic. I don't like the panic style, ocfs2 should elegantly
> handle journal abnormal case.
> ```
> 
> fstest NO. 648 could trigger ocfs2_abort from ocfs2_start_trans(). So you could
> find my previous mail, I ran the fstest with parameter '-e generic/648', which
> means fstest excludes NO. 648.
> 
> The only different between ocfs2_abort and ocfs2_error is set
> OCFS2_MOUNT_ERRORS_PANIC, which could trigger panic. If you think panic is
> brute, I suggest to use ocfs2_error.
> 

Looked back the history, it is designed to panic in cluster mode:
a2f2ddbf2baf ocfs2: __ocfs2_abort() should not enable panic for local mounts

>>
>>>> BTW, the basic rule here is, we don't want to change journal to prevent
>>>> other nodes corrupting the filesystem.
>>>
>>> If my memory is correct, every node has itself special journal partition.
>>> If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, 
>>> No other node could corrupting the fs.
>>>
>> I don't think it's a good idea to modify journal even flush fails. Why
>> not catch EROFS in commit thread and break? Now we can only expect
>> umount continues with error, right?
>>
> 
> With your idea, I spent more than two hours for coding & testing. The result is
> my patch [2/2] is correct. Please believe my patch [2/2].
> 

I am not arguing if your patch can pass the testcases above. I am
talking about whether we do it this way is right.

Now journal flush actually fails since journal is abort, which means
journal data are not reflected in filesystem. But after your change, it
behaves like we've done a normal checkpoint.

Take the following scenario into consideration:
N0 does some changes on file1, but now disk down and journal is abort,
and checkpoint behaves successfully even it does nothing.
N1 get the access right on file1 and does other changes, and checkpoint
normally.
N0 is crash, and N1 begins to recover N0, then data corruption happens.

I've fixed a similar issue in the past:
6f6a6fda2945 jbd2: fix ocfs2 corrupt when updating journal superblock fails

Thanks,
Joseph
heming.zhao@suse.com May 8, 2023, 4:40 p.m. UTC | #10
Sorry for reply late, I am a little bit busy recently.

On Fri, May 05, 2023 at 11:42:51AM +0800, Joseph Qi wrote:
> 
> 
> On 5/5/23 12:20 AM, Heming Zhao wrote:
> > On Thu, May 04, 2023 at 05:41:29PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 5/4/23 4:02 PM, Heming Zhao wrote:
> >>> On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
> >>>>
> >>>>
> >>>> On 5/4/23 2:21 PM, Heming Zhao wrote:
> >>>>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 4/30/23 11:13 AM, Heming Zhao wrote:
> >>>>>>> fstest generic cases 347 361 628 629 trigger a same issue:
> >>>>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
> >>>>>>> journal.
> >>>>>>
> >>>>>> What's the end user impact?
> >>>>>
> >>>>> There are two impacts:
> >>>>>
> >>>>> 1. this issue makes unmount command hanging.
> >>>>>
> >>>>> No one likes or accepts filesystem unmount failure when underlying devices meets
> >>>>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount
> >>>>> successfully.
> >>>>>
> >>>>
> >>>> So umount hang should be in the patch description, right?
> >>>
> >>> OK.
> >>>
> >>>>
> >>>>> 2. for developers to verify their patch doesn't make any regression.
> >>>>>
> >>>>> fstest is a famious & important fs testsuite.
> >>>>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
> >>>>>
> >>>>> Current status, there are many test cases (about 9 in my env) cause fstest
> >>>>> hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed,
> >>>>> gfs2 only has 1 hanging case.
> >>>>>
> >>>>> In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the
> >>>>> testsuite. Long-term target is to make ocfs2 to pass all the testsuite.
> >>>>>
> >>>>> On kernel 6.2.9-1, the fstest 'quick' test group result:
> >>>>> (the result based on my two patches [1/2] & [2/2])
> >>>>> ```
> >>>>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
> >>>>>
> >>>>> Failures: generic/003 generic/013 generic/015 generic/040 generic/041
> >>>>> generic/062 generic/082 generic/104 generic/107 generic/228 generic/244
> >>>>> generic/266 generic/272 generic/277 generic/281 generic/322 generic/329
> >>>>> generic/331 generic/336 generic/343 generic/376 generic/379 generic/380
> >>>>> generic/383 generic/384 generic/385 generic/386 generic/400 generic/410
> >>>>> generic/424 generic/441 generic/448 generic/449 generic/471 generic/479
> >>>>> generic/493 generic/495 generic/510 generic/537 generic/552 generic/563
> >>>>> generic/578 generic/594 generic/607 generic/620 generic/628 generic/630
> >>>>> generic/636 generic/702
> >>>>> Failed 49 of 568 tests
> >>>>> ```
> >>>>>
> >>>>>>
> >>>>>> JBD2_ABORT indicates a fatal error happens, either in jounral layer or
> >>>>>> filesystem. And we should not commit any further transactions.
> >>>>>> It seems that we may unify the behavior like:
> >>>>>>
> >>>>>> if (is_journal_aborted(journal))
> >>>>>> 	return -EROFS;
> >>>>>>
> >>>>>>
> >>>>> IIUC, do you mean add above code in ocfs2_commit_cache or in
> >>>>> ocfs2_commit_thread?
> >>>>
> >>>> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
> >>>> Actually we've handled this case before, but just limit the print. But
> >>>> it seems not enough now.
> >>>
> >>> I wrote in my previous mail. Follow your idea, The code should be:
> >>>
> >>> if (is_journal_aborted(journal)) {
> >>> 	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is important.
> >>> 	return -EROFS;
> >>> }
> >>>
> >>> Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
> >>> Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new
> >>> IOs will trigger IO error then trigger RO status. This flow is wrong, we should
> >>> mark RO as early as possible when JBD ABORT happens. In my view, the best place
> >>> is my [2/2] patch code which in ocfs2_commit_cache().
> >>>
> >>
> >> Agree, but ocfs2_abort() is more appropriate here, see
> >> ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it
> >> has to change to read-only accordingly.
> > 
> > You are right. I ever used ocfs2_abort() then got panic. :)
> > So I changed to ocfs2_error(). I also sent two mails in 2023-4-30 (the first
> > mail has mess format by thunderbird). I copy the mail content here:
> > ```
> > More info for this patch, maybe I should write below info in commit log.
> > 
> > From the comment of __ocfs2_abort(), there is another way to handle
> > journal ABORT: panic.
> > And fstest generic test NO. 648 will trigger ocfs2_abort then make
> > kernel panic. I don't like the panic style, ocfs2 should elegantly
> > handle journal abnormal case.
> > ```
> > 
> > fstest NO. 648 could trigger ocfs2_abort from ocfs2_start_trans(). So you could
> > find my previous mail, I ran the fstest with parameter '-e generic/648', which
> > means fstest excludes NO. 648.
> > 
> > The only different between ocfs2_abort and ocfs2_error is set
> > OCFS2_MOUNT_ERRORS_PANIC, which could trigger panic. If you think panic is
> > brute, I suggest to use ocfs2_error.
> > 
> 
> Looked back the history, it is designed to panic in cluster mode:
> a2f2ddbf2baf ocfs2: __ocfs2_abort() should not enable panic for local mounts

About the panic action from ocfs2_abort(), in my view, it's too scared to
acceptable. I am not familar with o2cb stack, in pcmk stack, there is
resource-agents RA Filesystem[1], which includes two level to detect fs abormal:
- Filesystem_monitor_10 : read the device
- Filesystem_monitor_20 : write and read a status file

if ocfs2 enters abort status, Filesystem RA will detect it and trigger HA stack
to do the handover resources job.
if Filesystem RA doesn't apply monitor_[10|20] level, a readonly fs (not panic
then generate a coredump) could give administrator more info to locate the
rootcause.

I write above base on gfs2 fstest result: gfs2 doesn't panic for the ocfs2 panic
cases.

[1]:
https://github.com/ClusterLabs/resource-agents/blob/main/heartbeat/Filesystem

> 
> >>
> >>>> BTW, the basic rule here is, we don't want to change journal to prevent
> >>>> other nodes corrupting the filesystem.
> >>>
> >>> If my memory is correct, every node has itself special journal partition.
> >>> If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, 
> >>> No other node could corrupting the fs.
> >>>
> >> I don't think it's a good idea to modify journal even flush fails. Why
> >> not catch EROFS in commit thread and break? Now we can only expect
> >> umount continues with error, right?
> >>
> > 
> > With your idea, I spent more than two hours for coding & testing. The result is
> > my patch [2/2] is correct. Please believe my patch [2/2].
> > 
> 
> I am not arguing if your patch can pass the testcases above. I am
> talking about whether we do it this way is right.
> 
> Now journal flush actually fails since journal is abort, which means
> journal data are not reflected in filesystem. But after your change, it
> behaves like we've done a normal checkpoint.
> 
> Take the following scenario into consideration:
> N0 does some changes on file1, but now disk down and journal is abort,
> and checkpoint behaves successfully even it does nothing.
> N1 get the access right on file1 and does other changes, and checkpoint
> normally.
> N0 is crash, and N1 begins to recover N0, then data corruption happens.
> 
> I've fixed a similar issue in the past:
> 6f6a6fda2945 jbd2: fix ocfs2 corrupt when updating journal superblock fails
> 

I agree with above description. Thank you for pointing out my mistake, I forgot
the dlm layer callback for node-down case.

- Heming
Joseph Qi May 9, 2023, 1:53 a.m. UTC | #11
On 5/9/23 12:40 AM, Heming Zhao wrote:
> Sorry for reply late, I am a little bit busy recently.
> 
> On Fri, May 05, 2023 at 11:42:51AM +0800, Joseph Qi wrote:
>>
>>
>> On 5/5/23 12:20 AM, Heming Zhao wrote:
>>> On Thu, May 04, 2023 at 05:41:29PM +0800, Joseph Qi wrote:
>>>>
>>>>
>>>> On 5/4/23 4:02 PM, Heming Zhao wrote:
>>>>> On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
>>>>>>
>>>>>>
>>>>>> On 5/4/23 2:21 PM, Heming Zhao wrote:
>>>>>>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/30/23 11:13 AM, Heming Zhao wrote:
>>>>>>>>> fstest generic cases 347 361 628 629 trigger a same issue:
>>>>>>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
>>>>>>>>> journal.
>>>>>>>>
>>>>>>>> What's the end user impact?
>>>>>>>
>>>>>>> There are two impacts:
>>>>>>>
>>>>>>> 1. this issue makes unmount command hanging.
>>>>>>>
>>>>>>> No one likes or accepts filesystem unmount failure when underlying devices meets
>>>>>>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount
>>>>>>> successfully.
>>>>>>>
>>>>>>
>>>>>> So umount hang should be in the patch description, right?
>>>>>
>>>>> OK.
>>>>>
>>>>>>
>>>>>>> 2. for developers to verify their patch doesn't make any regression.
>>>>>>>
>>>>>>> fstest is a famious & important fs testsuite.
>>>>>>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
>>>>>>>
>>>>>>> Current status, there are many test cases (about 9 in my env) cause fstest
>>>>>>> hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed,
>>>>>>> gfs2 only has 1 hanging case.
>>>>>>>
>>>>>>> In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the
>>>>>>> testsuite. Long-term target is to make ocfs2 to pass all the testsuite.
>>>>>>>
>>>>>>> On kernel 6.2.9-1, the fstest 'quick' test group result:
>>>>>>> (the result based on my two patches [1/2] & [2/2])
>>>>>>> ```
>>>>>>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
>>>>>>>
>>>>>>> Failures: generic/003 generic/013 generic/015 generic/040 generic/041
>>>>>>> generic/062 generic/082 generic/104 generic/107 generic/228 generic/244
>>>>>>> generic/266 generic/272 generic/277 generic/281 generic/322 generic/329
>>>>>>> generic/331 generic/336 generic/343 generic/376 generic/379 generic/380
>>>>>>> generic/383 generic/384 generic/385 generic/386 generic/400 generic/410
>>>>>>> generic/424 generic/441 generic/448 generic/449 generic/471 generic/479
>>>>>>> generic/493 generic/495 generic/510 generic/537 generic/552 generic/563
>>>>>>> generic/578 generic/594 generic/607 generic/620 generic/628 generic/630
>>>>>>> generic/636 generic/702
>>>>>>> Failed 49 of 568 tests
>>>>>>> ```
>>>>>>>
>>>>>>>>
>>>>>>>> JBD2_ABORT indicates a fatal error happens, either in jounral layer or
>>>>>>>> filesystem. And we should not commit any further transactions.
>>>>>>>> It seems that we may unify the behavior like:
>>>>>>>>
>>>>>>>> if (is_journal_aborted(journal))
>>>>>>>> 	return -EROFS;
>>>>>>>>
>>>>>>>>
>>>>>>> IIUC, do you mean add above code in ocfs2_commit_cache or in
>>>>>>> ocfs2_commit_thread?
>>>>>>
>>>>>> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
>>>>>> Actually we've handled this case before, but just limit the print. But
>>>>>> it seems not enough now.
>>>>>
>>>>> I wrote in my previous mail. Follow your idea, The code should be:
>>>>>
>>>>> if (is_journal_aborted(journal)) {
>>>>> 	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is important.
>>>>> 	return -EROFS;
>>>>> }
>>>>>
>>>>> Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
>>>>> Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new
>>>>> IOs will trigger IO error then trigger RO status. This flow is wrong, we should
>>>>> mark RO as early as possible when JBD ABORT happens. In my view, the best place
>>>>> is my [2/2] patch code which in ocfs2_commit_cache().
>>>>>
>>>>
>>>> Agree, but ocfs2_abort() is more appropriate here, see
>>>> ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it
>>>> has to change to read-only accordingly.
>>>
>>> You are right. I ever used ocfs2_abort() then got panic. :)
>>> So I changed to ocfs2_error(). I also sent two mails in 2023-4-30 (the first
>>> mail has mess format by thunderbird). I copy the mail content here:
>>> ```
>>> More info for this patch, maybe I should write below info in commit log.
>>>
>>> From the comment of __ocfs2_abort(), there is another way to handle
>>> journal ABORT: panic.
>>> And fstest generic test NO. 648 will trigger ocfs2_abort then make
>>> kernel panic. I don't like the panic style, ocfs2 should elegantly
>>> handle journal abnormal case.
>>> ```
>>>
>>> fstest NO. 648 could trigger ocfs2_abort from ocfs2_start_trans(). So you could
>>> find my previous mail, I ran the fstest with parameter '-e generic/648', which
>>> means fstest excludes NO. 648.
>>>
>>> The only different between ocfs2_abort and ocfs2_error is set
>>> OCFS2_MOUNT_ERRORS_PANIC, which could trigger panic. If you think panic is
>>> brute, I suggest to use ocfs2_error.
>>>
>>
>> Looked back the history, it is designed to panic in cluster mode:
>> a2f2ddbf2baf ocfs2: __ocfs2_abort() should not enable panic for local mounts
> 
> About the panic action from ocfs2_abort(), in my view, it's too scared to
> acceptable. I am not familar with o2cb stack, in pcmk stack, there is
> resource-agents RA Filesystem[1], which includes two level to detect fs abormal:
> - Filesystem_monitor_10 : read the device
> - Filesystem_monitor_20 : write and read a status file
> 
> if ocfs2 enters abort status, Filesystem RA will detect it and trigger HA stack
> to do the handover resources job.
> if Filesystem RA doesn't apply monitor_[10|20] level, a readonly fs (not panic
> then generate a coredump) could give administrator more info to locate the
> rootcause.
> 
> I write above base on gfs2 fstest result: gfs2 doesn't panic for the ocfs2 panic
> cases.
> 
> [1]:
> https://github.com/ClusterLabs/resource-agents/blob/main/heartbeat/Filesystem
> 

Agree, so a more elegant way is to make the journal abort node behaves
like node down, e.g. set hard readonly so that heartbeat also can not
update normally.

In xfstests, it uses device mapper to simulate disk down, so it will
also trigger node down recover since heartbeat won't update normally?

Also, I'm not sure pcmk supports fence other node by only rejecting IOs.
The built-in ocfs2 fence policy only supports fence self through panic
or reset.

Thanks,
Joseph 

>>
>>>>
>>>>>> BTW, the basic rule here is, we don't want to change journal to prevent
>>>>>> other nodes corrupting the filesystem.
>>>>>
>>>>> If my memory is correct, every node has itself special journal partition.
>>>>> If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, 
>>>>> No other node could corrupting the fs.
>>>>>
>>>> I don't think it's a good idea to modify journal even flush fails. Why
>>>> not catch EROFS in commit thread and break? Now we can only expect
>>>> umount continues with error, right?
>>>>
>>>
>>> With your idea, I spent more than two hours for coding & testing. The result is
>>> my patch [2/2] is correct. Please believe my patch [2/2].
>>>
>>
>> I am not arguing if your patch can pass the testcases above. I am
>> talking about whether we do it this way is right.
>>
>> Now journal flush actually fails since journal is abort, which means
>> journal data are not reflected in filesystem. But after your change, it
>> behaves like we've done a normal checkpoint.
>>
>> Take the following scenario into consideration:
>> N0 does some changes on file1, but now disk down and journal is abort,
>> and checkpoint behaves successfully even it does nothing.
>> N1 get the access right on file1 and does other changes, and checkpoint
>> normally.
>> N0 is crash, and N1 begins to recover N0, then data corruption happens.
>>
>> I've fixed a similar issue in the past:
>> 6f6a6fda2945 jbd2: fix ocfs2 corrupt when updating journal superblock fails
>>
> 
> I agree with above description. Thank you for pointing out my mistake, I forgot
> the dlm layer callback for node-down case.
> 
> - Heming
heming.zhao@suse.com May 9, 2023, 3:22 p.m. UTC | #12
On Tue, May 09, 2023 at 09:53:06AM +0800, Joseph Qi wrote:
> 
> 
> On 5/9/23 12:40 AM, Heming Zhao wrote:
> > Sorry for reply late, I am a little bit busy recently.
> > 
> > On Fri, May 05, 2023 at 11:42:51AM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 5/5/23 12:20 AM, Heming Zhao wrote:
> >>> On Thu, May 04, 2023 at 05:41:29PM +0800, Joseph Qi wrote:
> >>>>
> >>>>
> >>>> On 5/4/23 4:02 PM, Heming Zhao wrote:
> >>>>> On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 5/4/23 2:21 PM, Heming Zhao wrote:
> >>>>>>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 4/30/23 11:13 AM, Heming Zhao wrote:
> >>>>>>>>> fstest generic cases 347 361 628 629 trigger a same issue:
> >>>>>>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit
> >>>>>>>>> journal.
> >>>>>>>>
> >>>>>>>> What's the end user impact?
> >>>>>>>
> >>>>>>> There are two impacts:
> >>>>>>>
> >>>>>>> 1. this issue makes unmount command hanging.
> >>>>>>>
> >>>>>>> No one likes or accepts filesystem unmount failure when underlying devices meets
> >>>>>>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount
> >>>>>>> successfully.
> >>>>>>>
> >>>>>>
> >>>>>> So umount hang should be in the patch description, right?
> >>>>>
> >>>>> OK.
> >>>>>
> >>>>>>
> >>>>>>> 2. for developers to verify their patch doesn't make any regression.
> >>>>>>>
> >>>>>>> fstest is a famious & important fs testsuite.
> >>>>>>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
> >>>>>>>
> >>>>>>> Current status, there are many test cases (about 9 in my env) cause fstest
> >>>>>>> hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed,
> >>>>>>> gfs2 only has 1 hanging case.
> >>>>>>>
> >>>>>>> In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the
> >>>>>>> testsuite. Long-term target is to make ocfs2 to pass all the testsuite.
> >>>>>>>
> >>>>>>> On kernel 6.2.9-1, the fstest 'quick' test group result:
> >>>>>>> (the result based on my two patches [1/2] & [2/2])
> >>>>>>> ```
> >>>>>>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
> >>>>>>>
> >>>>>>> Failures: generic/003 generic/013 generic/015 generic/040 generic/041
> >>>>>>> generic/062 generic/082 generic/104 generic/107 generic/228 generic/244
> >>>>>>> generic/266 generic/272 generic/277 generic/281 generic/322 generic/329
> >>>>>>> generic/331 generic/336 generic/343 generic/376 generic/379 generic/380
> >>>>>>> generic/383 generic/384 generic/385 generic/386 generic/400 generic/410
> >>>>>>> generic/424 generic/441 generic/448 generic/449 generic/471 generic/479
> >>>>>>> generic/493 generic/495 generic/510 generic/537 generic/552 generic/563
> >>>>>>> generic/578 generic/594 generic/607 generic/620 generic/628 generic/630
> >>>>>>> generic/636 generic/702
> >>>>>>> Failed 49 of 568 tests
> >>>>>>> ```
> >>>>>>>
> >>>>>>>>
> >>>>>>>> JBD2_ABORT indicates a fatal error happens, either in jounral layer or
> >>>>>>>> filesystem. And we should not commit any further transactions.
> >>>>>>>> It seems that we may unify the behavior like:
> >>>>>>>>
> >>>>>>>> if (is_journal_aborted(journal))
> >>>>>>>> 	return -EROFS;
> >>>>>>>>
> >>>>>>>>
> >>>>>>> IIUC, do you mean add above code in ocfs2_commit_cache or in
> >>>>>>> ocfs2_commit_thread?
> >>>>>>
> >>>>>> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
> >>>>>> Actually we've handled this case before, but just limit the print. But
> >>>>>> it seems not enough now.
> >>>>>
> >>>>> I wrote in my previous mail. Follow your idea, The code should be:
> >>>>>
> >>>>> if (is_journal_aborted(journal)) {
> >>>>> 	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is important.
> >>>>> 	return -EROFS;
> >>>>> }
> >>>>>
> >>>>> Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
> >>>>> Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new
> >>>>> IOs will trigger IO error then trigger RO status. This flow is wrong, we should
> >>>>> mark RO as early as possible when JBD ABORT happens. In my view, the best place
> >>>>> is my [2/2] patch code which in ocfs2_commit_cache().
> >>>>>
> >>>>
> >>>> Agree, but ocfs2_abort() is more appropriate here, see
> >>>> ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it
> >>>> has to change to read-only accordingly.
> >>>
> >>> You are right. I ever used ocfs2_abort() then got panic. :)
> >>> So I changed to ocfs2_error(). I also sent two mails in 2023-4-30 (the first
> >>> mail has mess format by thunderbird). I copy the mail content here:
> >>> ```
> >>> More info for this patch, maybe I should write below info in commit log.
> >>>
> >>> From the comment of __ocfs2_abort(), there is another way to handle
> >>> journal ABORT: panic.
> >>> And fstest generic test NO. 648 will trigger ocfs2_abort then make
> >>> kernel panic. I don't like the panic style, ocfs2 should elegantly
> >>> handle journal abnormal case.
> >>> ```
> >>>
> >>> fstest NO. 648 could trigger ocfs2_abort from ocfs2_start_trans(). So you could
> >>> find my previous mail, I ran the fstest with parameter '-e generic/648', which
> >>> means fstest excludes NO. 648.
> >>>
> >>> The only different between ocfs2_abort and ocfs2_error is set
> >>> OCFS2_MOUNT_ERRORS_PANIC, which could trigger panic. If you think panic is
> >>> brute, I suggest to use ocfs2_error.
> >>>
> >>
> >> Looked back the history, it is designed to panic in cluster mode:
> >> a2f2ddbf2baf ocfs2: __ocfs2_abort() should not enable panic for local mounts
> > 
> > About the panic action from ocfs2_abort(), in my view, it's too scared to
> > acceptable. I am not familar with o2cb stack, in pcmk stack, there is
> > resource-agents RA Filesystem[1], which includes two level to detect fs abormal:
> > - Filesystem_monitor_10 : read the device
> > - Filesystem_monitor_20 : write and read a status file
> > 
> > if ocfs2 enters abort status, Filesystem RA will detect it and trigger HA stack
> > to do the handover resources job.
> > if Filesystem RA doesn't apply monitor_[10|20] level, a readonly fs (not panic
> > then generate a coredump) could give administrator more info to locate the
> > rootcause.
> > 
> > I write above base on gfs2 fstest result: gfs2 doesn't panic for the ocfs2 panic
> > cases.
> > 
> > [1]:
> > https://github.com/ClusterLabs/resource-agents/blob/main/heartbeat/Filesystem
> > 
> 
> Agree, so a more elegant way is to make the journal abort node behaves
> like node down, e.g. set hard readonly so that heartbeat also can not
> update normally.
> 
> In xfstests, it uses device mapper to simulate disk down, so it will
> also trigger node down recover since heartbeat won't update normally?

I am not familiar with hearbeat. Base on current code, in pcmk stack,
your description is partly right. (see my answer in below. node may be hanging
forever)
> 
> Also, I'm not sure pcmk supports fence other node by only rejecting IOs.
> The built-in ocfs2 fence policy only supports fence self through panic
> or reset.
> 

Yes, pcmk supports fence another node.

SUSE HA stack pcmk (crmsh + corosync + pacemaker + stonith) is powerful and
enterprise-class software solution, which usually deploys on SAP environment.
For rejecting IOs or jbd2 aborted, Filesystem RA (see above [1]) could use 
monitor_[10|20] (OCF_CHECK_LEVEL) to enable detecting IOs error. If customer
doesn't config OCF_CHECK_LEVEL, and the storage stack: disk => dlm+ocfs2, these
will be hanging forever. 

But, for a typical user case for SUSE customer:
disks => mpath => lvm2 (dlm/lvmlockd) => ocfs2
pcmk stack could monitor layer: dlm lvm2 lvmlockd ocfs2. anyone enters abnormal
status, there will trigger fence action.

- Heming

> >>>>
> >>>>>> BTW, the basic rule here is, we don't want to change journal to prevent
> >>>>>> other nodes corrupting the filesystem.
> >>>>>
> >>>>> If my memory is correct, every node has itself special journal partition.
> >>>>> If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, 
> >>>>> No other node could corrupting the fs.
> >>>>>
> >>>> I don't think it's a good idea to modify journal even flush fails. Why
> >>>> not catch EROFS in commit thread and break? Now we can only expect
> >>>> umount continues with error, right?
> >>>>
> >>>
> >>> With your idea, I spent more than two hours for coding & testing. The result is
> >>> my patch [2/2] is correct. Please believe my patch [2/2].
> >>>
> >>
> >> I am not arguing if your patch can pass the testcases above. I am
> >> talking about whether we do it this way is right.
> >>
> >> Now journal flush actually fails since journal is abort, which means
> >> journal data are not reflected in filesystem. But after your change, it
> >> behaves like we've done a normal checkpoint.
> >>
> >> Take the following scenario into consideration:
> >> N0 does some changes on file1, but now disk down and journal is abort,
> >> and checkpoint behaves successfully even it does nothing.
> >> N1 get the access right on file1 and does other changes, and checkpoint
> >> normally.
> >> N0 is crash, and N1 begins to recover N0, then data corruption happens.
> >>
> >> I've fixed a similar issue in the past:
> >> 6f6a6fda2945 jbd2: fix ocfs2 corrupt when updating journal superblock fails
> >>
> > 
> > I agree with above description. Thank you for pointing out my mistake, I forgot
> > the dlm layer callback for node-down case.
> > 
> > - Heming
diff mbox series

Patch

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 51c93929a146..d90961a1c433 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -6308,11 +6308,13 @@  void ocfs2_truncate_log_shutdown(struct ocfs2_super *osb)
 
 	if (tl_inode) {
 		cancel_delayed_work(&osb->osb_truncate_log_wq);
-		flush_workqueue(osb->ocfs2_wq);
+		if (!is_journal_aborted(osb->journal->j_journal)) {
+			flush_workqueue(osb->ocfs2_wq);
 
-		status = ocfs2_flush_truncate_log(osb);
-		if (status < 0)
-			mlog_errno(status);
+			status = ocfs2_flush_truncate_log(osb);
+			if (status < 0)
+				mlog_errno(status);
+		}
 
 		brelse(osb->osb_tl_bh);
 		iput(osb->osb_tl_inode);
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 25d8072ccfce..2798067a2f66 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -312,11 +312,16 @@  static int ocfs2_commit_cache(struct ocfs2_super *osb)
 	status = jbd2_journal_flush(journal->j_journal, 0);
 	jbd2_journal_unlock_updates(journal->j_journal);
 	if (status < 0) {
+		if (is_journal_aborted(journal->j_journal)) {
+			ocfs2_error(osb->sb, "jbd2 status: ABORT.\n");
+			goto reset;
+		}
 		up_write(&journal->j_trans_barrier);
 		mlog_errno(status);
 		goto finally;
 	}
 
+reset:
 	ocfs2_inc_trans_id(journal);
 
 	flushed = atomic_read(&journal->j_num_trans);
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index c4426d12a2ad..e2e3400717b0 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -378,6 +378,9 @@  void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb)
 	if (osb->ocfs2_wq)
 		flush_workqueue(osb->ocfs2_wq);
 
+	if (is_journal_aborted(osb->journal->j_journal))
+		goto out;
+
 	if (osb->local_alloc_state == OCFS2_LA_UNUSED)
 		goto out;