Message ID | 20231030120057.928280-1-srivathsa.d.dara@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ocfs2: call ocfs2_abort when journal abort | expand |
On Mon, Oct 30, 2023 at 12:00:57PM +0000, Srivathsa Dara wrote: > From: Ryan Ding <ryan.ding@oracle.com> > > > journal can not recover from abort state, so we should take following > action to prevent file system from corruption: Could you elaborate on how corruption can occur when the filesystem is in this particular state? How could this be reproduced? > 1. change to readonly filesystem when local mount. We can not afford > further write, so change to RO state is reasonable. This is generally the strategy, but unrelated to the code that this commit is touching. The ocfs2_commit_thread() only applies to non-local mounts. > 2. panic when cluster mount. Because we can not release lock resource in > this state, other node will hung when it require a lock owned by this > node. So panic and remaster is a reasonable choise. Panicking is never a reasonable choice :-) Why couldn't a remastering be initiated without going through a reboot instead? > ocfs2_abort() will do all the above work. > > Signed-off-by: Ryan Ding <ryan.ding@oracle.com> > Signed-off-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> > --- > fs/ocfs2/journal.c | 27 +++++++++++++++------------ > 1 file changed, 15 insertions(+), 12 deletions(-) > > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c > index ce215565d061..6dace475f019 100644 > --- a/fs/ocfs2/journal.c > +++ b/fs/ocfs2/journal.c > @@ -14,7 +14,6 @@ > #include <linux/kthread.h> > #include <linux/time.h> > #include <linux/random.h> > -#include <linux/delay.h> > #include <linux/writeback.h> > > #include <cluster/masklog.h> > @@ -2326,7 +2325,7 @@ static int __ocfs2_wait_on_mount(struct ocfs2_super *osb, int quota) > > static int ocfs2_commit_thread(void *arg) > { > - int status; > + int status = 0; > struct ocfs2_super *osb = arg; > struct ocfs2_journal *journal = osb->journal; > > @@ -2340,21 +2339,25 @@ static int ocfs2_commit_thread(void *arg) > wait_event_interruptible(osb->checkpoint_event, > atomic_read(&journal->j_num_trans) > || kthread_should_stop()); > + if (status < 0) { > + /* As we can not terminate by ourself, just enter an > + * empty loop to wait for stop. > + */ > + continue; > + } the above is unnecessary, given that below will induce panic upon failure. > status = ocfs2_commit_cache(osb); > if (status < 0) { > - static unsigned long abort_warn_time; > - > - /* Warn about this once per minute */ > - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > - mlog(ML_ERROR, "status = %d, journal is " > - "already aborted.\n", status); > /* > - * After ocfs2_commit_cache() fails, j_num_trans has a > - * non-zero value. Sleep here to avoid a busy-wait > - * loop. > + * journal can not recover from abort state, there is > + * no need to keep commit cache. So we should either > + * change to readonly(local mount) or just panic > + * (cluster mount). commit cache doesn't apply to local mounts. > + * We should also clear j_num_trans to prevent further > + * commit. that's redundant given that we're about to panic. > */ > - msleep_interruptible(1000); > + atomic_set(&journal->j_num_trans, 0); same here. > + ocfs2_abort(osb->sb, "Detected aborted journal"); ocfs2_commit_cache returning with an error does not necessarily mean that the journal has been aborted; jbd2_journal_flush will return an error for other conditions too, some of which could be recoverable. Therefore unconditionally aborting here is not necessarily warranted, and presumably this is why the current code is a retry loop. But even if the journal has been aborted at that point, there's no reason why the node cannot self-fence and effectively shutdown the local fs, instead of aborting with a panic. Regards, Anthony
On 10/30/23 8:00 PM, Srivathsa Dara wrote: > From: Ryan Ding <ryan.ding@oracle.com> > > > journal can not recover from abort state, so we should take following > action to prevent file system from corruption: > > 1. change to readonly filesystem when local mount. We can not afford > further write, so change to RO state is reasonable. > > 2. panic when cluster mount. Because we can not release lock resource in > this state, other node will hung when it require a lock owned by this > node. So panic and remaster is a reasonable choise. > > ocfs2_abort() will do all the above work. > > Signed-off-by: Ryan Ding <ryan.ding@oracle.com> > Signed-off-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> > --- > fs/ocfs2/journal.c | 27 +++++++++++++++------------ > 1 file changed, 15 insertions(+), 12 deletions(-) > > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c > index ce215565d061..6dace475f019 100644 > --- a/fs/ocfs2/journal.c > +++ b/fs/ocfs2/journal.c > @@ -14,7 +14,6 @@ > #include <linux/kthread.h> > #include <linux/time.h> > #include <linux/random.h> > -#include <linux/delay.h> > #include <linux/writeback.h> > > #include <cluster/masklog.h> > @@ -2326,7 +2325,7 @@ static int __ocfs2_wait_on_mount(struct ocfs2_super *osb, int quota) > > static int ocfs2_commit_thread(void *arg) > { > - int status; > + int status = 0; > struct ocfs2_super *osb = arg; > struct ocfs2_journal *journal = osb->journal; > > @@ -2340,21 +2339,25 @@ static int ocfs2_commit_thread(void *arg) > wait_event_interruptible(osb->checkpoint_event, > atomic_read(&journal->j_num_trans) > || kthread_should_stop()); > + if (status < 0) { > + /* As we can not terminate by ourself, just enter an > + * empty loop to wait for stop. > + */ > + continue; > + } > > status = ocfs2_commit_cache(osb); > if (status < 0) { > - static unsigned long abort_warn_time; > - > - /* Warn about this once per minute */ > - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > - mlog(ML_ERROR, "status = %d, journal is " > - "already aborted.\n", status); > /* > - * After ocfs2_commit_cache() fails, j_num_trans has a > - * non-zero value. Sleep here to avoid a busy-wait > - * loop. > + * journal can not recover from abort state, there is > + * no need to keep commit cache. So we should either > + * change to readonly(local mount) or just panic > + * (cluster mount). > + * We should also clear j_num_trans to prevent further > + * commit. > */ > - msleep_interruptible(1000); > + atomic_set(&journal->j_num_trans, 0); Unconditionally clear 'j_num_trans' here seems buggy. This may cause other nodes corrupt filesystem. Thanks, Joseph > + ocfs2_abort(osb->sb, "Detected aborted journal"); > } > > if (kthread_should_stop() && atomic_read(&journal->j_num_trans)){
On 11/1/23 09:13, Joseph Qi wrote: > > > On 10/30/23 8:00 PM, Srivathsa Dara wrote: >> From: Ryan Ding <ryan.ding@oracle.com> >> >> >> journal can not recover from abort state, so we should take following >> action to prevent file system from corruption: >> >> 1. change to readonly filesystem when local mount. We can not afford >> further write, so change to RO state is reasonable. >> >> 2. panic when cluster mount. Because we can not release lock resource in >> this state, other node will hung when it require a lock owned by this >> node. So panic and remaster is a reasonable choise. >> >> ocfs2_abort() will do all the above work. >> >> Signed-off-by: Ryan Ding <ryan.ding@oracle.com> >> Signed-off-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> >> --- >> fs/ocfs2/journal.c | 27 +++++++++++++++------------ >> 1 file changed, 15 insertions(+), 12 deletions(-) >> >> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c >> index ce215565d061..6dace475f019 100644 >> --- a/fs/ocfs2/journal.c >> +++ b/fs/ocfs2/journal.c >> @@ -14,7 +14,6 @@ >> #include <linux/kthread.h> >> #include <linux/time.h> >> #include <linux/random.h> >> -#include <linux/delay.h> >> #include <linux/writeback.h> >> >> #include <cluster/masklog.h> >> @@ -2326,7 +2325,7 @@ static int __ocfs2_wait_on_mount(struct ocfs2_super *osb, int quota) >> >> static int ocfs2_commit_thread(void *arg) >> { >> - int status; >> + int status = 0; >> struct ocfs2_super *osb = arg; >> struct ocfs2_journal *journal = osb->journal; >> >> @@ -2340,21 +2339,25 @@ static int ocfs2_commit_thread(void *arg) >> wait_event_interruptible(osb->checkpoint_event, >> atomic_read(&journal->j_num_trans) >> || kthread_should_stop()); >> + if (status < 0) { >> + /* As we can not terminate by ourself, just enter an >> + * empty loop to wait for stop. >> + */ >> + continue; >> + } >> >> status = ocfs2_commit_cache(osb); >> if (status < 0) { >> - static unsigned long abort_warn_time; >> - >> - /* Warn about this once per minute */ >> - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) >> - mlog(ML_ERROR, "status = %d, journal is " >> - "already aborted.\n", status); >> /* >> - * After ocfs2_commit_cache() fails, j_num_trans has a >> - * non-zero value. Sleep here to avoid a busy-wait >> - * loop. >> + * journal can not recover from abort state, there is >> + * no need to keep commit cache. So we should either >> + * change to readonly(local mount) or just panic >> + * (cluster mount). >> + * We should also clear j_num_trans to prevent further >> + * commit. >> */ >> - msleep_interruptible(1000); >> + atomic_set(&journal->j_num_trans, 0); > > Unconditionally clear 'j_num_trans' here seems buggy. > This may cause other nodes corrupt filesystem. Yes, you described this buggy condition during review my patch: https://lore.kernel.org/ocfs2-devel/2b90546e-d23e-7d45-22dd-a887a73f3e61@linux.alibaba.com/ btw, I ever send a v2 patch [1] about handling journal abort: [1] https://lore.kernel.org/ocfs2-devel/20230626112453.22571-1-heming.zhao@suse.com/ This patch never get any feedback. And I also send another mail [2] related with above patch: [2] https://lore.kernel.org/ocfs2-devel/20230626150916.je3egonzb3crvbl5@p15/ please note, [2] is the problems from fsck.ocfs2, which is completely unrelated to [1]. Heming > > Thanks, > Joseph > >> + ocfs2_abort(osb->sb, "Detected aborted journal"); >> } >> >> if (kthread_should_stop() && atomic_read(&journal->j_num_trans)){ >
Hi Anthony, thanks for the review -----Original Message----- From: Anthony Iliopoulos <ailiop@suse.com> Sent: Wednesday, November 1, 2023 4:53 AM To: Srivathsa Dara <srivathsa.d.dara@oracle.com> Cc: mark@fasheh.com; jlbec@evilplan.org; joseph.qi@linux.alibaba.com; Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@oracle.com>; Junxiao Bi <junxiao.bi@oracle.com>; ocfs2-devel@lists.linux.dev Subject: Re: [PATCH] ocfs2: call ocfs2_abort when journal abort On Mon, Oct 30, 2023 at 12:00:57PM +0000, Srivathsa Dara wrote: > From: Ryan Ding <ryan.ding@oracle.com> > > > journal can not recover from abort state, so we should take following > action to prevent file system from corruption: Could you elaborate on how corruption can occur when the filesystem is in this particular state? How could this be reproduced? [Srivathsa]: Corruption can occur due to storage failure > 1. change to readonly filesystem when local mount. We can not afford > further write, so change to RO state is reasonable. This is generally the strategy, but unrelated to the code that this commit is touching. The ocfs2_commit_thread() only applies to non-local mounts. [Srivathsa]: Yes, you are right, this is only for cluster mode. I will remove all local mount related comments. > 2. panic when cluster mount. Because we can not release lock resource in > this state, other node will hung when it require a lock owned by this > node. So panic and remaster is a reasonable choise. Panicking is never a reasonable choice :-) Why couldn't a remastering be initiated without going through a reboot instead? [Srivathsa]: Without reboot, we run into a journal failure, the only option is to have fs in read-only mode. Could you suggest any other way to make this node release the locks held by it? > ocfs2_abort() will do all the above work. > > Signed-off-by: Ryan Ding <ryan.ding@oracle.com> > Signed-off-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> > --- > fs/ocfs2/journal.c | 27 +++++++++++++++------------ > 1 file changed, 15 insertions(+), 12 deletions(-) > > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index > ce215565d061..6dace475f019 100644 > --- a/fs/ocfs2/journal.c > +++ b/fs/ocfs2/journal.c > @@ -14,7 +14,6 @@ > #include <linux/kthread.h> > #include <linux/time.h> > #include <linux/random.h> > -#include <linux/delay.h> > #include <linux/writeback.h> > > #include <cluster/masklog.h> > @@ -2326,7 +2325,7 @@ static int __ocfs2_wait_on_mount(struct > ocfs2_super *osb, int quota) > > static int ocfs2_commit_thread(void *arg) { > - int status; > + int status = 0; > struct ocfs2_super *osb = arg; > struct ocfs2_journal *journal = osb->journal; > > @@ -2340,21 +2339,25 @@ static int ocfs2_commit_thread(void *arg) > wait_event_interruptible(osb->checkpoint_event, > atomic_read(&journal->j_num_trans) > || kthread_should_stop()); > + if (status < 0) { > + /* As we can not terminate by ourself, just enter an > + * empty loop to wait for stop. > + */ > + continue; > + } the above is unnecessary, given that below will induce panic upon failure. > status = ocfs2_commit_cache(osb); > if (status < 0) { > - static unsigned long abort_warn_time; > - > - /* Warn about this once per minute */ > - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > - mlog(ML_ERROR, "status = %d, journal is " > - "already aborted.\n", status); > /* > - * After ocfs2_commit_cache() fails, j_num_trans has a > - * non-zero value. Sleep here to avoid a busy-wait > - * loop. > + * journal can not recover from abort state, there is > + * no need to keep commit cache. So we should either > + * change to readonly(local mount) or just panic > + * (cluster mount). commit cache doesn't apply to local mounts. > + * We should also clear j_num_trans to prevent further > + * commit. that's redundant given that we're about to panic. > */ > - msleep_interruptible(1000); > + atomic_set(&journal->j_num_trans, 0); same here. > + ocfs2_abort(osb->sb, "Detected aborted journal"); ocfs2_commit_cache returning with an error does not necessarily mean that the journal has been aborted; jbd2_journal_flush will return an error for other conditions too, some of which could be recoverable. Therefore unconditionally aborting here is not necessarily warranted, and presumably this is why the current code is a retry loop. [Srivathsa]: Even the journal is not aborted, there is some serious error with the journal, keeping it running at this stage could lead to corruption. But even if the journal has been aborted at that point, there's no reason why the node cannot self-fence and effectively shutdown the local fs, instead of aborting with a panic. [Srivathsa]: By self-fence, did you mean system panic/reboot due to heartbeat loss? If so, that can't be guaranteed that it will happen because the storage failure can be intermittent. Even if that happens, it will panic or reboot, how effectively it will shutdown the local fs, considering the journal is failing? [Srivathsa]: will send a v2 addressing the redundant code. Regards, Anthony Thanks, Srivathsa
On Thu, Nov 09, 2023 at 07:26:16PM +0000, Srivathsa Dara wrote: > Hi Anthony, > thanks for the review > > -----Original Message----- > From: Anthony Iliopoulos <ailiop@suse.com> > Sent: Wednesday, November 1, 2023 4:53 AM > To: Srivathsa Dara <srivathsa.d.dara@oracle.com> > Cc: mark@fasheh.com; jlbec@evilplan.org; joseph.qi@linux.alibaba.com; > Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@oracle.com>; > Junxiao Bi <junxiao.bi@oracle.com>; ocfs2-devel@lists.linux.dev > Subject: Re: [PATCH] ocfs2: call ocfs2_abort when journal abort > > On Mon, Oct 30, 2023 at 12:00:57PM +0000, Srivathsa Dara wrote: > > From: Ryan Ding <ryan.ding@oracle.com> > > > > > > journal can not recover from abort state, so we should take following > > action to prevent file system from corruption: > > Could you elaborate on how corruption can occur when the filesystem is > in this particular state? How could this be reproduced? > > [Srivathsa]: Corruption can occur due to storage failure How does that failure lead from aborting the journal to causing fs corruption? What exactly gets corrupted after journal abort? Can you provide more details and possibly a reproducer on this? > > 1. change to readonly filesystem when local mount. We can not afford > > further write, so change to RO state is reasonable. > > This is generally the strategy, but unrelated to the code that this commit > is touching. The ocfs2_commit_thread() only applies to non-local mounts. > > [Srivathsa]: Yes, you are right, this is only for cluster mode. I will remove > all local mount related comments. > > > 2. panic when cluster mount. Because we can not release lock resource in > > this state, other node will hung when it require a lock owned by this > > node. So panic and remaster is a reasonable choise. > > Panicking is never a reasonable choice :-) Why couldn't a remastering be > initiated without going through a reboot instead? > > [Srivathsa]: Without reboot, we run into a journal failure, the only option > is to have fs in read-only mode. Could you suggest any other way to make > this node release the locks held by it? Switching to read-only, and self-fencing so that another node will take over the recovery. > > ocfs2_abort() will do all the above work. > > > > Signed-off-by: Ryan Ding <ryan.ding@oracle.com> > > Signed-off-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> > > --- > > fs/ocfs2/journal.c | 27 +++++++++++++++------------ > > 1 file changed, 15 insertions(+), 12 deletions(-) > > > > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index > > ce215565d061..6dace475f019 100644 > > --- a/fs/ocfs2/journal.c > > +++ b/fs/ocfs2/journal.c > > @@ -14,7 +14,6 @@ > > #include <linux/kthread.h> > > #include <linux/time.h> > > #include <linux/random.h> > > -#include <linux/delay.h> > > #include <linux/writeback.h> > > > > #include <cluster/masklog.h> > > @@ -2326,7 +2325,7 @@ static int __ocfs2_wait_on_mount(struct > > ocfs2_super *osb, int quota) > > > > static int ocfs2_commit_thread(void *arg) { > > - int status; > > + int status = 0; > > struct ocfs2_super *osb = arg; > > struct ocfs2_journal *journal = osb->journal; > > > > @@ -2340,21 +2339,25 @@ static int ocfs2_commit_thread(void *arg) > > wait_event_interruptible(osb->checkpoint_event, > > atomic_read(&journal->j_num_trans) > > || kthread_should_stop()); > > + if (status < 0) { > > + /* As we can not terminate by ourself, just enter an > > + * empty loop to wait for stop. > > + */ > > + continue; > > + } > > the above is unnecessary, given that below will induce panic upon failure. > > > status = ocfs2_commit_cache(osb); > > if (status < 0) { > > - static unsigned long abort_warn_time; > > - > > - /* Warn about this once per minute */ > > - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > > - mlog(ML_ERROR, "status = %d, journal is " > > - "already aborted.\n", status); > > /* > > - * After ocfs2_commit_cache() fails, j_num_trans has a > > - * non-zero value. Sleep here to avoid a busy-wait > > - * loop. > > + * journal can not recover from abort state, there is > > + * no need to keep commit cache. So we should either > > + * change to readonly(local mount) or just panic > > + * (cluster mount). > > commit cache doesn't apply to local mounts. > > > + * We should also clear j_num_trans to prevent further > > + * commit. > > that's redundant given that we're about to panic. > > > */ > > - msleep_interruptible(1000); > > + atomic_set(&journal->j_num_trans, 0); > > same here. > > > + ocfs2_abort(osb->sb, "Detected aborted journal"); > > ocfs2_commit_cache returning with an error does not necessarily mean that > the journal has been aborted; jbd2_journal_flush will return an error for other > conditions too, some of which could be recoverable. > > Therefore unconditionally aborting here is not necessarily warranted, and > presumably this is why the current code is a retry loop. > > [Srivathsa]: Even the journal is not aborted, there is some serious > error with the journal, keeping it running at this stage could lead > to corruption. How, precisely? Irrespective of the journal state, this should not be leading to any corruptions, the ocfs2_commit_thread will just keep unsuccessfully attempting a journal flush. Otherwise there's a bug somewhere that needs to be directly addressed. > But even if the journal has been aborted at that point, there's no reason why > the node cannot self-fence and effectively shutdown the local fs, instead of > aborting with a panic. > > [Srivathsa]: By self-fence, did you mean system panic/reboot due > to heartbeat loss? If so, that can't be guaranteed that it will happen > because the storage failure can be intermittent. Even if that happens, > it will panic or reboot, how effectively it will shutdown the local fs, > considering the journal is failing? I mean intentionally stopping the heartbeat and switching to read-only. > [Srivathsa]: will send a v2 addressing the redundant code. Apart from the redundant code, I see two separate issues: - the claim that there is a potential fs corruption bug after a journal abort in ocfs2_commit_thread, which needs to be clarified and addressed properly. - the method of handling irrecoverable journal errors. Regards, Anthony
From: Anthony Iliopoulos ailiop@suse.com Sent: Friday, November 10, 2023 5:28 AM To: Srivathsa Dara srivathsa.d.dara@oracle.com Cc: mark@fasheh.com mark@fasheh.com; jlbec@evilplan.org jlbec@evilplan.org; joseph.qi@linux.alibaba.com joseph.qi@linux.alibaba.com; Rajesh Sivaramasubramaniom rajesh.sivaramasubramaniom@oracle.com; Junxiao Bi junxiao.bi@oracle.com; ocfs2-devel@lists.linux.dev ocfs2-devel@lists.linux.dev; Gautham Ananthakrishna gautham.ananthakrishna@oracle.com Subject: [External] : Re: [PATCH] ocfs2: call ocfs2_abort when journal abort On Thu, Nov 09, 2023 at 07:26:16PM +0000, Srivathsa Dara wrote: > Hi Anthony, > thanks for the review > > -----Original Message----- > From: Anthony Iliopoulos ailiop@suse.com > Sent: Wednesday, November 1, 2023 4:53 AM > To: Srivathsa Dara srivathsa.d.dara@oracle.com > Cc: mark@fasheh.com; jlbec@evilplan.org; joseph.qi@linux.alibaba.com; > Rajesh Sivaramasubramaniom rajesh.sivaramasubramaniom@oracle.com; > Junxiao Bi junxiao.bi@oracle.com; ocfs2-devel@lists.linux.dev > Subject: Re: [PATCH] ocfs2: call ocfs2_abort when journal abort > > On Mon, Oct 30, 2023 at 12:00:57PM +0000, Srivathsa Dara wrote: > > From: Ryan Ding ryan.ding@oracle.com > > > > > > journal can not recover from abort state, so we should take following > > action to prevent file system from corruption: > > Could you elaborate on how corruption can occur when the filesystem is > in this particular state? How could this be reproduced? > > [Srivathsa]: Corruption can occur due to storage failure How does that failure lead from aborting the journal to causing fs corruption? What exactly gets corrupted after journal abort? Can you provide more details and possibly a reproducer on this? [Srivathsa]: Isn't it a bad idea to leave the filesystem like a way it is with a journal error. Because journal is not working as expected, we might run into some trouble in future. > > 1. change to readonly filesystem when local mount. We can not afford > > further write, so change to RO state is reasonable. > > This is generally the strategy, but unrelated to the code that this commit > is touching. The ocfs2_commit_thread() only applies to non-local mounts. > > [Srivathsa]: Yes, you are right, this is only for cluster mode. I will remove > all local mount related comments. > > > 2. panic when cluster mount. Because we can not release lock resource in > > this state, other node will hung when it require a lock owned by this > > node. So panic and remaster is a reasonable choise. > > Panicking is never a reasonable choice :-) Why couldn't a remastering be > initiated without going through a reboot instead? > > [Srivathsa]: Without reboot, we run into a journal failure, the only option > is to have fs in read-only mode. Could you suggest any other way to make > this node release the locks held by it? Switching to read-only, and self-fencing so that another node will take over the recovery. [Srivathsa]: ok got it. > > ocfs2_abort() will do all the above work. > > > > Signed-off-by: Ryan Ding ryan.ding@oracle.com > > Signed-off-by: Srivathsa Dara srivathsa.d.dara@oracle.com > > --- > > fs/ocfs2/journal.c | 27 +++++++++++++++------------ > > 1 file changed, 15 insertions(+), 12 deletions(-) > > > > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index > > ce215565d061..6dace475f019 100644 > > --- a/fs/ocfs2/journal.c > > +++ b/fs/ocfs2/journal.c > > @@ -14,7 +14,6 @@ > > #include <linux/kthread.h> > > #include <linux/time.h> > > #include <linux/random.h> > > -#include <linux/delay.h> > > #include <linux/writeback.h> > > > > #include <cluster/masklog.h> > > @@ -2326,7 +2325,7 @@ static int __ocfs2_wait_on_mount(struct > > ocfs2_super *osb, int quota) > > > > static int ocfs2_commit_thread(void *arg) { > > - int status; > > + int status = 0; > > struct ocfs2_super *osb = arg; > > struct ocfs2_journal *journal = osb->journal; > > > > @@ -2340,21 +2339,25 @@ static int ocfs2_commit_thread(void *arg) > > wait_event_interruptible(osb->checkpoint_event, > > atomic_read(&journal->j_num_trans) > > || kthread_should_stop()); > > + if (status < 0) { > > + /* As we can not terminate by ourself, just enter an > > + * empty loop to wait for stop. > > + */ > > + continue; > > + } > > the above is unnecessary, given that below will induce panic upon failure. > > > status = ocfs2_commit_cache(osb); > > if (status < 0) { > > - static unsigned long abort_warn_time; > > - > > - /* Warn about this once per minute */ > > - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > > - mlog(ML_ERROR, "status = %d, journal is " > > - "already aborted.\n", status); > > /* > > - * After ocfs2_commit_cache() fails, j_num_trans has a > > - * non-zero value. Sleep here to avoid a busy-wait > > - * loop. > > + * journal can not recover from abort state, there is > > + * no need to keep commit cache. So we should either > > + * change to readonly(local mount) or just panic > > + * (cluster mount). > > commit cache doesn't apply to local mounts. > > > + * We should also clear j_num_trans to prevent further > > + * commit. > > that's redundant given that we're about to panic. > > > */ > > - msleep_interruptible(1000); > > + atomic_set(&journal->j_num_trans, 0); > > same here. > > > + ocfs2_abort(osb->sb, "Detected aborted journal"); > > ocfs2_commit_cache returning with an error does not necessarily mean that > the journal has been aborted; jbd2_journal_flush will return an error for other > conditions too, some of which could be recoverable. > > Therefore unconditionally aborting here is not necessarily warranted, and > presumably this is why the current code is a retry loop. > > [Srivathsa]: Even the journal is not aborted, there is some serious > error with the journal, keeping it running at this stage could lead > to corruption. How, precisely? Irrespective of the journal state, this should not be leading to any corruptions, the ocfs2_commit_thread will just keep unsuccessfully attempting a journal flush. Otherwise there's a bug somewhere that needs to be directly addressed. > But even if the journal has been aborted at that point, there's no reason why > the node cannot self-fence and effectively shutdown the local fs, instead of > aborting with a panic. > > [Srivathsa]: By self-fence, did you mean system panic/reboot due > to heartbeat loss? If so, that can't be guaranteed that it will happen > because the storage failure can be intermittent. Even if that happens, > it will panic or reboot, how effectively it will shutdown the local fs, > considering the journal is failing? I mean intentionally stopping the heartbeat and switching to read-only. > [Srivathsa]: will send a v2 addressing the redundant code. Apart from the redundant code, I see two separate issues: - the claim that there is a potential fs corruption bug after a journal abort in ocfs2_commit_thread, which needs to be clarified and addressed properly. - the method of handling irrecoverable journal errors. [Srivathsa]: Switching to read-only and self fencing affect performance in production systems. Regards, Anthony
On Tue, Nov 21, 2023 at 12:19:03PM +0000, Srivathsa Dara wrote: > [Srivathsa]: Isn't it a bad idea to leave the filesystem like a way it is > with a journal error. Because journal is not working as expected, we > might run into some trouble in future. The journal being in an aborted state is part of the expected design. It implies that the filesystem accordingly should be in read-only state. Anything else is a bug, and you'd have to be much more specific which exact bug you're trying to address. Resorting to panic is simply obscuring any potential issues that may actually need fixing. > [Srivathsa]: Switching to read-only and self fencing affect > performance in production systems. How so? Again, please elaborate with specific technical details. Regards, Anthony
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index ce215565d061..6dace475f019 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -14,7 +14,6 @@ #include <linux/kthread.h> #include <linux/time.h> #include <linux/random.h> -#include <linux/delay.h> #include <linux/writeback.h> #include <cluster/masklog.h> @@ -2326,7 +2325,7 @@ static int __ocfs2_wait_on_mount(struct ocfs2_super *osb, int quota) static int ocfs2_commit_thread(void *arg) { - int status; + int status = 0; struct ocfs2_super *osb = arg; struct ocfs2_journal *journal = osb->journal; @@ -2340,21 +2339,25 @@ static int ocfs2_commit_thread(void *arg) wait_event_interruptible(osb->checkpoint_event, atomic_read(&journal->j_num_trans) || kthread_should_stop()); + if (status < 0) { + /* As we can not terminate by ourself, just enter an + * empty loop to wait for stop. + */ + continue; + } status = ocfs2_commit_cache(osb); if (status < 0) { - static unsigned long abort_warn_time; - - /* Warn about this once per minute */ - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) - mlog(ML_ERROR, "status = %d, journal is " - "already aborted.\n", status); /* - * After ocfs2_commit_cache() fails, j_num_trans has a - * non-zero value. Sleep here to avoid a busy-wait - * loop. + * journal can not recover from abort state, there is + * no need to keep commit cache. So we should either + * change to readonly(local mount) or just panic + * (cluster mount). + * We should also clear j_num_trans to prevent further + * commit. */ - msleep_interruptible(1000); + atomic_set(&journal->j_num_trans, 0); + ocfs2_abort(osb->sb, "Detected aborted journal"); } if (kthread_should_stop() && atomic_read(&journal->j_num_trans)){