Message ID | 20140421121824.66710cc2496f84740324137d@linux-foundation.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Apr 21, 2014 at 12:18:24PM -0700, Andrew Morton wrote: > On Fri, 18 Apr 2014 17:18:27 +0800 Joseph Qi <joseph.qi@huawei.com> wrote: > > > >>>> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > > >>>> + mlog(ML_ERROR, "status = %d, journal is " > > >>>> + "already aborted.\n", status); > > >>>> + msleep_interruptible(1000); > > >>>> + } > > >>> > > >>> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue > > >>> right after this anyway - is there a problem with it waiting on that? > > >>> > > >> Since jbd2 is already aborted, commit cache is meaningless. > > > > > > I understand that, but I'm asking why the msleep and whether we can avoid > > > that. To go back to my question: > > > > > > "ocfs2_commit_thread will wait on the checkpoint_event queue right after > > > this anyway - is there a problem with it waiting on that?" > > > > > > Thanks, > > > --Mark > > Sorry for my obscure description. > > If ocfs2_commit_cache fails because of JBD2_ABORT, j_num_trans won't be cleared. > > Then the condition of checkpoint event still evaluates true, so it won't wait. > > If Mark didn't understand the reason for the msleep then nobody weill, > so we need to add a comment. This? > > --- a/fs/ocfs2/journal.c~ocfs2-limit-printk-when-journal-is-aborted-fix > +++ a/fs/ocfs2/journal.c > @@ -2193,6 +2193,11 @@ static int ocfs2_commit_thread(void *arg > if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > mlog(ML_ERROR, "status = %d, journal is " > "already aborted.\n", status); > + /* > + * After ocfs2_commit_cache() fails, j_num_trans has a > + * non-zero value. Sleep here to avoid a busy-wait > + * loop. > + */ > msleep_interruptible(1000); > } > > > This patch seems rather hacky :( Isn't there a better solution? Right, that's what I was getting at with my followup later on in the mail thread about this. > Why even keep the kernel thread running after an abort? The msleep is papering over the real issue. Either the thread should shut down or we need to re-evaluate usage of j_num_trans which is the condition that keeps it from sleeping (and from a quick glance it doesn't seem like j_num_trans does anything for us). --Mark -- Mark Fasheh
On 2014/4/22 4:51, Mark Fasheh wrote: > On Mon, Apr 21, 2014 at 12:18:24PM -0700, Andrew Morton wrote: >> On Fri, 18 Apr 2014 17:18:27 +0800 Joseph Qi <joseph.qi@huawei.com> wrote: >> >>>>>>> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) >>>>>>> + mlog(ML_ERROR, "status = %d, journal is " >>>>>>> + "already aborted.\n", status); >>>>>>> + msleep_interruptible(1000); >>>>>>> + } >>>>>> >>>>>> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue >>>>>> right after this anyway - is there a problem with it waiting on that? >>>>>> >>>>> Since jbd2 is already aborted, commit cache is meaningless. >>>> >>>> I understand that, but I'm asking why the msleep and whether we can avoid >>>> that. To go back to my question: >>>> >>>> "ocfs2_commit_thread will wait on the checkpoint_event queue right after >>>> this anyway - is there a problem with it waiting on that?" >>>> >>>> Thanks, >>>> --Mark >>> Sorry for my obscure description. >>> If ocfs2_commit_cache fails because of JBD2_ABORT, j_num_trans won't be cleared. >>> Then the condition of checkpoint event still evaluates true, so it won't wait. >> >> If Mark didn't understand the reason for the msleep then nobody weill, >> so we need to add a comment. This? >> >> --- a/fs/ocfs2/journal.c~ocfs2-limit-printk-when-journal-is-aborted-fix >> +++ a/fs/ocfs2/journal.c >> @@ -2193,6 +2193,11 @@ static int ocfs2_commit_thread(void *arg >> if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) >> mlog(ML_ERROR, "status = %d, journal is " >> "already aborted.\n", status); >> + /* >> + * After ocfs2_commit_cache() fails, j_num_trans has a >> + * non-zero value. Sleep here to avoid a busy-wait >> + * loop. >> + */ >> msleep_interruptible(1000); >> } >> >> >> This patch seems rather hacky :( Isn't there a better solution? > > Right, that's what I was getting at with my followup later on in the mail > thread about this. > > >> Why even keep the kernel thread running after an abort? > > The msleep is papering over the real issue. Either the thread should shut > down or we need to re-evaluate usage of j_num_trans which is the condition > that keeps it from sleeping (and from a quick glance it doesn't seem like > j_num_trans does anything for us). > --Mark > AFAIK, the commit thread ends only if dismounting volume. Journal abort is different from journal shutdown, that's why leaves it running. > -- > Mark Fasheh > > . >
--- a/fs/ocfs2/journal.c~ocfs2-limit-printk-when-journal-is-aborted-fix +++ a/fs/ocfs2/journal.c @@ -2193,6 +2193,11 @@ static int ocfs2_commit_thread(void *arg if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) mlog(ML_ERROR, "status = %d, journal is " "already aborted.\n", status); + /* + * After ocfs2_commit_cache() fails, j_num_trans has a + * non-zero value. Sleep here to avoid a busy-wait + * loop. + */ msleep_interruptible(1000); }