Message ID | c4df68a7-706b-0216-b2a0-a177789f380f@redhat.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | xfs_repair: fix progress reporting | expand |
On Mon, May 18, 2020 at 05:35:33PM -0500, Eric Sandeen wrote: > Long ago, a young developer tried to fix a segfault in xfs_repair where > a short progress reporting interval could cause a timer to go off and try > to print a progress mesage before any had been properly set up because > we were still busy zeroing the log, and a NULL pointer dereference > ensued. > > That young developer got it wrong, and completely broke progress > reporting, because the change caused us to exit early from the pthread > start routine, and not initialize the progress timer at all. > > That developer is now slightly older and wiser, and finally realizes that > the simple and correct solution here is to initialize the message format > to the first one in the list, so that we will be ready to go with a > progress message no matter when the first timer goes off. > > Reported-by: Leonardo Vaz <lvaz@redhat.com> > Fixes: 7f2d6b811755 ("xfs_repair: avoid segfault if reporting progre...") > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > --- > > It might be nice to add progress reporting for the log zeroing, but that > requires renumbering all these macros, and we don't/can't actually get > any fine-grained progress at all, so probably not worth it. > > diff --git a/repair/progress.c b/repair/progress.c > index 5ee08229..d7baa606 100644 > --- a/repair/progress.c > +++ b/repair/progress.c > @@ -125,7 +125,11 @@ init_progress_rpt (void) > */ > > pthread_mutex_init(&global_msgs.mutex, NULL); > - global_msgs.format = NULL; > + /* > + * Ensure the format string is not NULL in case the first timer > + * goes off before any stage calls set_progress_msg() to set it. > + */ > + global_msgs.format = &progress_rpt_reports[0]; Hmm so does that mean the first progress report could be for "scanning freespace"? Or could you append a new entry to progress_rpt_reports for "getting my shit together and moving out of my parents basement" and initialize it to that? --D > global_msgs.count = glob_agcount; > global_msgs.interval = report_interval; > global_msgs.done = prog_rpt_done; > @@ -171,10 +175,6 @@ progress_rpt_thread (void *p) > msg_block_t *msgp = (msg_block_t *)p; > uint64_t percent; > > - /* It's possible to get here very early w/ no progress msg set */ > - if (!msgp->format) > - return NULL; > - > if ((msgbuf = (char *)malloc(DURATION_BUF_SIZE)) == NULL) > do_error (_("progress_rpt: cannot malloc progress msg buffer\n")); > >
On 5/18/20 7:58 PM, Darrick J. Wong wrote: > On Mon, May 18, 2020 at 05:35:33PM -0500, Eric Sandeen wrote: >> Long ago, a young developer tried to fix a segfault in xfs_repair where >> a short progress reporting interval could cause a timer to go off and try >> to print a progress mesage before any had been properly set up because >> we were still busy zeroing the log, and a NULL pointer dereference >> ensued. >> >> That young developer got it wrong, and completely broke progress >> reporting, because the change caused us to exit early from the pthread >> start routine, and not initialize the progress timer at all. >> >> That developer is now slightly older and wiser, and finally realizes that >> the simple and correct solution here is to initialize the message format >> to the first one in the list, so that we will be ready to go with a >> progress message no matter when the first timer goes off. >> >> Reported-by: Leonardo Vaz <lvaz@redhat.com> >> Fixes: 7f2d6b811755 ("xfs_repair: avoid segfault if reporting progre...") >> Signed-off-by: Eric Sandeen <sandeen@redhat.com> >> --- >> >> It might be nice to add progress reporting for the log zeroing, but that >> requires renumbering all these macros, and we don't/can't actually get >> any fine-grained progress at all, so probably not worth it. >> >> diff --git a/repair/progress.c b/repair/progress.c >> index 5ee08229..d7baa606 100644 >> --- a/repair/progress.c >> +++ b/repair/progress.c >> @@ -125,7 +125,11 @@ init_progress_rpt (void) >> */ >> >> pthread_mutex_init(&global_msgs.mutex, NULL); >> - global_msgs.format = NULL; >> + /* >> + * Ensure the format string is not NULL in case the first timer >> + * goes off before any stage calls set_progress_msg() to set it. >> + */ >> + global_msgs.format = &progress_rpt_reports[0]; > > Hmm so does that mean the first progress report could be for "scanning > freespace"? Yes. But unless "zeroing the log" takes more than the report interval, which by default is 15 minutes, it won't be wrong. > Or could you append a new entry to progress_rpt_reports for "getting my > shit together and moving out of my parents basement" and initialize it > to that? Oh I guess it could be appended and wouldn't have to be out of order but honestly I don't think it's worth it, even a big slow log shouldn't take long enough to need a progress report. -Eric > --D > >> global_msgs.count = glob_agcount; >> global_msgs.interval = report_interval; >> global_msgs.done = prog_rpt_done; >> @@ -171,10 +175,6 @@ progress_rpt_thread (void *p) >> msg_block_t *msgp = (msg_block_t *)p; >> uint64_t percent; >> >> - /* It's possible to get here very early w/ no progress msg set */ >> - if (!msgp->format) >> - return NULL; >> - >> if ((msgbuf = (char *)malloc(DURATION_BUF_SIZE)) == NULL) >> do_error (_("progress_rpt: cannot malloc progress msg buffer\n")); >> >> >
On Mon, May 18, 2020 at 08:03:09PM -0500, Eric Sandeen wrote: > On 5/18/20 7:58 PM, Darrick J. Wong wrote: > > On Mon, May 18, 2020 at 05:35:33PM -0500, Eric Sandeen wrote: > >> Long ago, a young developer tried to fix a segfault in xfs_repair where > >> a short progress reporting interval could cause a timer to go off and try > >> to print a progress mesage before any had been properly set up because > >> we were still busy zeroing the log, and a NULL pointer dereference > >> ensued. > >> > >> That young developer got it wrong, and completely broke progress > >> reporting, because the change caused us to exit early from the pthread > >> start routine, and not initialize the progress timer at all. > >> > >> That developer is now slightly older and wiser, and finally realizes that > >> the simple and correct solution here is to initialize the message format > >> to the first one in the list, so that we will be ready to go with a > >> progress message no matter when the first timer goes off. > >> > >> Reported-by: Leonardo Vaz <lvaz@redhat.com> > >> Fixes: 7f2d6b811755 ("xfs_repair: avoid segfault if reporting progre...") > >> Signed-off-by: Eric Sandeen <sandeen@redhat.com> > >> --- > >> > >> It might be nice to add progress reporting for the log zeroing, but that > >> requires renumbering all these macros, and we don't/can't actually get > >> any fine-grained progress at all, so probably not worth it. > >> > >> diff --git a/repair/progress.c b/repair/progress.c > >> index 5ee08229..d7baa606 100644 > >> --- a/repair/progress.c > >> +++ b/repair/progress.c > >> @@ -125,7 +125,11 @@ init_progress_rpt (void) > >> */ > >> > >> pthread_mutex_init(&global_msgs.mutex, NULL); > >> - global_msgs.format = NULL; > >> + /* > >> + * Ensure the format string is not NULL in case the first timer > >> + * goes off before any stage calls set_progress_msg() to set it. > >> + */ > >> + global_msgs.format = &progress_rpt_reports[0]; > > > > Hmm so does that mean the first progress report could be for "scanning > > freespace"? > > Yes. But unless "zeroing the log" takes more than the report interval, > which by default is 15 minutes, it won't be wrong. > > > Or could you append a new entry to progress_rpt_reports for "getting my > > shit together and moving out of my parents basement" and initialize it > > to that? > > Oh I guess it could be appended and wouldn't have to be out of order but > honestly I don't think it's worth it, even a big slow log shouldn't take > long enough to need a progress report. Zeroing a 2048MB log in 900s = 2.28MB/s So I guess that's unlikely... but it still feels like leaving some kind of weird logic bomb lurking where if we decrease the interval or someone throws us a slow cloudy block store, we'll start issuing weird progress messsages about some other part of xfs_repair which hasn't even started yet. <shrug> --D > -Eric > > > --D > > > >> global_msgs.count = glob_agcount; > >> global_msgs.interval = report_interval; > >> global_msgs.done = prog_rpt_done; > >> @@ -171,10 +175,6 @@ progress_rpt_thread (void *p) > >> msg_block_t *msgp = (msg_block_t *)p; > >> uint64_t percent; > >> > >> - /* It's possible to get here very early w/ no progress msg set */ > >> - if (!msgp->format) > >> - return NULL; > >> - > >> if ((msgbuf = (char *)malloc(DURATION_BUF_SIZE)) == NULL) > >> do_error (_("progress_rpt: cannot malloc progress msg buffer\n")); > >> > >> > >
On 5/18/20 8:13 PM, Darrick J. Wong wrote: > On Mon, May 18, 2020 at 08:03:09PM -0500, Eric Sandeen wrote: >> On 5/18/20 7:58 PM, Darrick J. Wong wrote: >>> On Mon, May 18, 2020 at 05:35:33PM -0500, Eric Sandeen wrote: >>>> Long ago, a young developer tried to fix a segfault in xfs_repair where >>>> a short progress reporting interval could cause a timer to go off and try >>>> to print a progress mesage before any had been properly set up because >>>> we were still busy zeroing the log, and a NULL pointer dereference >>>> ensued. >>>> >>>> That young developer got it wrong, and completely broke progress >>>> reporting, because the change caused us to exit early from the pthread >>>> start routine, and not initialize the progress timer at all. >>>> >>>> That developer is now slightly older and wiser, and finally realizes that >>>> the simple and correct solution here is to initialize the message format >>>> to the first one in the list, so that we will be ready to go with a >>>> progress message no matter when the first timer goes off. >>>> >>>> Reported-by: Leonardo Vaz <lvaz@redhat.com> >>>> Fixes: 7f2d6b811755 ("xfs_repair: avoid segfault if reporting progre...") >>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com> >>>> --- >>>> >>>> It might be nice to add progress reporting for the log zeroing, but that >>>> requires renumbering all these macros, and we don't/can't actually get >>>> any fine-grained progress at all, so probably not worth it. >>>> >>>> diff --git a/repair/progress.c b/repair/progress.c >>>> index 5ee08229..d7baa606 100644 >>>> --- a/repair/progress.c >>>> +++ b/repair/progress.c >>>> @@ -125,7 +125,11 @@ init_progress_rpt (void) >>>> */ >>>> >>>> pthread_mutex_init(&global_msgs.mutex, NULL); >>>> - global_msgs.format = NULL; >>>> + /* >>>> + * Ensure the format string is not NULL in case the first timer >>>> + * goes off before any stage calls set_progress_msg() to set it. >>>> + */ >>>> + global_msgs.format = &progress_rpt_reports[0]; >>> >>> Hmm so does that mean the first progress report could be for "scanning >>> freespace"? >> >> Yes. But unless "zeroing the log" takes more than the report interval, >> which by default is 15 minutes, it won't be wrong. >> >>> Or could you append a new entry to progress_rpt_reports for "getting my >>> shit together and moving out of my parents basement" and initialize it >>> to that? >> >> Oh I guess it could be appended and wouldn't have to be out of order but >> honestly I don't think it's worth it, even a big slow log shouldn't take >> long enough to need a progress report. > > Zeroing a 2048MB log in 900s = 2.28MB/s > > So I guess that's unlikely... but it still feels like leaving some kind > of weird logic bomb lurking where if we decrease the interval or someone > throws us a slow cloudy block store, we'll start issuing weird progress > messsages about some other part of xfs_repair which hasn't even started > yet. <shrug> welp, alright, I'll add a log status message.
diff --git a/repair/progress.c b/repair/progress.c index 5ee08229..d7baa606 100644 --- a/repair/progress.c +++ b/repair/progress.c @@ -125,7 +125,11 @@ init_progress_rpt (void) */ pthread_mutex_init(&global_msgs.mutex, NULL); - global_msgs.format = NULL; + /* + * Ensure the format string is not NULL in case the first timer + * goes off before any stage calls set_progress_msg() to set it. + */ + global_msgs.format = &progress_rpt_reports[0]; global_msgs.count = glob_agcount; global_msgs.interval = report_interval; global_msgs.done = prog_rpt_done; @@ -171,10 +175,6 @@ progress_rpt_thread (void *p) msg_block_t *msgp = (msg_block_t *)p; uint64_t percent; - /* It's possible to get here very early w/ no progress msg set */ - if (!msgp->format) - return NULL; - if ((msgbuf = (char *)malloc(DURATION_BUF_SIZE)) == NULL) do_error (_("progress_rpt: cannot malloc progress msg buffer\n"));
Long ago, a young developer tried to fix a segfault in xfs_repair where a short progress reporting interval could cause a timer to go off and try to print a progress mesage before any had been properly set up because we were still busy zeroing the log, and a NULL pointer dereference ensued. That young developer got it wrong, and completely broke progress reporting, because the change caused us to exit early from the pthread start routine, and not initialize the progress timer at all. That developer is now slightly older and wiser, and finally realizes that the simple and correct solution here is to initialize the message format to the first one in the list, so that we will be ready to go with a progress message no matter when the first timer goes off. Reported-by: Leonardo Vaz <lvaz@redhat.com> Fixes: 7f2d6b811755 ("xfs_repair: avoid segfault if reporting progre...") Signed-off-by: Eric Sandeen <sandeen@redhat.com> --- It might be nice to add progress reporting for the log zeroing, but that requires renumbering all these macros, and we don't/can't actually get any fine-grained progress at all, so probably not worth it.