Message ID | 20210726144613.954844-7-mreitz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | mirror: Handle errors after READY cancel |
26.07.2021 17:46, Max Reitz wrote:
> We must check whether the job is force-cancelled early in our main loop,
> most importantly before any `continue` statement. For example, we used
> to have `continue`s before our current checking location that are
> triggered by `mirror_flush()` failing. So, if `mirror_flush()` kept
> failing, force-cancelling the job would not terminate it.
>
> A job being force-cancelled should be treated the same as the job having
> failed, so put the check in the same place where we check `s->ret < 0`.
>
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/mirror.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/block/mirror.c b/block/mirror.c
> index 72e02fa34e..46d1a1e5a2 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>              mirror_wait_for_any_operation(s, true);
>          }
>
> -        if (s->ret < 0) {
> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>              ret = s->ret;
>              goto immediate_exit;
>          }
> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>              break;
>          }
>
> -        ret = 0;
> -

That's just a cleanup, that statement is useless pre-patch, yes?

>          if (job_is_ready(&s->common.job) && !should_complete) {
>              delay_ns = (s->in_flight == 0 &&
>                          cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>          trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>                                    delay_ns);
>          job_sleep_ns(&s->common.job, delay_ns);
> -        if (job_is_cancelled(&s->common.job)) {
> -            break;
> -        }
>          s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>      }
>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
On 27.07.21 15:13, Vladimir Sementsov-Ogievskiy wrote:
> 26.07.2021 17:46, Max Reitz wrote:
>> [...]
>> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>              break;
>>          }
>> -        ret = 0;
>> -
>
> That's just a cleanup, that statement is useless pre-patch, yes?

I think it was intended for if we left this loop via the
job_is_cancelled() condition below. Since it’s removed, this statement
seems meaningless, so I removed it along with the `break`.

Max

> [...]
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
On 26.07.2021 at 16:46, Max Reitz wrote:
> [...]
> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>          trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>                                    delay_ns);
>          job_sleep_ns(&s->common.job, delay_ns);
> -        if (job_is_cancelled(&s->common.job)) {
> -            break;
> -        }

I think it was intentional that the check is here because it means
skipping the job_sleep_ns() and instead cancelling immediately, and we
probably still want that. Between your check above and here, the
coroutine can yield, so cancellation could have been newly requested.

So have the check in both places, I guess? And a comment to explain why
neither is redundant.

>          s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>      }

Kevin
On 03.08.21 16:34, Kevin Wolf wrote:
> On 26.07.2021 at 16:46, Max Reitz wrote:
>> [...]
>> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>          trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>>                                    delay_ns);
>>          job_sleep_ns(&s->common.job, delay_ns);
>> -        if (job_is_cancelled(&s->common.job)) {
>> -            break;
>> -        }
> I think it was intentional that the check is here because it means
> skipping the job_sleep_ns() and instead cancelling immediately, and we
> probably still want that. Between your check above and here, the
> coroutine can yield, so cancellation could have been newly requested.

I’m afraid I don’t quite understand. If cancel is requested in
job_sleep_ns(), then we will go back to the top of the loop, wait for
in-flight active requests and then break.

Waiting for the in-flight requests seems unnecessary, but does it really
make a difference in practice? We don’t start new requests, so it
should be legal to wait for existing ones to settle, and also I believe
someone will have to wait for those in-flight requests anyway (when the
mirror top node is removed).

(The only thing we could do is to cancel the in-flight requests, but
that is what mirror_cancel() does.)

Looking more at the whole loop, there are a couple of places that can
yield. Of course we can check whether the job has been cancelled after
every single one of them, but that would be a bit strange. We only
really need to check before we initiate new requests or want to change
the state.

I believe the right place to do the check would be after the
job_pause_point(). And perhaps the active write functions
(bdrv_mirror_top_do_write() and bdrv_mirror_top_pwritev()) should stop
copying to the target if the job has been cancelled.

Max

> So have the check in both places, I guess? And a comment to explain why
> neither is redundant.
>
>>          s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>      }
> Kevin
On 04.08.2021 at 10:25, Max Reitz wrote:
> On 03.08.21 16:34, Kevin Wolf wrote:
> > [...]
> > I think it was intentional that the check is here because it means
> > skipping the job_sleep_ns() and instead cancelling immediately, and we
> > probably still want that. Between your check above and here, the
> > coroutine can yield, so cancellation could have been newly requested.
>
> I’m afraid I don’t quite understand.

Hm, I don't either. Somehow I thought job_sleep_ns() was after the
check, while quoting the exact hunk that shows that it comes before
it...

I'm still not sure if sleeping before exiting is really useful, but it
seems we never cared about that.

Kevin
On 04.08.21 11:48, Kevin Wolf wrote:
> On 04.08.2021 at 10:25, Max Reitz wrote:
>> On 03.08.21 16:34, Kevin Wolf wrote:
>>> [...]
>>> I think it was intentional that the check is here because it means
>>> skipping the job_sleep_ns() and instead cancelling immediately, and we
>>> probably still want that. Between your check above and here, the
>>> coroutine can yield, so cancellation could have been newly requested.
>> I’m afraid I don’t quite understand.
> Hm, I don't either. Somehow I thought job_sleep_ns() was after the
> check, while quoting the exact hunk that shows that it comes before
> it...
>
> I'm still not sure if sleeping before exiting is really useful, but it
> seems we never cared about that.

Jobs that are (force-)cancelled cannot yield or sleep anyway
(job_sleep_ns(), job_yield(), and job_pause_point() will all return
immediately when called on a cancelled job).

So I thought you meant that a job can only be cancelled while it is
yielding, so we should prefer to put the is_cancelled check after a
yield point (like job_pause_point()) rather than before it.

But I mean, if you’re happy, I’ll be happy, too. :)

Max
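The point about cancelled jobs not sleeping can be illustrated with a small, self-contained sketch. It is not QEMU code: `ToySleepJob`, `toy_job_sleep_ns()` and `toy_loop()` are hypothetical names, and the sketch only assumes the behaviour described above (a sleep call returns immediately once the job is cancelled). It suggests why a single check at the top of the loop is enough: a cancel request that arrives during the sleep is seen on the next pass, and no further sleep time accrues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy job; a stand-in, not QEMU's Job struct. */
typedef struct {
    bool cancelled;     /* models a (force-)cancel request */
    uint64_t slept_ns;  /* total simulated sleep time */
} ToySleepJob;

/* Models the described behaviour: a cancelled job does not sleep. */
static void toy_job_sleep_ns(ToySleepJob *job, uint64_t ns)
{
    if (job->cancelled) {
        return;
    }
    job->slept_ns += ns;
}

/*
 * Simplified loop with the cancel check only at the top.
 * cancel_at is the 1-based iteration during whose sleep a cancel
 * request arrives (0 means never). Returns how many loop bodies ran.
 */
static int toy_loop(ToySleepJob *job, int cancel_at, int max_iters)
{
    int iters = 0;

    assert(max_iters > 0);
    while (iters < max_iters) {
        if (job->cancelled) {
            break;              /* check at the top of the loop */
        }
        iters++;
        toy_job_sleep_ns(job, 100);
        if (iters == cancel_at) {
            job->cancelled = true;  /* request arrives while sleeping */
        }
    }
    return iters;
}
```

In this toy model, a cancel request during the second sleep ends the loop at the start of the third pass, which corresponds to going "back to the top of the loop" in the discussion.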
diff --git a/block/mirror.c b/block/mirror.c
index 72e02fa34e..46d1a1e5a2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
             mirror_wait_for_any_operation(s, true);
         }
 
-        if (s->ret < 0) {
+        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
             ret = s->ret;
             goto immediate_exit;
         }
@@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
             break;
         }
 
-        ret = 0;
-
         if (job_is_ready(&s->common.job) && !should_complete) {
             delay_ns = (s->in_flight == 0 &&
                         cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
@@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
         trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
                                   delay_ns);
         job_sleep_ns(&s->common.job, delay_ns);
-        if (job_is_cancelled(&s->common.job)) {
-            break;
-        }
         s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }
 
We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement. For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing. So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.

A job being force-cancelled should be treated the same as the job having
failed, so put the check in the same place where we check `s->ret < 0`.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)
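The failure mode described in the commit message can be reproduced in a heavily simplified, self-contained model. The sketch below is not taken from `block/mirror.c`: `ToyJob`, `toy_run()` and the `check_early` switch are hypothetical, and the flush failure is reduced to a flag that always triggers `continue`. It only aims to show that a cancel check placed after a `continue` may never be reached, while a check next to the error check at the top is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mirror job state; not QEMU's MirrorBlockJob. */
typedef struct {
    bool cancelled;    /* set by a force-cancel request */
    bool flush_fails;  /* models mirror_flush() failing on every attempt */
    int ret;           /* models s->ret */
} ToyJob;

/*
 * Runs a simplified main loop for at most max_iters iterations and
 * returns how many iterations were started before the loop exited.
 * check_early selects where the cancel check sits.
 */
static int toy_run(ToyJob *job, bool check_early, int max_iters)
{
    int iters = 0;

    assert(max_iters > 0);
    while (iters < max_iters) {
        iters++;

        /* Post-patch location: next to the error check. */
        if (job->ret < 0 || (check_early && job->cancelled)) {
            break;
        }

        if (job->flush_fails) {
            /* Models the `continue` taken when the flush fails. */
            continue;
        }

        /* Pre-patch location: only reached if no `continue` fired. */
        if (!check_early && job->cancelled) {
            break;
        }
    }
    return iters;
}
```

With a cancelled job whose flush keeps failing, the early check leaves the loop in the first iteration, whereas the late check is skipped every time and the loop runs until the iteration cap.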