Message ID | b9c33b04-08d7-87a1-3d93-d81a84e6af12@I-love.SAKURA.ne.jp (mailing list archive)
---|---
State | New, archived
On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote:
> One or more threads are waiting for q->mq_freeze_depth to become 0. But the
> thread who incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from
> blk_freeze_queue() is waiting at blk_mq_freeze_queue_wait(). Therefore,
> atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in
> blk_queue_enter() will never be satisfied. But what does that wait_event()
> want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true?
> Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is
> about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0
> as if blk_queue_dying(q) == true? That is, something like below:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 85909b4..59e2496 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  		smp_rmb();
>
>  		wait_event(q->mq_freeze_wq,
> -			   (atomic_read(&q->mq_freeze_depth) == 0 &&
> -			    (preempt || !blk_queue_preempt_only(q))) ||
> +			   atomic_read(&q->mq_freeze_depth) ||
> +			   (preempt || !blk_queue_preempt_only(q)) ||
>  			   blk_queue_dying(q));
> -		if (blk_queue_dying(q))
> +		if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
>  			return -ENODEV;
>  	}
>  }

That change looks wrong to me. Additionally, I think that you are looking in
the wrong direction. Since blk_mq_freeze_queue_wait() and blk_queue_enter()
work fine for all block drivers except the loop driver, I think that you should
have a closer look at how the loop driver uses this block layer functionality.

Thanks,

Bart.
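For context, here is a simplified sketch of the blk-mq freeze path being discussed, reconstructed from the v4.17-era block layer rather than quoted verbatim; the function bodies are abbreviated to the steps that matter for the reported hang.

	/* Sketch, not verbatim kernel code (assumes the v4.17-era block layer). */
	void blk_freeze_queue_start(struct request_queue *q)
	{
		if (atomic_inc_return(&q->mq_freeze_depth) == 1)
			/* New blk_queue_enter() callers now fail percpu_ref_tryget_live(). */
			percpu_ref_kill(&q->q_usage_counter);
	}

	void blk_mq_freeze_queue_wait(struct request_queue *q)
	{
		/* The freezer waits here until every earlier enter has exited. */
		wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
	}

	void blk_freeze_queue(struct request_queue *q)
	{
		blk_freeze_queue_start(q);
		blk_mq_freeze_queue_wait(q);
	}

While a freeze is in progress, blk_queue_enter() callers sleep until mq_freeze_depth returns to 0, and the freezer sleeps until q_usage_counter drains; if a reference holder never exits, both sides wait forever, which is the hang reported in this thread.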
On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote:
>> One or more threads are waiting for q->mq_freeze_depth to become 0. But the
>> thread who incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from
>> blk_freeze_queue() is waiting at blk_mq_freeze_queue_wait(). Therefore,
>> atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in
>> blk_queue_enter() will never be satisfied. But what does that wait_event()
>> want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true?
>> Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is
>> about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0
>> as if blk_queue_dying(q) == true? That is, something like below:
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 85909b4..59e2496 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>>  		smp_rmb();
>>
>>  		wait_event(q->mq_freeze_wq,
>> -			   (atomic_read(&q->mq_freeze_depth) == 0 &&
>> -			    (preempt || !blk_queue_preempt_only(q))) ||
>> +			   atomic_read(&q->mq_freeze_depth) ||
>> +			   (preempt || !blk_queue_preempt_only(q)) ||
>>  			   blk_queue_dying(q));
>> -		if (blk_queue_dying(q))
>> +		if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
>>  			return -ENODEV;
>>  	}
>>  }
>
> That change looks wrong to me.

Hi Bart,

Why does it look wrong to you?

> Additionally, I think that you are looking in
> the wrong direction. Since blk_mq_freeze_queue_wait() and blk_queue_enter()
> work fine for all block drivers except the loop driver, I think that you should
> have a closer look at how the loop driver uses this block layer functionality.
>
> Thanks,
>
> Bart.
On Wed, 2018-05-16 at 17:16 +0200, Dmitry Vyukov wrote:
> On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> > On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote:
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 85909b4..59e2496 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
> > >  		smp_rmb();
> > >
> > >  		wait_event(q->mq_freeze_wq,
> > > -			   (atomic_read(&q->mq_freeze_depth) == 0 &&
> > > -			    (preempt || !blk_queue_preempt_only(q))) ||
> > > +			   atomic_read(&q->mq_freeze_depth) ||
> > > +			   (preempt || !blk_queue_preempt_only(q)) ||
> > >  			   blk_queue_dying(q));
> > > -		if (blk_queue_dying(q))
> > > +		if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
> > >  			return -ENODEV;
> > >  	}
> > >  }
> >
> > That change looks wrong to me.
>
> Hi Bart,
>
> Why does it look wrong to you?

Because that change conflicts with the purpose of queue freezing and also because
that change would inject I/O errors in code paths that shouldn't inject I/O errors.
Please have a look at e.g. generic_make_request(). From the start of that function:

	if (blk_queue_enter(q, flags) < 0) {
		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
			bio_wouldblock_error(bio);
		else
			bio_io_error(bio);
		return ret;
	}

The above patch changes the behavior of the blk_queue_enter() call from waiting
while q->mq_freeze_depth != 0 into returning -ENODEV while the request queue is
frozen. That will cause generic_make_request() to call bio_io_error(bio) while a
request queue is frozen if REQ_NOWAIT has not been set, which is the default
behavior. So any operation that freezes the queue temporarily, e.g. changing the
queue depth, concurrently with I/O processing can cause I/O to fail with -ENODEV.
As you probably know, failure of write requests has very annoying consequences.
It e.g. causes filesystems to go into read-only mode. That's why I think that the
above change is completely wrong.

Bart.
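To make the failure mode described above concrete, here is a hypothetical sketch of a short-lived freeze such as the one performed while changing the queue depth; the wrapper function is invented for illustration, and only blk_mq_freeze_queue()/blk_mq_unfreeze_queue() are real block-layer calls.

	/* Hypothetical illustration; the wrapper name is made up. */
	static void example_change_queue_depth(struct request_queue *q)
	{
		blk_mq_freeze_queue(q);		/* q->mq_freeze_depth becomes non-zero */
		/* ... resize tag sets / update nr_requests here ... */
		blk_mq_unfreeze_queue(q);	/* q->mq_freeze_depth drops back to 0 */
	}

With the proposed patch, a task calling generic_make_request() inside that window would get -ENODEV from blk_queue_enter() and, unless REQ_NOWAIT is set, the bio would be completed with bio_io_error(), even though the queue is healthy and merely being reconfigured.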
> jbd2/sda1-8(PID=2271) is stuck waiting for journal commit operation.
> I don't know how this thread is involved in this problem.

It feels like it should be a necessary link in the chain. This is the filesystem
underneath the loop device. If that hangs, we would expect the loop device to
hang, but not vice versa.

AIUI, you couldn't reproduce this hang on your own machine. Don't you think your
fuzzer has just found a nice exploit for reiserfs?

Alan

(It's interesting that you found this hang just after we fixed blk_queue_enter()
to wait in D state instead of S state.[1] I don't know syzkaller, so I assume
from this that it only flagged this case up because of the hung task warning,
i.e. syzkaller didn't have its own watchdog that would have failed the test
anyway.

[1] the commit that caused you to CC me: "block: do not use interruptible wait
anywhere"
https://github.com/torvalds/linux/commit/1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 )
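For reference, a sketch of what that commit changed in blk_queue_enter(), reconstructed from memory of the patch rather than copied from the actual diff:

	/* Before 1dc3039bc87a: a fatal signal could abort the wait, so a stuck
	 * freeze surfaced as a failed submission rather than a hung task. */
	ret = wait_event_interruptible(q->mq_freeze_wq,
			(atomic_read(&q->mq_freeze_depth) == 0 &&
			 (preempt || !blk_queue_preempt_only(q))) ||
			blk_queue_dying(q));
	if (ret)
		return ret;

	/* After: the task waits uninterruptibly (D state), so the same stall
	 * now trips the hung-task watchdog instead of returning an error. */
	wait_event(q->mq_freeze_wq,
		   (atomic_read(&q->mq_freeze_depth) == 0 &&
		    (preempt || !blk_queue_preempt_only(q))) ||
		   blk_queue_dying(q));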
Bart Van Assche wrote:
> On Wed, 2018-05-16 at 17:16 +0200, Dmitry Vyukov wrote:
> > On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> > > On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote:
> > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > index 85909b4..59e2496 100644
> > > > --- a/block/blk-core.c
> > > > +++ b/block/blk-core.c
> > > > @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
> > > >  		smp_rmb();
> > > >
> > > >  		wait_event(q->mq_freeze_wq,
> > > > -			   (atomic_read(&q->mq_freeze_depth) == 0 &&
> > > > -			    (preempt || !blk_queue_preempt_only(q))) ||
> > > > +			   atomic_read(&q->mq_freeze_depth) ||
> > > > +			   (preempt || !blk_queue_preempt_only(q)) ||
> > > >  			   blk_queue_dying(q));
> > > > -		if (blk_queue_dying(q))
> > > > +		if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
> > > >  			return -ENODEV;
> > > >  	}
> > > >  }
> > >
> > > That change looks wrong to me.
> >
> > Hi Bart,
> >
> > Why does it look wrong to you?
>
> Because that change conflicts with the purpose of queue freezing and also because
> that change would inject I/O errors in code paths that shouldn't inject I/O errors.

But waiting there until atomic_read(&q->mq_freeze_depth) becomes 0 is causing the
deadlock. A wait_event() that never returns is a bug. I think we should not wait
for q->mq_freeze_depth.
diff --git a/block/blk-core.c b/block/blk-core.c
index 85909b4..59e2496 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		smp_rmb();

 		wait_event(q->mq_freeze_wq,
-			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
+			   atomic_read(&q->mq_freeze_depth) ||
+			   (preempt || !blk_queue_preempt_only(q)) ||
 			   blk_queue_dying(q));
-		if (blk_queue_dying(q))
+		if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
 			return -ENODEV;
 	}
 }