BUG: Hung task timeouts in for-4.10/dio

Message ID	03cda65e-3a82-6ccd-861b-67a55888eb19@kernel.dk (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-block-owner@kernel.org> Subject: Re: BUG: Hung task timeouts in for-4.10/dio To: Damien Le Moal <damien.lemoal@wdc.com>, Christoph Hellwig <hch@lst.de> References: <8eed541f-b4c2-37c9-6650-216f3d0aa92f@fb.com> <21f0cddd-28d9-790d-b4fc-4635cca77ea8@deltatee.com> <9fe151a1-2d9f-28ef-25a7-bbcb01d8d972@fb.com> <c2d69ddc-28c9-bf26-5b06-74d00e88a700@deltatee.com> <6997fb64-aaac-208f-fa89-b8b4567f7321@fb.com> <a74de948-3026-d9b3-6b2e-b9cc1f3c05d4@deltatee.com> <6d532d4a-8d35-f610-65ed-efef949a3686@fb.com> <84693d82-b8dc-42b3-03da-055def7499a0@deltatee.com> <1e0e590b-fde3-a86d-bc60-8e5465928533@fb.com> <84cf8089-7d6d-0146-49fb-91a2d08650b2@wdc.com> <20161109010959.GA32126@lst.de> <b3d8c5fa-b930-b21f-3986-1ff94611c7f9@wdc.com> <a6d1d368-04a2-e999-a610-abd96add81a9@fb.com> Cc: Logan Gunthorpe <logang@deltatee.com>, linux-block@vger.kernel.org, Mike Snitzer <snitzer@redhat.com> From: Jens Axboe <axboe@kernel.dk> Message-ID: <03cda65e-3a82-6ccd-861b-67a55888eb19@kernel.dk> Date: Tue, 8 Nov 2016 19:02:52 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <a6d1d368-04a2-e999-a610-abd96add81a9@fb.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-block-owner@vger.kernel.org Precedence: bulk

Jens Axboe Nov. 9, 2016, 2:02 a.m. UTC

On 11/08/2016 06:28 PM, Jens Axboe wrote:
> On 11/08/2016 06:25 PM, Damien Le Moal wrote:
>>
>> On 11/9/16 10:09, Christoph Hellwig wrote:
>>> Ok, sounds like I'm really the one to blame.  I'll see if I can
>>> find a reproducer.  Damien, or you using device mapper on that
>>> system?
>>
>> No LVM/md/dm used on boot. Mount is direct to the block device (SSDs
>> with ext4). The devices are simple SSDs, so no polling involved.
>>
>> The hangs suspiciously look like they are either background write or
>> flush. So I was wondering if it is indeed related to FLUSH/FUA as Logan
>> suggested or the background write stuff, rather than the direct-IO
>> optimization & polling.
>
> The background write stuff is not in either of those branches, plus the
> backtrace would have looked different. Yours is showing us waiting for a
> request. I don't think it's the direct-io or polling code, it looks like
> a generic issue.
>
>> Will try again/bisect to see if I can get more info.
>
> Maybe try and revert the one that Logan pointed his finger at, if that
> is doable.

It smells like an accounting error. One thing that I don't like with the
current scheme is the implicit knowledge that certain flags imply sync
as well. If we clear any of those flags, then we screw up accounting at
the end.

Does this make a difference?

Christoph Hellwig Nov. 9, 2016, 2:05 a.m. UTC | #1

On Tue, Nov 08, 2016 at 07:02:52PM -0700, Jens Axboe wrote:
> It smells like an accounting error. One thing that I don't like with the
> current scheme is the implicit knowledge that certain flags imply sync
> as well. If we clear any of those flags, then we screw up accounting at
> the end.
>
> Does this make a difference?

That looks interesting.  In the meantime I reproduced a similar
hang, but only half-way through an xfstests run with a non-mq device.
I'll see how far I can narrow it down and will give your patch a try as
well.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jens Axboe Nov. 9, 2016, 2:11 a.m. UTC | #2

On 11/08/2016 07:05 PM, Christoph Hellwig wrote:
> On Tue, Nov 08, 2016 at 07:02:52PM -0700, Jens Axboe wrote:
>> It smells like an accounting error. One thing that I don't like with the
>> current scheme is the implicit knowledge that certain flags imply sync
>> as well. If we clear any of those flags, then we screw up accounting at
>> the end.
>>
>> Does this make a difference?
>
> That looks interesting.  In the meantime I reproduced a similar
> hang, but only half-way through an xfstests run with a non-mq device.
> I'll see how far I can narrow it down and will give your patch a try as
> well.

It'd only trigger on non-mq, and the symptoms (and the bisect) point at
this being an accounting issue. Damien/Logan, would be great you could
try the debug patch I sent.

My non-mq drive on my test box has write through caching, which probably
explains why I haven't seen the issue.

Damien Le Moal Nov. 9, 2016, 2:17 a.m. UTC | #3

Jens,

On 11/9/16 11:02, Jens Axboe wrote:
> It smells like an accounting error. One thing that I don't like with the
> current scheme is the implicit knowledge that certain flags imply sync
> as well. If we clear any of those flags, then we screw up accounting at
> the end.
> 
> Does this make a difference?
> 
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index c486b7aa62ee..d70983e28115 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -395,6 +395,8 @@ void blk_insert_flush(struct request *rq)
>   	if (!(fflags & (1UL << QUEUE_FLAG_FUA)))
>   		rq->cmd_flags &= ~REQ_FUA;
> 
> +	rq->cmd_flags |= REQ_SYNC;
> +
>   	/*
>   	 * An empty flush handed down from a stacking driver may
>   	 * translate into nothing if the underlying device does not

That worked. Still seeing hangs/failures at boot with the latest
for-4.10/block (networkmanager failing, no login shell, etc) but with
the above patch, I get a clean boot and login with network working.
Once booted, simple tests on ext4 and xfs do not show any problem, but
that was only very light testing.

Best regards.

Jens Axboe Nov. 9, 2016, 2:38 a.m. UTC | #4

On 11/08/2016 07:17 PM, Damien Le Moal wrote:
>
> Jens,
>
> On 11/9/16 11:02, Jens Axboe wrote:
>> It smells like an accounting error. One thing that I don't like with the
>> current scheme is the implicit knowledge that certain flags imply sync
>> as well. If we clear any of those flags, then we screw up accounting at
>> the end.
>>
>> Does this make a difference?
>>
>>
>> diff --git a/block/blk-flush.c b/block/blk-flush.c
>> index c486b7aa62ee..d70983e28115 100644
>> --- a/block/blk-flush.c
>> +++ b/block/blk-flush.c
>> @@ -395,6 +395,8 @@ void blk_insert_flush(struct request *rq)
>>   	if (!(fflags & (1UL << QUEUE_FLAG_FUA)))
>>   		rq->cmd_flags &= ~REQ_FUA;
>>
>> +	rq->cmd_flags |= REQ_SYNC;
>> +
>>   	/*
>>   	 * An empty flush handed down from a stacking driver may
>>   	 * translate into nothing if the underlying device does not
>
> That worked. Still seeing hangs/failures at boot with the latest
> for-4.10/block (networkmanager failing, no login shell, etc) but with
> the above patch, I get a clean boot and login with network working.
> Once booted, simple tests on ext4 and xfs do not show any problem, but
> that was only very light testing.

Great, thanks for testing! So that validates the theory. I'll get
something committed with a comment. It's not the prettiest patch, but
it'll do for now.

Christoph Hellwig Nov. 9, 2016, 2:40 a.m. UTC | #5

On Tue, Nov 08, 2016 at 07:38:29PM -0700, Jens Axboe wrote:
> Great, thanks for testing! So that validates the theory. I'll get
> something committed with a comment. It's not the prettiest patch, but
> it'll do for now.

Should we revert the change instead (logically, not with an actual
git revert as that would break) and require and add REQ_SYNC to
all users instead?
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jens Axboe Nov. 9, 2016, 2:45 a.m. UTC | #6

On 11/08/2016 07:40 PM, Christoph Hellwig wrote:
> On Tue, Nov 08, 2016 at 07:38:29PM -0700, Jens Axboe wrote:
>> Great, thanks for testing! So that validates the theory. I'll get
>> something committed with a comment. It's not the prettiest patch, but
>> it'll do for now.
>
> Should we revert the change instead (logically, not with an actual
> git revert as that would break) and require and add REQ_SYNC to
> all users instead?

I just committed the work-around. But yes, let's have a logical revert
and require that REQ_SYNC be set for REQ_FUA|REQ_PREFLUSH to avoid it
being this fragile.

Damien Le Moal Nov. 9, 2016, 2:55 a.m. UTC | #7

Jens,

On 11/9/16 11:45, Jens Axboe wrote:
> I just committed the work-around. But yes, let's have a logical revert
> and require that REQ_SYNC be set for REQ_FUA|REQ_PREFLUSH to avoid it
> being this fragile.

Great ! Thank you.

Logan Gunthorpe Nov. 9, 2016, 5:03 p.m. UTC | #8

Hey,

I just tested with the latest for-4.10/block branch and it looks like it
fixed our problem.

Thanks!

Logan

On 08/11/16 07:55 PM, Damien Le Moal wrote:
> 
> Jens,
> 
> On 11/9/16 11:45, Jens Axboe wrote:
>> I just committed the work-around. But yes, let's have a logical revert
>> and require that REQ_SYNC be set for REQ_FUA|REQ_PREFLUSH to avoid it
>> being this fragile.
> 
> Great ! Thank you.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

BUG: Hung task timeouts in for-4.10/dio

Commit Message

Comments

Patch