diff mbox

A lot warnings in dmesg while running thunderbird

Message ID 20160719110534.GA96833@clm-mbp.thefacebook.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Mason July 19, 2016, 11:05 a.m. UTC
On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote:
>Hi Chris,
>
>I am able to reproduce the issue with the 'short-write' program. But before
>the call trace associated with btrfs_destroy_inode(), I see the following call
>trace ...
>
>------------[ cut here ]------------
>WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100

[ ... ]

Ok, the problem is in how we're dealing with the offset into the sector when
we fail. The dirty_sectors variable already has this accounted in it, so
this patch fixes it for me.  I ran overnight, but I'll let it go for a few
days just to make sure:

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Gabriel C July 20, 2016, 12:11 a.m. UTC | #1
On 19.07.2016 13:05, Chris Mason wrote:
> On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote:
>> Hi Chris,
>>
>> I am able to reproduce the issue with the 'short-write' program. But before
>> the call trace associated with btrfs_destroy_inode(), I see the following call
>> trace ...
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100
> 
> [ ... ]
> 
> Ok, the problem is in how we're dealing with the offset into the sector when
> we fail. The dirty_sectors variable already has this accounted in it, so
> this patch fixes it for me.  I ran overnight, but I'll let it go for a few
> days just to make sure:
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index fac9b839..5842423 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1629,13 +1629,11 @@ again:
>  		 * managed to copy.
>  		 */
>  		if (num_sectors > dirty_sectors) {
> -			/*
> -			 * we round down because we don't want to count
> -			 * any partial blocks actually sent through the
> -			 * IO machines
> -			 */
> -			release_bytes = round_down(release_bytes - copied,
> -				      root->sectorsize);
> +
> +			/* release everything except the sectors we dirtied */
> +			release_bytes -= dirty_sectors <<
> +				root->fs_info->sb->s_blocksize_bits;
> +
>  			if (copied > 0) {
>  				spin_lock(&BTRFS_I(inode)->lock);
>  				BTRFS_I(inode)->outstanding_extents++;
> 

Since I guess you are testing this on latest git code I started to test on latest stable.

Until now all seems file .. your test program is still running without to trigger the bug.

Also thunderbird is running without to trigger the bug.

I let it run overnight and report back.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason July 20, 2016, 1:50 p.m. UTC | #2
On 07/19/2016 08:11 PM, Gabriel C wrote:
>
>
> On 19.07.2016 13:05, Chris Mason wrote:
>> On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote:
>>> Hi Chris,
>>>
>>> I am able to reproduce the issue with the 'short-write' program. But before
>>> the call trace associated with btrfs_destroy_inode(), I see the following call
>>> trace ...
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100
>>
>> [ ... ]
>>
>> Ok, the problem is in how we're dealing with the offset into the sector when
>> we fail. The dirty_sectors variable already has this accounted in it, so
>> this patch fixes it for me.  I ran overnight, but I'll let it go for a few
>> days just to make sure:
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index fac9b839..5842423 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1629,13 +1629,11 @@ again:
>>  		 * managed to copy.
>>  		 */
>>  		if (num_sectors > dirty_sectors) {
>> -			/*
>> -			 * we round down because we don't want to count
>> -			 * any partial blocks actually sent through the
>> -			 * IO machines
>> -			 */
>> -			release_bytes = round_down(release_bytes - copied,
>> -				      root->sectorsize);
>> +
>> +			/* release everything except the sectors we dirtied */
>> +			release_bytes -= dirty_sectors <<
>> +				root->fs_info->sb->s_blocksize_bits;
>> +
>>  			if (copied > 0) {
>>  				spin_lock(&BTRFS_I(inode)->lock);
>>  				BTRFS_I(inode)->outstanding_extents++;
>>
>
> Since I guess you are testing this on latest git code I started to test on latest stable.

Any v4.7-rc or v4.6 stable where the patch applies ;)

>
> Until now all seems file .. your test program is still running without to trigger the bug.
>
> Also thunderbird is running without to trigger the bug.
>
> I let it run overnight and report back.

Great, thanks!

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gabriel C July 20, 2016, 5:50 p.m. UTC | #3
On 20.07.2016 15:50, Chris Mason wrote:
> 
> 
> On 07/19/2016 08:11 PM, Gabriel C wrote:
>>
>>
>> On 19.07.2016 13:05, Chris Mason wrote:
>>> On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote:
>>>> Hi Chris,
>>>>
>>>> I am able to reproduce the issue with the 'short-write' program. But before
>>>> the call trace associated with btrfs_destroy_inode(), I see the following call
>>>> trace ...
>>>>
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100
>>>
>>> [ ... ]
>>>
>>> Ok, the problem is in how we're dealing with the offset into the sector when
>>> we fail. The dirty_sectors variable already has this accounted in it, so
>>> this patch fixes it for me.  I ran overnight, but I'll let it go for a few
>>> days just to make sure:
>>>
>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>> index fac9b839..5842423 100644
>>> --- a/fs/btrfs/file.c
>>> +++ b/fs/btrfs/file.c
>>> @@ -1629,13 +1629,11 @@ again:
>>>  		 * managed to copy.
>>>  		 */
>>>  		if (num_sectors > dirty_sectors) {
>>> -			/*
>>> -			 * we round down because we don't want to count
>>> -			 * any partial blocks actually sent through the
>>> -			 * IO machines
>>> -			 */
>>> -			release_bytes = round_down(release_bytes - copied,
>>> -				      root->sectorsize);
>>> +
>>> +			/* release everything except the sectors we dirtied */
>>> +			release_bytes -= dirty_sectors <<
>>> +				root->fs_info->sb->s_blocksize_bits;
>>> +
>>>  			if (copied > 0) {
>>>  				spin_lock(&BTRFS_I(inode)->lock);
>>>  				BTRFS_I(inode)->outstanding_extents++;
>>>
>>
>> Since I guess you are testing this on latest git code I started to test on latest stable.
> 
> Any v4.7-rc or v4.6 stable where the patch applies ;)
> 
>>
>> Until now all seems file .. your test program is still running without to trigger the bug.
>>
>> Also thunderbird is running without to trigger the bug.
>>
>> I let it run overnight and report back.
> 
> Great, thanks!

After 24h of running the program and thundirbird all is still fine here.

I let it run one more day.. But looks very good.


Regards,

Gabriel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason July 21, 2016, 12:56 p.m. UTC | #4
On 07/20/2016 01:50 PM, Gabriel C wrote:
>
> After 24h of running the program and thundirbird all is still fine here.
>
> I let it run one more day.. But looks very good.
>

Thanks for your time in helping to track this down.  It'll go into the 
next merge window and be cc'd to stable.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gabriel C July 21, 2016, 9:25 p.m. UTC | #5
On 21.07.2016 14:56, Chris Mason wrote:
> On 07/20/2016 01:50 PM, Gabriel C wrote:
>>
>> After 24h of running the program and thundirbird all is still fine here.
>>
>> I let it run one more day.. But looks very good.
>>
> 
> Thanks for your time in helping to track this down.  It'll go into the 
> next merge window and be cc'd to stable.
> 

You are welcome :)

Test program was running without problems for 52h.. I think your fix is fine :)

Also feel free to add Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> to you commit.

Regrads,

Gabriel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Adam Borowski July 22, 2016, 12:53 a.m. UTC | #6
On Thu, Jul 21, 2016 at 11:25:02PM +0200, Gabriel C wrote:
> On 21.07.2016 14:56, Chris Mason wrote:
> > On 07/20/2016 01:50 PM, Gabriel C wrote:
> >> After 24h of running the program and thundirbird all is still fine here.
> >>
> >> I let it run one more day.. But looks very good.
> >>
> > 
> > Thanks for your time in helping to track this down.  It'll go into the 
> > next merge window and be cc'd to stable.
> > 
> 
> You are welcome :)
> 
> Test program was running without problems for 52h.. I think your fix is fine :)

AOL.
I haven't tested it for as long, but had it running concurrently with some
big compiles (kernel, sbuild ie snapshot+dpkg+subv delete), cleanup of
several tens of daily snapshots, etc.

No problems.
diff mbox

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fac9b839..5842423 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1629,13 +1629,11 @@  again:
 		 * managed to copy.
 		 */
 		if (num_sectors > dirty_sectors) {
-			/*
-			 * we round down because we don't want to count
-			 * any partial blocks actually sent through the
-			 * IO machines
-			 */
-			release_bytes = round_down(release_bytes - copied,
-				      root->sectorsize);
+
+			/* release everything except the sectors we dirtied */
+			release_bytes -= dirty_sectors <<
+				root->fs_info->sb->s_blocksize_bits;
+
 			if (copied > 0) {
 				spin_lock(&BTRFS_I(inode)->lock);
 				BTRFS_I(inode)->outstanding_extents++;