Message ID | 20160719110534.GA96833@clm-mbp.thefacebook.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 19.07.2016 13:05, Chris Mason wrote: > On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote: >> Hi Chris, >> >> I am able to reproduce the issue with the 'short-write' program. But before >> the call trace associated with btrfs_destroy_inode(), I see the following call >> trace ... >> >> ------------[ cut here ]------------ >> WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100 > > [ ... ] > > Ok, the problem is in how we're dealing with the offset into the sector when > we fail. The dirty_sectors variable already has this accounted in it, so > this patch fixes it for me. I ran overnight, but I'll let it go for a few > days just to make sure: > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index fac9b839..5842423 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -1629,13 +1629,11 @@ again: > * managed to copy. > */ > if (num_sectors > dirty_sectors) { > - /* > - * we round down because we don't want to count > - * any partial blocks actually sent through the > - * IO machines > - */ > - release_bytes = round_down(release_bytes - copied, > - root->sectorsize); > + > + /* release everything except the sectors we dirtied */ > + release_bytes -= dirty_sectors << > + root->fs_info->sb->s_blocksize_bits; > + > if (copied > 0) { > spin_lock(&BTRFS_I(inode)->lock); > BTRFS_I(inode)->outstanding_extents++; > Since I guess you are testing this on latest git code I started to test on latest stable. Until now all seems file .. your test program is still running without to trigger the bug. Also thunderbird is running without to trigger the bug. I let it run overnight and report back. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/19/2016 08:11 PM, Gabriel C wrote: > > > On 19.07.2016 13:05, Chris Mason wrote: >> On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote: >>> Hi Chris, >>> >>> I am able to reproduce the issue with the 'short-write' program. But before >>> the call trace associated with btrfs_destroy_inode(), I see the following call >>> trace ... >>> >>> ------------[ cut here ]------------ >>> WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100 >> >> [ ... ] >> >> Ok, the problem is in how we're dealing with the offset into the sector when >> we fail. The dirty_sectors variable already has this accounted in it, so >> this patch fixes it for me. I ran overnight, but I'll let it go for a few >> days just to make sure: >> >> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >> index fac9b839..5842423 100644 >> --- a/fs/btrfs/file.c >> +++ b/fs/btrfs/file.c >> @@ -1629,13 +1629,11 @@ again: >> * managed to copy. >> */ >> if (num_sectors > dirty_sectors) { >> - /* >> - * we round down because we don't want to count >> - * any partial blocks actually sent through the >> - * IO machines >> - */ >> - release_bytes = round_down(release_bytes - copied, >> - root->sectorsize); >> + >> + /* release everything except the sectors we dirtied */ >> + release_bytes -= dirty_sectors << >> + root->fs_info->sb->s_blocksize_bits; >> + >> if (copied > 0) { >> spin_lock(&BTRFS_I(inode)->lock); >> BTRFS_I(inode)->outstanding_extents++; >> > > Since I guess you are testing this on latest git code I started to test on latest stable. Any v4.7-rc or v4.6 stable where the patch applies ;) > > Until now all seems file .. your test program is still running without to trigger the bug. > > Also thunderbird is running without to trigger the bug. > > I let it run overnight and report back. Great, thanks! -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 20.07.2016 15:50, Chris Mason wrote: > > > On 07/19/2016 08:11 PM, Gabriel C wrote: >> >> >> On 19.07.2016 13:05, Chris Mason wrote: >>> On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote: >>>> Hi Chris, >>>> >>>> I am able to reproduce the issue with the 'short-write' program. But before >>>> the call trace associated with btrfs_destroy_inode(), I see the following call >>>> trace ... >>>> >>>> ------------[ cut here ]------------ >>>> WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100 >>> >>> [ ... ] >>> >>> Ok, the problem is in how we're dealing with the offset into the sector when >>> we fail. The dirty_sectors variable already has this accounted in it, so >>> this patch fixes it for me. I ran overnight, but I'll let it go for a few >>> days just to make sure: >>> >>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >>> index fac9b839..5842423 100644 >>> --- a/fs/btrfs/file.c >>> +++ b/fs/btrfs/file.c >>> @@ -1629,13 +1629,11 @@ again: >>> * managed to copy. >>> */ >>> if (num_sectors > dirty_sectors) { >>> - /* >>> - * we round down because we don't want to count >>> - * any partial blocks actually sent through the >>> - * IO machines >>> - */ >>> - release_bytes = round_down(release_bytes - copied, >>> - root->sectorsize); >>> + >>> + /* release everything except the sectors we dirtied */ >>> + release_bytes -= dirty_sectors << >>> + root->fs_info->sb->s_blocksize_bits; >>> + >>> if (copied > 0) { >>> spin_lock(&BTRFS_I(inode)->lock); >>> BTRFS_I(inode)->outstanding_extents++; >>> >> >> Since I guess you are testing this on latest git code I started to test on latest stable. > > Any v4.7-rc or v4.6 stable where the patch applies ;) > >> >> Until now all seems file .. your test program is still running without to trigger the bug. >> >> Also thunderbird is running without to trigger the bug. >> >> I let it run overnight and report back. > > Great, thanks! After 24h of running the program and thundirbird all is still fine here. I let it run one more day.. But looks very good. Regards, Gabriel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/20/2016 01:50 PM, Gabriel C wrote: > > After 24h of running the program and thundirbird all is still fine here. > > I let it run one more day.. But looks very good. > Thanks for your time in helping to track this down. It'll go into the next merge window and be cc'd to stable. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 21.07.2016 14:56, Chris Mason wrote: > On 07/20/2016 01:50 PM, Gabriel C wrote: >> >> After 24h of running the program and thundirbird all is still fine here. >> >> I let it run one more day.. But looks very good. >> > > Thanks for your time in helping to track this down. It'll go into the > next merge window and be cc'd to stable. > You are welcome :) Test program was running without problems for 52h.. I think your fix is fine :) Also feel free to add Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> to you commit. Regrads, Gabriel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 21, 2016 at 11:25:02PM +0200, Gabriel C wrote: > On 21.07.2016 14:56, Chris Mason wrote: > > On 07/20/2016 01:50 PM, Gabriel C wrote: > >> After 24h of running the program and thundirbird all is still fine here. > >> > >> I let it run one more day.. But looks very good. > >> > > > > Thanks for your time in helping to track this down. It'll go into the > > next merge window and be cc'd to stable. > > > > You are welcome :) > > Test program was running without problems for 52h.. I think your fix is fine :) AOL. I haven't tested it for as long, but had it running concurrently with some big compiles (kernel, sbuild ie snapshot+dpkg+subv delete), cleanup of several tens of daily snapshots, etc. No problems.
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index fac9b839..5842423 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1629,13 +1629,11 @@ again: * managed to copy. */ if (num_sectors > dirty_sectors) { - /* - * we round down because we don't want to count - * any partial blocks actually sent through the - * IO machines - */ - release_bytes = round_down(release_bytes - copied, - root->sectorsize); + + /* release everything except the sectors we dirtied */ + release_bytes -= dirty_sectors << + root->fs_info->sb->s_blocksize_bits; + if (copied > 0) { spin_lock(&BTRFS_I(inode)->lock); BTRFS_I(inode)->outstanding_extents++;