From patchwork Wed Jun 18 10:33:34 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Lyakas X-Patchwork-Id: 4375081 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 8478ABEEAA for ; Wed, 18 Jun 2014 10:33:51 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 5F9BD202DD for ; Wed, 18 Jun 2014 10:33:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 397CB202D1 for ; Wed, 18 Jun 2014 10:33:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965682AbaFRKdh (ORCPT ); Wed, 18 Jun 2014 06:33:37 -0400 Received: from mail-we0-f176.google.com ([74.125.82.176]:40105 "EHLO mail-we0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965039AbaFRKdf (ORCPT ); Wed, 18 Jun 2014 06:33:35 -0400 Received: by mail-we0-f176.google.com with SMTP id u56so613246wes.21 for ; Wed, 18 Jun 2014 03:33:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=fxU5qlbAs45DDsq6u1Rh00jcIROe3APLw0kDvgWVFps=; b=KzDlgIT3ggz9yWNOazH6r/BEplvB/1YbpHpwUNUe19IAsgftLr36YfQY4xuSv2OG+T jJ9a6Q6RDHonauTmLJ1JB4QjZBFCx5yTjn1rGPFlmxMGUf7W64E/LJMPeutQR9psdl/Q YzfcEfjOy3ALAHySyyemFxcM/aGAwlKoTIOGavPnwiCVyuF9gJM8lUkBpkrxnxz2kI8o 8CzO5G1/ACBdWLubuw01oT+ilbBL8gSVhHQJos1zCTmg2TeaqSyuIdBuHHuHvFsIvup5 dQQZWGPmyxGSVPliWgYDg6/s1zvhppZW35vcCJLU9y1OALtFtgeB0ohr0zknTsU+eLVa n6/w== X-Gm-Message-State: ALoCoQm+6hFZCQyJEqhmbywMCyqJ15Ew/qTDHLbpaxj5rdGzsUHz6qMU+Xpab6qMb9L+8QZOOl8I MIME-Version: 1.0 X-Received: by 10.194.93.228 with SMTP id cx4mr46746192wjb.61.1403087614200; Wed, 18 Jun 2014 03:33:34 -0700 (PDT) Received: by 10.194.86.100 with HTTP; Wed, 18 Jun 2014 03:33:34 -0700 (PDT) In-Reply-To: References: <1376935725-27427-1-git-send-email-fdmanana@gmail.com> <1376956371-6764-1-git-send-email-fdmanana@gmail.com> Date: Wed, 18 Jun 2014 13:33:34 +0300 Message-ID: Subject: Re: [PATCH v6] Btrfs: fix memory leak of orphan block rsv From: Alex Lyakas To: Filipe Manana Cc: linux-btrfs Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi Filipe, I finally got to debug this deeper. As it turns out, this happens only if both "nospace_cache" and "clear_cache" are specified. You need to unmount and mount again to cause this. After mounting, due to "clear_cache", all the block-groups are marked as BTRFS_DC_CLEAR, and then cache_save_setup() is called on them (this function is called only in case of BTRFS_DC_CLEAR). So cache_save_setup() goes ahead and creates the free-space inode. But then it realizes that it was mounted with nospace_cache, so it does not put any content in the inode. But the inode itself gets created. The patch that fixes this for me: alex@ubuntu-alex:/mnt/work/alex/linux-stable/source$ git diff -U10 Thanks, Alex. On Wed, Nov 6, 2013 at 3:19 PM, Filipe David Manana wrote: > On Mon, Nov 4, 2013 at 12:16 PM, Alex Lyakas > wrote: >> Hi Filipe, >> any luck with this patch?:) > > Hey Alex, > > I haven't digged further, but I remember I couldn't reproduce your > issue (with latest btrfs-next of that day) of getting the free space > inodes created even when mount option nospace_cache is given. > > What kernel were you using? > >> >> Alex. >> >> On Wed, Oct 23, 2013 at 5:26 PM, Filipe David Manana wrote: >>> On Wed, Oct 23, 2013 at 3:14 PM, Alex Lyakas >>> wrote: >>>> Hello, >>>> >>>> On Wed, Oct 23, 2013 at 4:35 PM, Filipe David Manana wrote: >>>>> On Wed, Oct 23, 2013 at 2:33 PM, Alex Lyakas >>>>> wrote: >>>>>> Hi Filipe, >>>>>> >>>>>> >>>>>> On Tue, Aug 20, 2013 at 2:52 AM, Filipe David Borba Manana >>>>>> wrote: >>>>>>> >>>>>>> This issue is simple to reproduce and observe if kmemleak is enabled. >>>>>>> Two simple ways to reproduce it: >>>>>>> >>>>>>> ** 1 >>>>>>> >>>>>>> $ mkfs.btrfs -f /dev/loop0 >>>>>>> $ mount /dev/loop0 /mnt/btrfs >>>>>>> $ btrfs balance start /mnt/btrfs >>>>>>> $ umount /mnt/btrfs >>>> >>>> So here it seems that the leak can only happen in case the block-group >>>> has a free-space inode. This is what the orphan item is added for. >>>> Yes, here kmemleak reports. >>>> But: if space_cache option is disabled (and nospace_cache) enabled, it >>>> seems that btrfs still creates the FREE_SPACE inodes, although they >>>> are empty because in cache_save_setup: >>>> >>>> inode = lookup_free_space_inode(root, block_group, path); >>>> if (IS_ERR(inode) && PTR_ERR(inode) != -ENOENT) { >>>> ret = PTR_ERR(inode); >>>> btrfs_release_path(path); >>>> goto out; >>>> } >>>> >>>> if (IS_ERR(inode)) { >>>> ... >>>> ret = create_free_space_inode(root, trans, block_group, path); >>>> >>>> and only later it actually sets BTRFS_DC_WRITTEN if space_cache option >>>> is disabled. Amazing! >>>> Although this is a different issue, do you know perhaps why these >>>> empty inodes are needed? >>> >>> Don't know if they are needed. But you have a point, it seems odd to >>> create the free space cache inode if mount option nospace_cache was >>> supplied. Thanks Alex. Testing the following patch: >>> >>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >>> index c43ee8a..eb1b7da 100644 >>> --- a/fs/btrfs/extent-tree.c >>> +++ b/fs/btrfs/extent-tree.c >>> @@ -3162,6 +3162,9 @@ static int cache_save_setup(struct >>> btrfs_block_group_cache *block_group, >>> int retries = 0; >>> int ret = 0; >>> >>> + if (!btrfs_test_opt(root, SPACE_CACHE)) >>> + return 0; >>> + >>> /* >>> * If this block group is smaller than 100 megs don't bother caching the >>> * block group. >>> >>> >>>> >>>> Thanks! >>>> Alex. >>>> >>>> >>>> >>>>>>> >>>>>>> ** 2 >>>>>>> >>>>>>> $ mkfs.btrfs -f /dev/loop0 >>>>>>> $ mount /dev/loop0 /mnt/btrfs >>>>>>> $ touch /mnt/btrfs/foobar >>>>>>> $ rm -f /mnt/btrfs/foobar >>>>>>> $ umount /mnt/btrfs >>>>>> >>>>>> >>>>>> I tried the second repro script on kernel 3.8.13, and kmemleak does >>>>>> not report a leak (even if I force the kmemleak scan). I did not try >>>>>> the balance-repro script, though. Am I missing something? >>>>> >>>>> Maybe it's not an issue on 3.8.13 and older releases. >>>>> This was on btrfs-next from August 19. >>>>> >>>>> thanks for testing >>>>> >>>>>> >>>>>> Thanks, >>>>>> Alex. >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> After a while, kmemleak reports the leak: >>>>>>> >>>>>>> $ cat /sys/kernel/debug/kmemleak >>>>>>> unreferenced object 0xffff880402b13e00 (size 128): >>>>>>> comm "btrfs", pid 19621, jiffies 4341648183 (age 70057.844s) >>>>>>> hex dump (first 32 bytes): >>>>>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ >>>>>>> 00 fc c6 b1 04 88 ff ff 04 00 04 00 ad 4e ad de .............N.. >>>>>>> backtrace: >>>>>>> [] kmemleak_alloc+0x26/0x50 >>>>>>> [] kmem_cache_alloc_trace+0xeb/0x1d0 >>>>>>> [] btrfs_alloc_block_rsv+0x39/0x70 [btrfs] >>>>>>> [] btrfs_orphan_add+0x13d/0x1b0 [btrfs] >>>>>>> [] btrfs_remove_block_group+0x143/0x500 [btrfs] >>>>>>> [] btrfs_relocate_chunk.isra.63+0x618/0x790 [btrfs] >>>>>>> [] btrfs_balance+0x8f7/0xe90 [btrfs] >>>>>>> [] btrfs_ioctl_balance+0x250/0x550 [btrfs] >>>>>>> [] btrfs_ioctl+0xdfa/0x25f0 [btrfs] >>>>>>> [] do_vfs_ioctl+0x96/0x570 >>>>>>> [] SyS_ioctl+0x91/0xb0 >>>>>>> [] system_call_fastpath+0x16/0x1b >>>>>>> [] 0xffffffffffffffff >>>>>>> >>>>>>> This affects btrfs-next, revision be8e3cd00d7293dd177e3f8a4a1645ce09ca3acb >>>>>>> (Btrfs: separate out tests into their own directory). >>>>>>> >>>>>>> Signed-off-by: Filipe David Borba Manana >>>>>>> --- >>>>>>> >>>>>>> V2: removed atomic_t member in struct btrfs_block_rsv, as suggested by >>>>>>> Josef Bacik, and use instead the condition reserved == 0 to decide >>>>>>> when to free the block. >>>>>>> V3: simplified patch, just kfree() (and not btrfs_free_block_rsv) the >>>>>>> root's orphan_block_rsv when free'ing the root. Thanks Josef for >>>>>>> the suggestion. >>>>>>> V4: use btrfs_free_block_rsv() instead of kfree(). The error I was getting >>>>>>> in xfstests when using btrfs_free_block_rsv() was unrelated, Josef just >>>>>>> pointed it to me (separate issue). >>>>>>> V5: move the free call below the iput() call, so that btrfs_evict_node() >>>>>>> can process the orphan_block_rsv first to do some needed cleanup before >>>>>>> we free it. >>>>>>> V6: free the root's orphan_block_rsv in close_ctree() too. After a balance >>>>>>> the orphan_block_rsv of the tree of tree roots was being leaked, because >>>>>>> free_fs_root() is only called for filesystem trees. >>>>>>> >>>>>>> fs/btrfs/disk-io.c | 5 +++++ >>>>>>> 1 file changed, 5 insertions(+) >>>>>>> >>>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>>>>>> index 3b12c26..5d17163 100644 >>>>>>> --- a/fs/btrfs/disk-io.c >>>>>>> +++ b/fs/btrfs/disk-io.c >>>>>>> @@ -3430,6 +3430,8 @@ static void free_fs_root(struct btrfs_root *root) >>>>>>> { >>>>>>> iput(root->cache_inode); >>>>>>> WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); >>>>>>> + btrfs_free_block_rsv(root, root->orphan_block_rsv); >>>>>>> + root->orphan_block_rsv = NULL; >>>>>>> if (root->anon_dev) >>>>>>> free_anon_bdev(root->anon_dev); >>>>>>> free_extent_buffer(root->node); >>>>>>> @@ -3582,6 +3584,9 @@ int close_ctree(struct btrfs_root *root) >>>>>>> >>>>>>> btrfs_free_stripe_hash_table(fs_info); >>>>>>> >>>>>>> + btrfs_free_block_rsv(root, root->orphan_block_rsv); >>>>>>> + root->orphan_block_rsv = NULL; >>>>>>> + >>>>>>> return 0; >>>>>>> } >>>>>>> >>>>>>> -- >>>>>>> 1.7.9.5 >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>> >>>>> >>>>> -- >>>>> Filipe David Manana, >>>>> >>>>> "Reasonable men adapt themselves to the world. >>>>> Unreasonable men adapt the world to themselves. >>>>> That's why all progress depends on unreasonable men." >>> >>> >>> >>> -- >>> Filipe David Manana, >>> >>> "Reasonable men adapt themselves to the world. >>> Unreasonable men adapt the world to themselves. >>> That's why all progress depends on unreasonable men." > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men." --- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index d170412..06f876e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2941,20 +2941,26 @@ again: goto out; } if (IS_ERR(inode)) { BUG_ON(retries); retries++; if (block_group->ro) goto out_free; + /* with nospace_cache avoid creating the free-space inode */ + if (!btrfs_test_opt(root, SPACE_CACHE)) { + dcs = BTRFS_DC_WRITTEN; + goto out_free; + } + ret = create_free_space_inode(root, trans, block_group, path); if (ret) goto out_free; goto again; } /* We've already setup this transaction, go ahead and exit */ if (block_group->cache_generation == trans->transid && i_size_read(inode)) { dcs = BTRFS_DC_SETUP;