From patchwork Thu Jul 21 09:08:22 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Miao Xie
X-Patchwork-Id: 994032
Message-ID: <4E27EC86.3030006@cn.fujitsu.com>
Date: Thu, 21 Jul 2011 17:08:22 +0800
From: Miao Xie
Reply-To: miaox@cn.fujitsu.com
To: Linux Btrfs
Subject: [BUG] Chunk allocation fails when the system meta-data block group is full
X-Mailing-List: linux-btrfs@vger.kernel.org

Hi, everyone

While reading the chunk allocation code, I found a bug: if we allocate
so many meta-data chunks or data chunks that the system meta-data block
group becomes full, we can never allocate another chunk, even though
there is plenty of free disk space.

This is because Btrfs does not allocate a new system meta-data chunk
when the old block group is full, so there is no system meta-data space
left to store the information about the new chunk.

This bug is hard to trigger in the normal way, because we need a lot of
disk space to allocate enough new meta-data chunks to fill the system
meta-data block group. So I used a trick to trigger it:

1. Modify the Btrfs source to exclude most of the free space of the
   system meta-data block group, and reduce the max size of the data
   chunk. This way we can allocate lots of chunks and fill the system
   meta-data block group easily. (See the attached patch.)
2. Create a new Btrfs filesystem. (Data profile: single)
3. Mount the new filesystem.
4. Create a large file. (Oops happened)

------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:2602!
[SNIP]
Call Trace:
 [] btrfs_alloc_chunk+0x71/0x84 [btrfs]
 [] do_chunk_alloc+0x28e/0x2f3 [btrfs]
 [] btrfs_reserve_extent+0xfb/0x1c2 [btrfs]
 [] cow_file_range+0x1c0/0x32b [btrfs]
 [] run_delalloc_range+0xb7/0x33f [btrfs]
 [] __extent_writepage+0x1c1/0x5d0 [btrfs]
 [] ? clear_extent_buffer_uptodate+0x85/0x85 [btrfs]
 [] extent_write_cache_pages.clone.0+0x176/0x2ad [btrfs]
 [] extent_writepages+0x3e/0x53 [btrfs]
 [] ? uncompress_inline+0x122/0x122 [btrfs]
 [] btrfs_writepages+0x22/0x24 [btrfs]
 [] do_writepages+0x1c/0x28
 [] writeback_single_inode+0xc2/0x1c3
 [] writeback_sb_inodes+0xcc/0x15a
 [] writeback_inodes_wb+0x10a/0x11c
 [] balance_dirty_pages_ratelimited_nr+0x2f9/0x3fd
 [] __btrfs_buffered_write+0x298/0x315 [btrfs]
 [] ? file_update_time+0xf2/0x10c
 [] btrfs_file_aio_write+0x3c7/0x47e [btrfs]
 [] do_sync_write+0xc6/0x103
 [] ? security_file_permission+0x29/0x2e
 [] vfs_write+0xa9/0x105
 [] sys_write+0x45/0x6c
 [] system_call_fastpath+0x16/0x1b
[SNIP]
RIP [] __finish_chunk_alloc+0x176/0x1f8 [btrfs]
 RSP
---[ end trace 5a55cd7f2763cc4c ]---

If my analysis is right and this bug actually exists, I think we can fix
it by splitting the chunk allocation into two steps:

1. Do the chunk allocation and update the in-memory information.
2. Update the meta-data and the system meta-data according to all the
   new chunks allocated in the 1st step.

We also split the 1st step into 3 sub-steps:

1. If we want to allocate a system meta-data chunk, or the old system
   meta-data block group does not have enough free space even though we
   do not want a system meta-data chunk, we allocate a new system
   meta-data chunk and update the in-memory system meta-data space
   information.
2. If we want to allocate a meta-data chunk, or the old meta-data block
   group does not have enough free space even though we do not want a
   meta-data chunk, we allocate a new meta-data chunk and update the
   in-memory meta-data space information.
3. If we want to allocate a data chunk, we allocate a new data chunk.

Does anyone have another good idea to fix it?
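
To make the proposed ordering concrete, here is a small userspace toy model of it. It is only a sketch and not kernel code: every name (toy_fs, toy_alloc_old, toy_alloc_new, toy_step1, toy_step2) and every constant (CHUNK_UNITS, RECORD_COST, LOW_WATER) is made up for illustration, and it assumes that recording one chunk costs one unit of both system meta-data and meta-data space. toy_alloc_old mimics the failure described above, and toy_alloc_new follows the two steps and three sub-steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: every name and constant here is hypothetical, not kernel API. */
enum chunk_type { CHUNK_SYSTEM, CHUNK_METADATA, CHUNK_DATA, CHUNK_NTYPES };

#define CHUNK_UNITS 8			/* space a new chunk adds to its type */
#define RECORD_COST 1			/* SYSTEM and METADATA cost per chunk record */
#define LOW_WATER   (3 * RECORD_COST)	/* room to record one chunk of each type */

struct toy_fs {
	long disk_free;			/* unallocated device space */
	long space_free[CHUNK_NTYPES];	/* free space per block-group type */
	int pending[CHUNK_NTYPES];	/* allocated in memory, not yet recorded */
};

/* Behaviour described in the report: SYSTEM space is never grown on demand. */
bool toy_alloc_old(struct toy_fs *fs, enum chunk_type want)
{
	if (fs->disk_free < CHUNK_UNITS)
		return false;
	if (fs->space_free[CHUNK_SYSTEM] < RECORD_COST ||
	    fs->space_free[CHUNK_METADATA] < RECORD_COST)
		return false;	/* nowhere to record the chunk */
	fs->disk_free -= CHUNK_UNITS;
	fs->space_free[want] += CHUNK_UNITS;
	fs->space_free[CHUNK_SYSTEM] -= RECORD_COST;
	fs->space_free[CHUNK_METADATA] -= RECORD_COST;
	return true;
}

/* Step 1: allocate chunks and update only the in-memory accounting. */
static bool toy_step1(struct toy_fs *fs, enum chunk_type want)
{
	bool need[CHUNK_NTYPES];
	int t, n = 0;

	/* Sub-steps 1-3: decide SYSTEM first, then METADATA, then DATA. */
	need[CHUNK_SYSTEM] = want == CHUNK_SYSTEM ||
			     fs->space_free[CHUNK_SYSTEM] < LOW_WATER;
	need[CHUNK_METADATA] = want == CHUNK_METADATA ||
			       fs->space_free[CHUNK_METADATA] < LOW_WATER;
	need[CHUNK_DATA] = want == CHUNK_DATA;

	for (t = 0; t < CHUNK_NTYPES; t++)
		n += need[t];
	if (fs->disk_free < (long)n * CHUNK_UNITS)
		return false;	/* the disk really is full */

	for (t = 0; t < CHUNK_NTYPES; t++) {
		if (!need[t])
			continue;
		fs->disk_free -= CHUNK_UNITS;
		fs->space_free[t] += CHUNK_UNITS;
		fs->pending[t]++;
	}
	return true;
}

/* Step 2: record every chunk from step 1 in METADATA and SYSTEM space. */
static void toy_step2(struct toy_fs *fs)
{
	int t;

	for (t = 0; t < CHUNK_NTYPES; t++) {
		for (; fs->pending[t] > 0; fs->pending[t]--) {
			/* Step 1 left at least LOW_WATER units in both spaces. */
			assert(fs->space_free[CHUNK_SYSTEM] >= RECORD_COST);
			assert(fs->space_free[CHUNK_METADATA] >= RECORD_COST);
			fs->space_free[CHUNK_SYSTEM] -= RECORD_COST;
			fs->space_free[CHUNK_METADATA] -= RECORD_COST;
		}
	}
}

bool toy_alloc_new(struct toy_fs *fs, enum chunk_type want)
{
	if (!toy_step1(fs, want))
		return false;
	toy_step2(fs);
	return true;
}
```

Starting from disk_free = 1000 with 2 units of system meta-data space and 100 units of meta-data space, the third toy_alloc_old(&fs, CHUNK_DATA) call fails with 984 units of disk still unallocated, while repeated toy_alloc_new calls keep succeeding because step 1 grows the system meta-data space before step 2 needs it. The low-water mark of three records is an assumption of this model; a real fix would have to size the reservation from the actual item sizes.
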
Thanks
Miao

(The patch that makes the bug easy to trigger)
---
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1860fa8..8d4ab87 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7012,6 +7012,12 @@ int btrfs_read_block_groups(struct btrfs_root *root)
 		 */
 		exclude_super_stripes(root, cache);
 
+		if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) {
+			ret = add_excluded_extent(root, cache->key.objectid,
+						  cache->key.offset - 4096);
+			BUG_ON(ret);
+		}
+
 		/*
 		 * check for two cases, either we are full, and therefore
 		 * don't need to bother with the caching work since we won't
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19450bc..96c0c5e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2357,8 +2357,10 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	}
 
 	if (type & BTRFS_BLOCK_GROUP_DATA) {
-		max_stripe_size = 1024 * 1024 * 1024;
-		max_chunk_size = 10 * max_stripe_size;
+//		max_stripe_size = 1024 * 1024 * 1024;
+//		max_chunk_size = 10 * max_stripe_size;
+		max_stripe_size = 64 * 1024 * 1024;
+		max_chunk_size = 2 * max_stripe_size;
 	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
 		max_stripe_size = 256 * 1024 * 1024;
 		max_chunk_size = max_stripe_size;
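
For a sense of scale, the volumes.c hunk lowers the data chunk cap from 10 * 1 GiB to 2 * 64 MiB, so the same amount of data needs roughly 80 times as many chunks. The helper below is only a back-of-the-envelope sketch: min_chunks and the macro names are made up for illustration, and it looks at the max_chunk_size cap alone. Other limits in the allocator may make chunks smaller still, so the results are lower bounds on the chunk count.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define GIB (1024ULL * MIB)

/* Data chunk caps from the two sides of the volumes.c hunk. */
#define ORIG_MAX_DATA_CHUNK (10ULL * GIB)	/* 10 * max_stripe_size (1 GiB) */
#define TEST_MAX_DATA_CHUNK (2ULL * 64ULL * MIB)	/* 2 * max_stripe_size (64 MiB) */

/* Hypothetical helper: fewest chunks that can hold `bytes` under a size cap. */
uint64_t min_chunks(uint64_t bytes, uint64_t max_chunk)
{
	assert(max_chunk > 0);
	return (bytes + max_chunk - 1) / max_chunk;	/* round up */
}
```

For example, 100 GiB of file data needs at least 10 data chunks with the original cap and at least 800 with the test patch applied, which is why the system meta-data block group fills so much faster.
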