From patchwork Mon Jun 20 10:12:10 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miao Xie X-Patchwork-Id: 896652 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter2.kernel.org (8.14.4/8.14.4) with ESMTP id p5KA4Qn0020113 for ; Mon, 20 Jun 2011 10:04:26 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753607Ab1FTKEW (ORCPT ); Mon, 20 Jun 2011 06:04:22 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:59680 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1753283Ab1FTKEV (ORCPT ); Mon, 20 Jun 2011 06:04:21 -0400 Received: from tang.cn.fujitsu.com (tang.cn.fujitsu.com [10.167.250.3]) by song.cn.fujitsu.com (Postfix) with ESMTP id 62ED4170127; Mon, 20 Jun 2011 18:04:18 +0800 (CST) Received: from mailserver.fnst.cn.fujitsu.com (tang.cn.fujitsu.com [127.0.0.1]) by tang.cn.fujitsu.com (8.14.3/8.13.1) with ESMTP id p5KA4Ejn016499; Mon, 20 Jun 2011 18:04:15 +0800 Received: from [10.167.225.64] ([10.167.225.64]) by mailserver.fnst.cn.fujitsu.com (Lotus Domino Release 8.5.1FP4) with ESMTP id 2011062018035233-617773 ; Mon, 20 Jun 2011 18:03:52 +0800 Message-ID: <4DFF1CFA.7060201@cn.fujitsu.com> Date: Mon, 20 Jun 2011 18:12:10 +0800 From: Miao Xie Reply-To: miaox@cn.fujitsu.com User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9) Gecko/20100921 Fedora/3.1.4-1.fc14 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jim Schutt CC: linux-btrfs@vger.kernel.org, Josef Bacik , ceph-devel@vger.kernel.org Subject: Re: kernel BUG at fs/btrfs/delayed-inode.c:1301! References: <4DFB7C77.6010807@sandia.gov> In-Reply-To: <4DFB7C77.6010807@sandia.gov> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.1FP4|July 25, 2010) at 2011-06-20 18:03:52, Serialize by Router on mailserver/fnst(Release 8.5.1FP4|July 25, 2010) at 2011-06-20 18:03:55 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter2.kernel.org [140.211.167.43]); Mon, 20 Jun 2011 10:04:27 +0000 (UTC) Hi, Jim Could you test the attached patch for me? I have done some quick tests, it worked well. But I'm not sure if it can fix the bug you reported or not, so I need your help! Thanks Miao On fri, 17 Jun 2011 10:10:31 -0600, Jim Schutt wrote: > Hi, > > I've hit this delayed-inode BUG several times. I'm using btrfs > as the data store for Ceph OSDs, and testing a heavy write load. > The kernel I'm running is a recent commit (f8f44f09eaa) from > Linus' tree with the for-chris branch (commit ed0ca14021e5) of > git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work.git > merged in. > > Please let me know what I can do to help resolve this. > > [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17) > [ 5447.569766] ------------[ cut here ]------------ > [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301! > [ 5447.580672] invalid opcode: 0000 [#1] SMP > [ 5447.584806] CPU 2 > [ 5447.586646] Modules linked in: loop btrfs zlib_deflate lzo_compress ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp i2c_dev i2c_core ext3 jbd scsi_transport_iscsi rds ib_ipoib rdma_ucm rdma_cm ib_ucm ib_uverbs ib_umad ib_cm iw_cm ib_addr ipv6 ib_sa dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod video sbs sbshc pci_slot battery acpi_pad ac kvm sg ses enclosure sd_mod megaraid_sas ide_cd_mod cdrom qla2xxx ib_mthca scsi_transport_fc ib_mad scsi_tgt ib_core button serio_raw ata_piix i5k_amb libata hwmon i5000_edac scsi_mod tpm_tis edac_core ehci_hcd pcspkr iTCO_wdt tpm dcdbas tpm_bios iTCO_vendor_support uhci_hcd rtc nfs nfs_acl auth_rpcgss fscache lockd sunrpc tg3 bnx2 e1000 [last unloaded: freq_table] > [ 5447.660248] > [ 5447.661744] Pid: 7622, comm: cosd Not tainted 3.0.0-rc3-00178-gbfc8ccb #34 Dell Inc. PowerEdge 1950/0DT097 > [ 5447.671421] RIP: 0010:[] [] btrfs_insert_delayed_dir_index+0x124/0x14c [btrfs] > [ 5447.681922] RSP: 0018:ffff88021c0edaf8 EFLAGS: 00010292 > [ 5447.687351] RAX: 000000000000009e RBX: ffff880085bf0480 RCX: 0000000000012e0f > [ 5447.694487] RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffffffff819aed98 > [ 5447.701631] RBP: ffff88021c0edb48 R08: ffff88021c0ed908 R09: ffffffff8189ef98 > [ 5447.708783] R10: 0000000000000000 R11: 0000000000000006 R12: 00000000ffffffef > [ 5447.715923] R13: ffff880072e64240 R14: ffff880072e64288 R15: 000000000000000d > [ 5447.723065] FS: 00007fefc66a9940(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000 > [ 5447.731178] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 5447.736934] CR2: 00000000004042b0 CR3: 00000001ca4ef000 CR4: 00000000000006e0 > [ 5447.744087] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 5447.751218] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 5447.758343] Process cosd (pid: 7622, threadinfo ffff88021c0ec000, task ffff8801d92996b0) > [ 5447.766422] Stack: > [ 5447.768429] 00000001d2da3700 ffff88021c0edb88 ffff8802238d50f8 ffff880225627000 > [ 5447.775866] ffff8801e32f5e58 ffff8801d2da3700 0000000000000000 ffff8801d9aec510 > [ 5447.783291] 000000000000000d ffff8802238d50f8 ffff88021c0edbf8 ffffffffa0641c4e > [ 5447.790721] Call Trace: > [ 5447.793191] [] btrfs_insert_dir_item+0x189/0x1bb [btrfs] > [ 5447.800156] [] btrfs_add_link+0x12b/0x191 [btrfs] > [ 5447.806517] [] btrfs_add_nondir+0x31/0x58 [btrfs] > [ 5447.812876] [] btrfs_create+0xf9/0x197 [btrfs] > [ 5447.818961] [] vfs_create+0x72/0x92 > [ 5447.824090] [] do_last+0x22c/0x40b > [ 5447.829133] [] path_openat+0xc0/0x2ef > [ 5447.834438] [] ? __perf_event_task_sched_out+0x24/0x44 > [ 5447.841216] [] ? perf_event_task_sched_out+0x59/0x67 > [ 5447.847846] [] do_filp_open+0x3d/0x87 > [ 5447.853156] [] ? strncpy_from_user+0x43/0x4d > [ 5447.859072] [] ? getname_flags+0x2e/0x80 > [ 5447.864636] [] ? do_getname+0x14b/0x173 > [ 5447.870112] [] ? audit_getname+0x16/0x26 > [ 5447.875682] [] ? spin_lock+0xe/0x10 > [ 5447.880882] [] do_sys_open+0x69/0xae > [ 5447.886153] [] sys_open+0x20/0x22 > [ 5447.891114] [] system_call_fastpath+0x16/0x1b > [ 5447.897124] Code: 85 c0 41 89 c4 74 28 49 8b 45 10 49 8b 4d 00 45 89 e0 48 8b 75 c0 48 c7 c7 bb 43 69 a0 48 8b 90 e8 02 00 00 31 c0 e8 ce fb 9b e0 <0f> 0b eb fe 4c 89 f7 e8 81 7b d2 e0 4c 89 ef e8 d4 e4 ff ff 48 > [ 5447.916562] RIP [] btrfs_insert_delayed_dir_index+0x124/0x14c [btrfs] > [ 5447.924683] RSP > [ 5447.928514] ---[ end trace 461a7f9887994fe0 ]--- > > -- Jim > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > From 457f39393b2e3d475fbba029b90b6a4e17b94d43 Mon Sep 17 00:00:00 2001 From: Miao Xie Date: Mon, 20 Jun 2011 17:21:51 +0800 Subject: [PATCH] btrfs: fix inconsonant inode information When iputting the inode, We may leave the delayed nodes if they have some delayed items that have not been dealt with. So when the inode is read again, we must look up the relative delayed node, and use the information in it to initialize the inode. Or we will get inconsonant inode information. Signed-off-by: Miao Xie --- fs/btrfs/delayed-inode.c | 104 +++++++++++++++++++++++++++++++++++----------- fs/btrfs/delayed-inode.h | 1 + fs/btrfs/inode.c | 12 ++++- 3 files changed, 91 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index f1cbd02..280755e 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -82,19 +82,16 @@ static inline struct btrfs_delayed_root *btrfs_get_delayed_root( return root->fs_info->delayed_root; } -static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node( - struct inode *inode) +static struct btrfs_delayed_node *btrfs_get_delayed_node(struct inode *inode) { - struct btrfs_delayed_node *node; struct btrfs_inode *btrfs_inode = BTRFS_I(inode); struct btrfs_root *root = btrfs_inode->root; u64 ino = btrfs_ino(inode); - int ret; + struct btrfs_delayed_node *node; -again: node = ACCESS_ONCE(btrfs_inode->delayed_node); if (node) { - atomic_inc(&node->refs); /* can be accessed */ + atomic_inc(&node->refs); return node; } @@ -103,7 +100,9 @@ again: if (node) { if (btrfs_inode->delayed_node) { spin_unlock(&root->inode_lock); - goto again; + BUG_ON(btrfs_inode->delayed_node != node); + atomic_inc(&node->refs); /* can be accessed */ + return node; } btrfs_inode->delayed_node = node; atomic_inc(&node->refs); /* can be accessed */ @@ -113,6 +112,23 @@ again: } spin_unlock(&root->inode_lock); + return NULL; +} + +static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node( + struct inode *inode) +{ + struct btrfs_delayed_node *node; + struct btrfs_inode *btrfs_inode = BTRFS_I(inode); + struct btrfs_root *root = btrfs_inode->root; + u64 ino = btrfs_ino(inode); + int ret; + +again: + node = btrfs_get_delayed_node(inode); + if (node) + return node; + node = kmem_cache_alloc(delayed_node_cache, GFP_NOFS); if (!node) return ERR_PTR(-ENOMEM); @@ -548,19 +564,6 @@ struct btrfs_delayed_item *__btrfs_next_delayed_item( return next; } -static inline struct btrfs_delayed_node *btrfs_get_delayed_node( - struct inode *inode) -{ - struct btrfs_inode *btrfs_inode = BTRFS_I(inode); - struct btrfs_delayed_node *delayed_node; - - delayed_node = btrfs_inode->delayed_node; - if (delayed_node) - atomic_inc(&delayed_node->refs); - - return delayed_node; -} - static inline struct btrfs_root *btrfs_get_fs_root(struct btrfs_root *root, u64 root_id) { @@ -1404,8 +1407,7 @@ end: int btrfs_inode_delayed_dir_index_count(struct inode *inode) { - struct btrfs_delayed_node *delayed_node = BTRFS_I(inode)->delayed_node; - int ret = 0; + struct btrfs_delayed_node *delayed_node = btrfs_get_delayed_node(inode); if (!delayed_node) return -ENOENT; @@ -1415,11 +1417,14 @@ int btrfs_inode_delayed_dir_index_count(struct inode *inode) * a new directory index is added into the delayed node and index_cnt * is updated now. So we needn't lock the delayed node. */ - if (!delayed_node->index_cnt) + if (!delayed_node->index_cnt) { + btrfs_release_delayed_node(delayed_node); return -EINVAL; + } BTRFS_I(inode)->index_cnt = delayed_node->index_cnt; - return ret; + btrfs_release_delayed_node(delayed_node); + return 0; } void btrfs_get_delayed_items(struct inode *inode, struct list_head *ins_list, @@ -1613,6 +1618,57 @@ static void fill_stack_inode_item(struct btrfs_trans_handle *trans, inode->i_ctime.tv_nsec); } +int btrfs_fill_inode(struct inode *inode, u32 *rdev) +{ + struct btrfs_delayed_node *delayed_node; + struct btrfs_inode_item *inode_item; + struct btrfs_timespec *tspec; + + delayed_node = btrfs_get_delayed_node(inode); + if (!delayed_node) + return -ENOENT; + + mutex_lock(&delayed_node->mutex); + if (!delayed_node->inode_dirty) { + mutex_unlock(&delayed_node->mutex); + btrfs_release_delayed_node(delayed_node); + return -ENOENT; + } + + inode_item = &delayed_node->inode_item; + + inode->i_uid = btrfs_stack_inode_uid(inode_item); + inode->i_gid = btrfs_stack_inode_gid(inode_item); + btrfs_i_size_write(inode, btrfs_stack_inode_size(inode_item)); + inode->i_mode = btrfs_stack_inode_mode(inode_item); + inode->i_nlink = btrfs_stack_inode_nlink(inode_item); + inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item)); + BTRFS_I(inode)->generation = btrfs_stack_inode_generation(inode_item); + BTRFS_I(inode)->sequence = btrfs_stack_inode_sequence(inode_item); + inode->i_rdev = 0; + *rdev = btrfs_stack_inode_rdev(inode_item); + BTRFS_I(inode)->flags = btrfs_stack_inode_flags(inode_item); + + tspec = btrfs_inode_atime(inode_item); + inode->i_atime.tv_sec = btrfs_stack_timespec_sec(tspec); + inode->i_atime.tv_nsec = btrfs_stack_timespec_nsec(tspec); + + tspec = btrfs_inode_mtime(inode_item); + inode->i_mtime.tv_sec = btrfs_stack_timespec_sec(tspec); + inode->i_mtime.tv_nsec = btrfs_stack_timespec_nsec(tspec); + + tspec = btrfs_inode_ctime(inode_item); + inode->i_ctime.tv_sec = btrfs_stack_timespec_sec(tspec); + inode->i_ctime.tv_nsec = btrfs_stack_timespec_nsec(tspec); + + inode->i_generation = BTRFS_I(inode)->generation; + BTRFS_I(inode)->index_cnt = (u64)-1; + + mutex_unlock(&delayed_node->mutex); + btrfs_release_delayed_node(delayed_node); + return 0; +} + int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct inode *inode) { diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h index d1a6a29..8d27af4 100644 --- a/fs/btrfs/delayed-inode.h +++ b/fs/btrfs/delayed-inode.h @@ -119,6 +119,7 @@ void btrfs_kill_delayed_inode_items(struct inode *inode); int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct inode *inode); +int btrfs_fill_inode(struct inode *inode, u32 *rdev); /* Used for drop dead root */ void btrfs_kill_all_delayed_nodes(struct btrfs_root *root); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5813dec..1d25a04 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2509,6 +2509,11 @@ static void btrfs_read_locked_inode(struct inode *inode) int maybe_acls; u32 rdev; int ret; + bool filled = false; + + ret = btrfs_fill_inode(inode, &rdev); + if (!ret) + filled = true; path = btrfs_alloc_path(); BUG_ON(!path); @@ -2520,6 +2525,10 @@ static void btrfs_read_locked_inode(struct inode *inode) goto make_bad; leaf = path->nodes[0]; + + if (filled) + goto cache_acl; + inode_item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_inode_item); if (!leaf->map_token) @@ -2556,7 +2565,7 @@ static void btrfs_read_locked_inode(struct inode *inode) BTRFS_I(inode)->index_cnt = (u64)-1; BTRFS_I(inode)->flags = btrfs_inode_flags(leaf, inode_item); - +cache_acl: /* * try to precache a NULL acl entry for files that don't have * any xattrs or acls @@ -2572,7 +2581,6 @@ static void btrfs_read_locked_inode(struct inode *inode) } btrfs_free_path(path); - inode_item = NULL; switch (inode->i_mode & S_IFMT) { case S_IFREG: