Message ID | 20180228155610.5828-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Feb 28, 2018 at 03:56:10PM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > If we have a file with 2 (or more) hard links in the same directory, > remove one of the hard links, create a new file (or link an existing file) > in the same directory with the name of the removed hard link, and then > finally fsync the new file, we end up with a log that fails to replay, > causing a mount failure. > > Example: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > > $ mkdir /mnt/testdir > $ touch /mnt/testdir/foo > $ ln /mnt/testdir/foo /mnt/testdir/bar > > $ sync > > $ unlink /mnt/testdir/bar > $ touch /mnt/testdir/bar > $ xfs_io -c "fsync" /mnt/testdir/bar > > <power failure> > > $ mount /dev/sdb /mnt > mount: mount(2) failed: /mnt: No such file or directory > > When replaying the log, for that example, we also see the following in > dmesg/syslog: > > [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257 > [71813.674204] ------------[ cut here ]------------ > [71813.675694] BTRFS: Transaction aborted (error -2) > [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs] > [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs] > [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G W 4.15.0-rc9-btrfs-next-56+ #1 > [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs] > [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286 > [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001 > [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff > [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001 > [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8 > [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101 > [71813.679669] FS: 00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000 > [71813.679669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0 > [71813.679669] Call Trace: > [71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs] > [71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs] > [71813.679669] add_inode_ref+0x71e/0x851 [btrfs] > [71813.679669] ? __lock_is_held+0x39/0x71 > [71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs] > [71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs] > [71813.679669] ? rcu_read_unlock+0x3a/0x57 > [71813.679669] ? __lock_is_held+0x39/0x71 > [71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs] > [71813.679669] walk_log_tree+0xad/0x188 [btrfs] > [71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs] > [71813.679669] ? replay_one_extent+0x544/0x544 [btrfs] > [71813.679669] open_ctree+0x1cf6/0x2209 [btrfs] > [71813.679669] btrfs_mount_root+0x368/0x482 [btrfs] > [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 > [71813.679669] ? __lockdep_init_map+0x176/0x1c2 > [71813.679669] ? mount_fs+0x64/0x10b > [71813.679669] mount_fs+0x64/0x10b > [71813.679669] vfs_kern_mount+0x68/0xce > [71813.679669] btrfs_mount+0x13e/0x772 [btrfs] > [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 > [71813.679669] ? __lockdep_init_map+0x176/0x1c2 > [71813.679669] ? mount_fs+0x64/0x10b > [71813.679669] mount_fs+0x64/0x10b > [71813.679669] vfs_kern_mount+0x68/0xce > [71813.679669] do_mount+0x6e5/0x973 > [71813.679669] ? memdup_user+0x3e/0x5c > [71813.679669] SyS_mount+0x72/0x98 > [71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b > [71813.679669] RIP: 0033:0x7f7cedf150ba > [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206 > [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c > [71813.679669] ---[ end trace 83bd473fc5b4663b ]--- > [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry > [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree) > [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30 > [71814.128078] BTRFS error (device dm-0): open_ctree failed > > This happens because the log has inode reference items for both inode 258 > (the first file we created) and inode 259 (the second file created), and > when processing the reference item for inode 258, we replace the > corresponding item in the subvolume tree (which has two names, "foo" and > "bar") witht he one in the log (which only has one name, "foo") without > removing the corresponding dir index keys from the parent directory. > Later, when processing the inode reference item for inode 259, which has > a name of "bar" associated to it, we notice that dir index entries exist > for that name and for a different inode, so we attempt to unlink that > name, which fails because the inode reference item for inode 258 no longer > has the name "bar" associated to it, making a call to btrfs_unlink_inode() > fail with a -ENOENT error. > > Fix this by unlinking all the names in an inode reference item from a > subvolume tree that are not present in the inode reference item found in > the log tree, before overwriting it with the item from the log tree. Since the order during replaying is INODE_ITEM then DIR_INDEX then other types, in this particular case, can we fix the problem by also logging the parent so that we have the correct DIR_INDEX? With DIR_INDEX, the problem would be fixed simpler, wouldn't it? Thanks, -liubo > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > --- > fs/btrfs/ctree.h | 5 ++- > fs/btrfs/inode-item.c | 44 ++++++++++++-------- > fs/btrfs/tree-log.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 139 insertions(+), 22 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 1a462ab85c49..5d33478bc704 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -3095,7 +3095,10 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, > u64 inode_objectid, u64 ref_objectid, int ins_len, > int cow); > > -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, > +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, > + const char *name, > + int name_len, struct btrfs_inode_ref **ref_ret); > +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, > u64 ref_objectid, const char *name, > int name_len, > struct btrfs_inode_extref **extref_ret); > diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c > index 39c968f80157..65e1a76bf755 100644 > --- a/fs/btrfs/inode-item.c > +++ b/fs/btrfs/inode-item.c > @@ -22,10 +22,10 @@ > #include "transaction.h" > #include "print-tree.h" > > -static int find_name_in_backref(struct btrfs_path *path, const char *name, > - int name_len, struct btrfs_inode_ref **ref_ret) > +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, > + const char *name, > + int name_len, struct btrfs_inode_ref **ref_ret) > { > - struct extent_buffer *leaf; > struct btrfs_inode_ref *ref; > unsigned long ptr; > unsigned long name_ptr; > @@ -33,9 +33,8 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, > u32 cur_offset = 0; > int len; > > - leaf = path->nodes[0]; > - item_size = btrfs_item_size_nr(leaf, path->slots[0]); > - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); > + item_size = btrfs_item_size_nr(leaf, slot); > + ptr = btrfs_item_ptr_offset(leaf, slot); > while (cur_offset < item_size) { > ref = (struct btrfs_inode_ref *)(ptr + cur_offset); > len = btrfs_inode_ref_name_len(leaf, ref); > @@ -44,18 +43,19 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, > if (len != name_len) > continue; > if (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0) { > - *ref_ret = ref; > + if (ref_ret) > + *ref_ret = ref; > return 1; > } > } > return 0; > } > > -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, > +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, > + u64 ref_objectid, > const char *name, int name_len, > struct btrfs_inode_extref **extref_ret) > { > - struct extent_buffer *leaf; > struct btrfs_inode_extref *extref; > unsigned long ptr; > unsigned long name_ptr; > @@ -63,9 +63,8 @@ int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, > u32 cur_offset = 0; > int ref_name_len; > > - leaf = path->nodes[0]; > - item_size = btrfs_item_size_nr(leaf, path->slots[0]); > - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); > + item_size = btrfs_item_size_nr(leaf, slot); > + ptr = btrfs_item_ptr_offset(leaf, slot); > > /* > * Search all extended backrefs in this item. We're only > @@ -113,7 +112,9 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, > return ERR_PTR(ret); > if (ret > 0) > return NULL; > - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, name, name_len, &extref)) > + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], > + ref_objectid, name, name_len, > + &extref)) > return NULL; > return extref; > } > @@ -155,7 +156,8 @@ static int btrfs_del_inode_extref(struct btrfs_trans_handle *trans, > * This should always succeed so error here will make the FS > * readonly. > */ > - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, > + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], > + ref_objectid, > name, name_len, &extref)) { > btrfs_handle_fs_error(root->fs_info, -ENOENT, NULL); > ret = -EROFS; > @@ -225,7 +227,8 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans, > } else if (ret < 0) { > goto out; > } > - if (!find_name_in_backref(path, name, name_len, &ref)) { > + if (!btrfs_find_name_in_backref(path->nodes[0], path->slots[0], > + name, name_len, &ref)) { > ret = -ENOENT; > search_ext_refs = 1; > goto out; > @@ -293,7 +296,9 @@ static int btrfs_insert_inode_extref(struct btrfs_trans_handle *trans, > ret = btrfs_insert_empty_item(trans, root, path, &key, > ins_len); > if (ret == -EEXIST) { > - if (btrfs_find_name_in_ext_backref(path, ref_objectid, > + if (btrfs_find_name_in_ext_backref(path->nodes[0], > + path->slots[0], > + ref_objectid, > name, name_len, NULL)) > goto out; > > @@ -351,7 +356,8 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, > if (ret == -EEXIST) { > u32 old_size; > > - if (find_name_in_backref(path, name, name_len, &ref)) > + if (btrfs_find_name_in_backref(path->nodes[0], path->slots[0], > + name, name_len, &ref)) > goto out; > > old_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]); > @@ -365,7 +371,9 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, > ret = 0; > } else if (ret < 0) { > if (ret == -EOVERFLOW) { > - if (find_name_in_backref(path, name, name_len, &ref)) > + if (btrfs_find_name_in_backref(path->nodes[0], > + path->slots[0], > + name, name_len, &ref)) > ret = -EEXIST; > else > ret = -EMLINK; > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index 411a022489e4..fd573816f461 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -966,7 +966,9 @@ static noinline int backref_in_log(struct btrfs_root *log, > ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]); > > if (key->type == BTRFS_INODE_EXTREF_KEY) { > - if (btrfs_find_name_in_ext_backref(path, ref_objectid, > + if (btrfs_find_name_in_ext_backref(path->nodes[0], > + path->slots[0], > + ref_objectid, > name, namelen, NULL)) > match = 1; > > @@ -1190,7 +1192,8 @@ static int extref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, > read_extent_buffer(eb, *name, (unsigned long)&extref->name, > *namelen); > > - *index = btrfs_inode_extref_index(eb, extref); > + if (index) > + *index = btrfs_inode_extref_index(eb, extref); > if (parent_objectid) > *parent_objectid = btrfs_inode_extref_parent(eb, extref); > > @@ -1211,12 +1214,102 @@ static int ref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, > > read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen); > > - *index = btrfs_inode_ref_index(eb, ref); > + if (index) > + *index = btrfs_inode_ref_index(eb, ref); > > return 0; > } > > /* > + * Take an inode reference item from the log tree and iterate all names from the > + * inode reference item in the subvolume tree with the same key (if it exists). > + * For any name that is not in the inode reference item from the log tree, do a > + * proper unlink of that name (that is, remove its entry from the inode > + * reference item and both dir index keys). > + */ > +static int unlink_old_inode_refs(struct btrfs_trans_handle *trans, > + struct btrfs_root *root, > + struct btrfs_path *path, > + struct btrfs_inode *inode, > + struct extent_buffer *log_eb, > + int log_slot, > + struct btrfs_key *key) > +{ > + int ret; > + unsigned long ref_ptr; > + unsigned long ref_end; > + struct extent_buffer *eb; > + > +again: > + btrfs_release_path(path); > + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); > + if (ret > 0) { > + ret = 0; > + goto out; > + } > + if (ret < 0) > + goto out; > + > + eb = path->nodes[0]; > + ref_ptr = btrfs_item_ptr_offset(eb, path->slots[0]); > + ref_end = ref_ptr + btrfs_item_size_nr(eb, path->slots[0]); > + while (ref_ptr < ref_end) { > + char *name = NULL; > + int namelen; > + u64 parent_id; > + > + if (key->type == BTRFS_INODE_EXTREF_KEY) { > + ret = extref_get_fields(eb, ref_ptr, &namelen, &name, > + NULL, &parent_id); > + } else { > + parent_id = key->offset; > + ret = ref_get_fields(eb, ref_ptr, &namelen, &name, > + NULL); > + } > + if (ret) > + goto out; > + > + if (key->type == BTRFS_INODE_EXTREF_KEY) > + ret = btrfs_find_name_in_ext_backref(log_eb, log_slot, > + parent_id, name, > + namelen, NULL); > + else > + ret = btrfs_find_name_in_backref(log_eb, log_slot, name, > + namelen, NULL); > + > + if (!ret) { > + struct inode *dir; > + > + btrfs_release_path(path); > + dir = read_one_inode(root, parent_id); > + if (!dir) { > + ret = -ENOENT; > + kfree(name); > + goto out; > + } > + ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir), > + inode, name, namelen); > + kfree(name); > + iput(dir); > + if (ret) > + goto out; > + goto again; > + } > + > + kfree(name); > + ref_ptr += namelen; > + if (key->type == BTRFS_INODE_EXTREF_KEY) > + ref_ptr += sizeof(struct btrfs_inode_extref); > + else > + ref_ptr += sizeof(struct btrfs_inode_ref); > + } > + ret = 0; > + out: > + btrfs_release_path(path); > + return ret; > +} > + > +/* > * replay one inode back reference item found in the log tree. > * eb, slot and key refer to the buffer and key found in the log tree. > * root is the destination we are replaying into, and path is for temp > @@ -1344,6 +1437,19 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, > } > } > > + /* > + * Before we overwrite the inode reference item in the subvolume tree > + * with the item from the log tree, we must unlink all names from the > + * parent directory that are in the subvolume's tree inode reference > + * item, otherwise we end up with an inconsistent subvolume tree where > + * dir index entries exist for a name but there is no inode reference > + * item with the same name. > + */ > + ret = unlink_old_inode_refs(trans, root, path, BTRFS_I(inode), eb, slot, > + key); > + if (ret) > + goto out; > + > /* finally write the back reference in the inode */ > ret = overwrite_item(trans, root, path, eb, slot, key); > out: > -- > 2.11.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 02, 2018 at 11:29:33AM -0700, Liu Bo wrote: > On Wed, Feb 28, 2018 at 03:56:10PM +0000, fdmanana@kernel.org wrote: > > From: Filipe Manana <fdmanana@suse.com> > > > > If we have a file with 2 (or more) hard links in the same directory, > > remove one of the hard links, create a new file (or link an existing file) > > in the same directory with the name of the removed hard link, and then > > finally fsync the new file, we end up with a log that fails to replay, > > causing a mount failure. > > > > Example: > > > > $ mkfs.btrfs -f /dev/sdb > > $ mount /dev/sdb /mnt > > > > $ mkdir /mnt/testdir > > $ touch /mnt/testdir/foo > > $ ln /mnt/testdir/foo /mnt/testdir/bar > > > > $ sync > > > > $ unlink /mnt/testdir/bar > > $ touch /mnt/testdir/bar So here, "unlink & touch" similar to rename, therefore we could also be conservative and set /mnt/testdir 's last_unlink_trans to force to commit transaction. Thanks, -liubo > > $ xfs_io -c "fsync" /mnt/testdir/bar > > > > <power failure> > > > > $ mount /dev/sdb /mnt > > mount: mount(2) failed: /mnt: No such file or directory > > > > When replaying the log, for that example, we also see the following in > > dmesg/syslog: > > > > [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257 > > [71813.674204] ------------[ cut here ]------------ > > [71813.675694] BTRFS: Transaction aborted (error -2) > > [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs] > > [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs] > > [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G W 4.15.0-rc9-btrfs-next-56+ #1 > > [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs] > > [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286 > > [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001 > > [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff > > [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001 > > [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8 > > [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101 > > [71813.679669] FS: 00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000 > > [71813.679669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0 > > [71813.679669] Call Trace: > > [71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs] > > [71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs] > > [71813.679669] add_inode_ref+0x71e/0x851 [btrfs] > > [71813.679669] ? __lock_is_held+0x39/0x71 > > [71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs] > > [71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs] > > [71813.679669] ? rcu_read_unlock+0x3a/0x57 > > [71813.679669] ? __lock_is_held+0x39/0x71 > > [71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs] > > [71813.679669] walk_log_tree+0xad/0x188 [btrfs] > > [71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs] > > [71813.679669] ? replay_one_extent+0x544/0x544 [btrfs] > > [71813.679669] open_ctree+0x1cf6/0x2209 [btrfs] > > [71813.679669] btrfs_mount_root+0x368/0x482 [btrfs] > > [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 > > [71813.679669] ? __lockdep_init_map+0x176/0x1c2 > > [71813.679669] ? mount_fs+0x64/0x10b > > [71813.679669] mount_fs+0x64/0x10b > > [71813.679669] vfs_kern_mount+0x68/0xce > > [71813.679669] btrfs_mount+0x13e/0x772 [btrfs] > > [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 > > [71813.679669] ? __lockdep_init_map+0x176/0x1c2 > > [71813.679669] ? mount_fs+0x64/0x10b > > [71813.679669] mount_fs+0x64/0x10b > > [71813.679669] vfs_kern_mount+0x68/0xce > > [71813.679669] do_mount+0x6e5/0x973 > > [71813.679669] ? memdup_user+0x3e/0x5c > > [71813.679669] SyS_mount+0x72/0x98 > > [71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b > > [71813.679669] RIP: 0033:0x7f7cedf150ba > > [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206 > > [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c > > [71813.679669] ---[ end trace 83bd473fc5b4663b ]--- > > [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry > > [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree) > > [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30 > > [71814.128078] BTRFS error (device dm-0): open_ctree failed > > > > This happens because the log has inode reference items for both inode 258 > > (the first file we created) and inode 259 (the second file created), and > > when processing the reference item for inode 258, we replace the > > corresponding item in the subvolume tree (which has two names, "foo" and > > "bar") witht he one in the log (which only has one name, "foo") without > > removing the corresponding dir index keys from the parent directory. > > Later, when processing the inode reference item for inode 259, which has > > a name of "bar" associated to it, we notice that dir index entries exist > > for that name and for a different inode, so we attempt to unlink that > > name, which fails because the inode reference item for inode 258 no longer > > has the name "bar" associated to it, making a call to btrfs_unlink_inode() > > fail with a -ENOENT error. > > > > Fix this by unlinking all the names in an inode reference item from a > > subvolume tree that are not present in the inode reference item found in > > the log tree, before overwriting it with the item from the log tree. > > Since the order during replaying is INODE_ITEM then DIR_INDEX then > other types, in this particular case, can we fix the problem by also > logging the parent so that we have the correct DIR_INDEX? > > With DIR_INDEX, the problem would be fixed simpler, wouldn't it? > > Thanks, > > -liubo > > > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > --- > > fs/btrfs/ctree.h | 5 ++- > > fs/btrfs/inode-item.c | 44 ++++++++++++-------- > > fs/btrfs/tree-log.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++-- > > 3 files changed, 139 insertions(+), 22 deletions(-) > > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > > index 1a462ab85c49..5d33478bc704 100644 > > --- a/fs/btrfs/ctree.h > > +++ b/fs/btrfs/ctree.h > > @@ -3095,7 +3095,10 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, > > u64 inode_objectid, u64 ref_objectid, int ins_len, > > int cow); > > > > -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, > > +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, > > + const char *name, > > + int name_len, struct btrfs_inode_ref **ref_ret); > > +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, > > u64 ref_objectid, const char *name, > > int name_len, > > struct btrfs_inode_extref **extref_ret); > > diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c > > index 39c968f80157..65e1a76bf755 100644 > > --- a/fs/btrfs/inode-item.c > > +++ b/fs/btrfs/inode-item.c > > @@ -22,10 +22,10 @@ > > #include "transaction.h" > > #include "print-tree.h" > > > > -static int find_name_in_backref(struct btrfs_path *path, const char *name, > > - int name_len, struct btrfs_inode_ref **ref_ret) > > +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, > > + const char *name, > > + int name_len, struct btrfs_inode_ref **ref_ret) > > { > > - struct extent_buffer *leaf; > > struct btrfs_inode_ref *ref; > > unsigned long ptr; > > unsigned long name_ptr; > > @@ -33,9 +33,8 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, > > u32 cur_offset = 0; > > int len; > > > > - leaf = path->nodes[0]; > > - item_size = btrfs_item_size_nr(leaf, path->slots[0]); > > - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); > > + item_size = btrfs_item_size_nr(leaf, slot); > > + ptr = btrfs_item_ptr_offset(leaf, slot); > > while (cur_offset < item_size) { > > ref = (struct btrfs_inode_ref *)(ptr + cur_offset); > > len = btrfs_inode_ref_name_len(leaf, ref); > > @@ -44,18 +43,19 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, > > if (len != name_len) > > continue; > > if (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0) { > > - *ref_ret = ref; > > + if (ref_ret) > > + *ref_ret = ref; > > return 1; > > } > > } > > return 0; > > } > > > > -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, > > +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, > > + u64 ref_objectid, > > const char *name, int name_len, > > struct btrfs_inode_extref **extref_ret) > > { > > - struct extent_buffer *leaf; > > struct btrfs_inode_extref *extref; > > unsigned long ptr; > > unsigned long name_ptr; > > @@ -63,9 +63,8 @@ int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, > > u32 cur_offset = 0; > > int ref_name_len; > > > > - leaf = path->nodes[0]; > > - item_size = btrfs_item_size_nr(leaf, path->slots[0]); > > - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); > > + item_size = btrfs_item_size_nr(leaf, slot); > > + ptr = btrfs_item_ptr_offset(leaf, slot); > > > > /* > > * Search all extended backrefs in this item. We're only > > @@ -113,7 +112,9 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, > > return ERR_PTR(ret); > > if (ret > 0) > > return NULL; > > - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, name, name_len, &extref)) > > + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], > > + ref_objectid, name, name_len, > > + &extref)) > > return NULL; > > return extref; > > } > > @@ -155,7 +156,8 @@ static int btrfs_del_inode_extref(struct btrfs_trans_handle *trans, > > * This should always succeed so error here will make the FS > > * readonly. > > */ > > - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, > > + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], > > + ref_objectid, > > name, name_len, &extref)) { > > btrfs_handle_fs_error(root->fs_info, -ENOENT, NULL); > > ret = -EROFS; > > @@ -225,7 +227,8 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans, > > } else if (ret < 0) { > > goto out; > > } > > - if (!find_name_in_backref(path, name, name_len, &ref)) { > > + if (!btrfs_find_name_in_backref(path->nodes[0], path->slots[0], > > + name, name_len, &ref)) { > > ret = -ENOENT; > > search_ext_refs = 1; > > goto out; > > @@ -293,7 +296,9 @@ static int btrfs_insert_inode_extref(struct btrfs_trans_handle *trans, > > ret = btrfs_insert_empty_item(trans, root, path, &key, > > ins_len); > > if (ret == -EEXIST) { > > - if (btrfs_find_name_in_ext_backref(path, ref_objectid, > > + if (btrfs_find_name_in_ext_backref(path->nodes[0], > > + path->slots[0], > > + ref_objectid, > > name, name_len, NULL)) > > goto out; > > > > @@ -351,7 +356,8 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, > > if (ret == -EEXIST) { > > u32 old_size; > > > > - if (find_name_in_backref(path, name, name_len, &ref)) > > + if (btrfs_find_name_in_backref(path->nodes[0], path->slots[0], > > + name, name_len, &ref)) > > goto out; > > > > old_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]); > > @@ -365,7 +371,9 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, > > ret = 0; > > } else if (ret < 0) { > > if (ret == -EOVERFLOW) { > > - if (find_name_in_backref(path, name, name_len, &ref)) > > + if (btrfs_find_name_in_backref(path->nodes[0], > > + path->slots[0], > > + name, name_len, &ref)) > > ret = -EEXIST; > > else > > ret = -EMLINK; > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > > index 411a022489e4..fd573816f461 100644 > > --- a/fs/btrfs/tree-log.c > > +++ b/fs/btrfs/tree-log.c > > @@ -966,7 +966,9 @@ static noinline int backref_in_log(struct btrfs_root *log, > > ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]); > > > > if (key->type == BTRFS_INODE_EXTREF_KEY) { > > - if (btrfs_find_name_in_ext_backref(path, ref_objectid, > > + if (btrfs_find_name_in_ext_backref(path->nodes[0], > > + path->slots[0], > > + ref_objectid, > > name, namelen, NULL)) > > match = 1; > > > > @@ -1190,7 +1192,8 @@ static int extref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, > > read_extent_buffer(eb, *name, (unsigned long)&extref->name, > > *namelen); > > > > - *index = btrfs_inode_extref_index(eb, extref); > > + if (index) > > + *index = btrfs_inode_extref_index(eb, extref); > > if (parent_objectid) > > *parent_objectid = btrfs_inode_extref_parent(eb, extref); > > > > @@ -1211,12 +1214,102 @@ static int ref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, > > > > read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen); > > > > - *index = btrfs_inode_ref_index(eb, ref); > > + if (index) > > + *index = btrfs_inode_ref_index(eb, ref); > > > > return 0; > > } > > > > /* > > + * Take an inode reference item from the log tree and iterate all names from the > > + * inode reference item in the subvolume tree with the same key (if it exists). > > + * For any name that is not in the inode reference item from the log tree, do a > > + * proper unlink of that name (that is, remove its entry from the inode > > + * reference item and both dir index keys). > > + */ > > +static int unlink_old_inode_refs(struct btrfs_trans_handle *trans, > > + struct btrfs_root *root, > > + struct btrfs_path *path, > > + struct btrfs_inode *inode, > > + struct extent_buffer *log_eb, > > + int log_slot, > > + struct btrfs_key *key) > > +{ > > + int ret; > > + unsigned long ref_ptr; > > + unsigned long ref_end; > > + struct extent_buffer *eb; > > + > > +again: > > + btrfs_release_path(path); > > + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); > > + if (ret > 0) { > > + ret = 0; > > + goto out; > > + } > > + if (ret < 0) > > + goto out; > > + > > + eb = path->nodes[0]; > > + ref_ptr = btrfs_item_ptr_offset(eb, path->slots[0]); > > + ref_end = ref_ptr + btrfs_item_size_nr(eb, path->slots[0]); > > + while (ref_ptr < ref_end) { > > + char *name = NULL; > > + int namelen; > > + u64 parent_id; > > + > > + if (key->type == BTRFS_INODE_EXTREF_KEY) { > > + ret = extref_get_fields(eb, ref_ptr, &namelen, &name, > > + NULL, &parent_id); > > + } else { > > + parent_id = key->offset; > > + ret = ref_get_fields(eb, ref_ptr, &namelen, &name, > > + NULL); > > + } > > + if (ret) > > + goto out; > > + > > + if (key->type == BTRFS_INODE_EXTREF_KEY) > > + ret = btrfs_find_name_in_ext_backref(log_eb, log_slot, > > + parent_id, name, > > + namelen, NULL); > > + else > > + ret = btrfs_find_name_in_backref(log_eb, log_slot, name, > > + namelen, NULL); > > + > > + if (!ret) { > > + struct inode *dir; > > + > > + btrfs_release_path(path); > > + dir = read_one_inode(root, parent_id); > > + if (!dir) { > > + ret = -ENOENT; > > + kfree(name); > > + goto out; > > + } > > + ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir), > > + inode, name, namelen); > > + kfree(name); > > + iput(dir); > > + if (ret) > > + goto out; > > + goto again; > > + } > > + > > + kfree(name); > > + ref_ptr += namelen; > > + if (key->type == BTRFS_INODE_EXTREF_KEY) > > + ref_ptr += sizeof(struct btrfs_inode_extref); > > + else > > + ref_ptr += sizeof(struct btrfs_inode_ref); > > + } > > + ret = 0; > > + out: > > + btrfs_release_path(path); > > + return ret; > > +} > > + > > +/* > > * replay one inode back reference item found in the log tree. > > * eb, slot and key refer to the buffer and key found in the log tree. > > * root is the destination we are replaying into, and path is for temp > > @@ -1344,6 +1437,19 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, > > } > > } > > > > + /* > > + * Before we overwrite the inode reference item in the subvolume tree > > + * with the item from the log tree, we must unlink all names from the > > + * parent directory that are in the subvolume's tree inode reference > > + * item, otherwise we end up with an inconsistent subvolume tree where > > + * dir index entries exist for a name but there is no inode reference > > + * item with the same name. > > + */ > > + ret = unlink_old_inode_refs(trans, root, path, BTRFS_I(inode), eb, slot, > > + key); > > + if (ret) > > + goto out; > > + > > /* finally write the back reference in the inode */ > > ret = overwrite_item(trans, root, path, eb, slot, key); > > out: > > -- > > 2.11.0 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 2, 2018 at 6:29 PM, Liu Bo <bo.li.liu@oracle.com> wrote: > On Wed, Feb 28, 2018 at 03:56:10PM +0000, fdmanana@kernel.org wrote: >> From: Filipe Manana <fdmanana@suse.com> >> >> If we have a file with 2 (or more) hard links in the same directory, >> remove one of the hard links, create a new file (or link an existing file) >> in the same directory with the name of the removed hard link, and then >> finally fsync the new file, we end up with a log that fails to replay, >> causing a mount failure. >> >> Example: >> >> $ mkfs.btrfs -f /dev/sdb >> $ mount /dev/sdb /mnt >> >> $ mkdir /mnt/testdir >> $ touch /mnt/testdir/foo >> $ ln /mnt/testdir/foo /mnt/testdir/bar >> >> $ sync >> >> $ unlink /mnt/testdir/bar >> $ touch /mnt/testdir/bar >> $ xfs_io -c "fsync" /mnt/testdir/bar >> >> <power failure> >> >> $ mount /dev/sdb /mnt >> mount: mount(2) failed: /mnt: No such file or directory >> >> When replaying the log, for that example, we also see the following in >> dmesg/syslog: >> >> [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257 >> [71813.674204] ------------[ cut here ]------------ >> [71813.675694] BTRFS: Transaction aborted (error -2) >> [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs] >> [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs] >> [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G W 4.15.0-rc9-btrfs-next-56+ #1 >> [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 >> [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs] >> [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286 >> [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001 >> [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff >> [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001 >> [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8 >> [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101 >> [71813.679669] FS: 00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000 >> [71813.679669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0 >> [71813.679669] Call Trace: >> [71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs] >> [71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs] >> [71813.679669] add_inode_ref+0x71e/0x851 [btrfs] >> [71813.679669] ? __lock_is_held+0x39/0x71 >> [71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs] >> [71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs] >> [71813.679669] ? rcu_read_unlock+0x3a/0x57 >> [71813.679669] ? __lock_is_held+0x39/0x71 >> [71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs] >> [71813.679669] walk_log_tree+0xad/0x188 [btrfs] >> [71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs] >> [71813.679669] ? replay_one_extent+0x544/0x544 [btrfs] >> [71813.679669] open_ctree+0x1cf6/0x2209 [btrfs] >> [71813.679669] btrfs_mount_root+0x368/0x482 [btrfs] >> [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 >> [71813.679669] ? __lockdep_init_map+0x176/0x1c2 >> [71813.679669] ? mount_fs+0x64/0x10b >> [71813.679669] mount_fs+0x64/0x10b >> [71813.679669] vfs_kern_mount+0x68/0xce >> [71813.679669] btrfs_mount+0x13e/0x772 [btrfs] >> [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 >> [71813.679669] ? __lockdep_init_map+0x176/0x1c2 >> [71813.679669] ? mount_fs+0x64/0x10b >> [71813.679669] mount_fs+0x64/0x10b >> [71813.679669] vfs_kern_mount+0x68/0xce >> [71813.679669] do_mount+0x6e5/0x973 >> [71813.679669] ? memdup_user+0x3e/0x5c >> [71813.679669] SyS_mount+0x72/0x98 >> [71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b >> [71813.679669] RIP: 0033:0x7f7cedf150ba >> [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206 >> [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c >> [71813.679669] ---[ end trace 83bd473fc5b4663b ]--- >> [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry >> [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree) >> [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30 >> [71814.128078] BTRFS error (device dm-0): open_ctree failed >> >> This happens because the log has inode reference items for both inode 258 >> (the first file we created) and inode 259 (the second file created), and >> when processing the reference item for inode 258, we replace the >> corresponding item in the subvolume tree (which has two names, "foo" and >> "bar") witht he one in the log (which only has one name, "foo") without >> removing the corresponding dir index keys from the parent directory. >> Later, when processing the inode reference item for inode 259, which has >> a name of "bar" associated to it, we notice that dir index entries exist >> for that name and for a different inode, so we attempt to unlink that >> name, which fails because the inode reference item for inode 258 no longer >> has the name "bar" associated to it, making a call to btrfs_unlink_inode() >> fail with a -ENOENT error. >> >> Fix this by unlinking all the names in an inode reference item from a >> subvolume tree that are not present in the inode reference item found in >> the log tree, before overwriting it with the item from the log tree. > > Since the order during replaying is INODE_ITEM then DIR_INDEX then > other types, in this particular case, can we fix the problem by also > logging the parent so that we have the correct DIR_INDEX? > > With DIR_INDEX, the problem would be fixed simpler, wouldn't it? It wouldn't work when the parent directory has a higher inode number then the child. Plus for large directories, that would take a performance penalty on fsync'ing any file inside them (existing or new). thanks > > Thanks, > > -liubo >> >> Signed-off-by: Filipe Manana <fdmanana@suse.com> >> --- >> fs/btrfs/ctree.h | 5 ++- >> fs/btrfs/inode-item.c | 44 ++++++++++++-------- >> fs/btrfs/tree-log.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++-- >> 3 files changed, 139 insertions(+), 22 deletions(-) >> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h >> index 1a462ab85c49..5d33478bc704 100644 >> --- a/fs/btrfs/ctree.h >> +++ b/fs/btrfs/ctree.h >> @@ -3095,7 +3095,10 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, >> u64 inode_objectid, u64 ref_objectid, int ins_len, >> int cow); >> >> -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, >> +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, >> + const char *name, >> + int name_len, struct btrfs_inode_ref **ref_ret); >> +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, >> u64 ref_objectid, const char *name, >> int name_len, >> struct btrfs_inode_extref **extref_ret); >> diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c >> index 39c968f80157..65e1a76bf755 100644 >> --- a/fs/btrfs/inode-item.c >> +++ b/fs/btrfs/inode-item.c >> @@ -22,10 +22,10 @@ >> #include "transaction.h" >> #include "print-tree.h" >> >> -static int find_name_in_backref(struct btrfs_path *path, const char *name, >> - int name_len, struct btrfs_inode_ref **ref_ret) >> +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, >> + const char *name, >> + int name_len, struct btrfs_inode_ref **ref_ret) >> { >> - struct extent_buffer *leaf; >> struct btrfs_inode_ref *ref; >> unsigned long ptr; >> unsigned long name_ptr; >> @@ -33,9 +33,8 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, >> u32 cur_offset = 0; >> int len; >> >> - leaf = path->nodes[0]; >> - item_size = btrfs_item_size_nr(leaf, path->slots[0]); >> - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); >> + item_size = btrfs_item_size_nr(leaf, slot); >> + ptr = btrfs_item_ptr_offset(leaf, slot); >> while (cur_offset < item_size) { >> ref = (struct btrfs_inode_ref *)(ptr + cur_offset); >> len = btrfs_inode_ref_name_len(leaf, ref); >> @@ -44,18 +43,19 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, >> if (len != name_len) >> continue; >> if (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0) { >> - *ref_ret = ref; >> + if (ref_ret) >> + *ref_ret = ref; >> return 1; >> } >> } >> return 0; >> } >> >> -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, >> +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, >> + u64 ref_objectid, >> const char *name, int name_len, >> struct btrfs_inode_extref **extref_ret) >> { >> - struct extent_buffer *leaf; >> struct btrfs_inode_extref *extref; >> unsigned long ptr; >> unsigned long name_ptr; >> @@ -63,9 +63,8 @@ int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, >> u32 cur_offset = 0; >> int ref_name_len; >> >> - leaf = path->nodes[0]; >> - item_size = btrfs_item_size_nr(leaf, path->slots[0]); >> - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); >> + item_size = btrfs_item_size_nr(leaf, slot); >> + ptr = btrfs_item_ptr_offset(leaf, slot); >> >> /* >> * Search all extended backrefs in this item. We're only >> @@ -113,7 +112,9 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, >> return ERR_PTR(ret); >> if (ret > 0) >> return NULL; >> - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, name, name_len, &extref)) >> + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], >> + ref_objectid, name, name_len, >> + &extref)) >> return NULL; >> return extref; >> } >> @@ -155,7 +156,8 @@ static int btrfs_del_inode_extref(struct btrfs_trans_handle *trans, >> * This should always succeed so error here will make the FS >> * readonly. >> */ >> - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, >> + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], >> + ref_objectid, >> name, name_len, &extref)) { >> btrfs_handle_fs_error(root->fs_info, -ENOENT, NULL); >> ret = -EROFS; >> @@ -225,7 +227,8 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans, >> } else if (ret < 0) { >> goto out; >> } >> - if (!find_name_in_backref(path, name, name_len, &ref)) { >> + if (!btrfs_find_name_in_backref(path->nodes[0], path->slots[0], >> + name, name_len, &ref)) { >> ret = -ENOENT; >> search_ext_refs = 1; >> goto out; >> @@ -293,7 +296,9 @@ static int btrfs_insert_inode_extref(struct btrfs_trans_handle *trans, >> ret = btrfs_insert_empty_item(trans, root, path, &key, >> ins_len); >> if (ret == -EEXIST) { >> - if (btrfs_find_name_in_ext_backref(path, ref_objectid, >> + if (btrfs_find_name_in_ext_backref(path->nodes[0], >> + path->slots[0], >> + ref_objectid, >> name, name_len, NULL)) >> goto out; >> >> @@ -351,7 +356,8 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, >> if (ret == -EEXIST) { >> u32 old_size; >> >> - if (find_name_in_backref(path, name, name_len, &ref)) >> + if (btrfs_find_name_in_backref(path->nodes[0], path->slots[0], >> + name, name_len, &ref)) >> goto out; >> >> old_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]); >> @@ -365,7 +371,9 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, >> ret = 0; >> } else if (ret < 0) { >> if (ret == -EOVERFLOW) { >> - if (find_name_in_backref(path, name, name_len, &ref)) >> + if (btrfs_find_name_in_backref(path->nodes[0], >> + path->slots[0], >> + name, name_len, &ref)) >> ret = -EEXIST; >> else >> ret = -EMLINK; >> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c >> index 411a022489e4..fd573816f461 100644 >> --- a/fs/btrfs/tree-log.c >> +++ b/fs/btrfs/tree-log.c >> @@ -966,7 +966,9 @@ static noinline int backref_in_log(struct btrfs_root *log, >> ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]); >> >> if (key->type == BTRFS_INODE_EXTREF_KEY) { >> - if (btrfs_find_name_in_ext_backref(path, ref_objectid, >> + if (btrfs_find_name_in_ext_backref(path->nodes[0], >> + path->slots[0], >> + ref_objectid, >> name, namelen, NULL)) >> match = 1; >> >> @@ -1190,7 +1192,8 @@ static int extref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, >> read_extent_buffer(eb, *name, (unsigned long)&extref->name, >> *namelen); >> >> - *index = btrfs_inode_extref_index(eb, extref); >> + if (index) >> + *index = btrfs_inode_extref_index(eb, extref); >> if (parent_objectid) >> *parent_objectid = btrfs_inode_extref_parent(eb, extref); >> >> @@ -1211,12 +1214,102 @@ static int ref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, >> >> read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen); >> >> - *index = btrfs_inode_ref_index(eb, ref); >> + if (index) >> + *index = btrfs_inode_ref_index(eb, ref); >> >> return 0; >> } >> >> /* >> + * Take an inode reference item from the log tree and iterate all names from the >> + * inode reference item in the subvolume tree with the same key (if it exists). >> + * For any name that is not in the inode reference item from the log tree, do a >> + * proper unlink of that name (that is, remove its entry from the inode >> + * reference item and both dir index keys). >> + */ >> +static int unlink_old_inode_refs(struct btrfs_trans_handle *trans, >> + struct btrfs_root *root, >> + struct btrfs_path *path, >> + struct btrfs_inode *inode, >> + struct extent_buffer *log_eb, >> + int log_slot, >> + struct btrfs_key *key) >> +{ >> + int ret; >> + unsigned long ref_ptr; >> + unsigned long ref_end; >> + struct extent_buffer *eb; >> + >> +again: >> + btrfs_release_path(path); >> + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); >> + if (ret > 0) { >> + ret = 0; >> + goto out; >> + } >> + if (ret < 0) >> + goto out; >> + >> + eb = path->nodes[0]; >> + ref_ptr = btrfs_item_ptr_offset(eb, path->slots[0]); >> + ref_end = ref_ptr + btrfs_item_size_nr(eb, path->slots[0]); >> + while (ref_ptr < ref_end) { >> + char *name = NULL; >> + int namelen; >> + u64 parent_id; >> + >> + if (key->type == BTRFS_INODE_EXTREF_KEY) { >> + ret = extref_get_fields(eb, ref_ptr, &namelen, &name, >> + NULL, &parent_id); >> + } else { >> + parent_id = key->offset; >> + ret = ref_get_fields(eb, ref_ptr, &namelen, &name, >> + NULL); >> + } >> + if (ret) >> + goto out; >> + >> + if (key->type == BTRFS_INODE_EXTREF_KEY) >> + ret = btrfs_find_name_in_ext_backref(log_eb, log_slot, >> + parent_id, name, >> + namelen, NULL); >> + else >> + ret = btrfs_find_name_in_backref(log_eb, log_slot, name, >> + namelen, NULL); >> + >> + if (!ret) { >> + struct inode *dir; >> + >> + btrfs_release_path(path); >> + dir = read_one_inode(root, parent_id); >> + if (!dir) { >> + ret = -ENOENT; >> + kfree(name); >> + goto out; >> + } >> + ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir), >> + inode, name, namelen); >> + kfree(name); >> + iput(dir); >> + if (ret) >> + goto out; >> + goto again; >> + } >> + >> + kfree(name); >> + ref_ptr += namelen; >> + if (key->type == BTRFS_INODE_EXTREF_KEY) >> + ref_ptr += sizeof(struct btrfs_inode_extref); >> + else >> + ref_ptr += sizeof(struct btrfs_inode_ref); >> + } >> + ret = 0; >> + out: >> + btrfs_release_path(path); >> + return ret; >> +} >> + >> +/* >> * replay one inode back reference item found in the log tree. >> * eb, slot and key refer to the buffer and key found in the log tree. >> * root is the destination we are replaying into, and path is for temp >> @@ -1344,6 +1437,19 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, >> } >> } >> >> + /* >> + * Before we overwrite the inode reference item in the subvolume tree >> + * with the item from the log tree, we must unlink all names from the >> + * parent directory that are in the subvolume's tree inode reference >> + * item, otherwise we end up with an inconsistent subvolume tree where >> + * dir index entries exist for a name but there is no inode reference >> + * item with the same name. >> + */ >> + ret = unlink_old_inode_refs(trans, root, path, BTRFS_I(inode), eb, slot, >> + key); >> + if (ret) >> + goto out; >> + >> /* finally write the back reference in the inode */ >> ret = overwrite_item(trans, root, path, eb, slot, key); >> out: >> -- >> 2.11.0 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 2, 2018 at 7:00 PM, Liu Bo <bo.li.liu@oracle.com> wrote: > On Fri, Mar 02, 2018 at 11:29:33AM -0700, Liu Bo wrote: >> On Wed, Feb 28, 2018 at 03:56:10PM +0000, fdmanana@kernel.org wrote: >> > From: Filipe Manana <fdmanana@suse.com> >> > >> > If we have a file with 2 (or more) hard links in the same directory, >> > remove one of the hard links, create a new file (or link an existing file) >> > in the same directory with the name of the removed hard link, and then >> > finally fsync the new file, we end up with a log that fails to replay, >> > causing a mount failure. >> > >> > Example: >> > >> > $ mkfs.btrfs -f /dev/sdb >> > $ mount /dev/sdb /mnt >> > >> > $ mkdir /mnt/testdir >> > $ touch /mnt/testdir/foo >> > $ ln /mnt/testdir/foo /mnt/testdir/bar >> > >> > $ sync >> > >> > $ unlink /mnt/testdir/bar >> > $ touch /mnt/testdir/bar > > So here, "unlink & touch" similar to rename, therefore we could also > be conservative and set /mnt/testdir 's last_unlink_trans to force to > commit transaction. That would cause any fsync on any file inside the directory to result in a full transaction commit. We had in the past more "exotic" scenarios with renames and creating new files with the same old name where the initial solution was just that, forcing a full commit [1], just to find out later that it caused big performance drop on real workloads [2]. That's why I'm avoiding it, even because fsync'ing any file in the directory (existing or new) isn't uncommon at all. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56f23fdbb600e6087db7b009775b95ce07cc3195 [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=44f714dae50a2e795d3268a6831762aa6fa54f55 thanks > > Thanks, > > -liubo > >> > $ xfs_io -c "fsync" /mnt/testdir/bar >> > >> > <power failure> >> > >> > $ mount /dev/sdb /mnt >> > mount: mount(2) failed: /mnt: No such file or directory >> > >> > When replaying the log, for that example, we also see the following in >> > dmesg/syslog: >> > >> > [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257 >> > [71813.674204] ------------[ cut here ]------------ >> > [71813.675694] BTRFS: Transaction aborted (error -2) >> > [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs] >> > [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs] >> > [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G W 4.15.0-rc9-btrfs-next-56+ #1 >> > [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 >> > [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs] >> > [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286 >> > [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001 >> > [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff >> > [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001 >> > [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8 >> > [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101 >> > [71813.679669] FS: 00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000 >> > [71813.679669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0 >> > [71813.679669] Call Trace: >> > [71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs] >> > [71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs] >> > [71813.679669] add_inode_ref+0x71e/0x851 [btrfs] >> > [71813.679669] ? __lock_is_held+0x39/0x71 >> > [71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs] >> > [71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs] >> > [71813.679669] ? rcu_read_unlock+0x3a/0x57 >> > [71813.679669] ? __lock_is_held+0x39/0x71 >> > [71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs] >> > [71813.679669] walk_log_tree+0xad/0x188 [btrfs] >> > [71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs] >> > [71813.679669] ? replay_one_extent+0x544/0x544 [btrfs] >> > [71813.679669] open_ctree+0x1cf6/0x2209 [btrfs] >> > [71813.679669] btrfs_mount_root+0x368/0x482 [btrfs] >> > [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 >> > [71813.679669] ? __lockdep_init_map+0x176/0x1c2 >> > [71813.679669] ? mount_fs+0x64/0x10b >> > [71813.679669] mount_fs+0x64/0x10b >> > [71813.679669] vfs_kern_mount+0x68/0xce >> > [71813.679669] btrfs_mount+0x13e/0x772 [btrfs] >> > [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 >> > [71813.679669] ? __lockdep_init_map+0x176/0x1c2 >> > [71813.679669] ? mount_fs+0x64/0x10b >> > [71813.679669] mount_fs+0x64/0x10b >> > [71813.679669] vfs_kern_mount+0x68/0xce >> > [71813.679669] do_mount+0x6e5/0x973 >> > [71813.679669] ? memdup_user+0x3e/0x5c >> > [71813.679669] SyS_mount+0x72/0x98 >> > [71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b >> > [71813.679669] RIP: 0033:0x7f7cedf150ba >> > [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206 >> > [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c >> > [71813.679669] ---[ end trace 83bd473fc5b4663b ]--- >> > [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry >> > [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree) >> > [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30 >> > [71814.128078] BTRFS error (device dm-0): open_ctree failed >> > >> > This happens because the log has inode reference items for both inode 258 >> > (the first file we created) and inode 259 (the second file created), and >> > when processing the reference item for inode 258, we replace the >> > corresponding item in the subvolume tree (which has two names, "foo" and >> > "bar") witht he one in the log (which only has one name, "foo") without >> > removing the corresponding dir index keys from the parent directory. >> > Later, when processing the inode reference item for inode 259, which has >> > a name of "bar" associated to it, we notice that dir index entries exist >> > for that name and for a different inode, so we attempt to unlink that >> > name, which fails because the inode reference item for inode 258 no longer >> > has the name "bar" associated to it, making a call to btrfs_unlink_inode() >> > fail with a -ENOENT error. >> > >> > Fix this by unlinking all the names in an inode reference item from a >> > subvolume tree that are not present in the inode reference item found in >> > the log tree, before overwriting it with the item from the log tree. >> >> Since the order during replaying is INODE_ITEM then DIR_INDEX then >> other types, in this particular case, can we fix the problem by also >> logging the parent so that we have the correct DIR_INDEX? >> >> With DIR_INDEX, the problem would be fixed simpler, wouldn't it? >> >> Thanks, >> >> -liubo >> > >> > Signed-off-by: Filipe Manana <fdmanana@suse.com> >> > --- >> > fs/btrfs/ctree.h | 5 ++- >> > fs/btrfs/inode-item.c | 44 ++++++++++++-------- >> > fs/btrfs/tree-log.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++-- >> > 3 files changed, 139 insertions(+), 22 deletions(-) >> > >> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h >> > index 1a462ab85c49..5d33478bc704 100644 >> > --- a/fs/btrfs/ctree.h >> > +++ b/fs/btrfs/ctree.h >> > @@ -3095,7 +3095,10 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, >> > u64 inode_objectid, u64 ref_objectid, int ins_len, >> > int cow); >> > >> > -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, >> > +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, >> > + const char *name, >> > + int name_len, struct btrfs_inode_ref **ref_ret); >> > +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, >> > u64 ref_objectid, const char *name, >> > int name_len, >> > struct btrfs_inode_extref **extref_ret); >> > diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c >> > index 39c968f80157..65e1a76bf755 100644 >> > --- a/fs/btrfs/inode-item.c >> > +++ b/fs/btrfs/inode-item.c >> > @@ -22,10 +22,10 @@ >> > #include "transaction.h" >> > #include "print-tree.h" >> > >> > -static int find_name_in_backref(struct btrfs_path *path, const char *name, >> > - int name_len, struct btrfs_inode_ref **ref_ret) >> > +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, >> > + const char *name, >> > + int name_len, struct btrfs_inode_ref **ref_ret) >> > { >> > - struct extent_buffer *leaf; >> > struct btrfs_inode_ref *ref; >> > unsigned long ptr; >> > unsigned long name_ptr; >> > @@ -33,9 +33,8 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, >> > u32 cur_offset = 0; >> > int len; >> > >> > - leaf = path->nodes[0]; >> > - item_size = btrfs_item_size_nr(leaf, path->slots[0]); >> > - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); >> > + item_size = btrfs_item_size_nr(leaf, slot); >> > + ptr = btrfs_item_ptr_offset(leaf, slot); >> > while (cur_offset < item_size) { >> > ref = (struct btrfs_inode_ref *)(ptr + cur_offset); >> > len = btrfs_inode_ref_name_len(leaf, ref); >> > @@ -44,18 +43,19 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, >> > if (len != name_len) >> > continue; >> > if (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0) { >> > - *ref_ret = ref; >> > + if (ref_ret) >> > + *ref_ret = ref; >> > return 1; >> > } >> > } >> > return 0; >> > } >> > >> > -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, >> > +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, >> > + u64 ref_objectid, >> > const char *name, int name_len, >> > struct btrfs_inode_extref **extref_ret) >> > { >> > - struct extent_buffer *leaf; >> > struct btrfs_inode_extref *extref; >> > unsigned long ptr; >> > unsigned long name_ptr; >> > @@ -63,9 +63,8 @@ int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, >> > u32 cur_offset = 0; >> > int ref_name_len; >> > >> > - leaf = path->nodes[0]; >> > - item_size = btrfs_item_size_nr(leaf, path->slots[0]); >> > - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); >> > + item_size = btrfs_item_size_nr(leaf, slot); >> > + ptr = btrfs_item_ptr_offset(leaf, slot); >> > >> > /* >> > * Search all extended backrefs in this item. We're only >> > @@ -113,7 +112,9 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, >> > return ERR_PTR(ret); >> > if (ret > 0) >> > return NULL; >> > - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, name, name_len, &extref)) >> > + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], >> > + ref_objectid, name, name_len, >> > + &extref)) >> > return NULL; >> > return extref; >> > } >> > @@ -155,7 +156,8 @@ static int btrfs_del_inode_extref(struct btrfs_trans_handle *trans, >> > * This should always succeed so error here will make the FS >> > * readonly. >> > */ >> > - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, >> > + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], >> > + ref_objectid, >> > name, name_len, &extref)) { >> > btrfs_handle_fs_error(root->fs_info, -ENOENT, NULL); >> > ret = -EROFS; >> > @@ -225,7 +227,8 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans, >> > } else if (ret < 0) { >> > goto out; >> > } >> > - if (!find_name_in_backref(path, name, name_len, &ref)) { >> > + if (!btrfs_find_name_in_backref(path->nodes[0], path->slots[0], >> > + name, name_len, &ref)) { >> > ret = -ENOENT; >> > search_ext_refs = 1; >> > goto out; >> > @@ -293,7 +296,9 @@ static int btrfs_insert_inode_extref(struct btrfs_trans_handle *trans, >> > ret = btrfs_insert_empty_item(trans, root, path, &key, >> > ins_len); >> > if (ret == -EEXIST) { >> > - if (btrfs_find_name_in_ext_backref(path, ref_objectid, >> > + if (btrfs_find_name_in_ext_backref(path->nodes[0], >> > + path->slots[0], >> > + ref_objectid, >> > name, name_len, NULL)) >> > goto out; >> > >> > @@ -351,7 +356,8 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, >> > if (ret == -EEXIST) { >> > u32 old_size; >> > >> > - if (find_name_in_backref(path, name, name_len, &ref)) >> > + if (btrfs_find_name_in_backref(path->nodes[0], path->slots[0], >> > + name, name_len, &ref)) >> > goto out; >> > >> > old_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]); >> > @@ -365,7 +371,9 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, >> > ret = 0; >> > } else if (ret < 0) { >> > if (ret == -EOVERFLOW) { >> > - if (find_name_in_backref(path, name, name_len, &ref)) >> > + if (btrfs_find_name_in_backref(path->nodes[0], >> > + path->slots[0], >> > + name, name_len, &ref)) >> > ret = -EEXIST; >> > else >> > ret = -EMLINK; >> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c >> > index 411a022489e4..fd573816f461 100644 >> > --- a/fs/btrfs/tree-log.c >> > +++ b/fs/btrfs/tree-log.c >> > @@ -966,7 +966,9 @@ static noinline int backref_in_log(struct btrfs_root *log, >> > ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]); >> > >> > if (key->type == BTRFS_INODE_EXTREF_KEY) { >> > - if (btrfs_find_name_in_ext_backref(path, ref_objectid, >> > + if (btrfs_find_name_in_ext_backref(path->nodes[0], >> > + path->slots[0], >> > + ref_objectid, >> > name, namelen, NULL)) >> > match = 1; >> > >> > @@ -1190,7 +1192,8 @@ static int extref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, >> > read_extent_buffer(eb, *name, (unsigned long)&extref->name, >> > *namelen); >> > >> > - *index = btrfs_inode_extref_index(eb, extref); >> > + if (index) >> > + *index = btrfs_inode_extref_index(eb, extref); >> > if (parent_objectid) >> > *parent_objectid = btrfs_inode_extref_parent(eb, extref); >> > >> > @@ -1211,12 +1214,102 @@ static int ref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, >> > >> > read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen); >> > >> > - *index = btrfs_inode_ref_index(eb, ref); >> > + if (index) >> > + *index = btrfs_inode_ref_index(eb, ref); >> > >> > return 0; >> > } >> > >> > /* >> > + * Take an inode reference item from the log tree and iterate all names from the >> > + * inode reference item in the subvolume tree with the same key (if it exists). >> > + * For any name that is not in the inode reference item from the log tree, do a >> > + * proper unlink of that name (that is, remove its entry from the inode >> > + * reference item and both dir index keys). >> > + */ >> > +static int unlink_old_inode_refs(struct btrfs_trans_handle *trans, >> > + struct btrfs_root *root, >> > + struct btrfs_path *path, >> > + struct btrfs_inode *inode, >> > + struct extent_buffer *log_eb, >> > + int log_slot, >> > + struct btrfs_key *key) >> > +{ >> > + int ret; >> > + unsigned long ref_ptr; >> > + unsigned long ref_end; >> > + struct extent_buffer *eb; >> > + >> > +again: >> > + btrfs_release_path(path); >> > + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); >> > + if (ret > 0) { >> > + ret = 0; >> > + goto out; >> > + } >> > + if (ret < 0) >> > + goto out; >> > + >> > + eb = path->nodes[0]; >> > + ref_ptr = btrfs_item_ptr_offset(eb, path->slots[0]); >> > + ref_end = ref_ptr + btrfs_item_size_nr(eb, path->slots[0]); >> > + while (ref_ptr < ref_end) { >> > + char *name = NULL; >> > + int namelen; >> > + u64 parent_id; >> > + >> > + if (key->type == BTRFS_INODE_EXTREF_KEY) { >> > + ret = extref_get_fields(eb, ref_ptr, &namelen, &name, >> > + NULL, &parent_id); >> > + } else { >> > + parent_id = key->offset; >> > + ret = ref_get_fields(eb, ref_ptr, &namelen, &name, >> > + NULL); >> > + } >> > + if (ret) >> > + goto out; >> > + >> > + if (key->type == BTRFS_INODE_EXTREF_KEY) >> > + ret = btrfs_find_name_in_ext_backref(log_eb, log_slot, >> > + parent_id, name, >> > + namelen, NULL); >> > + else >> > + ret = btrfs_find_name_in_backref(log_eb, log_slot, name, >> > + namelen, NULL); >> > + >> > + if (!ret) { >> > + struct inode *dir; >> > + >> > + btrfs_release_path(path); >> > + dir = read_one_inode(root, parent_id); >> > + if (!dir) { >> > + ret = -ENOENT; >> > + kfree(name); >> > + goto out; >> > + } >> > + ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir), >> > + inode, name, namelen); >> > + kfree(name); >> > + iput(dir); >> > + if (ret) >> > + goto out; >> > + goto again; >> > + } >> > + >> > + kfree(name); >> > + ref_ptr += namelen; >> > + if (key->type == BTRFS_INODE_EXTREF_KEY) >> > + ref_ptr += sizeof(struct btrfs_inode_extref); >> > + else >> > + ref_ptr += sizeof(struct btrfs_inode_ref); >> > + } >> > + ret = 0; >> > + out: >> > + btrfs_release_path(path); >> > + return ret; >> > +} >> > + >> > +/* >> > * replay one inode back reference item found in the log tree. >> > * eb, slot and key refer to the buffer and key found in the log tree. >> > * root is the destination we are replaying into, and path is for temp >> > @@ -1344,6 +1437,19 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, >> > } >> > } >> > >> > + /* >> > + * Before we overwrite the inode reference item in the subvolume tree >> > + * with the item from the log tree, we must unlink all names from the >> > + * parent directory that are in the subvolume's tree inode reference >> > + * item, otherwise we end up with an inconsistent subvolume tree where >> > + * dir index entries exist for a name but there is no inode reference >> > + * item with the same name. >> > + */ >> > + ret = unlink_old_inode_refs(trans, root, path, BTRFS_I(inode), eb, slot, >> > + key); >> > + if (ret) >> > + goto out; >> > + >> > /* finally write the back reference in the inode */ >> > ret = overwrite_item(trans, root, path, eb, slot, key); >> > out: >> > -- >> > 2.11.0 >> > >> > -- >> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> > the body of a message to majordomo@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1a462ab85c49..5d33478bc704 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3095,7 +3095,10 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, u64 inode_objectid, u64 ref_objectid, int ins_len, int cow); -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, + const char *name, + int name_len, struct btrfs_inode_ref **ref_ret); +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, u64 ref_objectid, const char *name, int name_len, struct btrfs_inode_extref **extref_ret); diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c index 39c968f80157..65e1a76bf755 100644 --- a/fs/btrfs/inode-item.c +++ b/fs/btrfs/inode-item.c @@ -22,10 +22,10 @@ #include "transaction.h" #include "print-tree.h" -static int find_name_in_backref(struct btrfs_path *path, const char *name, - int name_len, struct btrfs_inode_ref **ref_ret) +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot, + const char *name, + int name_len, struct btrfs_inode_ref **ref_ret) { - struct extent_buffer *leaf; struct btrfs_inode_ref *ref; unsigned long ptr; unsigned long name_ptr; @@ -33,9 +33,8 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, u32 cur_offset = 0; int len; - leaf = path->nodes[0]; - item_size = btrfs_item_size_nr(leaf, path->slots[0]); - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); + item_size = btrfs_item_size_nr(leaf, slot); + ptr = btrfs_item_ptr_offset(leaf, slot); while (cur_offset < item_size) { ref = (struct btrfs_inode_ref *)(ptr + cur_offset); len = btrfs_inode_ref_name_len(leaf, ref); @@ -44,18 +43,19 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name, if (len != name_len) continue; if (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0) { - *ref_ret = ref; + if (ref_ret) + *ref_ret = ref; return 1; } } return 0; } -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot, + u64 ref_objectid, const char *name, int name_len, struct btrfs_inode_extref **extref_ret) { - struct extent_buffer *leaf; struct btrfs_inode_extref *extref; unsigned long ptr; unsigned long name_ptr; @@ -63,9 +63,8 @@ int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 ref_objectid, u32 cur_offset = 0; int ref_name_len; - leaf = path->nodes[0]; - item_size = btrfs_item_size_nr(leaf, path->slots[0]); - ptr = btrfs_item_ptr_offset(leaf, path->slots[0]); + item_size = btrfs_item_size_nr(leaf, slot); + ptr = btrfs_item_ptr_offset(leaf, slot); /* * Search all extended backrefs in this item. We're only @@ -113,7 +112,9 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans, return ERR_PTR(ret); if (ret > 0) return NULL; - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, name, name_len, &extref)) + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], + ref_objectid, name, name_len, + &extref)) return NULL; return extref; } @@ -155,7 +156,8 @@ static int btrfs_del_inode_extref(struct btrfs_trans_handle *trans, * This should always succeed so error here will make the FS * readonly. */ - if (!btrfs_find_name_in_ext_backref(path, ref_objectid, + if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0], + ref_objectid, name, name_len, &extref)) { btrfs_handle_fs_error(root->fs_info, -ENOENT, NULL); ret = -EROFS; @@ -225,7 +227,8 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans, } else if (ret < 0) { goto out; } - if (!find_name_in_backref(path, name, name_len, &ref)) { + if (!btrfs_find_name_in_backref(path->nodes[0], path->slots[0], + name, name_len, &ref)) { ret = -ENOENT; search_ext_refs = 1; goto out; @@ -293,7 +296,9 @@ static int btrfs_insert_inode_extref(struct btrfs_trans_handle *trans, ret = btrfs_insert_empty_item(trans, root, path, &key, ins_len); if (ret == -EEXIST) { - if (btrfs_find_name_in_ext_backref(path, ref_objectid, + if (btrfs_find_name_in_ext_backref(path->nodes[0], + path->slots[0], + ref_objectid, name, name_len, NULL)) goto out; @@ -351,7 +356,8 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, if (ret == -EEXIST) { u32 old_size; - if (find_name_in_backref(path, name, name_len, &ref)) + if (btrfs_find_name_in_backref(path->nodes[0], path->slots[0], + name, name_len, &ref)) goto out; old_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]); @@ -365,7 +371,9 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, ret = 0; } else if (ret < 0) { if (ret == -EOVERFLOW) { - if (find_name_in_backref(path, name, name_len, &ref)) + if (btrfs_find_name_in_backref(path->nodes[0], + path->slots[0], + name, name_len, &ref)) ret = -EEXIST; else ret = -EMLINK; diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 411a022489e4..fd573816f461 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -966,7 +966,9 @@ static noinline int backref_in_log(struct btrfs_root *log, ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]); if (key->type == BTRFS_INODE_EXTREF_KEY) { - if (btrfs_find_name_in_ext_backref(path, ref_objectid, + if (btrfs_find_name_in_ext_backref(path->nodes[0], + path->slots[0], + ref_objectid, name, namelen, NULL)) match = 1; @@ -1190,7 +1192,8 @@ static int extref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, read_extent_buffer(eb, *name, (unsigned long)&extref->name, *namelen); - *index = btrfs_inode_extref_index(eb, extref); + if (index) + *index = btrfs_inode_extref_index(eb, extref); if (parent_objectid) *parent_objectid = btrfs_inode_extref_parent(eb, extref); @@ -1211,12 +1214,102 @@ static int ref_get_fields(struct extent_buffer *eb, unsigned long ref_ptr, read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen); - *index = btrfs_inode_ref_index(eb, ref); + if (index) + *index = btrfs_inode_ref_index(eb, ref); return 0; } /* + * Take an inode reference item from the log tree and iterate all names from the + * inode reference item in the subvolume tree with the same key (if it exists). + * For any name that is not in the inode reference item from the log tree, do a + * proper unlink of that name (that is, remove its entry from the inode + * reference item and both dir index keys). + */ +static int unlink_old_inode_refs(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct btrfs_path *path, + struct btrfs_inode *inode, + struct extent_buffer *log_eb, + int log_slot, + struct btrfs_key *key) +{ + int ret; + unsigned long ref_ptr; + unsigned long ref_end; + struct extent_buffer *eb; + +again: + btrfs_release_path(path); + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); + if (ret > 0) { + ret = 0; + goto out; + } + if (ret < 0) + goto out; + + eb = path->nodes[0]; + ref_ptr = btrfs_item_ptr_offset(eb, path->slots[0]); + ref_end = ref_ptr + btrfs_item_size_nr(eb, path->slots[0]); + while (ref_ptr < ref_end) { + char *name = NULL; + int namelen; + u64 parent_id; + + if (key->type == BTRFS_INODE_EXTREF_KEY) { + ret = extref_get_fields(eb, ref_ptr, &namelen, &name, + NULL, &parent_id); + } else { + parent_id = key->offset; + ret = ref_get_fields(eb, ref_ptr, &namelen, &name, + NULL); + } + if (ret) + goto out; + + if (key->type == BTRFS_INODE_EXTREF_KEY) + ret = btrfs_find_name_in_ext_backref(log_eb, log_slot, + parent_id, name, + namelen, NULL); + else + ret = btrfs_find_name_in_backref(log_eb, log_slot, name, + namelen, NULL); + + if (!ret) { + struct inode *dir; + + btrfs_release_path(path); + dir = read_one_inode(root, parent_id); + if (!dir) { + ret = -ENOENT; + kfree(name); + goto out; + } + ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir), + inode, name, namelen); + kfree(name); + iput(dir); + if (ret) + goto out; + goto again; + } + + kfree(name); + ref_ptr += namelen; + if (key->type == BTRFS_INODE_EXTREF_KEY) + ref_ptr += sizeof(struct btrfs_inode_extref); + else + ref_ptr += sizeof(struct btrfs_inode_ref); + } + ret = 0; + out: + btrfs_release_path(path); + return ret; +} + +/* * replay one inode back reference item found in the log tree. * eb, slot and key refer to the buffer and key found in the log tree. * root is the destination we are replaying into, and path is for temp @@ -1344,6 +1437,19 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, } } + /* + * Before we overwrite the inode reference item in the subvolume tree + * with the item from the log tree, we must unlink all names from the + * parent directory that are in the subvolume's tree inode reference + * item, otherwise we end up with an inconsistent subvolume tree where + * dir index entries exist for a name but there is no inode reference + * item with the same name. + */ + ret = unlink_old_inode_refs(trans, root, path, BTRFS_I(inode), eb, slot, + key); + if (ret) + goto out; + /* finally write the back reference in the inode */ ret = overwrite_item(trans, root, path, eb, slot, key); out: