Message ID | 20181024091303.20324-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4] Btrfs: fix deadlock on tree root leaf when finding free extent | expand |
On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When we are writing out a free space cache, during the transaction commit > phase, we can end up in a deadlock which results in a stack trace like the > following: > > schedule+0x28/0x80 > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > ? finish_wait+0x80/0x80 > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > btrfs_search_slot+0xf6/0x9f0 [btrfs] > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > ? inode_insert5+0x119/0x190 > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_iget+0x113/0x690 [btrfs] > __lookup_free_space_inode+0xd8/0x150 [btrfs] > lookup_free_space_inode+0x5b/0xb0 [btrfs] > load_free_space_cache+0x7c/0x170 [btrfs] > ? cache_block_group+0x72/0x3b0 [btrfs] > cache_block_group+0x1b3/0x3b0 [btrfs] > ? finish_wait+0x80/0x80 > find_free_extent+0x799/0x1010 [btrfs] > btrfs_reserve_extent+0x9b/0x180 [btrfs] > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > __btrfs_cow_block+0x11d/0x500 [btrfs] > btrfs_cow_block+0xdc/0x180 [btrfs] > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_update_inode_item+0x46/0x100 [btrfs] > cache_save_setup+0xe4/0x3a0 [btrfs] > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > At cache_save_setup() we need to update the inode item of a block group's > cache which is located in the tree root (fs_info->tree_root), which means > that it may result in COWing a leaf from that tree. If that happens we > need to find a free metadata extent and while looking for one, if we find > a block group which was not cached yet we attempt to load its cache by > calling cache_block_group(). However this function will try to load the > inode of the free space cache, which requires finding the matching inode > item in the tree root - if that inode item is located in the same leaf as > the inode item of the space cache we are updating at cache_save_setup(), > we end up in a deadlock, since we try to obtain a read lock on the same > extent buffer that we previously write locked. > > So fix this by using the tree root's commit root when searching for a > block group's free space cache inode item when we are attempting to load > a free space cache. This is safe since block groups once loaded stay in > memory forever, as well as their caches, so after they are first loaded > we will never need to read their inode items again. For new block groups, > once they are created they get their ->cached field set to > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com> > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/ > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com> > Signed-off-by: Filipe Manana <fdmanana@suse.com> > --- > Now my goal is to see how many times I can get you to redo this thing. Why not instead just do if (btrfs_is_free_space_inode(inode)) path->search_commit_root = 1; in read_locked_inode? That would be cleaner. If we don't want to do that for the inode cache (I'm not sure if it's ok or not) we could just do if (root == fs_info->tree_root) instead. Thanks, Josef
On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik <josef@toxicpanda.com> wrote: > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote: > > From: Filipe Manana <fdmanana@suse.com> > > > > When we are writing out a free space cache, during the transaction commit > > phase, we can end up in a deadlock which results in a stack trace like the > > following: > > > > schedule+0x28/0x80 > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > ? finish_wait+0x80/0x80 > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > ? inode_insert5+0x119/0x190 > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_iget+0x113/0x690 [btrfs] > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > load_free_space_cache+0x7c/0x170 [btrfs] > > ? cache_block_group+0x72/0x3b0 [btrfs] > > cache_block_group+0x1b3/0x3b0 [btrfs] > > ? finish_wait+0x80/0x80 > > find_free_extent+0x799/0x1010 [btrfs] > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > btrfs_cow_block+0xdc/0x180 [btrfs] > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > cache_save_setup+0xe4/0x3a0 [btrfs] > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > At cache_save_setup() we need to update the inode item of a block group's > > cache which is located in the tree root (fs_info->tree_root), which means > > that it may result in COWing a leaf from that tree. If that happens we > > need to find a free metadata extent and while looking for one, if we find > > a block group which was not cached yet we attempt to load its cache by > > calling cache_block_group(). However this function will try to load the > > inode of the free space cache, which requires finding the matching inode > > item in the tree root - if that inode item is located in the same leaf as > > the inode item of the space cache we are updating at cache_save_setup(), > > we end up in a deadlock, since we try to obtain a read lock on the same > > extent buffer that we previously write locked. > > > > So fix this by using the tree root's commit root when searching for a > > block group's free space cache inode item when we are attempting to load > > a free space cache. This is safe since block groups once loaded stay in > > memory forever, as well as their caches, so after they are first loaded > > we will never need to read their inode items again. For new block groups, > > once they are created they get their ->cached field set to > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. > > > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com> > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/ > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com> > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > --- > > > > Now my goal is to see how many times I can get you to redo this thing. > > Why not instead just do > > if (btrfs_is_free_space_inode(inode)) > path->search_commit_root = 1; > > in read_locked_inode? That would be cleaner. If we don't want to do that for > the inode cache (I'm not sure if it's ok or not) we could just do > > if (root == fs_info->tree_root) We can't (not just that at least). Tried something like that, but we get into a BUG_ON when writing out the space cache for new block groups (created in the current transaction). Because at cache_save_setup() we have this: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342 Lookup for the inode in normal root, doesn't exist, create it then repeat - if still not found, BUG_ON. Could also make create_free_space_inode() return an inode pointer and make it call btrfs_iget(). > > instead. Thanks, > > Josef
On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote: > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik <josef@toxicpanda.com> wrote: > > > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote: > > > From: Filipe Manana <fdmanana@suse.com> > > > > > > When we are writing out a free space cache, during the transaction commit > > > phase, we can end up in a deadlock which results in a stack trace like the > > > following: > > > > > > schedule+0x28/0x80 > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > > ? finish_wait+0x80/0x80 > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > > ? inode_insert5+0x119/0x190 > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > ? kmem_cache_alloc+0x166/0x1d0 > > > btrfs_iget+0x113/0x690 [btrfs] > > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > > load_free_space_cache+0x7c/0x170 [btrfs] > > > ? cache_block_group+0x72/0x3b0 [btrfs] > > > cache_block_group+0x1b3/0x3b0 [btrfs] > > > ? finish_wait+0x80/0x80 > > > find_free_extent+0x799/0x1010 [btrfs] > > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > > btrfs_cow_block+0xdc/0x180 [btrfs] > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > ? kmem_cache_alloc+0x166/0x1d0 > > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > > cache_save_setup+0xe4/0x3a0 [btrfs] > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > > > At cache_save_setup() we need to update the inode item of a block group's > > > cache which is located in the tree root (fs_info->tree_root), which means > > > that it may result in COWing a leaf from that tree. If that happens we > > > need to find a free metadata extent and while looking for one, if we find > > > a block group which was not cached yet we attempt to load its cache by > > > calling cache_block_group(). However this function will try to load the > > > inode of the free space cache, which requires finding the matching inode > > > item in the tree root - if that inode item is located in the same leaf as > > > the inode item of the space cache we are updating at cache_save_setup(), > > > we end up in a deadlock, since we try to obtain a read lock on the same > > > extent buffer that we previously write locked. > > > > > > So fix this by using the tree root's commit root when searching for a > > > block group's free space cache inode item when we are attempting to load > > > a free space cache. This is safe since block groups once loaded stay in > > > memory forever, as well as their caches, so after they are first loaded > > > we will never need to read their inode items again. For new block groups, > > > once they are created they get their ->cached field set to > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. > > > > > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com> > > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/ > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com> > > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > > --- > > > > > > > Now my goal is to see how many times I can get you to redo this thing. > > > > Why not instead just do > > > > if (btrfs_is_free_space_inode(inode)) > > path->search_commit_root = 1; > > > > in read_locked_inode? That would be cleaner. If we don't want to do that for > > the inode cache (I'm not sure if it's ok or not) we could just do > > > > if (root == fs_info->tree_root) > > We can't (not just that at least). > Tried something like that, but we get into a BUG_ON when writing out > the space cache for new block groups (created in the current > transaction). > Because at cache_save_setup() we have this: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342 > > Lookup for the inode in normal root, doesn't exist, create it then > repeat - if still not found, BUG_ON. > Could also make create_free_space_inode() return an inode pointer and > make it call btrfs_iget(). > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() take a path, and allocate it in btrfs_iget, that'll remove the ugly if (path != in_path) stuff. Thanks, Josef
On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik <josef@toxicpanda.com> wrote: > > On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote: > > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik <josef@toxicpanda.com> wrote: > > > > > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote: > > > > From: Filipe Manana <fdmanana@suse.com> > > > > > > > > When we are writing out a free space cache, during the transaction commit > > > > phase, we can end up in a deadlock which results in a stack trace like the > > > > following: > > > > > > > > schedule+0x28/0x80 > > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > > > ? finish_wait+0x80/0x80 > > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > > > ? inode_insert5+0x119/0x190 > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > btrfs_iget+0x113/0x690 [btrfs] > > > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > > > load_free_space_cache+0x7c/0x170 [btrfs] > > > > ? cache_block_group+0x72/0x3b0 [btrfs] > > > > cache_block_group+0x1b3/0x3b0 [btrfs] > > > > ? finish_wait+0x80/0x80 > > > > find_free_extent+0x799/0x1010 [btrfs] > > > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > > > btrfs_cow_block+0xdc/0x180 [btrfs] > > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > > > cache_save_setup+0xe4/0x3a0 [btrfs] > > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > > > > > At cache_save_setup() we need to update the inode item of a block group's > > > > cache which is located in the tree root (fs_info->tree_root), which means > > > > that it may result in COWing a leaf from that tree. If that happens we > > > > need to find a free metadata extent and while looking for one, if we find > > > > a block group which was not cached yet we attempt to load its cache by > > > > calling cache_block_group(). However this function will try to load the > > > > inode of the free space cache, which requires finding the matching inode > > > > item in the tree root - if that inode item is located in the same leaf as > > > > the inode item of the space cache we are updating at cache_save_setup(), > > > > we end up in a deadlock, since we try to obtain a read lock on the same > > > > extent buffer that we previously write locked. > > > > > > > > So fix this by using the tree root's commit root when searching for a > > > > block group's free space cache inode item when we are attempting to load > > > > a free space cache. This is safe since block groups once loaded stay in > > > > memory forever, as well as their caches, so after they are first loaded > > > > we will never need to read their inode items again. For new block groups, > > > > once they are created they get their ->cached field set to > > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. > > > > > > > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com> > > > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/ > > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com> > > > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > > > --- > > > > > > > > > > Now my goal is to see how many times I can get you to redo this thing. > > > > > > Why not instead just do > > > > > > if (btrfs_is_free_space_inode(inode)) > > > path->search_commit_root = 1; > > > > > > in read_locked_inode? That would be cleaner. If we don't want to do that for > > > the inode cache (I'm not sure if it's ok or not) we could just do > > > > > > if (root == fs_info->tree_root) > > > > We can't (not just that at least). > > Tried something like that, but we get into a BUG_ON when writing out > > the space cache for new block groups (created in the current > > transaction). > > Because at cache_save_setup() we have this: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342 > > > > Lookup for the inode in normal root, doesn't exist, create it then > > repeat - if still not found, BUG_ON. > > Could also make create_free_space_inode() return an inode pointer and > > make it call btrfs_iget(). > > > > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > if (path != in_path) You mean the following on top of v4: https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg Not much different, just saves one such if statement. I'm ok with that. > > stuff. Thanks, > > Josef
On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik <josef@toxicpanda.com> wrote: > > > > On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote: > > > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik <josef@toxicpanda.com> wrote: > > > > > > > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote: > > > > > From: Filipe Manana <fdmanana@suse.com> > > > > > > > > > > When we are writing out a free space cache, during the transaction commit > > > > > phase, we can end up in a deadlock which results in a stack trace like the > > > > > following: > > > > > > > > > > schedule+0x28/0x80 > > > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > > > > ? finish_wait+0x80/0x80 > > > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > > > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > > > > ? inode_insert5+0x119/0x190 > > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > > btrfs_iget+0x113/0x690 [btrfs] > > > > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > > > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > > > > load_free_space_cache+0x7c/0x170 [btrfs] > > > > > ? cache_block_group+0x72/0x3b0 [btrfs] > > > > > cache_block_group+0x1b3/0x3b0 [btrfs] > > > > > ? finish_wait+0x80/0x80 > > > > > find_free_extent+0x799/0x1010 [btrfs] > > > > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > > > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > > > > btrfs_cow_block+0xdc/0x180 [btrfs] > > > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > > > > cache_save_setup+0xe4/0x3a0 [btrfs] > > > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > > > > > > > At cache_save_setup() we need to update the inode item of a block group's > > > > > cache which is located in the tree root (fs_info->tree_root), which means > > > > > that it may result in COWing a leaf from that tree. If that happens we > > > > > need to find a free metadata extent and while looking for one, if we find > > > > > a block group which was not cached yet we attempt to load its cache by > > > > > calling cache_block_group(). However this function will try to load the > > > > > inode of the free space cache, which requires finding the matching inode > > > > > item in the tree root - if that inode item is located in the same leaf as > > > > > the inode item of the space cache we are updating at cache_save_setup(), > > > > > we end up in a deadlock, since we try to obtain a read lock on the same > > > > > extent buffer that we previously write locked. > > > > > > > > > > So fix this by using the tree root's commit root when searching for a > > > > > block group's free space cache inode item when we are attempting to load > > > > > a free space cache. This is safe since block groups once loaded stay in > > > > > memory forever, as well as their caches, so after they are first loaded > > > > > we will never need to read their inode items again. For new block groups, > > > > > once they are created they get their ->cached field set to > > > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. > > > > > > > > > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com> > > > > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/ > > > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > > > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com> > > > > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > > > > --- > > > > > > > > > > > > > Now my goal is to see how many times I can get you to redo this thing. > > > > > > > > Why not instead just do > > > > > > > > if (btrfs_is_free_space_inode(inode)) > > > > path->search_commit_root = 1; > > > > > > > > in read_locked_inode? That would be cleaner. If we don't want to do that for > > > > the inode cache (I'm not sure if it's ok or not) we could just do > > > > > > > > if (root == fs_info->tree_root) > > > > > > We can't (not just that at least). > > > Tried something like that, but we get into a BUG_ON when writing out > > > the space cache for new block groups (created in the current > > > transaction). > > > Because at cache_save_setup() we have this: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342 > > > > > > Lookup for the inode in normal root, doesn't exist, create it then > > > repeat - if still not found, BUG_ON. > > > Could also make create_free_space_inode() return an inode pointer and > > > make it call btrfs_iget(). > > > > > > > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > if (path != in_path) > > You mean the following on top of v4: > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > Not much different, just saves one such if statement. I'm ok with that. > Yeah ok, thanks, Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > if (path != in_path) > > You mean the following on top of v4: > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > Not much different, just saves one such if statement. I'm ok with that. Now in misc-next with v4 and the friendpaste incremental as https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
On Mon, Nov 5, 2018 at 4:29 PM David Sterba <dsterba@suse.cz> wrote: > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > > > if (path != in_path) > > > > You mean the following on top of v4: > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > > > Not much different, just saves one such if statement. I'm ok with that. > > Now in misc-next with v4 and the friendpaste incremental as > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69 Please don't add the incremental. It's buggy. It was meant to figure out what Josef was saying. That's why I haven't sent a V5.
On Mon, Nov 05, 2018 at 04:30:35PM +0000, Filipe Manana wrote: > On Mon, Nov 5, 2018 at 4:29 PM David Sterba <dsterba@suse.cz> wrote: > > > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > > > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > > > > > if (path != in_path) > > > > > > You mean the following on top of v4: > > > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > > > > > Not much different, just saves one such if statement. I'm ok with that. > > > > Now in misc-next with v4 and the friendpaste incremental as > > > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69 > > Please don't add the incremental. It's buggy. It was meant to figure > out what Josef was saying. That's why I haven't sent a V5. Ok dropped, I'll will wait for a proper patch.
On Mon, Nov 5, 2018 at 4:34 PM David Sterba <dsterba@suse.cz> wrote: > > On Mon, Nov 05, 2018 at 04:30:35PM +0000, Filipe Manana wrote: > > On Mon, Nov 5, 2018 at 4:29 PM David Sterba <dsterba@suse.cz> wrote: > > > > > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > > > > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() > > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > > > > > > > if (path != in_path) > > > > > > > > You mean the following on top of v4: > > > > > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > > > > > > > Not much different, just saves one such if statement. I'm ok with that. > > > > > > Now in misc-next with v4 and the friendpaste incremental as > > > > > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69 > > > > Please don't add the incremental. It's buggy. It was meant to figure > > out what Josef was saying. That's why I haven't sent a V5. > > Ok dropped, I'll will wait for a proper patch. It's V4, the last sent version. Just forget the incremental. Thanks.
On Mon, Nov 05, 2018 at 04:36:34PM +0000, Filipe Manana wrote: > On Mon, Nov 5, 2018 at 4:34 PM David Sterba <dsterba@suse.cz> wrote: > > On Mon, Nov 05, 2018 at 04:30:35PM +0000, Filipe Manana wrote: > > > On Mon, Nov 5, 2018 at 4:29 PM David Sterba <dsterba@suse.cz> wrote: > > > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > > > > > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() > > > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > > > > > > > > > if (path != in_path) > > > > > > > > > > You mean the following on top of v4: > > > > > > > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > > > > > > > > > Not much different, just saves one such if statement. I'm ok with that. > > > > > > > > Now in misc-next with v4 and the friendpaste incremental as > > > > > > > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69 > > > > > > Please don't add the incremental. It's buggy. It was meant to figure > > > out what Josef was saying. That's why I haven't sent a V5. > > > > Ok dropped, I'll will wait for a proper patch. > > It's V4, the last sent version. Just forget the incremental. > Thanks. For the record, V4 has been merged to master in 4.20-rc2.
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2cddfe7806a4..2b34b2a05ad6 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3177,6 +3177,9 @@ void btrfs_destroy_inode(struct inode *inode); int btrfs_drop_inode(struct inode *inode); int __init btrfs_init_cachep(void); void __cold btrfs_destroy_cachep(void); +struct inode *btrfs_iget_path(struct super_block *s, struct btrfs_key *location, + struct btrfs_root *root, int *new, + struct btrfs_path *path); struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location, struct btrfs_root *root, int *was_new); struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 65b79500e09f..7265f35324f6 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -68,7 +68,8 @@ static struct inode *__lookup_free_space_inode(struct btrfs_root *root, btrfs_disk_key_to_cpu(&location, &disk_key); btrfs_release_path(path); - inode = btrfs_iget(fs_info->sb, &location, root, NULL); + inode = btrfs_iget_path(fs_info->sb, &location, root, NULL, path); + btrfs_release_path(path); if (IS_ERR(inode)) return inode; @@ -830,6 +831,25 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info, path->search_commit_root = 1; path->skip_locking = 1; + /* + * We must pass a path with search_commit_root set to btrfs_iget in + * order to avoid a deadlock when allocating extents for the tree root. + * + * When we are COWing an extent buffer from the tree root, when looking + * for a free extent, at extent-tree.c:find_free_extent(), we can find + * block group without its free space cache loaded. When we find one + * we must load its space cache which requires reading its free space + * cache's inode item from the root tree. If this inode item is located + * in the same leaf that we started COWing before, then we end up in + * deadlock on the extent buffer (trying to read lock it when we + * previously write locked it). + * + * It's safe to read the inode item using the commit root because + * block groups, once loaded, stay in memory forever (until they are + * removed) as well as their space caches once loaded. New block groups + * once created get their ->cached field set to BTRFS_CACHE_FINISHED so + * we will never try to read their inode item while the fs is mounted. + */ inode = lookup_free_space_inode(fs_info, block_group, path); if (IS_ERR(inode)) { btrfs_free_path(path); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d6b61b1facdd..64ea749c1ba4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3568,10 +3568,11 @@ static noinline int acls_after_inode_item(struct extent_buffer *leaf, /* * read an inode from the btree into the in-memory inode */ -static int btrfs_read_locked_inode(struct inode *inode) +static int btrfs_read_locked_inode(struct inode *inode, + struct btrfs_path *in_path) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_path *path; + struct btrfs_path *path = in_path; struct extent_buffer *leaf; struct btrfs_inode_item *inode_item; struct btrfs_root *root = BTRFS_I(inode)->root; @@ -3587,15 +3588,18 @@ static int btrfs_read_locked_inode(struct inode *inode) if (!ret) filled = true; - path = btrfs_alloc_path(); - if (!path) - return -ENOMEM; + if (!path) { + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + } memcpy(&location, &BTRFS_I(inode)->location, sizeof(location)); ret = btrfs_lookup_inode(NULL, root, path, &location, 0); if (ret) { - btrfs_free_path(path); + if (path != in_path) + btrfs_free_path(path); return ret; } @@ -3720,7 +3724,8 @@ static int btrfs_read_locked_inode(struct inode *inode) btrfs_ino(BTRFS_I(inode)), root->root_key.objectid, ret); } - btrfs_free_path(path); + if (path != in_path) + btrfs_free_path(path); if (!maybe_acls) cache_no_acl(inode); @@ -5662,8 +5667,9 @@ static struct inode *btrfs_iget_locked(struct super_block *s, /* Get an inode object given its location and corresponding root. * Returns in *is_new if the inode was read from disk */ -struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location, - struct btrfs_root *root, int *new) +struct inode *btrfs_iget_path(struct super_block *s, struct btrfs_key *location, + struct btrfs_root *root, int *new, + struct btrfs_path *path) { struct inode *inode; @@ -5674,7 +5680,7 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location, if (inode->i_state & I_NEW) { int ret; - ret = btrfs_read_locked_inode(inode); + ret = btrfs_read_locked_inode(inode, path); if (!ret) { inode_tree_add(inode); unlock_new_inode(inode); @@ -5696,6 +5702,12 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location, return inode; } +struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location, + struct btrfs_root *root, int *new) +{ + return btrfs_iget_path(s, location, root, new, NULL); +} + static struct inode *new_simple_dir(struct super_block *s, struct btrfs_key *key, struct btrfs_root *root)