From patchwork Wed Dec 30 16:02:04 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 7934141 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 13D28BEEE5 for ; Wed, 30 Dec 2015 16:02:25 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 13A8D202C8 for ; Wed, 30 Dec 2015 16:02:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id F28AB20220 for ; Wed, 30 Dec 2015 16:02:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753360AbbL3QCU (ORCPT ); Wed, 30 Dec 2015 11:02:20 -0500 Received: from mail.kernel.org ([198.145.29.136]:45791 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752519AbbL3QCS (ORCPT ); Wed, 30 Dec 2015 11:02:18 -0500 Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 64CAA202C8; Wed, 30 Dec 2015 16:02:17 +0000 (UTC) Received: from debian3.lan (bl8-199-62.dsl.telepac.pt [85.241.199.62]) (using TLSv1.2 with cipher AES128-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A637920220; Wed, 30 Dec 2015 16:02:15 +0000 (UTC) From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Cc: Filipe Manana Subject: [PATCH] Btrfs: fix race between free space endio workers and space cache writeout Date: Wed, 30 Dec 2015 16:02:04 +0000 Message-Id: <1451491324-23937-1-git-send-email-fdmanana@kernel.org> X-Mailer: git-send-email 2.1.3 X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Filipe Manana While running a stress test I ran into the following trace/transaction abort: [471626.672243] ------------[ cut here ]------------ [471626.673322] WARNING: CPU: 9 PID: 19107 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]() [471626.675492] BTRFS: Transaction aborted (error -2) [471626.676748] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix [471626.688802] CPU: 14 PID: 19107 Comm: fsstress Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1 [471626.690148] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [471626.691901] 0000000000000000 ffff880016037cf0 ffffffff812566f4 ffff880016037d38 [471626.695009] ffff880016037d28 ffffffff8104d0a6 ffffffffa040c84e 00000000fffffffe [471626.697490] ffff88011fe855f8 ffff88000c484cb0 ffff88000d195000 ffff880016037d90 [471626.699201] Call Trace: [471626.699804] [] dump_stack+0x4e/0x79 [471626.701049] [] warn_slowpath_common+0x9f/0xb8 [471626.702542] [] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs] [471626.704326] [] warn_slowpath_fmt+0x48/0x50 [471626.705636] [] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs] [471626.707048] [] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs] [471626.708616] [] commit_cowonly_roots+0x1d7/0x25a [btrfs] [471626.709950] [] btrfs_commit_transaction+0x4c4/0x991 [btrfs] [471626.711286] [] ? signal_pending_state+0x31/0x31 [471626.712611] [] btrfs_sync_fs+0x145/0x1ad [btrfs] [471626.715610] [] ? SyS_tee+0x226/0x226 [471626.716718] [] sync_fs_one_sb+0x20/0x22 [471626.717672] [] iterate_supers+0x75/0xc2 [471626.718800] [] sys_sync+0x52/0x80 [471626.719990] [] entry_SYSCALL_64_fastpath+0x12/0x6f [471626.721835] ---[ end trace baf57f43d76693f4 ]--- [471626.722954] BTRFS: error (device sdc) in btrfs_write_dirty_block_groups:3740: errno=-2 No such entry This is a very rare situation and it happened due to a race between a free space endio worker and writing the space caches for dirty block groups at a transaction's commit critical section. The steps leading to this are: 1) A task calls btrfs_commit_transaction() and starts the writeout of the space caches for all currently dirty block groups (i.e. it calls btrfs_start_dirty_block_groups()); 2) The previous step starts writeback for space caches; 3) When the writeback finishes it queues jobs for free space endio work queue (fs_info->endio_freespace_worker) that execute btrfs_finish_ordered_io(); 4) The task committing the transaction sets the transaction's state to TRANS_STATE_COMMIT_DOING and shortly after calls btrfs_write_dirty_block_groups(); 5) A free space endio job joins the transaction, through btrfs_join_transaction_nolock(), and updates a free space inode item in the root tree through btrfs_update_inode_fallback(); 6) Updating the free space inode item resulted in COWing one or more nodes/leaves of the root tree, and that resulted in creating a new metadata block group, which gets added to the transaction's list of dirty block groups (this is a very rare case); 7) The free space endio job has not released yet its transaction handle at this point, so the new metadata block group was not yet fully created (didn't go through btrfs_create_pending_block_groups() yet); 8) The transaction commit task sees the new metadata block group in the transaction's list of dirty block groups and processes it. When it attempts to update the block group's block group item in the extent tree, through write_one_cache_group(), it isn't able to find it and aborts the transaction with error -ENOENT - this is because the free space endio job hasn't yet released its transaction handle (which calls btrfs_create_pending_block_groups()) and therefore the block group item was not yet added to the extent tree. Fix this waiting for free space endio jobs if we fail to find a block group item in the extent tree and then retry once updating the block group item. Signed-off-by: Filipe Manana --- fs/btrfs/extent-tree.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 0617cb7..9bca90d 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3763,6 +3763,25 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans, } if (!ret) { ret = write_one_cache_group(trans, root, path, cache); + /* + * One of the free space endio workers might have + * created a new block group while updating a free space + * cache's inode (at inode.c:btrfs_finish_ordered_io()) + * and hasn't released its transaction handle yet, in + * which case the new block group is still attached to + * its transaction handle and its creation has not + * finished yet (no block group item in the extent tree + * yet, etc). If this is the case, wait for all free + * space endio workers to finish and retry. This is a + * a very rare case so no need for a more efficient and + * complex approach. + */ + if (ret == -ENOENT) { + wait_event(cur_trans->writer_wait, + atomic_read(&cur_trans->num_writers) == 1); + ret = write_one_cache_group(trans, root, path, + cache); + } if (ret) btrfs_abort_transaction(trans, root, ret); }