From patchwork Tue Jun 2 13:43:21 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 6529461 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id DCF609F3D1 for ; Tue, 2 Jun 2015 15:38:39 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id E03F920544 for ; Tue, 2 Jun 2015 15:38:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id F3C19204FB for ; Tue, 2 Jun 2015 15:38:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759427AbbFBPie (ORCPT ); Tue, 2 Jun 2015 11:38:34 -0400 Received: from mail.kernel.org ([198.145.29.136]:38866 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759059AbbFBPic (ORCPT ); Tue, 2 Jun 2015 11:38:32 -0400 Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id EE08420558; Tue, 2 Jun 2015 15:38:30 +0000 (UTC) Received: from debian3.lan (bl13-138-253.dsl.telepac.pt [85.246.138.253]) (using TLSv1.2 with cipher AES128-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 84826204FB; Tue, 2 Jun 2015 15:38:29 +0000 (UTC) From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Cc: Filipe Manana Subject: [PATCH] Btrfs: check pending chunks when shrinking fs to avoid corruption Date: Tue, 2 Jun 2015 14:43:21 +0100 Message-Id: <1433252601-14088-1-git-send-email-fdmanana@kernel.org> X-Mailer: git-send-email 2.1.3 X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Filipe Manana When we shrink the usable size of a device (its total_bytes), we go over all the device extent items in the device tree and attempt to relocate the chunk of any device extent that goes beyond the new usable size for the device. We do that after setting the new usable size (total_bytes) in the device object, so that all new allocations (and reallocations) don't use areas of the device that go beyond the new (shorter) size. However we were not considering that before setting the new size in the device, pending chunks might have been created that use device extents that go beyond the new size, and those device extents are not yet in the device tree after we search the device tree - they are still attached to the list of new block group for some ongoing transaction handle, and they are only added to the device tree when the transaction handle is ended (via btrfs_create_pending_block_groups()). So check for pending chunks with device extents that go beyond the new size and if any exists, commit the current transaction and repeat the search in the device tree. Not doing this it would mean we would return success to user space while still having extents that go beyond the new size, and later user space could override those locations on the device while the fs still references them, causing all sorts of corruption and unexpected events. Signed-off-by: Filipe Manana --- fs/btrfs/volumes.c | 49 ++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 40 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index dbea12e..09e89a6 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3984,6 +3984,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) int slot; int failed = 0; bool retried = false; + bool checked_pending_chunks = false; struct extent_buffer *l; struct btrfs_key key; struct btrfs_super_block *super_copy = root->fs_info->super_copy; @@ -4064,15 +4065,6 @@ again: goto again; } else if (failed && retried) { ret = -ENOSPC; - lock_chunks(root); - - btrfs_device_set_total_bytes(device, old_size); - if (device->writeable) - device->fs_devices->total_rw_bytes += diff; - spin_lock(&root->fs_info->free_chunk_lock); - root->fs_info->free_chunk_space += diff; - spin_unlock(&root->fs_info->free_chunk_lock); - unlock_chunks(root); goto done; } @@ -4084,6 +4076,35 @@ again: } lock_chunks(root); + + /* + * We checked in the above loop all device extents that were already in + * the device tree. However before we have updated the device's + * total_bytes to the new size, we might have had chunk allocations that + * have not complete yet (new block groups attached to transaction + * handles), and therefore their device extents were not yet in the + * device tree and we missed them in the loop above. So if we have any + * pending chunk using a device extent that overlaps the device range + * that we can not use anymore, commit the current transaction and + * repeat the search on the device tree - this way we guarantee we will + * not have chunks using device extents that end beyond 'new_size'. + */ + if (!checked_pending_chunks) { + u64 start = new_size; + u64 len = old_size - new_size; + + if (contains_pending_extent(trans, device, &start, len)) { + unlock_chunks(root); + checked_pending_chunks = true; + failed = 0; + retried = false; + ret = btrfs_commit_transaction(trans, root); + if (ret) + goto done; + goto again; + } + } + btrfs_device_set_disk_total_bytes(device, new_size); if (list_empty(&device->resized_list)) list_add_tail(&device->resized_list, @@ -4098,6 +4119,16 @@ again: btrfs_end_transaction(trans, root); done: btrfs_free_path(path); + if (ret) { + lock_chunks(root); + btrfs_device_set_total_bytes(device, old_size); + if (device->writeable) + device->fs_devices->total_rw_bytes += diff; + spin_lock(&root->fs_info->free_chunk_lock); + root->fs_info->free_chunk_space += diff; + spin_unlock(&root->fs_info->free_chunk_lock); + unlock_chunks(root); + } return ret; }