diff mbox series

[v2] Btrfs: fix send failure when root has deleted files still open

Message ID 20180724105404.2950-1-fdmanana@kernel.org (mailing list archive)
State New, archived
Headers show
Series [v2] Btrfs: fix send failure when root has deleted files still open | expand

Commit Message

Filipe Manana July 24, 2018, 10:54 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

The more common use case of send involves creating a RO snapshot and then
use it for a send operation. In this case it's not possible to have inodes
in the snapshot that have a link count of zero (inode with an orphan item)
since during snapshot creation we do the orphan cleanup. However, other
less common use cases for send can end up seeing inodes with a link count
of zero and in this case the send operation fails with a ENOENT error
because any attempt to generate a path for the inode, with the purpose
of creating it or updating it at the receiver, fails since there are no
inode reference items. One use case it to use a regular subvolume for
a send operation after turning it to RO mode or turning a RW snapshot
into RO mode and then using it for a send operation. In both cases, if a
file gets all its hard links deleted while there is an open file
descriptor before turning the subvolume/snapshot into RO mode, the send
operation will encounter an inode with a link count of zero and then
fail with errno ENOENT.

Example using a full send with a subvolume:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ btrfs subvolume create /mnt/sv1
  $ touch /mnt/sv1/foo
  $ touch /mnt/sv1/bar

  # keep an open file descriptor on file bar
  $ exec 73</mnt/sv1/bar
  $ unlink /mnt/sv1/bar

  # Turn the subvolume to RO mode and use it for a full send, while
  # holding the open file descriptor.
  $ btrfs property set /mnt/sv1 ro true

  $ btrfs send -f /tmp/full.send /mnt/sv1
  At subvol /mnt/sv1
  ERROR: send ioctl failed with -2: No such file or directory

Example using an incremental send with snapshots:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ btrfs subvolume create /mnt/sv1
  $ touch /mnt/sv1/foo
  $ touch /mnt/sv1/bar

  $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap1

  $ echo "hello world" >> /mnt/sv1/bar

  $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap2

  # Turn the second snapshot to RW mode and delete file foo while
  # holding an open file descriptor on it.
  $ btrfs property set /mnt/snap2 ro false
  $ exec 73</mnt/snap2/foo
  $ unlink /mnt/snap2/foo

  # Set the second snapshot back to RO mode and do an incremental send.
  $ btrfs property set /mnt/snap2 ro true

  $ btrfs send -f /tmp/inc.send -p /mnt/snap1 /mnt/snap2
  At subvol /mnt/snap2
  ERROR: send ioctl failed with -2: No such file or directory

So fix this by ignoring inodes with a link count of zero if we are either
doing a full send or if they do not exist in the parent snapshot (they
are new in the send snapshot), and unlink all paths found in the parent
snapshot when doing an incremental send (and ignoring all other inode
items, such as xattrs and extents).

A test case for fstests follows soon.

Reported-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Fixed a null pointer dereference for non-incremental send on
    sctx->left_path.

 fs/btrfs/send.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 130 insertions(+), 8 deletions(-)

Comments

David Sterba July 24, 2018, 2:22 p.m. UTC | #1
On Tue, Jul 24, 2018 at 11:54:04AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> The more common use case of send involves creating a RO snapshot and then
> use it for a send operation. In this case it's not possible to have inodes
> in the snapshot that have a link count of zero (inode with an orphan item)
> since during snapshot creation we do the orphan cleanup. However, other
> less common use cases for send can end up seeing inodes with a link count
> of zero and in this case the send operation fails with a ENOENT error
> because any attempt to generate a path for the inode, with the purpose
> of creating it or updating it at the receiver, fails since there are no
> inode reference items. One use case it to use a regular subvolume for
> a send operation after turning it to RO mode or turning a RW snapshot
> into RO mode and then using it for a send operation. In both cases, if a
> file gets all its hard links deleted while there is an open file
> descriptor before turning the subvolume/snapshot into RO mode, the send
> operation will encounter an inode with a link count of zero and then
> fail with errno ENOENT.
> 
> Example using a full send with a subvolume:
> 
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt
> 
>   $ btrfs subvolume create /mnt/sv1
>   $ touch /mnt/sv1/foo
>   $ touch /mnt/sv1/bar
> 
>   # keep an open file descriptor on file bar
>   $ exec 73</mnt/sv1/bar
>   $ unlink /mnt/sv1/bar
> 
>   # Turn the subvolume to RO mode and use it for a full send, while
>   # holding the open file descriptor.
>   $ btrfs property set /mnt/sv1 ro true
> 
>   $ btrfs send -f /tmp/full.send /mnt/sv1
>   At subvol /mnt/sv1
>   ERROR: send ioctl failed with -2: No such file or directory
> 
> Example using an incremental send with snapshots:
> 
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt
> 
>   $ btrfs subvolume create /mnt/sv1
>   $ touch /mnt/sv1/foo
>   $ touch /mnt/sv1/bar
> 
>   $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap1
> 
>   $ echo "hello world" >> /mnt/sv1/bar
> 
>   $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap2
> 
>   # Turn the second snapshot to RW mode and delete file foo while
>   # holding an open file descriptor on it.
>   $ btrfs property set /mnt/snap2 ro false
>   $ exec 73</mnt/snap2/foo
>   $ unlink /mnt/snap2/foo
> 
>   # Set the second snapshot back to RO mode and do an incremental send.
>   $ btrfs property set /mnt/snap2 ro true
> 
>   $ btrfs send -f /tmp/inc.send -p /mnt/snap1 /mnt/snap2
>   At subvol /mnt/snap2
>   ERROR: send ioctl failed with -2: No such file or directory
> 
> So fix this by ignoring inodes with a link count of zero if we are either
> doing a full send or if they do not exist in the parent snapshot (they
> are new in the send snapshot), and unlink all paths found in the parent
> snapshot when doing an incremental send (and ignoring all other inode
> items, such as xattrs and extents).
> 
> A test case for fstests follows soon.
> 
> Reported-by: Martin Wilck <martin.wilck@suse.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next, thanks. I did a light review of the overall logic
how the ignore_cur_inode is passed around, skipping the orphans and
replacing with send_unlink sounds ok to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox series

Patch

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 30be18be0036..0f31760f875f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -100,6 +100,7 @@  struct send_ctx {
 	u64 cur_inode_rdev;
 	u64 cur_inode_last_extent;
 	u64 cur_inode_next_write_offset;
+	bool ignore_cur_inode;
 
 	u64 send_progress;
 
@@ -5799,6 +5800,9 @@  static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
 	int pending_move = 0;
 	int refs_processed = 0;
 
+	if (sctx->ignore_cur_inode)
+		return 0;
+
 	ret = process_recorded_refs_if_needed(sctx, at_end, &pending_move,
 					      &refs_processed);
 	if (ret < 0)
@@ -5917,6 +5921,94 @@  static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
 	return ret;
 }
 
+struct parent_paths_ctx {
+	struct list_head *refs;
+	struct send_ctx *sctx;
+};
+
+static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
+			     void *ctx)
+{
+	struct parent_paths_ctx *ppctx = ctx;
+
+	return record_ref(ppctx->sctx->parent_root, dir, name, ppctx->sctx,
+			  ppctx->refs);
+}
+
+/*
+ * Issue unlink operations for all paths of the current inode found in the
+ * parent snapshot.
+ */
+static int btrfs_unlink_all_paths(struct send_ctx *sctx)
+{
+	LIST_HEAD(deleted_refs);
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct parent_paths_ctx ctx;
+	int ret;
+
+	path = alloc_path_for_send();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = sctx->cur_ino;
+	key.type = BTRFS_INODE_REF_KEY;
+	key.offset = 0;
+	ret = btrfs_search_slot(NULL, sctx->parent_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	ctx.refs = &deleted_refs;
+	ctx.sctx = sctx;
+
+	while (true) {
+		struct extent_buffer *eb = path->nodes[0];
+		int slot = path->slots[0];
+
+		if (slot >= btrfs_header_nritems(eb)) {
+			ret = btrfs_next_leaf(sctx->parent_root, path);
+			if (ret < 0)
+				goto out;
+			else if (ret > 0)
+				break;
+			continue;
+		}
+
+		btrfs_item_key_to_cpu(eb, &key, slot);
+		if (key.objectid != sctx->cur_ino)
+			break;
+		if (key.type != BTRFS_INODE_REF_KEY &&
+		    key.type != BTRFS_INODE_EXTREF_KEY)
+			break;
+
+		ret = iterate_inode_ref(sctx->parent_root, path, &key, 1,
+					record_parent_ref, &ctx);
+		if (ret < 0)
+			goto out;
+
+		path->slots[0]++;
+	}
+
+	while (!list_empty(&deleted_refs)) {
+		struct recorded_ref *ref;
+
+		ref = list_first_entry(&deleted_refs, struct recorded_ref,
+				       list);
+		ret = send_unlink(sctx, ref->full_path);
+		if (ret < 0)
+			goto out;
+		fs_path_free(ref->full_path);
+		list_del(&ref->list);
+		kfree(ref);
+	}
+	ret = 0;
+out:
+	btrfs_free_path(path);
+	if (ret)
+		__free_recorded_refs(&deleted_refs);
+	return ret;
+}
+
 static int changed_inode(struct send_ctx *sctx,
 			 enum btrfs_compare_tree_result result)
 {
@@ -5931,6 +6023,7 @@  static int changed_inode(struct send_ctx *sctx,
 	sctx->cur_inode_new_gen = 0;
 	sctx->cur_inode_last_extent = (u64)-1;
 	sctx->cur_inode_next_write_offset = 0;
+	sctx->ignore_cur_inode = false;
 
 	/*
 	 * Set send_progress to current inode. This will tell all get_cur_xxx
@@ -5971,6 +6064,33 @@  static int changed_inode(struct send_ctx *sctx,
 			sctx->cur_inode_new_gen = 1;
 	}
 
+	/*
+	 * Normally we do not find inodes with a link count of zero (orphans)
+	 * because the most common case is to create a snapshot and use it
+	 * for a send operation. However other less common use cases involve
+	 * using a subvolume and send it after turning it to RO mode just
+	 * after deleting all hard links of a file while holding an open
+	 * file descriptor against it or turning a RO snapshot into RW mode,
+	 * keep an open file descriptor against a file, delete it and then
+	 * turn the snapshot back to RO mode before using it for a send
+	 * operation. So if we find such cases, ignore the inode and all its
+	 * items completely if it's a new inode, or if it's a changed inode
+	 * make sure all its previous paths (from the parent snapshot) are all
+	 * unlinked and all other the inode items are ignored.
+	 */
+	if ((result == BTRFS_COMPARE_TREE_NEW ||
+	     result == BTRFS_COMPARE_TREE_CHANGED)) {
+		u32 nlinks;
+
+		nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii);
+		if (nlinks == 0) {
+			sctx->ignore_cur_inode = true;
+			if (result == BTRFS_COMPARE_TREE_CHANGED)
+				ret = btrfs_unlink_all_paths(sctx);
+			goto out;
+		}
+	}
+
 	if (result == BTRFS_COMPARE_TREE_NEW) {
 		sctx->cur_inode_gen = left_gen;
 		sctx->cur_inode_new = 1;
@@ -6309,15 +6429,17 @@  static int changed_cb(struct btrfs_path *left_path,
 	    key->objectid == BTRFS_FREE_SPACE_OBJECTID)
 		goto out;
 
-	if (key->type == BTRFS_INODE_ITEM_KEY)
+	if (key->type == BTRFS_INODE_ITEM_KEY) {
 		ret = changed_inode(sctx, result);
-	else if (key->type == BTRFS_INODE_REF_KEY ||
-		 key->type == BTRFS_INODE_EXTREF_KEY)
-		ret = changed_ref(sctx, result);
-	else if (key->type == BTRFS_XATTR_ITEM_KEY)
-		ret = changed_xattr(sctx, result);
-	else if (key->type == BTRFS_EXTENT_DATA_KEY)
-		ret = changed_extent(sctx, result);
+	} else if (!sctx->ignore_cur_inode) {
+		if (key->type == BTRFS_INODE_REF_KEY ||
+		    key->type == BTRFS_INODE_EXTREF_KEY)
+			ret = changed_ref(sctx, result);
+		else if (key->type == BTRFS_XATTR_ITEM_KEY)
+			ret = changed_xattr(sctx, result);
+		else if (key->type == BTRFS_EXTENT_DATA_KEY)
+			ret = changed_extent(sctx, result);
+	}
 
 out:
 	return ret;