diff mbox

Btrfs: protect the pending_snapshots list with trans_lock

Message ID 1308079067-20733-1-git-send-email-josef@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Josef Bacik June 14, 2011, 7:17 p.m. UTC
Currently there is nothing protecting the pending_snapshots list on the
transaction.  We only hold the directory mutex that we are snapshotting and a
read lock on the subvol_sem, so we could race with somebody else creating a
snapshot in a different directory and end up with list corruption.  So protect
this list with the trans_lock.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---
 fs/btrfs/ioctl.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

Comments

David Sterba June 15, 2011, 9:53 a.m. UTC | #1
Hi,

On Tue, Jun 14, 2011 at 03:17:47PM -0400, Josef Bacik wrote:
> Currently there is nothing protecting the pending_snapshots list on the
> transaction.  We only hold the directory mutex that we are snapshotting and a
> read lock on the subvol_sem, so we could race with somebody else creating a
> snapshot in a different directory and end up with list corruption.  So protect
> this list with the trans_lock.  Thanks,

it's correct that the pending_snapshots list is not protected, but I do
not see a way how to corrupt the list.

The callchain goes like this:

<user called ioctl>
  btrfs_ioctl_snap_create_transid
    btrfs_mksubvol
      create_snapshot

the interesting stuff happens inside create_snapshot:

* alloc pending snapshot struct
* start transaction (btrfs_start_transaction)
* reserve metadata
* list_add pending_snapshot to the transaction
* commit the transaction (sync or async)
* free pending snapshot struct

the snapshots are really created from btrfs_commit_transaction,
create_pending_snapshots(), which does not protect the
trans->pending_snapshots list, just traverses it. There is a potential
corruption too but ...

... there is the 'start transaction' call, which tries very hard to
start a new transaction, ie. this call returns a new transaction handle.
To prove this point needs more space, I will skip it now. The important
thing for our case is that new transaction waits for the current to
unblock.

Then, each snap create will have its own transaction and own
pending_snapshots list, and will create just one snapshot per
transaction, because any concurrently running ioctl snap create will block
in the start_transaction call.

Therefore I think that either

1) there is no locking needed around pending_snapshots (under assumption
one snapshot per transaction), or

2) you need to protect every access to the pending_snapshot:
- add in create_snapshot
- splice in btrfs_destroy_pending_snapshots
- list foreach in create_pending_snapshots
- list_empty in btrfs_commit_transaction


david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason June 15, 2011, 11:41 a.m. UTC | #2
Excerpts from David Sterba's message of 2011-06-15 05:53:29 -0400:
> Hi,
> 
> On Tue, Jun 14, 2011 at 03:17:47PM -0400, Josef Bacik wrote:
> > Currently there is nothing protecting the pending_snapshots list on the
> > transaction.  We only hold the directory mutex that we are snapshotting and a
> > read lock on the subvol_sem, so we could race with somebody else creating a
> > snapshot in a different directory and end up with list corruption.  So protect
> > this list with the trans_lock.  Thanks,
> 
> it's correct that the pending_snapshots list is not protected, but I do
> not see a way how to corrupt the list.
> 
> The callchain goes like this:
> 
> <user called ioctl>
>   btrfs_ioctl_snap_create_transid
>     btrfs_mksubvol
>       create_snapshot
> 
> the interesting stuff happens inside create_snapshot:
> 
> * alloc pending snapshot struct
> * start transaction (btrfs_start_transaction)
> * reserve metadata
> * list_add pending_snapshot to the transaction
> * commit the transaction (sync or async)
> * free pending snapshot struct
> 
> the snapshots are really created from btrfs_commit_transaction,
> create_pending_snapshots(), which does not protect the
> trans->pending_snapshots list, just traverses it. There is a potential
> corruption too but ...
> 
> ... there is the 'start transaction' call, which tries very hard to
> start a new transaction, ie. this call returns a new transaction handle.
> To prove this point needs more space, I will skip it now. The important
> thing for our case is that new transaction waits for the current to
> unblock.
> 
> Then, each snap create will have its own transaction and own
> pending_snapshots list, and will create just one snapshot per
> transaction, because any concurrently running ioctl snap create will block
> in the start_transaction call.
> 
> Therefore I think that either
> 
> 1) there is no locking needed around pending_snapshots (under assumption
> one snapshot per transaction), or

There is definitely a window where two procs can be inside
create_snapshot() at the same time in the same transaction.  You're
correct that the window is small, but its there.  We're not racing with
the bulk of the work done at transaction commit time though.

Going down to your list:

> 
> 2) you need to protect every access to the pending_snapshot:
> - add in create_snapshot

create_snapshot is racey.

> - splice in btrfs_destroy_pending_snapshots

This can only happen during unmount while there are no writers.

> - list foreach in create_pending_snapshots

This can only happen during commit while there are no writers

> - list_empty in btrfs_commit_transaction

This is there to make sure that any delayed allocation from the source
subvol is actually done in the snapshot.  You're right that we should
have a better check here.  The worst possible case in getting this wrong
is that the snapshot misses some recent file writes in the original, but
it will be completely consistent.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba June 20, 2011, 10:32 p.m. UTC | #3
On Wed, Jun 15, 2011 at 07:41:37AM -0400, Chris Mason wrote:
> There is definitely a window where two procs can be inside
> create_snapshot() at the same time in the same transaction.

I trust you on that. I was trying to follow the callgraph from
btrfs_strat_transaction called from create_snapshot to see whether and
how two calls may access the list and came to conclusion that there must
be a fresh transaction after the call. There are lots of
join_transaction, commit_transaction calls and I could have lost track.
I'm leaving this to me as a homework, it was very educative anyway.

> Going down to your list:
> 
> > 
> > 2) you need to protect every access to the pending_snapshot:
> > - add in create_snapshot
> 
> create_snapshot is racey.

agreed, the patch fixes it

> 
> > - splice in btrfs_destroy_pending_snapshots
> 
> This can only happen during unmount while there are no writers.
>
> > - list foreach in create_pending_snapshots
> 
> This can only happen during commit while there are no writers

this counts as protection, for both

> > - list_empty in btrfs_commit_transaction
> 
> This is there to make sure that any delayed allocation from the source
> subvol is actually done in the snapshot.  You're right that we should
> have a better check here.  The worst possible case in getting this wrong
> is that the snapshot misses some recent file writes in the original, but
> it will be completely consistent.

I think this is ok and acceptable.


Thanks,
david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b793d11..a3c4751 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -482,8 +482,10 @@  static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
 	ret = btrfs_snap_reserve_metadata(trans, pending_snapshot);
 	BUG_ON(ret);
 
+	spin_lock(&root->fs_info->trans_lock);
 	list_add(&pending_snapshot->list,
 		 &trans->transaction->pending_snapshots);
+	spin_unlock(&root->fs_info->trans_lock);
 	if (async_transid) {
 		*async_transid = trans->transid;
 		ret = btrfs_commit_transaction_async(trans,