Message ID | 50AB86D5.6040501@jan-o-sch.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Nov 20, 2012 at 06:34:13AM -0700, Jan Schmidt wrote: > I found a race condition in the way scrub interacts with btrfs workers: > > Please carefully distinguish "workers" as struct btrfs_workers from "worker" as > struct btrfs_worker_thread. > > Process A: umount > Process B: scrub-workers > Process C: genwork-worker > > A: close_ctree() > A: scrub_cancel() > B: scrub_workers_put() > B: btrfs_stop_workers() > B: -> lock(workers->lock) > B: -> splice idle worker list into worker list > B: -> while (worker list) > B: -> worker = delete from worker list > B: -> unlock(workers->lock) > B: -> kthread_stop(worker) > C: __btrfs_start_workers() > C: -> kthread_run starts new scrub-worker > C: -> lock(workers->lock) /* from scrub-workers */ > C: -> add to workers idle list > C: -> unlock(workers->lock) > B: -> lock(workers->lock) > A: stop genwork-worker > > In that situation, the scrub-workers have one idle worker but return from > btrfs_stop_workers(). There are several strategies to fix this: > > 1. Let close_ctree call scrub_cancel() after stopping the generic worker. > > 2. Let btrfs_stop_workers be more careful on insertions to the idle list, like: > > diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c > index 58b7d14..2523781 100644 > --- a/fs/btrfs/async-thread.c > +++ b/fs/btrfs/async-thread.c > @@ -431,6 +431,8 @@ void btrfs_stop_workers(struct btrfs_workers *workers) > kthread_stop(worker->task); > spin_lock_irq(&workers->lock); > put_worker(worker); > + /* we dropped the lock above, check for new idle workers */ > + list_splice_init(&workers->idle_list, &workers->worker_list); > } > spin_unlock_irq(&workers->lock); > } > > 3. Find out why __btrfs_start_workers() gets called in the above scenario at all. > > Any thoughts? Otherwise, I'm going for option 3 and will call back. I'd go for a mixture of 1 and 3. My first guess is that we're getting endio events for scrub reads and that is kicking the scrub state machine. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 58b7d14..2523781 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -431,6 +431,8 @@ void btrfs_stop_workers(struct btrfs_workers *workers) kthread_stop(worker->task); spin_lock_irq(&workers->lock); put_worker(worker); + /* we dropped the lock above, check for new idle workers */ + list_splice_init(&workers->idle_list, &workers->worker_list); } spin_unlock_irq(&workers->lock); }