[v3] btrfs: qgroup: exit the rescan worker during umount

Message ID	1446681376-24583-1-git-send-email-jmaggard@netgear.com (mailing list archive)
State	Accepted
Headers	show Return-Path: <linux-btrfs-owner@kernel.org> From: Justin Maggard <jmaggard10@gmail.com> To: linux-btrfs@vger.kernel.org Cc: fdmanana@gmail.com, Justin Maggard <jmaggard@netgear.com> Subject: [PATCH v3] btrfs: qgroup: exit the rescan worker during umount Date: Wed, 4 Nov 2015 15:56:16 -0800 Message-Id: <1446681376-24583-1-git-send-email-jmaggard@netgear.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk

Message ID

1446681376-24583-1-git-send-email-jmaggard@netgear.com (mailing list archive)

State

Accepted

Headers

From: Justin Maggard <jmaggard10@gmail.com>
To: linux-btrfs@vger.kernel.org
Cc: fdmanana@gmail.com, Justin Maggard <jmaggard@netgear.com>
Subject: [PATCH v3] btrfs: qgroup: exit the rescan worker during umount
Date: Wed,  4 Nov 2015 15:56:16 -0800
Message-Id: <1446681376-24583-1-git-send-email-jmaggard@netgear.com>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Justin Maggard Nov. 4, 2015, 11:56 p.m. UTC

I was hitting a consistent NULL pointer dereference during shutdown that
showed the trace running through end_workqueue_bio().  I traced it back to
the endio_meta_workers workqueue being poked after it had already been
destroyed.

Eventually I found that the root cause was a qgroup rescan that was still
in progress while we were stopping all the btrfs workers.

Currently we explicitly pause balance and scrub operations in
close_ctree(), but we do nothing to stop the qgroup rescan.  We should
probably be doing the same for qgroup rescan, but that's a much larger
change.  This small change is good enough to allow me to unmount without
crashing.

v3: avoid more races by calling btrfs_qgroup_wait_for_completion()

Signed-off-by: Justin Maggard <jmaggard@netgear.com>
---
 fs/btrfs/disk-io.c | 3 +++
 fs/btrfs/qgroup.c  | 9 ++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

Comments

Filipe Manana Nov. 5, 2015, 10:42 a.m. UTC | #1

On Wed, Nov 4, 2015 at 11:56 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
> I was hitting a consistent NULL pointer dereference during shutdown that
> showed the trace running through end_workqueue_bio().  I traced it back to
> the endio_meta_workers workqueue being poked after it had already been
> destroyed.
>
> Eventually I found that the root cause was a qgroup rescan that was still
> in progress while we were stopping all the btrfs workers.
>
> Currently we explicitly pause balance and scrub operations in
> close_ctree(), but we do nothing to stop the qgroup rescan.  We should
> probably be doing the same for qgroup rescan, but that's a much larger
> change.  This small change is good enough to allow me to unmount without
> crashing.
>
> v3: avoid more races by calling btrfs_qgroup_wait_for_completion()

Btw, information about what changes between versions should go after
the "---" (triple dash) below. You should also have mentioned what
changed between v2 and v1 as well (see
https://btrfs.wiki.kernel.org/index.php/Developer%27s_FAQ#Repeated_submissions).

>
> Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>

I've added this to my integration branch for 4.4 [1] and will prepare
later a pull request for Chris. I've removed there the v3 line from
the change log, as that's not  intended to be there, serves only for
patch reviewing in the list.

Thanks for doing this and the test case for fstests.

[1] http://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/log/?h=integration-4.4

> ---
>  fs/btrfs/disk-io.c | 3 +++
>  fs/btrfs/qgroup.c  | 9 ++++++---
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 2d46675..1eb0839 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3780,6 +3780,9 @@ void close_ctree(struct btrfs_root *root)
>         fs_info->closing = 1;
>         smp_mb();
>
> +       /* wait for the qgroup rescan worker to stop */
> +       btrfs_qgroup_wait_for_completion(fs_info);
> +
>         /* wait for the uuid_scan task to finish */
>         down(&fs_info->uuid_tree_rescan_sem);
>         /* avoid complains from lockdep et al., set sem back to initial state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 46476c2..75c0249 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2286,7 +2286,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>                 goto out;
>
>         err = 0;
> -       while (!err) {
> +       while (!err && !btrfs_fs_closing(fs_info)) {
>                 trans = btrfs_start_transaction(fs_info->fs_root, 0);
>                 if (IS_ERR(trans)) {
>                         err = PTR_ERR(trans);
> @@ -2307,7 +2307,8 @@ out:
>         btrfs_free_path(path);
>
>         mutex_lock(&fs_info->qgroup_rescan_lock);
> -       fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +       if (!btrfs_fs_closing(fs_info))
> +               fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>
>         if (err > 0 &&
>             fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
> @@ -2336,7 +2337,9 @@ out:
>         }
>         btrfs_end_transaction(trans, fs_info->quota_root);
>
> -       if (err >= 0) {
> +       if (btrfs_fs_closing(fs_info)) {
> +               btrfs_info(fs_info, "qgroup scan paused");
> +       } else if (err >= 0) {
>                 btrfs_info(fs_info, "qgroup scan completed%s",
>                         err > 0 ? " (inconsistency flag cleared)" : "");
>         } else {
> --
> 2.6.2
>

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2d46675..1eb0839 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3780,6 +3780,9 @@  void close_ctree(struct btrfs_root *root)
 	fs_info->closing = 1;
 	smp_mb();
 
+	/* wait for the qgroup rescan worker to stop */
+	btrfs_qgroup_wait_for_completion(fs_info);
+
 	/* wait for the uuid_scan task to finish */
 	down(&fs_info->uuid_tree_rescan_sem);
 	/* avoid complains from lockdep et al., set sem back to initial state */
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 46476c2..75c0249 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2286,7 +2286,7 @@  static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 		goto out;
 
 	err = 0;
-	while (!err) {
+	while (!err && !btrfs_fs_closing(fs_info)) {
 		trans = btrfs_start_transaction(fs_info->fs_root, 0);
 		if (IS_ERR(trans)) {
 			err = PTR_ERR(trans);
@@ -2307,7 +2307,8 @@  out:
 	btrfs_free_path(path);
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+	if (!btrfs_fs_closing(fs_info))
+		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
 
 	if (err > 0 &&
 	    fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
@@ -2336,7 +2337,9 @@  out:
 	}
 	btrfs_end_transaction(trans, fs_info->quota_root);
 
-	if (err >= 0) {
+	if (btrfs_fs_closing(fs_info)) {
+		btrfs_info(fs_info, "qgroup scan paused");
+	} else if (err >= 0) {
 		btrfs_info(fs_info, "qgroup scan completed%s",
 			err > 0 ? " (inconsistency flag cleared)" : "");
 	} else {

[v3] btrfs: qgroup: exit the rescan worker during umount

Commit Message

Comments

Patch