diff mbox

btrfs: cancel scrub/replace if the user space process receive SIGKILL.

Message ID

1407317662-9364-1-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive)

State

Accepted

Headers

show

From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <linux-btrfs@vger.kernel.org>
Subject: [PATCH] btrfs: cancel scrub/replace if the user space process
	receive SIGKILL.
Date: Wed, 6 Aug 2014 17:34:22 +0800
Message-ID: <1407317662-9364-1-git-send-email-quwenruo@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Qu Wenruo Aug. 6, 2014, 9:34 a.m. UTC

When impatient sysadmin is tired of waiting background running btrfs
scrub/replace and send SIGKILL to btrfs process, unlike
SIGINT/SIGTERM which can be caught by user space program and cancel the
scrub work, user space program will continue running until ioctl exits.

To keep it consistent with the behavior of btrfs-progs, which cancels
the work when SIGINT is received, this patch will make scrub routine to
check SIGKILL pending of current task and cancel the work if SIGKILL is
already pending.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/scrub.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

Comments

David Sterba Sept. 2, 2014, 11:05 a.m. UTC | #1

On Wed, Aug 06, 2014 at 05:34:22PM +0800, Qu Wenruo wrote:
> When impatient sysadmin is tired of waiting background running btrfs
> scrub/replace and send SIGKILL to btrfs process, unlike
> SIGINT/SIGTERM which can be caught by user space program and cancel the
> scrub work, user space program will continue running until ioctl exits.

I don't understand why it's needed to add another way to cancel scrub.
Does it mean that 'btrfs scrub cancel' is not sufficient? It cancels
both foreground and background scrub.  Same for dev-replace, it has the
cancel subcommand.

Sending KILL signal to some random process is not the right way, how can
the admin know to which filesystem the process belongs?

> To keep it consistent with the behavior of btrfs-progs, which cancels
> the work when SIGINT is received, this patch will make scrub routine to
> check SIGKILL pending of current task and cancel the work if SIGKILL is
> already pending.

The foreground scrub starts a separate process and then wait()s. If you
want to catch a SIGINT, then change it to a loop that checks for if the
forked process exited or if Ctrl-c was pressed.

The dev-replace can be started without a userspace process via
kthread_run from btrfs_dev_replace_continue_on_mount, and sending
signals to kernel processes requires some caution. For one, the signals
have to be explicitly allowed. But before that I'd like to better
understand where the SIGKILL is unavoidable.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Qu Wenruo Sept. 3, 2014, 1:02 a.m. UTC | #2

-------- Original Message --------
Subject: Re: [PATCH] btrfs: cancel scrub/replace if the user space 
process receive SIGKILL.
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2014?09?02? 19:05
> On Wed, Aug 06, 2014 at 05:34:22PM +0800, Qu Wenruo wrote:
>> When impatient sysadmin is tired of waiting background running btrfs
>> scrub/replace and send SIGKILL to btrfs process, unlike
>> SIGINT/SIGTERM which can be caught by user space program and cancel the
>> scrub work, user space program will continue running until ioctl exits.
> I don't understand why it's needed to add another way to cancel scrub.
> Does it mean that 'btrfs scrub cancel' is not sufficient? It cancels
> both foreground and background scrub.  Same for dev-replace, it has the
> cancel subcommand.
Yes, 'scrub cacnel' is sufficient and it's what userspace calls when 
catching SIGINT.
I sent the user-space patch to fix the 'dev-replace cancel' signal 
handling and then consider
since SIGKILL can't be caught, it can't be handle in user-space so I 
then sent the kernel patch to handle it.

But if user-space can handle SIGINT correctly, the SIGKILL won't be 
sent, so the kernel patch can be ignored.

Thanks,
Qu
>
> Sending KILL signal to some random process is not the right way, how can
> the admin know to which filesystem the process belongs?
>
>> To keep it consistent with the behavior of btrfs-progs, which cancels
>> the work when SIGINT is received, this patch will make scrub routine to
>> check SIGKILL pending of current task and cancel the work if SIGKILL is
>> already pending.
> The foreground scrub starts a separate process and then wait()s.If you
> want to catch a SIGINT, then change it to a loop that checks for if the
> forked process exited or if Ctrl-c was pressed.

>
> The dev-replace can be started without a userspace process via
> kthread_run from btrfs_dev_replace_continue_on_mount, and sending
> signals to kernel processes requires some caution. For one, the signals
> have to be explicitly allowed. But before that I'd like to better
> understand where the SIGKILL is unavoidable.
>
> Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff mbox

Patch

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b6d198f..0c8047f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2277,6 +2277,29 @@  static int get_raid56_logic_offset(u64 physical, int num,
 	return 1;
 }
 
+/*
+ * check whether the scrub is canceled
+ * if canceled, return -ECANCELED, else return 0
+ *
+ * scrub/replace will be canceled when
+ * 1) cancel is called manually
+ * 2) caller user space process(btrfs-progs) receive SIGKILL
+ * other signals can be caught in btrfs-progs using multi-thread
+ * and cancel the work.
+ * but SIGKILL can't be caught and btrfs-progs already fallen into ioctl
+ * so cancel current scrub to return asap if SIGKILL is received.
+ */
+static inline int is_scrub_canceled(struct btrfs_fs_info *fs_info,
+				    struct scrub_ctx *sctx)
+{
+	int ret = 0;
+
+	if (unlikely(atomic_read(&fs_info->scrub_cancel_req) ||
+		     atomic_read(&sctx->cancel_req) ||
+		     __fatal_signal_pending(current)))
+		ret = -ECANCELED;
+	return ret;
+}
 static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 					   struct map_lookup *map,
 					   struct btrfs_device *scrub_dev,
@@ -2420,11 +2443,9 @@  static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		/*
 		 * canceled?
 		 */
-		if (atomic_read(&fs_info->scrub_cancel_req) ||
-		    atomic_read(&sctx->cancel_req)) {
-			ret = -ECANCELED;
+		ret = is_scrub_canceled(fs_info, sctx);
+		if (ret)
 			goto out;
-		}
 		/*
 		 * check to see if we have to pause
 		 */