From patchwork Tue Feb  7 02:49:06 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kent Overstreet <kent.overstreet@gmail.com>
X-Patchwork-Id: 9559329
X-Patchwork-Delegate: snitzer@redhat.com
Return-Path: <dm-devel-bounces@redhat.com>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	DEAAE6047A for <patchwork-dm-devel@patchwork.kernel.org>;
	Tue,  7 Feb 2017 07:57:44 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CB68E28159
	for <patchwork-dm-devel@patchwork.kernel.org>;
	Tue,  7 Feb 2017 07:57:44 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id B032E28135; Tue,  7 Feb 2017 07:57:44 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,
	DKIM_ADSP_CUSTOM_MED,
	DKIM_SIGNED,FREEMAIL_FROM,RCVD_IN_DNSWL_MED,T_DKIM_INVALID
	autolearn=unavailable version=3.3.1
Received: from mx6-phx2.redhat.com (mx6-phx2.redhat.com [209.132.183.39])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id E739828135
	for <patchwork-dm-devel@patchwork.kernel.org>;
	Tue,  7 Feb 2017 07:57:43 +0000 (UTC)
Received: from lists01.pubmisc.prod.ext.phx2.redhat.com
	(lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33])
	by mx6-phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v177tE16063039;
	Tue, 7 Feb 2017 02:55:18 -0500
Received: from int-mx14.intmail.prod.int.phx2.redhat.com
	(int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27])
	by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with
	ESMTP id v172nDYo026098 for <dm-devel@listman.util.phx.redhat.com>;
	Mon, 6 Feb 2017 21:49:13 -0500
Received: from mx1.redhat.com (ext-mx03.extmail.prod.ext.phx2.redhat.com
	[10.5.110.27])
	by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with
	ESMTP id v172nDiV002413
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256
	verify=NO); Mon, 6 Feb 2017 21:49:13 -0500
Received: from mail-pg0-f67.google.com (mail-pg0-f67.google.com
	[74.125.83.67])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128
	bits)) (No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 6338A83F38;
	Tue,  7 Feb 2017 02:49:12 +0000 (UTC)
Received: by mail-pg0-f67.google.com with SMTP id 204so10675000pge.2;
	Mon, 06 Feb 2017 18:49:11 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=date:from:to:cc:subject:message-id:references:mime-version
	:content-disposition:in-reply-to:user-agent;
	bh=laENH7yYcsmbeCxGo9QXl45Urnj+cojil2WF6r5sS10=;
	b=kXpxqObJJ8UJpSjfT6P2qvA36NNDL8C0wjxKz8T8haEdmFBLazN1v345RmeaFY0H6s
	WAfOaTtwcxyNwS+qXFAM3oTXQG/U2HOo37/ggXifc3Jm8JcEQ4CC0CYWGS1QvTnic7Xw
	Unh9M3IktjmuVbIR2enjrTo7pxJXVgepvm/ERxpCbDjFQBOP3zMdde4/yMWPRLrXCQyw
	oqRQjOqg/m+KFHjiPdz5cwqeFyKlDVU3ULSsPVKbnbf9woi4pVVmMGHvObpCLg6YzDM+
	WKN4shTTHwebX6IeIvmZcDx4LhmuhPyDWQ3W5ueh4B/aJkGcUG0vys9Ubpj8dXvbHmf6
	nIxg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:date:from:to:cc:subject:message-id:references
	:mime-version:content-disposition:in-reply-to:user-agent;
	bh=laENH7yYcsmbeCxGo9QXl45Urnj+cojil2WF6r5sS10=;
	b=KDz+XhRE1eNGjUWmWYymo07ALwmpN1KmKi4HrhapIL2N7IoJDM/osXlejnqrLpy8i0
	MB/8bTlfdfhXYnKoUghgHQi5BF+CH7xS9cWXQIPqCn1K1w51Qix8t1RhUUokZ13ypT6x
	LGB2QVjpZvS5ddJYgQ89HoMYyAzz52DD/UxZpjwKROYf5l7/oVUlW88a0D6kOp8ZUoUC
	zcbKS05KUgi9JisEYOnu1XEsZyEP3L2Yfoznxk+Vo4gSvXUcVYyC4Ai+nykFC//xLgh3
	8ntzjDLVW7uFtFOQSwqcqg44d/IzVaBi6hyKOOZssELylh4Wgk7DjJKva8L43WkzaCQx
	bV2g==
X-Gm-Message-State: 
 AIkVDXIQoxEsEluTIalEilfKOufJ4JsB2hCZsvKt/1xU/POQ/eIexCuzTxB+6r0Cz+GmAw==
X-Received: by 10.84.198.164 with SMTP id p33mr22135109pld.85.1486435750800;
	Mon, 06 Feb 2017 18:49:10 -0800 (PST)
Received: from moria.home.lan (213-168-42-72.gci.net. [72.42.168.213])
	by smtp.gmail.com with ESMTPSA id
	o18sm5824825pgn.36.2017.02.06.18.49.08
	(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
	Mon, 06 Feb 2017 18:49:09 -0800 (PST)
Date: Mon, 6 Feb 2017 17:49:06 -0900
From: Kent Overstreet <kent.overstreet@gmail.com>
To: Pavel Machek <pavel@ucw.cz>
Message-ID: <20170207024906.4oswyuvxfnqkvbhr@moria.home.lan>
References: <20151211104937.GA23165@amd> <20151211140841.GA22873@redhat.com>
	<20160220174035.GA16459@amd> <20160220184258.GA3753@amd>
	<20160220195136.GA27149@redhat.com> <20160220200432.GB22120@amd>
	<20170206125309.GA29395@amd>
	<20170207014724.74tb37jj7u66lww3@moria.home.lan>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20170207014724.74tb37jj7u66lww3@moria.home.lan>
User-Agent: NeoMutt/20170113 (1.7.2)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.27]);
	Tue, 07 Feb 2017 02:49:12 +0000 (UTC)
X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com
	[10.5.110.27]);
	Tue, 07 Feb 2017 02:49:12 +0000 (UTC) for IP:'74.125.83.67'
	DOMAIN:'mail-pg0-f67.google.com' HELO:'mail-pg0-f67.google.com'
	FROM:'kent.overstreet@gmail.com' RCPT:''
X-RedHat-Spam-Score: 0.67  (BAYES_50, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU,
	FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3,
	RCVD_IN_MSPIKE_WL,
	SPF_PASS) 74.125.83.67 mail-pg0-f67.google.com 74.125.83.67
	mail-pg0-f67.google.com <kent.overstreet@gmail.com>
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27
X-Scanned-By: MIMEDefang 2.78 on 10.5.110.27
X-loop: dm-devel@redhat.com
X-Mailman-Approved-At: Tue, 07 Feb 2017 02:55:13 -0500
Cc: oleg.drokin@intel.com, ming.l@ssi.samsung.com, andreas.dilger@intel.com,
	Mike Snitzer <snitzer@redhat.com>, martin.petersen@oracle.com,
	minchan@kernel.org, jkosina@suse.cz, ming.lei@canonical.com,
	kernel list <linux-kernel@vger.kernel.org>, jim@jtan.com,
	pjk1939@linux.vnet.ibm.com, axboe@fb.com, geoff@infradead.org,
	dm-devel@redhat.com, dpark@posteo.net, ngupta@vflare.org, hch@lst.de,
	agk@redhat.com
Subject: Re: [dm-devel] v4.9, 4.4-final: 28 bioset threads on small notebook,
	36 threads on cellphone
X-BeenThere: dm-devel@redhat.com
X-Mailman-Version: 2.1.12
Precedence: junk
List-Id: device-mapper development <dm-devel.redhat.com>
List-Unsubscribe: <https://www.redhat.com/mailman/options/dm-devel>,
	<mailto:dm-devel-request@redhat.com?subject=unsubscribe>
List-Archive: <https://www.redhat.com/archives/dm-devel>
List-Post: <mailto:dm-devel@redhat.com>
List-Help: <mailto:dm-devel-request@redhat.com?subject=help>
List-Subscribe: <https://www.redhat.com/mailman/listinfo/dm-devel>,
	<mailto:dm-devel-request@redhat.com?subject=subscribe>
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
X-Virus-Scanned: ClamAV using ClamSMTP

On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > 
> > So.. what needs to be done there?

> But, I just got an idea for how to handle this that might be halfway sane, maybe
> I'll try and come up with a patch...

Ok, here's such a patch, only lightly tested:

-- >8 --
Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset

Note: this patch is very lightly tested.

Also, trigger rescuing whenever with bios on current->bio_list, instead
of only when we block in bio_alloc_bioset(). This is more correct, and
should result in fewer rescuer threads.

XXX: The current->bio_list plugging needs to be unified with the
blk_plug mechanism.

TODO: If we change normal request_queue drivers to handle arbitrary size
bios by processing requests incrementally, instead of splitting bios,
then we can get rid of rescuer threads from those devices.
---
 block/bio.c            | 107 ++++---------------------------------------------
 block/blk-core.c       |  58 ++++++++++++++++++++++++---
 block/blk-sysfs.c      |   2 +
 include/linux/bio.h    |  16 ++++----
 include/linux/blkdev.h |  10 +++++
 include/linux/sched.h  |   2 +-
 kernel/sched/core.c    |   4 ++
 7 files changed, 83 insertions(+), 116 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index f3b5786202..9ad54a9b12 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -336,54 +336,6 @@ void bio_chain(struct bio *bio, struct bio *parent)
 }
 EXPORT_SYMBOL(bio_chain);
 
-static void bio_alloc_rescue(struct work_struct *work)
-{
-	struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
-	struct bio *bio;
-
-	while (1) {
-		spin_lock(&bs->rescue_lock);
-		bio = bio_list_pop(&bs->rescue_list);
-		spin_unlock(&bs->rescue_lock);
-
-		if (!bio)
-			break;
-
-		generic_make_request(bio);
-	}
-}
-
-static void punt_bios_to_rescuer(struct bio_set *bs)
-{
-	struct bio_list punt, nopunt;
-	struct bio *bio;
-
-	/*
-	 * In order to guarantee forward progress we must punt only bios that
-	 * were allocated from this bio_set; otherwise, if there was a bio on
-	 * there for a stacking driver higher up in the stack, processing it
-	 * could require allocating bios from this bio_set, and doing that from
-	 * our own rescuer would be bad.
-	 *
-	 * Since bio lists are singly linked, pop them all instead of trying to
-	 * remove from the middle of the list:
-	 */
-
-	bio_list_init(&punt);
-	bio_list_init(&nopunt);
-
-	while ((bio = bio_list_pop(current->bio_list)))
-		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
-
-	*current->bio_list = nopunt;
-
-	spin_lock(&bs->rescue_lock);
-	bio_list_merge(&bs->rescue_list, &punt);
-	spin_unlock(&bs->rescue_lock);
-
-	queue_work(bs->rescue_workqueue, &bs->rescue_work);
-}
-
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -421,54 +373,27 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
  */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
-	gfp_t saved_gfp = gfp_mask;
 	unsigned front_pad;
 	unsigned inline_vecs;
 	struct bio_vec *bvl = NULL;
 	struct bio *bio;
 	void *p;
 
-	if (!bs) {
-		if (nr_iovecs > UIO_MAXIOV)
-			return NULL;
+	WARN(current->bio_list &&
+	     !current->bio_list->q->rescue_workqueue,
+	     "allocating bio beneath generic_make_request() without rescuer");
 
+	if (nr_iovecs > UIO_MAXIOV)
+		return NULL;
+
+	if (!bs) {
 		p = kmalloc(sizeof(struct bio) +
 			    nr_iovecs * sizeof(struct bio_vec),
 			    gfp_mask);
 		front_pad = 0;
 		inline_vecs = nr_iovecs;
 	} else {
-		/*
-		 * generic_make_request() converts recursion to iteration; this
-		 * means if we're running beneath it, any bios we allocate and
-		 * submit will not be submitted (and thus freed) until after we
-		 * return.
-		 *
-		 * This exposes us to a potential deadlock if we allocate
-		 * multiple bios from the same bio_set() while running
-		 * underneath generic_make_request(). If we were to allocate
-		 * multiple bios (say a stacking block driver that was splitting
-		 * bios), we would deadlock if we exhausted the mempool's
-		 * reserve.
-		 *
-		 * We solve this, and guarantee forward progress, with a rescuer
-		 * workqueue per bio_set. If we go to allocate and there are
-		 * bios on current->bio_list, we first try the allocation
-		 * without __GFP_DIRECT_RECLAIM; if that fails, we punt those
-		 * bios we would be blocking to the rescuer workqueue before
-		 * we retry with the original gfp_flags.
-		 */
-
-		if (current->bio_list && !bio_list_empty(current->bio_list))
-			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
-
 		p = mempool_alloc(&bs->bio_pool, gfp_mask);
-		if (!p && gfp_mask != saved_gfp) {
-			punt_bios_to_rescuer(bs);
-			gfp_mask = saved_gfp;
-			p = mempool_alloc(&bs->bio_pool, gfp_mask);
-		}
-
 		front_pad = bs->front_pad;
 		inline_vecs = BIO_INLINE_VECS;
 	}
@@ -483,12 +408,6 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 		unsigned long idx = 0;
 
 		bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, &bs->bvec_pool);
-		if (!bvl && gfp_mask != saved_gfp) {
-			punt_bios_to_rescuer(bs);
-			gfp_mask = saved_gfp;
-			bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, &bs->bvec_pool);
-		}
-
 		if (unlikely(!bvl))
 			goto err_free;
 
@@ -1938,10 +1857,6 @@ int biovec_init_pool(mempool_t *pool, int pool_entries)
 
 void bioset_exit(struct bio_set *bs)
 {
-	if (bs->rescue_workqueue)
-		destroy_workqueue(bs->rescue_workqueue);
-	bs->rescue_workqueue = NULL;
-
 	mempool_exit(&bs->bio_pool);
 	mempool_exit(&bs->bvec_pool);
 
@@ -1968,10 +1883,6 @@ static int __bioset_init(struct bio_set *bs,
 
 	bs->front_pad = front_pad;
 
-	spin_lock_init(&bs->rescue_lock);
-	bio_list_init(&bs->rescue_list);
-	INIT_WORK(&bs->rescue_work, bio_alloc_rescue);
-
 	bs->bio_slab = bio_find_or_create_slab(front_pad + back_pad);
 	if (!bs->bio_slab)
 		return -ENOMEM;
@@ -1983,10 +1894,6 @@ static int __bioset_init(struct bio_set *bs,
 	    biovec_init_pool(&bs->bvec_pool, pool_size))
 		goto bad;
 
-	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
-	if (!bs->rescue_workqueue)
-		goto bad;
-
 	return 0;
 bad:
 	bioset_exit(bs);
diff --git a/block/blk-core.c b/block/blk-core.c
index 7e3cfa9c88..f716164cb3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -48,6 +48,8 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(block_unplug);
 
 DEFINE_IDA(blk_queue_ida);
 
+static void bio_rescue_work(struct work_struct *);
+
 /*
  * For the allocated request tables
  */
@@ -759,11 +761,21 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 				PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
 		goto fail_bdi;
 
-	if (blkcg_init_queue(q))
+	spin_lock_init(&q->rescue_lock);
+	bio_list_init(&q->rescue_list);
+	INIT_WORK(&q->rescue_work, bio_rescue_work);
+
+	q->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
+	if (!q->rescue_workqueue)
 		goto fail_ref;
 
+	if (blkcg_init_queue(q))
+		goto fail_rescue;
+
 	return q;
 
+fail_rescue:
+	destroy_workqueue(q->rescue_workqueue);
 fail_ref:
 	percpu_ref_exit(&q->q_usage_counter);
 fail_bdi:
@@ -1994,7 +2006,7 @@ generic_make_request_checks(struct bio *bio)
  */
 blk_qc_t generic_make_request(struct bio *bio)
 {
-	struct bio_list bio_list_on_stack;
+	struct bio_plug_list bio_list_on_stack;
 	blk_qc_t ret = BLK_QC_T_NONE;
 
 	if (!generic_make_request_checks(bio))
@@ -2011,7 +2023,9 @@ blk_qc_t generic_make_request(struct bio *bio)
 	 * should be added at the tail
 	 */
 	if (current->bio_list) {
-		bio_list_add(current->bio_list, bio);
+		WARN(!current->bio_list->q->rescue_workqueue,
+		     "submitting bio beneath generic_make_request() without rescuer");
+		bio_list_add(&current->bio_list->bios, bio);
 		goto out;
 	}
 
@@ -2030,19 +2044,23 @@ blk_qc_t generic_make_request(struct bio *bio)
 	 * bio_list, and call into ->make_request() again.
 	 */
 	BUG_ON(bio->bi_next);
-	bio_list_init(&bio_list_on_stack);
+	bio_list_init(&bio_list_on_stack.bios);
 	current->bio_list = &bio_list_on_stack;
+
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
+		current->bio_list->q = q;
+
 		if (likely(blk_queue_enter(q, false) == 0)) {
 			ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
 
-			bio = bio_list_pop(current->bio_list);
+			bio = bio_list_pop(&current->bio_list->bios);
 		} else {
-			struct bio *bio_next = bio_list_pop(current->bio_list);
+			struct bio *bio_next =
+				bio_list_pop(&current->bio_list->bios);
 
 			bio_io_error(bio);
 			bio = bio_next;
@@ -2055,6 +2073,34 @@ blk_qc_t generic_make_request(struct bio *bio)
 }
 EXPORT_SYMBOL(generic_make_request);
 
+static void bio_rescue_work(struct work_struct *work)
+{
+	struct request_queue *q =
+		container_of(work, struct request_queue, rescue_work);
+	struct bio *bio;
+
+	while (1) {
+		spin_lock(&q->rescue_lock);
+		bio = bio_list_pop(&q->rescue_list);
+		spin_unlock(&q->rescue_lock);
+
+		if (!bio)
+			break;
+
+		generic_make_request(bio);
+	}
+}
+
+void blk_punt_blocked_bios(struct bio_plug_list *list)
+{
+	spin_lock(&list->q->rescue_lock);
+	bio_list_merge(&list->q->rescue_list, &list->bios);
+	bio_list_init(&list->bios);
+	spin_unlock(&list->q->rescue_lock);
+
+	queue_work(list->q->rescue_workqueue, &list->q->rescue_work);
+}
+
 /**
  * submit_bio - submit a bio to the block device layer for I/O
  * @bio: The &struct bio which describes the I/O
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7f27a18cc4..77529238d1 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -660,6 +660,8 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_trace_shutdown(q);
 
+	if (q->rescue_workqueue)
+		destroy_workqueue(q->rescue_workqueue);
 	if (q->bio_split)
 		bioset_free(q->bio_split);
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1ffe8e37ae..87eeec7eda 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -658,6 +658,13 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
 	return bio;
 }
 
+struct bio_plug_list {
+	struct bio_list		bios;
+	struct request_queue	*q;
+};
+
+void blk_punt_blocked_bios(struct bio_plug_list *);
+
 /*
  * Increment chain count for the bio. Make sure the CHAIN flag update
  * is visible before the raised count.
@@ -687,15 +694,6 @@ struct bio_set {
 	mempool_t bio_integrity_pool;
 	mempool_t bvec_integrity_pool;
 #endif
-
-	/*
-	 * Deadlock avoidance for stacking block drivers: see comments in
-	 * bio_alloc_bioset() for details
-	 */
-	spinlock_t		rescue_lock;
-	struct bio_list		rescue_list;
-	struct work_struct	rescue_work;
-	struct workqueue_struct	*rescue_workqueue;
 };
 
 struct biovec_slab {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358ba0..f64b886c65 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -476,6 +476,16 @@ struct request_queue {
 	struct bio_set		*bio_split;
 
 	bool			mq_sysfs_init_done;
+
+	/*
+	 * Deadlock avoidance, to deal with the plugging in
+	 * generic_make_request() that converts recursion to iteration to avoid
+	 * stack overflow:
+	 */
+	spinlock_t		rescue_lock;
+	struct bio_list		rescue_list;
+	struct work_struct	rescue_work;
+	struct workqueue_struct	*rescue_workqueue;
 };
 
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2865d10a28..59df7a1030 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1762,7 +1762,7 @@ struct task_struct {
 	void *journal_info;
 
 /* stacked block device info */
-	struct bio_list *bio_list;
+	struct bio_plug_list *bio_list;
 
 #ifdef CONFIG_BLOCK
 /* stack plugging */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bd39d698cb..23b6290ba1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3439,6 +3439,10 @@ static inline void sched_submit_work(struct task_struct *tsk)
 {
 	if (!tsk->state || tsk_is_pi_blocked(tsk))
 		return;
+
+	if (tsk->bio_list && !bio_list_empty(&tsk->bio_list->bios))
+		blk_punt_blocked_bios(tsk->bio_list);
+
 	/*
 	 * If we are going to sleep and we have plugged IO queued,
 	 * make sure to submit it to avoid deadlocks.