From patchwork Fri Aug  8 15:03:41 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Moyer <jmoyer@redhat.com>
X-Patchwork-Id: 4696731
X-Patchwork-Delegate: snitzer@redhat.com
Return-Path: <dm-devel-bounces@redhat.com>
X-Original-To: patchwork-dm-devel@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.19.201])
	by patchwork1.web.kernel.org (Postfix) with ESMTP id 899229F373
	for <patchwork-dm-devel@patchwork.kernel.org>;
	Fri,  8 Aug 2014 15:07:40 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id B2268201C0
	for <patchwork-dm-devel@patchwork.kernel.org>;
	Fri,  8 Aug 2014 15:07:39 +0000 (UTC)
Received: from mx5-phx2.redhat.com (mx5-phx2.redhat.com [209.132.183.37])
	by mail.kernel.org (Postfix) with ESMTP id C3D5C201BB
	for <patchwork-dm-devel@patchwork.kernel.org>;
	Fri,  8 Aug 2014 15:07:38 +0000 (UTC)
Received: from lists01.pubmisc.prod.ext.phx2.redhat.com
	(lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33])
	by mx5-phx2.redhat.com (8.14.4/8.14.4) with ESMTP id s78F3iiC010843;
	Fri, 8 Aug 2014 11:03:45 -0400
Received: from int-mx14.intmail.prod.int.phx2.redhat.com
	(int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27])
	by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with
	ESMTP id s78F3g1J011313 for <dm-devel@listman.util.phx.redhat.com>;
	Fri, 8 Aug 2014 11:03:42 -0400
Received: from segfault.boston.devel.redhat.com
	(segfault.boston.devel.redhat.com [10.19.60.26])
	by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with
	ESMTP id s78F3foW025909
	(version=TLSv1/SSLv3 cipher=AES128-GCM-SHA256 bits=128 verify=NO);
	Fri, 8 Aug 2014 11:03:41 -0400
From: Jeff Moyer <jmoyer@redhat.com>
To: axboe@kernel.dk, dm-devel@redhat.com, linux-kernel@vger.kernel.org
X-PGP-KeyID: 1F78E1B4
X-PGP-CertKey: F6FE 280D 8293 F72C 65FD  5A58 1FF8 A7CA 1F78 E1B4
X-PCLoadLetter: What the f**k does that mean?
Date: Fri, 08 Aug 2014 11:03:41 -0400
Message-ID: <x49iom3hvle.fsf@segfault.boston.devel.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27
X-loop: dm-devel@redhat.com
Cc: stable@vger.kernel.org
Subject: [dm-devel] [patch] dm: propagate QUEUE_FLAG_NO_SG_MERGE
X-BeenThere: dm-devel@redhat.com
X-Mailman-Version: 2.1.12
Precedence: junk
Reply-To: device-mapper development <dm-devel@redhat.com>
List-Id: device-mapper development <dm-devel.redhat.com>
List-Unsubscribe: <https://www.redhat.com/mailman/options/dm-devel>,
	<mailto:dm-devel-request@redhat.com?subject=unsubscribe>
List-Archive: <https://www.redhat.com/archives/dm-devel>
List-Post: <mailto:dm-devel@redhat.com>
List-Help: <mailto:dm-devel-request@redhat.com?subject=help>
List-Subscribe: <https://www.redhat.com/mailman/listinfo/dm-devel>,
	<mailto:dm-devel-request@redhat.com?subject=subscribe>
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
X-Spam-Status: No, score=-3.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW,
	RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Hi,

Commit 05f1dd5 introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.
This gets set by default in blk_mq_init_queue for mq-enabled devices.
The effect of the flag is to bypass the SG segment merging.  Instead,
the bio->bi_vcnt is used as the number of hardware segments.

With a device mapper target on top of a device with
QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
than a driver is prepared to handle.  I ran into this when backporting
the virtio_blk mq support.  It triggerred this BUG_ON, in
virtio_queue_rq:

        BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);

The queue's max is set here:
        blk_queue_max_segments(q, vblk->sg_elems-2);

Basically, what happens is that a bio is built up for the dm device
(which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
bio_add_page.  That path will call into __blk_recalc_rq_segments, so
what you end up with is bi_phys_segments being much smaller than bi_vcnt
(and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
is submitted, it gets cloned.  When the cloned bio is submitted, it will
end up in blk_recount_segments, here:

        if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
                bio->bi_phys_segments = bio->bi_vcnt;

and now we've set bio->bi_phys_segments to a number that is beyond what
was registered as queue_max_segments by the driver.

The right way to fix this is to propagate the queue flag up the stack.
Attached is a patch that does this, tested and confirmed to fix the
problem in my environment.

The rules for propagating the flag are simple:
- if the flag is set for any underlying device, it must be set for the
  upper device
- consequently, if the flag is not set for any underlying device, it
  should not be set for the upper device.

stable notes: this patch should be applied to 3.16.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
---
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 5f59f1e..bf756d1 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1386,6 +1386,14 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
 	return q && !blk_queue_add_random(q);
 }
 
+static int queue_supports_sg_merge(struct dm_target *ti, struct dm_dev *dev,
+				   sector_t start, sector_t len, void *data)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	return q && !test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
+}
+
 static bool dm_table_all_devices_attribute(struct dm_table *t,
 					   iterate_devices_callout_fn func)
 {
@@ -1464,6 +1472,9 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_supports_write_same(t))
 		q->limits.max_write_same_sectors = 0;
 
+	if (!dm_table_all_devices_attribute(t, queue_supports_sg_merge))
+		queue_flag_set_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
+
 	dm_table_set_integrity(t);
 
 	/*