[5/7] dm core: reject I/O violating new queue limits

Message ID 49F17510.7010805@ct.jp.nec.com (mailing list archive)
State Superseded, archived

Commit Message

Kiyoshi Ueda April 24, 2009, 8:15 a.m. UTC
This patch detects requests violating the queue limitations
and rejects them.

The same limitation checks are done when requests are submitted
to the queue by blk_insert_cloned_request().
However, such a violation can still happen if a table is swapped and
the queue limitations are shrunk while some requests are
in the queue.

Since the block layer and device drivers trust the contents of
struct request, dispatching such requests is dangerous
(e.g. it can easily cause a kernel panic).
So avoid dispatching such problematic requests in request-based dm.
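
The scenario described above can be modeled outside the kernel with a
small sketch. Everything in it is an assumption made for illustration:
the sketch_* structs and functions are hypothetical stand-ins, not the
kernel's struct request, its queue limits, or blk_rq_check_limits(), and
the -EIO return value is chosen only for the example. It shows a request
that passes the check at submission time, then fails the same check at
dispatch time because the limits shrank in between.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the queue limits; not the kernel's struct. */
struct sketch_limits {
	unsigned int max_sectors;   /* largest request size, in sectors */
	unsigned int max_segments;  /* most segments allowed per request */
};

/* Hypothetical stand-in for struct request. */
struct sketch_request {
	unsigned int nr_sectors;
	unsigned int nr_segments;
};

/* Return 0 if rq fits within lim, -EIO otherwise. */
static int sketch_check_limits(const struct sketch_limits *lim,
			       const struct sketch_request *rq)
{
	assert(lim != NULL && rq != NULL);

	if (rq->nr_sectors > lim->max_sectors)
		return -EIO;
	if (rq->nr_segments > lim->max_segments)
		return -EIO;
	return 0;
}

/*
 * Model of the race: the request is checked at submission, the limits
 * are replaced while it waits in the queue (as a table swap would do),
 * and it is checked again just before dispatch.
 * Returns the result of the last check that ran.
 */
static int sketch_submit_then_dispatch(struct sketch_limits *lim,
				       const struct sketch_request *rq,
				       const struct sketch_limits *swapped)
{
	int r = sketch_check_limits(lim, rq);

	if (r)
		return r;	/* rejected at submission time */

	*lim = *swapped;	/* limits change while rq is queued */

	return sketch_check_limits(lim, rq);	/* re-check at dispatch */
}
```

With limits of 1024 sectors and 128 segments, a request of 512 sectors
and 32 segments passes the first check; if the limits are then swapped
for 256 sectors and 64 segments, the second check rejects it.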


Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Hannes Reinecke April 24, 2009, 8:59 a.m. UTC | #1
Hi Kiyoshi,

Kiyoshi Ueda wrote:
> This patch detects requests violating the queue limitations
> and rejects them.
> 
> The same limitation checks are done when requests are submitted
> to the queue by blk_insert_cloned_request().
> However, such a violation can still happen if a table is swapped and
> the queue limitations are shrunk while some requests are
> in the queue.
> 
> Since the block layer and device drivers trust the contents of
> struct request, dispatching such requests is dangerous
> (e.g. it can easily cause a kernel panic).
> So avoid dispatching such problematic requests in request-based dm.
> 
This patch actually triggers accidentally during a no-paths scenario;
multipathing seems to flush all device details once all paths are gone,
so it'll fall back to the system defaults.
And then this check will trigger and kill all queued I/Os. Not good.

So either we fix device-mapper to keep the device details even for
the all-paths down scenario (basically we'd have to copy the device
details into the target_io structures and use that for comparison)
or we should rather not check here.

Cheers,

Hannes
Kiyoshi Ueda April 28, 2009, 7:49 a.m. UTC | #2
Hi Hannes,

On 2009/04/24 17:59 +0900, Hannes Reinecke wrote:
> Hi Kiyoshi,
> 
> Kiyoshi Ueda wrote:
>> This patch detects requests violating the queue limitations
>> and rejects them.
>>
>> The same limitation checks are done when requests are submitted
>> to the queue by blk_insert_cloned_request().
>> However, such a violation can still happen if a table is swapped and
>> the queue limitations are shrunk while some requests are
>> in the queue.
>>
>> Since the block layer and device drivers trust the contents of
>> struct request, dispatching such requests is dangerous
>> (e.g. it can easily cause a kernel panic).
>> So avoid dispatching such problematic requests in request-based dm.
>>
> This patch actually triggers accidentally during a no-paths scenario;
> multipathing seems to flush all device details once all paths are gone,
> so it'll fall back to the system defaults.
> And then this check will trigger and kill all queued I/Os. Not good.
> 
> So either we fix device-mapper to keep the device details even for
> the all-paths down scenario (basically we'd have to copy the device
> details into the target_io structures and use that for comparison)
> or we should rather not check here.

Thank you for your review and the comment.
I haven't understood the problem correctly yet.
(e.g. Who flushes the device details, and how?  Restrictions are
 changed when the table is swapped, but no table swapping should happen
 even if the last path is gone while the device is in use.)

Unfortunately, I can't read/send e-mail from Apr 29th through May 7th
because of the long national holidays in Japan.
I'll look at it after the holidays.

> Cheers,
> 
> Hannes

Thanks,
Kiyoshi Ueda

Kiyoshi Ueda June 23, 2009, 5:46 a.m. UTC | #3
Hi Hannes,

I'm very sorry for the huge delay, but please see below.

On 2009/04/28 16:49 +0900, Kiyoshi Ueda wrote:
> Hi Hannes,
> 
> On 2009/04/24 17:59 +0900, Hannes Reinecke wrote:
>> Hi Kiyoshi,
>>
>> Kiyoshi Ueda wrote:
>>> This patch detects requests violating the queue limitations
>>> and rejects them.
>>>
>>> The same limitation checks are done when requests are submitted
>>> to the queue by blk_insert_cloned_request().
>>> However, such a violation can still happen if a table is swapped and
>>> the queue limitations are shrunk while some requests are
>>> in the queue.
>>>
>>> Since the block layer and device drivers trust the contents of
>>> struct request, dispatching such requests is dangerous
>>> (e.g. it can easily cause a kernel panic).
>>> So avoid dispatching such problematic requests in request-based dm.
>>>
>> This patch actually triggers accidentally during a no-paths scenario;
>> multipathing seems to flush all device details once all paths are gone,
>> so it'll fall back to the system defaults.
>> And then this check will trigger and kill all queued I/Os. Not good.
>>
>> So either we fix device-mapper to keep the device details even for
>> the all-paths down scenario (basically we'd have to copy the device
>> details into the target_io structures and use that for comparison)
>> or we should rather not check here.
> 
> Thank you for your review and the comment.
> I haven't understood the problem correctly yet.
> (e.g. Who flushes the device details, and how?  Restrictions are
>  changed when the table is swapped, but no table swapping should happen
>  even if the last path is gone while the device is in use.)

I tried to understand/reproduce your scenario, but I couldn't.
I found a bug (http://marc.info/?l=dm-devel&m=124572577028515&w=2)
during my testing (which takes all paths down at once), but it is
a user-space bug, not a kernel bug.

Could you elaborate on the problem you see with the patch, if you have
a detailed analysis?
(e.g. I can't see the code path which flushes device details
      in an all-paths-down scenario.)

Thanks,
Kiyoshi Ueda

Patch

Index: 2.6.30-rc3/drivers/md/dm.c
===================================================================
--- 2.6.30-rc3.orig/drivers/md/dm.c
+++ 2.6.30-rc3/drivers/md/dm.c
@@ -1500,6 +1500,30 @@  static void map_request(struct dm_target
 	dm_get(md);
 
 	tio->ti = ti;
+
+	/*
+	 * Although submitted requests to the md->queue are checked against
+	 * the table/queue limitations at the submission time, the limitations
+	 * may be changed by a table swapping while those already checked
+	 * requests are in the md->queue.
+	 * If the limitations have been shrunk in such situations, we may be
+	 * dispatching requests violating the current limitations here.
+	 * Since the block layer and device drivers trust the contents of
+	 * struct request, dispatching such requests is dangerous
+	 * (e.g. it can easily cause a kernel panic).
+	 * Avoid dispatching such problematic requests in request-based dm.
+	 *
+	 * Since dm_kill_unmapped_request() expects that tio->ti is correctly
+	 * set, this has to be done after the set.
+	 */
+	r = blk_rq_check_limits(rq->q, rq);
+	if (unlikely(r)) {
+		DMWARN("violating the queue limitation. the limitation may be"
+		       " shrunk while there are some requests in the queue.");
+		dm_kill_unmapped_request(clone, r);
+		return;
+	}
+
 	r = ti->type->map_rq(ti, clone, &tio->info);
 	switch (r) {
 	case DM_MAPIO_SUBMITTED: