Message ID | 49F17510.7010805@ct.jp.nec.com (mailing list archive) |
---|---|
State | Superseded, archived |
Hi Kiyoshi,

Kiyoshi Ueda wrote:
> This patch detects requests violating the queue limitations
> and rejects them.
>
> The same limitation checks are done when requests are submitted
> to the queue by blk_insert_cloned_request().
> However, such violation can happen if a table is swapped and
> the queue limitations are shrunk while some requests are
> in the queue.
>
> Since struct request is a reliable one in the block layer and
> device drivers, dispatching such requests is pretty dangerous.
> (e.g. it may cause kernel panic easily.)
> So avoid to dispatch such problematic requests in request-based dm.
>
This patch actually triggers accidentally during a no-paths scenario;
multipathing seems to flush all device details once all paths are gone,
so it'll fall back to the system defaults.
And then this check will trigger and kill all queued I/Os. Not good.

So either we fix device-mapper to keep the device details even for
the all-paths down scenario (basically we'd have to copy the device
details into the target_io structures and use that for comparison)
or we should rather not check here.

Cheers,

Hannes
Hi Hannes,

On 2009/04/24 17:59 +0900, Hannes Reinecke wrote:
> Hi Kiyoshi,
>
> Kiyoshi Ueda wrote:
>> This patch detects requests violating the queue limitations
>> and rejects them.
>>
>> The same limitation checks are done when requests are submitted
>> to the queue by blk_insert_cloned_request().
>> However, such violation can happen if a table is swapped and
>> the queue limitations are shrunk while some requests are
>> in the queue.
>>
>> Since struct request is a reliable one in the block layer and
>> device drivers, dispatching such requests is pretty dangerous.
>> (e.g. it may cause kernel panic easily.)
>> So avoid to dispatch such problematic requests in request-based dm.
>>
> This patch actually triggers accidentally during a no-paths scenario;
> multipathing seems to flush all device details once all paths are gone,
> so it'll fall back to the system defaults.
> And then this check will trigger and kill all queued I/Os. Not good.
>
> So either we fix device-mapper to keep the device details even for
> the all-paths down scenario (basically we'd have to copy the device
> details into the target_io structures and use that for comparison)
> or we should rather not check here.

Thank you for your review and the comment.
I haven't understood the problem correctly yet.
(e.g. Who flush the device details and how? Restrictions are
changed when the table is swapped, but no table swapping should happen
even if the last path is gone while the device is in use.)

Unfortunately, I can't read/send e-mail from Apr 29th through May 7th
because of long national holidays in Japan.
I'll look at it after the holidays.

> Cheers,
>
> Hannes

Thanks,
Kiyoshi Ueda

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Hi Hannes,

I'm very sorry for the huge delay, but please see below.

On 2009/04/28 16:49 +0900, Kiyoshi Ueda wrote:
> Hi Hannes,
>
> On 2009/04/24 17:59 +0900, Hannes Reinecke wrote:
>> Hi Kiyoshi,
>>
>> Kiyoshi Ueda wrote:
>>> This patch detects requests violating the queue limitations
>>> and rejects them.
>>>
>>> The same limitation checks are done when requests are submitted
>>> to the queue by blk_insert_cloned_request().
>>> However, such violation can happen if a table is swapped and
>>> the queue limitations are shrunk while some requests are
>>> in the queue.
>>>
>>> Since struct request is a reliable one in the block layer and
>>> device drivers, dispatching such requests is pretty dangerous.
>>> (e.g. it may cause kernel panic easily.)
>>> So avoid to dispatch such problematic requests in request-based dm.
>>>
>> This patch actually triggers accidentally during a no-paths scenario;
>> multipathing seems to flush all device details once all paths are gone,
>> so it'll fall back to the system defaults.
>> And then this check will trigger and kill all queued I/Os. Not good.
>>
>> So either we fix device-mapper to keep the device details even for
>> the all-paths down scenario (basically we'd have to copy the device
>> details into the target_io structures and use that for comparison)
>> or we should rather not check here.
>
> Thank you for your review and the comment.
> I haven't understood the problem correctly yet.
> (e.g. Who flush the device details and how? Restrictions are
> changed when the table is swapped, but no table swapping should happen
> even if the last path is gone while the device is in use.)

I tried to understand/reproduce your scenario, but I can't.
I found a bug (http://marc.info/?l=dm-devel&m=124572577028515&w=2)
during my testing (which lets all paths down at once), but it is
a user-space bug, not a kernel bug.

Could you elaborate on your problem related to the patch, if you have
a detailed analysis?
(e.g. I can't see the code-path which flushes device details on an
all-paths down scenario.)

Thanks,
Kiyoshi Ueda
Index: 2.6.30-rc3/drivers/md/dm.c
===================================================================
--- 2.6.30-rc3.orig/drivers/md/dm.c
+++ 2.6.30-rc3/drivers/md/dm.c
@@ -1500,6 +1500,30 @@ static void map_request(struct dm_target
 	dm_get(md);
 
 	tio->ti = ti;
+
+	/*
+	 * Although submitted requests to the md->queue are checked against
+	 * the table/queue limitations at the submission time, the limitations
+	 * may be changed by a table swapping while those already checked
+	 * requests are in the md->queue.
+	 * If the limitations have been shrunk in such situations, we may be
+	 * dispatching requests violating the current limitations here.
+	 * Since struct request is a reliable one in the block-layer
+	 * and device drivers, dispatching such requests is dangerous.
+	 * (e.g. it may cause kernel panic easily.)
+	 * Avoid to dispatch such problematic requests in request-based dm.
+	 *
+	 * Since dm_kill_unmapped_request() expects that tio->ti is correctly
+	 * set, this has to be done after the set.
+	 */
+	r = blk_rq_check_limits(rq->q, rq);
+	if (unlikely(r)) {
+		DMWARN("violating the queue limitation. the limitation may be"
+		       " shrunk while there are some requests in the queue.");
+		dm_kill_unmapped_request(clone, r);
+		return;
+	}
+
 	r = ti->type->map_rq(ti, clone, &tio->info);
 	switch (r) {
 	case DM_MAPIO_SUBMITTED:
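
The race described in the patch comment can be modelled in user space. The following is only a minimal sketch, not kernel code: struct toy_limits, struct toy_request, toy_check_limits() and toy_dispatch_after_table_swap() are hypothetical names invented for illustration, and the two limits shown (sectors and segments) are an assumed simplification of what blk_rq_check_limits() compares against. It shows a request that passes the check at submission time and fails the same check at dispatch time after the limits have been shrunk.

```c
#include <errno.h>

/* Hypothetical, simplified stand-in for a queue's limits (not the kernel struct). */
struct toy_limits {
	unsigned int max_sectors;   /* largest request size, in 512-byte sectors */
	unsigned int max_segments;  /* largest number of scatter-gather segments */
};

/* Hypothetical, simplified stand-in for struct request. */
struct toy_request {
	unsigned int nr_sectors;
	unsigned int nr_segments;
};

/* Returns 0 if the request fits within the limits, -EIO otherwise. */
static int toy_check_limits(const struct toy_limits *lim,
			    const struct toy_request *rq)
{
	if (rq->nr_sectors > lim->max_sectors)
		return -EIO;
	if (rq->nr_segments > lim->max_segments)
		return -EIO;
	return 0;
}

/*
 * Models the scenario from the patch comment: the request is checked at
 * submission time, the limits shrink while it sits in the queue, and the
 * request is checked again at dispatch time.
 * Returns the result of the dispatch-time check.
 */
static int toy_dispatch_after_table_swap(void)
{
	struct toy_limits lim = { .max_sectors = 1024, .max_segments = 128 };
	struct toy_request rq = { .nr_sectors = 512, .nr_segments = 64 };

	/* Submission-time check: the request fits the current limits. */
	if (toy_check_limits(&lim, &rq) != 0)
		return 1; /* not expected with the values above */

	/* A table swap shrinks the limits while the request is still queued. */
	lim.max_sectors = 256;
	lim.max_segments = 32;

	/* Dispatch-time re-check: the same request now violates the limits. */
	return toy_check_limits(&lim, &rq);
}
```

In this model the dispatch-time check is the only place the violation is seen, which is the case the patch wants to catch. Hannes's objection corresponds to the limits being reset to smaller values for a reason other than a deliberate table swap, in which case the same re-check would reject every queued request.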