Message ID | 20170426183728.10821-3-bart.vanassche@sandisk.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
On 04/26/2017 08:37 PM, Bart Van Assche wrote:
> If blk_get_request() fails check whether the failure is due to
> a path being removed. If that is the case fail the path by
> triggering a call to fail_path(). This patch avoids that the
> following scenario can be encountered while removing paths:
> * CPU usage of a kworker thread jumps to 100%.
> * Removing the dm device becomes impossible.
>
> Delay requeueing if blk_get_request() returns -EBUSY or
> -EWOULDBLOCK because in these cases immediate requeuing is
> inappropriate.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/md/dm-mpath.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 909098e18643..6d4333fdddf5 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -490,6 +490,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
>  	struct pgpath *pgpath;
>  	struct block_device *bdev;
>  	struct dm_mpath_io *mpio = get_mpio(map_context);
> +	struct request_queue *q;
>  	struct request *clone;
>
>  	/* Do we need to select a new pgpath? */
> @@ -512,13 +513,19 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
>  	mpio->nr_bytes = nr_bytes;
>
>  	bdev = pgpath->path.dev->bdev;
> -
> -	clone = blk_get_request(bdev_get_queue(bdev),
> -			rq->cmd_flags | REQ_NOMERGE,
> -			GFP_ATOMIC);
> +	q = bdev_get_queue(bdev);
> +	clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, GFP_ATOMIC);
>  	if (IS_ERR(clone)) {
>  		/* EBUSY, ENODEV or EWOULDBLOCK: requeue */
> -		return r;
> +		pr_debug("blk_get_request() returned %ld%s - requeuing\n",
> +			 PTR_ERR(clone), blk_queue_dying(q) ?
> +			 " (path offline)" : "");
> +		if (blk_queue_dying(q)) {
> +			atomic_inc(&m->pg_init_in_progress);
> +			activate_path(pgpath);
> +			return DM_MAPIO_REQUEUE;
> +		}
> +		return DM_MAPIO_DELAY_REQUEUE;
>  	}
>  	clone->bio = clone->biotail = NULL;
>  	clone->rq_disk = bdev->bd_disk;
>
At the very least this does warrant some inline comments.
Why do we call activate_path() here, seeing that the queue is dying?

Cheers,

Hannes
On Thu, 2017-04-27 at 07:46 +0200, Hannes Reinecke wrote:
> On 04/26/2017 08:37 PM, Bart Van Assche wrote:
> > +	clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, GFP_ATOMIC);
> >  	if (IS_ERR(clone)) {
> >  		/* EBUSY, ENODEV or EWOULDBLOCK: requeue */
> > -		return r;
> > +		pr_debug("blk_get_request() returned %ld%s - requeuing\n",
> > +			 PTR_ERR(clone), blk_queue_dying(q) ?
> > +			 " (path offline)" : "");
> > +		if (blk_queue_dying(q)) {
> > +			atomic_inc(&m->pg_init_in_progress);
> > +			activate_path(pgpath);
> > +			return DM_MAPIO_REQUEUE;
> > +		}
> > +		return DM_MAPIO_DELAY_REQUEUE;
> >  	}
> >  	clone->bio = clone->biotail = NULL;
> >  	clone->rq_disk = bdev->bd_disk;
>
> At the very least this does warrant some inline comments.
> Why do we call activate_path() here, seeing that the queue is dying?

Hello Hannes,

activate_path() is not only able to activate a path but can also change
the state of a path to offline. The body of the activate_path() function
makes that clear and that is why I had not added a comment above the
activate_path() call:

static void activate_path(struct pgpath *pgpath)
{
	struct request_queue *q = bdev_get_queue(pgpath->path.dev->bdev);

	if (pgpath->is_active && !blk_queue_dying(q))
		scsi_dh_activate(q, pg_init_done, pgpath);
	else
		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
}

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
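To make the dual role of activate_path() described above easier to follow, here is a minimal userspace sketch of its branching. It is not kernel code: struct model_queue, struct model_pgpath, enum path_state and the model_*() helpers are hypothetical stand-ins invented for illustration, and the sketch assumes only what the thread states, namely that an inactive path or a dying queue leads to the offline branch instead of activation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum path_state { PATH_UNCHANGED, PATH_ACTIVATING, PATH_FAILED };

struct model_queue {
	bool dying;			/* models blk_queue_dying(q) */
};

struct model_pgpath {
	struct model_queue *q;
	bool is_active;
	enum path_state state;
};

/* Models the pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED) branch. */
static void model_offline_path(struct model_pgpath *p)
{
	p->state = PATH_FAILED;
}

/* Models the scsi_dh_activate() branch; only records that it was taken. */
static void model_start_activation(struct model_pgpath *p)
{
	p->state = PATH_ACTIVATING;
}

/* Same condition as the activate_path() body quoted in the thread. */
static void model_activate_path(struct model_pgpath *p)
{
	assert(p != NULL && p->q != NULL);

	if (p->is_active && !p->q->dying)
		model_start_activation(p);
	else
		model_offline_path(p);
}
```

Calling model_activate_path() on a path whose queue is marked dying takes the offline branch, which is the case the patch relies on when blk_get_request() fails.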
On 04/27/2017 05:11 PM, Bart Van Assche wrote:
> On Thu, 2017-04-27 at 07:46 +0200, Hannes Reinecke wrote:
>> On 04/26/2017 08:37 PM, Bart Van Assche wrote:
>>> +	clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, GFP_ATOMIC);
>>>  	if (IS_ERR(clone)) {
>>>  		/* EBUSY, ENODEV or EWOULDBLOCK: requeue */
>>> -		return r;
>>> +		pr_debug("blk_get_request() returned %ld%s - requeuing\n",
>>> +			 PTR_ERR(clone), blk_queue_dying(q) ?
>>> +			 " (path offline)" : "");
>>> +		if (blk_queue_dying(q)) {
>>> +			atomic_inc(&m->pg_init_in_progress);
>>> +			activate_path(pgpath);
>>> +			return DM_MAPIO_REQUEUE;
>>> +		}
>>> +		return DM_MAPIO_DELAY_REQUEUE;
>>>  	}
>>>  	clone->bio = clone->biotail = NULL;
>>>  	clone->rq_disk = bdev->bd_disk;
>>
>> At the very least this does warrant some inline comments.
>> Why do we call activate_path() here, seeing that the queue is dying?
>
> Hello Hannes,
>
> activate_path() is not only able to activate a path but can also change
> the state of a path to offline. The body of the activate_path() function
> makes that clear and that is why I had not added a comment above the
> activate_path() call:
>
> static void activate_path(struct pgpath *pgpath)
> {
> 	struct request_queue *q = bdev_get_queue(pgpath->path.dev->bdev);
>
> 	if (pgpath->is_active && !blk_queue_dying(q))
> 		scsi_dh_activate(q, pg_init_done, pgpath);
> 	else
> 		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
> }
>
So why not call 'pg_init_done()' directly and avoid the confusion?

Cheers,

Hannes
On Thu, Apr 27 2017 at 11:13am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 04/27/2017 05:11 PM, Bart Van Assche wrote:
> > On Thu, 2017-04-27 at 07:46 +0200, Hannes Reinecke wrote:
> >> On 04/26/2017 08:37 PM, Bart Van Assche wrote:
> >>> +	clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, GFP_ATOMIC);
> >>>  	if (IS_ERR(clone)) {
> >>>  		/* EBUSY, ENODEV or EWOULDBLOCK: requeue */
> >>> -		return r;
> >>> +		pr_debug("blk_get_request() returned %ld%s - requeuing\n",
> >>> +			 PTR_ERR(clone), blk_queue_dying(q) ?
> >>> +			 " (path offline)" : "");
> >>> +		if (blk_queue_dying(q)) {
> >>> +			atomic_inc(&m->pg_init_in_progress);
> >>> +			activate_path(pgpath);
> >>> +			return DM_MAPIO_REQUEUE;
> >>> +		}
> >>> +		return DM_MAPIO_DELAY_REQUEUE;
> >>>  	}
> >>>  	clone->bio = clone->biotail = NULL;
> >>>  	clone->rq_disk = bdev->bd_disk;
> >>
> >> At the very least this does warrant some inline comments.
> >> Why do we call activate_path() here, seeing that the queue is dying?
> >
> > Hello Hannes,
> >
> > activate_path() is not only able to activate a path but can also change
> > the state of a path to offline. The body of the activate_path() function
> > makes that clear and that is why I had not added a comment above the
> > activate_path() call:
> >
> > static void activate_path(struct pgpath *pgpath)
> > {
> > 	struct request_queue *q = bdev_get_queue(pgpath->path.dev->bdev);
> >
> > 	if (pgpath->is_active && !blk_queue_dying(q))
> > 		scsi_dh_activate(q, pg_init_done, pgpath);
> > 	else
> > 		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
> > }
> >
> So why not call 'pg_init_done()' directly and avoid the confusion?

Doing so is sprinkling more SCSI specific droppings in code that should
be increasingly transport agnostic. Might be worth renaming
activate_path() to activate_or_offline_path() ?

Mike
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 909098e18643..6d4333fdddf5 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -490,6 +490,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
 	struct pgpath *pgpath;
 	struct block_device *bdev;
 	struct dm_mpath_io *mpio = get_mpio(map_context);
+	struct request_queue *q;
 	struct request *clone;
 
 	/* Do we need to select a new pgpath? */
@@ -512,13 +513,19 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
 	mpio->nr_bytes = nr_bytes;
 
 	bdev = pgpath->path.dev->bdev;
-
-	clone = blk_get_request(bdev_get_queue(bdev),
-			rq->cmd_flags | REQ_NOMERGE,
-			GFP_ATOMIC);
+	q = bdev_get_queue(bdev);
+	clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, GFP_ATOMIC);
 	if (IS_ERR(clone)) {
 		/* EBUSY, ENODEV or EWOULDBLOCK: requeue */
-		return r;
+		pr_debug("blk_get_request() returned %ld%s - requeuing\n",
+			 PTR_ERR(clone), blk_queue_dying(q) ?
+			 " (path offline)" : "");
+		if (blk_queue_dying(q)) {
+			atomic_inc(&m->pg_init_in_progress);
+			activate_path(pgpath);
+			return DM_MAPIO_REQUEUE;
+		}
+		return DM_MAPIO_DELAY_REQUEUE;
 	}
 	clone->bio = clone->biotail = NULL;
 	clone->rq_disk = bdev->bd_disk;
If blk_get_request() fails check whether the failure is due to
a path being removed. If that is the case fail the path by
triggering a call to fail_path(). This patch avoids that the
following scenario can be encountered while removing paths:
* CPU usage of a kworker thread jumps to 100%.
* Removing the dm device becomes impossible.

Delay requeueing if blk_get_request() returns -EBUSY or
-EWOULDBLOCK because in these cases immediate requeuing is
inappropriate.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
---
 drivers/md/dm-mpath.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
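
The error handling described in this commit message can be summarised as a small decision function. The following is a userspace sketch only: enum model_mapio, struct model_result and model_on_get_request_error() are hypothetical names made up for illustration, standing in for DM_MAPIO_REQUEUE, DM_MAPIO_DELAY_REQUEUE and the IS_ERR(clone) branch of multipath_clone_and_map(). As in the patch, the error code itself does not influence the decision; only the dying state of the queue does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for DM_MAPIO_REQUEUE and DM_MAPIO_DELAY_REQUEUE. */
enum model_mapio { MODEL_REQUEUE, MODEL_DELAY_REQUEUE };

struct model_result {
	enum model_mapio action;	/* what the map function would return */
	bool path_failed;		/* whether the path would be failed */
};

/*
 * Models the IS_ERR(clone) branch of the patch: a dying queue fails the
 * path and requeues immediately so another path can be selected; every
 * other failure (for example -EBUSY or -EWOULDBLOCK) requeues with a delay.
 */
static struct model_result model_on_get_request_error(int err, bool queue_dying)
{
	struct model_result r = { MODEL_DELAY_REQUEUE, false };

	assert(err < 0);	/* the patch only logs the error code */

	if (queue_dying) {
		r.path_failed = true;
		r.action = MODEL_REQUEUE;
	}
	return r;
}
```

For example, model_on_get_request_error(-EBUSY, false) yields a delayed requeue without touching the path, while any error on a dying queue fails the path and requeues immediately.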