| Message ID | 49B613FE.3060501@suse.de (mailing list archive) |
|---|---|
| State | Superseded, archived |
Hi Hannes,

On 2009/03/10 16:17 +0900, Hannes Reinecke wrote:
>>>>> o kernel panic occurs by frequent table swapping during heavy I/Os.
>>>>>
>>>> That's probably fixed by this patch:
>>>>
>>>> --- linux-2.6.27/drivers/md/dm.c.orig	2009-01-23 15:59:22.741461315 +0100
>>>> +++ linux-2.6.27/drivers/md/dm.c	2009-01-26 09:03:02.787605723 +0100
>>>> @@ -714,13 +714,14 @@ static void free_bio_clone(struct reques
>>>>  	struct dm_rq_target_io *tio = clone->end_io_data;
>>>>  	struct mapped_device *md = tio->md;
>>>>  	struct bio *bio;
>>>> -	struct dm_clone_bio_info *info;
>>>>
>>>>  	while ((bio = clone->bio) != NULL) {
>>>>  		clone->bio = bio->bi_next;
>>>>
>>>> -		info = bio->bi_private;
>>>> -		free_bio_info(md, info);
>>>> +		if (bio->bi_private) {
>>>> +			struct dm_clone_bio_info *info = bio->bi_private;
>>>> +			free_bio_info(md, info);
>>>> +		}
>>>>
>>>>  		bio->bi_private = md->bs;
>>>>  		bio_put(bio);
>>>>
>>>> The info field is not necessarily filled here, so we have to check
>>>> for it explicitly.
>>>>
>>>> With these two patches, request-based multipathing has survived all
>>>> stress-tests so far. Except on mainframe (zfcp), but that's more a
>>>> driver-related thing.
>>
>> Do you hit some problem without the patch above?
>> If so, that should be a programming bug and we need to fix it. Otherwise,
>> we should be leaking memory (since all cloned bios should always have
>> the dm_clone_bio_info structure in ->bi_private).
>>
> Yes, I've found that one later on.
> The real problem was in clone_setup_bios(), which might end up calling an
> invalid end_io_data pointer. Patch is attached.

Nice catch! Thank you for the patch.
> -static void free_bio_clone(struct request *clone)
> +static void free_bio_clone(struct request *clone, struct mapped_device *md)

I have changed the argument order to match the other free_* functions:

    free_bio_clone(struct mapped_device *md, struct request *clone)

Thanks,
Kiyoshi Ueda

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
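[Editor's note: the guard added in the first patch above — free the per-clone info only when `->bi_private` was actually attached — can be sketched in isolation. This is a minimal standalone model, not kernel code; `bio_like`, `info`, and `free_clone_chain` are hypothetical stand-ins for `struct bio`, `struct dm_clone_bio_info`, and `free_bio_clone`.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, just to illustrate the defensive-free pattern. */
struct info { int dummy; };

struct bio_like {
	struct bio_like *bi_next;
	void *bi_private;	/* may legitimately be NULL before setup finishes */
};

static int frees;		/* counts freed info structures, for checking */

static void free_info(struct info *i)
{
	frees++;
	free(i);
}

/* Walk the clone chain; free per-clone info only when it was attached,
 * mirroring the "if (bio->bi_private)" guard added in free_bio_clone(). */
static void free_clone_chain(struct bio_like *head)
{
	struct bio_like *bio;

	while ((bio = head) != NULL) {
		head = bio->bi_next;
		if (bio->bi_private)
			free_info(bio->bi_private);
		free(bio);
	}
}
```

Without the NULL check, a chain whose setup was abandoned halfway would dereference an unset `bi_private`; with it, partially built chains unwind cleanly.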
Hi Kiyoshi,

Kiyoshi Ueda wrote:
> Hi Hannes,
>
> On 2009/03/10 16:17 +0900, Hannes Reinecke wrote:
[ .. ]
>> Yes, I've found that one later on.
>> The real problem was in clone_setup_bios(), which might end up calling an
>> invalid end_io_data pointer. Patch is attached.
>
> Nice catch! Thank you for the patch.
>
Oh, nae bother. Took me only a month to track it down :-(

>> -static void free_bio_clone(struct request *clone)
>> +static void free_bio_clone(struct request *clone, struct mapped_device *md)
>
> I have changed the argument order to match the other free_* functions:
>     free_bio_clone(struct mapped_device *md, struct request *clone)
>
Sure. I wasn't sure myself which way round the arguments should be.

Do you have an updated patch of your suspend fixes? We've run into an issue
here which looks suspiciously close to that one (I/O is completed on a
deleted pgpath), so we would be happy to test these out.

Cheers,
Hannes
Hi Hannes,

On 2009/03/12 18:08 +0900, Hannes Reinecke wrote:
> Do you have an updated patch of your suspend fixes? We've run into an issue
> here which looks suspiciously close to that one (I/O is completed on a
> deleted pgpath), so we would be happy to test these out.

You mean that the issue occurs WITHOUT the suspend fix patch which I sent,
is that right? If so, you can use it, since I haven't made any big changes
to the suspend fix since then.

The only logic change to the suspend fix since then is in rq_completed(),
following your comment. The updated rq_completed() is below:

static void rq_completed(struct mapped_device *md)
{
	struct request_queue *q = md->queue;
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	if (q->in_flight) {
		spin_unlock_irqrestore(q->queue_lock, flags);
		return;
	}
	spin_unlock_irqrestore(q->queue_lock, flags);

	/* nudge anyone waiting on suspend queue */
	wake_up(&md->wait);
}

I merged the previous suspend fix patch into the request-based dm core patch,
and I've been changing the core patch after that. So I don't have a patch
which addresses only the suspend fix update. Sorry about that.

Thanks,
Kiyoshi Ueda
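[Editor's note: the rq_completed() above implements a common drain pattern — wake the suspend waiter only once no requests remain in flight. A minimal userspace model of that decision (assuming a simple counter in place of `q->in_flight` and a wakeup counter in place of `wake_up(&md->wait)`; names are illustrative, not the kernel API):]

```c
#include <assert.h>

/* Toy model of the "wake only when drained" logic in rq_completed(). */
struct md_model {
	int in_flight;	/* stands in for q->in_flight */
	int wakeups;	/* counts wake_up(&md->wait) calls */
};

/* Called once per completed request: decrement the in-flight count and
 * nudge the suspend waiter only when the queue has fully drained. */
static void rq_completed_model(struct md_model *md)
{
	md->in_flight--;
	if (md->in_flight)
		return;		/* requests still outstanding: stay quiet */
	md->wakeups++;		/* queue drained: nudge anyone waiting */
}
```

The point of checking `in_flight` under the queue lock in the real code is to avoid a premature wakeup racing with another completion; the model only captures the drained-vs-busy decision.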
--- linux-2.6.27-SLE11_BRANCH/drivers/md/dm.c.orig	2009-02-04 10:33:22.656627650 +0100
+++ linux-2.6.27-SLE11_BRANCH/drivers/md/dm.c	2009-02-05 11:03:35.843251773 +0100
@@ -709,10 +709,8 @@ static void end_clone_bio(struct bio *cl
 	blk_update_request(tio->orig, 0, nr_bytes);
 }
 
-static void free_bio_clone(struct request *clone)
+static void free_bio_clone(struct request *clone, struct mapped_device *md)
 {
-	struct dm_rq_target_io *tio = clone->end_io_data;
-	struct mapped_device *md = tio->md;
 	struct bio *bio;
 
 	while ((bio = clone->bio) != NULL) {
@@ -743,7 +741,7 @@ static void dm_unprep_request(struct req
 	rq->special = NULL;
 	rq->cmd_flags &= ~REQ_DONTPREP;
 
-	free_bio_clone(clone);
+	free_bio_clone(clone, tio->md);
 	dec_rq_pending(tio);
 	free_rq_tio(tio->md, tio);
 }
@@ -820,7 +818,7 @@ static void dm_end_request(struct reques
 		rq->sense_len = clone->sense_len;
 	}
 
-	free_bio_clone(clone);
+	free_bio_clone(clone, tio->md);
 	dec_rq_pending(tio);
 	free_rq_tio(tio->md, tio);
 
@@ -1406,7 +1404,7 @@ static int clone_request_bios(struct req
 	return 0;
 
 free_and_out:
-	free_bio_clone(clone);
+	free_bio_clone(clone, md);
 	return -ENOMEM;
 }
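[Editor's note: the fix in the patch above is to thread `md` through explicitly instead of deriving it from `clone->end_io_data`, which may not yet point at a valid tio on the error path of clone_request_bios(). A tiny standalone sketch of that before/after contrast, with hypothetical stand-in types (`req`, `tio`, `md_owner`) rather than the real kernel structures:]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct md_owner { int id; };
struct tio      { struct md_owner *md; };
struct req      { void *end_io_data; };

/* Before the patch: md is derived from end_io_data, which crashes if
 * the request is torn down before end_io_data was ever set. */
static struct md_owner *owner_before_patch(struct req *clone)
{
	struct tio *tio = clone->end_io_data;	/* may be NULL or garbage */
	return tio->md;
}

/* After the patch: the caller, which always knows the owning md,
 * supplies it directly; end_io_data is never trusted. */
static struct md_owner *owner_after_patch(struct req *clone,
					  struct md_owner *md)
{
	(void)clone;		/* end_io_data intentionally not consulted */
	return md;
}
```

The design point is general: on error-unwinding paths, derive context from what the caller already holds rather than from fields that are only valid after setup completes. (`owner_before_patch` is shown only for contrast; calling it on a half-built request is exactly the reported crash.)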