From patchwork Thu Apr 7 05:44:43 2016
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 8770241
Date: Thu, 7 Apr 2016 13:44:43 +0800
From: Ming Lei
To: Ming Lei
Message-ID: <20160407134443.552c2556@tom-T450>
References: <1459746376-27983-1-git-send-email-shaun@tancheff.com>
 <1459746376-27983-11-git-send-email-shaun@tancheff.com>
Cc: Jens Axboe, Linux SCSI List, Christoph Hellwig, linux-block@vger.kernel.org,
 linux-ide@vger.kernel.org, open list:DEVICE-MAPPER (LVM), Shaun Tancheff,
 Mikulas Patocka, Shaun Tancheff
Subject: Re: [dm-devel] [PATCH 10/12] Limit bio_endio recursion
List-Id: device-mapper development

On Thu, 7 Apr 2016 11:54:49 +0800 Ming Lei wrote:

> On Mon, Apr 4, 2016 at 1:06 PM, Shaun Tancheff wrote:
> > Recursive endio calls can exceed 16k stack. Tested with
> > 32k stack and observed:
> >
> >  Depth   Size  Location    (293 entries)
> >  -----   ----  --------
> >    0)  21520    16  __module_text_address+0x12/0x60
> >    1)  21504     8  __module_address+0x5/0x140
> >    2)  21496    24  __module_text_address+0x12/0x60
> >    3)  21472    16  is_module_text_address+0xe/0x20
> >    4)  21456     8  __kernel_text_address+0x50/0x80
> >    5)  21448   136  print_context_stack+0x5a/0xf0
> >    6)  21312   144  dump_trace+0x14c/0x300
> >    7)  21168     8  save_stack_trace+0x2f/0x50
> >    8)  21160    88  set_track+0x64/0x130
> >    9)  21072    96  free_debug_processing+0x200/0x290
> >   10)  20976   176  __slab_free+0x164/0x290
> >   11)  20800    48  kmem_cache_free+0x1b0/0x1e0
> >   12)  20752    16  mempool_free_slab+0x17/0x20
> >   13)  20736    48  mempool_free+0x2f/0x90
> >   14)  20688    16  bvec_free+0x36/0x40
> >   15)  20672    32  bio_free+0x3b/0x60
> >   16)  20640    16  bio_put+0x23/0x30
> >   17)  20624    64  end_bio_extent_writepage+0xcf/0xe0
> >   18)  20560    48  bio_endio+0x57/0x90
> >   19)  20512    48  btrfs_end_bio+0xa8/0x160
> >   20)  20464    48  bio_endio+0x57/0x90
> >   21)  20416   112  dec_pending+0x121/0x270
> >   22)  20304    64  clone_endio+0x7a/0x100
> >   23)  20240    48  bio_endio+0x57/0x90
> > ...
> >  277)   1264    64  clone_endio+0x7a/0x100
> >  278)   1200    48  bio_endio+0x57/0x90
> >  279)   1152   112  dec_pending+0x121/0x270
> >  280)   1040    64  clone_endio+0x7a/0x100
> >  281)    976    48  bio_endio+0x57/0x90
> >  282)    928    80  blk_update_request+0x8f/0x340
> >  283)    848    80  scsi_end_request+0x33/0x1c0
> >  284)    768   112  scsi_io_completion+0xb5/0x620
> >  285)    656    48  scsi_finish_command+0xcf/0x120
> >  286)    608    48  scsi_softirq_done+0x126/0x150
> >  287)    560    24  blk_done_softirq+0x78/0x90
> >  288)    536   136  __do_softirq+0xfd/0x280
> >  289)    400    16  run_ksoftirqd+0x28/0x50
> >  290)    384    64  smpboot_thread_fn+0x105/0x160
> >  291)    320   144  kthread+0xc9/0xe0
> >  292)    176   176  ret_from_fork+0x3f/0x70
> >
> > Based on earlier patch by Mikulas Patocka.
> > https://lkml.org/lkml/2008/6/24/18
>
> That looks like an empty link; the following thread has the discussion too:
>
> http://linux-kernel.2935.n7.nabble.com/PATCH-1-2-Avoid-bio-endio-recursion-td306043.html
>
> >
> > Signed-off-by: Shaun Tancheff
> > ---
> >  block/bio.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 54 insertions(+), 3 deletions(-)
> >
> > diff --git a/block/bio.c b/block/bio.c
> > index d42e69c..4ac19f6 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -1733,6 +1733,59 @@ static inline bool bio_remaining_done(struct bio *bio)
> >         return false;
> >  }
> >
> > +static DEFINE_PER_CPU(struct bio **, bio_end_queue) = { NULL };
>
> The idea looks very nice, but this patch can't be applied to the current
> block-next branch.
>
> It looks like it might be simpler to implement the approach with a percpu
> bio_list.

Cc Mikulas & Christoph

How about the following implementation with bio_list? It looks more readable
and simpler. I just ran a recent test case of bcache over raid1 (virtio-blk,
virtio-blk), and it does appear to work. :-)

---

>
> > +
> > +static struct bio *unwind_bio_endio(struct bio *bio)
> > +{
> > +        struct bio ***bio_end_queue_ptr;
> > +        struct bio *bio_queue;
> > +        struct bio *chain_bio = NULL;
> > +        int error = bio->bi_error;
> > +        unsigned long flags;
> > +
> > +        local_irq_save(flags);
>
> If we could reuse current->bio_list for this purpose, disabling irqs could
> have been avoided, but that looks difficult to do, maybe impossible. It is
> also not realistic to introduce a new per-task variable.
>
> > +        bio_end_queue_ptr = this_cpu_ptr(&bio_end_queue);
> > +
> > +        if (*bio_end_queue_ptr) {
> > +                **bio_end_queue_ptr = bio;
> > +                *bio_end_queue_ptr = &bio->bi_next;
> > +                bio->bi_next = NULL;
> > +        } else {
> > +                bio_queue = NULL;
> > +                *bio_end_queue_ptr = &bio_queue;
>
> I suggest adding a comment that bio_queue is the bottom-most stack variable,
> otherwise this looks a bit tricky to understand.
>
> > +
> > +next_bio:
> > +                if (bio->bi_end_io == bio_chain_endio) {
> > +                        struct bio *parent = bio->bi_private;
> > +
> > +                        bio_put(bio);
> > +                        chain_bio = parent;
> > +                        goto out;
> > +                }
> > +
> > +                if (bio->bi_end_io) {
> > +                        if (!bio->bi_error)
> > +                                bio->bi_error = error;
> > +                        bio->bi_end_io(bio);
> > +                }
> > +
> > +                if (bio_queue) {
> > +                        bio = bio_queue;
> > +                        bio_queue = bio->bi_next;
> > +                        if (!bio_queue)
> > +                                *bio_end_queue_ptr = &bio_queue;
> > +                        goto next_bio;
> > +                }
> > +                *bio_end_queue_ptr = NULL;
> > +        }
> > +
> > +out:
> > +
> > +        local_irq_restore(flags);
> > +
> > +        return chain_bio;
> > +}
> > +
> >  /**
> >   * bio_endio - end I/O on a bio
> >   * @bio:   bio
> > @@ -1762,9 +1815,7 @@ void bio_endio(struct bio *bio)
> >                          bio_put(bio);
> >                          bio = parent;
> >                  } else {
> > -                        if (bio->bi_end_io)
> > -                                bio->bi_end_io(bio);
> > -                        bio = NULL;
> > +                        bio = unwind_bio_endio(bio);
> >                  }
> >          }
> > }
> > --
> > 1.9.1
> >

diff --git a/block/bio.c b/block/bio.c
index f124a0a..b97dfe8 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -68,6 +68,8 @@ static DEFINE_MUTEX(bio_slab_lock);
 static struct bio_slab *bio_slabs;
 static unsigned int bio_slab_nr, bio_slab_max;
 
+static DEFINE_PER_CPU(struct bio_list *, bio_end_list) = { NULL };
+
 static struct kmem_cache *bio_find_or_create_slab(unsigned int extra_size)
 {
         unsigned int sz = sizeof(struct bio) + extra_size;
@@ -1737,6 +1739,46 @@ static inline bool bio_remaining_done(struct bio *bio)
         return false;
 }
 
+/* disable local irq when manipulating the percpu bio_list */
+static void unwind_bio_endio(struct bio *bio)
+{
+        struct bio_list bl_in_stack;
+        struct bio_list *bl;
+        unsigned long flags;
+        bool clear_list = false;
+
+        local_irq_save(flags);
+
+        bl = this_cpu_read(bio_end_list);
+        if (!bl) {
+                bl = &bl_in_stack;
+                bio_list_init(bl);
+                clear_list = true;
+        }
+
+        if (!bio_list_empty(bl)) {
+                bio_list_add(bl, bio);
+                goto out;
+        }
+
+        while (1) {
+                local_irq_restore(flags);
+
+                if (!bio)
+                        break;
+
+                if (bio->bi_end_io)
+                        bio->bi_end_io(bio);
+
+                local_irq_save(flags);
+                bio = bio_list_pop(bl);
+        }
+        if (clear_list)
+                this_cpu_write(bio_end_list, NULL);
+ out:
+        local_irq_restore(flags);
+}
+
 /**
  * bio_endio - end I/O on a bio
  * @bio:   bio
@@ -1765,8 +1807,7 @@ again:
                 goto again;
         }
 
-        if (bio->bi_end_io)
-                bio->bi_end_io(bio);
+        unwind_bio_endio(bio);
 }
 EXPORT_SYMBOL(bio_endio);
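
[Editor's note: for readers skimming the thread, the following is a minimal,
self-contained userspace sketch of the unwinding technique both patches rely
on. It is not kernel code and not either patch: every name here (struct comp,
comp_endio, end_one, pending_head, ...) is invented for illustration. The idea
is that the outermost completion call owns a queue; nested completions are
appended to that queue instead of recursing, and the outer call drains it in a
flat loop, so the stack depth no longer grows with the length of the
completion chain. A plain static flag stands in for the per-CPU state that the
kernel patches need (and protect with local_irq_save/restore) because bio
completions can run from softirq/interrupt context on any CPU.]

#include <stdio.h>
#include <stdbool.h>

/* Hypothetical illustration types -- not kernel structures. */
struct comp {
        void (*end_io)(struct comp *c); /* completion callback, like bio->bi_end_io */
        struct comp *next;              /* link for the pending-completion queue */
        struct comp *parent;            /* completion to finish after this one */
};

/* Stand-in for the per-CPU bio_end_list/bio_end_queue state; this sketch is
 * single-threaded, so plain statics are enough. */
static struct comp *pending_head;
static struct comp **pending_tail;
static bool unwinding;

static unsigned long completed;

static void comp_endio(struct comp *c)
{
        if (unwinding) {
                /* Nested call: queue it for the outermost invocation. */
                c->next = NULL;
                *pending_tail = c;
                pending_tail = &c->next;
                return;
        }

        /* Outermost call: own the queue and drain it iteratively. */
        unwinding = true;
        pending_head = NULL;
        pending_tail = &pending_head;

        while (c) {
                if (c->end_io)
                        c->end_io(c);           /* may re-enter comp_endio() */

                c = pending_head;               /* pop the next queued completion */
                if (c) {
                        pending_head = c->next;
                        if (!pending_head)
                                pending_tail = &pending_head;
                }
        }
        unwinding = false;
}

/* A callback that completes its parent, as clone_endio()/bio_chain do. */
static void end_one(struct comp *c)
{
        completed++;
        if (c->parent)
                comp_endio(c->parent);          /* queued, not recursed into */
}

int main(void)
{
        enum { DEPTH = 100000 };        /* deep enough to blow a stack if recursive */
        static struct comp chain[DEPTH];

        for (int i = 0; i < DEPTH; i++) {
                chain[i].end_io = end_one;
                chain[i].parent = (i + 1 < DEPTH) ? &chain[i + 1] : NULL;
        }

        comp_endio(&chain[0]);
        printf("ran %lu completions at constant stack depth\n", completed);
        return 0;
}

[In the kernel the queue cannot be a plain static: completions run on any CPU
and from interrupt context, which is why both patches above keep the state
per-CPU and disable local interrupts around the queue manipulation.]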