From patchwork Thu Mar 16 16:12:26 2017
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 9628843
From: Ming Lei
To: Shaohua Li, Jens Axboe, linux-raid@vger.kernel.org,
 linux-block@vger.kernel.org, Christoph Hellwig
Cc: Ming Lei
Subject: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage
 resync pages
Date: Fri, 17 Mar 2017 00:12:26 +0800
Message-Id: <20170316161235.27110-6-tom.leiming@gmail.com>
X-Mailer: git-send-email 2.9.3
In-Reply-To: <20170316161235.27110-1-tom.leiming@gmail.com>
References: <20170316161235.27110-1-tom.leiming@gmail.com>

Now we allocate one page array per r1_bio for managing the resync
pages, instead of using the bio's vec table to do that; the old way
is very hacky and won't work any more once multipage bvecs are
enabled.

The introduced cost is that we need to allocate (128 + 16) * raid_disks
bytes per r1_bio. That is fine because the number of in-flight r1_bios
for resync shouldn't be large, as pointed out by Shaohua.

Also the bio_reset() in raid1_sync_request() is removed, because all
the bios are freshly allocated now and no longer need to be reset.

This patch can be thought of as a cleanup too.

Suggested-by: Shaohua Li
Signed-off-by: Ming Lei
---
 drivers/md/raid1.c | 94 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 64 insertions(+), 30 deletions(-)
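One thing to keep in mind while reading the diff: 'struct resync_pages'
and the helpers it calls (resync_alloc_pages(), resync_free_pages(),
resync_get_all_pages(), resync_fetch_page()) are introduced by an
earlier patch in this series, not here. For context, a rough sketch of
the interface this patch relies on follows; this is a reconstruction,
so the exact definition in the earlier patch may differ in detail:

/*
 * Sketch of the per-bio resync page container assumed by this patch;
 * the real definition is added earlier in the series, so treat the
 * bodies below as illustrative only.
 */
struct resync_pages {
	unsigned	idx;		/* next page slot to consume */
	void		*raid_bio;	/* owning r1bio (or r10bio) */
	struct page	*pages[RESYNC_PAGES];
};

static inline int resync_alloc_pages(struct resync_pages *rp,
				     gfp_t gfp_flags)
{
	int i;

	/* allocate one page per slot, unwinding on failure */
	for (i = 0; i < RESYNC_PAGES; i++) {
		rp->pages[i] = alloc_page(gfp_flags);
		if (!rp->pages[i])
			goto out_free;
	}
	return 0;

out_free:
	while (--i >= 0)
		put_page(rp->pages[i]);
	return -ENOMEM;
}

static inline void resync_free_pages(struct resync_pages *rp)
{
	int i;

	/* drop one reference on every page in the container */
	for (i = 0; i < RESYNC_PAGES; i++)
		put_page(rp->pages[i]);
}

static inline void resync_get_all_pages(struct resync_pages *rp)
{
	int i;

	/* take an extra reference on every page (for shared page sets) */
	for (i = 0; i < RESYNC_PAGES; i++)
		get_page(rp->pages[i]);
}

static inline struct page *resync_fetch_page(struct resync_pages *rp,
					     unsigned idx)
{
	return rp->pages[idx];
}

The important properties are that page ownership moves out of the
bio's vec table into this per-bio container, and that rp->idx carries
the iteration state that bi_vcnt used to carry.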
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index e30d89690109..0e64beb60e4d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -80,6 +80,24 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
 #define raid1_log(md, fmt, args...) \
 	do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
 
+/*
+ * 'struct resync_pages' stores the actual pages used for doing the
+ * resync IO, and it is per-bio, so make .bi_private point to it.
+ */
+static inline struct resync_pages *get_resync_pages(struct bio *bio)
+{
+	return bio->bi_private;
+}
+
+/*
+ * for a resync bio, the r1bio pointer can be retrieved from the
+ * per-bio 'struct resync_pages'.
+ */
+static inline struct r1bio *get_resync_r1bio(struct bio *bio)
+{
+	return get_resync_pages(bio)->raid_bio;
+}
+
 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
 {
 	struct pool_info *pi = data;
@@ -107,12 +125,18 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	struct r1bio *r1_bio;
 	struct bio *bio;
 	int need_pages;
-	int i, j;
+	int j;
+	struct resync_pages *rps;
 
 	r1_bio = r1bio_pool_alloc(gfp_flags, pi);
 	if (!r1_bio)
 		return NULL;
 
+	rps = kmalloc(sizeof(struct resync_pages) * pi->raid_disks,
+		      gfp_flags);
+	if (!rps)
+		goto out_free_r1bio;
+
 	/*
 	 * Allocate bios : 1 for reading, n-1 for writing
 	 */
@@ -132,22 +156,22 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 		need_pages = pi->raid_disks;
 	else
 		need_pages = 1;
-	for (j = 0; j < need_pages; j++) {
+	for (j = 0; j < pi->raid_disks; j++) {
+		struct resync_pages *rp = &rps[j];
+
 		bio = r1_bio->bios[j];
-		bio->bi_vcnt = RESYNC_PAGES;
-
-		if (bio_alloc_pages(bio, gfp_flags))
-			goto out_free_pages;
-	}
-	/* If not user-requested, copy the page pointers to all bios */
-	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
-		for (i=0; i<RESYNC_PAGES ; i++)
-			for (j=1; j<pi->raid_disks; j++) {
-				struct page *page =
-					r1_bio->bios[0]->bi_io_vec[i].bv_page;
-				get_page(page);
-				r1_bio->bios[j]->bi_io_vec[i].bv_page = page;
-			}
+
+		if (j < need_pages) {
+			if (resync_alloc_pages(rp, gfp_flags))
+				goto out_free_pages;
+		} else {
+			memcpy(rp, &rps[0], sizeof(*rp));
+			resync_get_all_pages(rp);
+		}
+
+		rp->idx = 0;
+		rp->raid_bio = r1_bio;
+		bio->bi_private = rp;
 	}
 
 	r1_bio->master_bio = NULL;
@@ -156,11 +180,14 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 
 out_free_pages:
 	while (--j >= 0)
-		bio_free_pages(r1_bio->bios[j]);
+		resync_free_pages(&rps[j]);
 
 out_free_bio:
 	while (++j < pi->raid_disks)
 		bio_put(r1_bio->bios[j]);
+	kfree(rps);
+
+out_free_r1bio:
 	r1bio_pool_free(r1_bio, data);
 	return NULL;
 }
@@ -168,14 +195,18 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 static void r1buf_pool_free(void *__r1_bio, void *data)
 {
 	struct pool_info *pi = data;
-	int i,j;
+	int i;
 	struct r1bio *r1bio = __r1_bio;
+	struct resync_pages *rp = NULL;
 
-	for (i = 0; i < RESYNC_PAGES; i++)
-		for (j = pi->raid_disks; j-- ;)
-			safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
-	for (i=0 ; i < pi->raid_disks; i++)
+	for (i = pi->raid_disks; i--; ) {
+		rp = get_resync_pages(r1bio->bios[i]);
+		resync_free_pages(rp);
 		bio_put(r1bio->bios[i]);
+	}
+
+	/* resync pages array stored in the 1st bio's .bi_private */
+	kfree(rp);
 
 	r1bio_pool_free(r1bio, data);
 }
@@ -1863,7 +1894,7 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 
 static void end_sync_read(struct bio *bio)
 {
-	struct r1bio *r1_bio = bio->bi_private;
+	struct r1bio *r1_bio = get_resync_r1bio(bio);
 
 	update_head_pos(r1_bio->read_disk, r1_bio);
 
@@ -1882,7 +1913,7 @@ static void end_sync_read(struct bio *bio)
 static void end_sync_write(struct bio *bio)
 {
 	int uptodate = !bio->bi_error;
-	struct r1bio *r1_bio = bio->bi_private;
+	struct r1bio *r1_bio = get_resync_r1bio(bio);
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
 	sector_t first_bad;
@@ -2099,6 +2130,7 @@ static void process_checks(struct r1bio *r1_bio)
 		int size;
 		int error;
 		struct bio *b = r1_bio->bios[i];
+		struct resync_pages *rp = get_resync_pages(b);
 		if (b->bi_end_io != end_sync_read)
 			continue;
 		/* fixup the bio for reuse, but preserve errno */
@@ -2111,7 +2143,8 @@ static void process_checks(struct r1bio *r1_bio)
 			conf->mirrors[i].rdev->data_offset;
 		b->bi_bdev = conf->mirrors[i].rdev->bdev;
 		b->bi_end_io = end_sync_read;
-		b->bi_private = r1_bio;
+		rp->raid_bio = r1_bio;
+		b->bi_private = rp;
 
 		size = b->bi_iter.bi_size;
 		for (j = 0; j < vcnt ; j++) {
@@ -2769,7 +2802,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		struct md_rdev *rdev;
 		bio = r1_bio->bios[i];
-		bio_reset(bio);
 
 		rdev = rcu_dereference(conf->mirrors[i].rdev);
 		if (rdev == NULL ||
@@ -2825,7 +2857,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 			atomic_inc(&rdev->nr_pending);
 			bio->bi_iter.bi_sector = sector_nr + rdev->data_offset;
 			bio->bi_bdev = rdev->bdev;
-			bio->bi_private = r1_bio;
 			if (test_bit(FailFast, &rdev->flags))
 				bio->bi_opf |= MD_FAILFAST;
 		}
@@ -2911,9 +2942,12 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 		}
 
 		for (i = 0 ; i < conf->raid_disks * 2; i++) {
+			struct resync_pages *rp;
+
 			bio = r1_bio->bios[i];
+			rp = get_resync_pages(bio);
 			if (bio->bi_end_io) {
-				page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
+				page = resync_fetch_page(rp, rp->idx++);
 
 				/*
 				 * won't fail because the vec table is big
@@ -2925,8 +2959,8 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 		nr_sectors += len>>9;
 		sector_nr += len>>9;
 		sync_blocks -= (len>>9);
-	} while (r1_bio->bios[disk]->bi_vcnt < RESYNC_PAGES);
- bio_full:
+	} while (get_resync_pages(r1_bio->bios[disk]->bi_private)->idx < RESYNC_PAGES);
+
 	r1_bio->sectors = nr_sectors;
 
 	if (mddev_is_clustered(mddev) &&
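To summarize the pointer chain that the new accessors in the first
hunk implement: a resync bio's ->bi_private now points at its own
resync_pages, and the r1bio is reached through rp->raid_bio rather
than directly. A minimal illustrative completion handler, walking that
chain under the sketch above (example_resync_end_io is hypothetical,
not part of the patch):

/*
 * Hypothetical example, not from the patch: how a resync completion
 * path recovers its state after this change.  Assumes the raid1.c
 * accessors above plus the resync_pages sketch before the diff.
 */
static void example_resync_end_io(struct bio *bio)
{
	/* hop 1: the per-bio page container lives in ->bi_private */
	struct resync_pages *rp = get_resync_pages(bio);
	/* hop 2: the owning r1bio is recorded in the container */
	struct r1bio *r1_bio = get_resync_r1bio(bio);
	/* pages are indexed through rp->idx, not bio->bi_io_vec */
	struct page *page = resync_fetch_page(rp, 0);

	pr_debug("r1_bio %p, first resync page %p\n", r1_bio, page);
}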