From patchwork Thu Jun 16 19:19:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884691 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 611B5C43334 for ; Thu, 16 Jun 2022 19:19:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241057AbiFPTT5 (ORCPT ); Thu, 16 Jun 2022 15:19:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40872 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229898AbiFPTTz (ORCPT ); Thu, 16 Jun 2022 15:19:55 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 284EC562D2; Thu, 16 Jun 2022 12:19:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=OV1YTRm+v7AUkPS1d6z+GKLa8H4BrPMSMgto9jS50pY=; b=lBv63xbGNVy2/GfvH8Ajklh1w9 hQewm7t/Z4fxTZIr6h06gKkCdcNrtyXfBPCK5Sr7Irj8yWZuGFHwbvhiPlbqCQ7Gm9QqyCePGgXBK VBjRMeAQvCJ3QaCMFIjvIiR8Lf2issUZpUhyhfhwWfKhtzuZ+j1DtwkDKL+DBIWBjiCJ1nqCg8F4W fjuuFVFnVmGANe/xSzLS5WDfAix4hlCcCnTnJ9BFuElINI/wTHKBs+b/ZRmyDf/M7U25wTNkp+GuU Mpg8hXAcJsx5m4nQraxLJU+A5/QS7cWOhZqQlcnp1DBr+QNFxZKquPV1etYxngfOBJ0oFWKE0XCbL KCkkSIZA==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1y-0092ij-51; Thu, 16 Jun 2022 13:19:50 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1u-0006F7-UR; Thu, 16 Jun 2022 13:19:46 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe Date: Thu, 16 Jun 2022 13:19:31 -0600 Message-Id: <20220616191945.23935-2-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 01/15] md/raid5: Make logic blocking check consistent with logic that blocks X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org The check in raid5_make_request differs very slightly from the logic that causes it to block lower down. This likely does not cause a bug as the check is fuzzy anyway (as reshape may move on between the first check and the subsequent check). However, make it consistent so it can be cleaned up in a subsequent patch. The condition which causes the schedule is: !(mddev->reshape_backwards ? logical_sector < conf->reshape_progress : logical_sector >= conf->reshape_progress) && (mddev->reshape_backwards ? logical_sector < conf->reshape_safe : logical_sector >= conf->reshape_safe) The condition that causes the early bailout is made to match this. Signed-off-by: Logan Gunthorpe --- drivers/md/raid5.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 5d84bad8b854..b3d1f894f154 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5841,7 +5841,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) if ((bi->bi_opf & REQ_NOWAIT) && (conf->reshape_progress != MaxSector) && (mddev->reshape_backwards - ? (logical_sector > conf->reshape_progress && logical_sector <= conf->reshape_safe) + ? (logical_sector >= conf->reshape_progress && logical_sector < conf->reshape_safe) : (logical_sector >= conf->reshape_safe && logical_sector < conf->reshape_progress))) { bio_wouldblock_error(bi); if (rw == WRITE) From patchwork Thu Jun 16 19:19:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884693 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15124C43334 for ; Thu, 16 Jun 2022 19:20:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378152AbiFPTT7 (ORCPT ); Thu, 16 Jun 2022 15:19:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40878 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238330AbiFPTTz (ORCPT ); Thu, 16 Jun 2022 15:19:55 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E8E975622F; Thu, 16 Jun 2022 12:19:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=1xNPrl/pbNOCuIDTZgLgk3AL8c256RaW8iEguFOOEkA=; b=rjky8FLUGVXDrB+1odXOkPieDR uVKu/KJc7hov+xSoeDuE1fCld7Z7mF8QyYL18RUAWPcRQnPQQlgFumq2V2RdWNrvsLM8pCfFRxdyb 2sXOYZsP4CTD5Pf9pkJM2Xndw/ZUcwBEzZDeBlonNVMbJtaWlbh0ZhTOVZCCXT98eja/i6b1y+mco p5MY/eS2hAb9o7sNgkbMyfXcua1mHAAMLEHG6gFYcDL+uGWiiktapdDHmYwlaW2Sk80CHutI2AJZZ QAgnCAPA4poIgC5FEF7KDUX7IjBRYcG4wpVVB2xgy14KabxsOsAXOT/sxVMHA6KbU3ks2gffATQoZ LDs1NbOQ==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1y-0092ik-Qt; Thu, 16 Jun 2022 13:19:51 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1v-0006F9-1n; Thu, 16 Jun 2022 13:19:47 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:32 -0600 Message-Id: <20220616191945.23935-3-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 02/15] md/raid5: Factor out ahead_of_reshape() function X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org There are a few uses of an ugly ternary operator in raid5_make_request() to check if a sector is a head of a reshape sector. Factor this out into a simple helper called ahead_of_reshape(). No functional changes intended. Suggested-by: Christoph Hellwig Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig --- drivers/md/raid5.c | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index b3d1f894f154..6e53b8490fff 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5784,6 +5784,13 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi) bio_endio(bi); } +static bool ahead_of_reshape(struct mddev *mddev, sector_t sector, + sector_t reshape_sector) +{ + return mddev->reshape_backwards ? sector < reshape_sector : + sector >= reshape_sector; +} + static bool raid5_make_request(struct mddev *mddev, struct bio * bi) { struct r5conf *conf = mddev->private; @@ -5840,9 +5847,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) /* Bail out if conflicts with reshape and REQ_NOWAIT is set */ if ((bi->bi_opf & REQ_NOWAIT) && (conf->reshape_progress != MaxSector) && - (mddev->reshape_backwards - ? (logical_sector >= conf->reshape_progress && logical_sector < conf->reshape_safe) - : (logical_sector >= conf->reshape_safe && logical_sector < conf->reshape_progress))) { + !ahead_of_reshape(mddev, logical_sector, conf->reshape_progress) && + ahead_of_reshape(mddev, logical_sector, conf->reshape_safe)) { bio_wouldblock_error(bi); if (rw == WRITE) md_write_end(mddev); @@ -5871,14 +5877,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) * to check again. */ spin_lock_irq(&conf->device_lock); - if (mddev->reshape_backwards - ? logical_sector < conf->reshape_progress - : logical_sector >= conf->reshape_progress) { + if (ahead_of_reshape(mddev, logical_sector, + conf->reshape_progress)) { previous = 1; } else { - if (mddev->reshape_backwards - ? logical_sector < conf->reshape_safe - : logical_sector >= conf->reshape_safe) { + if (ahead_of_reshape(mddev, logical_sector, + conf->reshape_safe)) { spin_unlock_irq(&conf->device_lock); schedule(); do_prepare = true; @@ -5909,9 +5913,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) */ int must_retry = 0; spin_lock_irq(&conf->device_lock); - if (mddev->reshape_backwards - ? logical_sector >= conf->reshape_progress - : logical_sector < conf->reshape_progress) + if (!ahead_of_reshape(mddev, logical_sector, + conf->reshape_progress)) /* mismatch, need to try again */ must_retry = 1; spin_unlock_irq(&conf->device_lock); From patchwork Thu Jun 16 19:19:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884697 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5CAD5C433EF for ; Thu, 16 Jun 2022 19:20:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378218AbiFPTUH (ORCPT ); Thu, 16 Jun 2022 15:20:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344222AbiFPTT4 (ORCPT ); Thu, 16 Jun 2022 15:19:56 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E8F7A562CB; Thu, 16 Jun 2022 12:19:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=3Uqcpcl0CPGmjy0b4vTBk8sx7NGeb3GvH+io4l8BPMQ=; b=ILiKayANW5q6goqHnKasqhbTej zarImsshAZ3Zop2+A7zmQGtEDDrdn7HVai36FWFTL/tA5N4SFomS8XUaMHseeHtBNg4M8nNN+Ysfq a68aSws0hSnh/4tlyeH7DVnN07G3V0KVQogZn78R2VCuGCYxO58HoUZYki1V/8BQhFkGpIu0RC9D6 NRd0Tg2MWCGDyIQ6q8DPv2pWTCdVzbk9rWLKxU7waGN89Sb29dJIKzBMjZKtHzINh6EajdXSIiBhZ EYMwJYL4DeatNKovUwd81owHJWSOrFX+MHSFoRviQCL/N85ZVwC24HTuhzzMZF0urms7XPF6tJOGX anBl6psw==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1y-0092il-Qu; Thu, 16 Jun 2022 13:19:51 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1v-0006FD-53; Thu, 16 Jun 2022 13:19:47 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:33 -0600 Message-Id: <20220616191945.23935-4-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 03/15] md/raid5: Refactor raid5_make_request loop X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org Break immediately if raid5_get_active_stripe() returns NULL and deindent the rest of the loop. Annotate this check with an unlikely(). This makes the code easier to read and reduces the indentation level. No functional changes intended. Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Reviewed-by: Guoqing Jiang --- drivers/md/raid5.c | 109 +++++++++++++++++++++++---------------------- 1 file changed, 55 insertions(+), 54 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 6e53b8490fff..25db747c5856 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5901,68 +5901,69 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) sh = raid5_get_active_stripe(conf, new_sector, previous, (bi->bi_opf & REQ_RAHEAD), 0); - if (sh) { - if (unlikely(previous)) { - /* expansion might have moved on while waiting for a - * stripe, so we must do the range check again. - * Expansion could still move past after this - * test, but as we are holding a reference to - * 'sh', we know that if that happens, - * STRIPE_EXPANDING will get set and the expansion - * won't proceed until we finish with the stripe. - */ - int must_retry = 0; - spin_lock_irq(&conf->device_lock); - if (!ahead_of_reshape(mddev, logical_sector, - conf->reshape_progress)) - /* mismatch, need to try again */ - must_retry = 1; - spin_unlock_irq(&conf->device_lock); - if (must_retry) { - raid5_release_stripe(sh); - schedule(); - do_prepare = true; - goto retry; - } - } - if (read_seqcount_retry(&conf->gen_lock, seq)) { - /* Might have got the wrong stripe_head - * by accident - */ - raid5_release_stripe(sh); - goto retry; - } + if (unlikely(!sh)) { + /* cannot get stripe, just give-up */ + bi->bi_status = BLK_STS_IOERR; + break; + } - if (test_bit(STRIPE_EXPANDING, &sh->state) || - !add_stripe_bio(sh, bi, dd_idx, rw, previous)) { - /* Stripe is busy expanding or - * add failed due to overlap. Flush everything - * and wait a while - */ - md_wakeup_thread(mddev->thread); + if (unlikely(previous)) { + /* expansion might have moved on while waiting for a + * stripe, so we must do the range check again. + * Expansion could still move past after this + * test, but as we are holding a reference to + * 'sh', we know that if that happens, + * STRIPE_EXPANDING will get set and the expansion + * won't proceed until we finish with the stripe. + */ + int must_retry = 0; + spin_lock_irq(&conf->device_lock); + if (!ahead_of_reshape(mddev, logical_sector, + conf->reshape_progress)) + /* mismatch, need to try again */ + must_retry = 1; + spin_unlock_irq(&conf->device_lock); + if (must_retry) { raid5_release_stripe(sh); schedule(); do_prepare = true; goto retry; } - if (do_flush) { - set_bit(STRIPE_R5C_PREFLUSH, &sh->state); - /* we only need flush for one stripe */ - do_flush = false; - } + } - set_bit(STRIPE_HANDLE, &sh->state); - clear_bit(STRIPE_DELAYED, &sh->state); - if ((!sh->batch_head || sh == sh->batch_head) && - (bi->bi_opf & REQ_SYNC) && - !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) - atomic_inc(&conf->preread_active_stripes); - release_stripe_plug(mddev, sh); - } else { - /* cannot get stripe for read-ahead, just give-up */ - bi->bi_status = BLK_STS_IOERR; - break; + if (read_seqcount_retry(&conf->gen_lock, seq)) { + /* Might have got the wrong stripe_head by accident */ + raid5_release_stripe(sh); + goto retry; + } + + if (test_bit(STRIPE_EXPANDING, &sh->state) || + !add_stripe_bio(sh, bi, dd_idx, rw, previous)) { + /* + * Stripe is busy expanding or add failed due to + * overlap. Flush everything and wait a while. + */ + md_wakeup_thread(mddev->thread); + raid5_release_stripe(sh); + schedule(); + do_prepare = true; + goto retry; } + + if (do_flush) { + set_bit(STRIPE_R5C_PREFLUSH, &sh->state); + /* we only need flush for one stripe */ + do_flush = false; + } + + set_bit(STRIPE_HANDLE, &sh->state); + clear_bit(STRIPE_DELAYED, &sh->state); + if ((!sh->batch_head || sh == sh->batch_head) && + (bi->bi_opf & REQ_SYNC) && + !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) + atomic_inc(&conf->preread_active_stripes); + + release_stripe_plug(mddev, sh); } finish_wait(&conf->wait_for_overlap, &w); From patchwork Thu Jun 16 19:19:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884695 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48A29C43334 for ; Thu, 16 Jun 2022 19:20:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233405AbiFPTUC (ORCPT ); Thu, 16 Jun 2022 15:20:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238982AbiFPTTz (ORCPT ); Thu, 16 Jun 2022 15:19:55 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58EE1562FC; Thu, 16 Jun 2022 12:19:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=daKc0MDOWNHTXxvql8ZG6tTIOLH1pnkSzHBwumdwhis=; b=OKVCYdHimWHgCCmlVHbIgXyMG6 57j5AHkcv8vNRPWCO4NAVEQo3LAG8ng3B9R6/y8wxNmDtyhKUgRMHFUl+MG+y8tIAis2JtxRioja/ cJicaAoG15+qyiuINs1dPCASRIufqlS9Nnqzu6Gkyn6w5STxXmJjlsO9NC9q7sjZ5Dhp9JTQPGgkj ay/ibRS/o15FaA5ftsQiCpQNFn2Oh8fqUKxvz4QxPZtnNT6tBFEKyTs6ZlELPrXG0IOtZaprGRKPh JW2qf2YH0vGhmo5LVuCRDm8dXmldpGKtzgXZi2E/LZ3BOaUO5KrIXLyMrbJWkoz/GjQNrSET3ot57 PY/dydSw==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1y-0092im-Qu; Thu, 16 Jun 2022 13:19:51 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1v-0006FG-8Y; Thu, 16 Jun 2022 13:19:47 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:34 -0600 Message-Id: <20220616191945.23935-5-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 04/15] md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org stripe_add_to_batch_list() is better done in the loop in make_request instead of inside add_stripe_bio(). This is clearer and allows for storing the batch_head state outside the loop in a subsequent patch. The call to add_stripe_bio() in retry_aligned_read() is for read and batching only applies to write. So it's impossible for batching to happen at that call site. No functional changes intended. Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Reviewed-by: Guoqing Jiang --- drivers/md/raid5.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 25db747c5856..969609b7114b 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3531,8 +3531,6 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, } spin_unlock_irq(&sh->stripe_lock); - if (stripe_can_batch(sh)) - stripe_add_to_batch_list(conf, sh); return 1; overlap: @@ -5950,6 +5948,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) goto retry; } + if (stripe_can_batch(sh)) + stripe_add_to_batch_list(conf, sh); + if (do_flush) { set_bit(STRIPE_R5C_PREFLUSH, &sh->state); /* we only need flush for one stripe */ From patchwork Thu Jun 16 19:19:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884703 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14DB9C433EF for ; Thu, 16 Jun 2022 19:20:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378267AbiFPTUS (ORCPT ); Thu, 16 Jun 2022 15:20:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377934AbiFPTT5 (ORCPT ); Thu, 16 Jun 2022 15:19:57 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AFC3956414; Thu, 16 Jun 2022 12:19:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=2kM9OhtMeK3+AUYfV6JzJmyz2Q8SE9IesWpL15reZBk=; b=H0wyD1biHpxXf4h6aSGqICapAV t3aiECR2E0lx7h7f6tCG2nx0LobskvQu0XuN1MJNXY+hriDzUTN1OsCEPchmobyDQDypOe28jGY86 t4APkMuWNPZncaFIc4X1djPWGo3EAsh3BykHvjQd6OAmY/soUzjYDb0gNT9fsxZUauk70nJcWV9iq KwD6BobwI+9urJiuOx/aHL19ipskVIRq4KbntcPVC+ohMB4iAjEivHlRTXoYgr3sNIRqFu8zmOING r5TODyqs8/3RZ4E3qMf10yrDjeBs1amEkU3yYwZo0LGMT0DphIftSAT6rmhVt4ZVHXmmRNeQXtrA+ ZVLpjS6w==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v21-0092ij-90; Thu, 16 Jun 2022 13:19:54 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1v-0006FJ-Bp; Thu, 16 Jun 2022 13:19:47 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:35 -0600 Message-Id: <20220616191945.23935-6-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 05/15] md/raid5: Move common stripe get code into new find_get_stripe() helper X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org Both uses of find_stripe() require a fairly complicated dance to increment the reference count. Move this into a common find_get_stripe() helper. No functional changes intended. Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Acked-by: Guoqing Jiang --- drivers/md/raid5.c | 131 ++++++++++++++++++++++----------------------- 1 file changed, 64 insertions(+), 67 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 969609b7114b..1bbf87d15bc8 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -624,6 +624,49 @@ static struct stripe_head *__find_stripe(struct r5conf *conf, sector_t sector, return NULL; } +static struct stripe_head *find_get_stripe(struct r5conf *conf, + sector_t sector, short generation, int hash) +{ + int inc_empty_inactive_list_flag; + struct stripe_head *sh; + + sh = __find_stripe(conf, sector, generation); + if (!sh) + return NULL; + + if (atomic_inc_not_zero(&sh->count)) + return sh; + + /* + * Slow path. The reference count is zero which means the stripe must + * be on a list (sh->lru). Must remove the stripe from the list that + * references it with the device_lock held. + */ + + spin_lock(&conf->device_lock); + if (!atomic_read(&sh->count)) { + if (!test_bit(STRIPE_HANDLE, &sh->state)) + atomic_inc(&conf->active_stripes); + BUG_ON(list_empty(&sh->lru) && + !test_bit(STRIPE_EXPANDING, &sh->state)); + inc_empty_inactive_list_flag = 0; + if (!list_empty(conf->inactive_list + hash)) + inc_empty_inactive_list_flag = 1; + list_del_init(&sh->lru); + if (list_empty(conf->inactive_list + hash) && + inc_empty_inactive_list_flag) + atomic_inc(&conf->empty_inactive_list_nr); + if (sh->group) { + sh->group->stripes_cnt--; + sh->group = NULL; + } + } + atomic_inc(&sh->count); + spin_unlock(&conf->device_lock); + + return sh; +} + /* * Need to check if array has failed when deciding whether to: * - start an array @@ -716,7 +759,6 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector, { struct stripe_head *sh; int hash = stripe_hash_locks_hash(conf, sector); - int inc_empty_inactive_list_flag; pr_debug("get_stripe, sector %llu\n", (unsigned long long)sector); @@ -726,57 +768,34 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector, wait_event_lock_irq(conf->wait_for_quiescent, conf->quiesce == 0 || noquiesce, *(conf->hash_locks + hash)); - sh = __find_stripe(conf, sector, conf->generation - previous); - if (!sh) { - if (!test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)) { - sh = get_free_stripe(conf, hash); - if (!sh && !test_bit(R5_DID_ALLOC, - &conf->cache_state)) - set_bit(R5_ALLOC_MORE, - &conf->cache_state); - } - if (noblock && sh == NULL) - break; + sh = find_get_stripe(conf, sector, conf->generation - previous, + hash); + if (sh) + break; - r5c_check_stripe_cache_usage(conf); - if (!sh) { - set_bit(R5_INACTIVE_BLOCKED, - &conf->cache_state); - r5l_wake_reclaim(conf->log, 0); - wait_event_lock_irq( - conf->wait_for_stripe, + if (!test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)) { + sh = get_free_stripe(conf, hash); + if (!sh && !test_bit(R5_DID_ALLOC, &conf->cache_state)) + set_bit(R5_ALLOC_MORE, &conf->cache_state); + } + if (noblock && !sh) + break; + + r5c_check_stripe_cache_usage(conf); + if (!sh) { + set_bit(R5_INACTIVE_BLOCKED, &conf->cache_state); + r5l_wake_reclaim(conf->log, 0); + wait_event_lock_irq(conf->wait_for_stripe, !list_empty(conf->inactive_list + hash) && (atomic_read(&conf->active_stripes) < (conf->max_nr_stripes * 3 / 4) || !test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)), *(conf->hash_locks + hash)); - clear_bit(R5_INACTIVE_BLOCKED, - &conf->cache_state); - } else { - init_stripe(sh, sector, previous); - atomic_inc(&sh->count); - } - } else if (!atomic_inc_not_zero(&sh->count)) { - spin_lock(&conf->device_lock); - if (!atomic_read(&sh->count)) { - if (!test_bit(STRIPE_HANDLE, &sh->state)) - atomic_inc(&conf->active_stripes); - BUG_ON(list_empty(&sh->lru) && - !test_bit(STRIPE_EXPANDING, &sh->state)); - inc_empty_inactive_list_flag = 0; - if (!list_empty(conf->inactive_list + hash)) - inc_empty_inactive_list_flag = 1; - list_del_init(&sh->lru); - if (list_empty(conf->inactive_list + hash) && inc_empty_inactive_list_flag) - atomic_inc(&conf->empty_inactive_list_nr); - if (sh->group) { - sh->group->stripes_cnt--; - sh->group = NULL; - } - } + clear_bit(R5_INACTIVE_BLOCKED, &conf->cache_state); + } else { + init_stripe(sh, sector, previous); atomic_inc(&sh->count); - spin_unlock(&conf->device_lock); } } while (sh == NULL); @@ -830,7 +849,6 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh sector_t head_sector, tmp_sec; int hash; int dd_idx; - int inc_empty_inactive_list_flag; /* Don't cross chunks, so stripe pd_idx/qd_idx is the same */ tmp_sec = sh->sector; @@ -840,28 +858,7 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh hash = stripe_hash_locks_hash(conf, head_sector); spin_lock_irq(conf->hash_locks + hash); - head = __find_stripe(conf, head_sector, conf->generation); - if (head && !atomic_inc_not_zero(&head->count)) { - spin_lock(&conf->device_lock); - if (!atomic_read(&head->count)) { - if (!test_bit(STRIPE_HANDLE, &head->state)) - atomic_inc(&conf->active_stripes); - BUG_ON(list_empty(&head->lru) && - !test_bit(STRIPE_EXPANDING, &head->state)); - inc_empty_inactive_list_flag = 0; - if (!list_empty(conf->inactive_list + hash)) - inc_empty_inactive_list_flag = 1; - list_del_init(&head->lru); - if (list_empty(conf->inactive_list + hash) && inc_empty_inactive_list_flag) - atomic_inc(&conf->empty_inactive_list_nr); - if (head->group) { - head->group->stripes_cnt--; - head->group = NULL; - } - } - atomic_inc(&head->count); - spin_unlock(&conf->device_lock); - } + head = find_get_stripe(conf, head_sector, conf->generation, hash); spin_unlock_irq(conf->hash_locks + hash); if (!head) From patchwork Thu Jun 16 19:19:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884702 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C14C7CCA47F for ; Thu, 16 Jun 2022 19:20:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238982AbiFPTUR (ORCPT ); Thu, 16 Jun 2022 15:20:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40974 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377835AbiFPTT5 (ORCPT ); Thu, 16 Jun 2022 15:19:57 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA798563BC; Thu, 16 Jun 2022 12:19:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=+sFRsvcRJw5+XZa6wPRtbYNbke+bt1JLaoOmg4B6nNg=; b=ei86I8EjHZRTzA4R/BJ+vogoH5 zEVvDNyXhKP+R2MkZHLDgyjBGm8DfVwFM0HXw5NwAyK/XhVeNhzTpsu0FwAD6aqhkXPK5ACs3YULP Wh0vLWlYcgF8pJtsvjTJpQnhbbRwWG5pVGhBUmXZW0tYIFgj+ZzlzZwVnR9b5YedJ8aLIM0lX2Hei g8OVXc06jmvzsUrBeH2m8t7D+dQPIu9sbdFpg3oKJoEKdp23a/HLHihtRBmbh8dk5aV4HNbBJG9Yp DbS6qD0qUV22CRc0QobuS8Dd5r+IXzt0L5EOhX9M5anUMGNa+SFOsdRGpzInZygq8/x21NxZZo2kY 145YgRiQ==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v20-0092ii-OX; Thu, 16 Jun 2022 13:19:53 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1v-0006FM-G8; Thu, 16 Jun 2022 13:19:47 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe Date: Thu, 16 Jun 2022 13:19:36 -0600 Message-Id: <20220616191945.23935-7-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 06/15] md/raid5: Factor out helper from raid5_make_request() loop X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org Factor out the inner loop of raid5_make_request() into it's own helper called make_stripe_request(). The helper returns a number of statuses: SUCCESS, RETRY, SCHEDULE_AND_RETRY and FAIL. This makes the code a bit easier to understand and allows the SCHEDULE_AND_RETRY path to be made common. A context structure is added to contain do_flush. It will be used more in subsequent patches for state that needs to be kept outside the loop. No functional changes intended. This will be cleaned up further in subsequent patches to untangle the gen_lock and do_prepare logic further. Signed-off-by: Logan Gunthorpe --- drivers/md/raid5.c | 231 ++++++++++++++++++++++++++------------------- 1 file changed, 133 insertions(+), 98 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 1bbf87d15bc8..26ef292842de 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5786,17 +5786,139 @@ static bool ahead_of_reshape(struct mddev *mddev, sector_t sector, sector >= reshape_sector; } +enum stripe_result { + STRIPE_SUCCESS = 0, + STRIPE_RETRY, + STRIPE_SCHEDULE_AND_RETRY, + STRIPE_FAIL, +}; + +struct stripe_request_ctx { + /* the request had REQ_PREFLUSH, cleared after the first stripe_head */ + bool do_flush; +}; + +static enum stripe_result make_stripe_request(struct mddev *mddev, + struct r5conf *conf, struct stripe_request_ctx *ctx, + sector_t logical_sector, struct bio *bi, int seq) +{ + const int rw = bio_data_dir(bi); + enum stripe_result ret; + struct stripe_head *sh; + sector_t new_sector; + int previous = 0; + int dd_idx; + + if (unlikely(conf->reshape_progress != MaxSector)) { + /* + * Spinlock is needed as reshape_progress may be + * 64bit on a 32bit platform, and so it might be + * possible to see a half-updated value + * Of course reshape_progress could change after + * the lock is dropped, so once we get a reference + * to the stripe that we think it is, we will have + * to check again. + */ + spin_lock_irq(&conf->device_lock); + if (ahead_of_reshape(mddev, logical_sector, + conf->reshape_progress)) { + previous = 1; + } else { + if (ahead_of_reshape(mddev, logical_sector, + conf->reshape_safe)) { + spin_unlock_irq(&conf->device_lock); + return STRIPE_SCHEDULE_AND_RETRY; + } + } + spin_unlock_irq(&conf->device_lock); + } + + new_sector = raid5_compute_sector(conf, logical_sector, previous, + &dd_idx, NULL); + pr_debug("raid456: %s, sector %llu logical %llu\n", __func__, + new_sector, logical_sector); + + sh = raid5_get_active_stripe(conf, new_sector, previous, + (bi->bi_opf & REQ_RAHEAD), 0); + if (unlikely(!sh)) { + /* cannot get stripe, just give-up */ + bi->bi_status = BLK_STS_IOERR; + return STRIPE_FAIL; + } + + if (unlikely(previous)) { + /* + * Expansion might have moved on while waiting for a + * stripe, so we must do the range check again. + * Expansion could still move past after this + * test, but as we are holding a reference to + * 'sh', we know that if that happens, + * STRIPE_EXPANDING will get set and the expansion + * won't proceed until we finish with the stripe. + */ + int must_retry = 0; + spin_lock_irq(&conf->device_lock); + if (!ahead_of_reshape(mddev, logical_sector, + conf->reshape_progress)) + /* mismatch, need to try again */ + must_retry = 1; + spin_unlock_irq(&conf->device_lock); + if (must_retry) { + ret = STRIPE_SCHEDULE_AND_RETRY; + goto out_release; + } + } + + if (read_seqcount_retry(&conf->gen_lock, seq)) { + /* Might have got the wrong stripe_head by accident */ + ret = STRIPE_RETRY; + goto out_release; + } + + if (test_bit(STRIPE_EXPANDING, &sh->state) || + !add_stripe_bio(sh, bi, dd_idx, rw, previous)) { + /* + * Stripe is busy expanding or add failed due to + * overlap. Flush everything and wait a while. + */ + md_wakeup_thread(mddev->thread); + ret = STRIPE_SCHEDULE_AND_RETRY; + goto out_release; + } + + if (stripe_can_batch(sh)) + stripe_add_to_batch_list(conf, sh); + + if (ctx->do_flush) { + set_bit(STRIPE_R5C_PREFLUSH, &sh->state); + /* we only need flush for one stripe */ + ctx->do_flush = false; + } + + set_bit(STRIPE_HANDLE, &sh->state); + clear_bit(STRIPE_DELAYED, &sh->state); + if ((!sh->batch_head || sh == sh->batch_head) && + (bi->bi_opf & REQ_SYNC) && + !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) + atomic_inc(&conf->preread_active_stripes); + + release_stripe_plug(mddev, sh); + return STRIPE_SUCCESS; + +out_release: + raid5_release_stripe(sh); + return ret; +} + static bool raid5_make_request(struct mddev *mddev, struct bio * bi) { struct r5conf *conf = mddev->private; - int dd_idx; - sector_t new_sector; sector_t logical_sector, last_sector; - struct stripe_head *sh; + struct stripe_request_ctx ctx = {}; const int rw = bio_data_dir(bi); + enum stripe_result res; DEFINE_WAIT(w); bool do_prepare; - bool do_flush = false; if (unlikely(bi->bi_opf & REQ_PREFLUSH)) { int ret = log_handle_flush_request(conf, bi); @@ -5812,7 +5934,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH, * we need to flush journal device */ - do_flush = bi->bi_opf & REQ_PREFLUSH; + ctx.do_flush = bi->bi_opf & REQ_PREFLUSH; } if (!md_write_start(mddev, bi)) @@ -5852,117 +5974,30 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) md_account_bio(mddev, &bi); prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE); for (; logical_sector < last_sector; logical_sector += RAID5_STRIPE_SECTORS(conf)) { - int previous; int seq; do_prepare = false; retry: seq = read_seqcount_begin(&conf->gen_lock); - previous = 0; if (do_prepare) prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE); - if (unlikely(conf->reshape_progress != MaxSector)) { - /* spinlock is needed as reshape_progress may be - * 64bit on a 32bit platform, and so it might be - * possible to see a half-updated value - * Of course reshape_progress could change after - * the lock is dropped, so once we get a reference - * to the stripe that we think it is, we will have - * to check again. - */ - spin_lock_irq(&conf->device_lock); - if (ahead_of_reshape(mddev, logical_sector, - conf->reshape_progress)) { - previous = 1; - } else { - if (ahead_of_reshape(mddev, logical_sector, - conf->reshape_safe)) { - spin_unlock_irq(&conf->device_lock); - schedule(); - do_prepare = true; - goto retry; - } - } - spin_unlock_irq(&conf->device_lock); - } - - new_sector = raid5_compute_sector(conf, logical_sector, - previous, - &dd_idx, NULL); - pr_debug("raid456: raid5_make_request, sector %llu logical %llu\n", - (unsigned long long)new_sector, - (unsigned long long)logical_sector); - sh = raid5_get_active_stripe(conf, new_sector, previous, - (bi->bi_opf & REQ_RAHEAD), 0); - if (unlikely(!sh)) { - /* cannot get stripe, just give-up */ - bi->bi_status = BLK_STS_IOERR; + res = make_stripe_request(mddev, conf, &ctx, logical_sector, + bi, seq); + if (res == STRIPE_FAIL) break; - } - - if (unlikely(previous)) { - /* expansion might have moved on while waiting for a - * stripe, so we must do the range check again. - * Expansion could still move past after this - * test, but as we are holding a reference to - * 'sh', we know that if that happens, - * STRIPE_EXPANDING will get set and the expansion - * won't proceed until we finish with the stripe. - */ - int must_retry = 0; - spin_lock_irq(&conf->device_lock); - if (!ahead_of_reshape(mddev, logical_sector, - conf->reshape_progress)) - /* mismatch, need to try again */ - must_retry = 1; - spin_unlock_irq(&conf->device_lock); - if (must_retry) { - raid5_release_stripe(sh); - schedule(); - do_prepare = true; - goto retry; - } - } - if (read_seqcount_retry(&conf->gen_lock, seq)) { - /* Might have got the wrong stripe_head by accident */ - raid5_release_stripe(sh); + if (res == STRIPE_RETRY) goto retry; - } - if (test_bit(STRIPE_EXPANDING, &sh->state) || - !add_stripe_bio(sh, bi, dd_idx, rw, previous)) { - /* - * Stripe is busy expanding or add failed due to - * overlap. Flush everything and wait a while. - */ - md_wakeup_thread(mddev->thread); - raid5_release_stripe(sh); + if (res == STRIPE_SCHEDULE_AND_RETRY) { schedule(); do_prepare = true; goto retry; } - - if (stripe_can_batch(sh)) - stripe_add_to_batch_list(conf, sh); - - if (do_flush) { - set_bit(STRIPE_R5C_PREFLUSH, &sh->state); - /* we only need flush for one stripe */ - do_flush = false; - } - - set_bit(STRIPE_HANDLE, &sh->state); - clear_bit(STRIPE_DELAYED, &sh->state); - if ((!sh->batch_head || sh == sh->batch_head) && - (bi->bi_opf & REQ_SYNC) && - !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) - atomic_inc(&conf->preread_active_stripes); - - release_stripe_plug(mddev, sh); } + finish_wait(&conf->wait_for_overlap, &w); if (rw == WRITE) From patchwork Thu Jun 16 19:19:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884705 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C966C433EF for ; Thu, 16 Jun 2022 19:20:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378289AbiFPTUV (ORCPT ); Thu, 16 Jun 2022 15:20:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40968 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377556AbiFPTT5 (ORCPT ); Thu, 16 Jun 2022 15:19:57 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D7145620D; Thu, 16 Jun 2022 12:19:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=ww0/TV5RRR8rJ7SvdiX8OXRQBEbLDtnpG/LBC1FtO8Q=; b=gyv2umXvu2Mri+AJM6Jopd/iEE qkDzmsHXyWgK1GP/QhpiRdBlVilFeXXg088N+jfDRwfV3bQ2rhcNyKGIv2OezJ4tlCYGWviGgAePi 66oIZDI5hO9BD5a79e8gV9Krk/q292fNpAt+miMLQKkLwSV1YAgQ0QeUoTN9hSF7f6QYrFRNBPgw4 Fut84X/z8EF1Ca38bXY5RGIEE/b0H14iJ4TU6tfv62vJBzfU8lt08Ni3nf7gNYlCtCIkU9rLf2d/z S8g9Fy/hd+8ojp7QrLh3fXcygmXGQqInoYn8p1KsipUnsYV4j/0ctcoCyqGvRxC/5FTIuHk7csU58 Q83BtyAQ==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v20-0092il-O1; Thu, 16 Jun 2022 13:19:53 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1v-0006FP-Je; Thu, 16 Jun 2022 13:19:47 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:37 -0600 Message-Id: <20220616191945.23935-8-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 07/15] md/raid5: Drop the do_prepare flag in raid5_make_request() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org prepare_to_wait() can be reasonably called after schedule instead of setting a flag and preparing in the next loop iteration. This means that prepare_to_wait() will be called before read_seqcount_begin(), but there shouldn't be any reason that the order matters here. On the first iteration of the loop prepare_to_wait() is already called first. Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Reviewed-by: Guoqing Jiang --- drivers/md/raid5.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 26ef292842de..c58e70db204a 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5918,7 +5918,6 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) const int rw = bio_data_dir(bi); enum stripe_result res; DEFINE_WAIT(w); - bool do_prepare; if (unlikely(bi->bi_opf & REQ_PREFLUSH)) { int ret = log_handle_flush_request(conf, bi); @@ -5976,12 +5975,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) for (; logical_sector < last_sector; logical_sector += RAID5_STRIPE_SECTORS(conf)) { int seq; - do_prepare = false; retry: seq = read_seqcount_begin(&conf->gen_lock); - if (do_prepare) - prepare_to_wait(&conf->wait_for_overlap, &w, - TASK_UNINTERRUPTIBLE); res = make_stripe_request(mddev, conf, &ctx, logical_sector, bi, seq); @@ -5993,7 +5988,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) if (res == STRIPE_SCHEDULE_AND_RETRY) { schedule(); - do_prepare = true; + prepare_to_wait(&conf->wait_for_overlap, &w, + TASK_UNINTERRUPTIBLE); goto retry; } } From patchwork Thu Jun 16 19:19:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884706 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1F8FC43334 for ; Thu, 16 Jun 2022 19:20:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378295AbiFPTUZ (ORCPT ); Thu, 16 Jun 2022 15:20:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40958 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377820AbiFPTT5 (ORCPT ); Thu, 16 Jun 2022 15:19:57 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C1AF554A4; Thu, 16 Jun 2022 12:19:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=8KJ0fqDYJYgEK4f4muGiFRawXBYi4QX35dDzYRE9OKw=; b=Lnzd5/B83bcowLXlyx5PN4atmJ 6MMyU4m01OncYKsoopcqhAwnIXUuKVdh5uGLFEuunozx9qgW5saPxRPgFDPtqiuBgdZsOcV7HnKfj 2BhkePA+wvkAbP27x/EyYdqFjuA3E21Q5qY3wvcIxaRfVC5KacZ/AkO2te9Z761rY85NtE24xEPLL xI1SQf+tgA8A7kL5ARLBjV6PsBibLAGXkH+YKc/QAfQh9nqLbuuRzViOdSH6oSf2OfQ2HpBIRnjrB eUyzvpL+jd2y1gYtu8c/It8ahD+L3Jh0/0fMqjMfwxDcyS7yUVTilT16N90X0hPADZBJ2ovL3zJMW LrIaT7Zw==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v20-0092im-LQ; Thu, 16 Jun 2022 13:19:53 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1v-0006FS-NA; Thu, 16 Jun 2022 13:19:47 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:38 -0600 Message-Id: <20220616191945.23935-9-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 08/15] md/raid5: Move read_seqcount_begin() into make_stripe_request() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org Now that prepare_to_wait() isn't in the way, move read_sequcount_begin() into make_stripe_request(). No functional changes intended. Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Reviewed-by: Guoqing Jiang --- drivers/md/raid5.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index c58e70db204a..345350d34623 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5800,14 +5800,16 @@ struct stripe_request_ctx { static enum stripe_result make_stripe_request(struct mddev *mddev, struct r5conf *conf, struct stripe_request_ctx *ctx, - sector_t logical_sector, struct bio *bi, int seq) + sector_t logical_sector, struct bio *bi) { const int rw = bio_data_dir(bi); enum stripe_result ret; struct stripe_head *sh; sector_t new_sector; int previous = 0; - int dd_idx; + int seq, dd_idx; + + seq = read_seqcount_begin(&conf->gen_lock); if (unlikely(conf->reshape_progress != MaxSector)) { /* @@ -5973,13 +5975,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) md_account_bio(mddev, &bi); prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE); for (; logical_sector < last_sector; logical_sector += RAID5_STRIPE_SECTORS(conf)) { - int seq; - retry: - seq = read_seqcount_begin(&conf->gen_lock); - res = make_stripe_request(mddev, conf, &ctx, logical_sector, - bi, seq); + bi); if (res == STRIPE_FAIL) break; From patchwork Thu Jun 16 19:19:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884696 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9C99C433EF for ; Thu, 16 Jun 2022 19:20:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378200AbiFPTUE (ORCPT ); Thu, 16 Jun 2022 15:20:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347063AbiFPTT4 (ORCPT ); Thu, 16 Jun 2022 15:19:56 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7EC95554BB; Thu, 16 Jun 2022 12:19:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=TqznKfD7muouxLbcabxY3rl2UiNCi1GG48QLUynPBk8=; b=TtqiVFP3QPkvf5CnjXYEln8JuL 1UqOuOCt/ntx4DInt92dOdN9bp7T/FnGUqN9AXMVMACp5GbxXJjd1Ew454rA6h80MbAd1T1Hj0Tpt GRBgcRnpYhnhYNak1x6AJ8qXO9XsNAwy26J5AstpVhsBulUTuYKNtb2XsSHHAqeLQd8GJlkURCdbh fGyg9v8rgvSlk3dZEzcCrTuQl1ydO70erknUx0yYFaM6capagHKS4N90/QT2gpuZU99vG2OFmCybq hoOdqn5di2+SCCw0fLaknSRBWc8xuz7skfPB79KHuevlBbE82/WqJZ0f8Rbx3wIJntydFKvdyiB3L JkQ1xdQA==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v20-0092il-0m; Thu, 16 Jun 2022 13:19:52 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1w-0006FY-1Q; Thu, 16 Jun 2022 13:19:48 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:39 -0600 Message-Id: <20220616191945.23935-10-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 09/15] md/raid5: Refactor for loop in raid5_make_request() into while loop X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org The for loop with retry label can be more cleanly expressed as a while loop by moving the logical_sector increment into the success path. No functional changes intended. Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig --- drivers/md/raid5.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 345350d34623..17ddaa41147c 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5974,22 +5974,23 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) } md_account_bio(mddev, &bi); prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE); - for (; logical_sector < last_sector; logical_sector += RAID5_STRIPE_SECTORS(conf)) { - retry: + while (logical_sector < last_sector) { res = make_stripe_request(mddev, conf, &ctx, logical_sector, bi); if (res == STRIPE_FAIL) break; if (res == STRIPE_RETRY) - goto retry; + continue; if (res == STRIPE_SCHEDULE_AND_RETRY) { schedule(); prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE); - goto retry; + continue; } + + logical_sector += RAID5_STRIPE_SECTORS(conf); } finish_wait(&conf->wait_for_overlap, &w); From patchwork Thu Jun 16 19:19:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884700 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86CF6C43334 for ; Thu, 16 Jun 2022 19:20:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238190AbiFPTUN (ORCPT ); Thu, 16 Jun 2022 15:20:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40952 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377680AbiFPTT5 (ORCPT ); Thu, 16 Jun 2022 15:19:57 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B09DE5623A; Thu, 16 Jun 2022 12:19:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=5N+DuwbwjsNtA6MiAkyKqthsWUBtdpGk6ol5t9owlgA=; b=OEtl1eZgooRbxWrxo2k7IkqbRT D8nwWEVRc/mKxskHhRw6wqZE5vOTOBA00gp/MOIQnFT1KggEX02XM7FrWa69FsBpkj2feDmLbW3EO H7FrtkAuXxR/JByBS0mouVXUWOq97R6hoWfej0lppTMOvD84eT/UyiVqZh5ewL4aGwSp2zzohkjUM GwI2mUxMSpwGz90KtW6Ip3vN6LNDWR9DJJhfW2eegy4+5XMMxzY+vyB03GTPub1oihoXT5yLR841U S7k0v/bhI67zO1b3d5ZwHXMUqGYe1/MN1ovchnzuowKyozuTCx1Twj2gc4XPA2msxubUm+J3sXj6h FaMBzNSQ==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1z-0092ii-T0; Thu, 16 Jun 2022 13:19:52 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1w-0006Fc-6c; Thu, 16 Jun 2022 13:19:48 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:40 -0600 Message-Id: <20220616191945.23935-11-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 10/15] md/raid5: Keep a reference to last stripe_head for batch X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org When batching, every stripe head has to find the previous stripe head to add to the batch list. This involves taking the hash lock which is highly contended during IO. Instead of finding the previous stripe_head each time, store a reference to the previous stripe_head in a pointer so that it doesn't require taking the contended lock another time. The reference to the previous stripe must be released before scheduling and waiting for work to get done. Otherwise, it can hold up raid5_activate_delayed() and deadlock. Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Acked-by: Guoqing Jiang --- drivers/md/raid5.c | 52 +++++++++++++++++++++++++++++++++++----------- 1 file changed, 40 insertions(+), 12 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 17ddaa41147c..34f8d6c18bd3 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -843,7 +843,8 @@ static bool stripe_can_batch(struct stripe_head *sh) } /* we only do back search */ -static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh) +static void stripe_add_to_batch_list(struct r5conf *conf, + struct stripe_head *sh, struct stripe_head *last_sh) { struct stripe_head *head; sector_t head_sector, tmp_sec; @@ -856,15 +857,20 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh return; head_sector = sh->sector - RAID5_STRIPE_SECTORS(conf); - hash = stripe_hash_locks_hash(conf, head_sector); - spin_lock_irq(conf->hash_locks + hash); - head = find_get_stripe(conf, head_sector, conf->generation, hash); - spin_unlock_irq(conf->hash_locks + hash); - - if (!head) - return; - if (!stripe_can_batch(head)) - goto out; + if (last_sh && head_sector == last_sh->sector) { + head = last_sh; + atomic_inc(&head->count); + } else { + hash = stripe_hash_locks_hash(conf, head_sector); + spin_lock_irq(conf->hash_locks + hash); + head = find_get_stripe(conf, head_sector, conf->generation, + hash); + spin_unlock_irq(conf->hash_locks + hash); + if (!head) + return; + if (!stripe_can_batch(head)) + goto out; + } lock_two_stripes(head, sh); /* clear_batch_ready clear the flag */ @@ -5794,6 +5800,8 @@ enum stripe_result { }; struct stripe_request_ctx { + /* a reference to the last stripe_head for batching */ + struct stripe_head *batch_last; /* the request had REQ_PREFLUSH, cleared after the first stripe_head */ bool do_flush; }; @@ -5888,8 +5896,13 @@ static enum stripe_result make_stripe_request(struct mddev *mddev, goto out_release; } - if (stripe_can_batch(sh)) - stripe_add_to_batch_list(conf, sh); + if (stripe_can_batch(sh)) { + stripe_add_to_batch_list(conf, sh, ctx->batch_last); + if (ctx->batch_last) + raid5_release_stripe(ctx->batch_last); + atomic_inc(&sh->count); + ctx->batch_last = sh; + } if (ctx->do_flush) { set_bit(STRIPE_R5C_PREFLUSH, &sh->state); @@ -5984,6 +5997,18 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) continue; if (res == STRIPE_SCHEDULE_AND_RETRY) { + /* + * Must release the reference to batch_last before + * scheduling and waiting for work to be done, + * otherwise the batch_last stripe head could prevent + * raid5_activate_delayed() from making progress + * and thus deadlocking. + */ + if (ctx.batch_last) { + raid5_release_stripe(ctx.batch_last); + ctx.batch_last = NULL; + } + schedule(); prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE); @@ -5995,6 +6020,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) finish_wait(&conf->wait_for_overlap, &w); + if (ctx.batch_last) + raid5_release_stripe(ctx.batch_last); + if (rw == WRITE) md_write_end(mddev); bio_endio(bi); From patchwork Thu Jun 16 19:19:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884699 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B4DACCA47A for ; Thu, 16 Jun 2022 19:20:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378232AbiFPTUL (ORCPT ); Thu, 16 Jun 2022 15:20:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377330AbiFPTT4 (ORCPT ); Thu, 16 Jun 2022 15:19:56 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A54F6562C7; Thu, 16 Jun 2022 12:19:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=nalZ0QkEDN9kZmOnE+xFNb1DntyjDrSZVhBHWEqhODw=; b=IqwxvKeSKZHMHi2o8LSfDhlkfn lGvvPY5KNSoRo826V5LkLGPVZrI6KTCfOgMW3ddmMfwMUK8YdUxKIfZhYmIfMAT/yFZETxXYENFGy LjoEKpab6CnDLwc2T4XRu53JwjFs8ktWjMjByigTtO9sv02k0Cpkpl7/Pw3bePvnp16QOlKV9F/HG amnU6qilxOwJ9o8fmLPv+4ypt8RKAojllZO7xAsNOXsxLBsEUdfycAJrj8ekRLN2IfUBcU3Ev8IBz uUtvY3nClIejndCq9L7xzwDTAoz+SggviUFoA+LPZW3Wz5ONi5x2b2NnCAmDfSiWJ6RU/5wDnV5lz yDFJBRgQ==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1z-0092ij-Rr; Thu, 16 Jun 2022 13:19:53 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1w-0006Fg-CF; Thu, 16 Jun 2022 13:19:48 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:41 -0600 Message-Id: <20220616191945.23935-12-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 11/15] md/raid5: Refactor add_stripe_bio() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org Factor out two helper functions from add_stripe_bio(): one to check for overlap (stripe_bio_overlaps()), and one to actually add the bio to the stripe (__add_stripe_bio()). The latter function will always succeed. This will be useful in the next patch so that overlap can be checked for multiple disks before adding any Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig --- drivers/md/raid5.c | 86 ++++++++++++++++++++++++++++++---------------- 1 file changed, 56 insertions(+), 30 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 34f8d6c18bd3..f12773c2387a 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3415,39 +3415,32 @@ schedule_reconstruction(struct stripe_head *sh, struct stripe_head_state *s, s->locked, s->ops_request); } -/* - * Each stripe/dev can have one or more bion attached. - * toread/towrite point to the first in a chain. - * The bi_next chain must be in order. - */ -static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, - int forwrite, int previous) +static bool stripe_bio_overlaps(struct stripe_head *sh, struct bio *bi, + int dd_idx, int forwrite) { - struct bio **bip; struct r5conf *conf = sh->raid_conf; - int firstwrite=0; + struct bio **bip; - pr_debug("adding bi b#%llu to stripe s#%llu\n", - (unsigned long long)bi->bi_iter.bi_sector, - (unsigned long long)sh->sector); + pr_debug("checking bi b#%llu to stripe s#%llu\n", + bi->bi_iter.bi_sector, sh->sector); - spin_lock_irq(&sh->stripe_lock); /* Don't allow new IO added to stripes in batch list */ if (sh->batch_head) - goto overlap; - if (forwrite) { + return true; + + if (forwrite) bip = &sh->dev[dd_idx].towrite; - if (*bip == NULL) - firstwrite = 1; - } else + else bip = &sh->dev[dd_idx].toread; + while (*bip && (*bip)->bi_iter.bi_sector < bi->bi_iter.bi_sector) { if (bio_end_sector(*bip) > bi->bi_iter.bi_sector) - goto overlap; - bip = & (*bip)->bi_next; + return true; + bip = &(*bip)->bi_next; } + if (*bip && (*bip)->bi_iter.bi_sector < bio_end_sector(bi)) - goto overlap; + return true; if (forwrite && raid5_has_ppl(conf)) { /* @@ -3476,9 +3469,30 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, } if (first + conf->chunk_sectors * (count - 1) != last) - goto overlap; + return true; } + return false; +} + +static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi, + int dd_idx, int forwrite, int previous) +{ + struct r5conf *conf = sh->raid_conf; + struct bio **bip; + int firstwrite = 0; + + if (forwrite) { + bip = &sh->dev[dd_idx].towrite; + if (!*bip) + firstwrite = 1; + } else { + bip = &sh->dev[dd_idx].toread; + } + + while (*bip && (*bip)->bi_iter.bi_sector < bi->bi_iter.bi_sector) + bip = &(*bip)->bi_next; + if (!forwrite || previous) clear_bit(STRIPE_BATCH_READY, &sh->state); @@ -3505,8 +3519,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, } pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n", - (unsigned long long)(*bip)->bi_iter.bi_sector, - (unsigned long long)sh->sector, dd_idx); + (*bip)->bi_iter.bi_sector, sh->sector, dd_idx); if (conf->mddev->bitmap && firstwrite) { /* Cannot hold spinlock over bitmap_startwrite, @@ -3514,7 +3527,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, * we have added to the bitmap and set bm_seq. * So set STRIPE_BITMAP_PENDING to prevent * batching. - * If multiple add_stripe_bio() calls race here they + * If multiple __add_stripe_bio() calls race here they * much all set STRIPE_BITMAP_PENDING. So only the first one * to complete "bitmap_startwrite" gets to set * STRIPE_BIT_DELAY. This is important as once a stripe @@ -3532,14 +3545,27 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, set_bit(STRIPE_BIT_DELAY, &sh->state); } } - spin_unlock_irq(&sh->stripe_lock); +} - return 1; +/* + * Each stripe/dev can have one or more bios attached. + * toread/towrite point to the first in a chain. + * The bi_next chain must be in order. + */ +static bool add_stripe_bio(struct stripe_head *sh, struct bio *bi, + int dd_idx, int forwrite, int previous) +{ + spin_lock_irq(&sh->stripe_lock); - overlap: - set_bit(R5_Overlap, &sh->dev[dd_idx].flags); + if (stripe_bio_overlaps(sh, bi, dd_idx, forwrite)) { + set_bit(R5_Overlap, &sh->dev[dd_idx].flags); + spin_unlock_irq(&sh->stripe_lock); + return false; + } + + __add_stripe_bio(sh, bi, dd_idx, forwrite, previous); spin_unlock_irq(&sh->stripe_lock); - return 0; + return true; } static void end_reshape(struct r5conf *conf); From patchwork Thu Jun 16 19:19:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884704 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91A3AC43334 for ; Thu, 16 Jun 2022 19:20:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378285AbiFPTUU (ORCPT ); Thu, 16 Jun 2022 15:20:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40978 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377972AbiFPTT5 (ORCPT ); Thu, 16 Jun 2022 15:19:57 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 978C756380; Thu, 16 Jun 2022 12:19:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=HjuLlnYUox4U67JGg6P48vjgRva2IywSj89prt0c6Ag=; b=gt26t8i2QN6xx2/zXveKjCTbI0 nj06bydHWcXEr5F9HZgJ/QMYN/0KXuWlIbmlg4ioXNH2kWiDgbJOiYjeV6XhchRfwKH0F2YChaN2L 8Hwz/GT1OEdngBIELvkdoqVMJNQEvfdcn6XHP9uWgIk4bQVvQrbP6xPX7fdhppwvOkS8lU6nq4Zxx aQen8YaSUMVE+yVaaLLY/H87lJOp64dob0TqOTtC+i9e57PBvM1CMWyjlW0dTK6uTLOQhDJ4EeqO8 v6QkFugFh2J4Px1YApFTCXSX8uZEDyd0dpadRcAPTWC4cTUPkWhj0Nw7cOBJ8KKpQWvDMIhbnrHI0 o1/egDbg==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1z-0092ik-RT; Thu, 16 Jun 2022 13:19:54 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1w-0006Fk-He; Thu, 16 Jun 2022 13:19:48 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 16 Jun 2022 13:19:42 -0600 Message-Id: <20220616191945.23935-13-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 12/15] md/raid5: Check all disks in a stripe_head for reshape progress X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org When testing if a previous stripe has had reshape expand past it, use the earliest or latest logical sector in all the disks for that stripe head. This will allow adding multiple disks at a time in a subesquent patch. To do this cleaner, refactor the check into a helper function called stripe_ahead_of_reshape(). Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig --- drivers/md/raid5.c | 53 ++++++++++++++++++++++++++++++++++------------ 1 file changed, 39 insertions(+), 14 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index f12773c2387a..b27b754ee18c 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5818,6 +5818,40 @@ static bool ahead_of_reshape(struct mddev *mddev, sector_t sector, sector >= reshape_sector; } +static bool range_ahead_of_reshape(struct mddev *mddev, sector_t min, + sector_t max, sector_t reshape_sector) +{ + return mddev->reshape_backwards ? max < reshape_sector : + min >= reshape_sector; +} + +static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf, + struct stripe_head *sh) +{ + sector_t max_sector = 0, min_sector = MaxSector; + bool ret = false; + int dd_idx; + + for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) { + if (dd_idx == sh->pd_idx) + continue; + + min_sector = min(min_sector, sh->dev[dd_idx].sector); + max_sector = min(max_sector, sh->dev[dd_idx].sector); + } + + spin_lock_irq(&conf->device_lock); + + if (!range_ahead_of_reshape(mddev, min_sector, max_sector, + conf->reshape_progress)) + /* mismatch, need to try again */ + ret = true; + + spin_unlock_irq(&conf->device_lock); + + return ret; +} + enum stripe_result { STRIPE_SUCCESS = 0, STRIPE_RETRY, @@ -5882,27 +5916,18 @@ static enum stripe_result make_stripe_request(struct mddev *mddev, return STRIPE_FAIL; } - if (unlikely(previous)) { + if (unlikely(previous) && + stripe_ahead_of_reshape(mddev, conf, sh)) { /* - * Expansion might have moved on while waiting for a - * stripe, so we must do the range check again. + * Expansion moved on while waiting for a stripe. * Expansion could still move past after this * test, but as we are holding a reference to * 'sh', we know that if that happens, * STRIPE_EXPANDING will get set and the expansion * won't proceed until we finish with the stripe. */ - int must_retry = 0; - spin_lock_irq(&conf->device_lock); - if (!ahead_of_reshape(mddev, logical_sector, - conf->reshape_progress)) - /* mismatch, need to try again */ - must_retry = 1; - spin_unlock_irq(&conf->device_lock); - if (must_retry) { - ret = STRIPE_SCHEDULE_AND_RETRY; - goto out_release; - } + ret = STRIPE_SCHEDULE_AND_RETRY; + goto out_release; } if (read_seqcount_retry(&conf->gen_lock, seq)) { From patchwork Thu Jun 16 19:19:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884698 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2AF7AC43334 for ; Thu, 16 Jun 2022 19:20:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378222AbiFPTUJ (ORCPT ); Thu, 16 Jun 2022 15:20:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353050AbiFPTT4 (ORCPT ); Thu, 16 Jun 2022 15:19:56 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8014456213; Thu, 16 Jun 2022 12:19:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=Ikvt2YFtng+Tcv7Pu2rsyFDnMTIkFfzesO1xEugM4JM=; b=R4xKSHUXNE4L79TM3BhlMPkKrP 9y53CnGSkEO6zN2K0Sp0kfNAJ1eUnGRJPq32XVpwID/YBZAq9ZeS3nmuTKI+JnAaEGBi8+4vXzLEm m/CgjWB/apPvVVW+uxH/LYwC9G7ERRofH0h8Zh9m5WVCSHEWe/dsVjpn/AJ37NUncz+aR25vxkwDo 9p+og1yuI4RUjSNl2DCe7sX0mvZ2yezIZBz9xCxhC2w/VQHRAWaPB57waHjQbZS+qUQSr4l7PZiF0 D0M7d+V1Ss+ZOdhKDd2mjvdZvrJ7AqOZRp0jOUvnHSohECZGIGFZ8A/cCj3jjOiZijF4aPmi9b0ik +rrWIPwA==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1z-0092im-Im; Thu, 16 Jun 2022 13:19:52 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1w-0006Fo-Mm; Thu, 16 Jun 2022 13:19:48 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe Date: Thu, 16 Jun 2022 13:19:43 -0600 Message-Id: <20220616191945.23935-14-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 13/15] md/raid5: Pivot raid5_make_request() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org raid5_make_request() loops through every page in the request, finds the appropriate stripe and adds the bio for that page in the disk. This causes a great deal of contention on the hash_lock and extra work seeing each stripe must be found once for every data disk. The number of times a stripe must be found can be reduced by pivoting raid5_make_request() so that it loops through every stripe and then loops through every disk in that stripe to see if the bio must be added. This reduces the number of times the hash lock must be taken by a factor equal to the number of data disks. To accomplish this, the logical sectors that have already been added must be tracked. Tracking them is done with a bitmap: the bits for all pages are set at the start of the request and each bit is cleared once the bio is added to a stripe. Finding the next sector to be done is then just a call to find_first_bit() so that sectors that have been done can simply be skipped. One minor downside is that the maximum sectors for a request must be limited so that the bitmap can be appropriately sized on the stack. This limit is arbitrarily chosen to be 256 stripe pages which works out to 1MB if PAGE_SIZE == DEFAULT_STRIPE_SIZE. This doesn't actually restrict the maximum request further seeing the default block queue settings are used which restricts the number of segments to 128 (which results in request sizes that are approximately 512KB). Signed-off-by: Logan Gunthorpe --- drivers/md/raid5.c | 89 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 83 insertions(+), 6 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index b27b754ee18c..9e18dc9ae8b4 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -61,6 +61,8 @@ #define cpu_to_group(cpu) cpu_to_node(cpu) #define ANY_GROUP NUMA_NO_NODE +#define RAID5_MAX_REQ_STRIPES 256 + static bool devices_handle_discard_safely = false; module_param(devices_handle_discard_safely, bool, 0644); MODULE_PARM_DESC(devices_handle_discard_safely, @@ -5862,10 +5864,69 @@ enum stripe_result { struct stripe_request_ctx { /* a reference to the last stripe_head for batching */ struct stripe_head *batch_last; + + /* first sector in the request */ + sector_t first_sector; + + /* last sector in the request */ + sector_t last_sector; + + /* bitmap to track stripe sectors that have been added to stripes */ + DECLARE_BITMAP(sectors_to_do, RAID5_MAX_REQ_STRIPES); + /* the request had REQ_PREFLUSH, cleared after the first stripe_head */ bool do_flush; }; +static int add_all_stripe_bios(struct r5conf *conf, + struct stripe_request_ctx *ctx, struct stripe_head *sh, + struct bio *bi, int forwrite, int previous) +{ + int dd_idx; + int ret = 1; + + spin_lock_irq(&sh->stripe_lock); + + for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) { + struct r5dev *dev = &sh->dev[dd_idx]; + + if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx) + continue; + + if (dev->sector < ctx->first_sector || + dev->sector >= ctx->last_sector) + continue; + + if (stripe_bio_overlaps(sh, bi, dd_idx, forwrite)) { + set_bit(R5_Overlap, &dev->flags); + ret = 0; + continue; + } + } + + if (!ret) + goto out; + + for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) { + struct r5dev *dev = &sh->dev[dd_idx]; + + if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx) + continue; + + if (dev->sector < ctx->first_sector || + dev->sector >= ctx->last_sector) + continue; + + __add_stripe_bio(sh, bi, dd_idx, forwrite, previous); + clear_bit((dev->sector - ctx->first_sector) >> + RAID5_STRIPE_SHIFT(conf), ctx->sectors_to_do); + } + +out: + spin_unlock_irq(&sh->stripe_lock); + return ret; +} + static enum stripe_result make_stripe_request(struct mddev *mddev, struct r5conf *conf, struct stripe_request_ctx *ctx, sector_t logical_sector, struct bio *bi) @@ -5937,7 +5998,7 @@ static enum stripe_result make_stripe_request(struct mddev *mddev, } if (test_bit(STRIPE_EXPANDING, &sh->state) || - !add_stripe_bio(sh, bi, dd_idx, rw, previous)) { + !add_all_stripe_bios(conf, ctx, sh, bi, rw, previous)) { /* * Stripe is busy expanding or add failed due to * overlap. Flush everything and wait a while. @@ -5979,11 +6040,12 @@ static enum stripe_result make_stripe_request(struct mddev *mddev, static bool raid5_make_request(struct mddev *mddev, struct bio * bi) { struct r5conf *conf = mddev->private; - sector_t logical_sector, last_sector; + sector_t logical_sector; struct stripe_request_ctx ctx = {}; const int rw = bio_data_dir(bi); enum stripe_result res; DEFINE_WAIT(w); + int s; if (unlikely(bi->bi_opf & REQ_PREFLUSH)) { int ret = log_handle_flush_request(conf, bi); @@ -6023,9 +6085,14 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) } logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1); - last_sector = bio_end_sector(bi); + ctx.first_sector = logical_sector; + ctx.last_sector = bio_end_sector(bi); bi->bi_next = NULL; + bitmap_set(ctx.sectors_to_do, 0, + DIV_ROUND_UP(ctx.last_sector - logical_sector, + RAID5_STRIPE_SECTORS(conf))); + /* Bail out if conflicts with reshape and REQ_NOWAIT is set */ if ((bi->bi_opf & REQ_NOWAIT) && (conf->reshape_progress != MaxSector) && @@ -6038,7 +6105,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) } md_account_bio(mddev, &bi); prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE); - while (logical_sector < last_sector) { + while (1) { res = make_stripe_request(mddev, conf, &ctx, logical_sector, bi); if (res == STRIPE_FAIL) @@ -6066,7 +6133,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) continue; } - logical_sector += RAID5_STRIPE_SECTORS(conf); + s = find_first_bit(ctx.sectors_to_do, RAID5_MAX_REQ_STRIPES); + if (s == RAID5_MAX_REQ_STRIPES) + break; + + logical_sector = ctx.first_sector + + (s << RAID5_STRIPE_SHIFT(conf)); } finish_wait(&conf->wait_for_overlap, &w); @@ -7923,7 +7995,12 @@ static int raid5_run(struct mddev *mddev) mddev->queue->limits.discard_granularity < stripe) blk_queue_max_discard_sectors(mddev->queue, 0); - blk_queue_max_hw_sectors(mddev->queue, UINT_MAX); + /* + * Requests require having a bitmap for each stripe. + * Limit the max sectors based on this. + */ + blk_queue_max_hw_sectors(mddev->queue, + RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf)); } if (log_init(conf, journal_dev, raid5_has_ppl(conf))) From patchwork Thu Jun 16 19:19:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884701 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEC85CCA47A for ; Thu, 16 Jun 2022 19:20:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377987AbiFPTUM (ORCPT ); Thu, 16 Jun 2022 15:20:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40938 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377584AbiFPTT4 (ORCPT ); Thu, 16 Jun 2022 15:19:56 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30059563B9; Thu, 16 Jun 2022 12:19:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=XipH9iAJdASipp0LhFoe2s+U+0gG0U4iLjVccbFrlCk=; b=FivJ8zhelgL8koM3/scbGZ3aoM Em4KvatN3cse7OraBxBcOr8uUYTQhsKOZPFa6rpeCJbaW0B+8R4GZdrnOCdV69thVlE4i1M4ckfUt TexynLjoHYQ5E5pMYrV4fgaiotk0ckJZ2KvtBdllFj831FygoQjFDvsLLSKp9ysCQO/hFyNTgUMSG fvTIAOlzRE5XtGA+Yfnuvcknd2YqrlYfgbI031Qvrav0+pe41di0xao3h5MXhzqV6E/w2mg1UxyLz RzS1VPQQztU6jsbO8IYeUV3rMtCmFgFTQ9Y/fax3b8VBZqO7rSMx7+WZUi94nkWecL4WZ9NnvSKAK YSETGYpA==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1z-0092ii-0e; Thu, 16 Jun 2022 13:19:51 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1w-0006Fs-RM; Thu, 16 Jun 2022 13:19:48 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe Date: Thu, 16 Jun 2022 13:19:44 -0600 Message-Id: <20220616191945.23935-15-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 14/15] md/raid5: Improve debug prints X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org Add a debug print for raid5_make_request() so that each request is printed and add the logical sector number to the debug print in __add_stripe_bio(). Signed-off-by: Logan Gunthorpe --- drivers/md/raid5.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 9e18dc9ae8b4..e48c16bfe657 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3520,8 +3520,9 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi, sh->overwrite_disks++; } - pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n", - (*bip)->bi_iter.bi_sector, sh->sector, dd_idx); + pr_debug("added bi b#%llu to stripe s#%llu, disk %d, logical %llu\n", + (*bip)->bi_iter.bi_sector, sh->sector, dd_idx, + sh->dev[dd_idx].sector); if (conf->mddev->bitmap && firstwrite) { /* Cannot hold spinlock over bitmap_startwrite, @@ -6093,6 +6094,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) DIV_ROUND_UP(ctx.last_sector - logical_sector, RAID5_STRIPE_SECTORS(conf))); + pr_debug("raid456: %s, logical %llu to %llu\n", __func__, + bi->bi_iter.bi_sector, ctx.last_sector); + /* Bail out if conflicts with reshape and REQ_NOWAIT is set */ if ((bi->bi_opf & REQ_NOWAIT) && (conf->reshape_progress != MaxSector) && From patchwork Thu Jun 16 19:19:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12884692 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8069AC433EF for ; Thu, 16 Jun 2022 19:19:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377939AbiFPTT6 (ORCPT ); Thu, 16 Jun 2022 15:19:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40876 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233405AbiFPTTz (ORCPT ); Thu, 16 Jun 2022 15:19:55 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BB79856384; Thu, 16 Jun 2022 12:19:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=I6q4dKvk9uVUvtNALIt0GxW4LwedkxdjSmse0vN/buc=; b=a4lHtb9nA0RdHYCz392I5pzW7Z 7dDueIg5lJRJZHghoTzuh2WcLEWj/iFVWZ7du2buNlxbbs7kppmhzR8k/UhFevMj9tqbcfjKvXSU7 oNlv6BEGKDwPVIXO6lEKVfScu5Z/O0b45U9fMA65HfRM1w8y+2b12beQpS6cVpOgv5sl4u20cc3N0 d6600nKzM6LLVDIMmMmZ38tfroAoDQUi29K5nYfdtLitvCy63E+C2kJzou484Ie6lOejZyoRS6xX9 ix7tirWza1NojBC0zuixcrHtpZ0brcEBlfqoxbilcooPDgQTnUchGEKX9phfz9gKSLblLwjmwAsLq e0LKXdew==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o1v1y-0092ij-U6; Thu, 16 Jun 2022 13:19:51 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1o1v1w-0006Fw-Vx; Thu, 16 Jun 2022 13:19:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe Date: Thu, 16 Jun 2022 13:19:45 -0600 Message-Id: <20220616191945.23935-16-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220616191945.23935-1-logang@deltatee.com> References: <20220616191945.23935-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 15/15] md/raid5: Increase restriction on max segments per request X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org The block layer defaults the maximum segments to 128, which means requests tend to get split around the 512KB depending on how many pages can be merged. There's no such restriction in the raid5 code so increase the limit to USHRT_MAX so that larger requests can be sent as one. Signed-off-by: Logan Gunthorpe Signed-off-by: Logan Gunthorpe --- drivers/md/raid5.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index e48c16bfe657..5723a497108a 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -8005,6 +8005,9 @@ static int raid5_run(struct mddev *mddev) */ blk_queue_max_hw_sectors(mddev->queue, RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf)); + + /* No restrictions on the number of segments in the request */ + blk_queue_max_segments(mddev->queue, USHRT_MAX); } if (log_init(conf, journal_dev, raid5_has_ppl(conf)))