From patchwork Fri Apr 9 11:48:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Shtepa X-Patchwork-Id: 12193943 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4B6FC433ED for ; Fri, 9 Apr 2021 11:48:42 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 52B37610CC for ; Fri, 9 Apr 2021 11:48:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 52B37610CC Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=veeam.com Authentication-Results: mail.kernel.org; spf=tempfail smtp.mailfrom=dm-devel-bounces@redhat.com Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-587-usw7sfWKMROIfDioxyh0wQ-1; Fri, 09 Apr 2021 07:48:38 -0400 X-MC-Unique: usw7sfWKMROIfDioxyh0wQ-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 548A510866AD; Fri, 9 Apr 2021 11:48:33 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 2DFBA1045D01; Fri, 9 Apr 2021 11:48:33 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id EEA721806D10; Fri, 9 Apr 2021 11:48:32 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id 139BmVIA000940 for ; Fri, 9 Apr 2021 07:48:31 -0400 Received: by smtp.corp.redhat.com (Postfix) id 797D621602A6; Fri, 9 Apr 2021 11:48:31 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast05.extmail.prod.ext.rdu2.redhat.com [10.11.55.21]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 736AB21602A3 for ; Fri, 9 Apr 2021 11:48:26 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-2.mimecast.com [207.211.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B9ABA805F47 for ; Fri, 9 Apr 2021 11:48:26 +0000 (UTC) Received: from mx2.veeam.com (mx2.veeam.com [64.129.123.6]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-191-4xHyzz76Nq-ljWUrIlYjgg-1; Fri, 09 Apr 2021 07:48:22 -0400 X-MC-Unique: 4xHyzz76Nq-ljWUrIlYjgg-1 Received: from mail.veeam.com (prgmbx01.amust.local [172.24.0.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx2.veeam.com (Postfix) with ESMTPS id EA95F4031D; Fri, 9 Apr 2021 07:48:18 -0400 (EDT) Received: from prgdevlinuxpatch01.amust.local (172.24.14.5) by prgmbx01.amust.local (172.24.0.171) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.721.2; Fri, 9 Apr 2021 13:48:17 +0200 From: Sergei Shtepa To: Christoph Hellwig , Hannes Reinecke , Mike Snitzer , Alasdair Kergon , Alexander Viro , Jens Axboe , , , , Date: Fri, 9 Apr 2021 14:48:01 +0300 Message-ID: <1617968884-15149-2-git-send-email-sergei.shtepa@veeam.com> In-Reply-To: <1617968884-15149-1-git-send-email-sergei.shtepa@veeam.com> References: <1617968884-15149-1-git-send-email-sergei.shtepa@veeam.com> MIME-Version: 1.0 X-Originating-IP: [172.24.14.5] X-ClientProxiedBy: prgmbx01.amust.local (172.24.0.171) To prgmbx01.amust.local (172.24.0.171) X-EsetResult: clean, is OK X-EsetId: 37303A29D2A50B59657262 X-Veeam-MMEX: True X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection Definition; Similar Internal Domain=false; Similar Monitored External Domain=false; Custom External Domain=false; Mimecast External Domain=false; Newly Observed Domain=false; Internal User Name=false; Custom Display Name List=false; Reply-to Address Mismatch=false; Targeted Threat Dictionary=false; Mimecast Threat Dictionary=false; Custom Threat Dictionary=false X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-loop: dm-devel@redhat.com Cc: pavel.tide@veeam.com, sergei.shtepa@veeam.com Subject: [dm-devel] [PATCH v8 1/4] Adds blk_interposer. It allows to redirect bio requests to another block device. X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Signed-off-by: Sergei Shtepa --- block/genhd.c | 51 +++++++++++++++++++++++++++++++++++++++ fs/block_dev.c | 3 +++ include/linux/blk_types.h | 6 +++++ include/linux/blkdev.h | 32 ++++++++++++++++++++++++ 4 files changed, 92 insertions(+) diff --git a/block/genhd.c b/block/genhd.c index 8c8f543572e6..533b33897187 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1938,3 +1938,54 @@ static void disk_release_events(struct gendisk *disk) WARN_ON_ONCE(disk->ev && disk->ev->block != 1); kfree(disk->ev); } + +/** + * bdev_interposer_attach - Attach interposer block device to original + * @original: original block device + * @interposer: interposer block device + * + * Before attaching the interposer, it is necessary to lock the processing + * of bio requests of the original device by calling the bdev_interposer_lock(). + * + * The bdev_interposer_detach() function allows to detach the interposer + * from original block device. + */ +int bdev_interposer_attach(struct block_device *original, + struct block_device *interposer) +{ + struct block_device *bdev; + + WARN_ON(!original); + if (original->bd_interposer) + return -EBUSY; + + bdev = bdgrab(interposer); + if (!bdev) + return -ENODEV; + + original->bd_interposer = bdev; + return 0; +} +EXPORT_SYMBOL_GPL(bdev_interposer_attach); + +/** + * bdev_interposer_detach - Detach interposer from block device + * @original: original block device + * + * Before detaching the interposer, it is necessary to lock the processing + * of bio requests of the original device by calling the bdev_interposer_lock(). + * + * The interposer should be attached using function bdev_interposer_attach(). + */ +void bdev_interposer_detach(struct block_device *original) +{ + if (WARN_ON(!original)) + return; + + if (!original->bd_interposer) + return; + + bdput(original->bd_interposer); + original->bd_interposer = NULL; +} +EXPORT_SYMBOL_GPL(bdev_interposer_detach); diff --git a/fs/block_dev.c b/fs/block_dev.c index 09d6f7229db9..a98a56cc634f 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -809,6 +809,7 @@ static void bdev_free_inode(struct inode *inode) { struct block_device *bdev = I_BDEV(inode); + percpu_free_rwsem(&bdev->bd_interposer_lock); free_percpu(bdev->bd_stats); kfree(bdev->bd_meta_info); @@ -909,6 +910,8 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno) iput(inode); return NULL; } + bdev->bd_interposer = NULL; + percpu_init_rwsem(&bdev->bd_interposer_lock); return bdev; } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index db026b6ec15a..8e4309eb3b18 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -46,6 +46,11 @@ struct block_device { spinlock_t bd_size_lock; /* for bd_inode->i_size updates */ struct gendisk * bd_disk; struct backing_dev_info *bd_bdi; + /* The interposer allows to redirect bio to another device */ + struct block_device *bd_interposer; + /* Lock the queue of block device to attach or detach interposer. + * Allows to safely suspend and flush interposer. */ + struct percpu_rw_semaphore bd_interposer_lock; /* The counter of freeze processes */ int bd_fsfreeze_count; @@ -304,6 +309,7 @@ enum { BIO_CGROUP_ACCT, /* has been accounted to a cgroup */ BIO_TRACKED, /* set if bio goes through the rq_qos path */ BIO_REMAPPED, + BIO_INTERPOSED, /* bio was reassigned to another block device */ BIO_FLAG_LAST }; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 158aefae1030..9e376ee34e19 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -2029,4 +2029,36 @@ int fsync_bdev(struct block_device *bdev); int freeze_bdev(struct block_device *bdev); int thaw_bdev(struct block_device *bdev); +/** + * bdev_interposer_lock - Lock bio processing + * @bdev: locking block device + * + * Lock the bio processing in submit_bio_noacct() for the new requests in the + * original block device. Requests from interposer will not be locked. + * + * To unlock, use the function bdev_interposer_unlock(). + * + * This lock should be used to attach/detach interposer to the device. + */ +static inline void bdev_interposer_lock(struct block_device *bdev) +{ + percpu_down_write(&bdev->bd_interposer_lock); +} + +/** + * bdev_interposer_unlock - Unlock bio processing + * @bdev: locked block device + * + * Unlock the bio processing that was locked by function bdev_interposer_lock(). + * + * This lock should be used to attach/detach interposer to the device. + */ +static inline void bdev_interposer_unlock(struct block_device *bdev) +{ + percpu_up_write(&bdev->bd_interposer_lock); +} + +int bdev_interposer_attach(struct block_device *original, + struct block_device *interposer); +void bdev_interposer_detach(struct block_device *original); #endif /* _LINUX_BLKDEV_H */ From patchwork Fri Apr 9 11:48:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Shtepa X-Patchwork-Id: 12193941 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A134C433B4 for ; Fri, 9 Apr 2021 11:48:40 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AF2C461182 for ; Fri, 9 Apr 2021 11:48:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AF2C461182 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=veeam.com Authentication-Results: mail.kernel.org; spf=tempfail smtp.mailfrom=dm-devel-bounces@redhat.com Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-148-3NGVZ7x5MpuvKVQj_JYJ6g-1; Fri, 09 Apr 2021 07:48:37 -0400 X-MC-Unique: 3NGVZ7x5MpuvKVQj_JYJ6g-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E59D9107ACF3; Fri, 9 Apr 2021 11:48:32 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id D23F718A50; Fri, 9 Apr 2021 11:48:31 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 7D2661806D0F; Fri, 9 Apr 2021 11:48:31 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id 139BmU42000931 for ; Fri, 9 Apr 2021 07:48:30 -0400 Received: by smtp.corp.redhat.com (Postfix) id 1E8E121493D9; Fri, 9 Apr 2021 11:48:30 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast04.extmail.prod.ext.rdu2.redhat.com [10.11.55.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 191AB21493D8 for ; Fri, 9 Apr 2021 11:48:27 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-2.mimecast.com [207.211.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 92C78101A531 for ; Fri, 9 Apr 2021 11:48:27 +0000 (UTC) Received: from mx2.veeam.com (mx2.veeam.com [64.129.123.6]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-217-d7YMOkHuPJOvT53SaEjXHA-1; Fri, 09 Apr 2021 07:48:25 -0400 X-MC-Unique: d7YMOkHuPJOvT53SaEjXHA-1 Received: from mail.veeam.com (prgmbx01.amust.local [172.24.0.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx2.veeam.com (Postfix) with ESMTPS id D468F42280; Fri, 9 Apr 2021 07:48:21 -0400 (EDT) Received: from prgdevlinuxpatch01.amust.local (172.24.14.5) by prgmbx01.amust.local (172.24.0.171) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.721.2; Fri, 9 Apr 2021 13:48:18 +0200 From: Sergei Shtepa To: Christoph Hellwig , Hannes Reinecke , Mike Snitzer , Alasdair Kergon , Alexander Viro , Jens Axboe , , , , Date: Fri, 9 Apr 2021 14:48:02 +0300 Message-ID: <1617968884-15149-3-git-send-email-sergei.shtepa@veeam.com> In-Reply-To: <1617968884-15149-1-git-send-email-sergei.shtepa@veeam.com> References: <1617968884-15149-1-git-send-email-sergei.shtepa@veeam.com> MIME-Version: 1.0 X-Originating-IP: [172.24.14.5] X-ClientProxiedBy: prgmbx01.amust.local (172.24.0.171) To prgmbx01.amust.local (172.24.0.171) X-EsetResult: clean, is OK X-EsetId: 37303A29D2A50B59657262 X-Veeam-MMEX: True X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection Definition; Similar Internal Domain=false; Similar Monitored External Domain=false; Custom External Domain=false; Mimecast External Domain=false; Newly Observed Domain=false; Internal User Name=false; Custom Display Name List=false; Reply-to Address Mismatch=false; Targeted Threat Dictionary=false; Mimecast Threat Dictionary=false; Custom Threat Dictionary=false X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-loop: dm-devel@redhat.com Cc: pavel.tide@veeam.com, sergei.shtepa@veeam.com Subject: [dm-devel] [PATCH v8 2/4] Adds the blk_interposers logic to __submit_bio_noacct(). X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com * The calling to blk_partition_remap() function has moved from submit_bio_checks() to submit_bio_noacct(). * The __submit_bio() and __submit_bio_noacct_mq() functions have been removed and their functionality moved to submit_bio_noacct(). * Added locking of the block device queue using the bd_interposer_lock. Signed-off-by: Sergei Shtepa --- block/bio.c | 2 + block/blk-core.c | 194 ++++++++++++++++++++++++++--------------------- 2 files changed, 108 insertions(+), 88 deletions(-) diff --git a/block/bio.c b/block/bio.c index 50e579088aca..6fc9e8f395a6 100644 --- a/block/bio.c +++ b/block/bio.c @@ -640,6 +640,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src) bio_set_flag(bio, BIO_THROTTLED); if (bio_flagged(bio_src, BIO_REMAPPED)) bio_set_flag(bio, BIO_REMAPPED); + if (bio_flagged(bio_src, BIO_INTERPOSED)) + bio_set_flag(bio, BIO_INTERPOSED); bio->bi_opf = bio_src->bi_opf; bio->bi_ioprio = bio_src->bi_ioprio; bio->bi_write_hint = bio_src->bi_write_hint; diff --git a/block/blk-core.c b/block/blk-core.c index fc60ff208497..a987daa76a79 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -735,26 +735,27 @@ static inline int bio_check_eod(struct bio *bio) handle_bad_sector(bio, maxsector); return -EIO; } + + if (unlikely(should_fail_request(bio->bi_bdev, bio->bi_iter.bi_size))) + return -EIO; + return 0; } /* * Remap block n of partition p to block n+start(p) of the disk. */ -static int blk_partition_remap(struct bio *bio) +static inline void blk_partition_remap(struct bio *bio) { - struct block_device *p = bio->bi_bdev; + struct block_device *bdev = bio->bi_bdev; - if (unlikely(should_fail_request(p, bio->bi_iter.bi_size))) - return -EIO; - if (bio_sectors(bio)) { - bio->bi_iter.bi_sector += p->bd_start_sect; - trace_block_bio_remap(bio, p->bd_dev, + if (bdev->bd_partno && bio_sectors(bio)) { + bio->bi_iter.bi_sector += bdev->bd_start_sect; + trace_block_bio_remap(bio, bdev->bd_dev, bio->bi_iter.bi_sector - - p->bd_start_sect); + bdev->bd_start_sect); } bio_set_flag(bio, BIO_REMAPPED); - return 0; } /* @@ -819,8 +820,6 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio) if (!bio_flagged(bio, BIO_REMAPPED)) { if (unlikely(bio_check_eod(bio))) goto end_io; - if (bdev->bd_partno && unlikely(blk_partition_remap(bio))) - goto end_io; } /* @@ -910,20 +909,6 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio) return false; } -static blk_qc_t __submit_bio(struct bio *bio) -{ - struct gendisk *disk = bio->bi_bdev->bd_disk; - blk_qc_t ret = BLK_QC_T_NONE; - - if (blk_crypto_bio_prep(&bio)) { - if (!disk->fops->submit_bio) - return blk_mq_submit_bio(bio); - ret = disk->fops->submit_bio(bio); - } - blk_queue_exit(disk->queue); - return ret; -} - /* * The loop in this function may be a bit non-obvious, and so deserves some * explanation: @@ -931,7 +916,7 @@ static blk_qc_t __submit_bio(struct bio *bio) * - Before entering the loop, bio->bi_next is NULL (as all callers ensure * that), so we have a list with a single bio. * - We pretend that we have just taken it off a longer list, so we assign - * bio_list to a pointer to the bio_list_on_stack, thus initialising the + * bio_list to a pointer to the current->bio_list, thus initialising the * bio_list of new bios to be added. ->submit_bio() may indeed add some more * bios through a recursive call to submit_bio_noacct. If it did, we find a * non-NULL value in bio_list and re-enter the loop from the top. @@ -939,83 +924,75 @@ static blk_qc_t __submit_bio(struct bio *bio) * pretending) and so remove it from bio_list, and call into ->submit_bio() * again. * - * bio_list_on_stack[0] contains bios submitted by the current ->submit_bio. - * bio_list_on_stack[1] contains bios that were submitted before the current + * current->bio_list[0] contains bios submitted by the current ->submit_bio. + * current->bio_list[1] contains bios that were submitted before the current * ->submit_bio_bio, but that haven't been processed yet. */ static blk_qc_t __submit_bio_noacct(struct bio *bio) { - struct bio_list bio_list_on_stack[2]; - blk_qc_t ret = BLK_QC_T_NONE; - - BUG_ON(bio->bi_next); - - bio_list_init(&bio_list_on_stack[0]); - current->bio_list = bio_list_on_stack; - - do { - struct request_queue *q = bio->bi_bdev->bd_disk->queue; - struct bio_list lower, same; + struct gendisk *disk = bio->bi_bdev->bd_disk; + struct bio_list lower, same; + blk_qc_t ret; - if (unlikely(bio_queue_enter(bio) != 0)) - continue; + if (!blk_crypto_bio_prep(&bio)) { + blk_queue_exit(disk->queue); + return BLK_QC_T_NONE; + } - /* - * Create a fresh bio_list for all subordinate requests. - */ - bio_list_on_stack[1] = bio_list_on_stack[0]; - bio_list_init(&bio_list_on_stack[0]); + if (queue_is_mq(disk->queue)) + return blk_mq_submit_bio(bio); - ret = __submit_bio(bio); + /* + * Create a fresh bio_list for all subordinate requests. + */ + current->bio_list[1] = current->bio_list[0]; + bio_list_init(¤t->bio_list[0]); - /* - * Sort new bios into those for a lower level and those for the - * same level. - */ - bio_list_init(&lower); - bio_list_init(&same); - while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL) - if (q == bio->bi_bdev->bd_disk->queue) - bio_list_add(&same, bio); - else - bio_list_add(&lower, bio); + WARN_ON_ONCE(!disk->fops->submit_bio); + ret = disk->fops->submit_bio(bio); + blk_queue_exit(disk->queue); + /* + * Sort new bios into those for a lower level and those + * for the same level. + */ + bio_list_init(&lower); + bio_list_init(&same); + while ((bio = bio_list_pop(¤t->bio_list[0])) != NULL) + if (disk->queue == bio->bi_bdev->bd_disk->queue) + bio_list_add(&same, bio); + else + bio_list_add(&lower, bio); - /* - * Now assemble so we handle the lowest level first. - */ - bio_list_merge(&bio_list_on_stack[0], &lower); - bio_list_merge(&bio_list_on_stack[0], &same); - bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]); - } while ((bio = bio_list_pop(&bio_list_on_stack[0]))); + /* + * Now assemble so we handle the lowest level first. + */ + bio_list_merge(¤t->bio_list[0], &lower); + bio_list_merge(¤t->bio_list[0], &same); + bio_list_merge(¤t->bio_list[0], ¤t->bio_list[1]); - current->bio_list = NULL; return ret; } -static blk_qc_t __submit_bio_noacct_mq(struct bio *bio) +static inline struct block_device *bio_interposer_lock(struct bio *bio) { - struct bio_list bio_list[2] = { }; - blk_qc_t ret = BLK_QC_T_NONE; - - current->bio_list = bio_list; - - do { - struct gendisk *disk = bio->bi_bdev->bd_disk; - - if (unlikely(bio_queue_enter(bio) != 0)) - continue; + bool locked; + struct block_device *bdev = bio->bi_bdev; - if (!blk_crypto_bio_prep(&bio)) { - blk_queue_exit(disk->queue); - ret = BLK_QC_T_NONE; - continue; + if (bio->bi_opf & REQ_NOWAIT) { + locked = percpu_down_read_trylock(&bdev->bd_interposer_lock); + if (unlikely(!locked)) { + bio_wouldblock_error(bio); + return NULL; } + } else + percpu_down_read(&bdev->bd_interposer_lock); - ret = blk_mq_submit_bio(bio); - } while ((bio = bio_list_pop(&bio_list[0]))); + return bdev; +} - current->bio_list = NULL; - return ret; +static inline void bio_interposer_unlock(struct block_device *locked_bdev) +{ + percpu_up_read(&locked_bdev->bd_interposer_lock); } /** @@ -1029,6 +1006,10 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio) */ blk_qc_t submit_bio_noacct(struct bio *bio) { + struct block_device *locked_bdev; + struct bio_list bio_list_on_stack[2] = { }; + blk_qc_t ret = BLK_QC_T_NONE; + if (!submit_bio_checks(bio)) return BLK_QC_T_NONE; @@ -1043,9 +1024,46 @@ blk_qc_t submit_bio_noacct(struct bio *bio) return BLK_QC_T_NONE; } - if (!bio->bi_bdev->bd_disk->fops->submit_bio) - return __submit_bio_noacct_mq(bio); - return __submit_bio_noacct(bio); + BUG_ON(bio->bi_next); + + locked_bdev = bio_interposer_lock(bio); + if (!locked_bdev) + return BLK_QC_T_NONE; + + current->bio_list = bio_list_on_stack; + + do { + if (unlikely(bio_queue_enter(bio) != 0)) { + ret = BLK_QC_T_NONE; + continue; + } + + if (!bio_flagged(bio, BIO_INTERPOSED) && + bio->bi_bdev->bd_interposer) { + struct gendisk *disk = bio->bi_bdev->bd_disk; + + bio_set_dev(bio, bio->bi_bdev->bd_interposer); + bio_set_flag(bio, BIO_INTERPOSED); + + bio_list_add(&bio_list_on_stack[0], bio); + + blk_queue_exit(disk->queue); + ret = BLK_QC_T_NONE; + continue; + } + + if (!bio_flagged(bio, BIO_REMAPPED)) + blk_partition_remap(bio); + + ret = __submit_bio_noacct(bio); + + } while ((bio = bio_list_pop(&bio_list_on_stack[0]))); + + current->bio_list = NULL; + + bio_interposer_unlock(locked_bdev); + + return ret; } EXPORT_SYMBOL(submit_bio_noacct); From patchwork Fri Apr 9 11:48:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Shtepa X-Patchwork-Id: 12193945 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F147AC433ED for ; Fri, 9 Apr 2021 11:48:50 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 77FC761165 for ; Fri, 9 Apr 2021 11:48:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 77FC761165 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=veeam.com Authentication-Results: mail.kernel.org; spf=tempfail smtp.mailfrom=dm-devel-bounces@redhat.com Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-596-y7jrbjNsPWCoj1nWj5ISGg-1; Fri, 09 Apr 2021 07:48:47 -0400 X-MC-Unique: y7jrbjNsPWCoj1nWj5ISGg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E35C81922035; Fri, 9 Apr 2021 11:48:43 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id BB90110027C4; Fri, 9 Apr 2021 11:48:43 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 83626180B651; Fri, 9 Apr 2021 11:48:43 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id 139BmV41000938 for ; Fri, 9 Apr 2021 07:48:31 -0400 Received: by smtp.corp.redhat.com (Postfix) id 3016021493CD; Fri, 9 Apr 2021 11:48:31 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast02.extmail.prod.ext.rdu2.redhat.com [10.11.55.18]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 29FE7210FE21 for ; Fri, 9 Apr 2021 11:48:31 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-1.mimecast.com [207.211.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0E2898001A6 for ; Fri, 9 Apr 2021 11:48:31 +0000 (UTC) Received: from mx2.veeam.com (mx2.veeam.com [64.129.123.6]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-7-j29bE7U4MxuR3-uSK2b49Q-1; Fri, 09 Apr 2021 07:48:29 -0400 X-MC-Unique: j29bE7U4MxuR3-uSK2b49Q-1 Received: from mail.veeam.com (prgmbx01.amust.local [172.24.0.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx2.veeam.com (Postfix) with ESMTPS id 1AD704097F; Fri, 9 Apr 2021 07:48:25 -0400 (EDT) Received: from prgdevlinuxpatch01.amust.local (172.24.14.5) by prgmbx01.amust.local (172.24.0.171) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.721.2; Fri, 9 Apr 2021 13:48:20 +0200 From: Sergei Shtepa To: Christoph Hellwig , Hannes Reinecke , Mike Snitzer , Alasdair Kergon , Alexander Viro , Jens Axboe , , , , Date: Fri, 9 Apr 2021 14:48:03 +0300 Message-ID: <1617968884-15149-4-git-send-email-sergei.shtepa@veeam.com> In-Reply-To: <1617968884-15149-1-git-send-email-sergei.shtepa@veeam.com> References: <1617968884-15149-1-git-send-email-sergei.shtepa@veeam.com> MIME-Version: 1.0 X-Originating-IP: [172.24.14.5] X-ClientProxiedBy: prgmbx01.amust.local (172.24.0.171) To prgmbx01.amust.local (172.24.0.171) X-EsetResult: clean, is OK X-EsetId: 37303A29D2A50B59657262 X-Veeam-MMEX: True X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection Definition; Similar Internal Domain=false; Similar Monitored External Domain=false; Custom External Domain=false; Mimecast External Domain=false; Newly Observed Domain=false; Internal User Name=false; Custom Display Name List=false; Reply-to Address Mismatch=false; Targeted Threat Dictionary=false; Mimecast Threat Dictionary=false; Custom Threat Dictionary=false X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-loop: dm-devel@redhat.com Cc: pavel.tide@veeam.com, sergei.shtepa@veeam.com Subject: [dm-devel] [PATCH v8 3/4] Adds blk_interposer to md. X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com * The new flag DM_INTERPOSE_FLAG allows to specify that the dm target will be attached using blk_interposer. * The [interpose] option allows to specify which device will be attached via the interposer. * The connection and disconnection of the interrupter is performed in the functions __dm_suspend() and __dm_resume(). The flag DM_SUSPEND_DETACH_IP_FLAG was added for this purpose. * dm_submit_bio() sets BIO_INTERPOSED for each bio from the interposer. Signed-off-by: Sergei Shtepa Reported-by: kernel test robot Reported-by: kernel test robot Reported-by: kernel test robot --- drivers/md/dm-core.h | 1 + drivers/md/dm-ioctl.c | 95 ++++++++++--- drivers/md/dm-table.c | 68 ++++++++- drivers/md/dm.c | 254 ++++++++++++++++++++++++++++++---- drivers/md/dm.h | 8 +- include/linux/device-mapper.h | 1 + include/uapi/linux/dm-ioctl.h | 6 + 7 files changed, 375 insertions(+), 58 deletions(-) diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h index 5953ff2bd260..431b82461eae 100644 --- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -112,6 +112,7 @@ struct mapped_device { /* for blk-mq request-based DM support */ struct blk_mq_tag_set *tag_set; bool init_tio_pdu:1; + bool interpose:1; struct srcu_struct io_barrier; }; diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index 1ca65b434f1f..7ec37526920b 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -294,11 +294,29 @@ static void dm_hash_remove_all(bool keep_open_devices, bool mark_deferred, bool md = hc->md; dm_get(md); - if (keep_open_devices && - dm_lock_for_deletion(md, mark_deferred, only_deferred)) { - dm_put(md); - dev_skipped++; - continue; + if (md->interpose) { + int r; + + /* + * Interposer should be suspended and detached + * from the interposed block device. + */ + r = dm_suspend(md, DM_SUSPEND_DETACH_IP_FLAG | + DM_SUSPEND_LOCKFS_FLAG); + if (r) { + DMERR("%s: unable to suspend and detach interposer", + dm_device_name(md)); + dm_put(md); + dev_skipped++; + continue; + } + } else { + if (keep_open_devices && + dm_lock_for_deletion(md, mark_deferred, only_deferred)) { + dm_put(md); + dev_skipped++; + continue; + } } t = __hash_remove(hc); @@ -732,6 +750,9 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param) if (dm_test_deferred_remove_flag(md)) param->flags |= DM_DEFERRED_REMOVE; + if (dm_interposer_attached(md)) + param->flags |= DM_INTERPOSE_FLAG; + param->dev = huge_encode_dev(disk_devt(disk)); /* @@ -878,20 +899,37 @@ static int dev_remove(struct file *filp, struct dm_ioctl *param, size_t param_si md = hc->md; - /* - * Ensure the device is not open and nothing further can open it. - */ - r = dm_lock_for_deletion(md, !!(param->flags & DM_DEFERRED_REMOVE), false); - if (r) { - if (r == -EBUSY && param->flags & DM_DEFERRED_REMOVE) { + if (!md->interpose) { + /* + * Ensure the device is not open and nothing further can open it. + */ + r = dm_lock_for_deletion(md, !!(param->flags & DM_DEFERRED_REMOVE), false); + if (r) { + if (r == -EBUSY && param->flags & DM_DEFERRED_REMOVE) { + up_write(&_hash_lock); + dm_put(md); + return 0; + } + DMDEBUG_LIMIT("unable to remove open device %s", + hc->name); up_write(&_hash_lock); dm_put(md); - return 0; + return r; + } + } else { + /* + * Interposer should be suspended and detached from + * the interposed block device. + */ + r = dm_suspend(md, DM_SUSPEND_DETACH_IP_FLAG | + DM_SUSPEND_LOCKFS_FLAG); + if (r) { + DMERR("%s: unable to suspend and detach interposer", + dm_device_name(md)); + up_write(&_hash_lock); + dm_put(md); + return r; } - DMDEBUG_LIMIT("unable to remove open device %s", hc->name); - up_write(&_hash_lock); - dm_put(md); - return r; } t = __hash_remove(hc); @@ -1050,6 +1088,7 @@ static int do_resume(struct dm_ioctl *param) md = hc->md; + new_map = hc->new_map; hc->new_map = NULL; param->flags &= ~DM_INACTIVE_PRESENT_FLAG; @@ -1063,8 +1102,14 @@ static int do_resume(struct dm_ioctl *param) suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG; if (param->flags & DM_NOFLUSH_FLAG) suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG; - if (!dm_suspended_md(md)) - dm_suspend(md, suspend_flags); + + if (md->interpose) { + if (!dm_suspended_md(md) || dm_interposer_attached(md)) + dm_suspend(md, suspend_flags | DM_SUSPEND_DETACH_IP_FLAG); + } else { + if (!dm_suspended_md(md)) + dm_suspend(md, suspend_flags); + } old_map = dm_swap_table(md, new_map); if (IS_ERR(old_map)) { @@ -1267,6 +1312,11 @@ static inline fmode_t get_mode(struct dm_ioctl *param) return mode; } +static inline bool get_interpose_flag(struct dm_ioctl *param) +{ + return (param->flags & DM_INTERPOSE_FLAG); +} + static int next_target(struct dm_target_spec *last, uint32_t next, void *end, struct dm_target_spec **spec, char **target_params) { @@ -1289,11 +1339,6 @@ static int populate_table(struct dm_table *table, void *end = (void *) param + param_size; char *target_params; - if (!param->target_count) { - DMWARN("populate_table: no targets specified"); - return -EINVAL; - } - for (i = 0; i < param->target_count; i++) { r = next_target(spec, next, end, &spec, &target_params); @@ -1338,6 +1383,8 @@ static int table_load(struct file *filp, struct dm_ioctl *param, size_t param_si if (!md) return -ENXIO; + md->interpose = get_interpose_flag(param); + r = dm_table_create(&t, get_mode(param), param->target_count, md); if (r) goto err; @@ -2098,6 +2145,8 @@ int __init dm_early_create(struct dm_ioctl *dmi, if (r) goto err_hash_remove; + md->interpose = get_interpose_flag(dmi); + /* add targets */ for (i = 0; i < dmi->target_count; i++) { r = dm_table_add_target(t, spec_array[i]->target_type, diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index e5f0f1703c5d..23574c727f2b 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -327,14 +327,14 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev, * it is accessed concurrently. */ static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode, - struct mapped_device *md) + bool interpose, struct mapped_device *md) { int r; struct dm_dev *old_dev, *new_dev; old_dev = dd->dm_dev; - r = dm_get_table_device(md, dd->dm_dev->bdev->bd_dev, + r = dm_get_table_device(md, dd->dm_dev->bdev->bd_dev, interpose, dd->dm_dev->mode | new_mode, &new_dev); if (r) return r; @@ -367,6 +367,8 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode, { int r; dev_t dev; + size_t ofs = 0; + bool interpose = false; unsigned int major, minor; char dummy; struct dm_dev_internal *dd; @@ -374,13 +376,40 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode, BUG_ON(!t); - if (sscanf(path, "%u:%u%c", &major, &minor, &dummy) == 2) { + /* + * Extract extended options for device + */ + if (path[0] == '[') { + const char *interpose_opt = "interpose"; + size_t opt_pos = 1; + size_t opt_len; + + /* + * Because only one option is supported yet, the parser + * can be simplest. + */ + opt_len = strlen(interpose_opt); + if ((opt_pos + opt_len) < strlen(path) && + memcmp(&path[opt_pos], interpose_opt, opt_len) == 0) { + interpose = true; + + if (!t->md->interpose) + t->md->interpose = true; + } else { + DMERR("Invalid devices extended options %s", path); + return -EINVAL; + } + + ofs = opt_pos + opt_len + 1; + } + + if (sscanf(&path[ofs], "%u:%u%c", &major, &minor, &dummy) == 2) { /* Extract the major/minor numbers */ dev = MKDEV(major, minor); if (MAJOR(dev) != major || MINOR(dev) != minor) return -EOVERFLOW; } else { - dev = dm_get_dev_t(path); + dev = dm_get_dev_t(&path[ofs]); if (!dev) return -ENODEV; } @@ -391,7 +420,8 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode, if (!dd) return -ENOMEM; - if ((r = dm_get_table_device(t->md, dev, mode, &dd->dm_dev))) { + r = dm_get_table_device(t->md, dev, mode, interpose, &dd->dm_dev); + if (r) { kfree(dd); return r; } @@ -401,14 +431,40 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode, goto out; } else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) { - r = upgrade_mode(dd, mode, t->md); + r = upgrade_mode(dd, mode, interpose, t->md); if (r) return r; } refcount_inc(&dd->count); out: + if (interpose) { + struct block_device *original = dd->dm_dev->bdev; + /* + * Interposer target should cover all underlying device + */ + if (ti->begin != 0) { + DMERR("%s: target offset should be zero for dm interposer", + dm_device_name(t->md)); + r = -EINVAL; + goto fail; + } + if (bdev_nr_sectors(original) != ti->len) { + DMERR("%s: interposer and interposed block device size should be equal", + dm_device_name(t->md)); + r = -EINVAL; + goto fail; + } + } + *result = dd->dm_dev; return 0; +fail: + if (refcount_dec_and_test(&dd->count)) { + dm_put_table_device(t->md, dd->dm_dev); + list_del(&dd->list); + kfree(dd); + } + return r; } EXPORT_SYMBOL(dm_get_device); diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 3f3be9408afa..04142454c4ee 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -149,6 +149,7 @@ EXPORT_SYMBOL_GPL(dm_bio_get_target_bio_nr); #define DMF_DEFERRED_REMOVE 6 #define DMF_SUSPENDED_INTERNALLY 7 #define DMF_POST_SUSPENDING 8 +#define DMF_INTERPOSER_ATTACHED 9 #define DM_NUMA_NODE NUMA_NO_NODE static int dm_numa_node = DM_NUMA_NODE; @@ -757,18 +758,24 @@ static int open_table_device(struct table_device *td, dev_t dev, struct mapped_device *md) { struct block_device *bdev; - + fmode_t mode = td->dm_dev.mode; + void *holder = NULL; int r; BUG_ON(td->dm_dev.bdev); - bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _dm_claim_ptr); + if (!td->dm_dev.interpose) { + mode |= FMODE_EXCL; + holder = _dm_claim_ptr; + } + + bdev = blkdev_get_by_dev(dev, mode, holder); if (IS_ERR(bdev)) return PTR_ERR(bdev); r = bd_link_disk_holder(bdev, dm_disk(md)); if (r) { - blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL); + blkdev_put(bdev, mode); return r; } @@ -782,11 +789,16 @@ static int open_table_device(struct table_device *td, dev_t dev, */ static void close_table_device(struct table_device *td, struct mapped_device *md) { + fmode_t mode = td->dm_dev.mode; + if (!td->dm_dev.bdev) return; bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md)); - blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL); + if (!td->dm_dev.interpose) + mode |= FMODE_EXCL; + blkdev_put(td->dm_dev.bdev, mode); + put_dax(td->dm_dev.dax_dev); td->dm_dev.bdev = NULL; td->dm_dev.dax_dev = NULL; @@ -805,7 +817,7 @@ static struct table_device *find_table_device(struct list_head *l, dev_t dev, } int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode, - struct dm_dev **result) + bool interpose, struct dm_dev **result) { int r; struct table_device *td; @@ -821,6 +833,7 @@ int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode, td->dm_dev.mode = mode; td->dm_dev.bdev = NULL; + td->dm_dev.interpose = interpose; if ((r = open_table_device(td, dev, md))) { mutex_unlock(&md->table_devices_lock); @@ -1496,13 +1509,12 @@ static int __send_empty_flush(struct clone_info *ci) static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti, sector_t sector, unsigned *len) { - struct bio *bio = ci->bio; struct dm_target_io *tio; int r; tio = alloc_tio(ci, ti, 0, GFP_NOIO); tio->len_ptr = len; - r = clone_bio(tio, bio, sector, *len); + r = clone_bio(tio, ci->bio, sector, *len); if (r < 0) { free_tio(tio); return r; @@ -1696,6 +1708,13 @@ static blk_qc_t dm_submit_bio(struct bio *bio) goto out; } + /* + * If md is an interposer, then we must set the BIO_INTERPOSE flag + * so that the request is not re-interposed. + */ + if (md->interpose) + bio_set_flag(bio, BIO_INTERPOSED); + /* If suspended, queue this IO for later */ if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) { if (bio->bi_opf & REQ_NOWAIT) @@ -2410,7 +2429,8 @@ static void dm_queue_flush(struct mapped_device *md) */ struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table) { - struct dm_table *live_map = NULL, *map = ERR_PTR(-EINVAL); + struct dm_table *live_map = NULL; + struct dm_table *map = ERR_PTR(-EINVAL); struct queue_limits limits; int r; @@ -2453,26 +2473,50 @@ struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table) * Functions to lock and unlock any filesystem running on the * device. */ -static int lock_fs(struct mapped_device *md) +static int lock_bdev_fs(struct mapped_device *md, struct block_device *bdev) { int r; WARN_ON(test_bit(DMF_FROZEN, &md->flags)); - r = freeze_bdev(md->disk->part0); + r = freeze_bdev(bdev); if (!r) set_bit(DMF_FROZEN, &md->flags); return r; } -static void unlock_fs(struct mapped_device *md) +static void unlock_bdev_fs(struct mapped_device *md, struct block_device *bdev) { if (!test_bit(DMF_FROZEN, &md->flags)) return; - thaw_bdev(md->disk->part0); + thaw_bdev(bdev); clear_bit(DMF_FROZEN, &md->flags); } +static inline int lock_fs(struct mapped_device *md) +{ + return lock_bdev_fs(md, md->disk->part0); +} + +static inline void unlock_fs(struct mapped_device *md) +{ + unlock_bdev_fs(md, md->disk->part0); +} + +static inline struct block_device *get_interposed_bdev(struct dm_table *t) +{ + struct dm_dev_internal *dd; + + /* + * For interposer should be only one device in dm table + */ + list_for_each_entry(dd, dm_table_get_devices(t), list) + if (dd->dm_dev->interpose) + return bdgrab(dd->dm_dev->bdev); + + return NULL; +} + /* * @suspend_flags: DM_SUSPEND_LOCKFS_FLAG and/or DM_SUSPEND_NOFLUSH_FLAG * @task_state: e.g. TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE @@ -2488,7 +2532,10 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map, { bool do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG; bool noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG; - int r; + bool detach_ip = suspend_flags & DM_SUSPEND_DETACH_IP_FLAG + && md->interpose; + struct block_device *original_bdev = NULL; + int r = 0; lockdep_assert_held(&md->suspend_lock); @@ -2507,18 +2554,50 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map, */ dm_table_presuspend_targets(map); + if (!md->interpose) { + /* + * Flush I/O to the device. + * Any I/O submitted after lock_fs() may not be flushed. + * noflush takes precedence over do_lockfs. + * (lock_fs() flushes I/Os and waits for them to complete.) + */ + if (!noflush && do_lockfs) + r = lock_fs(md); + } else if (map) { + /* + * Interposer should not lock mapped device, but + * should freeze interposed device and lock it. + */ + original_bdev = get_interposed_bdev(map); + if (!original_bdev) { + r = -EINVAL; + DMERR("%s: interposer cannot get interposed device from table", + dm_device_name(md)); + goto presuspend_undo; + } + + if (!noflush && do_lockfs) { + r = lock_bdev_fs(md, original_bdev); + if (r) { + DMERR("%s: interposer cannot freeze interposed device", + dm_device_name(md)); + goto presuspend_undo; + } + } + + bdev_interposer_lock(original_bdev); + } /* - * Flush I/O to the device. - * Any I/O submitted after lock_fs() may not be flushed. - * noflush takes precedence over do_lockfs. - * (lock_fs() flushes I/Os and waits for them to complete.) + * If map is not initialized, then we cannot suspend + * interposed device */ - if (!noflush && do_lockfs) { - r = lock_fs(md); - if (r) { - dm_table_presuspend_undo_targets(map); - return r; - } + +presuspend_undo: + if (r) { + if (original_bdev) + bdput(original_bdev); + dm_table_presuspend_undo_targets(map); + return r; } /* @@ -2559,14 +2638,40 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map, if (map) synchronize_srcu(&md->io_barrier); - /* were we interrupted ? */ - if (r < 0) { + if (r == 0) { /* the wait ended successfully */ + if (md->interpose && original_bdev) { + if (detach_ip) { + bdev_interposer_detach(original_bdev); + clear_bit(DMF_INTERPOSER_ATTACHED, &md->flags); + } + + bdev_interposer_unlock(original_bdev); + + if (detach_ip) { + /* + * If th interposer is detached, then there is + * no reason in keeping the queue of the + * interposed device stopped. + */ + unlock_bdev_fs(md, original_bdev); + } + + bdput(original_bdev); + } + } else { /* were we interrupted ? */ dm_queue_flush(md); if (dm_request_based(md)) dm_start_queue(md->queue); - unlock_fs(md); + if (md->interpose && original_bdev) { + bdev_interposer_unlock(original_bdev); + unlock_bdev_fs(md, original_bdev); + + bdput(original_bdev); + } else + unlock_fs(md); + dm_table_presuspend_undo_targets(map); /* pushback list is already flushed, so skip flush */ } @@ -2574,6 +2679,88 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map, return r; } +int __dm_attach_interposer(struct mapped_device *md) +{ + int r; + struct dm_table *map; + struct block_device *original_bdev = NULL; + + if (dm_interposer_attached(md)) + return 0; + + map = rcu_dereference_protected(md->map, + lockdep_is_held(&md->suspend_lock)); + if (!map) { + DMERR("%s: interposers table is not initialized", + dm_device_name(md)); + return -EINVAL; + } + + original_bdev = get_interposed_bdev(map); + if (!original_bdev) { + DMERR("%s: interposer cannot get interposed device from table", + dm_device_name(md)); + return -EINVAL; + } + + bdev_interposer_lock(original_bdev); + + r = bdev_interposer_attach(original_bdev, dm_disk(md)->part0); + if (r) + DMERR("%s: failed to attach interposer", + dm_device_name(md)); + else + set_bit(DMF_INTERPOSER_ATTACHED, &md->flags); + + bdev_interposer_unlock(original_bdev); + + unlock_bdev_fs(md, original_bdev); + + bdput(original_bdev); + + return r; +} + +int __dm_detach_interposer(struct mapped_device *md) +{ + struct dm_table *map = NULL; + struct block_device *original_bdev; + + if (!dm_interposer_attached(md)) + return 0; + /* + * If mapped device is suspended, but should be detached + * we just detach without freeze fs on interposed device. + */ + map = rcu_dereference_protected(md->map, + lockdep_is_held(&md->suspend_lock)); + if (!map) { + /* + * If table is not initialized then interposed device + * cannot be attached + */ + DMERR("%s: table is not initialized for device", + dm_device_name(md)); + return -EINVAL; + } + + original_bdev = get_interposed_bdev(map); + if (!original_bdev) { + DMERR("%s: interposer cannot get interposed device from table", + dm_device_name(md)); + return -EINVAL; + } + + bdev_interposer_lock(original_bdev); + + bdev_interposer_detach(original_bdev); + clear_bit(DMF_INTERPOSER_ATTACHED, &md->flags); + + bdev_interposer_unlock(original_bdev); + + bdput(original_bdev); + return 0; +} /* * We need to be able to change a mapping table under a mounted * filesystem. For example we might want to move some data in @@ -2599,7 +2786,11 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags) mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING); if (dm_suspended_md(md)) { - r = -EINVAL; + if (suspend_flags & DM_SUSPEND_DETACH_IP_FLAG) + r = __dm_detach_interposer(md); + else + r = -EINVAL; + goto out_unlock; } @@ -2645,8 +2836,10 @@ static int __dm_resume(struct mapped_device *md, struct dm_table *map) if (dm_request_based(md)) dm_start_queue(md->queue); - unlock_fs(md); + if (md->interpose) + return __dm_attach_interposer(md); + unlock_fs(md); return 0; } @@ -2880,6 +3073,11 @@ int dm_suspended_md(struct mapped_device *md) return test_bit(DMF_SUSPENDED, &md->flags); } +int dm_interposer_attached(struct mapped_device *md) +{ + return test_bit(DMF_INTERPOSER_ATTACHED, &md->flags); +} + static int dm_post_suspending_md(struct mapped_device *md) { return test_bit(DMF_POST_SUSPENDING, &md->flags); diff --git a/drivers/md/dm.h b/drivers/md/dm.h index b441ad772c18..35f71e48abd1 100644 --- a/drivers/md/dm.h +++ b/drivers/md/dm.h @@ -28,6 +28,7 @@ */ #define DM_SUSPEND_LOCKFS_FLAG (1 << 0) #define DM_SUSPEND_NOFLUSH_FLAG (1 << 1) +#define DM_SUSPEND_DETACH_IP_FLAG (1 << 2) /* * Status feature flags @@ -122,6 +123,11 @@ int dm_deleting_md(struct mapped_device *md); */ int dm_suspended_md(struct mapped_device *md); +/* + * Is the interposer of this mapped_device is attached? + */ +int dm_interposer_attached(struct mapped_device *md); + /* * Internal suspend and resume methods. */ @@ -180,7 +186,7 @@ int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only int dm_cancel_deferred_remove(struct mapped_device *md); int dm_request_based(struct mapped_device *md); int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode, - struct dm_dev **result); + bool interpose, struct dm_dev **result); void dm_put_table_device(struct mapped_device *md, struct dm_dev *d); int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action, diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 5c641f930caf..3a7abb347702 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -159,6 +159,7 @@ struct dm_dev { struct block_device *bdev; struct dax_device *dax_dev; fmode_t mode; + bool interpose; char name[16]; }; diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h index fcff6669137b..73a5b712cd0d 100644 --- a/include/uapi/linux/dm-ioctl.h +++ b/include/uapi/linux/dm-ioctl.h @@ -362,4 +362,10 @@ enum { */ #define DM_INTERNAL_SUSPEND_FLAG (1 << 18) /* Out */ +/* + * If set, the underlying device should open without FMODE_EXCL + * and attach mapped device via bdev_interposer. + */ +#define DM_INTERPOSE_FLAG (1 << 19) /* In/Out */ + #endif /* _LINUX_DM_IOCTL_H */ From patchwork Fri Apr 9 11:48:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Shtepa X-Patchwork-Id: 12193947 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 370B8C433B4 for ; Fri, 9 Apr 2021 11:48:53 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BB30E610CC for ; Fri, 9 Apr 2021 11:48:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BB30E610CC Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=veeam.com Authentication-Results: mail.kernel.org; spf=tempfail smtp.mailfrom=dm-devel-bounces@redhat.com Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-165-44IF6TsZNhGLzJ9h86ssZw-1; Fri, 09 Apr 2021 07:48:49 -0400 X-MC-Unique: 44IF6TsZNhGLzJ9h86ssZw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id DB7D287A82A; Fri, 9 Apr 2021 11:48:43 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.21]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B590F100164A; Fri, 9 Apr 2021 11:48:43 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 8258F5533E; Fri, 9 Apr 2021 11:48:43 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id 139BmXac000972 for ; Fri, 9 Apr 2021 07:48:34 -0400 Received: by smtp.corp.redhat.com (Postfix) id BD310210FE21; Fri, 9 Apr 2021 11:48:33 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast01.extmail.prod.ext.rdu2.redhat.com [10.11.55.17]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B0D9821493DF for ; Fri, 9 Apr 2021 11:48:33 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-1.mimecast.com [207.211.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6987385A5BD for ; Fri, 9 Apr 2021 11:48:33 +0000 (UTC) Received: from mx2.veeam.com (mx2.veeam.com [64.129.123.6]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-89-Q3ccpOqyMf-tj_GDlbANVA-1; Fri, 09 Apr 2021 07:48:31 -0400 X-MC-Unique: Q3ccpOqyMf-tj_GDlbANVA-1 Received: from mail.veeam.com (prgmbx01.amust.local [172.24.0.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx2.veeam.com (Postfix) with ESMTPS id 461554146C; Fri, 9 Apr 2021 07:48:28 -0400 (EDT) Received: from prgdevlinuxpatch01.amust.local (172.24.14.5) by prgmbx01.amust.local (172.24.0.171) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.721.2; Fri, 9 Apr 2021 13:48:21 +0200 From: Sergei Shtepa To: Christoph Hellwig , Hannes Reinecke , Mike Snitzer , Alasdair Kergon , Alexander Viro , Jens Axboe , , , , Date: Fri, 9 Apr 2021 14:48:04 +0300 Message-ID: <1617968884-15149-5-git-send-email-sergei.shtepa@veeam.com> In-Reply-To: <1617968884-15149-1-git-send-email-sergei.shtepa@veeam.com> References: <1617968884-15149-1-git-send-email-sergei.shtepa@veeam.com> MIME-Version: 1.0 X-Originating-IP: [172.24.14.5] X-ClientProxiedBy: prgmbx01.amust.local (172.24.0.171) To prgmbx01.amust.local (172.24.0.171) X-EsetResult: clean, is OK X-EsetId: 37303A29D2A50B59657262 X-Veeam-MMEX: True X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection Definition; Similar Internal Domain=false; Similar Monitored External Domain=false; Custom External Domain=false; Mimecast External Domain=false; Newly Observed Domain=false; Internal User Name=false; Custom Display Name List=false; Reply-to Address Mismatch=false; Targeted Threat Dictionary=false; Mimecast Threat Dictionary=false; Custom Threat Dictionary=false X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-loop: dm-devel@redhat.com Cc: pavel.tide@veeam.com, sergei.shtepa@veeam.com Subject: [dm-devel] [PATCH v8 4/4] fix origin_map - don't split a bio for the origin device if it does not have registered snapshots. X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Signed-off-by: Sergei Shtepa --- drivers/md/dm-snap.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index 11890db71f3f..81e8e3bb6d25 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -2685,11 +2685,18 @@ static int origin_map(struct dm_target *ti, struct bio *bio) if (bio_data_dir(bio) != WRITE) return DM_MAPIO_REMAPPED; - available_sectors = o->split_boundary - - ((unsigned)bio->bi_iter.bi_sector & (o->split_boundary - 1)); + /* + * If no snapshot is connected to origin, then split_boundary + * will be set to zero. In this case, we don't need to split a bio. + */ + if (o->split_boundary) { + available_sectors = o->split_boundary - + ((unsigned int)bio->bi_iter.bi_sector & + (o->split_boundary - 1)); - if (bio_sectors(bio) > available_sectors) - dm_accept_partial_bio(bio, available_sectors); + if (bio_sectors(bio) > available_sectors) + dm_accept_partial_bio(bio, available_sectors); + } /* Only tell snapshots if this is a write */ return do_origin(o->dev, bio, true);