From patchwork Wed Mar 3 12:30:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Shtepa X-Patchwork-Id: 12113359 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6DDFAC433DB for ; Wed, 3 Mar 2021 12:31:20 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D7E2164ECF for ; Wed, 3 Mar 2021 12:31:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D7E2164ECF Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=veeam.com Authentication-Results: mail.kernel.org; spf=tempfail smtp.mailfrom=dm-devel-bounces@redhat.com Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-74-ebUeCRSuOQSspRvU8kLzCw-1; Wed, 03 Mar 2021 07:31:16 -0500 X-MC-Unique: ebUeCRSuOQSspRvU8kLzCw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D9E84800D55; Wed, 3 Mar 2021 12:31:11 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 
B466E5C261; Wed, 3 Mar 2021 12:31:11 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 7C81A1809C8F; Wed, 3 Mar 2021 12:31:11 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id 123CUp9k011543 for ; Wed, 3 Mar 2021 07:30:51 -0500 Received: by smtp.corp.redhat.com (Postfix) id 360A32B5300; Wed, 3 Mar 2021 12:30:51 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast05.extmail.prod.ext.rdu2.redhat.com [10.11.55.21]) by smtp.corp.redhat.com (Postfix) with ESMTPS id BCB1F2B2B10 for ; Wed, 3 Mar 2021 12:30:48 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com [205.139.110.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2547E862FFE for ; Wed, 3 Mar 2021 12:30:44 +0000 (UTC) Received: from mx2.veeam.com (mx2.veeam.com [64.129.123.6]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-370-RWdslpKcOG25uGHvuTP-3g-1; Wed, 03 Mar 2021 07:30:42 -0500 X-MC-Unique: RWdslpKcOG25uGHvuTP-3g-1 Received: from mail.veeam.com (prgmbx01.amust.local [172.24.0.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx2.veeam.com (Postfix) with ESMTPS id 42C0E41501; Wed, 3 Mar 2021 07:30:38 -0500 (EST) Received: from prgdevlinuxpatch01.amust.local (172.24.14.5) by prgmbx01.amust.local (172.24.0.171) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.721.2; Wed, 3 Mar 2021 13:30:36 +0100 From: Sergei Shtepa To: , , , , , , , , , Date: Wed, 3 Mar 2021 15:30:15 +0300 Message-ID: <1614774618-22410-2-git-send-email-sergei.shtepa@veeam.com> In-Reply-To: 
<1614774618-22410-1-git-send-email-sergei.shtepa@veeam.com> References: <1614774618-22410-1-git-send-email-sergei.shtepa@veeam.com> MIME-Version: 1.0 X-Originating-IP: [172.24.14.5] X-ClientProxiedBy: prgmbx01.amust.local (172.24.0.171) To prgmbx01.amust.local (172.24.0.171) X-EsetResult: clean, is OK X-EsetId: 37303A29C604D265637363 X-Veeam-MMEX: True X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection Definition; Similar Internal Domain=false; Similar Monitored External Domain=false; Custom External Domain=false; Mimecast External Domain=false; Newly Observed Domain=false; Internal User Name=false; Custom Display Name List=false; Reply-to Address Mismatch=false; Targeted Threat Dictionary=false; Mimecast Threat Dictionary=false; Custom Threat Dictionary=false X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-loop: dm-devel@redhat.com Cc: pavel.tide@veeam.com, sergei.shtepa@veeam.com Subject: [dm-devel] [PATCH v6 1/4] block: add blk_mq_is_queue_frozen() X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com blk_mq_is_queue_frozen() allows a caller to assert that the queue is frozen. 
Signed-off-by: Sergei Shtepa --- block/blk-mq.c | 12 ++++++++++++ include/linux/blk-mq.h | 1 + 2 files changed, 13 insertions(+) diff --git a/block/blk-mq.c b/block/blk-mq.c index d4d7c1caa439..d5e7122789fc 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -161,6 +161,18 @@ int blk_mq_freeze_queue_wait_timeout(struct request_queue *q, } EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout); +bool blk_mq_is_queue_frozen(struct request_queue *q) +{ + bool ret; + + mutex_lock(&q->mq_freeze_lock); + ret = percpu_ref_is_dying(&q->q_usage_counter) && percpu_ref_is_zero(&q->q_usage_counter); + mutex_unlock(&q->mq_freeze_lock); + + return ret; +} +EXPORT_SYMBOL_GPL(blk_mq_is_queue_frozen); + /* * Guarantee no request is in use, so we can change any data structure of * the queue afterward. diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 2c473c9b8990..6f01971abf7b 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -533,6 +533,7 @@ void blk_freeze_queue_start(struct request_queue *q); void blk_mq_freeze_queue_wait(struct request_queue *q); int blk_mq_freeze_queue_wait_timeout(struct request_queue *q, unsigned long timeout); +bool blk_mq_is_queue_frozen(struct request_queue *q); int blk_mq_map_queues(struct blk_mq_queue_map *qmap); void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues); From patchwork Wed Mar 3 12:30:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Shtepa X-Patchwork-Id: 12113357 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org 
[198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51066C433E6 for ; Wed, 3 Mar 2021 12:31:05 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BED3B64ECF for ; Wed, 3 Mar 2021 12:31:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BED3B64ECF Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=veeam.com Authentication-Results: mail.kernel.org; spf=tempfail smtp.mailfrom=dm-devel-bounces@redhat.com Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-336-p6Y9RPFWMB2MqDtT-5Uknw-1; Wed, 03 Mar 2021 07:31:00 -0500 X-MC-Unique: p6Y9RPFWMB2MqDtT-5Uknw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B799251F7; Wed, 3 Mar 2021 12:30:55 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 24C2010016FA; Wed, 3 Mar 2021 12:30:55 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id C01EA18089CC; Wed, 3 Mar 2021 12:30:52 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id 123CUp0A011533 for ; Wed, 3 Mar 2021 07:30:51 -0500 Received: by smtp.corp.redhat.com (Postfix) id EA6991302A9A; Wed, 3 Mar 2021 12:30:50 +0000 (UTC) Received: from mimecast-mx02.redhat.com 
(mimecast03.extmail.prod.ext.rdu2.redhat.com [10.11.55.19]) by smtp.corp.redhat.com (Postfix) with ESMTPS id A861B1302A71 for ; Wed, 3 Mar 2021 12:30:50 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-2.mimecast.com [207.211.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 857DE805F27 for ; Wed, 3 Mar 2021 12:30:50 +0000 (UTC) Received: from mx4.veeam.com (mx4.veeam.com [104.41.138.86]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-383-f2V8cH79N_qld5rp58SQpA-1; Wed, 03 Mar 2021 07:30:45 -0500 X-MC-Unique: f2V8cH79N_qld5rp58SQpA-1 Received: from mail.veeam.com (prgmbx01.amust.local [172.24.0.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx4.veeam.com (Postfix) with ESMTPS id 488D8114A96; Wed, 3 Mar 2021 15:30:43 +0300 (MSK) Received: from prgdevlinuxpatch01.amust.local (172.24.14.5) by prgmbx01.amust.local (172.24.0.171) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.721.2; Wed, 3 Mar 2021 13:30:41 +0100 From: Sergei Shtepa To: , , , , , , , , , Date: Wed, 3 Mar 2021 15:30:16 +0300 Message-ID: <1614774618-22410-3-git-send-email-sergei.shtepa@veeam.com> In-Reply-To: <1614774618-22410-1-git-send-email-sergei.shtepa@veeam.com> References: <1614774618-22410-1-git-send-email-sergei.shtepa@veeam.com> MIME-Version: 1.0 X-Originating-IP: [172.24.14.5] X-ClientProxiedBy: prgmbx01.amust.local (172.24.0.171) To prgmbx01.amust.local (172.24.0.171) X-EsetResult: clean, is OK X-EsetId: 37303A29C604D265637363 X-Veeam-MMEX: True X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection Definition; Similar Internal Domain=false; Similar Monitored External Domain=false; Custom External Domain=false; Mimecast External Domain=false; Newly Observed Domain=false; Internal User Name=false; Custom Display Name List=false; 
Reply-to Address Mismatch=false; Targeted Threat Dictionary=false; Mimecast Threat Dictionary=false; Custom Threat Dictionary=false X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-loop: dm-devel@redhat.com Cc: pavel.tide@veeam.com, sergei.shtepa@veeam.com Subject: [dm-devel] [PATCH v6 2/4] block: add blk_interposer X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com blk_interposer allows intercepting bio requests, remapping bios to other devices, or adding new bios. Signed-off-by: Sergei Shtepa --- block/bio.c | 2 + block/blk-core.c | 36 +++++++++++++++ block/genhd.c | 93 +++++++++++++++++++++++++++++++++++++++ include/linux/blk_types.h | 4 ++ include/linux/blkdev.h | 17 +++++++ 5 files changed, 152 insertions(+) diff --git a/block/bio.c b/block/bio.c index a1c4d2900c7a..0bfbf06475ee 100644 --- a/block/bio.c +++ b/block/bio.c @@ -640,6 +640,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src) bio_set_flag(bio, BIO_THROTTLED); if (bio_flagged(bio_src, BIO_REMAPPED)) bio_set_flag(bio, BIO_REMAPPED); + if (bio_flagged(bio_src, BIO_INTERPOSED)) + bio_set_flag(bio, BIO_INTERPOSED); bio->bi_opf = bio_src->bi_opf; bio->bi_ioprio = bio_src->bi_ioprio; bio->bi_write_hint = bio_src->bi_write_hint; diff --git a/block/blk-core.c b/block/blk-core.c index fc60ff208497..e749507cadd3 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1018,6 +1018,34 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio) return ret; } +static blk_qc_t __submit_bio_interposed(struct bio *bio) +{ + struct bio_list bio_list[2] = { }; + blk_qc_t ret = BLK_QC_T_NONE; + 
+ current->bio_list = bio_list; + if (likely(bio_queue_enter(bio) == 0)) { + struct block_device *bdev = bio->bi_bdev; + + if (likely(bdev_has_interposer(bdev))) { + bio_set_flag(bio, BIO_INTERPOSED); + bdev->bd_interposer->ip_submit_bio(bio); + } else { + /* interposer was removed */ + bio_list_add(&current->bio_list[0], bio); + } + + blk_queue_exit(bdev->bd_disk->queue); + } + current->bio_list = NULL; + + /* Resubmit remaining bios */ + while ((bio = bio_list_pop(&bio_list[0]))) + ret = submit_bio_noacct(bio); + + return ret; +} + /** * submit_bio_noacct - re-submit a bio to the block device layer for I/O * @bio: The bio describing the location in memory and on the device. @@ -1043,6 +1071,14 @@ blk_qc_t submit_bio_noacct(struct bio *bio) return BLK_QC_T_NONE; } + /* + * Checking the BIO_INTERPOSED flag is necessary so that bios + * created by the bdev_interposer are not passed back to it for processing. + */ + if (bdev_has_interposer(bio->bi_bdev) && + !bio_flagged(bio, BIO_INTERPOSED)) + return __submit_bio_interposed(bio); + if (!bio->bi_bdev->bd_disk->fops->submit_bio) return __submit_bio_noacct_mq(bio); return __submit_bio_noacct(bio); diff --git a/block/genhd.c b/block/genhd.c index fcc530164b5a..1ae8516643c8 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -30,6 +30,11 @@ static struct kobject *block_depr; DECLARE_RWSEM(bdev_lookup_sem); +/* + * Prevents different block-layer interposers from attaching to or detaching + * from the block device at the same time. 
+ */ +DEFINE_MUTEX(bdev_interposer_attach_lock); /* for extended dynamic devt allocation, currently only one major is used */ #define NR_EXT_DEVT (1 << MINORBITS) @@ -1941,3 +1946,91 @@ static void disk_release_events(struct gendisk *disk) WARN_ON_ONCE(disk->ev && disk->ev->block != 1); kfree(disk->ev); } + +/** + * bdev_interposer_attach - Attach interposer to block device + * @bdev: target block device + * @interposer: block device interposer + * @ip_submit_bio: hook for submit_bio() + * + * Returns: + * -EINVAL if @interposer is NULL. + * -EPERM if the queue is not frozen. + * -EBUSY if the block device already has an interposer with a different callback. + * -EALREADY if the block device already has an interposer with the same callback. + * -ENODEV if the block device cannot be referenced. + * + * Disk must be frozen by blk_mq_freeze_queue(). + */ +int bdev_interposer_attach(struct block_device *bdev, struct bdev_interposer *interposer, + const ip_submit_bio_t ip_submit_bio) +{ + int ret = 0; + + if (WARN_ON(!interposer)) + return -EINVAL; + + if (!blk_mq_is_queue_frozen(bdev->bd_disk->queue)) + return -EPERM; + + mutex_lock(&bdev_interposer_attach_lock); + if (bdev_has_interposer(bdev)) { + if (bdev->bd_interposer->ip_submit_bio == ip_submit_bio) + ret = -EALREADY; + else + ret = -EBUSY; + goto out; + } + + interposer->ip_submit_bio = ip_submit_bio; + + interposer->bdev = bdgrab(bdev); + if (!interposer->bdev) { + ret = -ENODEV; + goto out; + } + + bdev->bd_interposer = interposer; +out: + mutex_unlock(&bdev_interposer_attach_lock); + + return ret; +} +EXPORT_SYMBOL_GPL(bdev_interposer_attach); + +/** + * bdev_interposer_detach - Detach interposer from block device + * @interposer: block device interposer + * @ip_submit_bio: hook for submit_bio() + * + * Disk must be frozen by blk_mq_freeze_queue(). 
+ */ +void bdev_interposer_detach(struct bdev_interposer *interposer, + const ip_submit_bio_t ip_submit_bio) +{ + struct block_device *bdev; + + if (WARN_ON(!interposer)) + return; + + mutex_lock(&bdev_interposer_attach_lock); + + /* Check if the interposer is still active. */ + bdev = interposer->bdev; + if (WARN_ON(!bdev)) + goto out; + + if (WARN_ON(!blk_mq_is_queue_frozen(bdev->bd_disk->queue))) + goto out; + + /* Check if it is really our interposer. */ + if (WARN_ON(bdev->bd_interposer->ip_submit_bio != ip_submit_bio)) + goto out; + + bdev->bd_interposer = NULL; + interposer->bdev = NULL; + bdput(bdev); +out: + mutex_unlock(&bdev_interposer_attach_lock); +} +EXPORT_SYMBOL_GPL(bdev_interposer_detach); diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index db026b6ec15a..2b43f65bb356 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -19,6 +19,7 @@ struct io_context; struct cgroup_subsys_state; typedef void (bio_end_io_t) (struct bio *); struct bio_crypt_ctx; +struct bdev_interposer; struct block_device { sector_t bd_start_sect; @@ -46,6 +47,7 @@ struct block_device { spinlock_t bd_size_lock; /* for bd_inode->i_size updates */ struct gendisk * bd_disk; struct backing_dev_info *bd_bdi; + struct bdev_interposer * bd_interposer; /* The counter of freeze processes */ int bd_fsfreeze_count; @@ -304,6 +306,8 @@ enum { BIO_CGROUP_ACCT, /* has been accounted to a cgroup */ BIO_TRACKED, /* set if bio goes through the rq_qos path */ BIO_REMAPPED, + BIO_INTERPOSED, /* bio has been interposed and can be moved to + * a different block device */ BIO_FLAG_LAST }; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c032cfe133c7..82f8515fa3c8 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -2033,4 +2033,21 @@ int fsync_bdev(struct block_device *bdev); int freeze_bdev(struct block_device *bdev); int thaw_bdev(struct block_device *bdev); +/* + * block layer interposers structure and functions + */ +typedef 
void (*ip_submit_bio_t) (struct bio *bio); + +struct bdev_interposer { + ip_submit_bio_t ip_submit_bio; + struct block_device *bdev; +}; + +#define bdev_has_interposer(bd) ((bd)->bd_interposer != NULL) + +int bdev_interposer_attach(struct block_device *bdev, struct bdev_interposer *interposer, + const ip_submit_bio_t ip_submit_bio); +void bdev_interposer_detach(struct bdev_interposer *interposer, + const ip_submit_bio_t ip_submit_bio); + #endif /* _LINUX_BLKDEV_H */ From patchwork Wed Mar 3 12:30:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Shtepa X-Patchwork-Id: 12113361 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2512C433E6 for ; Wed, 3 Mar 2021 12:31:21 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 137D264EBD for ; Wed, 3 Mar 2021 12:31:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 137D264EBD Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=veeam.com Authentication-Results: mail.kernel.org; spf=tempfail smtp.mailfrom=dm-devel-bounces@redhat.com Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-436-hUTW6MglMteGLbI9pVDM5A-1; Wed, 03 Mar 2021 07:31:17 -0500 X-MC-Unique: 
hUTW6MglMteGLbI9pVDM5A-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 028D6100CCC4; Wed, 3 Mar 2021 12:31:12 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.21]) by smtp.corp.redhat.com (Postfix) with ESMTPS id D889D5D9C2; Wed, 3 Mar 2021 12:31:11 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 9FBB95002F; Wed, 3 Mar 2021 12:31:11 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id 123CUvNO011586 for ; Wed, 3 Mar 2021 07:30:57 -0500 Received: by smtp.corp.redhat.com (Postfix) id 3873921121A3; Wed, 3 Mar 2021 12:30:57 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast01.extmail.prod.ext.rdu2.redhat.com [10.11.55.17]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 32BE221121A1 for ; Wed, 3 Mar 2021 12:30:57 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-1.mimecast.com [207.211.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 14ABE858F13 for ; Wed, 3 Mar 2021 12:30:57 +0000 (UTC) Received: from mx2.veeam.com (mx2.veeam.com [64.129.123.6]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-37-2s328oH9NSiu22V-EpPNDw-1; Wed, 03 Mar 2021 07:30:52 -0500 X-MC-Unique: 2s328oH9NSiu22V-EpPNDw-1 Received: from mail.veeam.com (prgmbx01.amust.local [172.24.0.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx2.veeam.com (Postfix) with ESMTPS id 0622741352; Wed, 3 Mar 2021 07:30:49 
-0500 (EST) Received: from prgdevlinuxpatch01.amust.local (172.24.14.5) by prgmbx01.amust.local (172.24.0.171) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.721.2; Wed, 3 Mar 2021 13:30:46 +0100 From: Sergei Shtepa To: , , , , , , , , , Date: Wed, 3 Mar 2021 15:30:17 +0300 Message-ID: <1614774618-22410-4-git-send-email-sergei.shtepa@veeam.com> In-Reply-To: <1614774618-22410-1-git-send-email-sergei.shtepa@veeam.com> References: <1614774618-22410-1-git-send-email-sergei.shtepa@veeam.com> MIME-Version: 1.0 X-Originating-IP: [172.24.14.5] X-ClientProxiedBy: prgmbx01.amust.local (172.24.0.171) To prgmbx01.amust.local (172.24.0.171) X-EsetResult: clean, is OK X-EsetId: 37303A29C604D265637363 X-Veeam-MMEX: True X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection Definition; Similar Internal Domain=false; Similar Monitored External Domain=false; Custom External Domain=false; Mimecast External Domain=false; Newly Observed Domain=false; Internal User Name=false; Custom Display Name List=false; Reply-to Address Mismatch=false; Targeted Threat Dictionary=false; Mimecast Threat Dictionary=false; Custom Threat Dictionary=false X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-loop: dm-devel@redhat.com Cc: pavel.tide@veeam.com, sergei.shtepa@veeam.com Subject: [dm-devel] [PATCH v6 3/4] dm: introduce dm-interposer X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com dm-interposer.c/.h contain code for working with blk_interposer and provide an interposer API for device-mapper. 
Signed-off-by: Sergei Shtepa --- drivers/md/Makefile | 2 +- drivers/md/dm-interposer.c | 258 +++++++++++++++++++++++++++++++++++++ drivers/md/dm-interposer.h | 40 ++++++ 3 files changed, 299 insertions(+), 1 deletion(-) create mode 100644 drivers/md/dm-interposer.c create mode 100644 drivers/md/dm-interposer.h diff --git a/drivers/md/Makefile b/drivers/md/Makefile index ef7ddc27685c..bd5b38bee82e 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -5,7 +5,7 @@ dm-mod-y += dm.o dm-table.o dm-target.o dm-linear.o dm-stripe.o \ dm-ioctl.o dm-io.o dm-kcopyd.o dm-sysfs.o dm-stats.o \ - dm-rq.o + dm-rq.o dm-interposer.o dm-multipath-y += dm-path-selector.o dm-mpath.o dm-historical-service-time-y += dm-ps-historical-service-time.o dm-io-affinity-y += dm-ps-io-affinity.o diff --git a/drivers/md/dm-interposer.c b/drivers/md/dm-interposer.c new file mode 100644 index 000000000000..e5346db81def --- /dev/null +++ b/drivers/md/dm-interposer.c @@ -0,0 +1,258 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include + +#include "dm-core.h" +#include "dm-interposer.h" + +#define DM_MSG_PREFIX "interposer" + +struct dm_interposer { + struct bdev_interposer blk_ip; + + struct kref kref; + struct rw_semaphore ip_devs_lock; + struct rb_root_cached ip_devs_root; /* dm_interposed_dev tree, since there can be multiple + * interceptors for different ranges for a single + * block device. 
*/ +}; + +/* + * Interval tree for device mapper + */ +#define START(node) ((node)->start) +#define LAST(node) ((node)->last) +INTERVAL_TREE_DEFINE(struct dm_rb_range, node, sector_t, _subtree_last, + START, LAST,, dm_rb); + +static DEFINE_MUTEX(dm_interposer_attach_lock); + +static void dm_submit_bio_interposer_fn(struct bio *bio) +{ + struct dm_interposer *ip; + unsigned int noio_flag = 0; + sector_t start; + sector_t last; + struct dm_rb_range *node; + + ip = container_of(bio->bi_bdev->bd_interposer, struct dm_interposer, blk_ip); + + start = bio->bi_iter.bi_sector; + if (bio_flagged(bio, BIO_REMAPPED)) + start -= get_start_sect(bio->bi_bdev); + last = start + dm_sector_div_up(bio->bi_iter.bi_size, SECTOR_SIZE); + + noio_flag = memalloc_noio_save(); + down_read(&ip->ip_devs_lock); + node = dm_rb_iter_first(&ip->ip_devs_root, start, last); + while (node) { + struct dm_interposed_dev *ip_dev = + container_of(node, struct dm_interposed_dev, node); + + atomic64_inc(&ip_dev->ip_cnt); + ip_dev->dm_interpose_bio(ip_dev, bio); + + node = dm_rb_iter_next(node, start, last); + } + up_read(&ip->ip_devs_lock); + memalloc_noio_restore(noio_flag); +} + +void dm_interposer_free(struct kref *kref) +{ + struct dm_interposer *ip = container_of(kref, struct dm_interposer, kref); + + bdev_interposer_detach(&ip->blk_ip, dm_submit_bio_interposer_fn); + + kfree(ip); +} + +struct dm_interposer *dm_interposer_new(struct block_device *bdev) +{ + int ret = 0; + struct dm_interposer *ip; + + ip = kzalloc(sizeof(struct dm_interposer), GFP_NOIO); + if (!ip) + return ERR_PTR(-ENOMEM); + + kref_init(&ip->kref); + init_rwsem(&ip->ip_devs_lock); + ip->ip_devs_root = RB_ROOT_CACHED; + + ret = bdev_interposer_attach(bdev, &ip->blk_ip, dm_submit_bio_interposer_fn); + if (ret) { + DMERR("Failed to attach bdev_interposer."); + kref_put(&ip->kref, dm_interposer_free); + return ERR_PTR(ret); + } + + return ip; +} + +static struct dm_interposer *dm_interposer_get(struct block_device *bdev) +{ + struct 
dm_interposer *ip; + + if (!bdev_has_interposer(bdev)) + return NULL; + + if (bdev->bd_interposer->ip_submit_bio != dm_submit_bio_interposer_fn) { + DMERR("Block device interposer slot is already occupied."); + return ERR_PTR(-EBUSY); + } + + ip = container_of(bdev->bd_interposer, struct dm_interposer, blk_ip); + + kref_get(&ip->kref); + return ip; +} + +static inline void dm_disk_freeze(struct gendisk *disk) +{ + blk_mq_freeze_queue(disk->queue); + blk_mq_quiesce_queue(disk->queue); +} + +static inline void dm_disk_unfreeze(struct gendisk *disk) +{ + blk_mq_unquiesce_queue(disk->queue); + blk_mq_unfreeze_queue(disk->queue); +} + +/** + * dm_interposer_dev_init - initialize interposed device + * @ip_dev: interposed device + * @ofs: offset from the beginning of the block device + * @len: the length of the part of the block device to which requests will be interposed + * @private: caller-defined private data + * @interpose_fn: interposing callback + * + * Initialize structure dm_interposed_dev. + * To interpose part of a block device, set ofs and len. + * To interpose the whole device, set ofs=0 and len=0. + */ +void dm_interposer_dev_init(struct dm_interposed_dev *ip_dev, + sector_t ofs, sector_t len, + void *private, dm_interpose_bio_t interpose_fn) +{ + ip_dev->node.start = ofs; + ip_dev->node.last = ofs + len - 1; + ip_dev->dm_interpose_bio = interpose_fn; + ip_dev->private = private; + + atomic64_set(&ip_dev->ip_cnt, 0); +} + +/** + * dm_interposer_dev_attach - attach interposed device to its block device + * @bdev: block device + * @ip_dev: interposed device + * + * Return error code. 
+ */ +int dm_interposer_dev_attach(struct block_device *bdev, struct dm_interposed_dev *ip_dev) +{ + int ret = 0; + struct dm_interposer *ip = NULL; + unsigned int noio_flag = 0; + + if (!ip_dev) + return -EINVAL; + + dm_disk_freeze(bdev->bd_disk); + mutex_lock(&dm_interposer_attach_lock); + noio_flag = memalloc_noio_save(); + + ip = dm_interposer_get(bdev); + if (ip == NULL) + ip = dm_interposer_new(bdev); + if (IS_ERR(ip)) { + ret = PTR_ERR(ip); + goto out; + } + + /* Attach dm_interposed_dev to dm_interposer */ + down_write(&ip->ip_devs_lock); + do { + struct dm_rb_range *node; + + /* check whether an ip_dev already exists for this region */ + node = dm_rb_iter_first(&ip->ip_devs_root, ip_dev->node.start, ip_dev->node.last); + if (node) { + DMERR("Block device in region [%llu,%llu] already has an interposer.", + node->start, node->last); + + ret = -EBUSY; + break; + } + + /* insert ip_dev into the ip tree */ + dm_rb_insert(&ip_dev->node, &ip->ip_devs_root); + /* increment ip reference counter */ + kref_get(&ip->kref); + } while (false); + up_write(&ip->ip_devs_lock); + + kref_put(&ip->kref, dm_interposer_free); + +out: + memalloc_noio_restore(noio_flag); + mutex_unlock(&dm_interposer_attach_lock); + dm_disk_unfreeze(bdev->bd_disk); + + return ret; +} + +/** + * dm_interposer_detach_dev - detach interposed device from its block device + * @bdev: block device + * @ip_dev: interposed device + * + * Return error code. 
+ */ +int dm_interposer_detach_dev(struct block_device *bdev, struct dm_interposed_dev *ip_dev) +{ + int ret = 0; + struct dm_interposer *ip = NULL; + unsigned int noio_flag = 0; + + if (!ip_dev) + return -EINVAL; + + dm_disk_freeze(bdev->bd_disk); + mutex_lock(&dm_interposer_attach_lock); + noio_flag = memalloc_noio_save(); + + ip = dm_interposer_get(bdev); + if (IS_ERR(ip)) { + ret = PTR_ERR(ip); + DMERR("Interposer not found."); + goto out; + } + if (unlikely(ip == NULL)) { + ret = -ENXIO; + DMERR("Interposer not found."); + goto out; + } + + down_write(&ip->ip_devs_lock); + { + dm_rb_remove(&ip_dev->node, &ip->ip_devs_root); + /* the reference counter here cannot be zero */ + kref_put(&ip->kref, dm_interposer_free); + } + up_write(&ip->ip_devs_lock); + + /* detach and free interposer if it's not needed */ + kref_put(&ip->kref, dm_interposer_free); +out: + memalloc_noio_restore(noio_flag); + mutex_unlock(&dm_interposer_attach_lock); + dm_disk_unfreeze(bdev->bd_disk); + + return ret; +} diff --git a/drivers/md/dm-interposer.h b/drivers/md/dm-interposer.h new file mode 100644 index 000000000000..17a5411f6f00 --- /dev/null +++ b/drivers/md/dm-interposer.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Device mapper's interposer. + */ + +#include + +struct dm_rb_range { + struct rb_node node; + sector_t start; /* start sector of rb node */ + sector_t last; /* end sector of rb node */ + sector_t _subtree_last; /* highest sector in subtree of rb node */ +}; + +typedef void (*dm_interpose_bio_t) (struct dm_interposed_dev *ip_dev, struct bio *bio); + +struct dm_interposed_dev { + struct dm_rb_range node; + void *private; + dm_interpose_bio_t dm_interpose_bio; + + atomic64_t ip_cnt; /* for debug purpose only */ +}; + +/* + * Initialize structure dm_interposed_dev. + * For interposing part of block device set ofs and len. + * For interposing whole device set ofs=0 and len=0. 
+ */
+void dm_interposer_dev_init(struct dm_interposed_dev *ip_dev,
+			    sector_t ofs, sector_t len,
+			    void *private, dm_interpose_bio_t interpose_fn);
+/*
+ * Attach interposer to its block device.
+ */
+int dm_interposer_dev_attach(struct block_device *bdev, struct dm_interposed_dev *ip_dev);
+/*
+ * Detach interposer from its block device.
+ */
+int dm_interposer_detach_dev(struct block_device *bdev, struct dm_interposed_dev *ip_dev);

From patchwork Wed Mar 3 12:30:18 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sergei Shtepa
X-Patchwork-Id: 12113363
X-Patchwork-Delegate: snitzer@redhat.com
From: Sergei Shtepa
Date: Wed, 3 Mar 2021 15:30:18 +0300
Message-ID: <1614774618-22410-5-git-send-email-sergei.shtepa@veeam.com>
In-Reply-To: <1614774618-22410-1-git-send-email-sergei.shtepa@veeam.com>
References: <1614774618-22410-1-git-send-email-sergei.shtepa@veeam.com>
Cc: pavel.tide@veeam.com, sergei.shtepa@veeam.com
Subject: [dm-devel] [PATCH v6 4/4] dm: add DM_INTERPOSED_FLAG
List-Id: device-mapper development

DM_INTERPOSED_FLAG allows dm targets to be created "on the fly".
The underlying block device is opened without the FMODE_EXCL flag.
The dm target receives bios from the original device via blk_interposer.

Signed-off-by: Sergei Shtepa
---
 drivers/md/dm-core.h          |   6 ++
 drivers/md/dm-ioctl.c         |   9 +++
 drivers/md/dm-table.c         | 115 +++++++++++++++++++++++++++++++---
 drivers/md/dm.c               |  38 +++++++----
 include/linux/device-mapper.h |   1 +
 include/uapi/linux/dm-ioctl.h |   6 ++
 6 files changed, 154 insertions(+), 21 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 5953ff2bd260..e5c845f9b1df 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -21,6 +21,8 @@
 
 #define DM_RESERVED_MAX_IOS		1024
 
+struct dm_interposed_dev;
+
 struct dm_kobject_holder {
 	struct kobject kobj;
 	struct completion completion;
@@ -114,6 +116,10 @@ struct mapped_device {
 	bool init_tio_pdu:1;
 
 	struct srcu_struct io_barrier;
+
+	/* for interposer logic */
+	bool is_interposed;
+	struct dm_interposed_dev *ip_dev;
 };
 
 void disable_discard(struct mapped_device *md);
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 5e306bba4375..2bcb316144a1 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1267,6 +1267,11 @@ static inline fmode_t get_mode(struct dm_ioctl *param)
 	return mode;
 }
 
+static inline bool get_interposer_flag(struct dm_ioctl *param)
+{
+	return (param->flags & DM_INTERPOSED_FLAG);
+}
+
 static int next_target(struct dm_target_spec *last, uint32_t next, void *end,
 		       struct dm_target_spec **spec, char **target_params)
 {
@@ -1338,6 +1343,8 @@ static int table_load(struct file *filp, struct dm_ioctl *param, size_t param_si
 	if (!md)
 		return -ENXIO;
 
+	md->is_interposed = get_interposer_flag(param);
+
 	r = dm_table_create(&t, get_mode(param), param->target_count, md);
 	if (r)
 		goto err;
@@ -2098,6 +2105,8 @@ int __init dm_early_create(struct dm_ioctl *dmi,
 	if (r)
 		goto err_hash_remove;
 
+	md->is_interposed = get_interposer_flag(dmi);
+
 	/* add targets */
 	for (i = 0; i < dmi->target_count; i++) {
 		r = dm_table_add_target(t, spec_array[i]->target_type,
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 95391f78b8d5..0b2f9b66ade5 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -6,6 +6,7 @@
  */
 
 #include "dm-core.h"
+#include "dm-interposer.h"
 
 #include
 #include
@@ -225,12 +226,13 @@ void dm_table_destroy(struct dm_table *t)
 /*
  * See if we've already got a device in the list.
  */
-static struct dm_dev_internal *find_device(struct list_head *l, dev_t dev)
+static struct dm_dev_internal *find_device(struct list_head *l, dev_t dev, bool is_interposed)
 {
 	struct dm_dev_internal *dd;
 
 	list_for_each_entry (dd, l, list)
-		if (dd->dm_dev->bdev->bd_dev == dev)
+		if ((dd->dm_dev->bdev->bd_dev == dev) &&
+		    (dd->dm_dev->is_interposed == is_interposed))
 			return dd;
 
 	return NULL;
@@ -358,6 +360,90 @@ dev_t dm_get_dev_t(const char *path)
 }
 EXPORT_SYMBOL_GPL(dm_get_dev_t);
 
+/*
+ * Redirect bio from interposed device to dm device
+ */
+static void dm_interpose_fn(struct dm_interposed_dev *ip_dev, struct bio *bio)
+{
+	struct mapped_device *md = ip_dev->private;
+
+	if (bio_flagged(bio, BIO_REMAPPED)) {
+		/*
+		 * Since bio has already been remapped, we need to subtract
+		 * the block device offset from the beginning of the disk.
+		 */
+		bio->bi_iter.bi_sector -= get_start_sect(bio->bi_bdev);
+
+		bio_clear_flag(bio, BIO_REMAPPED);
+	}
+
+	/*
+	 * Set acceptor device.
+	 * It is quite convenient that device mapper creates
+	 * one disk for one block device.
+	 */
+	bio->bi_bdev = md->disk->part0;
+
+	/*
+	 * Bio should be resubmitted.
+	 * The bio will be checked again and placed in current->bio_list.
+	 */
+	submit_bio_noacct(bio);
+}
+
+static int _interposer_dev_create(struct block_device *bdev, sector_t ofs, sector_t len,
+				  struct mapped_device *md)
+{
+	int ret;
+
+	DMDEBUG("Create dm interposer.");
+
+	if (md->ip_dev) {
+		DMERR("The dm interposer device is already in use.");
+		return -EALREADY;
+	}
+
+	if ((ofs + len) > bdev_nr_sectors(bdev)) {
+		DMERR("The specified range of sectors exceeds the size of the block device.");
+		return -ERANGE;
+	}
+
+	md->ip_dev = kzalloc(sizeof(struct dm_interposed_dev), GFP_KERNEL);
+	if (!md->ip_dev)
+		return -ENOMEM;
+
+	if ((ofs == 0) && (len == 0))
+		DMDEBUG("Whole block device should be interposed.");
+
+	dm_interposer_dev_init(md->ip_dev,
+			       ofs, len,
+			       md, dm_interpose_fn);
+
+	ret = dm_interposer_dev_attach(bdev, md->ip_dev);
+	if (ret) {
+		DMERR("Cannot attach dm interposer device.");
+		kfree(md->ip_dev);
+		md->ip_dev = NULL;
+	}
+
+	return ret;
+}
+
+static void _interposer_dev_remove(struct block_device *bdev, struct mapped_device *md)
+{
+	if (!md->ip_dev)
+		return;
+
+	DMDEBUG("Remove dm interposer. %llu bios were interposed.",
+		atomic64_read(&md->ip_dev->ip_cnt));
+
+	if (dm_interposer_detach_dev(bdev, md->ip_dev))
+		DMERR("Failed to detach dm interposer.");
+
+	kfree(md->ip_dev);
+	md->ip_dev = NULL;
+}
+
 /*
  * Add a device to the list, or just increment the usage count if
  * it's already present.
@@ -385,7 +471,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 		return -ENODEV;
 	}
 
-	dd = find_device(&t->devices, dev);
+	dd = find_device(&t->devices, dev, t->md->is_interposed);
 	if (!dd) {
 		dd = kmalloc(sizeof(*dd), GFP_KERNEL);
 		if (!dd)
@@ -398,15 +484,22 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 
 		refcount_set(&dd->count, 1);
 		list_add(&dd->list, &t->devices);
-		goto out;
-
 	} else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
 		r = upgrade_mode(dd, mode, t->md);
 		if (r)
 			return r;
+		refcount_inc(&dd->count);
 	}
-	refcount_inc(&dd->count);
-out:
+
+	if (t->md->is_interposed) {
+		r = _interposer_dev_create(dd->dm_dev->bdev, ti->begin, ti->len, t->md);
+		if (r) {
+			dm_put_device(ti, dd->dm_dev);
+			DMERR("Failed to attach dm interposer.");
+			return r;
+		}
+	}
+
 	*result = dd->dm_dev;
 	return 0;
 }
@@ -446,6 +539,7 @@ void dm_put_device(struct dm_target *ti, struct dm_dev *d)
 {
 	int found = 0;
 	struct list_head *devices = &ti->table->devices;
+	struct mapped_device *md = ti->table->md;
 	struct dm_dev_internal *dd;
 
 	list_for_each_entry(dd, devices, list) {
@@ -456,11 +550,14 @@ void dm_put_device(struct dm_target *ti, struct dm_dev *d)
 	}
 	if (!found) {
 		DMWARN("%s: device %s not in table devices list",
-		       dm_device_name(ti->table->md), d->name);
+		       dm_device_name(md), d->name);
 		return;
 	}
+	if (md->is_interposed)
+		_interposer_dev_remove(d->bdev, md);
+
 	if (refcount_dec_and_test(&dd->count)) {
-		dm_put_table_device(ti->table->md, d);
+		dm_put_table_device(md, d);
 		list_del(&dd->list);
 		kfree(dd);
 	}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 50b693d776d6..466bf70a66b0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -762,16 +762,24 @@ static int open_table_device(struct table_device *td, dev_t dev,
 
 	BUG_ON(td->dm_dev.bdev);
 
-	bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _dm_claim_ptr);
-	if (IS_ERR(bdev))
-		return PTR_ERR(bdev);
+	if (md->is_interposed) {
 
-	r = bd_link_disk_holder(bdev, dm_disk(md));
-	if (r) {
-		blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL);
-		return r;
+		bdev = blkdev_get_by_dev(dev, td->dm_dev.mode, NULL);
+		if (IS_ERR(bdev))
+			return PTR_ERR(bdev);
+	} else {
+		bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _dm_claim_ptr);
+		if (IS_ERR(bdev))
+			return PTR_ERR(bdev);
+
+		r = bd_link_disk_holder(bdev, dm_disk(md));
+		if (r) {
+			blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL);
+			return r;
+		}
 	}
 
+	td->dm_dev.is_interposed = md->is_interposed;
 	td->dm_dev.bdev = bdev;
 	td->dm_dev.dax_dev = dax_get_by_host(bdev->bd_disk->disk_name);
 	return 0;
@@ -785,20 +793,26 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
 	if (!td->dm_dev.bdev)
 		return;
 
-	bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
-	blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+	if (td->dm_dev.is_interposed)
+		blkdev_put(td->dm_dev.bdev, td->dm_dev.mode);
+	else {
+		bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
+		blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+	}
 	put_dax(td->dm_dev.dax_dev);
 	td->dm_dev.bdev = NULL;
 	td->dm_dev.dax_dev = NULL;
 }
 
 static struct table_device *find_table_device(struct list_head *l, dev_t dev,
-					      fmode_t mode)
+					      fmode_t mode, bool is_interposed)
 {
 	struct table_device *td;
 
 	list_for_each_entry(td, l, list)
-		if (td->dm_dev.bdev->bd_dev == dev && td->dm_dev.mode == mode)
+		if (td->dm_dev.bdev->bd_dev == dev &&
+		    td->dm_dev.mode == mode &&
+		    td->dm_dev.is_interposed == is_interposed)
 			return td;
 
 	return NULL;
@@ -811,7 +825,7 @@ int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
 	struct table_device *td;
 
 	mutex_lock(&md->table_devices_lock);
-	td = find_table_device(&md->table_devices, dev, mode);
+	td = find_table_device(&md->table_devices, dev, mode, md->is_interposed);
 	if (!td) {
 		td = kmalloc_node(sizeof(*td), GFP_KERNEL, md->numa_node_id);
 		if (!td) {
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 7f4ac87c0b32..76a6dfb1cb29 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -159,6 +159,7 @@ struct dm_dev {
 	struct block_device *bdev;
 	struct dax_device *dax_dev;
 	fmode_t mode;
+	bool is_interposed;
 	char name[16];
 };
 
diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h
index fcff6669137b..fc4d06bb3dbb 100644
--- a/include/uapi/linux/dm-ioctl.h
+++ b/include/uapi/linux/dm-ioctl.h
@@ -362,4 +362,10 @@ enum {
  */
 #define DM_INTERNAL_SUSPEND_FLAG	(1 << 18) /* Out */
 
+/*
+ * If set, the underlying device is opened without FMODE_EXCL
+ * and the mapped device is attached via bdev_interposer.
+ */
+#define DM_INTERPOSED_FLAG		(1 << 19) /* In */
+
 #endif /* _LINUX_DM_IOCTL_H */
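
For anyone who wants to sanity-check the arithmetic in this patch outside the
kernel, the following is a minimal userspace sketch in C, not code from the
series. It models three pieces of the change: the flag test done by
get_interposer_flag(), the range check in _interposer_dev_create(), and the
sector rebase in dm_interpose_fn(). The model_* function names and the
uint64_t stand-in for sector_t are assumptions made for illustration only.
The range check here is written in a wraparound-safe form; it should agree
with the patch's (ofs + len) comparison whenever ofs + len does not overflow.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's sector_t (assumption for this sketch). */
typedef uint64_t sector_t;

/* Same bit as the flag added to include/uapi/linux/dm-ioctl.h by this patch. */
#define DM_INTERPOSED_FLAG (1u << 19)

/* Hypothetical model of get_interposer_flag(): is the "In" flag set? */
static bool model_interposer_flag(uint32_t flags)
{
	return (flags & DM_INTERPOSED_FLAG) != 0;
}

/*
 * Hypothetical model of the range check in _interposer_dev_create().
 * Rejects a region [ofs, ofs + len) that extends past the device,
 * without ever computing ofs + len (so it cannot wrap around).
 * ofs == 0 and len == 0 (whole device) is accepted.
 */
static int model_check_range(sector_t ofs, sector_t len, sector_t nr_sectors)
{
	if (ofs > nr_sectors || len > nr_sectors - ofs)
		return -ERANGE;
	return 0;
}

/*
 * Hypothetical model of the rebase in dm_interpose_fn(): a bio that was
 * already remapped carries a disk-relative sector, so the partition start
 * is subtracted to get a sector relative to the block device again.
 */
static sector_t model_rebase_sector(sector_t bi_sector, sector_t part_start,
				    bool remapped)
{
	return remapped ? bi_sector - part_start : bi_sector;
}
```

With a partition starting at sector 2048, a remapped bio at disk sector 2056
would be rebased to sector 8, while a bio that was never remapped is passed
through unchanged.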