From: Jens Axboe
Date: Wed, 14 Jun 2017 11:22:20 -0600
Subject: Re: [PATCHSET v2] Add support for write life time hints
To: "Martin K. Petersen", Christoph Hellwig
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, adilger@dilger.ca
Message-ID: <7ba27f20-ef1c-b5a5-4fd2-fc756590e6f7@kernel.dk>
References: <1497412919-19400-1-git-send-email-axboe@kernel.dk> <20170614160127.GA30644@infradead.org>
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 9787033
X-Mailing-List: linux-fsdevel@vger.kernel.org

On 06/14/2017 10:04 AM, Martin K. Petersen wrote:
>
> Christoph,
>
>> I think what Martin wants (or at least what I'd want him to want) is
>> to define a few REQ_* bits that mirror the RWF bits, use that to
>> transfer the information down the stack, and then only translate it
>> to stream ids in the driver.
>
> Yup. If we have enough space in the existing flags that's perfect (I
> lost count after your op/flag shuffle).

OK, diff on top of the current stuff, so you can see how that changes
things. If this looks good to folks, I'll update the series to achieve
the same final result.

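For readers following the thread, here is a minimal userspace sketch of the scheme quoted above: one flag bit per lifetime hint, set once by the submitter and translated to a small stream id only at the driver end. Every name in it (the `sketch_` and `SKETCH_` identifiers) and the chosen bit positions are assumptions made for illustration; in the diff the real bits come from `enum req_flag_bits`. Unlike the kernel helper, this sketch ignores an out-of-range hint silently instead of warning.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the RWF_WRITE_LIFE_* hint values (0 = no hint). */
enum sketch_life_hint {
	SKETCH_LIFE_NONE = 0,
	SKETCH_LIFE_SHORT,
	SKETCH_LIFE_MEDIUM,
	SKETCH_LIFE_LONG,
	SKETCH_LIFE_EXTREME,
};

/*
 * Hypothetical bit positions. Only the "one bit per hint" layout matters
 * here; the actual positions are an assumption of this sketch.
 */
#define SKETCH_REQ_WRITE_SHORT		(1u << 4)
#define SKETCH_REQ_WRITE_MEDIUM		(1u << 5)
#define SKETCH_REQ_WRITE_LONG		(1u << 6)
#define SKETCH_REQ_WRITE_EXTREME	(1u << 7)

/* Lookup table indexed by hint value; index 0 means "no hint, no bit". */
static const unsigned int sketch_hint_to_flag[] = {
	0,
	SKETCH_REQ_WRITE_SHORT,
	SKETCH_REQ_WRITE_MEDIUM,
	SKETCH_REQ_WRITE_LONG,
	SKETCH_REQ_WRITE_EXTREME,
};

/* Submitter side: fold a hint into the op flags, leaving other bits alone. */
unsigned int sketch_set_life_hint(unsigned int opf, unsigned int hint)
{
	size_t n = sizeof(sketch_hint_to_flag) / sizeof(sketch_hint_to_flag[0]);

	if (hint >= n)
		return opf;	/* out-of-range hint: leave the flags untouched */

	return opf | sketch_hint_to_flag[hint];
}

/* Driver side: translate the flag bit back into a stream id (0 = none). */
unsigned int sketch_streamid(unsigned int opf)
{
	if (opf & SKETCH_REQ_WRITE_SHORT)
		return 1;
	else if (opf & SKETCH_REQ_WRITE_MEDIUM)
		return 2;
	else if (opf & SKETCH_REQ_WRITE_LONG)
		return 3;
	else if (opf & SKETCH_REQ_WRITE_EXTREME)
		return 4;

	return 0;
}
```

Calling `sketch_streamid(sketch_set_life_hint(opf, SKETCH_LIFE_LONG))` yields 3 regardless of which unrelated bits are already set in `opf`, which is the property that lets the hint ride along in the existing flags word instead of a separate field.
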
diff --git a/block/bio.c b/block/bio.c
index 77f4be1f..25ea7c3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -595,7 +595,6 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_iter = bio_src->bi_iter;
 	bio->bi_io_vec = bio_src->bi_io_vec;
-	bio->bi_stream = bio_src->bi_stream;
 
 	bio_clone_blkcg_association(bio, bio_src);
 }
@@ -679,7 +678,6 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
-	bio->bi_stream = bio_src->bi_stream;
 
 	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
@@ -2084,6 +2082,22 @@ void bio_clone_blkcg_association(struct bio *dst, struct bio *src)
 
 #endif /* CONFIG_BLK_CGROUP */
 
+static const unsigned int rwf_write_to_opf_flag[] = {
+	0, REQ_WRITE_SHORT, REQ_WRITE_MEDIUM, REQ_WRITE_LONG, REQ_WRITE_EXTREME
+};
+
+/*
+ * 'stream_flags' is one of RWF_WRITE_LIFE_* values
+ */
+void bio_set_streamid(struct bio *bio, unsigned int rwf_flags)
+{
+	if (WARN_ON_ONCE(rwf_flags >= ARRAY_SIZE(rwf_write_to_opf_flag)))
+		return;
+
+	bio->bi_opf |= rwf_write_to_opf_flag[rwf_flags];
+}
+EXPORT_SYMBOL_GPL(bio_set_streamid);
+
 static void __init biovec_init_slabs(void)
 {
 	int i;
diff --git a/block/blk-core.c b/block/blk-core.c
index 3f4a206..a7421b7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2057,12 +2057,6 @@ blk_qc_t generic_make_request(struct bio *bio)
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
-		if (bio_op(bio) == REQ_OP_WRITE &&
-		    bio_stream(bio) < BLK_MAX_STREAM) {
-			q->stream_writes[bio_stream(bio)] +=
-				bio->bi_iter.bi_size >> 9;
-		}
-
 		if (likely(blk_queue_enter(q, false) == 0)) {
 			struct bio_list lower, same;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 28998ac..7d299df 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -696,7 +696,8 @@ static struct request *attempt_merge(struct request_queue *q,
 	 * Don't allow merge of different streams, or for a stream with
 	 * non-stream IO.
 	 */
-	if (req->bio->bi_stream != next->bio->bi_stream)
+	if ((req->cmd_flags & REQ_WRITE_LIFE_MASK) !=
+	    (next->cmd_flags & REQ_WRITE_LIFE_MASK))
 		return NULL;
 
 	/*
@@ -822,7 +823,8 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	 * Don't allow merge of different streams, or for a stream with
 	 * non-stream IO.
 	 */
-	if (rq->bio->bi_stream != bio->bi_stream)
+	if ((rq->cmd_flags & REQ_WRITE_LIFE_MASK) !=
+	    (bio->bi_opf & REQ_WRITE_LIFE_MASK))
 		return false;
 
 	return true;
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d7cbd05..8988133 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -335,6 +335,20 @@ static inline int nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 	return BLK_MQ_RQ_QUEUE_OK;
 }
 
+static inline unsigned int req_to_streamid(struct request *req)
+{
+	if (req->cmd_flags & REQ_WRITE_SHORT)
+		return 1;
+	else if (req->cmd_flags & REQ_WRITE_MEDIUM)
+		return 2;
+	else if (req->cmd_flags & REQ_WRITE_LONG)
+		return 3;
+	else if (req->cmd_flags & REQ_WRITE_EXTREME)
+		return 4;
+
+	return 0;
+}
+
 static inline void nvme_setup_rw(struct nvme_ns *ns, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -355,13 +369,15 @@ static inline void nvme_setup_rw(struct nvme_ns *ns, struct request *req,
 	cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
 	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
 
-	if (req_op(req) == REQ_OP_WRITE) {
-		if (bio_stream_valid(req->bio) && ns->nr_streams) {
-			unsigned stream = bio_stream(req->bio) & 0xffff;
+	if (req_op(req) == REQ_OP_WRITE && blk_stream_valid(req->cmd_flags) &&
+	    ns->nr_streams) {
+		unsigned stream = req_to_streamid(req);
 
-			control |= NVME_RW_DTYPE_STREAMS;
-			dsmgmt |= ((stream % (ns->nr_streams + 1)) << 16);
-		}
+		control |= NVME_RW_DTYPE_STREAMS;
+		dsmgmt |= ((stream % (ns->nr_streams + 1)) << 16);
+
+		if (stream < BLK_MAX_STREAM)
+			req->q->stream_writes[stream] += blk_rq_bytes(req) >> 9;
 	}
 
 	if (ns->ms) {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 284b8a7..31ba4a8 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -227,7 +227,6 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	bio.bi_iter.bi_sector = pos >> 9;
 	bio.bi_private = current;
 	bio.bi_end_io = blkdev_bio_end_io_simple;
-	bio.bi_stream = iocb_streamid(iocb);
 
 	ret = bio_iov_iter_get_pages(&bio, iter);
 	if (unlikely(ret))
@@ -240,6 +239,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		should_dirty = true;
 	} else {
 		bio.bi_opf = dio_bio_write_op(iocb);
+		bio_set_streamid(&bio, iocb_streamid(iocb));
 		task_io_account_write(ret);
 	}
 
@@ -361,7 +361,6 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		bio->bi_iter.bi_sector = pos >> 9;
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
-		bio->bi_stream = iocb_streamid(iocb);
 
 		ret = bio_iov_iter_get_pages(bio, iter);
 		if (unlikely(ret)) {
@@ -376,6 +375,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 				bio_set_pages_dirty(bio);
 		} else {
 			bio->bi_opf = dio_bio_write_op(iocb);
+			bio_set_streamid(bio, iocb_streamid(iocb));
 			task_io_account_write(bio->bi_iter.bi_size);
 		}
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index db0558a..ef3c98c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8608,7 +8608,6 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 	atomic_set(&dip->pending_bios, 0);
 	btrfs_bio = btrfs_io_bio(io_bio);
 	btrfs_bio->logical = file_offset;
-	bio_set_streamid(io_bio, bio_stream(dio_bio));
 
 	if (write) {
 		io_bio->bi_end_io = btrfs_endio_direct_write;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index c9c8b9f..a770e82 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -386,7 +386,7 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	else
 		bio->bi_end_io = dio_bio_end_io;
 
-	bio->bi_stream = iocb_streamid(dio->iocb);
+	bio_set_streamid(bio, iocb_streamid(dio->iocb));
 
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d1b04b0..a1b3145 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -443,6 +443,7 @@ extern struct bio *bio_copy_kern(struct request_queue *, void *, unsigned int,
 				 gfp_t, int);
 extern void bio_set_pages_dirty(struct bio *bio);
 extern void bio_check_pages_dirty(struct bio *bio);
+extern void bio_set_streamid(struct bio *bio, unsigned int rwf_flags);
 
 void generic_start_io_acct(int rw, unsigned long sectors,
 			   struct hd_struct *part);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 1940876..06c8c35 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -36,8 +36,6 @@ struct bio {
 	unsigned short		bi_flags;	/* status, etc and bvec pool number */
 	unsigned short		bi_ioprio;
 
-	unsigned int		bi_stream;	/* write life time hint */
-
 	struct bvec_iter	bi_iter;
 
 	/* Number of segments in this BIO after
@@ -203,6 +201,10 @@ enum req_flag_bits {
 	__REQ_PREFLUSH,		/* request for cache flush */
 	__REQ_RAHEAD,		/* read ahead, can fail anytime */
 	__REQ_BACKGROUND,	/* background IO */
+	__REQ_WRITE_SHORT,	/* short life time write */
+	__REQ_WRITE_MEDIUM,	/* medium life time write */
+	__REQ_WRITE_LONG,	/* long life time write */
+	__REQ_WRITE_EXTREME,	/* extremely long life time write */
 
 	/* command specific flags for REQ_OP_WRITE_ZEROES: */
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
@@ -223,6 +225,13 @@ enum req_flag_bits {
 #define REQ_PREFLUSH		(1ULL << __REQ_PREFLUSH)
 #define REQ_RAHEAD		(1ULL << __REQ_RAHEAD)
 #define REQ_BACKGROUND		(1ULL << __REQ_BACKGROUND)
+#define REQ_WRITE_SHORT		(1ULL << __REQ_WRITE_SHORT)
+#define REQ_WRITE_MEDIUM	(1ULL << __REQ_WRITE_MEDIUM)
+#define REQ_WRITE_LONG		(1ULL << __REQ_WRITE_LONG)
+#define REQ_WRITE_EXTREME	(1ULL << __REQ_WRITE_EXTREME)
+
+#define REQ_WRITE_LIFE_MASK	(REQ_WRITE_SHORT | REQ_WRITE_MEDIUM | \
+					REQ_WRITE_LONG | REQ_WRITE_EXTREME)
 
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
 
@@ -314,19 +323,9 @@ struct blk_rq_stat {
 	u64 batch;
 };
 
-static inline void bio_set_streamid(struct bio *bio, unsigned int stream)
-{
-	bio->bi_stream = stream;
-}
-
-static inline bool bio_stream_valid(struct bio *bio)
-{
-	return bio->bi_stream != 0;
-}
-
-static inline unsigned int bio_stream(struct bio *bio)
+static inline bool blk_stream_valid(unsigned int opf)
 {
-	return bio->bi_stream;
+	return (opf & REQ_WRITE_LIFE_MASK) != 0;
 }
 
 #endif /* __LINUX_BLK_TYPES_H */
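
As a companion to the blk-merge.c and nvme hunks, here is a small standalone sketch of the two computations they rely on: comparing only the lifetime bits when deciding whether two requests may merge, and folding a stream id into the number of streams the device offers before shifting it into the upper half of the management field. The `ex_` names and the `EX_WRITE_LIFE_MASK` value are hypothetical and exist only for this illustration; the real mask is built from the `REQ_WRITE_*` bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask covering four adjacent lifetime bits (assumed layout). */
#define EX_WRITE_LIFE_MASK	0xf0u

/*
 * Merge rule sketch: two sets of flags are compatible only when their
 * lifetime bits match exactly, which also keeps hinted and unhinted IO apart.
 * Bits outside the mask do not take part in the comparison.
 */
bool ex_life_merge_ok(unsigned int a_flags, unsigned int b_flags)
{
	return (a_flags & EX_WRITE_LIFE_MASK) == (b_flags & EX_WRITE_LIFE_MASK);
}

/*
 * Stream placement sketch: wrap the stream id into 0..nr_streams and shift
 * it by 16, using the same arithmetic as the dsmgmt line in the nvme hunk.
 */
uint32_t ex_dsmgmt_for_stream(unsigned int stream, unsigned int nr_streams)
{
	return (uint32_t)(stream % (nr_streams + 1)) << 16;
}
```

One consequence of the modulo is worth keeping in mind when reading the driver change: with `nr_streams` equal to 2, a stream id of 3 wraps to 0, so `ex_dsmgmt_for_stream(3, 2)` carries no stream at all, while `ex_dsmgmt_for_stream(4, 2)` lands on stream 1.
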