Message ID | 20240124085334.3060748-1-zhaoyang.huang@unisoc.com (mailing list archive) |
---|---|
State | New, archived |
Series | [PATCHv2,1/1] block: introduce content activity based ioprio |
On Wed, Jan 24, 2024 at 04:53:34PM +0800, zhaoyang.huang wrote:
>  void __bio_add_page(struct bio *bio, struct page *page,
>  		unsigned int len, unsigned int off)
>  {
> +	int class, level, hint, activity;
> +
> +	class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> +	level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> +	hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> +	activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> +
>  	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
>  	WARN_ON_ONCE(bio_full(bio, len));
>
>  	bvec_set_page(&bio->bi_io_vec[bio->bi_vcnt], page, len, off);
>  	bio->bi_iter.bi_size += len;
>  	bio->bi_vcnt++;
> +	activity += bio_page_if_active(bio, page, IOPRIO_NR_ACTIVITY);
> +	bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);

If you need to touch anything in the block layer I/O path you're doing
this wrong. The file system that is submitting the I/O needs to be in
control of the priorities.
must not hack the assignment behind the
On Wed, Jan 24, 2024 at 5:20 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Jan 24, 2024 at 04:53:34PM +0800, zhaoyang.huang wrote:
> >  void __bio_add_page(struct bio *bio, struct page *page,
> >  		unsigned int len, unsigned int off)
> >  {
> > +	int class, level, hint, activity;
> > +
> > +	class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> > +	level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> > +	hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> > +	activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> > +
> >  	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
> >  	WARN_ON_ONCE(bio_full(bio, len));
> >
> >  	bvec_set_page(&bio->bi_io_vec[bio->bi_vcnt], page, len, off);
> >  	bio->bi_iter.bi_size += len;
> >  	bio->bi_vcnt++;
> > +	activity += bio_page_if_active(bio, page, IOPRIO_NR_ACTIVITY);
> > +	bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
>
> If you need to touch anything in the block layer I/O path you're doing
> this wrong. The file system that is submitting the I/O needs to be in
> control of the priorities.
> must not hack the assignment behind the

Please correct me if I am wrong. According to my understanding, a bio's
(request's) ioprio is set from either the task's scheduler priority or the
blkcg's priority during submit_bio; that is, there is no explicit setting
of the ioprio by the file system so far. Furthermore, this commit is
actually an add-on feature which does not conflict with the current way of
setting the ioprio.
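For reference, the fallback path described here is the existing bio_set_ioprio()
in block/blk-mq.c, which the patch below keeps as its #else branch; a condensed
view of that existing code:

/* Existing fallback in block/blk-mq.c: derive the priority from the
 * task only if the submitter left it unset, then let the blkcg policy
 * override it.
 */
static void bio_set_ioprio(struct bio *bio)
{
	/* Nobody set ioprio so far? Initialize it based on task's nice value */
	if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE)
		bio->bi_ioprio = get_current_ioprio();
	blkcg_set_ioprio(bio);
}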
The I/O priority can be explicitly set by the submitter; task and blkcg
are just fallbacks.

And as said multiple times now, bio_add_page must just treat the page as a
physical address container. It must never look at MM-internal flags.
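To illustrate what "explicitly set by the submitter" means here, a minimal
hedged sketch follows; fs_submit_read and its urgency flag are hypothetical
names for illustration, not an existing kernel API. The issuer picks the
priority on the bio it builds before submit_bio(), so the task/blkcg
fallbacks never apply:

#include <linux/bio.h>
#include <linux/ioprio.h>

/* Illustrative only: a hypothetical filesystem read path choosing the
 * bio priority itself before submission, instead of relying on the
 * task/blkcg fallbacks.
 */
static void fs_submit_read(struct bio *bio, bool urgent)
{
	bio->bi_ioprio = urgent ?
		IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0) :
		IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM);
	submit_bio(bio);
}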
On Wed, Jan 24, 2024 at 5:38 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> The I/O priority can be explicitly set by the submitter; task and
> blkcg are just fallbacks.

Yes. I would like to ask whether it is possible to have this commit work as
a hint for promoting the priority, since the benefit has been proved in
verification?

> And as said multiple times now, bio_add_page must just treat the page
> as a physical address container. It must never look at MM-internal
> flags.

The alternative way is to iterate the request's pages in the scheduler,
which was refused by Jens in the previous version. Anyway, we can find a
solution for this.
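The rejected alternative mentioned above would look roughly like the sketch
below (illustrative only, assuming the generic rq_for_each_segment() iterator;
rq_count_workingset_pages is a hypothetical helper). It still inspects
MM-internal page state from inside the block layer, which is the same
layering problem:

#include <linux/blk-mq.h>
#include <linux/page-flags.h>

/* Illustrative only: walking a request's pages from an I/O scheduler
 * hook to count workingset pages.
 */
static int rq_count_workingset_pages(struct request *rq)
{
	struct req_iterator iter;
	struct bio_vec bvec;
	int active = 0;

	rq_for_each_segment(bvec, rq, iter)
		if (PageWorkingset(bvec.bv_page))
			active++;
	return active;
}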
On 1/24/24 4:58 AM, Zhaoyang Huang wrote:
> On Wed, Jan 24, 2024 at 5:38 PM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> The I/O priority can be explicitly set by the submitter; task and
>> blkcg are just fallbacks.
> Yes. I would like to ask whether it is possible to have this commit
> work as a hint for promoting the priority, since the benefit has been
> proved in verification?

We don't add patches that are wrong just because they provide a
performance benefit for some cases. Down that path lies tech debt to be
cleaned up later. Rather, the feature should be done right from the
start.

>> And as said multiple times now, bio_add_page must just treat the page
>> as a physical address container. It must never look at MM-internal
>> flags.
> The alternative way is to iterate the request's pages in the scheduler,
> which was refused by Jens in the previous version. Anyway, we can find
> a solution for this.

That approach, or the current one, both have the same layering violation
that Christoph keeps telling you is wrong - you are looking at the page
itself in the IO path. What has been suggested is that the _issuer_ of
the IO, the one that actually deals with pages, is the one that should
be submitting IO at the right priority to begin with.

Your approach tries to hack around the fact that this isn't done, and
hence is introducing a layering violation where the block layer now
needs to look at the page and adjust the priority. If the IO was
submitted with the right priority to begin with, you would not have this
issue at all.
On Wed, Jan 24, 2024 at 11:38 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 1/24/24 4:58 AM, Zhaoyang Huang wrote:
> > On Wed, Jan 24, 2024 at 5:38 PM Christoph Hellwig <hch@infradead.org> wrote:
> >>
> >> The I/O priority can be explicitly set by the submitter; task and
> >> blkcg are just fallbacks.
> > Yes. I would like to ask whether it is possible to have this commit
> > work as a hint for promoting the priority, since the benefit has been
> > proved in verification?
>
> We don't add patches that are wrong just because they provide a
> performance benefit for some cases. Down that path lies tech debt to be
> cleaned up later. Rather, the feature should be done right from the
> start.
>
> >> And as said multiple times now, bio_add_page must just treat the page
> >> as a physical address container. It must never look at MM-internal
> >> flags.
> > The alternative way is to iterate the request's pages in the scheduler,
> > which was refused by Jens in the previous version. Anyway, we can find
> > a solution for this.
>
> That approach, or the current one, both have the same layering violation
> that Christoph keeps telling you is wrong - you are looking at the page
> itself in the IO path. What has been suggested is that the _issuer_ of
> the IO, the one that actually deals with pages, is the one that should
> be submitting IO at the right priority to begin with.
>
> Your approach tries to hack around the fact that this isn't done, and
> hence is introducing a layering violation where the block layer now
> needs to look at the page and adjust the priority. If the IO was
> submitted with the right priority to begin with, you would not have this
> issue at all.

I have sent out v3, which provides new APIs to have the submitter set the
bio's ioprio outside of bio_add_page.

> --
> Jens Axboe
>
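The direction suggested above (and, per the message, taken in v3) amounts
roughly to the following hedged sketch; fs_add_page and fs_submit are
hypothetical helper names, not the actual v3 API. The filesystem, which owns
the pages, counts workingset pages while building the bio and then sets the
priority itself before submission, so the block layer never looks at pages:

#include <linux/bio.h>
#include <linux/ioprio.h>
#include <linux/page-flags.h>

/* Illustrative only: the submitter accounts for workingset pages while
 * it builds the bio, then picks the priority itself.
 */
static void fs_add_page(struct bio *bio, struct page *page,
			unsigned int len, unsigned int off, int *active)
{
	if (PageWorkingset(page))
		(*active)++;
	__bio_add_page(bio, page, len, off);
}

static void fs_submit(struct bio *bio, int active)
{
	/* promote the whole bio if at least half of its pages are hot */
	if (active >= bio->bi_vcnt / 2)
		bio->bi_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0);
	submit_bio(bio);
}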
diff --git a/block/Kconfig b/block/Kconfig
index f1364d1c0d93..8d6075575eae 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -228,6 +228,14 @@ config BLOCK_HOLDER_DEPRECATED
 config BLK_MQ_STACKING
 	bool
 
+config CONTENT_ACT_BASED_IOPRIO
+	bool "Enable content activity based ioprio"
+	depends on LRU_GEN
+	default y
+	help
+	  This item enable the feature of adjust bio's priority by
+	  calculating its content's activity.
+
 source "block/Kconfig.iosched"
 
 endif # BLOCK
diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..1228e2a4940f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -24,6 +24,7 @@
 #include "blk.h"
 #include "blk-rq-qos.h"
 #include "blk-cgroup.h"
+#include "blk-ioprio.h"
 
 #define ALLOC_CACHE_THRESHOLD	16
 #define ALLOC_CACHE_MAX		256
@@ -1069,12 +1070,21 @@ EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off)
 {
+	int class, level, hint, activity;
+
+	class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
+	level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
+	hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
+	activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
+
 	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
 	WARN_ON_ONCE(bio_full(bio, len));
 
 	bvec_set_page(&bio->bi_io_vec[bio->bi_vcnt], page, len, off);
 	bio->bi_iter.bi_size += len;
 	bio->bi_vcnt++;
+	activity += bio_page_if_active(bio, page, IOPRIO_NR_ACTIVITY);
+	bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
 }
 EXPORT_SYMBOL_GPL(__bio_add_page);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1fafd54dce3c..05cdd3adde94 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2939,6 +2939,26 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
 	return rq;
 }
 
+#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
+static void bio_set_ioprio(struct bio *bio)
+{
+	int class, level, hint, activity;
+
+	class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
+	level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
+	hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
+	activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
+
+	if (activity >= bio->bi_vcnt / 2)
+		class = IOPRIO_CLASS_RT;
+	else if (activity >= bio->bi_vcnt / 4)
+		class = max(IOPRIO_PRIO_CLASS(get_current_ioprio()), IOPRIO_CLASS_BE);
+
+	bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
+
+	blkcg_set_ioprio(bio);
+}
+#else
 static void bio_set_ioprio(struct bio *bio)
 {
 	/* Nobody set ioprio so far? Initialize it based on task's nice value */
@@ -2946,6 +2966,7 @@ static void bio_set_ioprio(struct bio *bio)
 		bio->bi_ioprio = get_current_ioprio();
 	blkcg_set_ioprio(bio);
 }
+#endif
 
 /**
  * blk_mq_submit_bio - Create and send a request to block device.
diff --git a/fs/buffer.c b/fs/buffer.c
index 12e9a71c693d..b15bff481706 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2832,6 +2832,12 @@ void submit_bh(blk_opf_t opf, struct buffer_head *bh)
 }
 EXPORT_SYMBOL(submit_bh);
 
+int bio_page_if_active(struct bio *bio, struct page *page, unsigned short limit)
+{
+	return (bio->bi_vcnt <= limit && PageWorkingset(page)) ? 1 : 0;
+}
+EXPORT_SYMBOL(bio_page_if_active);
+
 void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags)
 {
 	lock_buffer(bh);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 44e9de51eedf..9a374f5965ec 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -248,6 +248,7 @@ int bh_uptodate_or_lock(struct buffer_head *bh);
 int __bh_read(struct buffer_head *bh, blk_opf_t op_flags, bool wait);
 void __bh_read_batch(int nr, struct buffer_head *bhs[],
 		     blk_opf_t op_flags, bool force_lock);
+int bio_page_if_active(struct bio *bio, struct page *page, unsigned short limit);
 
 /*
  * Generic address_space_operations implementations for buffer_head-backed
diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index bee2bdb0eedb..d1c6081e796b 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -71,12 +71,18 @@ enum {
  * class and level.
  */
 #define IOPRIO_HINT_SHIFT		IOPRIO_LEVEL_NR_BITS
-#define IOPRIO_HINT_NR_BITS		10
+#define IOPRIO_HINT_NR_BITS		3
 #define IOPRIO_NR_HINTS			(1 << IOPRIO_HINT_NR_BITS)
 #define IOPRIO_HINT_MASK		(IOPRIO_NR_HINTS - 1)
 #define IOPRIO_PRIO_HINT(ioprio)	\
 	(((ioprio) >> IOPRIO_HINT_SHIFT) & IOPRIO_HINT_MASK)
 
+#define IOPRIO_ACTIVITY_SHIFT		(IOPRIO_HINT_NR_BITS + IOPRIO_LEVEL_NR_BITS)
+#define IOPRIO_ACTIVITY_NR_BITS		7
+#define IOPRIO_NR_ACTIVITY		(1 << IOPRIO_ACTIVITY_NR_BITS)
+#define IOPRIO_ACTIVITY_MASK		(IOPRIO_NR_ACTIVITY - 1)
+#define IOPRIO_PRIO_ACTIVITY(ioprio)	\
+	(((ioprio) >> IOPRIO_ACTIVITY_SHIFT) & IOPRIO_ACTIVITY_MASK)
 /*
  * I/O hints.
  */
@@ -108,20 +114,24 @@ enum {
  * Return an I/O priority value based on a class, a level and a hint.
  */
 static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
-					  int priohint)
+					  int priohint, int activity)
 {
 	if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
 	    IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
-	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
+	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS) ||
+	    IOPRIO_BAD_VALUE(activity, IOPRIO_NR_ACTIVITY))
 		return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
 
 	return (prioclass << IOPRIO_CLASS_SHIFT) |
+	       (activity << IOPRIO_ACTIVITY_SHIFT) |
 	       (priohint << IOPRIO_HINT_SHIFT) | priolevel;
 }
 
 #define IOPRIO_PRIO_VALUE(prioclass, priolevel)	\
-	ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE)
+	ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE, 0)
 #define IOPRIO_PRIO_VALUE_HINT(prioclass, priolevel, priohint)	\
-	ioprio_value(prioclass, priolevel, priohint)
+	ioprio_value(prioclass, priolevel, priohint, 0)
+#define IOPRIO_PRIO_VALUE_ACTIVITY(prioclass, priolevel, priohint, activity)	\
+	ioprio_value(prioclass, priolevel, priohint, activity)
 
 #endif /* _UAPI_LINUX_IOPRIO_H */
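For reference, with the uapi change above, and assuming IOPRIO_CLASS_SHIFT is
13 and IOPRIO_LEVEL_NR_BITS is 3 as in the current header, the 16-bit priority
packs as class in bits 15-13, activity in bits 12-6, hint in bits 5-3 and level
in bits 2-0. A small hedged usage sketch (example_ioprio is an illustrative
name only):

#include <linux/ioprio.h>

/* Illustrative only: pack a priority carrying the new activity count.
 * Layout after the patch: class[15:13] activity[12:6] hint[5:3] level[2:0].
 */
static inline unsigned short example_ioprio(void)
{
	/* best-effort class, level 4, no hint, 3 workingset pages counted */
	return IOPRIO_PRIO_VALUE_ACTIVITY(IOPRIO_CLASS_BE, 4,
					  IOPRIO_HINT_NONE, 3);
}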