Message ID | 20220615161616.5055-8-jack@suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: Fix IO priority mess | expand |
On 6/16/22 01:16, Jan Kara wrote: > Currently, IO priority set in task's IO context is not reflected in the > bio->bi_ioprio for most IO (only io_uring and direct IO set it). This > results in odd results where process is submitting some bios with one > priority and other bios with a different (unset) priority and due to > differing priorities bios cannot be merged. Make sure bio->bi_ioprio is > always set on bio submission. > > Signed-off-by: Jan Kara <jack@suse.cz> > --- > block/blk-mq.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index e17d822e6051..e976f696babc 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q, > > static void bio_set_ioprio(struct bio *bio) > { > + unsigned short ioprio_class; > + > blkcg_set_ioprio(bio); Shouldn't this function set the default using the below "if" code ? > + ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio); > + /* Nobody set ioprio so far? Initialize it based on task's nice value */ I do not think that the ioprio_class variable is useful. This can be: if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE) bio->bi_ioprio = get_current_ioprio(); > + if (ioprio_class == IOPRIO_CLASS_NONE) > + bio->bi_ioprio = get_current_ioprio(); > } > > /** Beside this comment, I am still scratching my head regarding what the user gets with ioprio_get(). If I understood your patches correctly, the user may still see IOPRIO_CLASS_NONE ? For that case, to be in sync with the man page, I thought the returned ioprio should be the effective one based on the task io nice value, that is, the value returned by get_current_ioprio(). Am I missing something... ?
On Thu 16-06-22 12:15:25, Damien Le Moal wrote: > On 6/16/22 01:16, Jan Kara wrote: > > Currently, IO priority set in task's IO context is not reflected in the > > bio->bi_ioprio for most IO (only io_uring and direct IO set it). This > > results in odd results where process is submitting some bios with one > > priority and other bios with a different (unset) priority and due to > > differing priorities bios cannot be merged. Make sure bio->bi_ioprio is > > always set on bio submission. > > > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > block/blk-mq.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index e17d822e6051..e976f696babc 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q, > > static void bio_set_ioprio(struct bio *bio) > > { > > + unsigned short ioprio_class; > > + > > blkcg_set_ioprio(bio); > > Shouldn't this function set the default using the below "if" code ? > > > + ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio); > > + /* Nobody set ioprio so far? Initialize it based on task's nice value */ > > I do not think that the ioprio_class variable is useful. > This can be: > > if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE) > bio->bi_ioprio = get_current_ioprio(); You are right. Fixed. > > + if (ioprio_class == IOPRIO_CLASS_NONE) > > + bio->bi_ioprio = get_current_ioprio(); > > } > > /** > > Beside this comment, I am still scratching my head regarding what the user > gets with ioprio_get(). If I understood your patches correctly, the user may > still see IOPRIO_CLASS_NONE ? > For that case, to be in sync with the man page, I thought the returned > ioprio should be the effective one based on the task io nice value, that is, > the value returned by get_current_ioprio(). Am I missing something... ? The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP or IOPRIO_WHO_USER the effective IO priority may be different for different processes considered and it can be also further influenced by blk-ioprio settings. But thinking about it now after things have settled I agree that what you suggests makes more sense. I'll fix that. Thanks for suggestion! Honza
On Thu 16-06-22 13:23:03, Jan Kara wrote: > On Thu 16-06-22 12:15:25, Damien Le Moal wrote: > > On 6/16/22 01:16, Jan Kara wrote: > > > + if (ioprio_class == IOPRIO_CLASS_NONE) > > > + bio->bi_ioprio = get_current_ioprio(); > > > } > > > /** > > > > Beside this comment, I am still scratching my head regarding what the user > > gets with ioprio_get(). If I understood your patches correctly, the user may > > still see IOPRIO_CLASS_NONE ? > > For that case, to be in sync with the man page, I thought the returned > > ioprio should be the effective one based on the task io nice value, that is, > > the value returned by get_current_ioprio(). Am I missing something... ? > > The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP > or IOPRIO_WHO_USER the effective IO priority may be different for different > processes considered and it can be also further influenced by blk-ioprio > settings. But thinking about it now after things have settled I agree that > what you suggests makes more sense. I'll fix that. Thanks for suggestion! Oh, now I've remembered why I've done it that way. With IOPRIO_WHO_PROCESS (which is probably the most used and the best defined variant), we were returning IOPRIO_CLASS_NONE if the task didn't have set IO priority until commit e70344c05995 ("block: fix default IO priority handling"). So my patch was just making behavior of IOPRIO_WHO_PGRP & IOPRIO_WHO_USER consistent with the behavior of IOPRIO_WHO_PROCESS. I'd be reluctant to change the behavior of IOPRIO_WHO_PROCESS because that has the biggest chances for userspace regressions. But perhaps it makes sense to keep IOPRIO_WHO_PGRP & IOPRIO_WHO_USER inconsistent with IOPRIO_WHO_PROCESS and just use effective IO priority in those two variants. That looks like the smallest API change to make things at least somewhat sensible... Honza > > Honza > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR
On 6/16/22 21:24, Jan Kara wrote: > On Thu 16-06-22 13:23:03, Jan Kara wrote: >> On Thu 16-06-22 12:15:25, Damien Le Moal wrote: >>> On 6/16/22 01:16, Jan Kara wrote: >>>> + if (ioprio_class == IOPRIO_CLASS_NONE) >>>> + bio->bi_ioprio = get_current_ioprio(); >>>> } >>>> /** >>> >>> Beside this comment, I am still scratching my head regarding what the user >>> gets with ioprio_get(). If I understood your patches correctly, the user may >>> still see IOPRIO_CLASS_NONE ? >>> For that case, to be in sync with the man page, I thought the returned >>> ioprio should be the effective one based on the task io nice value, that is, >>> the value returned by get_current_ioprio(). Am I missing something... ? >> >> The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP >> or IOPRIO_WHO_USER the effective IO priority may be different for different >> processes considered and it can be also further influenced by blk-ioprio >> settings. But thinking about it now after things have settled I agree that >> what you suggests makes more sense. I'll fix that. Thanks for suggestion! > > Oh, now I've remembered why I've done it that way. With IOPRIO_WHO_PROCESS > (which is probably the most used and the best defined variant), we were > returning IOPRIO_CLASS_NONE if the task didn't have set IO priority until > commit e70344c05995 ("block: fix default IO priority handling"). So my > patch was just making behavior of IOPRIO_WHO_PGRP & IOPRIO_WHO_USER > consistent with the behavior of IOPRIO_WHO_PROCESS. I'd be reluctant to > change the behavior of IOPRIO_WHO_PROCESS because that has the biggest > chances for userspace regressions. But perhaps it makes sense to keep > IOPRIO_WHO_PGRP & IOPRIO_WHO_USER inconsistent with IOPRIO_WHO_PROCESS and > just use effective IO priority in those two variants. That looks like the > smallest API change to make things at least somewhat sensible... Still bit lost. Let me try to summarize your goal: 1) If IOPRIO_WHO_PGRP is not set, ioprio_get(IOPRIO_WHO_PGRP) will return the effective priority 2) If IOPRIO_WHO_USER is not set, ioprio_get(IOPRIO_WHO_USER) will also return the effective priority 3) if IOPRIO_WHO_PROCESS is not set, return ? I am lost for this one. Do you want to go back to IOPRIO_CLASS_NONE ? Keep default (IOPRIO_CLASS_BE) ? Or switch to using the effective IO priority ? Not that the last 2 choices are actually equivalent if the user did not IO nice the process (the default for the effective IO prio is class BE) For (1) and (2), I am not sure. Given that my last changes to the ioprio default did not seem to have bothered anyone (nobody screamed at me :)) I am tempted to say: any choice is OK. So we should try to get as close as the man page defined behavior as possible.
On 6/16/22 20:23, Jan Kara wrote: > On Thu 16-06-22 12:15:25, Damien Le Moal wrote: >> On 6/16/22 01:16, Jan Kara wrote: >>> Currently, IO priority set in task's IO context is not reflected in the >>> bio->bi_ioprio for most IO (only io_uring and direct IO set it). This >>> results in odd results where process is submitting some bios with one >>> priority and other bios with a different (unset) priority and due to >>> differing priorities bios cannot be merged. Make sure bio->bi_ioprio is >>> always set on bio submission. >>> >>> Signed-off-by: Jan Kara <jack@suse.cz> >>> --- >>> block/blk-mq.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>> index e17d822e6051..e976f696babc 100644 >>> --- a/block/blk-mq.c >>> +++ b/block/blk-mq.c >>> @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q, >>> static void bio_set_ioprio(struct bio *bio) >>> { >>> + unsigned short ioprio_class; >>> + >>> blkcg_set_ioprio(bio); >> >> Shouldn't this function set the default using the below "if" code ? >> >>> + ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio); >>> + /* Nobody set ioprio so far? Initialize it based on task's nice value */ >> >> I do not think that the ioprio_class variable is useful. >> This can be: >> >> if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE) >> bio->bi_ioprio = get_current_ioprio(); > > You are right. Fixed. What about moving this inside blkcg_set_ioprio() ? bio_set_ioprio() would then not be needed at all and blkcg_set_ioprio() called directly ? > >>> + if (ioprio_class == IOPRIO_CLASS_NONE) >>> + bio->bi_ioprio = get_current_ioprio(); >>> } >>> /** >> >> Beside this comment, I am still scratching my head regarding what the user >> gets with ioprio_get(). If I understood your patches correctly, the user may >> still see IOPRIO_CLASS_NONE ? >> For that case, to be in sync with the man page, I thought the returned >> ioprio should be the effective one based on the task io nice value, that is, >> the value returned by get_current_ioprio(). Am I missing something... ? > > The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP > or IOPRIO_WHO_USER the effective IO priority may be different for different > processes considered and it can be also further influenced by blk-ioprio > settings. But thinking about it now after things have settled I agree that > what you suggests makes more sense. I'll fix that. Thanks for suggestion! > > Honza
On Fri 17-06-22 09:04:34, Damien Le Moal wrote: > On 6/16/22 21:24, Jan Kara wrote: > > On Thu 16-06-22 13:23:03, Jan Kara wrote: > >> On Thu 16-06-22 12:15:25, Damien Le Moal wrote: > >>> On 6/16/22 01:16, Jan Kara wrote: > >>>> + if (ioprio_class == IOPRIO_CLASS_NONE) > >>>> + bio->bi_ioprio = get_current_ioprio(); > >>>> } > >>>> /** > >>> > >>> Beside this comment, I am still scratching my head regarding what the user > >>> gets with ioprio_get(). If I understood your patches correctly, the user may > >>> still see IOPRIO_CLASS_NONE ? > >>> For that case, to be in sync with the man page, I thought the returned > >>> ioprio should be the effective one based on the task io nice value, that is, > >>> the value returned by get_current_ioprio(). Am I missing something... ? > >> > >> The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP > >> or IOPRIO_WHO_USER the effective IO priority may be different for different > >> processes considered and it can be also further influenced by blk-ioprio > >> settings. But thinking about it now after things have settled I agree that > >> what you suggests makes more sense. I'll fix that. Thanks for suggestion! > > > > Oh, now I've remembered why I've done it that way. With IOPRIO_WHO_PROCESS > > (which is probably the most used and the best defined variant), we were > > returning IOPRIO_CLASS_NONE if the task didn't have set IO priority until > > commit e70344c05995 ("block: fix default IO priority handling"). So my > > patch was just making behavior of IOPRIO_WHO_PGRP & IOPRIO_WHO_USER > > consistent with the behavior of IOPRIO_WHO_PROCESS. I'd be reluctant to > > change the behavior of IOPRIO_WHO_PROCESS because that has the biggest > > chances for userspace regressions. But perhaps it makes sense to keep > > IOPRIO_WHO_PGRP & IOPRIO_WHO_USER inconsistent with IOPRIO_WHO_PROCESS and > > just use effective IO priority in those two variants. That looks like the > > smallest API change to make things at least somewhat sensible... > > Still bit lost. Let me try to summarize your goal: > > 1) If IOPRIO_WHO_PGRP is not set, ioprio_get(IOPRIO_WHO_PGRP) will return > the effective priority You make it sound here like IOPRIO_WHO_PGRP would be some different type of IO priority. For record it is not, there's just one IO priority per task, if you set ioprio with IOPRIO_WHO_PGRP, it will just iterate all the tasks in PGRP and set IO priority for each task. After my patches, ioprio_get(IOPRIO_WHO_PGRPIO) will return the best of the effective IO priorities of tasks within PGRP. Before my patch it was doing the same but if IO priority was unset for some task it considered it to be CLASS_BE,4. > 2) If IOPRIO_WHO_USER is not set, ioprio_get(IOPRIO_WHO_USER) will also > return the effective priority. This is the same as above. Just the calls iterate over all tasks of the given user... > 3) if IOPRIO_WHO_PROCESS is not set, return ? I am lost for this one. Do > you want to go back to IOPRIO_CLASS_NONE ? Keep default (IOPRIO_CLASS_BE) > ? Or switch to using the effective IO priority ? Not that the last 2 > choices are actually equivalent if the user did not IO nice the process > (the default for the effective IO prio is class BE) I want to go back to returning IOPRIO_CLASS_NONE for tasks with unset IO priority. > For (1) and (2), I am not sure. Given that my last changes to the ioprio > default did not seem to have bothered anyone (nobody screamed at me :)) I > am tempted to say: any choice is OK. So we should try to get as close as > the man page defined behavior as possible. I also don't find (1) and (2) too important. (3) is IMHO somewhat important and I think that the reason why nobody complained about the change there is because your change is relatively new so it didn't propagate yet to any widely used distro kernel... Honza
On Fri 17-06-22 09:05:12, Damien Le Moal wrote: > On 6/16/22 20:23, Jan Kara wrote: > > On Thu 16-06-22 12:15:25, Damien Le Moal wrote: > >> On 6/16/22 01:16, Jan Kara wrote: > >>> Currently, IO priority set in task's IO context is not reflected in the > >>> bio->bi_ioprio for most IO (only io_uring and direct IO set it). This > >>> results in odd results where process is submitting some bios with one > >>> priority and other bios with a different (unset) priority and due to > >>> differing priorities bios cannot be merged. Make sure bio->bi_ioprio is > >>> always set on bio submission. > >>> > >>> Signed-off-by: Jan Kara <jack@suse.cz> > >>> --- > >>> block/blk-mq.c | 6 ++++++ > >>> 1 file changed, 6 insertions(+) > >>> > >>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>> index e17d822e6051..e976f696babc 100644 > >>> --- a/block/blk-mq.c > >>> +++ b/block/blk-mq.c > >>> @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q, > >>> static void bio_set_ioprio(struct bio *bio) > >>> { > >>> + unsigned short ioprio_class; > >>> + > >>> blkcg_set_ioprio(bio); > >> > >> Shouldn't this function set the default using the below "if" code ? > >> > >>> + ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio); > >>> + /* Nobody set ioprio so far? Initialize it based on task's nice value */ > >> > >> I do not think that the ioprio_class variable is useful. > >> This can be: > >> > >> if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE) > >> bio->bi_ioprio = get_current_ioprio(); > > > > You are right. Fixed. > > What about moving this inside blkcg_set_ioprio() ? bio_set_ioprio() would > then not be needed at all and blkcg_set_ioprio() called directly ? What I dislike about this is that it is counterintuitive that blkcg_set_prio() does anything when blkcgs are disabled in kernel config (and it would have to to keep things working). So if you dislike bio_set_ioprio(), I can just opencode this function in its single call site. Would that be better? Honza
On 6/17/22 20:52, Jan Kara wrote: > On Fri 17-06-22 09:05:12, Damien Le Moal wrote: >> On 6/16/22 20:23, Jan Kara wrote: >>> On Thu 16-06-22 12:15:25, Damien Le Moal wrote: >>>> On 6/16/22 01:16, Jan Kara wrote: >>>>> Currently, IO priority set in task's IO context is not reflected in the >>>>> bio->bi_ioprio for most IO (only io_uring and direct IO set it). This >>>>> results in odd results where process is submitting some bios with one >>>>> priority and other bios with a different (unset) priority and due to >>>>> differing priorities bios cannot be merged. Make sure bio->bi_ioprio is >>>>> always set on bio submission. >>>>> >>>>> Signed-off-by: Jan Kara <jack@suse.cz> >>>>> --- >>>>> block/blk-mq.c | 6 ++++++ >>>>> 1 file changed, 6 insertions(+) >>>>> >>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>> index e17d822e6051..e976f696babc 100644 >>>>> --- a/block/blk-mq.c >>>>> +++ b/block/blk-mq.c >>>>> @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q, >>>>> static void bio_set_ioprio(struct bio *bio) >>>>> { >>>>> + unsigned short ioprio_class; >>>>> + >>>>> blkcg_set_ioprio(bio); >>>> >>>> Shouldn't this function set the default using the below "if" code ? >>>> >>>>> + ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio); >>>>> + /* Nobody set ioprio so far? Initialize it based on task's nice value */ >>>> >>>> I do not think that the ioprio_class variable is useful. >>>> This can be: >>>> >>>> if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE) >>>> bio->bi_ioprio = get_current_ioprio(); >>> >>> You are right. Fixed. >> >> What about moving this inside blkcg_set_ioprio() ? bio_set_ioprio() would >> then not be needed at all and blkcg_set_ioprio() called directly ? > > What I dislike about this is that it is counterintuitive that > blkcg_set_prio() does anything when blkcgs are disabled in kernel config > (and it would have to to keep things working). So if you dislike > bio_set_ioprio(), I can just opencode this function in its single call > site. Would that be better? Ah, yes, blkcg may be disabled. Got it. So sure, bio_set_ioprio() is fine as is. > > Honza
On 6/17/22 20:49, Jan Kara wrote: > On Fri 17-06-22 09:04:34, Damien Le Moal wrote: >> On 6/16/22 21:24, Jan Kara wrote: >>> On Thu 16-06-22 13:23:03, Jan Kara wrote: >>>> On Thu 16-06-22 12:15:25, Damien Le Moal wrote: >>>>> On 6/16/22 01:16, Jan Kara wrote: >>>>>> + if (ioprio_class == IOPRIO_CLASS_NONE) >>>>>> + bio->bi_ioprio = get_current_ioprio(); >>>>>> } >>>>>> /** >>>>> >>>>> Beside this comment, I am still scratching my head regarding what the user >>>>> gets with ioprio_get(). If I understood your patches correctly, the user may >>>>> still see IOPRIO_CLASS_NONE ? >>>>> For that case, to be in sync with the man page, I thought the returned >>>>> ioprio should be the effective one based on the task io nice value, that is, >>>>> the value returned by get_current_ioprio(). Am I missing something... ? >>>> >>>> The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP >>>> or IOPRIO_WHO_USER the effective IO priority may be different for different >>>> processes considered and it can be also further influenced by blk-ioprio >>>> settings. But thinking about it now after things have settled I agree that >>>> what you suggests makes more sense. I'll fix that. Thanks for suggestion! >>> >>> Oh, now I've remembered why I've done it that way. With IOPRIO_WHO_PROCESS >>> (which is probably the most used and the best defined variant), we were >>> returning IOPRIO_CLASS_NONE if the task didn't have set IO priority until >>> commit e70344c05995 ("block: fix default IO priority handling"). So my >>> patch was just making behavior of IOPRIO_WHO_PGRP & IOPRIO_WHO_USER >>> consistent with the behavior of IOPRIO_WHO_PROCESS. I'd be reluctant to >>> change the behavior of IOPRIO_WHO_PROCESS because that has the biggest >>> chances for userspace regressions. But perhaps it makes sense to keep >>> IOPRIO_WHO_PGRP & IOPRIO_WHO_USER inconsistent with IOPRIO_WHO_PROCESS and >>> just use effective IO priority in those two variants. That looks like the >>> smallest API change to make things at least somewhat sensible... >> >> Still bit lost. Let me try to summarize your goal: >> >> 1) If IOPRIO_WHO_PGRP is not set, ioprio_get(IOPRIO_WHO_PGRP) will return >> the effective priority > > You make it sound here like IOPRIO_WHO_PGRP would be some different type of > IO priority. For record it is not, there's just one IO priority per task, > if you set ioprio with IOPRIO_WHO_PGRP, it will just iterate all the tasks > in PGRP and set IO priority for each task. After my patches, > ioprio_get(IOPRIO_WHO_PGRPIO) will return the best of the effective IO > priorities of tasks within PGRP. Before my patch it was doing the same but > if IO priority was unset for some task it considered it to be CLASS_BE,4. OK. Got it. Thanks for clarifying. > >> 2) If IOPRIO_WHO_USER is not set, ioprio_get(IOPRIO_WHO_USER) will also >> return the effective priority. > > This is the same as above. Just the calls iterate over all tasks of the > given user... > >> 3) if IOPRIO_WHO_PROCESS is not set, return ? I am lost for this one. Do >> you want to go back to IOPRIO_CLASS_NONE ? Keep default (IOPRIO_CLASS_BE) >> ? Or switch to using the effective IO priority ? Not that the last 2 >> choices are actually equivalent if the user did not IO nice the process >> (the default for the effective IO prio is class BE) > > I want to go back to returning IOPRIO_CLASS_NONE for tasks with unset IO > priority. And that would be to retain the older (broken) behavior. Because if we consider the man page, tasks with an unset IO prio should be reported as having the effective IO nice based priority, which is class BE if IO nice is not set. Right ? I am OK with that, but I think we should add this explanation as a comment somewhere in the prio code. No ? > >> For (1) and (2), I am not sure. Given that my last changes to the ioprio >> default did not seem to have bothered anyone (nobody screamed at me :)) I >> am tempted to say: any choice is OK. So we should try to get as close as >> the man page defined behavior as possible. > > I also don't find (1) and (2) too important. (3) is IMHO somewhat important > and I think that the reason why nobody complained about the change there is > because your change is relatively new so it didn't propagate yet to any > widely used distro kernel... Indeed. > > Honza
On Fri 17-06-22 21:03:45, Damien Le Moal wrote: > On 6/17/22 20:49, Jan Kara wrote: > > On Fri 17-06-22 09:04:34, Damien Le Moal wrote: > >> 2) If IOPRIO_WHO_USER is not set, ioprio_get(IOPRIO_WHO_USER) will also > >> return the effective priority. > > > > This is the same as above. Just the calls iterate over all tasks of the > > given user... > > > >> 3) if IOPRIO_WHO_PROCESS is not set, return ? I am lost for this one. Do > >> you want to go back to IOPRIO_CLASS_NONE ? Keep default (IOPRIO_CLASS_BE) > >> ? Or switch to using the effective IO priority ? Not that the last 2 > >> choices are actually equivalent if the user did not IO nice the process > >> (the default for the effective IO prio is class BE) > > > > I want to go back to returning IOPRIO_CLASS_NONE for tasks with unset IO > > priority. > > And that would be to retain the older (broken) behavior. Because if we > consider the man page, tasks with an unset IO prio should be reported as > having the effective IO nice based priority, which is class BE if IO nice > is not set. Right ? I am OK with that, but I think we should add this > explanation as a comment somewhere in the prio code. No ? Adding a comment regarding this is certainly a good idea, I'll do that. WRT whether the old behavior is broken or not - I actually think the old behavior is more useful because it allows userspace to distinguish a situation when IO priority is set based on nice value from a situation when IO priority is set to a fixed value. Also the old behavior makes ioprio_set(pid, IOPRIO_WHO_PROCESS, ioprio_get(pid, IOPRIO_WHO_PROCESS)) a noop which is IMO a good property to have for a get/set APIs. Honza
diff --git a/block/blk-mq.c b/block/blk-mq.c index e17d822e6051..e976f696babc 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q, static void bio_set_ioprio(struct bio *bio) { + unsigned short ioprio_class; + blkcg_set_ioprio(bio); + ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio); + /* Nobody set ioprio so far? Initialize it based on task's nice value */ + if (ioprio_class == IOPRIO_CLASS_NONE) + bio->bi_ioprio = get_current_ioprio(); } /**
Currently, IO priority set in task's IO context is not reflected in the bio->bi_ioprio for most IO (only io_uring and direct IO set it). This results in odd results where process is submitting some bios with one priority and other bios with a different (unset) priority and due to differing priorities bios cannot be merged. Make sure bio->bi_ioprio is always set on bio submission. Signed-off-by: Jan Kara <jack@suse.cz> --- block/blk-mq.c | 6 ++++++ 1 file changed, 6 insertions(+)