Message ID | 20230629110359.1111832-2-chengming.zhou@linux.dev (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blk-mq: optimize the size of struct request | expand |
On Thu, Jun 29, 2023 at 07:03:56PM +0800, chengming.zhou@linux.dev wrote: > From: Chengming Zhou <zhouchengming@bytedance.com> > > If request need to be completed remotely, we insert it into percpu llist, > and smp_call_function_single_async() if llist is empty previously. > > We don't need to use per-rq csd, percpu csd is enough. And the size of > struct request is decreased by 24 bytes. > > This way is cleaner, and looks correct, given block softirq is guaranteed to be > scheduled to consume the list if one new request is added to this percpu list, > either smp_call_function_single_async() returns -EBUSY or 0. > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > --- > v2: > - Change to use call_single_data_t, which avoid to use 2 cache lines > for 1 csd, as suggested by Ming Lei. > - Improve the commit log, the explanation is copied from Ming Lei. Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming
On Thu, Jun 29, 2023 at 07:03:56PM +0800, chengming.zhou@linux.dev wrote: > From: Chengming Zhou <zhouchengming@bytedance.com> > > If request need to be completed remotely, we insert it into percpu llist, > and smp_call_function_single_async() if llist is empty previously. > > We don't need to use per-rq csd, percpu csd is enough. And the size of > struct request is decreased by 24 bytes. > > This way is cleaner, and looks correct, given block softirq is guaranteed to be > scheduled to consume the list if one new request is added to this percpu list, > either smp_call_function_single_async() returns -EBUSY or 0. Please trim your commit logs to 73 characters per line so that they are readable in git log output. > static void blk_mq_request_bypass_insert(struct request *rq, > @@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq) > { > struct llist_head *list; > unsigned int cpu; > + call_single_data_t *csd; > > cpu = rq->mq_ctx->cpu; > list = &per_cpu(blk_cpu_done, cpu); > - if (llist_add(&rq->ipi_list, list)) { > - INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq); > - smp_call_function_single_async(cpu, &rq->csd); > - } > + csd = &per_cpu(blk_cpu_csd, cpu); > + if (llist_add(&rq->ipi_list, list)) > + smp_call_function_single_async(cpu, csd); > } No need for the list and csd variables here as they are only used once. But I think this code has a rpboem when it is preemptd between the llist_add and smp_call_function_single_async. We either need a get_cpu/put_cpu around them, or instroduce a structure with the list and csd, and then you can use one pointer from per_cpu and still ensure the list and csd are for the same CPU.
On 2023/7/6 21:07, Christoph Hellwig wrote: > On Thu, Jun 29, 2023 at 07:03:56PM +0800, chengming.zhou@linux.dev wrote: >> From: Chengming Zhou <zhouchengming@bytedance.com> >> >> If request need to be completed remotely, we insert it into percpu llist, >> and smp_call_function_single_async() if llist is empty previously. >> >> We don't need to use per-rq csd, percpu csd is enough. And the size of >> struct request is decreased by 24 bytes. >> >> This way is cleaner, and looks correct, given block softirq is guaranteed to be >> scheduled to consume the list if one new request is added to this percpu list, >> either smp_call_function_single_async() returns -EBUSY or 0. > > Please trim your commit logs to 73 characters per line so that they > are readable in git log output. Ok, will fix in the next version. > >> static void blk_mq_request_bypass_insert(struct request *rq, >> @@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq) >> { >> struct llist_head *list; >> unsigned int cpu; >> + call_single_data_t *csd; >> >> cpu = rq->mq_ctx->cpu; >> list = &per_cpu(blk_cpu_done, cpu); >> - if (llist_add(&rq->ipi_list, list)) { >> - INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq); >> - smp_call_function_single_async(cpu, &rq->csd); >> - } >> + csd = &per_cpu(blk_cpu_csd, cpu); >> + if (llist_add(&rq->ipi_list, list)) >> + smp_call_function_single_async(cpu, csd); >> } > > No need for the list and csd variables here as they are only used > once. Yes, should I change like below? Looks like much long code. :-) if (llist_add(&rq->ipi_list, &per_cpu(blk_cpu_done, cpu))) smp_call_function_single_async(cpu, &per_cpu(blk_cpu_csd, cpu)); > > But I think this code has a rpboem when it is preemptd between > the llist_add and smp_call_function_single_async. We either need a > get_cpu/put_cpu around them, or instroduce a structure with the list > and csd, and then you can use one pointer from per_cpu and still ensure > the list and csd are for the same CPU. > cpu = rq->mq_ctx->cpu; So it's certainly the same CPU, right? Thanks!
On Thu, Jul 06, 2023 at 10:23:49PM +0800, Chengming Zhou wrote: > Yes, should I change like below? Looks like much long code. :-) > > if (llist_add(&rq->ipi_list, &per_cpu(blk_cpu_done, cpu))) > smp_call_function_single_async(cpu, &per_cpu(blk_cpu_csd, cpu)); Doesn't look bad too me. > > > > > > But I think this code has a rpboem when it is preemptd between > > the llist_add and smp_call_function_single_async. We either need a > > get_cpu/put_cpu around them, or instroduce a structure with the list > > and csd, and then you can use one pointer from per_cpu and still ensure > > the list and csd are for the same CPU. > > > > cpu = rq->mq_ctx->cpu; So it's certainly the same CPU, right? You're right of couse - cpu is the submitting cpu and not the current one and thus not affected by preemption. Sorry for the noise.
diff --git a/block/blk-mq.c b/block/blk-mq.c index decb6ab2d508..e52200edd2b1 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -43,6 +43,7 @@ #include "blk-ioprio.h" static DEFINE_PER_CPU(struct llist_head, blk_cpu_done); +static DEFINE_PER_CPU(call_single_data_t, blk_cpu_csd); static void blk_mq_insert_request(struct request *rq, blk_insert_t flags); static void blk_mq_request_bypass_insert(struct request *rq, @@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq) { struct llist_head *list; unsigned int cpu; + call_single_data_t *csd; cpu = rq->mq_ctx->cpu; list = &per_cpu(blk_cpu_done, cpu); - if (llist_add(&rq->ipi_list, list)) { - INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq); - smp_call_function_single_async(cpu, &rq->csd); - } + csd = &per_cpu(blk_cpu_csd, cpu); + if (llist_add(&rq->ipi_list, list)) + smp_call_function_single_async(cpu, csd); } static void blk_mq_raise_softirq(struct request *rq) @@ -4796,6 +4797,9 @@ static int __init blk_mq_init(void) for_each_possible_cpu(i) init_llist_head(&per_cpu(blk_cpu_done, i)); + for_each_possible_cpu(i) + INIT_CSD(&per_cpu(blk_cpu_csd, i), + __blk_mq_complete_request_remote, NULL); open_softirq(BLOCK_SOFTIRQ, blk_done_softirq); cpuhp_setup_state_nocalls(CPUHP_BLOCK_SOFTIRQ_DEAD, diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index f401067ac03a..070551197c0e 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -182,10 +182,7 @@ struct request { rq_end_io_fn *saved_end_io; } flush; - union { - struct __call_single_data csd; - u64 fifo_time; - }; + u64 fifo_time; /* * completion callback.