Message ID | 20191008001416.12656-1-andrealmeid@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2,1/1] blk-mq: fill header with kernel-doc | expand |
André Almeida <andrealmeid@collabora.com> writes: > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h > index e0fce93ac127..8b745f229789 100644 > --- a/include/linux/blk-mq.h > +++ b/include/linux/blk-mq.h > @@ -10,74 +10,153 @@ struct blk_mq_tags; > struct blk_flush_queue; > > /** > - * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware block device > + * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware > + * block device > */ > struct blk_mq_hw_ctx { > struct { > + /** @lock: Lock for accessing dispatch queue */ > spinlock_t lock; > + /** > + * @dispatch: Queue of dispatched requests, waiting for > + * workers to send them to the hardware. > + */ It's been a few years since I looked at the block layer, but isn't this used to hold requests that were taken from the blk_mq_ctx, but couldn't be dispatched because the queue was full? > struct list_head dispatch; > - unsigned long state; /* BLK_MQ_S_* flags */ > + /** > + * @state: BLK_MQ_S_* flags. Define the state of the hw > + * queue (active, scheduled to restart, stopped). > + */ > + unsigned long state; > } ____cacheline_aligned_in_smp; > > + /** > + * @run_work: Worker for dispatch scheduled requests to hardware. > + * BLK_MQ_CPU_WORK_BATCH workers will be assigned to a CPU, then the > + * following ones will be assigned to the next_cpu. > + */ > struct delayed_work run_work; > + /** @cpumask: Map of available CPUs. */ Available cpus where this hctx can run. > cpumask_var_t cpumask; > + /** > + * @next_cpu: Index of CPU/workqueue to be used in the next batch > + * of workers. > + */ > int next_cpu; > + /** > + * @next_cpu_batch: Counter of how many works letf in the batch before > + * changing to the next CPU. A batch has the size > + * of BLK_MQ_CPU_WORK_BATCH. > + */ > int next_cpu_batch; > > - unsigned long flags; /* BLK_MQ_F_* flags */ > + /** @flags: BLK_MQ_F_* flags. Define the behaviour of the queue. */ > + unsigned long flags; > > + /** @sched_data: Data to support schedulers. */ > void *sched_data; > + /** @queue: Queue of request to be dispatched. */ > struct request_queue *queue; Maybe "the request queue this hctx belongs to" > + /** @fq: Queue of requests to be flushed from memory to storage. */ > struct blk_flush_queue *fq; Hmm, i think this needs clarification. > > + /** > + * @driver_data: Pointer to data owned by the block driver that created > + * this hctx > + */ > void *driver_data; > > + /** > + * @ctx_map: Bitmap for each software queue. If bit is on, there is a > + * pending request. > + */ > struct sbitmap ctx_map; > > + /** > + * @dispatch_from: Queue to be used when there is no scheduler > + * was selected. > + */ "when there is no scheduler selected" or "when no scheduler was selected" > struct blk_mq_ctx *dispatch_from; > + /** > + * @dispatch_busy: Number used by blk_mq_update_dispatch_busy() to > + * decide if the hw_queue is busy using Exponential Weighted Moving > + * Average algorithm. > + */ > unsigned int dispatch_busy; > > + /** @type: HCTX_TYPE_* flags. Type of hardware queue. */ > unsigned short type; > + /** @nr_ctx: Number of software queues. */ > unsigned short nr_ctx; > + /** @ctxs: Array of software queues. */ > struct blk_mq_ctx **ctxs; > > + /** @dispatch_wait_lock: Lock for dispatch_wait queue. */ > spinlock_t dispatch_wait_lock; > + /** > + * @dispatch_wait: Waitqueue for dispatched requests. Request here will > + * be processed when percpu_ref_is_zero(&q->q_usage_counter) evaluates > + * true for q as a request_queue. > + */ That's not really useful, since it is stating what the code does. It would be interesting to say why this is needed, i.e. to re-run the queue and dispatch requests if the queue was full in the previous attempt. > wait_queue_entry_t dispatch_wait; > + /** @wait_index: Index of next wait queue to be used. */ > atomic_t wait_index; > > + /** @tags: Request tags. */ > struct blk_mq_tags *tags; > + /** @sched_tags: Request tags for schedulers. */ > struct blk_mq_tags *sched_tags; > > + /** @queued: Number of queued requests. */ > unsigned long queued; > + /** @run: Number of dispatched requests. */ > unsigned long run; > #define BLK_MQ_MAX_DISPATCH_ORDER 7 > + /** @dispatched: Number of dispatch requests by queue. */ > unsigned long dispatched[BLK_MQ_MAX_DISPATCH_ORDER]; > > + /** @numa_node: NUMA node the storage adapter has been connected to. */ > unsigned int numa_node; > + /** @queue_num: Index of this hardware queue. */ > unsigned int queue_num; > > + /** @nr_active: Number of active tags. */ > atomic_t nr_active; > > + /** @cpuhp_dead: List to store request if some CPU die. */ > struct hlist_node cpuhp_dead; > + /** @kobj: Kernel object for sysfs. */ > struct kobject kobj; > > + /** @poll_considered: Count times blk_poll() was called. */ > unsigned long poll_considered; > + /** @poll_invoked: Count how many requests blk_poll() polled. */ > unsigned long poll_invoked; > + /** @poll_success: Count how many polled requests were completed. */ > unsigned long poll_success; > > #ifdef CONFIG_BLK_DEBUG_FS > + /** > + * @debugfs_dir: debugfs directory for this hardware queue. Named > + * as cpu<cpu_number>. > + */ > struct dentry *debugfs_dir; > + /** @sched_debugfs_dir: debugfs directory for the scheduler. */ > struct dentry *sched_debugfs_dir; > #endif > > + /** @hctx_list: List of all hardware queues. */ > struct list_head hctx_list; > > - /* Must be the last member - see also blk_mq_hw_ctx_size(). */ > + /** > + * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is > + * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also > + * blk_mq_hw_ctx_size(). > + */ > struct srcu_struct srcu[0]; > }; > > /** > - * struct blk_mq_queue_map - ctx -> hctx mapping > + * struct blk_mq_queue_map - Map software queues to hardware queues > * @mq_map: CPU ID to hardware queue index map. This is an array > * with nr_cpu_ids elements. Each element has a value in the range > * [@queue_offset, @queue_offset + @nr_queues). > @@ -92,10 +171,17 @@ struct blk_mq_queue_map { > unsigned int queue_offset; > }; > > +/** > + * enum hctx_type - Type of hardware queue > + * @HCTX_TYPE_DEFAULT: All I/O not otherwise accounted for. > + * @HCTX_TYPE_READ: Just for READ I/O. > + * @HCTX_TYPE_POLL: Polled I/O of any kind. > + * @HCTX_MAX_TYPES: Number of types of hctx. > + */ > enum hctx_type { > - HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */ > - HCTX_TYPE_READ, /* just for READ I/O */ > - HCTX_TYPE_POLL, /* polled I/O of any kind */ > + HCTX_TYPE_DEFAULT, > + HCTX_TYPE_READ, > + HCTX_TYPE_POLL, > > HCTX_MAX_TYPES, > }; > @@ -147,6 +233,12 @@ struct blk_mq_tag_set { > struct list_head tag_list; > }; > > +/** > + * struct blk_mq_queue_data - Data about a request inserted in a queue > + * > + * @rq: Data about the block IO request. > + * @last: If it is the last request in the queue. > + */ > struct blk_mq_queue_data { > struct request *rq; > bool last; > @@ -174,81 +266,101 @@ typedef bool (busy_fn)(struct request_queue *); > typedef void (complete_fn)(struct request *); > typedef void (cleanup_rq_fn)(struct request *); > > - > +/** > + * struct blk_mq_ops - list of callback functions that implements block driver > + * behaviour. > + */ > struct blk_mq_ops { > - /* > - * Queue request > + /** > + * @queue_rq: Queue a new request from block IO. > */ > queue_rq_fn *queue_rq; > > - /* > - * If a driver uses bd->last to judge when to submit requests to > - * hardware, it must define this function. In case of errors that > - * make us stop issuing further requests, this hook serves the > + /** > + * @commit_rqs: If a driver uses bd->last to judge when to submit > + * requests to hardware, it must define this function. In case of errors > + * that make us stop issuing further requests, this hook serves the > * purpose of kicking the hardware (which the last request otherwise > * would have done). > */ > commit_rqs_fn *commit_rqs; > > - /* > - * Reserve budget before queue request, once .queue_rq is > + /** > + * @get_budget: Reserve budget before queue request, once .queue_rq is > * run, it is driver's responsibility to release the > * reserved budget. Also we have to handle failure case > * of .get_budget for avoiding I/O deadlock. > */ > get_budget_fn *get_budget; > + /** > + * @put_budget: Release the reserved budget. > + */ > put_budget_fn *put_budget; > > - /* > - * Called on request timeout > + /** > + * @timeout: Called on request timeout. > */ > timeout_fn *timeout; > > - /* > - * Called to poll for completion of a specific tag. > + /** > + * @poll: Called to poll for completion of a specific tag. > */ > poll_fn *poll; > > + /** > + * @complete: Mark the request as complete. > + */ > complete_fn *complete; > > - /* > - * Called when the block layer side of a hardware queue has been > - * set up, allowing the driver to allocate/init matching structures. > - * Ditto for exit/teardown. > + /** > + * @init_hctx: Called when the block layer side of a hardware queue has > + * been set up, allowing the driver to allocate/init matching > + * structures. > */ > init_hctx_fn *init_hctx; > + /** > + * @exit_hctx: Ditto for exit/teardown. > + */ > exit_hctx_fn *exit_hctx; > > - /* > - * Called for every command allocated by the block layer to allow > - * the driver to set up driver specific data. > + /** > + * @init_request: Called for every command allocated by the block layer > + * to allow the driver to set up driver specific data. > * > * Tag greater than or equal to queue_depth is for setting up > * flush request. > - * > - * Ditto for exit/teardown. > */ > init_request_fn *init_request; > + /** > + * @exit_request: Ditto for exit/teardown. > + */ > exit_request_fn *exit_request; > - /* Called from inside blk_get_request() */ > + > + /** > + * @initialize_rq_fn: Called from inside blk_get_request(). > + */ > void (*initialize_rq_fn)(struct request *rq); > > - /* > - * Called before freeing one request which isn't completed yet, > - * and usually for freeing the driver private data > + /** > + * @cleanup_rq: Called before freeing one request which isn't completed > + * yet, and usually for freeing the driver private data. > */ > cleanup_rq_fn *cleanup_rq; > > - /* > - * If set, returns whether or not this queue currently is busy > + /** > + * @busy: If set, returns whether or not this queue currently is busy. > */ > busy_fn *busy; > > + /** > + * @map_queues: This allows drivers specify their own queue mapping by > + * overriding the setup-time function that builds the mq_map. > + */ > map_queues_fn *map_queues; > > #ifdef CONFIG_BLK_DEBUG_FS > - /* > - * Used by the debugfs implementation to show driver-specific > + /** > + * @show_rq: Used by the debugfs implementation to show driver-specific > * information about a request. > */ > void (*show_rq)(struct seq_file *m, struct request *rq); > @@ -391,14 +503,29 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q); > > unsigned int blk_mq_rq_cpu(struct request *rq); > > -/* > +/** > + * blk_mq_rq_from_pdu - cast a PDU to a request > + * @pdu: the PDU (Protocol Data Unit) to be casted > + * > + * Return: request > + * > * Driver command data is immediately after the request. So subtract request > - * size to get back to the original request, add request size to get the PDU. > + * size to get back to the original request. > */ > static inline struct request *blk_mq_rq_from_pdu(void *pdu) > { > return pdu - sizeof(struct request); > } > + > +/** > + * blk_mq_rq_to_pdu - cast a request to a PDU > + * @rq: the request to be casted > + * > + * Return: pointer to the PDU > + * > + * Driver command data is immediately after the request. So add request to get > + * the PDU. > + */ > static inline void *blk_mq_rq_to_pdu(struct request *rq) > { > return rq + 1;
On 10/8/19 9:35 AM, Gabriel Krisman Bertazi wrote: > André Almeida <andrealmeid@collabora.com> writes: > >> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h >> index e0fce93ac127..8b745f229789 100644 >> --- a/include/linux/blk-mq.h >> +++ b/include/linux/blk-mq.h >> @@ -10,74 +10,153 @@ struct blk_mq_tags; >> struct blk_flush_queue; >> >> /** >> - * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware block device >> + * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware >> + * block device >> */ >> struct blk_mq_hw_ctx { >> struct { >> + /** @lock: Lock for accessing dispatch queue */ >> spinlock_t lock; >> + /** >> + * @dispatch: Queue of dispatched requests, waiting for >> + * workers to send them to the hardware. >> + */ > > It's been a few years since I looked at the block layer, but isn't > this used to hold requests that were taken from the blk_mq_ctx, but > couldn't be dispatched because the queue was full? I don't think so. I think that you are looking for the requeue_list member of struct request_queue. Bart.
On 10/7/19 5:14 PM, André Almeida wrote: > struct blk_mq_hw_ctx { > struct { > + /** @lock: Lock for accessing dispatch queue */ > spinlock_t lock; This spinlock not only protects dispatch list accesses but also modifications of the dispatch list. How about changing that comment into "protects the dispatch list"? > + /** > + * @dispatch: Queue of dispatched requests, waiting for > + * workers to send them to the hardware. > + */ > struct list_head dispatch; What is a worker? That word is not used anywhere in the block layer. How about changing that comment into "requests owned by this hardware queue"? > - unsigned long state; /* BLK_MQ_S_* flags */ > + /** > + * @state: BLK_MQ_S_* flags. Define the state of the hw ^^^^^^ Defines? > > + /** > + * @run_work: Worker for dispatch scheduled requests to hardware. > + * BLK_MQ_CPU_WORK_BATCH workers will be assigned to a CPU, then the > + * following ones will be assigned to the next_cpu. > + */ > struct delayed_work run_work; I'd prefer if algorithm details would be left out from structure documentation since such documentation becomes easily outdated. How about using something like the following description: "used for scheduling a hardware queue run at a later time"? > + /** > + * @next_cpu: Index of CPU/workqueue to be used in the next batch > + * of workers. > + */ The word "workers" here is confusing. How about the following description: "used by blk_mq_hctx_next_cpu() for round-robin CPU selection from @cpumask"? > + /** > + * @next_cpu_batch: Counter of how many works letf in the batch before ^^^^ left? > + * changing to the next CPU. A batch has the size > + * of BLK_MQ_CPU_WORK_BATCH. > + */ > int next_cpu_batch; Again, I think this is too much detail about the actual algorithm. > > - unsigned long flags; /* BLK_MQ_F_* flags */ > + /** @flags: BLK_MQ_F_* flags. Define the behaviour of the queue. */ ^^^^^^ Defines? > + unsigned long flags; > > + /** @sched_data: Data to support schedulers. */ > void *sched_data; That's a rather vague description. How about mentioning that this pointer is owned by the I/O scheduler that has been attached to a request queue and that the I/O scheduler decides how to use this pointer? > + /** @queue: Queue of request to be dispatched. */ > struct request_queue *queue; How about "pointer to the request queue that owns this hardware context"? > + /** > + * @ctx_map: Bitmap for each software queue. If bit is on, there is a > + * pending request. > + */ > struct sbitmap ctx_map; Please add " in that software queue" at the end of the description. > > + /** > + * @dispatch_from: Queue to be used when there is no scheduler > + * was selected. > + */ > struct blk_mq_ctx *dispatch_from; Does the word "queue" refer to a request queue, software queue or hardware queue? Please make that clear. > + /** > + * @dispatch_wait: Waitqueue for dispatched requests. Request here will > + * be processed when percpu_ref_is_zero(&q->q_usage_counter) evaluates > + * true for q as a request_queue. > + */ > wait_queue_entry_t dispatch_wait; That description doesn't look correct to me. I think that @dispatch_wait is used to wait if no tags are available. The comment above blk_mq_mark_tag_wait() is as follows: /* * Mark us waiting for a tag. For shared tags, this involves hooking us * into the tag wakeups. For non-shared tags, we can simply mark us * needing a restart. For both cases, take care to check the condition * again after marking us as waiting. */ > + /** @wait_index: Index of next wait queue to be used. */ > atomic_t wait_index; To be used by what? > + /** @tags: Request tags. */ > struct blk_mq_tags *tags; > + /** @sched_tags: Request tags for schedulers. */ > struct blk_mq_tags *sched_tags; I think @tags represents tags owned by the block driver and @sched_tags represents tags owned by the I/O scheduler. If no I/O scheduler is associated with a request queue, only a driver tag is allocated. If an I/O scheduler has been associated with a request queue, a request is assigned a tag from @sched_tags when that request is allocated. A tag from @tags is only assigned when a request is dispatched to a hardware queue. See also blk_mq_get_driver_tag(). > + /** @nr_active: Number of active tags. */ > atomic_t nr_active; Isn't this the number of active requests instead of the number of active tags? Please mention that this member is only used when a tag set is shared across request queues. > +/** > + * struct blk_mq_queue_data - Data about a request inserted in a queue > + * > + * @rq: Data about the block IO request. How about changing this into "Request pointer"? > +/** > + * struct blk_mq_ops - list of callback functions that implements block driver > + * behaviour. > + */ Is this really a list? > * Driver command data is immediately after the request. So subtract request > - * size to get back to the original request, add request size to get the PDU. > + * size to get back to the original request. > */ > static inline struct request *blk_mq_rq_from_pdu(void *pdu) > { > return pdu - sizeof(struct request); > } I'm not sure this change is an improvement. Bart.
Bart Van Assche <bvanassche@acm.org> writes: > On 10/8/19 9:35 AM, Gabriel Krisman Bertazi wrote: >> André Almeida <andrealmeid@collabora.com> writes: >> >>> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h >>> index e0fce93ac127..8b745f229789 100644 >>> --- a/include/linux/blk-mq.h >>> +++ b/include/linux/blk-mq.h >>> @@ -10,74 +10,153 @@ struct blk_mq_tags; >>> struct blk_flush_queue; >>> /** >>> - * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware block device >>> + * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware >>> + * block device >>> */ >>> struct blk_mq_hw_ctx { >>> struct { >>> + /** @lock: Lock for accessing dispatch queue */ >>> spinlock_t lock; >>> + /** >>> + * @dispatch: Queue of dispatched requests, waiting for >>> + * workers to send them to the hardware. >>> + */ >> >> It's been a few years since I looked at the block layer, but isn't >> this used to hold requests that were taken from the blk_mq_ctx, but >> couldn't be dispatched because the queue was full? > > I don't think so. I think that you are looking for the requeue_list > member of struct request_queue. > Hmm, sorry, but I'm confused. I'm sure I'm missing something simple, since I haven't touched this in a while, so maybe you can quickly point me in the right direction? I see blk_mq_requeue_request() being used by device drivers to retry requests that failed, but if I read the code correctly, the flushed queue seems to be moved to hctx->dispatch when the device driver returned BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE to blk_mq_dispatch_rq_list(). I thought BLK_STS_RESOURCE was returned by the driver on .queue_rq() to signal there was no more resources on the hardware to service further requests.
On 10/8/19 11:46 AM, Gabriel Krisman Bertazi wrote: > Hmm, sorry, but I'm confused. I'm sure I'm missing something simple, > since I haven't touched this in a while, so maybe you can quickly point > me in the right direction? > > I see blk_mq_requeue_request() being used by device drivers to retry > requests that failed, but if I read the code correctly, the flushed > queue seems to be moved to hctx->dispatch when the device > driver returned BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE to > blk_mq_dispatch_rq_list(). I thought BLK_STS_RESOURCE was returned by > the driver on .queue_rq() to signal there was no more resources on the > hardware to service further requests. Hi Gabriel, The simplified version of how requests are requeued as follows: * A block driver calls blk_mq_requeue_request(). * blk_mq_requeue_request() calls blk_mq_add_to_requeue_list() * blk_mq_add_to_requeue_list() executes the following code: list_add_tail(&rq->queuelist, &q->requeue_list) * A block driver or the block layer core calls blk_mq_kick_requeue_list() or blk_mq_delay_kick_requeue_list(). Both functions trigger a call of blk_mq_requeue_work(). * blk_mq_requeue_work() processes q->requeue_list. Bart.
On 10/8/19 5:01 PM, Bart Van Assche wrote: > On 10/8/19 11:46 AM, Gabriel Krisman Bertazi wrote: >> Hmm, sorry, but I'm confused. I'm sure I'm missing something simple, >> since I haven't touched this in a while, so maybe you can quickly point >> me in the right direction? >> >> I see blk_mq_requeue_request() being used by device drivers to retry >> requests that failed, but if I read the code correctly, the flushed >> queue seems to be moved to hctx->dispatch when the device >> driver returned BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE to >> blk_mq_dispatch_rq_list(). I thought BLK_STS_RESOURCE was returned by >> the driver on .queue_rq() to signal there was no more resources on the >> hardware to service further requests. > > Hi Gabriel, > > The simplified version of how requests are requeued as follows: > * A block driver calls blk_mq_requeue_request(). > * blk_mq_requeue_request() calls blk_mq_add_to_requeue_list() > * blk_mq_add_to_requeue_list() executes the following code: > list_add_tail(&rq->queuelist, &q->requeue_list) > * A block driver or the block layer core calls > blk_mq_kick_requeue_list() or blk_mq_delay_kick_requeue_list(). Both > functions trigger a call of blk_mq_requeue_work(). > * blk_mq_requeue_work() processes q->requeue_list. > > Bart. > > Hello Bart, Seems that it's not clear for me the role of these members. Could you please check if those definitions make sense for you? - hctx->dispatch: This queue is used for requests that are ready to be dispatched to the hardware but for some reason (e.g. lack of resources, the hardware is to busy and can't get more requests) could not be sent to the hardware. As soon as the driver can send new requests, those queued at this list will be sent first for a more fair dispatch. Since those requests are at the hctx, they can't be requeued or rescheduled anymore. - request_queue->requeue_list: This list is used when it's not possible to send the request to the associated hctx. This can happen if the associated CPU or hctx is not available anymore. When requeueing those requests, it will be possible to send them to new and function queues. Thanks, André
Hello Bart, On 10/8/19 3:12 PM, Bart Van Assche wrote: > On 10/7/19 5:14 PM, André Almeida wrote: ... > > Bart. Thanks you for your extensive and detailed feedback. I'm applying those changes for v3. André
On 10/10/19 1:38 PM, André Almeida wrote: > Seems that it's not clear for me the role of these members. Could you > please check if those definitions make sense for you? > > - hctx->dispatch: This queue is used for requests that are ready to be > dispatched to the hardware but for some reason (e.g. lack of resources, > the hardware is to busy and can't get more requests) could not be sent > to the hardware. As soon as the driver can send new requests, those > queued at this list will be sent first for a more fair dispatch. Since > those requests are at the hctx, they can't be requeued or rescheduled > anymore. > > - request_queue->requeue_list: This list is used when it's not possible > to send the request to the associated hctx. This can happen if the > associated CPU or hctx is not available anymore. When requeueing those > requests, it will be possible to send them to new and function queues. Hi André, The hctx->dispatch description looks mostly fine. Can the following part be left out since it looks confusing to me: "Since those requests are at the hctx, they can't be requeued or rescheduled anymore." How about changing the requeue_list description into the following: "requests requeued by a call to blk_mq_requeue_request()". Thanks, Bart.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index e0fce93ac127..8b745f229789 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -10,74 +10,153 @@ struct blk_mq_tags; struct blk_flush_queue; /** - * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware block device + * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware + * block device */ struct blk_mq_hw_ctx { struct { + /** @lock: Lock for accessing dispatch queue */ spinlock_t lock; + /** + * @dispatch: Queue of dispatched requests, waiting for + * workers to send them to the hardware. + */ struct list_head dispatch; - unsigned long state; /* BLK_MQ_S_* flags */ + /** + * @state: BLK_MQ_S_* flags. Define the state of the hw + * queue (active, scheduled to restart, stopped). + */ + unsigned long state; } ____cacheline_aligned_in_smp; + /** + * @run_work: Worker for dispatch scheduled requests to hardware. + * BLK_MQ_CPU_WORK_BATCH workers will be assigned to a CPU, then the + * following ones will be assigned to the next_cpu. + */ struct delayed_work run_work; + /** @cpumask: Map of available CPUs. */ cpumask_var_t cpumask; + /** + * @next_cpu: Index of CPU/workqueue to be used in the next batch + * of workers. + */ int next_cpu; + /** + * @next_cpu_batch: Counter of how many works letf in the batch before + * changing to the next CPU. A batch has the size + * of BLK_MQ_CPU_WORK_BATCH. + */ int next_cpu_batch; - unsigned long flags; /* BLK_MQ_F_* flags */ + /** @flags: BLK_MQ_F_* flags. Define the behaviour of the queue. */ + unsigned long flags; + /** @sched_data: Data to support schedulers. */ void *sched_data; + /** @queue: Queue of request to be dispatched. */ struct request_queue *queue; + /** @fq: Queue of requests to be flushed from memory to storage. */ struct blk_flush_queue *fq; + /** + * @driver_data: Pointer to data owned by the block driver that created + * this hctx + */ void *driver_data; + /** + * @ctx_map: Bitmap for each software queue. If bit is on, there is a + * pending request. + */ struct sbitmap ctx_map; + /** + * @dispatch_from: Queue to be used when there is no scheduler + * was selected. + */ struct blk_mq_ctx *dispatch_from; + /** + * @dispatch_busy: Number used by blk_mq_update_dispatch_busy() to + * decide if the hw_queue is busy using Exponential Weighted Moving + * Average algorithm. + */ unsigned int dispatch_busy; + /** @type: HCTX_TYPE_* flags. Type of hardware queue. */ unsigned short type; + /** @nr_ctx: Number of software queues. */ unsigned short nr_ctx; + /** @ctxs: Array of software queues. */ struct blk_mq_ctx **ctxs; + /** @dispatch_wait_lock: Lock for dispatch_wait queue. */ spinlock_t dispatch_wait_lock; + /** + * @dispatch_wait: Waitqueue for dispatched requests. Request here will + * be processed when percpu_ref_is_zero(&q->q_usage_counter) evaluates + * true for q as a request_queue. + */ wait_queue_entry_t dispatch_wait; + /** @wait_index: Index of next wait queue to be used. */ atomic_t wait_index; + /** @tags: Request tags. */ struct blk_mq_tags *tags; + /** @sched_tags: Request tags for schedulers. */ struct blk_mq_tags *sched_tags; + /** @queued: Number of queued requests. */ unsigned long queued; + /** @run: Number of dispatched requests. */ unsigned long run; #define BLK_MQ_MAX_DISPATCH_ORDER 7 + /** @dispatched: Number of dispatch requests by queue. */ unsigned long dispatched[BLK_MQ_MAX_DISPATCH_ORDER]; + /** @numa_node: NUMA node the storage adapter has been connected to. */ unsigned int numa_node; + /** @queue_num: Index of this hardware queue. */ unsigned int queue_num; + /** @nr_active: Number of active tags. */ atomic_t nr_active; + /** @cpuhp_dead: List to store request if some CPU die. */ struct hlist_node cpuhp_dead; + /** @kobj: Kernel object for sysfs. */ struct kobject kobj; + /** @poll_considered: Count times blk_poll() was called. */ unsigned long poll_considered; + /** @poll_invoked: Count how many requests blk_poll() polled. */ unsigned long poll_invoked; + /** @poll_success: Count how many polled requests were completed. */ unsigned long poll_success; #ifdef CONFIG_BLK_DEBUG_FS + /** + * @debugfs_dir: debugfs directory for this hardware queue. Named + * as cpu<cpu_number>. + */ struct dentry *debugfs_dir; + /** @sched_debugfs_dir: debugfs directory for the scheduler. */ struct dentry *sched_debugfs_dir; #endif + /** @hctx_list: List of all hardware queues. */ struct list_head hctx_list; - /* Must be the last member - see also blk_mq_hw_ctx_size(). */ + /** + * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is + * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also + * blk_mq_hw_ctx_size(). + */ struct srcu_struct srcu[0]; }; /** - * struct blk_mq_queue_map - ctx -> hctx mapping + * struct blk_mq_queue_map - Map software queues to hardware queues * @mq_map: CPU ID to hardware queue index map. This is an array * with nr_cpu_ids elements. Each element has a value in the range * [@queue_offset, @queue_offset + @nr_queues). @@ -92,10 +171,17 @@ struct blk_mq_queue_map { unsigned int queue_offset; }; +/** + * enum hctx_type - Type of hardware queue + * @HCTX_TYPE_DEFAULT: All I/O not otherwise accounted for. + * @HCTX_TYPE_READ: Just for READ I/O. + * @HCTX_TYPE_POLL: Polled I/O of any kind. + * @HCTX_MAX_TYPES: Number of types of hctx. + */ enum hctx_type { - HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */ - HCTX_TYPE_READ, /* just for READ I/O */ - HCTX_TYPE_POLL, /* polled I/O of any kind */ + HCTX_TYPE_DEFAULT, + HCTX_TYPE_READ, + HCTX_TYPE_POLL, HCTX_MAX_TYPES, }; @@ -147,6 +233,12 @@ struct blk_mq_tag_set { struct list_head tag_list; }; +/** + * struct blk_mq_queue_data - Data about a request inserted in a queue + * + * @rq: Data about the block IO request. + * @last: If it is the last request in the queue. + */ struct blk_mq_queue_data { struct request *rq; bool last; @@ -174,81 +266,101 @@ typedef bool (busy_fn)(struct request_queue *); typedef void (complete_fn)(struct request *); typedef void (cleanup_rq_fn)(struct request *); - +/** + * struct blk_mq_ops - list of callback functions that implements block driver + * behaviour. + */ struct blk_mq_ops { - /* - * Queue request + /** + * @queue_rq: Queue a new request from block IO. */ queue_rq_fn *queue_rq; - /* - * If a driver uses bd->last to judge when to submit requests to - * hardware, it must define this function. In case of errors that - * make us stop issuing further requests, this hook serves the + /** + * @commit_rqs: If a driver uses bd->last to judge when to submit + * requests to hardware, it must define this function. In case of errors + * that make us stop issuing further requests, this hook serves the * purpose of kicking the hardware (which the last request otherwise * would have done). */ commit_rqs_fn *commit_rqs; - /* - * Reserve budget before queue request, once .queue_rq is + /** + * @get_budget: Reserve budget before queue request, once .queue_rq is * run, it is driver's responsibility to release the * reserved budget. Also we have to handle failure case * of .get_budget for avoiding I/O deadlock. */ get_budget_fn *get_budget; + /** + * @put_budget: Release the reserved budget. + */ put_budget_fn *put_budget; - /* - * Called on request timeout + /** + * @timeout: Called on request timeout. */ timeout_fn *timeout; - /* - * Called to poll for completion of a specific tag. + /** + * @poll: Called to poll for completion of a specific tag. */ poll_fn *poll; + /** + * @complete: Mark the request as complete. + */ complete_fn *complete; - /* - * Called when the block layer side of a hardware queue has been - * set up, allowing the driver to allocate/init matching structures. - * Ditto for exit/teardown. + /** + * @init_hctx: Called when the block layer side of a hardware queue has + * been set up, allowing the driver to allocate/init matching + * structures. */ init_hctx_fn *init_hctx; + /** + * @exit_hctx: Ditto for exit/teardown. + */ exit_hctx_fn *exit_hctx; - /* - * Called for every command allocated by the block layer to allow - * the driver to set up driver specific data. + /** + * @init_request: Called for every command allocated by the block layer + * to allow the driver to set up driver specific data. * * Tag greater than or equal to queue_depth is for setting up * flush request. - * - * Ditto for exit/teardown. */ init_request_fn *init_request; + /** + * @exit_request: Ditto for exit/teardown. + */ exit_request_fn *exit_request; - /* Called from inside blk_get_request() */ + + /** + * @initialize_rq_fn: Called from inside blk_get_request(). + */ void (*initialize_rq_fn)(struct request *rq); - /* - * Called before freeing one request which isn't completed yet, - * and usually for freeing the driver private data + /** + * @cleanup_rq: Called before freeing one request which isn't completed + * yet, and usually for freeing the driver private data. */ cleanup_rq_fn *cleanup_rq; - /* - * If set, returns whether or not this queue currently is busy + /** + * @busy: If set, returns whether or not this queue currently is busy. */ busy_fn *busy; + /** + * @map_queues: This allows drivers specify their own queue mapping by + * overriding the setup-time function that builds the mq_map. + */ map_queues_fn *map_queues; #ifdef CONFIG_BLK_DEBUG_FS - /* - * Used by the debugfs implementation to show driver-specific + /** + * @show_rq: Used by the debugfs implementation to show driver-specific * information about a request. */ void (*show_rq)(struct seq_file *m, struct request *rq); @@ -391,14 +503,29 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q); unsigned int blk_mq_rq_cpu(struct request *rq); -/* +/** + * blk_mq_rq_from_pdu - cast a PDU to a request + * @pdu: the PDU (Protocol Data Unit) to be casted + * + * Return: request + * * Driver command data is immediately after the request. So subtract request - * size to get back to the original request, add request size to get the PDU. + * size to get back to the original request. */ static inline struct request *blk_mq_rq_from_pdu(void *pdu) { return pdu - sizeof(struct request); } + +/** + * blk_mq_rq_to_pdu - cast a request to a PDU + * @rq: the request to be casted + * + * Return: pointer to the PDU + * + * Driver command data is immediately after the request. So add request to get + * the PDU. + */ static inline void *blk_mq_rq_to_pdu(struct request *rq) { return rq + 1;
Insert documentation for structs, enums and functions at header file. Format existing and new comments at struct blk_mq_ops as kernel-doc comments. Signed-off-by: André Almeida <andrealmeid@collabora.com> --- Hello, This patch is an effort to enhance the documentation of the multiqueue API. To check if the comments are confirming to kernel-doc format, you first need to apply a patch[1] solving a bug around the `__cacheline...` directive. Then, just run ./scripts/kernel-doc -none include/linux/blk-mq.h to check that there is no warning at this documentation. I've done my best at the source to check the purpose of each struct and its members, but please let me know if I get something wrong. Changes since v2: - Fix the meaning of PDU - Uses nested comments for long structs - Slight reword member definition to look closer to work made at commit "block: Document all members of blk_mq_tag_set and bkl_mq_queue_map" [1] https://lkml.org/lkml/2019/9/17/820 --- include/linux/blk-mq.h | 207 +++++++++++++++++++++++++++++++++-------- 1 file changed, 167 insertions(+), 40 deletions(-)