[v2,1/1] blk-mq: fill header with kernel-doc

Message ID	20191008001416.12656-1-andrealmeid@collabora.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=ti8e=YB=vger.kernel.org=linux-block-owner@kernel.org> sender: tonyk) by bhuna.collabora.co.uk (Postfix) with ESMTPSA id C60AE28E972; Tue, 8 Oct 2019 01:15:35 +0100 (BST) From: =?utf-8?q?Andr=C3=A9_Almeida?= <andrealmeid@collabora.com> To: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Cc: axboe@kernel.dk, kernel@collabora.com, krisman@collabora.com, =?utf-8?q?Andr=C3=A9_Almeida?= <andrealmeid@collabora.com> Subject: [PATCH v2 1/1] blk-mq: fill header with kernel-doc Date: Mon, 7 Oct 2019 21:14:16 -0300 Message-Id: <20191008001416.12656-1-andrealmeid@collabora.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-block-owner@vger.kernel.org Precedence: bulk
Series	[v2,1/1] blk-mq: fill header with kernel-doc \| expand [v2,1/1] blk-mq: fill header with kernel-doc

André Almeida Oct. 8, 2019, 12:14 a.m. UTC

Insert documentation for structs, enums and functions at header file.
Format existing and new comments at struct blk_mq_ops as
kernel-doc comments.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
---
Hello,

This patch is an effort to enhance the documentation of the multiqueue
API. To check if the comments are confirming to kernel-doc format, you
first need to apply a patch[1] solving a bug around the `__cacheline...`
directive. Then, just run

./scripts/kernel-doc -none include/linux/blk-mq.h

to check that there is no warning at this documentation. I've done my
best at the source to check the purpose of each struct and its members,
but please let me know if I get something wrong.

Changes since v2:
- Fix the meaning of PDU
- Uses nested comments for long structs
- Slight reword member definition to look closer to work made at commit
"block: Document all members of blk_mq_tag_set and bkl_mq_queue_map"

[1] https://lkml.org/lkml/2019/9/17/820
---
 include/linux/blk-mq.h | 207 +++++++++++++++++++++++++++++++++--------
 1 file changed, 167 insertions(+), 40 deletions(-)

Gabriel Krisman Bertazi Oct. 8, 2019, 4:35 p.m. UTC | #1

André Almeida <andrealmeid@collabora.com> writes:

> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index e0fce93ac127..8b745f229789 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -10,74 +10,153 @@ struct blk_mq_tags;
>  struct blk_flush_queue;
>  
>  /**
> - * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware block device
> + * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware
> + * block device
>   */
>  struct blk_mq_hw_ctx {
>  	struct {
> +		/** @lock: Lock for accessing dispatch queue */
>  		spinlock_t		lock;
> +		/**
> +		 * @dispatch: Queue of dispatched requests, waiting for
> +		 * workers to send them to the hardware.
> +		 */

It's been a few years since I looked at the block layer, but isn't
this used to hold requests that were taken from the blk_mq_ctx,  but
couldn't be dispatched because the queue was full?

>  		struct list_head	dispatch;
> -		unsigned long		state;		/* BLK_MQ_S_* flags */
> +		 /**
> +		  * @state: BLK_MQ_S_* flags. Define the state of the hw
> +		  * queue (active, scheduled to restart, stopped).
> +		  */
> +		unsigned long		state;
>  	} ____cacheline_aligned_in_smp;
>  
> +	/**
> +	 * @run_work: Worker for dispatch scheduled requests to hardware.
> +	 * BLK_MQ_CPU_WORK_BATCH workers will be assigned to a CPU, then the
> +	 * following ones will be assigned to the next_cpu.
> +	 */
>  	struct delayed_work	run_work;
> +	/** @cpumask: Map of available CPUs. */

Available cpus where this hctx can run.

>  	cpumask_var_t		cpumask;
> +	/**
> +	 * @next_cpu: Index of CPU/workqueue to be used in the next batch
> +	 * of workers.
> +	 */
>  	int			next_cpu;
> +	/**
> +	 * @next_cpu_batch: Counter of how many works letf in the batch before
> +	 * changing to the next CPU. A batch has the size
> +	 * of BLK_MQ_CPU_WORK_BATCH.
> +	 */
>  	int			next_cpu_batch;
>  
> -	unsigned long		flags;		/* BLK_MQ_F_* flags */
> +	/** @flags: BLK_MQ_F_* flags. Define the behaviour of the queue. */
> +	unsigned long		flags;
>  
> +	/** @sched_data: Data to support schedulers. */
>  	void			*sched_data;
> +	/** @queue: Queue of request to be dispatched. */
>  	struct request_queue	*queue;

Maybe "the request queue this hctx belongs to"


> +	/** @fq: Queue of requests to be flushed from memory to storage. */
>  	struct blk_flush_queue	*fq;

Hmm, i think this needs clarification.
>  
> +	/**
> +	 * @driver_data: Pointer to data owned by the block driver that created
> +	 * this hctx
> +	 */
>  	void			*driver_data;
>  
> +	/**
> +	 * @ctx_map: Bitmap for each software queue. If bit is on, there is a
> +	 * pending request.
> +	 */
>  	struct sbitmap		ctx_map;
>  
> +	/**
> +	 * @dispatch_from: Queue to be used when there is no scheduler
> +	 * was selected.
> +	 */

"when there is no scheduler selected" or "when no scheduler was selected"

>  	struct blk_mq_ctx	*dispatch_from;
> +	/**
> +	 * @dispatch_busy: Number used by blk_mq_update_dispatch_busy() to
> +	 * decide if the hw_queue is busy using Exponential Weighted Moving
> +	 * Average algorithm.
> +	 */
>  	unsigned int		dispatch_busy;
>  
> +	/** @type: HCTX_TYPE_* flags. Type of hardware queue. */
>  	unsigned short		type;
> +	/** @nr_ctx: Number of software queues. */
>  	unsigned short		nr_ctx;
> +	/** @ctxs:	Array of software queues. */
>  	struct blk_mq_ctx	**ctxs;
>  
> +	/** @dispatch_wait_lock: Lock for dispatch_wait queue. */
>  	spinlock_t		dispatch_wait_lock;
> +	/**
> +	 * @dispatch_wait: Waitqueue for dispatched requests. Request here will
> +	 * be processed when percpu_ref_is_zero(&q->q_usage_counter) evaluates
> +	 * true for q as a request_queue.
> +	 */

That's not really useful, since it is stating what the code does.  It
would be interesting to say why this is needed, i.e. to re-run the queue
and dispatch requests if the queue was full in the previous attempt.

>  	wait_queue_entry_t	dispatch_wait;
> +	/** @wait_index: Index of next wait queue to be used. */
>  	atomic_t		wait_index;
>  
> +	/** @tags: Request tags. */
>  	struct blk_mq_tags	*tags;
> +	/** @sched_tags: Request tags for schedulers. */
>  	struct blk_mq_tags	*sched_tags;
>  
> +	/** @queued: Number of queued requests. */
>  	unsigned long		queued;
> +	/** @run: Number of dispatched requests. */
>  	unsigned long		run;
>  #define BLK_MQ_MAX_DISPATCH_ORDER	7
> +	/** @dispatched: Number of dispatch requests by queue. */
>  	unsigned long		dispatched[BLK_MQ_MAX_DISPATCH_ORDER];
>  
> +	/** @numa_node: NUMA node the storage adapter has been connected to. */
>  	unsigned int		numa_node;
> +	/** @queue_num: Index of this hardware queue. */
>  	unsigned int		queue_num;
>  
> +	/** @nr_active:	Number of active tags. */
>  	atomic_t		nr_active;
>  
> +	/** @cpuhp_dead: List to store request if some CPU die. */
>  	struct hlist_node	cpuhp_dead;
> +	/** @kobj: Kernel object for sysfs. */
>  	struct kobject		kobj;
>  
> +	/** @poll_considered: Count times blk_poll() was called. */
>  	unsigned long		poll_considered;
> +	/** @poll_invoked: Count how many requests blk_poll() polled. */
>  	unsigned long		poll_invoked;
> +	/** @poll_success: Count how many polled requests were completed. */
>  	unsigned long		poll_success;
>  
>  #ifdef CONFIG_BLK_DEBUG_FS
> +	/**
> +	 * @debugfs_dir: debugfs directory for this hardware queue. Named
> +	 * as cpu<cpu_number>.
> +	 */
>  	struct dentry		*debugfs_dir;
> +	/** @sched_debugfs_dir:	debugfs directory for the scheduler. */
>  	struct dentry		*sched_debugfs_dir;
>  #endif
>  
> +	/** @hctx_list:	List of all hardware queues. */
>  	struct list_head	hctx_list;
>  
> -	/* Must be the last member - see also blk_mq_hw_ctx_size(). */
> +	/**
> +	 * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is
> +	 * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also
> +	 * blk_mq_hw_ctx_size().
> +	 */
>  	struct srcu_struct	srcu[0];
>  };
>  
>  /**
> - * struct blk_mq_queue_map - ctx -> hctx mapping
> + * struct blk_mq_queue_map - Map software queues to hardware queues
>   * @mq_map:       CPU ID to hardware queue index map. This is an array
>   *	with nr_cpu_ids elements. Each element has a value in the range
>   *	[@queue_offset, @queue_offset + @nr_queues).
> @@ -92,10 +171,17 @@ struct blk_mq_queue_map {
>  	unsigned int queue_offset;
>  };
>  
> +/**
> + * enum hctx_type - Type of hardware queue
> + * @HCTX_TYPE_DEFAULT:	All I/O not otherwise accounted for.
> + * @HCTX_TYPE_READ:	Just for READ I/O.
> + * @HCTX_TYPE_POLL:	Polled I/O of any kind.
> + * @HCTX_MAX_TYPES:	Number of types of hctx.
> + */
>  enum hctx_type {
> -	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
> -	HCTX_TYPE_READ,		/* just for READ I/O */
> -	HCTX_TYPE_POLL,		/* polled I/O of any kind */
> +	HCTX_TYPE_DEFAULT,
> +	HCTX_TYPE_READ,
> +	HCTX_TYPE_POLL,
>  
>  	HCTX_MAX_TYPES,
>  };
> @@ -147,6 +233,12 @@ struct blk_mq_tag_set {
>  	struct list_head	tag_list;
>  };
>  
> +/**
> + * struct blk_mq_queue_data - Data about a request inserted in a queue
> + *
> + * @rq:   Data about the block IO request.
> + * @last: If it is the last request in the queue.
> + */
>  struct blk_mq_queue_data {
>  	struct request *rq;
>  	bool last;
> @@ -174,81 +266,101 @@ typedef bool (busy_fn)(struct request_queue *);
>  typedef void (complete_fn)(struct request *);
>  typedef void (cleanup_rq_fn)(struct request *);
>  
> -
> +/**
> + * struct blk_mq_ops - list of callback functions that implements block driver
> + * behaviour.
> + */
>  struct blk_mq_ops {
> -	/*
> -	 * Queue request
> +	/**
> +	 * @queue_rq: Queue a new request from block IO.
>  	 */
>  	queue_rq_fn		*queue_rq;
>  
> -	/*
> -	 * If a driver uses bd->last to judge when to submit requests to
> -	 * hardware, it must define this function. In case of errors that
> -	 * make us stop issuing further requests, this hook serves the
> +	/**
> +	 * @commit_rqs: If a driver uses bd->last to judge when to submit
> +	 * requests to hardware, it must define this function. In case of errors
> +	 * that make us stop issuing further requests, this hook serves the
>  	 * purpose of kicking the hardware (which the last request otherwise
>  	 * would have done).
>  	 */
>  	commit_rqs_fn		*commit_rqs;
>  
> -	/*
> -	 * Reserve budget before queue request, once .queue_rq is
> +	/**
> +	 * @get_budget: Reserve budget before queue request, once .queue_rq is
>  	 * run, it is driver's responsibility to release the
>  	 * reserved budget. Also we have to handle failure case
>  	 * of .get_budget for avoiding I/O deadlock.
>  	 */
>  	get_budget_fn		*get_budget;
> +	/**
> +	 * @put_budget: Release the reserved budget.
> +	 */
>  	put_budget_fn		*put_budget;
>  
> -	/*
> -	 * Called on request timeout
> +	/**
> +	 * @timeout: Called on request timeout.
>  	 */
>  	timeout_fn		*timeout;
>  
> -	/*
> -	 * Called to poll for completion of a specific tag.
> +	/**
> +	 * @poll: Called to poll for completion of a specific tag.
>  	 */
>  	poll_fn			*poll;
>  
> +	/**
> +	 * @complete: Mark the request as complete.
> +	 */
>  	complete_fn		*complete;
>  
> -	/*
> -	 * Called when the block layer side of a hardware queue has been
> -	 * set up, allowing the driver to allocate/init matching structures.
> -	 * Ditto for exit/teardown.
> +	/**
> +	 * @init_hctx: Called when the block layer side of a hardware queue has
> +	 * been set up, allowing the driver to allocate/init matching
> +	 * structures.
>  	 */
>  	init_hctx_fn		*init_hctx;
> +	/**
> +	 * @exit_hctx: Ditto for exit/teardown.
> +	 */
>  	exit_hctx_fn		*exit_hctx;
>  
> -	/*
> -	 * Called for every command allocated by the block layer to allow
> -	 * the driver to set up driver specific data.
> +	/**
> +	 * @init_request: Called for every command allocated by the block layer
> +	 * to allow the driver to set up driver specific data.
>  	 *
>  	 * Tag greater than or equal to queue_depth is for setting up
>  	 * flush request.
> -	 *
> -	 * Ditto for exit/teardown.
>  	 */
>  	init_request_fn		*init_request;
> +	/**
> +	 * @exit_request: Ditto for exit/teardown.
> +	 */
>  	exit_request_fn		*exit_request;
> -	/* Called from inside blk_get_request() */
> +
> +	/**
> +	 * @initialize_rq_fn: Called from inside blk_get_request().
> +	 */
>  	void (*initialize_rq_fn)(struct request *rq);
>  
> -	/*
> -	 * Called before freeing one request which isn't completed yet,
> -	 * and usually for freeing the driver private data
> +	/**
> +	 * @cleanup_rq: Called before freeing one request which isn't completed
> +	 * yet, and usually for freeing the driver private data.
>  	 */
>  	cleanup_rq_fn		*cleanup_rq;
>  
> -	/*
> -	 * If set, returns whether or not this queue currently is busy
> +	/**
> +	 * @busy: If set, returns whether or not this queue currently is busy.
>  	 */
>  	busy_fn			*busy;
>  
> +	/**
> +	 * @map_queues: This allows drivers specify their own queue mapping by
> +	 * overriding the setup-time function that builds the mq_map.
> +	 */
>  	map_queues_fn		*map_queues;
>  
>  #ifdef CONFIG_BLK_DEBUG_FS
> -	/*
> -	 * Used by the debugfs implementation to show driver-specific
> +	/**
> +	 * @show_rq: Used by the debugfs implementation to show driver-specific
>  	 * information about a request.
>  	 */
>  	void (*show_rq)(struct seq_file *m, struct request *rq);
> @@ -391,14 +503,29 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q);
>  
>  unsigned int blk_mq_rq_cpu(struct request *rq);
>  
> -/*
> +/**
> + * blk_mq_rq_from_pdu - cast a PDU to a request
> + * @pdu: the PDU (Protocol Data Unit) to be casted
> + *
> + * Return: request
> + *
>   * Driver command data is immediately after the request. So subtract request
> - * size to get back to the original request, add request size to get the PDU.
> + * size to get back to the original request.
>   */
>  static inline struct request *blk_mq_rq_from_pdu(void *pdu)
>  {
>  	return pdu - sizeof(struct request);
>  }
> +
> +/**
> + * blk_mq_rq_to_pdu - cast a request to a PDU
> + * @rq: the request to be casted
> + *
> + * Return: pointer to the PDU
> + *
> + * Driver command data is immediately after the request. So add request to get
> + * the PDU.
> + */
>  static inline void *blk_mq_rq_to_pdu(struct request *rq)
>  {
>  	return rq + 1;

Bart Van Assche Oct. 8, 2019, 5:30 p.m. UTC | #2

On 10/8/19 9:35 AM, Gabriel Krisman Bertazi wrote:
> André Almeida <andrealmeid@collabora.com> writes:
> 
>> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
>> index e0fce93ac127..8b745f229789 100644
>> --- a/include/linux/blk-mq.h
>> +++ b/include/linux/blk-mq.h
>> @@ -10,74 +10,153 @@ struct blk_mq_tags;
>>   struct blk_flush_queue;
>>   
>>   /**
>> - * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware block device
>> + * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware
>> + * block device
>>    */
>>   struct blk_mq_hw_ctx {
>>   	struct {
>> +		/** @lock: Lock for accessing dispatch queue */
>>   		spinlock_t		lock;
>> +		/**
>> +		 * @dispatch: Queue of dispatched requests, waiting for
>> +		 * workers to send them to the hardware.
>> +		 */
> 
> It's been a few years since I looked at the block layer, but isn't
> this used to hold requests that were taken from the blk_mq_ctx,  but
> couldn't be dispatched because the queue was full?

I don't think so. I think that you are looking for the requeue_list 
member of struct request_queue.

Bart.

Bart Van Assche Oct. 8, 2019, 6:12 p.m. UTC | #3

On 10/7/19 5:14 PM, André Almeida wrote:
>   struct blk_mq_hw_ctx {
>   	struct {
> +		/** @lock: Lock for accessing dispatch queue */
>   		spinlock_t		lock;

This spinlock not only protects dispatch list accesses but also 
modifications of the dispatch list. How about changing that comment into 
"protects the dispatch list"?

> +		/**
> +		 * @dispatch: Queue of dispatched requests, waiting for
> +		 * workers to send them to the hardware.
> +		 */
>   		struct list_head	dispatch;

What is a worker? That word is not used anywhere in the block layer. How 
about changing that comment into "requests owned by this hardware queue"?

> -		unsigned long		state;		/* BLK_MQ_S_* flags */
> +		 /**
> +		  * @state: BLK_MQ_S_* flags. Define the state of the hw
                                               ^^^^^^
                                               Defines?
>   
> +	/**
> +	 * @run_work: Worker for dispatch scheduled requests to hardware.
> +	 * BLK_MQ_CPU_WORK_BATCH workers will be assigned to a CPU, then the
> +	 * following ones will be assigned to the next_cpu.
> +	 */
 >   	struct delayed_work	run_work;

I'd prefer if algorithm details would be left out from structure 
documentation since such documentation becomes easily outdated. How 
about using something like the following description: "used for 
scheduling a hardware queue run at a later time"?

> +	/**
> +	 * @next_cpu: Index of CPU/workqueue to be used in the next batch
> +	 * of workers.
> +	 */

The word "workers" here is confusing. How about the following 
description: "used by blk_mq_hctx_next_cpu() for round-robin CPU 
selection from @cpumask"?

> +	/**
> +	 * @next_cpu_batch: Counter of how many works letf in the batch before
                                                       ^^^^
                                                       left?
> +	 * changing to the next CPU. A batch has the size
> +	 * of BLK_MQ_CPU_WORK_BATCH.
> +	 */
>   	int			next_cpu_batch;

Again, I think this is too much detail about the actual algorithm.

>   
> -	unsigned long		flags;		/* BLK_MQ_F_* flags */
> +	/** @flags: BLK_MQ_F_* flags. Define the behaviour of the queue. */
                                       ^^^^^^
                                       Defines?
> +	unsigned long		flags;
>   
> +	/** @sched_data: Data to support schedulers. */
>   	void			*sched_data;

That's a rather vague description. How about mentioning that this 
pointer is owned by the I/O scheduler that has been attached to a 
request queue and that the I/O scheduler decides how to use this pointer?

> +	/** @queue: Queue of request to be dispatched. */
>   	struct request_queue	*queue;

How about "pointer to the request queue that owns this hardware context"?

> +	/**
> +	 * @ctx_map: Bitmap for each software queue. If bit is on, there is a
> +	 * pending request.
> +	 */
>   	struct sbitmap		ctx_map;

Please add " in that software queue" at the end of the description.

>   
> +	/**
> +	 * @dispatch_from: Queue to be used when there is no scheduler
> +	 * was selected.
> +	 */
>   	struct blk_mq_ctx	*dispatch_from;

Does the word "queue" refer to a request queue, software queue or 
hardware queue? Please make that clear.

> +	/**
> +	 * @dispatch_wait: Waitqueue for dispatched requests. Request here will
> +	 * be processed when percpu_ref_is_zero(&q->q_usage_counter) evaluates
> +	 * true for q as a request_queue.
> +	 */
>   	wait_queue_entry_t	dispatch_wait;

That description doesn't look correct to me. I think that @dispatch_wait 
is used to wait if no tags are available. The comment above 
blk_mq_mark_tag_wait() is as follows:

/*
  * Mark us waiting for a tag. For shared tags, this involves hooking us
  * into the tag wakeups. For non-shared tags, we can simply mark us
  * needing a restart. For both cases, take care to check the condition
  * again after marking us as waiting.
  */

> +	/** @wait_index: Index of next wait queue to be used. */
>   	atomic_t		wait_index;

To be used by what?

> +	/** @tags: Request tags. */
>   	struct blk_mq_tags	*tags;
> +	/** @sched_tags: Request tags for schedulers. */
>   	struct blk_mq_tags	*sched_tags;

I think @tags represents tags owned by the block driver and @sched_tags 
represents tags owned by the I/O scheduler. If no I/O scheduler is 
associated with a request queue, only a driver tag is allocated. If an 
I/O scheduler has been associated with a request queue, a request is 
assigned a tag from @sched_tags when that request is allocated. A tag 
from @tags is only assigned when a request is dispatched to a hardware 
queue. See also blk_mq_get_driver_tag().

> +	/** @nr_active:	Number of active tags. */
>   	atomic_t		nr_active;

Isn't this the number of active requests instead of the number of active 
tags? Please mention that this member is only used when a tag set is 
shared across request queues.

> +/**
> + * struct blk_mq_queue_data - Data about a request inserted in a queue
> + *
> + * @rq:   Data about the block IO request.

How about changing this into "Request pointer"?

> +/**
> + * struct blk_mq_ops - list of callback functions that implements block driver
> + * behaviour.
> + */

Is this really a list?

>    * Driver command data is immediately after the request. So subtract request
> - * size to get back to the original request, add request size to get the PDU.
> + * size to get back to the original request.
>    */
>   static inline struct request *blk_mq_rq_from_pdu(void *pdu)
>   {
>   	return pdu - sizeof(struct request);
>   }

I'm not sure this change is an improvement.

Bart.

Gabriel Krisman Bertazi Oct. 8, 2019, 6:46 p.m. UTC | #4

Bart Van Assche <bvanassche@acm.org> writes:

> On 10/8/19 9:35 AM, Gabriel Krisman Bertazi wrote:
>> André Almeida <andrealmeid@collabora.com> writes:
>>
>>> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
>>> index e0fce93ac127..8b745f229789 100644
>>> --- a/include/linux/blk-mq.h
>>> +++ b/include/linux/blk-mq.h
>>> @@ -10,74 +10,153 @@ struct blk_mq_tags;
>>>   struct blk_flush_queue;
>>>     /**
>>> - * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware block device
>>> + * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware
>>> + * block device
>>>    */
>>>   struct blk_mq_hw_ctx {
>>>   	struct {
>>> +		/** @lock: Lock for accessing dispatch queue */
>>>   		spinlock_t		lock;
>>> +		/**
>>> +		 * @dispatch: Queue of dispatched requests, waiting for
>>> +		 * workers to send them to the hardware.
>>> +		 */
>>
>> It's been a few years since I looked at the block layer, but isn't
>> this used to hold requests that were taken from the blk_mq_ctx,  but
>> couldn't be dispatched because the queue was full?
>
> I don't think so. I think that you are looking for the requeue_list
> member of struct request_queue.
>

Hmm, sorry, but I'm confused.  I'm sure I'm missing something simple,
since I haven't touched this in a while, so maybe you can quickly point
me in the right direction?

I see blk_mq_requeue_request() being used by device drivers to retry
requests that failed, but if I read the code correctly, the flushed
queue seems to be moved to hctx->dispatch when the device
driver returned BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE to
blk_mq_dispatch_rq_list(). I thought BLK_STS_RESOURCE was returned by
the driver on .queue_rq() to signal there was no more resources on the
hardware to service further requests.

Bart Van Assche Oct. 8, 2019, 8:01 p.m. UTC | #5

On 10/8/19 11:46 AM, Gabriel Krisman Bertazi wrote:
> Hmm, sorry, but I'm confused.  I'm sure I'm missing something simple,
> since I haven't touched this in a while, so maybe you can quickly point
> me in the right direction?
> 
> I see blk_mq_requeue_request() being used by device drivers to retry
> requests that failed, but if I read the code correctly, the flushed
> queue seems to be moved to hctx->dispatch when the device
> driver returned BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE to
> blk_mq_dispatch_rq_list(). I thought BLK_STS_RESOURCE was returned by
> the driver on .queue_rq() to signal there was no more resources on the
> hardware to service further requests.

Hi Gabriel,

The simplified version of how requests are requeued as follows:
* A block driver calls blk_mq_requeue_request().
* blk_mq_requeue_request() calls blk_mq_add_to_requeue_list()
* blk_mq_add_to_requeue_list() executes the following code:
     list_add_tail(&rq->queuelist, &q->requeue_list)
* A block driver or the block layer core calls
   blk_mq_kick_requeue_list() or blk_mq_delay_kick_requeue_list(). Both
   functions trigger a call of blk_mq_requeue_work().
* blk_mq_requeue_work() processes q->requeue_list.

Bart.

André Almeida Oct. 10, 2019, 8:38 p.m. UTC | #6

On 10/8/19 5:01 PM, Bart Van Assche wrote:
> On 10/8/19 11:46 AM, Gabriel Krisman Bertazi wrote:
>> Hmm, sorry, but I'm confused.  I'm sure I'm missing something simple,
>> since I haven't touched this in a while, so maybe you can quickly point
>> me in the right direction?
>>
>> I see blk_mq_requeue_request() being used by device drivers to retry
>> requests that failed, but if I read the code correctly, the flushed
>> queue seems to be moved to hctx->dispatch when the device
>> driver returned BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE to
>> blk_mq_dispatch_rq_list(). I thought BLK_STS_RESOURCE was returned by
>> the driver on .queue_rq() to signal there was no more resources on the
>> hardware to service further requests.
> 
> Hi Gabriel,
> 
> The simplified version of how requests are requeued as follows:
> * A block driver calls blk_mq_requeue_request().
> * blk_mq_requeue_request() calls blk_mq_add_to_requeue_list()
> * blk_mq_add_to_requeue_list() executes the following code:
>     list_add_tail(&rq->queuelist, &q->requeue_list)
> * A block driver or the block layer core calls
>   blk_mq_kick_requeue_list() or blk_mq_delay_kick_requeue_list(). Both
>   functions trigger a call of blk_mq_requeue_work().
> * blk_mq_requeue_work() processes q->requeue_list.
> 
> Bart.
> 
> 

Hello Bart,

Seems that it's not clear for me the role of these members. Could you
please check if those definitions make sense for you?

- hctx->dispatch: This queue is used for requests that are ready to be
dispatched to the hardware but for some reason (e.g. lack of resources,
the hardware is to busy and can't get more requests) could not be sent
to the hardware. As soon as the driver can send new requests, those
queued at this list will be sent first for a more fair dispatch. Since
those requests are at the hctx, they can't be requeued or rescheduled
anymore.

- request_queue->requeue_list: This list is used when it's not possible
to send the request to the associated hctx. This can happen if the
associated CPU or hctx is not available anymore. When requeueing those
requests, it will be possible to send them to new and function queues.

Thanks,
	André

André Almeida Oct. 10, 2019, 8:41 p.m. UTC | #7

Hello Bart,

On 10/8/19 3:12 PM, Bart Van Assche wrote:
> On 10/7/19 5:14 PM, André Almeida wrote:
...
> 
> Bart.

Thanks you for your extensive and detailed feedback. I'm applying those
changes for v3.

André

Bart Van Assche Oct. 11, 2019, 5 p.m. UTC | #8

On 10/10/19 1:38 PM, André Almeida wrote:
> Seems that it's not clear for me the role of these members. Could you
> please check if those definitions make sense for you?
> 
> - hctx->dispatch: This queue is used for requests that are ready to be
> dispatched to the hardware but for some reason (e.g. lack of resources,
> the hardware is to busy and can't get more requests) could not be sent
> to the hardware. As soon as the driver can send new requests, those
> queued at this list will be sent first for a more fair dispatch. Since
> those requests are at the hctx, they can't be requeued or rescheduled
> anymore.
> 
> - request_queue->requeue_list: This list is used when it's not possible
> to send the request to the associated hctx. This can happen if the
> associated CPU or hctx is not available anymore. When requeueing those
> requests, it will be possible to send them to new and function queues.

Hi André,

The hctx->dispatch description looks mostly fine. Can the following part 
be left out since it looks confusing to me: "Since those requests are at 
the hctx, they can't be requeued or rescheduled anymore."

How about changing the requeue_list description into the following: 
"requests requeued by a call to blk_mq_requeue_request()".

Thanks,

Bart.

[v2,1/1] blk-mq: fill header with kernel-doc

Commit Message

Comments

Patch