diff mbox series

[v2,2/6] drm/panfrost: Add fdinfo support GPU load metrics

Message ID 20230824013604.466224-3-adrian.larumbe@collabora.com (mailing list archive)
State Superseded
Headers show
Series Add fdinfo support to Panfrost | expand

Commit Message

Adrián Larumbe Aug. 24, 2023, 1:34 a.m. UTC
The drm-stats fdinfo tags made available to user space are drm-engine,
drm-cycles, drm-max-freq and drm-curfreq, one per job slot.

This deviates from standard practice in other DRM drivers, where a single
set of key:value pairs is provided for the whole render engine. However,
Panfrost has separate queues for fragment and vertex/tiler jobs, so a
decision was made to calculate bus cycles and workload times separately.

Maximum operating frequency is calculated at devfreq initialisation time.
Current frequency is made available to user space because nvtop uses it
when performing engine usage calculations.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
 drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
 drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
 drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
 drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
 drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
 6 files changed, 102 insertions(+), 1 deletion(-)

Comments

kernel test robot Aug. 24, 2023, 4:12 a.m. UTC | #1
Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.5-rc7 next-20230823]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panfrost-Add-cycle-count-GPU-register-definitions/20230824-093848
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230824013604.466224-3-adrian.larumbe%40collabora.com
patch subject: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20230824/202308241240.ngAywBMr-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230824/202308241240.ngAywBMr-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308241240.ngAywBMr-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/panfrost/panfrost_drv.c: In function 'panfrost_gpu_show_fdinfo':
>> drivers/gpu/drm/panfrost/panfrost_drv.c:551:50: warning: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'long unsigned int' [-Wformat=]
     551 |                 drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
         |                                                 ~^
         |                                                  |
         |                                                  unsigned int
         |                                                 %lu
     552 |                            ei->name, pfdev->pfdevfreq.current_frequency);
         |                                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |                                                      |
         |                                                      long unsigned int


vim +551 drivers/gpu/drm/panfrost/panfrost_drv.c

   534	
   535	
   536	static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
   537					     struct panfrost_file_priv *panfrost_priv,
   538					     struct drm_printer *p)
   539	{
   540		int i;
   541	
   542		for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
   543			struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
   544	
   545			drm_printf(p, "drm-engine-%s:\t%llu ns\n",
   546				   ei->name, ei->elapsed_ns);
   547			drm_printf(p, "drm-cycles-%s:\t%llu\n",
   548				   ei->name, ei->cycles);
   549			drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
   550				   ei->name, panfrost_priv->fdinfo.maxfreq);
 > 551			drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
   552				   ei->name, pfdev->pfdevfreq.current_frequency);
   553		}
   554	}
   555
Boris Brezillon Aug. 30, 2023, 10:17 a.m. UTC | #2
On Thu, 24 Aug 2023 02:34:45 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> The drm-stats fdinfo tags made available to user space are drm-engine,
> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.

Pretty sure this has already been discussed, but it's probably worth
mentioning that drm-cycles is not accurate, it just gives you a rough
idea of how much GPU cycles were dedicated to a context (just like
drm-engine elapsed-ns is giving you an approximation of the
GPU utilization). This comes from 2 factors:

1. We're dependent on the time the kernel/CPU takes to process the GPU
interrupt.
2. The pipelining done by the Job Manager (2 job slots per engine)
implies that you can't really know how much time each job spent on the
GPU. When these jobs are coming from the same context, that's not a
problem, but when they don't, it's impossible to have a clear split.

I'd really like to have that mentioned somewhere in the code+commit
message to lower users expectation.

> 
> This deviates from standard practice in other DRM drivers, where a single
> set of key:value pairs is provided for the whole render engine. However,
> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
> decision was made to calculate bus cycles and workload times separately.
> 
> Maximum operating frequency is calculated at devfreq initialisation time.
> Current frequency is made available to user space because nvtop uses it
> when performing engine usage calculations.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>  6 files changed, 102 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> index 58dfb15a8757..28caffc689e2 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
>  	spin_lock_irqsave(&pfdevfreq->lock, irqflags);
>  
>  	panfrost_devfreq_update_utilization(pfdevfreq);
> +	pfdevfreq->current_frequency = status->current_frequency;
>  
>  	status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time,
>  						   pfdevfreq->idle_time));
> @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
>  	struct devfreq *devfreq;
>  	struct thermal_cooling_device *cooling;
>  	struct panfrost_devfreq *pfdevfreq = &pfdev->pfdevfreq;
> +	unsigned long freq = ULONG_MAX;
>  
>  	if (pfdev->comp->num_supplies > 1) {
>  		/*
> @@ -172,6 +174,12 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
>  		return ret;
>  	}
>  
> +	/* Find the fastest defined rate  */
> +	opp = dev_pm_opp_find_freq_floor(dev, &freq);
> +	if (IS_ERR(opp))
> +		return PTR_ERR(opp);
> +	pfdevfreq->fast_rate = freq;
> +
>  	dev_pm_opp_put(opp);
>  
>  	/*
> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.h b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
> index 1514c1f9d91c..48dbe185f206 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
> @@ -19,6 +19,9 @@ struct panfrost_devfreq {
>  	struct devfreq_simple_ondemand_data gov_data;
>  	bool opp_of_table_added;
>  
> +	unsigned long current_frequency;
> +	unsigned long fast_rate;
> +
>  	ktime_t busy_time;
>  	ktime_t idle_time;
>  	ktime_t time_last_update;
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> index b0126b9fbadc..680f298fd1a9 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -24,6 +24,7 @@ struct panfrost_perfcnt;
>  
>  #define NUM_JOB_SLOTS 3
>  #define MAX_PM_DOMAINS 5
> +#define MAX_SLOT_NAME_LEN 10
>  
>  struct panfrost_features {
>  	u16 id;
> @@ -135,12 +136,24 @@ struct panfrost_mmu {
>  	struct list_head list;
>  };
>  
> +struct drm_info_gpu {
> +	unsigned int maxfreq;
> +
> +	struct engine_info {
> +		unsigned long long elapsed_ns;
> +		unsigned long long cycles;
> +		char name[MAX_SLOT_NAME_LEN];
> +	} engines[NUM_JOB_SLOTS];
> +};
> +
>  struct panfrost_file_priv {
>  	struct panfrost_device *pfdev;
>  
>  	struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
>  
>  	struct panfrost_mmu *mmu;
> +
> +	struct drm_info_gpu fdinfo;
>  };
>  
>  static inline struct panfrost_device *to_panfrost_device(struct drm_device *ddev)
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index a2ab99698ca8..3fd372301019 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>  	job->requirements = args->requirements;
>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>  	job->mmu = file_priv->mmu;
> +	job->priv = file_priv;

Uh, I'm not comfortable passing the file context here, unless you reset
it to NULL in panfrost_job_close() and have code that's robust to
job->priv being NULL. We've had cases in the past where jobs outlived
the file context itself.

>  
>  	slot = panfrost_job_get_slot(job);
>  
> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>  		goto err_free;
>  	}
>  
> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
> +#if 0
> +	/* Add compute engine in the future */
> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
> +#endif
> +	panfrost_priv->fdinfo.maxfreq = pfdev->pfdevfreq.fast_rate;
> +
>  	ret = panfrost_job_open(panfrost_priv);
>  	if (ret)
>  		goto err_job;
> @@ -523,7 +532,40 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
>  	PANFROST_IOCTL(MADVISE,		madvise,	DRM_RENDER_ALLOW),
>  };
>  
> -DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
> +
> +static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
> +				     struct panfrost_file_priv *panfrost_priv,
> +				     struct drm_printer *p)
> +{
> +	int i;
> +
> +	for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
> +		struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
> +
> +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
> +			   ei->name, ei->elapsed_ns);
> +		drm_printf(p, "drm-cycles-%s:\t%llu\n",
> +			   ei->name, ei->cycles);
> +		drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
> +			   ei->name, panfrost_priv->fdinfo.maxfreq);
> +		drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
> +			   ei->name, pfdev->pfdevfreq.current_frequency);
> +	}
> +}
> +
> +static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
> +{
> +	struct drm_device *dev = file->minor->dev;
> +	struct panfrost_device *pfdev = dev->dev_private;
> +
> +	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
> +}
> +
> +static const struct file_operations panfrost_drm_driver_fops = {
> +	.owner = THIS_MODULE,
> +	DRM_GEM_FOPS,
> +	.show_fdinfo = drm_show_fdinfo,
> +};
>  
>  /*
>   * Panfrost driver version:
> @@ -535,6 +577,7 @@ static const struct drm_driver panfrost_drm_driver = {
>  	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
>  	.open			= panfrost_open,
>  	.postclose		= panfrost_postclose,
> +	.show_fdinfo		= panfrost_show_fdinfo,
>  	.ioctls			= panfrost_drm_driver_ioctls,
>  	.num_ioctls		= ARRAY_SIZE(panfrost_drm_driver_ioctls),
>  	.fops			= &panfrost_drm_driver_fops,
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index dbc597ab46fb..a847e183b5d0 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -153,10 +153,31 @@ panfrost_get_job_chain_flag(const struct panfrost_job *job)
>  	return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
>  }
>  
> +static inline unsigned long long read_cycles(struct panfrost_device *pfdev)
> +{
> +	u64 address = (u64) gpu_read(pfdev, GPU_CYCLE_COUNT_HI) << 32;
> +
> +	address |= gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
> +

We probably want to handle the 32-bit overflow case with something like:

	u32 hi, lo;

	do {
		hi = gpu_read(pfdev, GPU_CYCLE_COUNT_HI);
		lo = gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
	} while (hi != gpu_read(pfdev, GPU_CYCLE_COUNT_HI));

	return ((u64)hi << 32) | lo;

> +	return address;
> +}
> +
>  static struct panfrost_job *
>  panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
>  {
>  	struct panfrost_job *job = pfdev->jobs[slot][0];
> +	struct engine_info *engine_info = &job->priv->fdinfo.engines[slot];
> +
> +	engine_info->elapsed_ns +=
> +		ktime_to_ns(ktime_sub(ktime_get(), job->start_time));
> +	engine_info->cycles +=
> +		read_cycles(pfdev) - job->start_cycles;
> +
> +	/* Reset in case the job has to be requeued */
> +	job->start_time = 0;
> +	/* A GPU reset puts the Cycle Counter register back to 0 */
> +	job->start_cycles = atomic_read(&pfdev->reset.pending) ?
> +		0 : read_cycles(pfdev);

Do we need to reset these values? If the jobs are re-submitted, those
fields will be re-assigned, and if the job is done, I don't see where
we're using it after that point (might have missed something).

>  
>  	WARN_ON(!job);
>  	pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
> @@ -233,6 +254,9 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
>  	subslot = panfrost_enqueue_job(pfdev, js, job);
>  	/* Don't queue the job if a reset is in progress */
>  	if (!atomic_read(&pfdev->reset.pending)) {
> +		job->start_time = ktime_get();
> +		job->start_cycles = read_cycles(pfdev);
> +
>  		job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
>  		dev_dbg(pfdev->dev,
>  			"JS: Submitting atom %p to js[%d][%d] with head=0x%llx AS %d",
> @@ -297,6 +321,9 @@ int panfrost_job_push(struct panfrost_job *job)
>  
>  	kref_get(&job->refcount); /* put by scheduler job completion */
>  
> +	if (panfrost_job_is_idle(pfdev))
> +		gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_START);
> +
>  	drm_sched_entity_push_job(&job->base);
>  
>  	mutex_unlock(&pfdev->sched_lock);
> @@ -351,6 +378,9 @@ static void panfrost_job_free(struct drm_sched_job *sched_job)
>  
>  	drm_sched_job_cleanup(sched_job);
>  
> +	if (panfrost_job_is_idle(job->pfdev))
> +		gpu_write(job->pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP);
> +
>  	panfrost_job_put(job);
>  }
>  
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
> index 8becc1ba0eb9..038171c39dd8 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.h
> @@ -32,6 +32,10 @@ struct panfrost_job {
>  
>  	/* Fence to be signaled by drm-sched once its done with the job */
>  	struct dma_fence *render_done_fence;
> +
> +	struct panfrost_file_priv *priv;
> +	ktime_t start_time;
> +	u64 start_cycles;
>  };
>  
>  int panfrost_job_init(struct panfrost_device *pfdev);
Steven Price Aug. 31, 2023, 3:54 p.m. UTC | #3
On 24/08/2023 02:34, Adrián Larumbe wrote:
> The drm-stats fdinfo tags made available to user space are drm-engine,
> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.
> 
> This deviates from standard practice in other DRM drivers, where a single
> set of key:value pairs is provided for the whole render engine. However,
> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
> decision was made to calculate bus cycles and workload times separately.
> 
> Maximum operating frequency is calculated at devfreq initialisation time.
> Current frequency is made available to user space because nvtop uses it
> when performing engine usage calculations.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>  6 files changed, 102 insertions(+), 1 deletion(-)
> 

[...]

> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index a2ab99698ca8..3fd372301019 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>  	job->requirements = args->requirements;
>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>  	job->mmu = file_priv->mmu;
> +	job->priv = file_priv;
>  
>  	slot = panfrost_job_get_slot(job);
>  
> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>  		goto err_free;
>  	}
>  
> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
> +#if 0
> +	/* Add compute engine in the future */
> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
> +#endif

I'm not sure what names are best, but slot 2 isn't actually a compute slot.

Slot 0 is fragment, that name is fine.

Slot 1 and 2 are actually the same (from a hardware perspective) but the
core affinity of the two slots cannot overlap which means you need to
divide the GPU in two to usefully use both slots. The only GPU that this
actually makes sense for is the T628[1] as it has two (non-coherent)
core groups.

The upshot is that slot 1 is used for all of vertex, tiling and compute.
Slot 2 is currently never used, but kbase will use it only for compute
(and only on the two core group GPUs).

Personally I'd be tempted to call them "slot 0", "slot 1" and "slot 2" -
but I appreciate that's not very helpful to people who aren't intimately
familiar with the hardware ;)

Steve

[1] And technically the T608 but that's even rarer and the T60x isn't
(yet) supported by Panfrost.
Adrián Larumbe Aug. 31, 2023, 9:34 p.m. UTC | #4
On 31.08.2023 16:54, Steven Price wrote:
>On 24/08/2023 02:34, Adrián Larumbe wrote:
>> The drm-stats fdinfo tags made available to user space are drm-engine,
>> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.
>> 
>> This deviates from standard practice in other DRM drivers, where a single
>> set of key:value pairs is provided for the whole render engine. However,
>> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
>> decision was made to calculate bus cycles and workload times separately.
>> 
>> Maximum operating frequency is calculated at devfreq initialisation time.
>> Current frequency is made available to user space because nvtop uses it
>> when performing engine usage calculations.
>> 
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>>  6 files changed, 102 insertions(+), 1 deletion(-)
>> 
>
>[...]
>
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> index a2ab99698ca8..3fd372301019 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>>  	job->requirements = args->requirements;
>>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>>  	job->mmu = file_priv->mmu;
>> +	job->priv = file_priv;
>>  
>>  	slot = panfrost_job_get_slot(job);
>>  
>> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>>  		goto err_free;
>>  	}
>>  
>> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
>> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
>> +#if 0
>> +	/* Add compute engine in the future */
>> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
>> +#endif
>
>I'm not sure what names are best, but slot 2 isn't actually a compute slot.
>
>Slot 0 is fragment, that name is fine.
>
>Slot 1 and 2 are actually the same (from a hardware perspective) but the
>core affinity of the two slots cannot overlap which means you need to
>divide the GPU in two to usefully use both slots. The only GPU that this
>actually makes sense for is the T628[1] as it has two (non-coherent)
>core groups.
>
>The upshot is that slot 1 is used for all of vertex, tiling and compute.
>Slot 2 is currently never used, but kbase will use it only for compute
>(and only on the two core group GPUs).

I think I might've be rushed to draw inspiration for this from a comment in panfrost_job.c:

int panfrost_job_get_slot(struct panfrost_job *job)
{
	/* JS0: fragment jobs.
	 * JS1: vertex/tiler jobs
	 * JS2: compute jobs
	 */
         [...]
}

Maybe I could rename the engine names to "fragment", "vertex-tiler" and "compute-only"?
There's no reason why I would skimp on engine name length, and anything more
descriptive would be just as good.

>Personally I'd be tempted to call them "slot 0", "slot 1" and "slot 2" -
>but I appreciate that's not very helpful to people who aren't intimately
>familiar with the hardware ;)

The downside of this is that both IGT's fdinfo library and nvtop will use the
engime name for display, and like you said these numbers might mean nothing to
someone who isn't acquainted with the hardware.

>Steve
>
>[1] And technically the T608 but that's even rarer and the T60x isn't
>(yet) supported by Panfrost.
Adrián Larumbe Aug. 31, 2023, 11:23 p.m. UTC | #5
On 30.08.2023 12:17, Boris Brezillon wrote:
>On Thu, 24 Aug 2023 02:34:45 +0100
>Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
>
>> The drm-stats fdinfo tags made available to user space are drm-engine,
>> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.
>
>Pretty sure this has already been discussed, but it's probably worth
>mentioning that drm-cycles is not accurate, it just gives you a rough
>idea of how much GPU cycles were dedicated to a context (just like
>drm-engine elapsed-ns is giving you an approximation of the
>GPU utilization). This comes from 2 factors:
>
>1. We're dependent on the time the kernel/CPU takes to process the GPU
>interrupt.
>2. The pipelining done by the Job Manager (2 job slots per engine)
>implies that you can't really know how much time each job spent on the
>GPU. When these jobs are coming from the same context, that's not a
>problem, but when they don't, it's impossible to have a clear split.
>
>I'd really like to have that mentioned somewhere in the code+commit
>message to lower users expectation.
>
>> 
>> This deviates from standard practice in other DRM drivers, where a single
>> set of key:value pairs is provided for the whole render engine. However,
>> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
>> decision was made to calculate bus cycles and workload times separately.
>> 
>> Maximum operating frequency is calculated at devfreq initialisation time.
>> Current frequency is made available to user space because nvtop uses it
>> when performing engine usage calculations.
>> 
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>>  6 files changed, 102 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
>> index 58dfb15a8757..28caffc689e2 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
>> @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
>>  	spin_lock_irqsave(&pfdevfreq->lock, irqflags);
>>  
>>  	panfrost_devfreq_update_utilization(pfdevfreq);
>> +	pfdevfreq->current_frequency = status->current_frequency;
>>  
>>  	status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time,
>>  						   pfdevfreq->idle_time));
>> @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
>>  	struct devfreq *devfreq;
>>  	struct thermal_cooling_device *cooling;
>>  	struct panfrost_devfreq *pfdevfreq = &pfdev->pfdevfreq;
>> +	unsigned long freq = ULONG_MAX;
>>  
>>  	if (pfdev->comp->num_supplies > 1) {
>>  		/*
>> @@ -172,6 +174,12 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
>>  		return ret;
>>  	}
>>  
>> +	/* Find the fastest defined rate  */
>> +	opp = dev_pm_opp_find_freq_floor(dev, &freq);
>> +	if (IS_ERR(opp))
>> +		return PTR_ERR(opp);
>> +	pfdevfreq->fast_rate = freq;
>> +
>>  	dev_pm_opp_put(opp);
>>  
>>  	/*
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.h b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
>> index 1514c1f9d91c..48dbe185f206 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
>> @@ -19,6 +19,9 @@ struct panfrost_devfreq {
>>  	struct devfreq_simple_ondemand_data gov_data;
>>  	bool opp_of_table_added;
>>  
>> +	unsigned long current_frequency;
>> +	unsigned long fast_rate;
>> +
>>  	ktime_t busy_time;
>>  	ktime_t idle_time;
>>  	ktime_t time_last_update;
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
>> index b0126b9fbadc..680f298fd1a9 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
>> @@ -24,6 +24,7 @@ struct panfrost_perfcnt;
>>  
>>  #define NUM_JOB_SLOTS 3
>>  #define MAX_PM_DOMAINS 5
>> +#define MAX_SLOT_NAME_LEN 10
>>  
>>  struct panfrost_features {
>>  	u16 id;
>> @@ -135,12 +136,24 @@ struct panfrost_mmu {
>>  	struct list_head list;
>>  };
>>  
>> +struct drm_info_gpu {
>> +	unsigned int maxfreq;
>> +
>> +	struct engine_info {
>> +		unsigned long long elapsed_ns;
>> +		unsigned long long cycles;
>> +		char name[MAX_SLOT_NAME_LEN];
>> +	} engines[NUM_JOB_SLOTS];
>> +};
>> +
>>  struct panfrost_file_priv {
>>  	struct panfrost_device *pfdev;
>>  
>>  	struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
>>  
>>  	struct panfrost_mmu *mmu;
>> +
>> +	struct drm_info_gpu fdinfo;
>>  };
>>  
>>  static inline struct panfrost_device *to_panfrost_device(struct drm_device *ddev)
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> index a2ab99698ca8..3fd372301019 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>>  	job->requirements = args->requirements;
>>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>>  	job->mmu = file_priv->mmu;
>> +	job->priv = file_priv;
>
>Uh, I'm not comfortable passing the file context here, unless you reset
>it to NULL in panfrost_job_close() and have code that's robust to
>job->priv being NULL. We've had cases in the past where jobs outlived
>the file context itself.
>
>>  
>>  	slot = panfrost_job_get_slot(job);
>>  
>> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>>  		goto err_free;
>>  	}
>>  
>> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
>> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
>> +#if 0
>> +	/* Add compute engine in the future */
>> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
>> +#endif
>> +	panfrost_priv->fdinfo.maxfreq = pfdev->pfdevfreq.fast_rate;
>> +
>>  	ret = panfrost_job_open(panfrost_priv);
>>  	if (ret)
>>  		goto err_job;
>> @@ -523,7 +532,40 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
>>  	PANFROST_IOCTL(MADVISE,		madvise,	DRM_RENDER_ALLOW),
>>  };
>>  
>> -DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
>> +
>> +static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
>> +				     struct panfrost_file_priv *panfrost_priv,
>> +				     struct drm_printer *p)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
>> +		struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
>> +
>> +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
>> +			   ei->name, ei->elapsed_ns);
>> +		drm_printf(p, "drm-cycles-%s:\t%llu\n",
>> +			   ei->name, ei->cycles);
>> +		drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
>> +			   ei->name, panfrost_priv->fdinfo.maxfreq);
>> +		drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
>> +			   ei->name, pfdev->pfdevfreq.current_frequency);
>> +	}
>> +}
>> +
>> +static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>> +{
>> +	struct drm_device *dev = file->minor->dev;
>> +	struct panfrost_device *pfdev = dev->dev_private;
>> +
>> +	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
>> +}
>> +
>> +static const struct file_operations panfrost_drm_driver_fops = {
>> +	.owner = THIS_MODULE,
>> +	DRM_GEM_FOPS,
>> +	.show_fdinfo = drm_show_fdinfo,
>> +};
>>  
>>  /*
>>   * Panfrost driver version:
>> @@ -535,6 +577,7 @@ static const struct drm_driver panfrost_drm_driver = {
>>  	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
>>  	.open			= panfrost_open,
>>  	.postclose		= panfrost_postclose,
>> +	.show_fdinfo		= panfrost_show_fdinfo,
>>  	.ioctls			= panfrost_drm_driver_ioctls,
>>  	.num_ioctls		= ARRAY_SIZE(panfrost_drm_driver_ioctls),
>>  	.fops			= &panfrost_drm_driver_fops,
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index dbc597ab46fb..a847e183b5d0 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -153,10 +153,31 @@ panfrost_get_job_chain_flag(const struct panfrost_job *job)
>>  	return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
>>  }
>>  
>> +static inline unsigned long long read_cycles(struct panfrost_device *pfdev)
>> +{
>> +	u64 address = (u64) gpu_read(pfdev, GPU_CYCLE_COUNT_HI) << 32;
>> +
>> +	address |= gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
>> +
>
>We probably want to handle the 32-bit overflow case with something like:
>
>	u32 hi, lo;
>
>	do {
>		hi = gpu_read(pfdev, GPU_CYCLE_COUNT_HI);
>		lo = gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
>	} while (hi != gpu_read(pfdev, GPU_CYCLE_COUNT_HI));
>
>	return ((u64)hi << 32) | lo;
>
>> +	return address;
>> +}
>> +
>>  static struct panfrost_job *
>>  panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
>>  {
>>  	struct panfrost_job *job = pfdev->jobs[slot][0];
>> +	struct engine_info *engine_info = &job->priv->fdinfo.engines[slot];
>> +
>> +	engine_info->elapsed_ns +=
>> +		ktime_to_ns(ktime_sub(ktime_get(), job->start_time));
>> +	engine_info->cycles +=
>> +		read_cycles(pfdev) - job->start_cycles;
>> +
>> +	/* Reset in case the job has to be requeued */
>> +	job->start_time = 0;
>> +	/* A GPU reset puts the Cycle Counter register back to 0 */
>> +	job->start_cycles = atomic_read(&pfdev->reset.pending) ?
>> +		0 : read_cycles(pfdev);
>
>Do we need to reset these values? If the jobs are re-submitted, those
>fields will be re-assigned, and if the job is done, I don't see where
>we're using it after that point (might have missed something).

I did this because from the third loop in panfrost_job_handle_irq, I got the
impression that when a job in the second slot is stopped after the one in the
first one fails, then it's requeued and started immediately without involvement
from the drm scheduler, so in this case panfrost_job_hw_submit wouldn't be
called. At the moment the initial sample of cycles and time is done in that function.

>>  
>>  	WARN_ON(!job);
>>  	pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
>> @@ -233,6 +254,9 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
>>  	subslot = panfrost_enqueue_job(pfdev, js, job);
>>  	/* Don't queue the job if a reset is in progress */
>>  	if (!atomic_read(&pfdev->reset.pending)) {
>> +		job->start_time = ktime_get();
>> +		job->start_cycles = read_cycles(pfdev);
>> +
>>  		job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
>>  		dev_dbg(pfdev->dev,
>>  			"JS: Submitting atom %p to js[%d][%d] with head=0x%llx AS %d",
>> @@ -297,6 +321,9 @@ int panfrost_job_push(struct panfrost_job *job)
>>  
>>  	kref_get(&job->refcount); /* put by scheduler job completion */
>>  
>> +	if (panfrost_job_is_idle(pfdev))
>> +		gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_START);
>> +
>>  	drm_sched_entity_push_job(&job->base);
>>  
>>  	mutex_unlock(&pfdev->sched_lock);
>> @@ -351,6 +378,9 @@ static void panfrost_job_free(struct drm_sched_job *sched_job)
>>  
>>  	drm_sched_job_cleanup(sched_job);
>>  
>> +	if (panfrost_job_is_idle(job->pfdev))
>> +		gpu_write(job->pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP);
>> +
>>  	panfrost_job_put(job);
>>  }
>>  
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
>> index 8becc1ba0eb9..038171c39dd8 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.h
>> @@ -32,6 +32,10 @@ struct panfrost_job {
>>  
>>  	/* Fence to be signaled by drm-sched once its done with the job */
>>  	struct dma_fence *render_done_fence;
>> +
>> +	struct panfrost_file_priv *priv;
>> +	ktime_t start_time;
>> +	u64 start_cycles;
>>  };
>>  
>>  int panfrost_job_init(struct panfrost_device *pfdev);
kernel test robot Sept. 2, 2023, 3:20 a.m. UTC | #6
Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.5 next-20230831]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panfrost-Add-cycle-count-GPU-register-definitions/20230824-093848
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230824013604.466224-3-adrian.larumbe%40collabora.com
patch subject: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
config: s390-randconfig-001-20230902 (https://download.01.org/0day-ci/archive/20230902/202309021155.i3NPUDJi-lkp@intel.com/config)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230902/202309021155.i3NPUDJi-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309021155.i3NPUDJi-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/panfrost/panfrost_drv.c:17:
   In file included from drivers/gpu/drm/panfrost/panfrost_device.h:9:
   In file included from include/linux/io-pgtable.h:6:
   In file included from include/linux/iommu.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from drivers/gpu/drm/panfrost/panfrost_drv.c:17:
   In file included from drivers/gpu/drm/panfrost/panfrost_device.h:9:
   In file included from include/linux/io-pgtable.h:6:
   In file included from include/linux/iommu.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from drivers/gpu/drm/panfrost/panfrost_drv.c:17:
   In file included from drivers/gpu/drm/panfrost/panfrost_device.h:9:
   In file included from include/linux/io-pgtable.h:6:
   In file included from include/linux/iommu.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> drivers/gpu/drm/panfrost/panfrost_drv.c:552:17: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
                              ei->name, pfdev->pfdevfreq.current_frequency);
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   13 warnings generated.


vim +552 drivers/gpu/drm/panfrost/panfrost_drv.c

   534	
   535	
   536	static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
   537					     struct panfrost_file_priv *panfrost_priv,
   538					     struct drm_printer *p)
   539	{
   540		int i;
   541	
   542		for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
   543			struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
   544	
   545			drm_printf(p, "drm-engine-%s:\t%llu ns\n",
   546				   ei->name, ei->elapsed_ns);
   547			drm_printf(p, "drm-cycles-%s:\t%llu\n",
   548				   ei->name, ei->cycles);
   549			drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
   550				   ei->name, panfrost_priv->fdinfo.maxfreq);
   551			drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
 > 552				   ei->name, pfdev->pfdevfreq.current_frequency);
   553		}
   554	}
   555
Steven Price Sept. 4, 2023, 8:22 a.m. UTC | #7
On 31/08/2023 22:34, Adrián Larumbe wrote:
> On 31.08.2023 16:54, Steven Price wrote:
>> On 24/08/2023 02:34, Adrián Larumbe wrote:
>>> The drm-stats fdinfo tags made available to user space are drm-engine,
>>> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.
>>>
>>> This deviates from standard practice in other DRM drivers, where a single
>>> set of key:value pairs is provided for the whole render engine. However,
>>> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
>>> decision was made to calculate bus cycles and workload times separately.
>>>
>>> Maximum operating frequency is calculated at devfreq initialisation time.
>>> Current frequency is made available to user space because nvtop uses it
>>> when performing engine usage calculations.
>>>
>>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>>> ---
>>>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>>>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>>>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>>>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>>>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>>>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>>>  6 files changed, 102 insertions(+), 1 deletion(-)
>>>
>>
>> [...]
>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> index a2ab99698ca8..3fd372301019 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>>>  	job->requirements = args->requirements;
>>>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>>>  	job->mmu = file_priv->mmu;
>>> +	job->priv = file_priv;
>>>  
>>>  	slot = panfrost_job_get_slot(job);
>>>  
>>> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>>>  		goto err_free;
>>>  	}
>>>  
>>> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
>>> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
>>> +#if 0
>>> +	/* Add compute engine in the future */
>>> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
>>> +#endif
>>
>> I'm not sure what names are best, but slot 2 isn't actually a compute slot.
>>
>> Slot 0 is fragment, that name is fine.
>>
>> Slot 1 and 2 are actually the same (from a hardware perspective) but the
>> core affinity of the two slots cannot overlap which means you need to
>> divide the GPU in two to usefully use both slots. The only GPU that this
>> actually makes sense for is the T628[1] as it has two (non-coherent)
>> core groups.
>>
>> The upshot is that slot 1 is used for all of vertex, tiling and compute.
>> Slot 2 is currently never used, but kbase will use it only for compute
>> (and only on the two core group GPUs).
> 
> I think I might've be rushed to draw inspiration for this from a comment in panfrost_job.c:
> 
> int panfrost_job_get_slot(struct panfrost_job *job)
> {
> 	/* JS0: fragment jobs.
> 	 * JS1: vertex/tiler jobs
> 	 * JS2: compute jobs
> 	 */
>          [...]
> }
> 
> Maybe I could rename the engine names to "fragment", "vertex-tiler" and "compute-only"?
> There's no reason why I would skimp on engine name length, and anything more
> descriptive would be just as good.

Yeah, those names are probably the best we're going to get. And I
certainly prefer the longer names.

>> Personally I'd be tempted to call them "slot 0", "slot 1" and "slot 2" -
>> but I appreciate that's not very helpful to people who aren't intimately
>> familiar with the hardware ;)
> 
> The downside of this is that both IGT's fdinfo library and nvtop will use the
> engime name for display, and like you said these numbers might mean nothing to
> someone who isn't acquainted with the hardware.

Indeed - I've spent way too much time with the hardware and there are
many subtleties so I tent to try to avoid calling them anything other
than "slot x" (especially when talking to hardware engineers). For
example a test that submits NULL jobs can submit them to any slot.
However, when you get beyond artificial tests then it is quite
consistent that slot 0=fragment, slot 1=vertex-tiler (and compute), slot
2=never used (except for compute on dual core groups).

Steve

>> Steve
>>
>> [1] And technically the T608 but that's even rarer and the T60x isn't
>> (yet) supported by Panfrost.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
index 58dfb15a8757..28caffc689e2 100644
--- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
+++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
@@ -58,6 +58,7 @@  static int panfrost_devfreq_get_dev_status(struct device *dev,
 	spin_lock_irqsave(&pfdevfreq->lock, irqflags);
 
 	panfrost_devfreq_update_utilization(pfdevfreq);
+	pfdevfreq->current_frequency = status->current_frequency;
 
 	status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time,
 						   pfdevfreq->idle_time));
@@ -117,6 +118,7 @@  int panfrost_devfreq_init(struct panfrost_device *pfdev)
 	struct devfreq *devfreq;
 	struct thermal_cooling_device *cooling;
 	struct panfrost_devfreq *pfdevfreq = &pfdev->pfdevfreq;
+	unsigned long freq = ULONG_MAX;
 
 	if (pfdev->comp->num_supplies > 1) {
 		/*
@@ -172,6 +174,12 @@  int panfrost_devfreq_init(struct panfrost_device *pfdev)
 		return ret;
 	}
 
+	/* Find the fastest defined rate  */
+	opp = dev_pm_opp_find_freq_floor(dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+	pfdevfreq->fast_rate = freq;
+
 	dev_pm_opp_put(opp);
 
 	/*
diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.h b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
index 1514c1f9d91c..48dbe185f206 100644
--- a/drivers/gpu/drm/panfrost/panfrost_devfreq.h
+++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
@@ -19,6 +19,9 @@  struct panfrost_devfreq {
 	struct devfreq_simple_ondemand_data gov_data;
 	bool opp_of_table_added;
 
+	unsigned long current_frequency;
+	unsigned long fast_rate;
+
 	ktime_t busy_time;
 	ktime_t idle_time;
 	ktime_t time_last_update;
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index b0126b9fbadc..680f298fd1a9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -24,6 +24,7 @@  struct panfrost_perfcnt;
 
 #define NUM_JOB_SLOTS 3
 #define MAX_PM_DOMAINS 5
+#define MAX_SLOT_NAME_LEN 10
 
 struct panfrost_features {
 	u16 id;
@@ -135,12 +136,24 @@  struct panfrost_mmu {
 	struct list_head list;
 };
 
+struct drm_info_gpu {
+	unsigned int maxfreq;
+
+	struct engine_info {
+		unsigned long long elapsed_ns;
+		unsigned long long cycles;
+		char name[MAX_SLOT_NAME_LEN];
+	} engines[NUM_JOB_SLOTS];
+};
+
 struct panfrost_file_priv {
 	struct panfrost_device *pfdev;
 
 	struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
 
 	struct panfrost_mmu *mmu;
+
+	struct drm_info_gpu fdinfo;
 };
 
 static inline struct panfrost_device *to_panfrost_device(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index a2ab99698ca8..3fd372301019 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -267,6 +267,7 @@  static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 	job->requirements = args->requirements;
 	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
 	job->mmu = file_priv->mmu;
+	job->priv = file_priv;
 
 	slot = panfrost_job_get_slot(job);
 
@@ -483,6 +484,14 @@  panfrost_open(struct drm_device *dev, struct drm_file *file)
 		goto err_free;
 	}
 
+	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
+	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
+#if 0
+	/* Add compute engine in the future */
+	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
+#endif
+	panfrost_priv->fdinfo.maxfreq = pfdev->pfdevfreq.fast_rate;
+
 	ret = panfrost_job_open(panfrost_priv);
 	if (ret)
 		goto err_job;
@@ -523,7 +532,40 @@  static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
 	PANFROST_IOCTL(MADVISE,		madvise,	DRM_RENDER_ALLOW),
 };
 
-DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
+
+static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
+				     struct panfrost_file_priv *panfrost_priv,
+				     struct drm_printer *p)
+{
+	int i;
+
+	for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
+		struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
+
+		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
+			   ei->name, ei->elapsed_ns);
+		drm_printf(p, "drm-cycles-%s:\t%llu\n",
+			   ei->name, ei->cycles);
+		drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
+			   ei->name, panfrost_priv->fdinfo.maxfreq);
+		drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
+			   ei->name, pfdev->pfdevfreq.current_frequency);
+	}
+}
+
+static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
+{
+	struct drm_device *dev = file->minor->dev;
+	struct panfrost_device *pfdev = dev->dev_private;
+
+	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
+}
+
+static const struct file_operations panfrost_drm_driver_fops = {
+	.owner = THIS_MODULE,
+	DRM_GEM_FOPS,
+	.show_fdinfo = drm_show_fdinfo,
+};
 
 /*
  * Panfrost driver version:
@@ -535,6 +577,7 @@  static const struct drm_driver panfrost_drm_driver = {
 	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
 	.open			= panfrost_open,
 	.postclose		= panfrost_postclose,
+	.show_fdinfo		= panfrost_show_fdinfo,
 	.ioctls			= panfrost_drm_driver_ioctls,
 	.num_ioctls		= ARRAY_SIZE(panfrost_drm_driver_ioctls),
 	.fops			= &panfrost_drm_driver_fops,
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index dbc597ab46fb..a847e183b5d0 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -153,10 +153,31 @@  panfrost_get_job_chain_flag(const struct panfrost_job *job)
 	return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
 }
 
+static inline unsigned long long read_cycles(struct panfrost_device *pfdev)
+{
+	u64 address = (u64) gpu_read(pfdev, GPU_CYCLE_COUNT_HI) << 32;
+
+	address |= gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
+
+	return address;
+}
+
 static struct panfrost_job *
 panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
 {
 	struct panfrost_job *job = pfdev->jobs[slot][0];
+	struct engine_info *engine_info = &job->priv->fdinfo.engines[slot];
+
+	engine_info->elapsed_ns +=
+		ktime_to_ns(ktime_sub(ktime_get(), job->start_time));
+	engine_info->cycles +=
+		read_cycles(pfdev) - job->start_cycles;
+
+	/* Reset in case the job has to be requeued */
+	job->start_time = 0;
+	/* A GPU reset puts the Cycle Counter register back to 0 */
+	job->start_cycles = atomic_read(&pfdev->reset.pending) ?
+		0 : read_cycles(pfdev);
 
 	WARN_ON(!job);
 	pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
@@ -233,6 +254,9 @@  static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
 	subslot = panfrost_enqueue_job(pfdev, js, job);
 	/* Don't queue the job if a reset is in progress */
 	if (!atomic_read(&pfdev->reset.pending)) {
+		job->start_time = ktime_get();
+		job->start_cycles = read_cycles(pfdev);
+
 		job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
 		dev_dbg(pfdev->dev,
 			"JS: Submitting atom %p to js[%d][%d] with head=0x%llx AS %d",
@@ -297,6 +321,9 @@  int panfrost_job_push(struct panfrost_job *job)
 
 	kref_get(&job->refcount); /* put by scheduler job completion */
 
+	if (panfrost_job_is_idle(pfdev))
+		gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_START);
+
 	drm_sched_entity_push_job(&job->base);
 
 	mutex_unlock(&pfdev->sched_lock);
@@ -351,6 +378,9 @@  static void panfrost_job_free(struct drm_sched_job *sched_job)
 
 	drm_sched_job_cleanup(sched_job);
 
+	if (panfrost_job_is_idle(job->pfdev))
+		gpu_write(job->pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP);
+
 	panfrost_job_put(job);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index 8becc1ba0eb9..038171c39dd8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -32,6 +32,10 @@  struct panfrost_job {
 
 	/* Fence to be signaled by drm-sched once its done with the job */
 	struct dma_fence *render_done_fence;
+
+	struct panfrost_file_priv *priv;
+	ktime_t start_time;
+	u64 start_cycles;
 };
 
 int panfrost_job_init(struct panfrost_device *pfdev);