From patchwork Tue Jun 19 12:28:46 2018
Subject: Re: [BUG 4.17] etnaviv-gpu f1840000.gpu: recover hung GPU!
From: Lucas Stach
To: Russell King - ARM Linux
Cc: kernel@pengutronix.de, etnaviv@lists.freedesktop.org,
 dri-devel@lists.freedesktop.org, patchwork-lst@pengutronix.de
Date: Tue, 19 Jun 2018 14:28:46 +0200
Message-ID: <1529411326.7211.19.camel@pengutronix.de>
In-Reply-To: <20180619114200.GG17671@n2100.armlinux.org.uk>
References: <20180619094303.GE17671@n2100.armlinux.org.uk>
 <1529402956.7211.14.camel@pengutronix.de>
 <20180619110021.GF17671@n2100.armlinux.org.uk>
 <1529406689.7211.16.camel@pengutronix.de>
 <20180619114200.GG17671@n2100.armlinux.org.uk>

On Tuesday, 2018-06-19 at 12:42 +0100, Russell King - ARM Linux wrote:
> On Tue, Jun 19, 2018 at 01:11:29PM +0200, Lucas Stach wrote:
> > On Tuesday, 2018-06-19 at 12:00 +0100, Russell King - ARM Linux wrote:
> > > No, it's not "a really big job" - it's just that the Dove GC600 is not
> > > fast enough to complete _two_ 1080p sized GPU operations within 500ms.
> > > The preceding job contained two blits - one of them a non-alphablend
> > > copy of:
> > >
> > >                 00180000 04200780  0,24,1920,1056 -> 0,24,1920,1056
> > >
> > > and one an alpha blended copy of:
> > >
> > >                 00000000 04380780  0,0,1920,1080 -> 0,0,1920,1080
> > >
> > > This is (iirc) something I already fixed with the addition of the
> > > progress detection back before etnaviv was merged into the mainline
> > > kernel.
> >
> > I hadn't expected it to be this slow. I see that we might need to bring
> > back the progress detection to fix the userspace regression, but I'm
> > not fond of this, as it might lead to really bad QoS.
>
> Well, the choices are that or worse overall performance through having
> to ignore the GPU entirely.
>
> > I would prefer userspace tracking the size of the blits and flushing
> > the cmdstream at an appropriate time, so we don't end up with really
> > long running jobs, but I'm not sure if this would be acceptable to
> > you...
>
> The question becomes how to split up two operations.  Yes, we could
> submit them individually, but if they're together taking in excess of
> 500ms, then it's likely that individually, each operation will take in
> excess of 250ms, which is still a long time.
>
> In any case, I think we need to fix this for 4.17-stable and then try
> to work out (a) which operations are taking a long time, and (b) how to
> solve this issue.

Agreed. I'll look into bringing back the progress detection for 4.17
stable.

I'm still curious why the GC600 on the Dove is that slow. With
performance like this, moving a big(ish) window on the screen must be a
horrible user experience.

> Do we have any way to track how long each submitted job has actually
> taken on the GPU?  (Eg, by recording the times that we receive the
> events?)  It wouldn't be very accurate for small jobs, but given this
> operation is taking so long, it would give an indication of how long
> this operation is actually taking.  etnaviv doesn't appear to have
> any tracepoints, which would've been ideal for that.  Maybe this is
> a reason to add some? ;)

See the attached patch (which I apparently forgot to send out).

The DRM GPU scheduler has some tracepoints, which might be helpful. The
attached patch adds a drm_sched_job_run tracepoint when a job is queued
in the hardware ring. Together with the existing drm_sched_process_job,
this should give you an idea of how long a job takes to process. Note
that at any time up to 4 jobs are allowed in the hardware queue, so you
need to match up the end times.

Regards,
Lucas

From a9ec48d1eecddcc95018ad37ebdf154ffa7ce9a4 Mon Sep 17 00:00:00 2001
From: Lucas Stach
Date: Fri, 8 Dec 2017 18:35:43 +0100
Subject: [PATCH] drm/sched: add tracepoint for job run

When tracing GPU execution, it is very interesting to know when a job
gets dequeued from the software queue and added to the hardware ring.
Add a tracepoint to allow easy access to this information.
Signed-off-by: Lucas Stach
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c |  1 +
 include/drm/gpu_scheduler_trace.h         | 27 +++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 0d95888ccc3e..ceecaef67801 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -666,6 +666,7 @@ static int drm_sched_main(void *param)
 		drm_sched_job_begin(sched_job);
 
 		fence = sched->ops->run_job(sched_job);
+		trace_drm_sched_job_run(sched_job, entity);
 		drm_sched_fence_scheduled(s_fence);
 
 		if (fence) {
diff --git a/include/drm/gpu_scheduler_trace.h b/include/drm/gpu_scheduler_trace.h
index 0789e8d0a0e1..c4d83857ae00 100644
--- a/include/drm/gpu_scheduler_trace.h
+++ b/include/drm/gpu_scheduler_trace.h
@@ -61,6 +61,33 @@ TRACE_EVENT(drm_sched_job,
 			  __entry->job_count, __entry->hw_job_count)
 );
 
+TRACE_EVENT(drm_sched_job_run,
+	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
+	    TP_ARGS(sched_job, entity),
+	    TP_STRUCT__entry(
+			     __field(struct drm_sched_entity *, entity)
+			     __field(struct dma_fence *, fence)
+			     __field(const char *, name)
+			     __field(uint64_t, id)
+			     __field(u32, job_count)
+			     __field(int, hw_job_count)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->entity = entity;
+			   __entry->id = sched_job->id;
+			   __entry->fence = &sched_job->s_fence->finished;
+			   __entry->name = sched_job->sched->name;
+			   __entry->job_count = spsc_queue_count(&entity->job_queue);
+			   __entry->hw_job_count = atomic_read(
+						   &sched_job->sched->hw_rq_count);
+			   ),
+	    TP_printk("entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
+		      __entry->entity, __entry->id,
+		      __entry->fence, __entry->name,
+		      __entry->job_count, __entry->hw_job_count)
+);
+
 TRACE_EVENT(drm_sched_process_job,
 	    TP_PROTO(struct drm_sched_fence *fence),
 	    TP_ARGS(fence),
-- 
2.17.1
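
As a rough illustration of the matching described above, the sketch below
pairs drm_sched_job_run and drm_sched_process_job events from the ftrace
text output by their fence pointer and prints how long each job took from
being pushed to the hardware ring until its fence signalled. It is not part
of the patch: the file name, the MAX_INFLIGHT bound and the assumed ftrace
line layout are illustrative, and it assumes drm_sched_process_job reports
the same finished-fence pointer in a "fence=" field. Treat it as a starting
point, not a finished tool.

/*
 * sched_times.c - minimal sketch (not part of the patch above)
 *
 * Pairs drm_sched_job_run and drm_sched_process_job events from ftrace
 * text output by their fence pointer and prints how long each job spent
 * between being pushed to the hardware ring and its fence signalling.
 * Since up to 4 jobs may sit in the hardware queue, this time includes
 * waiting behind earlier in-flight jobs.
 *
 * Assumed line shape (the usual ftrace text format), e.g.:
 *   Xorg-321 [000] ....  123.456789: drm_sched_job_run: entity=... fence=00000000abcd1234 ...
 * Field layout may differ between kernels; adjust the parsing as needed.
 */
#include <stdio.h>
#include <string.h>

#define MAX_INFLIGHT 64		/* comfortably more than the 4 queued jobs */

static struct {
	unsigned long long fence;	/* 0 = slot free */
	double start;			/* seconds, taken from the trace timestamp */
} jobs[MAX_INFLIGHT];

/* Extract the timestamp and the fence pointer from one trace line. */
static int parse_event(const char *line, const char *event,
		       double *ts, unsigned long long *fence)
{
	char needle[64];
	const char *pos, *f;

	snprintf(needle, sizeof(needle), ": %s:", event);
	pos = strstr(line, needle);
	if (!pos)
		return -1;

	/* walk back over "seconds.microseconds" to the preceding space */
	while (pos > line && pos[-1] != ' ')
		pos--;
	if (sscanf(pos, "%lf", ts) != 1)
		return -1;

	/* %p is hashed on recent kernels, but it still matches up consistently */
	f = strstr(line, "fence=");
	if (!f || sscanf(f, "fence=%llx", fence) != 1)
		return -1;

	return 0;
}

int main(void)
{
	char line[1024];
	unsigned long long fence;
	double ts;
	int i;

	while (fgets(line, sizeof(line), stdin)) {
		if (!parse_event(line, "drm_sched_job_run", &ts, &fence)) {
			/* job entered the hardware ring: remember its start time */
			for (i = 0; i < MAX_INFLIGHT; i++) {
				if (!jobs[i].fence) {
					jobs[i].fence = fence;
					jobs[i].start = ts;
					break;
				}
			}
		} else if (!parse_event(line, "drm_sched_process_job", &ts,
					&fence)) {
			/* job finished: print ring-push-to-completion time */
			for (i = 0; i < MAX_INFLIGHT; i++) {
				if (jobs[i].fence == fence) {
					printf("fence %llx: %.3f ms\n", fence,
					       (ts - jobs[i].start) * 1000.0);
					jobs[i].fence = 0;
					break;
				}
			}
		}
	}

	return 0;
}

Built with a plain "cc" invocation and fed the contents of
/sys/kernel/debug/tracing/trace on stdin (with the two scheduler tracepoints
enabled), it prints one line per completed job.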