From patchwork Sat Aug 24 19:06:10 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rob Clark X-Patchwork-Id: 2849270 Return-Path: X-Original-To: patchwork-linux-arm-msm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 834CC9F2F6 for ; Sat, 24 Aug 2013 19:09:32 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 69D3920258 for ; Sat, 24 Aug 2013 19:09:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5457D20255 for ; Sat, 24 Aug 2013 19:09:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755376Ab3HXTJ3 (ORCPT ); Sat, 24 Aug 2013 15:09:29 -0400 Received: from mail-qe0-f43.google.com ([209.85.128.43]:56025 "EHLO mail-qe0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755360Ab3HXTJ3 (ORCPT ); Sat, 24 Aug 2013 15:09:29 -0400 Received: by mail-qe0-f43.google.com with SMTP id t7so1008995qeb.16 for ; Sat, 24 Aug 2013 12:09:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=WmFaRNZsqdjG5jxYSCnLzyRd0rJ+KCgDvtukCiiwSkk=; b=XoQ5ecwPOaBbef3S6Zg7pOEZqmfg/M+ecAy+O5HSfxJL6GCFXJqhnJz7yJseFv8dYH 4X2YjUBTdK1ugL8gLbIydW0vvDOENYTSqE7jGs89x11J0FaNQM86OsGDVfWysQDIus1J IhtoaTQNRHD+dWhMlBxarzsjtCFeaNsCNb4CLMqIHnD1dsaaLTHrnW9SwGaL0cQwxWoG Cm1NNKpyruddpc5CiL7q7yQWcddefCVb58njSeI3C80icXj8SXLePbYExE+Mt+3bplbF 9ESlo79RYAsZMgHDhK4lQp9Gx1QwPxRwVVHGVY7JwgjuqLQ4k7Ljqjae/G7v1qPNSR0U d7Pw== X-Received: by 10.49.59.44 with SMTP id w12mr7074873qeq.57.1377371368811; Sat, 24 Aug 2013 12:09:28 -0700 (PDT) Received: from localhost (pool-108-20-246-35.bstnma.east.verizon.net. [108.20.246.35]) by mx.google.com with ESMTPSA id w8sm8642567qej.3.1969.12.31.16.00.00 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 24 Aug 2013 12:09:28 -0700 (PDT) From: Rob Clark To: dri-devel@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org, Rob Clark Subject: [PATCHv4 5/5] drm/msm: add basic hangcheck/recovery mechanism Date: Sat, 24 Aug 2013 15:06:10 -0400 Message-Id: <1377371170-20135-6-git-send-email-robdclark@gmail.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1377371170-20135-1-git-send-email-robdclark@gmail.com> References: <1377371170-20135-1-git-send-email-robdclark@gmail.com> Sender: linux-arm-msm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-arm-msm@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP A basic, no-frills recovery mechanism in case the gpu gets wedged. We could try to be a bit more fancy and restart the next submit after the one that got wedged, but for now keep it simple. This is enough to recover things if, for example, the gpu hangs mid way through a piglit run. Signed-off-by: Rob Clark --- drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 1 + drivers/gpu/drm/msm/adreno/adreno_gpu.c | 26 +++++++++++++++-- drivers/gpu/drm/msm/adreno/adreno_gpu.h | 3 +- drivers/gpu/drm/msm/msm_gpu.c | 52 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/msm/msm_gpu.h | 10 +++++++ 5 files changed, 87 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c index 13d61bb..035bd13 100644 --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c @@ -371,6 +371,7 @@ static const struct adreno_gpu_funcs funcs = { .hw_init = a3xx_hw_init, .pm_suspend = msm_gpu_pm_suspend, .pm_resume = msm_gpu_pm_resume, + .recover = adreno_recover, .last_fence = adreno_last_fence, .submit = adreno_submit, .flush = adreno_flush, diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 282163e..a605847 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -111,6 +111,28 @@ uint32_t adreno_last_fence(struct msm_gpu *gpu) return adreno_gpu->memptrs->fence; } +void adreno_recover(struct msm_gpu *gpu) +{ + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); + struct drm_device *dev = gpu->dev; + int ret; + + gpu->funcs->pm_suspend(gpu); + + /* reset ringbuffer: */ + gpu->rb->cur = gpu->rb->start; + + /* reset completed fence seqno, just discard anything pending: */ + adreno_gpu->memptrs->fence = gpu->submitted_fence; + + gpu->funcs->pm_resume(gpu); + ret = gpu->funcs->hw_init(gpu); + if (ret) { + dev_err(dev->dev, "gpu hw init failed: %d\n", ret); + /* hmm, oh well? */ + } +} + int adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit, struct msm_file_private *ctx) { @@ -119,8 +141,6 @@ int adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit, struct msm_ringbuffer *ring = gpu->rb; unsigned i, ibs = 0; - adreno_gpu->last_fence = submit->fence; - for (i = 0; i < submit->nr_cmds; i++) { switch (submit->cmd[i].type) { case MSM_SUBMIT_CMD_IB_TARGET_BUF: @@ -225,7 +245,7 @@ void adreno_show(struct msm_gpu *gpu, struct seq_file *m) adreno_gpu->rev.patchid); seq_printf(m, "fence: %d/%d\n", adreno_gpu->memptrs->fence, - adreno_gpu->last_fence); + gpu->submitted_fence); seq_printf(m, "rptr: %d\n", adreno_gpu->memptrs->rptr); seq_printf(m, "wptr: %d\n", adreno_gpu->memptrs->wptr); seq_printf(m, "rb wptr: %d\n", get_wptr(gpu->rb)); diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h index 6b49c4f..f73abfb 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h @@ -54,8 +54,6 @@ struct adreno_gpu { uint32_t revn; /* numeric revision name */ const struct adreno_gpu_funcs *funcs; - uint32_t last_fence; - /* firmware: */ const struct firmware *pm4, *pfp; @@ -99,6 +97,7 @@ static inline bool adreno_is_a330(struct adreno_gpu *gpu) int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value); int adreno_hw_init(struct msm_gpu *gpu); uint32_t adreno_last_fence(struct msm_gpu *gpu); +void adreno_recover(struct msm_gpu *gpu); int adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit, struct msm_file_private *ctx); void adreno_flush(struct msm_gpu *gpu); diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c index 7c6541e..e1e1ec9 100644 --- a/drivers/gpu/drm/msm/msm_gpu.c +++ b/drivers/gpu/drm/msm/msm_gpu.c @@ -203,6 +203,51 @@ int msm_gpu_pm_suspend(struct msm_gpu *gpu) } /* + * Hangcheck detection for locked gpu: + */ + +static void recover_worker(struct work_struct *work) +{ + struct msm_gpu *gpu = container_of(work, struct msm_gpu, recover_work); + struct drm_device *dev = gpu->dev; + + dev_err(dev->dev, "%s: hangcheck recover!\n", gpu->name); + + mutex_lock(&dev->struct_mutex); + gpu->funcs->recover(gpu); + mutex_unlock(&dev->struct_mutex); + + msm_gpu_retire(gpu); +} + +static void hangcheck_timer_reset(struct msm_gpu *gpu) +{ + DBG("%s", gpu->name); + mod_timer(&gpu->hangcheck_timer, + round_jiffies_up(jiffies + DRM_MSM_HANGCHECK_JIFFIES)); +} + +static void hangcheck_handler(unsigned long data) +{ + struct msm_gpu *gpu = (struct msm_gpu *)data; + uint32_t fence = gpu->funcs->last_fence(gpu); + + if (fence != gpu->hangcheck_fence) { + /* some progress has been made.. ya! */ + gpu->hangcheck_fence = fence; + } else if (fence < gpu->submitted_fence) { + /* no progress and not done.. hung! */ + struct msm_drm_private *priv = gpu->dev->dev_private; + gpu->hangcheck_fence = fence; + queue_work(priv->wq, &gpu->recover_work); + } + + /* if still more pending work, reset the hangcheck timer: */ + if (gpu->submitted_fence > gpu->hangcheck_fence) + hangcheck_timer_reset(gpu); +} + +/* * Cmdstream submission/retirement: */ @@ -254,6 +299,8 @@ int msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit, submit->fence = ++priv->next_fence; + gpu->submitted_fence = submit->fence; + ret = gpu->funcs->submit(gpu, submit, ctx); priv->lastctx = ctx; @@ -276,6 +323,7 @@ int msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit, msm_gem_move_to_active(&msm_obj->base, gpu, submit->fence); } + hangcheck_timer_reset(gpu); mutex_unlock(&dev->struct_mutex); return ret; @@ -307,6 +355,10 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev, INIT_LIST_HEAD(&gpu->active_list); INIT_WORK(&gpu->retire_work, retire_worker); + INIT_WORK(&gpu->recover_work, recover_worker); + + setup_timer(&gpu->hangcheck_timer, hangcheck_handler, + (unsigned long)gpu); BUG_ON(ARRAY_SIZE(clk_names) != ARRAY_SIZE(gpu->grp_clks)); diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h index 8d2cd6c..8cd829e 100644 --- a/drivers/gpu/drm/msm/msm_gpu.h +++ b/drivers/gpu/drm/msm/msm_gpu.h @@ -51,6 +51,7 @@ struct msm_gpu_funcs { void (*idle)(struct msm_gpu *gpu); irqreturn_t (*irq)(struct msm_gpu *irq); uint32_t (*last_fence)(struct msm_gpu *gpu); + void (*recover)(struct msm_gpu *gpu); void (*destroy)(struct msm_gpu *gpu); #ifdef CONFIG_DEBUG_FS /* show GPU status in debugfs: */ @@ -69,6 +70,8 @@ struct msm_gpu { /* list of GEM active objects: */ struct list_head active_list; + uint32_t submitted_fence; + /* worker for handling active-list retiring: */ struct work_struct retire_work; @@ -83,6 +86,13 @@ struct msm_gpu { struct clk *ebi1_clk, *grp_clks[5]; uint32_t fast_rate, slow_rate, bus_freq; uint32_t bsc; + + /* Hang Detction: */ +#define DRM_MSM_HANGCHECK_PERIOD 500 /* in ms */ +#define DRM_MSM_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_MSM_HANGCHECK_PERIOD) + struct timer_list hangcheck_timer; + uint32_t hangcheck_fence; + struct work_struct recover_work; }; static inline void gpu_write(struct msm_gpu *gpu, u32 reg, u32 data)