From patchwork Fri Aug 3 13:18:25 2018
X-Patchwork-Id: 10555105
From: Lucas Stach <l.stach@pengutronix.de>
To: Christian König
Subject: [PATCH] drm/scheduler: fix timeout worker setup for out of order job completions
Date: Fri, 3 Aug 2018 15:18:25 +0200
Message-Id: <20180803131825.21945-1-l.stach@pengutronix.de>
Cc: amd-gfx@lists.freedesktop.org, patchwork-lst@pengutronix.de,
 dri-devel@lists.freedesktop.org, kernel@pengutronix.de, Nayan Deshmukh

drm_sched_job_finish() is a work item scheduled for each finished job on
an unbound system workqueue. This means the workers can execute out of
order with regard to the real hardware job completions.

If this happens, queueing a timeout worker for the first job on the ring
mirror list is wrong, as this job may have already finished executing.
Fix this by reorganizing the code to always queue the worker for the next
job on the list, if that job hasn't finished yet. This is robust against
a potential reordering of the finish workers.

Also move the timeout worker cancelling out of the job list lock, so that
we don't need to take the lock twice.
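
To illustrate the hazard (a simplified sketch with two hypothetical jobs
A and B, submitted in that order, not verbatim scheduler code): both jobs
complete on the hardware, but the unbound workqueue may run B's finish
worker before A's. The old code in B's worker then arms a fresh timeout
for the first entry on the mirror list, which is A, a job that has
already finished executing:

	/* finish worker for B, running before the one for A */
	spin_lock(&sched->job_list_lock);
	list_del_init(&B->node);
	/* returns A, which is already done on the hardware */
	next = list_first_entry_or_null(&sched->ring_mirror_list,
					struct drm_sched_job, node);
	if (next)
		schedule_delayed_work(&next->work_tdr, sched->timeout);
	spin_unlock(&sched->job_list_lock);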
As a small optimization, list_del() is used to remove the job from the
ring mirror list, as there is no need to reinit the list head in the job
we are about to free.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 44d480768dfe..0be2859d7b80 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work)
 						   finish_work);
 	struct drm_gpu_scheduler *sched = s_job->sched;
 
-	/* remove job from ring_mirror_list */
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
+		cancel_delayed_work_sync(&s_job->work_tdr);
+
 	spin_lock(&sched->job_list_lock);
-	list_del_init(&s_job->node);
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
-		struct drm_sched_job *next;
-
-		spin_unlock(&sched->job_list_lock);
-		cancel_delayed_work_sync(&s_job->work_tdr);
-		spin_lock(&sched->job_list_lock);
+		struct drm_sched_job *next = list_next_entry(s_job, node);
 
 		/* queue TDR for next job */
-		next = list_first_entry_or_null(&sched->ring_mirror_list,
-						struct drm_sched_job, node);
-
-		if (next)
+		if (next != s_job &&
+		    !dma_fence_is_signaled(&s_job->s_fence->finished))
 			schedule_delayed_work(&next->work_tdr, sched->timeout);
 	}
+	/* remove job from ring_mirror_list */
+	list_del(&s_job->node);
 	spin_unlock(&sched->job_list_lock);
+
 	dma_fence_put(&s_job->s_fence->finished);
 	sched->ops->free_job(s_job);
 }
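
For reference, here is a sketch of how drm_sched_job_finish() reads with
this patch applied, reconstructed from the hunk above (the function
signature and the opening container_of() line sit just outside the hunk
context and are inferred):

static void drm_sched_job_finish(struct work_struct *work)
{
	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
						   finish_work);
	struct drm_gpu_scheduler *sched = s_job->sched;

	/* cancel our own timeout worker before touching the job list */
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
		cancel_delayed_work_sync(&s_job->work_tdr);

	spin_lock(&sched->job_list_lock);
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
		struct drm_sched_job *next = list_next_entry(s_job, node);

		/* queue TDR for next job */
		if (next != s_job &&
		    !dma_fence_is_signaled(&s_job->s_fence->finished))
			schedule_delayed_work(&next->work_tdr, sched->timeout);
	}
	/* remove job from ring_mirror_list */
	list_del(&s_job->node);
	spin_unlock(&sched->job_list_lock);

	dma_fence_put(&s_job->s_fence->finished);
	sched->ops->free_job(s_job);
}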