From patchwork Mon Apr 15 15:11:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrey Grodzovsky X-Patchwork-Id: 10901015 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 008901390 for ; Mon, 15 Apr 2019 15:11:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D5B5E2899C for ; Mon, 15 Apr 2019 15:11:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CA04C289A7; Mon, 15 Apr 2019 15:11:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AE196289A5 for ; Mon, 15 Apr 2019 15:11:50 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A9CB0897AC; Mon, 15 Apr 2019 15:11:47 +0000 (UTC) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from NAM02-BL2-obe.outbound.protection.outlook.com (mail-eopbgr750072.outbound.protection.outlook.com [40.107.75.72]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6844489755; Mon, 15 Apr 2019 15:11:46 +0000 (UTC) Received: from SN1PR12CA0056.namprd12.prod.outlook.com (2603:10b6:802:20::27) by DM6PR12MB2970.namprd12.prod.outlook.com (2603:10b6:5:115::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1792.15; Mon, 15 Apr 2019 15:11:44 +0000 Received: from CO1NAM03FT046.eop-NAM03.prod.protection.outlook.com (2a01:111:f400:7e48::205) by SN1PR12CA0056.outlook.office365.com (2603:10b6:802:20::27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1792.15 via Frontend Transport; Mon, 15 Apr 2019 15:11:44 +0000 Received-SPF: None (protection.outlook.com: amd.com does not designate permitted sender hosts) Received: from SATLEXCHOV01.amd.com (165.204.84.17) by CO1NAM03FT046.mail.protection.outlook.com (10.152.81.15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1771.16 via Frontend Transport; Mon, 15 Apr 2019 15:11:43 +0000 Received: from agrodzovsky-All-Series.amd.com (10.34.1.3) by SATLEXCHOV01.amd.com (10.181.40.71) with Microsoft SMTP Server id 14.3.389.1; Mon, 15 Apr 2019 10:11:41 -0500 From: Andrey Grodzovsky To: , , , , Subject: [PATCH v2 1/5] drm/scheduler: rework job destruction Date: Mon, 15 Apr 2019 11:11:23 -0400 Message-ID: <1555341087-23339-1-git-send-email-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.7.4 MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-Office365-Filtering-HT: Tenant X-Forefront-Antispam-Report: CIP:165.204.84.17; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; SFS:(10009020)(376002)(346002)(136003)(39860400002)(396003)(2980300002)(428003)(189003)(199004)(316002)(53936002)(81156014)(8676002)(72206003)(356004)(6666004)(336012)(5660300002)(5820100001)(6306002)(106466001)(54906003)(966005)(14444005)(105586002)(53416004)(478600001)(4326008)(110136005)(97736004)(68736007)(47776003)(2201001)(305945005)(50226002)(81166006)(8936002)(426003)(104016004)(2616005)(476003)(126002)(26005)(486006)(2870700001)(77096007)(23676004)(2906002)(36756003)(66574012)(50466002)(30864003)(7696005)(86362001)(186003)(44832011)(2101003); DIR:OUT; SFP:1101; SCL:1; SRVR:DM6PR12MB2970; H:SATLEXCHOV01.amd.com; FPR:; SPF:None; LANG:en; PTR:InfoDomainNonexistent; MX:1; A:1; X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: de9a9a46-791e-44bc-590e-08d6c1b4abdc X-Microsoft-Antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(5600140)(711020)(4605104)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328); SRVR:DM6PR12MB2970; X-MS-TrafficTypeDiagnostic: DM6PR12MB2970: X-MS-Exchange-PUrlCount: 1 X-Microsoft-Antispam-PRVS: X-Forefront-PRVS: 000800954F X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Message-Info: SzXNpjSZ0nxCsQGyPcqgVWdUznv/9jYPfa6xHLoc/7jrwDlmlEZ2D5RDStavyeQXua0pPiOILpo0816yNo1UXcEYcL+Q1un4kzrc6C07aJCOzb10DtUqmeQV9wncYjPO5UtafnK7afDGXLyqI1IPgA1zEz5DEIq8HaCoTt6oRPlu0Vgoa3sd5ZPzMMz5JtF1grQ5U2ACKyWOXhVaBbsyg3LXxQ84pdaIX3996okwoNbbEVWmOZ+R+eLpq3swioTKoCuAW7yeiozdN/5R+vig+UOdFE59lPZXMZEtmMqb4gE3sp7oLcDC/Vn3cnY+GIDuFr6KYQY61hIK4whxre/8EZ9atqOs8hTFA+te1en2Cn221ksbYTOE8C+NnKlo4OqHBgsC+Lcd8mMhMJSNFESXryhHJ3FbehkLq7TuYv6IZuM= X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Apr 2019 15:11:43.8140 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: de9a9a46-791e-44bc-590e-08d6c1b4abdc X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXCHOV01.amd.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB2970 X-Mailman-Original-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amdcloud.onmicrosoft.com; s=selector1-amd-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=TRQU5aDsSkyEhT/Gr7rG/t7QshCmdI1BgYSs2lRwuLA=; b=qM29xWxHkYNcypndkbx4dbe1/7Sk6Ioxb1tRCDaLl1O+DkKfynh2EjZprBadO5wa4pW1G/L3VL34kX57CPwexJqRK3XHI6Up5UsjAx5x1WKQjdiT2fQ4ZzJT5MYZif4qkS58mQ5apIogvmDfqrEjgHcqn8CB7iq4rMkYa3vf5C0= X-Mailman-Original-Authentication-Results: spf=none (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; lists.freedesktop.org; dkim=none (message not signed) header.d=none;lists.freedesktop.org; dmarc=permerror action=none header.from=amd.com; X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Nicholas.Kazlauskas@amd.com, =?utf-8?q?Christian_K=C3=B6nig?= Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Christian König We now destroy finished jobs from the worker thread to make sure that we never destroy a job currently in timeout processing. By this we avoid holding lock around ring mirror list in drm_sched_stop which should solve a deadlock reported by a user. v2: Remove unused variable. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 Signed-off-by: Christian König Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 ++-- drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - drivers/gpu/drm/etnaviv/etnaviv_sched.c | 9 +- drivers/gpu/drm/scheduler/sched_main.c | 138 +++++++++++++++++------------ drivers/gpu/drm/v3d/v3d_sched.c | 9 +- include/drm/gpu_scheduler.h | 6 +- 6 files changed, 108 insertions(+), 75 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index d46675b..8c41a9f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3341,15 +3341,23 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, if (!ring || !ring->sched.thread) continue; - drm_sched_stop(&ring->sched); + drm_sched_stop(&ring->sched, &job->base); /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ amdgpu_fence_driver_force_completion(ring); } - if(job) + if(job) { drm_sched_increase_karma(&job->base); + /* + * Guilty job did complete and hence needs to be manually removed + * See drm_sched_stop doc. + */ + if (list_empty(&job->base.node)) + job->base.sched->ops->free_job(&job->base); + } + if (!amdgpu_sriov_vf(adev)) { @@ -3489,8 +3497,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, return r; } -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev, - struct amdgpu_job *job) +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) { int i; @@ -3630,7 +3637,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, /* Post ASIC reset for all devs .*/ list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL); + amdgpu_device_post_asic_reset(tmp_adev); if (r) { /* bad news, how to tell it to userspace ? */ diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c index 3fbb485..8434715 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) mmu_size + gpu->buffer.size; /* Add in the active command buffers */ - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { submit = to_etnaviv_submit(s_job); file_size += submit->cmdbuf.size; n_obj++; } - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); /* Add in the active buffer objects */ list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) { @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) gpu->buffer.size, etnaviv_cmdbuf_get_va(&gpu->buffer)); - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { submit = to_etnaviv_submit(s_job); etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD, submit->cmdbuf.vaddr, submit->cmdbuf.size, etnaviv_cmdbuf_get_va(&submit->cmdbuf)); } - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); /* Reserve space for the bomap */ if (n_bomap_pages) { diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 67ae266..8795c19 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -109,11 +109,18 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job) } /* block scheduler */ - drm_sched_stop(&gpu->sched); + drm_sched_stop(&gpu->sched, sched_job); if(sched_job) drm_sched_increase_karma(sched_job); + /* + * Guilty job did complete and hence needs to be manually removed + * See drm_sched_stop doc. + */ + if (list_empty(&sched_job->node)) + sched_job->sched->ops->free_job(sched_job); + /* get the GPU back into the init state */ etnaviv_core_dump(gpu); etnaviv_gpu_recover_hang(gpu); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 19fc601..05a4540 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, } EXPORT_SYMBOL(drm_sched_resume_timeout); -/* job_finish is called after hw fence signaled - */ -static void drm_sched_job_finish(struct work_struct *work) -{ - struct drm_sched_job *s_job = container_of(work, struct drm_sched_job, - finish_work); - struct drm_gpu_scheduler *sched = s_job->sched; - unsigned long flags; - - /* - * Canceling the timeout without removing our job from the ring mirror - * list is safe, as we will only end up in this worker if our jobs - * finished fence has been signaled. So even if some another worker - * manages to find this job as the next job in the list, the fence - * signaled check below will prevent the timeout to be restarted. - */ - cancel_delayed_work_sync(&sched->work_tdr); - - spin_lock_irqsave(&sched->job_list_lock, flags); - /* queue TDR for next job */ - drm_sched_start_timeout(sched); - spin_unlock_irqrestore(&sched->job_list_lock, flags); - - sched->ops->free_job(s_job); -} - static void drm_sched_job_begin(struct drm_sched_job *s_job) { struct drm_gpu_scheduler *sched = s_job->sched; @@ -371,23 +345,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma); * @sched: scheduler instance * @bad: bad scheduler job * + * Stop the scheduler and also removes and frees all completed jobs. + * Note: bad job will not be freed as it might be used later and so it's + * callers responsibility to release it manually if it's not part of the + * mirror list any more. + * */ -void drm_sched_stop(struct drm_gpu_scheduler *sched) +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) { - struct drm_sched_job *s_job; + struct drm_sched_job *s_job, *tmp; unsigned long flags; - struct dma_fence *last_fence = NULL; kthread_park(sched->thread); /* - * Verify all the signaled jobs in mirror list are removed from the ring - * by waiting for the latest job to enter the list. This should insure that - * also all the previous jobs that were in flight also already singaled - * and removed from the list. + * Iterate the job list from later to earlier one and either deactive + * their HW callbacks or remove them from mirror list if they already + * signaled. + * This iteration is thread safe as sched thread is stopped. */ - spin_lock_irqsave(&sched->job_list_lock, flags); - list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) { + list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) { if (s_job->s_fence->parent && dma_fence_remove_callback(s_job->s_fence->parent, &s_job->cb)) { @@ -395,16 +372,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched) s_job->s_fence->parent = NULL; atomic_dec(&sched->hw_rq_count); } else { - last_fence = dma_fence_get(&s_job->s_fence->finished); - break; + /* + * remove job from ring_mirror_list. + * Locking here is for concurrent resume timeout + */ + spin_lock_irqsave(&sched->job_list_lock, flags); + list_del_init(&s_job->node); + spin_unlock_irqrestore(&sched->job_list_lock, flags); + + /* + * Wait for job's HW fence callback to finish using s_job + * before releasing it. + * + * Job is still alive so fence refcount at least 1 + */ + dma_fence_wait(&s_job->s_fence->finished, false); + + /* + * We must keep bad job alive for later use during + * recovery by some of the drivers + */ + if (bad != s_job) + sched->ops->free_job(s_job); } } - spin_unlock_irqrestore(&sched->job_list_lock, flags); - - if (last_fence) { - dma_fence_wait(last_fence, false); - dma_fence_put(last_fence); - } } EXPORT_SYMBOL(drm_sched_stop); @@ -418,6 +409,7 @@ EXPORT_SYMBOL(drm_sched_stop); void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) { struct drm_sched_job *s_job, *tmp; + unsigned long flags; int r; if (!full_recovery) @@ -425,9 +417,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) /* * Locking the list is not required here as the sched thread is parked - * so no new jobs are being pushed in to HW and in drm_sched_stop we - * flushed all the jobs who were still in mirror list but who already - * signaled and removed them self from the list. Also concurrent + * so no new jobs are being inserted or removed. Also concurrent * GPU recovers can't run in parallel. */ list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { @@ -445,7 +435,9 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) drm_sched_process_job(NULL, &s_job->cb); } + spin_lock_irqsave(&sched->job_list_lock, flags); drm_sched_start_timeout(sched); + spin_unlock_irqrestore(&sched->job_list_lock, flags); unpark: kthread_unpark(sched->thread); @@ -464,7 +456,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) uint64_t guilty_context; bool found_guilty = false; - /*TODO DO we need spinlock here ? */ list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { struct drm_sched_fence *s_fence = s_job->s_fence; @@ -514,7 +505,6 @@ int drm_sched_job_init(struct drm_sched_job *job, return -ENOMEM; job->id = atomic64_inc_return(&sched->job_id_count); - INIT_WORK(&job->finish_work, drm_sched_job_finish); INIT_LIST_HEAD(&job->node); return 0; @@ -597,24 +587,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb); struct drm_sched_fence *s_fence = s_job->s_fence; struct drm_gpu_scheduler *sched = s_fence->sched; - unsigned long flags; - - cancel_delayed_work(&sched->work_tdr); atomic_dec(&sched->hw_rq_count); atomic_dec(&sched->num_jobs); - spin_lock_irqsave(&sched->job_list_lock, flags); - /* remove job from ring_mirror_list */ - list_del_init(&s_job->node); - spin_unlock_irqrestore(&sched->job_list_lock, flags); + trace_drm_sched_process_job(s_fence); drm_sched_fence_finished(s_fence); - - trace_drm_sched_process_job(s_fence); wake_up_interruptible(&sched->wake_up_worker); +} + +/** + * drm_sched_cleanup_jobs - destroy finished jobs + * + * @sched: scheduler instance + * + * Remove all finished jobs from the mirror list and destroy them. + */ +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) +{ + unsigned long flags; + + /* Don't destroy jobs while the timeout worker is running */ + if (!cancel_delayed_work(&sched->work_tdr)) + return; + + + while (!list_empty(&sched->ring_mirror_list)) { + struct drm_sched_job *job; + + job = list_first_entry(&sched->ring_mirror_list, + struct drm_sched_job, node); + if (!dma_fence_is_signaled(&job->s_fence->finished)) + break; + + spin_lock_irqsave(&sched->job_list_lock, flags); + /* remove job from ring_mirror_list */ + list_del_init(&job->node); + spin_unlock_irqrestore(&sched->job_list_lock, flags); + + sched->ops->free_job(job); + } + + /* queue timeout for next job */ + spin_lock_irqsave(&sched->job_list_lock, flags); + drm_sched_start_timeout(sched); + spin_unlock_irqrestore(&sched->job_list_lock, flags); - schedule_work(&s_job->finish_work); } /** @@ -656,9 +675,10 @@ static int drm_sched_main(void *param) struct dma_fence *fence; wait_event_interruptible(sched->wake_up_worker, + (drm_sched_cleanup_jobs(sched), (!drm_sched_blocked(sched) && (entity = drm_sched_select_entity(sched))) || - kthread_should_stop()); + kthread_should_stop())); if (!entity) continue; diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index ce7c737b..8efb091 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job) /* block scheduler */ for (q = 0; q < V3D_MAX_QUEUES; q++) - drm_sched_stop(&v3d->queue[q].sched); + drm_sched_stop(&v3d->queue[q].sched, sched_job); if(sched_job) drm_sched_increase_karma(sched_job); + /* + * Guilty job did complete and hence needs to be manually removed + * See drm_sched_stop doc. + */ + if (list_empty(&sched_job->node)) + sched_job->sched->ops->free_job(sched_job); + /* get the GPU back into the init state */ v3d_reset(v3d); diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 0daca4d..9ee0f27 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); * @sched: the scheduler instance on which this job is scheduled. * @s_fence: contains the fences for the scheduling of job. * @finish_cb: the callback for the finished fence. - * @finish_work: schedules the function @drm_sched_job_finish once the job has - * finished to remove the job from the - * @drm_gpu_scheduler.ring_mirror_list. * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list. * @id: a unique id assigned to each job scheduled on the scheduler. * @karma: increment on every hang caused by this job. If this exceeds the hang @@ -188,7 +185,6 @@ struct drm_sched_job { struct drm_gpu_scheduler *sched; struct drm_sched_fence *s_fence; struct dma_fence_cb finish_cb; - struct work_struct finish_work; struct list_head node; uint64_t id; atomic_t karma; @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job, void *owner); void drm_sched_job_cleanup(struct drm_sched_job *job); void drm_sched_wakeup(struct drm_gpu_scheduler *sched); -void drm_sched_stop(struct drm_gpu_scheduler *sched); +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad); void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery); void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched); void drm_sched_increase_karma(struct drm_sched_job *bad); From patchwork Mon Apr 15 15:11:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrey Grodzovsky X-Patchwork-Id: 10901017 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C23B51390 for ; Mon, 15 Apr 2019 15:11:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AA760289A5 for ; Mon, 15 Apr 2019 15:11:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9BC1D289A0; Mon, 15 Apr 2019 15:11:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 5033C28999 for ; Mon, 15 Apr 2019 15:11:55 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 99A22897D0; Mon, 15 Apr 2019 15:11:53 +0000 (UTC) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from NAM04-SN1-obe.outbound.protection.outlook.com (mail-eopbgr700040.outbound.protection.outlook.com [40.107.70.40]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2411B8979D; Mon, 15 Apr 2019 15:11:52 +0000 (UTC) Received: from DM3PR12CA0137.namprd12.prod.outlook.com (10.164.142.161) by DM6PR12MB2969.namprd12.prod.outlook.com (20.178.29.82) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1792.14; Mon, 15 Apr 2019 15:11:50 +0000 Received: from CO1NAM03FT018.eop-NAM03.prod.protection.outlook.com (2a01:111:f400:7e48::200) by DM3PR12CA0137.outlook.office365.com (2603:10b6:0:51::33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1792.14 via Frontend Transport; Mon, 15 Apr 2019 15:11:50 +0000 Received-SPF: None (protection.outlook.com: amd.com does not designate permitted sender hosts) Received: from SATLEXCHOV01.amd.com (165.204.84.17) by CO1NAM03FT018.mail.protection.outlook.com (10.152.80.174) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1771.16 via Frontend Transport; Mon, 15 Apr 2019 15:11:49 +0000 Received: from agrodzovsky-All-Series.amd.com (10.34.1.3) by SATLEXCHOV01.amd.com (10.181.40.71) with Microsoft SMTP Server id 14.3.389.1; Mon, 15 Apr 2019 10:11:48 -0500 From: Andrey Grodzovsky To: , , , , Subject: [PATCH v2 2/5] drm/sched: Keep s_fence->parent pointer Date: Mon, 15 Apr 2019 11:11:24 -0400 Message-ID: <1555341087-23339-2-git-send-email-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1555341087-23339-1-git-send-email-andrey.grodzovsky@amd.com> References: <1555341087-23339-1-git-send-email-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-Office365-Filtering-HT: Tenant X-Forefront-Antispam-Report: CIP:165.204.84.17; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; SFS:(10009020)(396003)(346002)(39860400002)(376002)(136003)(2980300002)(428003)(199004)(189003)(486006)(44832011)(446003)(106466001)(126002)(476003)(11346002)(105586002)(53416004)(2616005)(104016004)(97736004)(50226002)(50466002)(4326008)(47776003)(8936002)(5660300002)(2906002)(426003)(48376002)(16586007)(54906003)(110136005)(478600001)(316002)(6666004)(356004)(36756003)(8676002)(76176011)(81166006)(51416003)(81156014)(7696005)(26005)(86362001)(186003)(336012)(77096007)(14444005)(305945005)(2201001)(53936002)(72206003)(68736007)(2101003); DIR:OUT; SFP:1101; SCL:1; SRVR:DM6PR12MB2969; H:SATLEXCHOV01.amd.com; FPR:; SPF:None; LANG:en; PTR:InfoDomainNonexistent; A:1; MX:1; X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 9f5d4556-d5c4-4112-b81f-08d6c1b4af7a X-Microsoft-Antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(5600140)(711020)(4605104)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328); SRVR:DM6PR12MB2969; X-MS-TrafficTypeDiagnostic: DM6PR12MB2969: X-Microsoft-Antispam-PRVS: X-Forefront-PRVS: 000800954F X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Message-Info: Dj7VAmH9e9H3lPu/sa5g5jTB3u1lSE/isziY0wA5hjH9yIDzYjf4ZZdGJqXuW8rpN3gMt8bi+rmDlGkqqN0etQoBU0Vbdy32MQUXln/1E3oaO9oz2NXMck0WMAlmB9fXt5YVdYzDzcuIKyqWZWOABymGZ77ArNSPA3zUVSLkaVCUWBHIE6fDI79yfhfgpA+Zr5+Nfg+wxkhVn+alzb2TS3Bf0iB9PiJHjXPZcCjtgYeNiEjz9S/jvIvs6P9i4ekKJju9K4iM6D8l0J+wfiWIGoAkMd3TWLZWqTB4pXqPjRKCz+vACvgRon7+yONeuVfjZ+MupBNZ5SOa96VnOFA0DCOgAdDrtMYx++wf4I0O1w+oevsKMdWTQv2IBfC7i+I9Q/wt2z7qCFj++iL+2KP/xCdShpUb0hCdnVrP99ISckU= X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Apr 2019 15:11:49.9661 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 9f5d4556-d5c4-4112-b81f-08d6c1b4af7a X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXCHOV01.amd.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB2969 X-Mailman-Original-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amdcloud.onmicrosoft.com; s=selector1-amd-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=2WzipMdxl76BSrvQ1+h6v2X3RxYJm/Rq/8YhlkEfktU=; b=CclWd07uD2lK6wICGImWv1VfM5GO18qDb/ZM2uvsbn9EboCw/p/Ty4VI6WbN4WxHWuLBT/zbWs5yJXkE8RuslF6pZlTbLl54UZc09uqKDwHofiQZ1joACPTThLPAoFVJPRBRVcdTf77Eg0hdqzoddbF5l+LRwc7tbZ29d/9ceyY= X-Mailman-Original-Authentication-Results: spf=none (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; lists.freedesktop.org; dkim=none (message not signed) header.d=none;lists.freedesktop.org; dmarc=permerror action=none header.from=amd.com; X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Nicholas.Kazlauskas@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Virus-Scanned: ClamAV using ClamSMTP For later driver's reference to see if the fence is signaled. v2: Move parent fence put to resubmit jobs. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 05a4540..89d1f1b 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -368,8 +368,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) if (s_job->s_fence->parent && dma_fence_remove_callback(s_job->s_fence->parent, &s_job->cb)) { - dma_fence_put(s_job->s_fence->parent); - s_job->s_fence->parent = NULL; atomic_dec(&sched->hw_rq_count); } else { /* @@ -396,6 +394,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) sched->ops->free_job(s_job); } } + + /* + * Stop pending timer in flight as we rearm it in drm_sched_start. This + * avoids the pending timeout work in progress to fire right away after + * this TDR finished and before the newly restarted jobs had a + * chance to complete. + */ + cancel_delayed_work(&sched->work_tdr); } EXPORT_SYMBOL(drm_sched_stop); @@ -467,6 +473,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) if (found_guilty && s_job->s_fence->scheduled.context == guilty_context) dma_fence_set_error(&s_fence->finished, -ECANCELED); + dma_fence_put(s_job->s_fence->parent); s_job->s_fence->parent = sched->ops->run_job(s_job); atomic_inc(&sched->hw_rq_count); } From patchwork Mon Apr 15 15:11:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Grodzovsky X-Patchwork-Id: 10901019 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CD94414DB for ; Mon, 15 Apr 2019 15:12:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B541528989 for ; Mon, 15 Apr 2019 15:12:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A9B7028990; Mon, 15 Apr 2019 15:12:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D905D2899D for ; Mon, 15 Apr 2019 15:12:00 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A835A897FB; Mon, 15 Apr 2019 15:11:59 +0000 (UTC) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from NAM05-BY2-obe.outbound.protection.outlook.com (mail-eopbgr710065.outbound.protection.outlook.com [40.107.71.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id A014A897F9; Mon, 15 Apr 2019 15:11:58 +0000 (UTC) Received: from SN1PR12CA0111.namprd12.prod.outlook.com (2603:10b6:802:21::46) by MN2PR12MB3470.namprd12.prod.outlook.com (2603:10b6:208:d0::31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1792.17; Mon, 15 Apr 2019 15:11:56 +0000 Received: from CO1NAM03FT060.eop-NAM03.prod.protection.outlook.com (2a01:111:f400:7e48::205) by SN1PR12CA0111.outlook.office365.com (2603:10b6:802:21::46) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1792.17 via Frontend Transport; Mon, 15 Apr 2019 15:11:56 +0000 Received-SPF: None (protection.outlook.com: amd.com does not designate permitted sender hosts) Received: from SATLEXCHOV01.amd.com (165.204.84.17) by CO1NAM03FT060.mail.protection.outlook.com (10.152.81.16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1771.16 via Frontend Transport; Mon, 15 Apr 2019 15:11:56 +0000 Received: from agrodzovsky-All-Series.amd.com (10.34.1.3) by SATLEXCHOV01.amd.com (10.181.40.71) with Microsoft SMTP Server id 14.3.389.1; Mon, 15 Apr 2019 10:11:54 -0500 From: Andrey Grodzovsky To: , , , , Subject: [PATCH v2 3/5] drm/amdgpu: Avoid HW reset if guilty job already signaled. Date: Mon, 15 Apr 2019 11:11:25 -0400 Message-ID: <1555341087-23339-3-git-send-email-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1555341087-23339-1-git-send-email-andrey.grodzovsky@amd.com> References: <1555341087-23339-1-git-send-email-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-Office365-Filtering-HT: Tenant X-Forefront-Antispam-Report: CIP:165.204.84.17; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; SFS:(10009020)(346002)(376002)(136003)(396003)(39860400002)(2980300002)(428003)(43544003)(189003)(199004)(476003)(126002)(486006)(97736004)(86362001)(2616005)(2201001)(426003)(47776003)(11346002)(105586002)(44832011)(336012)(356004)(446003)(6666004)(186003)(53416004)(478600001)(77096007)(4326008)(68736007)(36756003)(26005)(72206003)(7696005)(76176011)(16586007)(316002)(8676002)(305945005)(5660300002)(53936002)(48376002)(51416003)(50466002)(104016004)(50226002)(81166006)(106466001)(2906002)(8936002)(110136005)(81156014)(54906003)(14444005)(2101003); DIR:OUT; SFP:1101; SCL:1; SRVR:MN2PR12MB3470; H:SATLEXCHOV01.amd.com; FPR:; SPF:None; LANG:en; PTR:InfoDomainNonexistent; MX:1; A:1; X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 362801f0-c287-4186-3e46-08d6c1b4b31e X-Microsoft-Antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(5600140)(711020)(4605104)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328); SRVR:MN2PR12MB3470; X-MS-TrafficTypeDiagnostic: MN2PR12MB3470: X-Microsoft-Antispam-PRVS: X-Forefront-PRVS: 000800954F X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Message-Info: bhGckaKgiAvwy6RGgirTdC1jul6Y42YJ74t0iWNjbbPABv0fpbRnf0ClMue2sQxOu9ziqUdah1bLdTbIrL6UYb5+ZrVqK21WML96vPP8nRDENnXs9FwKtWJzZt2/sWY7qkq1AwK+1xKHOFT7bYMwy/+ABtOLcA+FCxhD3uAcIWlX+4mWpQVJVNrx5MD9Q7ZAXtFNxxEGkj19l1+OFTJYB/N9yfzlPz8xM1gOXyEqMCaC9IMC1sNFwS09ub3UroZPIeJh0blczeEdNxUhoBWE3OtdeDylOtBZqZE1F9pbouJ7mLGZ6kS6O/gfq2L37s3FYTWZZ2V+eaw7sQ/6wJBFhXl7W518uupNMCR1wLYAGcAH7dnHpZJz1mHOOq8fvT8xn2pzWGfprrkr8R1xrdQI4Ab8yNQtCLtKIYIBPz19ic0= X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Apr 2019 15:11:56.0699 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 362801f0-c287-4186-3e46-08d6c1b4b31e X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXCHOV01.amd.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN2PR12MB3470 X-Mailman-Original-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amdcloud.onmicrosoft.com; s=selector1-amd-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=YcKOR63mMaNrkpke/O5JDzkDPgxEGO3W8Ci3ViS/gqo=; b=JSt86euqa9vTJQAcl2MM3fjZrx3z6fZPilj/M6D0ODuUeDkbfU7/2iqJ+DFdDjqhnnY3s2NJEGAhReLoec9vl/Cnq3SauxVyjkFUuceGBI5DFDrxZnzNtdOQWCx9a7EcTjPo4jSkeff7gqCmVupibVGfiy41YuaINotjmjmi6q0= X-Mailman-Original-Authentication-Results: spf=none (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; lists.freedesktop.org; dkim=none (message not signed) header.d=none;lists.freedesktop.org; dmarc=permerror action=none header.from=amd.com; X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Nicholas.Kazlauskas@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also reject TDRs if another one already running. v2: Stop all schedulers across device and entire XGMI hive before force signaling HW fences. Avoid passing job_signaled to helper fnctions to keep all the decision making about skipping HW reset in one place. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 159 ++++++++++++++++++----------- 1 file changed, 102 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 8c41a9f..565862d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3341,25 +3341,16 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, if (!ring || !ring->sched.thread) continue; - drm_sched_stop(&ring->sched, &job->base); - /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ amdgpu_fence_driver_force_completion(ring); } - if(job) { + if(job) drm_sched_increase_karma(&job->base); - /* - * Guilty job did complete and hence needs to be manually removed - * See drm_sched_stop doc. - */ - if (list_empty(&job->base.node)) - job->base.sched->ops->free_job(&job->base); - } - + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ if (!amdgpu_sriov_vf(adev)) { if (!need_full_reset) @@ -3497,37 +3488,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, return r; } -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) { - int i; - - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { - struct amdgpu_ring *ring = adev->rings[i]; - - if (!ring || !ring->sched.thread) - continue; - - if (!adev->asic_reset_res) - drm_sched_resubmit_jobs(&ring->sched); - - drm_sched_start(&ring->sched, !adev->asic_reset_res); - } - - if (!amdgpu_device_has_dc_support(adev)) { - drm_helper_resume_force_mode(adev->ddev); - } - - adev->asic_reset_res = 0; -} + if (trylock) { + if (!mutex_trylock(&adev->lock_reset)) + return false; + } else + mutex_lock(&adev->lock_reset); -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) -{ - mutex_lock(&adev->lock_reset); atomic_inc(&adev->gpu_reset_counter); adev->in_gpu_reset = 1; /* Block kfd: SRIOV would do it separately */ if (!amdgpu_sriov_vf(adev)) amdgpu_amdkfd_pre_reset(adev); + + return true; } static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) @@ -3555,40 +3530,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job) { - int r; + struct list_head device_list, *device_list_handle = NULL; + bool need_full_reset, job_signaled; struct amdgpu_hive_info *hive = NULL; - bool need_full_reset = false; struct amdgpu_device *tmp_adev = NULL; - struct list_head device_list, *device_list_handle = NULL; + int i, r = 0; + need_full_reset = job_signaled = false; INIT_LIST_HEAD(&device_list); dev_info(adev->dev, "GPU reset begin!\n"); + hive = amdgpu_get_xgmi_hive(adev, false); + /* - * In case of XGMI hive disallow concurrent resets to be triggered - * by different nodes. No point also since the one node already executing - * reset will also reset all the other nodes in the hive. + * Here we trylock to avoid chain of resets executing from + * either trigger by jobs on different adevs in XGMI hive or jobs on + * different schedulers for same device while this TO handler is running. + * We always reset all schedulers for device and all devices for XGMI + * hive so that should take care of them too. */ - hive = amdgpu_get_xgmi_hive(adev, 0); - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && - !mutex_trylock(&hive->reset_lock)) + + if (hive && !mutex_trylock(&hive->reset_lock)) { + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", + job->base.id, hive->hive_id); return 0; + } /* Start with adev pre asic reset first for soft reset check.*/ - amdgpu_device_lock_adev(adev); - r = amdgpu_device_pre_asic_reset(adev, - job, - &need_full_reset); - if (r) { - /*TODO Should we stop ?*/ - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", - r, adev->ddev->unique); - adev->asic_reset_res = r; + if (!amdgpu_device_lock_adev(adev, !hive)) { + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", + job->base.id); + return 0; } /* Build list of devices to reset */ - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { + if (adev->gmc.xgmi.num_physical_nodes > 1) { if (!hive) { amdgpu_device_unlock_adev(adev); return -ENODEV; @@ -3605,13 +3582,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, device_list_handle = &device_list; } + /* block all schedulers and reset given job's ring */ + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { + struct amdgpu_ring *ring = tmp_adev->rings[i]; + + if (!ring || !ring->sched.thread) + continue; + + drm_sched_stop(&ring->sched, &job->base); + } + } + + + /* + * Must check guilty signal here since after this point all old + * HW fences are force signaled. + * + * job->base holds a reference to parent fence + */ + if (job && job->base.s_fence->parent && + dma_fence_is_signaled(job->base.s_fence->parent)) + job_signaled = true; + + if (!amdgpu_device_ip_need_full_reset(adev)) + device_list_handle = &device_list; + + if (job_signaled) { + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); + goto skip_hw_reset; + } + + + /* Guilty job will be freed after this*/ + r = amdgpu_device_pre_asic_reset(adev, + job, + &need_full_reset); + if (r) { + /*TODO Should we stop ?*/ + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", + r, adev->ddev->unique); + adev->asic_reset_res = r; + } + retry: /* Rest of adevs pre asic reset from XGMI hive. */ list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { if (tmp_adev == adev) continue; - amdgpu_device_lock_adev(tmp_adev); + amdgpu_device_lock_adev(tmp_adev, false); r = amdgpu_device_pre_asic_reset(tmp_adev, NULL, &need_full_reset); @@ -3635,9 +3655,34 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, goto retry; } +skip_hw_reset: + /* + * Guilty job did complete and hence needs to be manually removed + * See drm_sched_stop doc. + */ + if (job && list_empty(&job->base.node)) + job->base.sched->ops->free_job(&job->base); + /* Post ASIC reset for all devs .*/ list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { - amdgpu_device_post_asic_reset(tmp_adev); + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { + struct amdgpu_ring *ring = tmp_adev->rings[i]; + + if (!ring || !ring->sched.thread) + continue; + + /* No point to resubmit jobs if we didn't HW reset*/ + if (!tmp_adev->asic_reset_res && !job_signaled) + drm_sched_resubmit_jobs(&ring->sched); + + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); + } + + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { + drm_helper_resume_force_mode(tmp_adev->ddev); + } + + tmp_adev->asic_reset_res = 0; if (r) { /* bad news, how to tell it to userspace ? */ @@ -3650,7 +3695,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, amdgpu_device_unlock_adev(tmp_adev); } - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) + if (hive) mutex_unlock(&hive->reset_lock); if (r) From patchwork Mon Apr 15 15:11:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrey Grodzovsky X-Patchwork-Id: 10901021 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B4EB11669 for ; Mon, 15 Apr 2019 15:12:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9BA6128925 for ; Mon, 15 Apr 2019 15:12:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8F1DC2899D; Mon, 15 Apr 2019 15:12:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 911B228989 for ; Mon, 15 Apr 2019 15:12:09 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 67891897C3; Mon, 15 Apr 2019 15:12:08 +0000 (UTC) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from NAM02-SN1-obe.outbound.protection.outlook.com (mail-eopbgr770088.outbound.protection.outlook.com [40.107.77.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id E30CC897C3; Mon, 15 Apr 2019 15:12:06 +0000 (UTC) Received: from MWHPR12CA0054.namprd12.prod.outlook.com (2603:10b6:300:103::16) by DM6PR12MB2969.namprd12.prod.outlook.com (2603:10b6:5:115::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1792.14; Mon, 15 Apr 2019 15:12:02 +0000 Received: from CO1NAM03FT014.eop-NAM03.prod.protection.outlook.com (2a01:111:f400:7e48::208) by MWHPR12CA0054.outlook.office365.com (2603:10b6:300:103::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1792.14 via Frontend Transport; Mon, 15 Apr 2019 15:12:02 +0000 Received-SPF: None (protection.outlook.com: amd.com does not designate permitted sender hosts) Received: from SATLEXCHOV01.amd.com (165.204.84.17) by CO1NAM03FT014.mail.protection.outlook.com (10.152.80.101) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1771.16 via Frontend Transport; Mon, 15 Apr 2019 15:12:02 +0000 Received: from agrodzovsky-All-Series.amd.com (10.34.1.3) by SATLEXCHOV01.amd.com (10.181.40.71) with Microsoft SMTP Server id 14.3.389.1; Mon, 15 Apr 2019 10:12:00 -0500 From: Andrey Grodzovsky To: , , , , Subject: [PATCH v2 4/5] drm/amd/display: wait for fence without holding reservation lock Date: Mon, 15 Apr 2019 11:11:26 -0400 Message-ID: <1555341087-23339-4-git-send-email-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1555341087-23339-1-git-send-email-andrey.grodzovsky@amd.com> References: <1555341087-23339-1-git-send-email-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-Office365-Filtering-HT: Tenant X-Forefront-Antispam-Report: CIP:165.204.84.17; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; SFS:(10009020)(396003)(346002)(39860400002)(376002)(136003)(2980300002)(428003)(199004)(189003)(486006)(44832011)(446003)(106466001)(126002)(476003)(11346002)(105586002)(53416004)(2616005)(104016004)(97736004)(50226002)(2870700001)(50466002)(4326008)(47776003)(8936002)(5660300002)(2906002)(426003)(66574012)(5820100001)(54906003)(110136005)(478600001)(316002)(6666004)(356004)(36756003)(8676002)(76176011)(81166006)(81156014)(7696005)(26005)(86362001)(186003)(336012)(77096007)(14444005)(305945005)(2201001)(53936002)(72206003)(68736007)(23676004)(2101003); DIR:OUT; SFP:1101; SCL:1; SRVR:DM6PR12MB2969; H:SATLEXCHOV01.amd.com; FPR:; SPF:None; LANG:en; PTR:InfoDomainNonexistent; A:1; MX:1; X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 3769f1d7-db05-4c9b-0bf1-08d6c1b4b6ac X-Microsoft-Antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(5600140)(711020)(4605104)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328); SRVR:DM6PR12MB2969; X-MS-TrafficTypeDiagnostic: DM6PR12MB2969: X-Microsoft-Antispam-PRVS: X-Forefront-PRVS: 000800954F X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Message-Info: udzdhja0lJRI6nDn0H0HGqk9pVBGah6iR12p2kH6HjAVIO0TlTwzJ08+MGdn1gsq/oXiquZrSwHgBG9N7k2AVhp/Lif7l7hPPwarY1DY6XKEWNOUq9nYE4E9O3I9UThKMqZL/WoyTCQZ5N5Z0dNIQr+Kn2M0GkAIwlK0Uz0d0Ko0QNO2B/k0wTwS/r2i0NJ04W52JpI0NR9+606epZ7MGYz3IcF3jvkpeE9IdJuH9FVCfG67l8N+1q97LnE9T7xqhHdQCtnsuB0Iul5OYO09o6118vi9EpZ4zWh6cKHXIlVMvP5aH3yX4L9PxHDPyLSJKgsXmbkWGbMr/4Sywkia1kzqCosN23bvuAr6nB5oYCrBXQOJot9NmGLEcF5TeLtgWcIeLiQ/Hw3pumkZQPeELvv340vKeiNZBQKLGMaOdKc= X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Apr 2019 15:12:02.0384 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 3769f1d7-db05-4c9b-0bf1-08d6c1b4b6ac X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXCHOV01.amd.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB2969 X-Mailman-Original-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amdcloud.onmicrosoft.com; s=selector1-amd-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=YzLnC9YryBGPnK2gy54en2ETPC5NOxoHyA8uPA/xkwg=; b=JjAXZCc5VSjZUJWEPzznbBRW0iAOHUEeK8S9nXY2a9cNXkyiU008OmrYXLdjeTe8rlEo8FTXIu3gdbrv2iupjtUhM5lGEj+ZG7R2/S5DOr144dfIeB1LImBK8+ADfaODIc+OrgynVfgKlZcKwLfk1yDoS6KX9jQH1Lq7WtNJ+Xg= X-Mailman-Original-Authentication-Results: spf=none (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; lists.freedesktop.org; dkim=none (message not signed) header.d=none;lists.freedesktop.org; dmarc=permerror action=none header.from=amd.com; X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Nicholas.Kazlauskas@amd.com, =?utf-8?q?Christian_K=C3=B6nig?= Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Christian König Don't block others while waiting for the fences to finish, concurrent submission is perfectly valid in this case and holding the lock can prevent killed applications from terminating. Signed-off-by: Christian König Reviewed-by: Nicholas Kazlauskas --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 44a8bf8..5aeac2c 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -5295,23 +5295,26 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, continue; } + abo = gem_to_amdgpu_bo(fb->obj[0]); + + /* Wait for all fences on this FB */ + r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true, + false, + MAX_SCHEDULE_TIMEOUT); + WARN_ON(r < 0); + /* * TODO This might fail and hence better not used, wait * explicitly on fences instead * and in general should be called for * blocking commit to as per framework helpers */ - abo = gem_to_amdgpu_bo(fb->obj[0]); r = amdgpu_bo_reserve(abo, true); if (unlikely(r != 0)) { DRM_ERROR("failed to reserve buffer before flip\n"); WARN_ON(1); } - /* Wait for all fences on this FB */ - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false, - MAX_SCHEDULE_TIMEOUT) < 0); - amdgpu_bo_get_tiling_flags(abo, &tiling_flags); amdgpu_bo_unreserve(abo); From patchwork Mon Apr 15 15:11:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Grodzovsky X-Patchwork-Id: 10901023 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7DD8A14DB for ; Mon, 15 Apr 2019 15:12:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 667A528992 for ; Mon, 15 Apr 2019 15:12:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5A9D3289A2; Mon, 15 Apr 2019 15:12:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id B9020289A9 for ; Mon, 15 Apr 2019 15:12:13 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id AA31289817; Mon, 15 Apr 2019 15:12:12 +0000 (UTC) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from NAM04-SN1-obe.outbound.protection.outlook.com (mail-eopbgr700062.outbound.protection.outlook.com [40.107.70.62]) by gabe.freedesktop.org (Postfix) with ESMTPS id 204EE89801; Mon, 15 Apr 2019 15:12:11 +0000 (UTC) Received: from CY4PR1201CA0006.namprd12.prod.outlook.com (10.172.75.16) by MN2PR12MB2975.namprd12.prod.outlook.com (20.178.243.142) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1792.17; Mon, 15 Apr 2019 15:12:08 +0000 Received: from CO1NAM03FT024.eop-NAM03.prod.protection.outlook.com (2a01:111:f400:7e48::202) by CY4PR1201CA0006.outlook.office365.com (2603:10b6:910:16::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1792.14 via Frontend Transport; Mon, 15 Apr 2019 15:12:08 +0000 Received-SPF: None (protection.outlook.com: amd.com does not designate permitted sender hosts) Received: from SATLEXCHOV01.amd.com (165.204.84.17) by CO1NAM03FT024.mail.protection.outlook.com (10.152.80.160) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1771.16 via Frontend Transport; Mon, 15 Apr 2019 15:12:07 +0000 Received: from agrodzovsky-All-Series.amd.com (10.34.1.3) by SATLEXCHOV01.amd.com (10.181.40.71) with Microsoft SMTP Server id 14.3.389.1; Mon, 15 Apr 2019 10:12:06 -0500 From: Andrey Grodzovsky To: , , , , Subject: [PATCH v2 5/5] Patch '5edb0c9b Fix deadlock with display during hanged ring recovery' Date: Mon, 15 Apr 2019 11:11:27 -0400 Message-ID: <1555341087-23339-5-git-send-email-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1555341087-23339-1-git-send-email-andrey.grodzovsky@amd.com> References: <1555341087-23339-1-git-send-email-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-Office365-Filtering-HT: Tenant X-Forefront-Antispam-Report: CIP:165.204.84.17; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; SFS:(10009020)(39860400002)(396003)(346002)(376002)(136003)(2980300002)(428003)(189003)(199004)(68736007)(105586002)(36756003)(81156014)(97736004)(81166006)(8676002)(305945005)(106466001)(53416004)(50226002)(47776003)(53936002)(478600001)(72206003)(4326008)(356004)(6666004)(2201001)(11346002)(476003)(426003)(48376002)(446003)(2616005)(7696005)(50466002)(110136005)(16586007)(486006)(8936002)(126002)(44832011)(316002)(14444005)(4743002)(76176011)(104016004)(26005)(186003)(77096007)(86362001)(2906002)(51416003)(5660300002)(336012)(54906003)(2101003); DIR:OUT; SFP:1101; SCL:1; SRVR:MN2PR12MB2975; H:SATLEXCHOV01.amd.com; FPR:; SPF:None; LANG:en; PTR:InfoDomainNonexistent; MX:1; A:1; X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: e4b2b880-2545-4242-f50f-08d6c1b4ba2c X-Microsoft-Antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(5600140)(711020)(4605104)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328); SRVR:MN2PR12MB2975; X-MS-TrafficTypeDiagnostic: MN2PR12MB2975: X-Microsoft-Antispam-PRVS: X-Forefront-PRVS: 000800954F X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Message-Info: TgWkbvEczbI2sljsmlRjkHwIWLQtRHSLqX7tsRV3X7Hu3QKq4UQmnYGj/Koz2tTY1A/yah2rbXa4Zrtd2gG9CyLc8sBKDdCmZqs68B4qloDdLpNFHVZA4s+ptXZ7ZjGfETmFbXN5evEjD7B76Jg8TqRvP3vrz10X0wABRKWC8dVTFrJGLcp61QRPVBr2sxWigfaV5QaFeJJNNFjxgM08nqW8KovdZWhX2d/Cx4GqWYJF87iqe+lkaq7GLEmg0nq5ZepRwIKAZrOtVszCCDLTmuirmGcb18twn/fMH2bf9p+Vi1DAl/FMWHPEb+ts4P5at02qdKkpDdifz+tmEhKhm1OAzh5O6QVlRRNNGcfFFe3Xta+Gc+QzXI+SSZ2Iq9n61rZ0naB0NrmGkeWSxCbKMF70ABwNb5if2g+T2WrAy5I= X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Apr 2019 15:12:07.6472 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: e4b2b880-2545-4242-f50f-08d6c1b4ba2c X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXCHOV01.amd.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN2PR12MB2975 X-Mailman-Original-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amdcloud.onmicrosoft.com; s=selector1-amd-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Cth0gc7YvGce1VozkXxp+YLi6UzvC1wa52Z7dTxhXDk=; b=TXyyo8XCPj4pO0TNcU0ffExHLQD3BVtV1Ig6uZsuSJw+aeKF/2yCodOiUBKlIhxF53alMqg2N1WS+9y7qo8+51gmYQheJtd1T8ZhJdvC3YCRnJpnFECCOs+eVmE131Cc5PQ+bnD1d/7gSvePwuWcXnSMnnkaxYcsth5nikoDK2k= X-Mailman-Original-Authentication-Results: spf=none (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; lists.freedesktop.org; dkim=none (message not signed) header.d=none;lists.freedesktop.org; dmarc=permerror action=none header.from=amd.com; X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Nicholas.Kazlauskas@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Virus-Scanned: ClamAV using ClamSMTP was accidentaly removed during one of DALs code merges. Signed-off-by: Andrey Grodzovsky Reviewed-by: Nicholas Kazlauskas --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 5aeac2c..2bae2bf 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -5297,11 +5297,16 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, abo = gem_to_amdgpu_bo(fb->obj[0]); - /* Wait for all fences on this FB */ + /* + * Wait for all fences on this FB. Do limited wait to avoid + * deadlock during GPU reset when this fence will not signal + * but we hold reservation lock for the BO. + */ r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false, - MAX_SCHEDULE_TIMEOUT); - WARN_ON(r < 0); + msecs_to_jiffies(5000)); + if (unlikely(r <= 0)) + DRM_ERROR("Waiting for fences timed out or interrupted!"); /* * TODO This might fail and hence better not used, wait @@ -5310,10 +5315,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, * blocking commit to as per framework helpers */ r = amdgpu_bo_reserve(abo, true); - if (unlikely(r != 0)) { + if (unlikely(r != 0)) DRM_ERROR("failed to reserve buffer before flip\n"); - WARN_ON(1); - } amdgpu_bo_get_tiling_flags(abo, &tiling_flags);