[RFC,v4,05/11] drm/amdgpu: Drop hive->in_reset

Message ID	20220209002320.6077-6-andrey.grodzovsky@amd.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; From: Andrey Grodzovsky <andrey.grodzovsky@amd.com> To: <dri-devel@lists.freedesktop.org>, <amd-gfx@lists.freedesktop.org> Subject: [RFC v4 05/11] drm/amdgpu: Drop hive->in_reset Date: Tue, 8 Feb 2022 19:23:14 -0500 Message-ID: <20220209002320.6077-6-andrey.grodzovsky@amd.com> In-Reply-To: <20220209002320.6077-1-andrey.grodzovsky@amd.com> References: <20220209002320.6077-1-andrey.grodzovsky@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Precedence: list Cc: horace.chen@amd.com, lijo.lazar@amd.com, jingwech@amd.com, christian.koenig@amd.com, Monk.Liu@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	Define and use reset domain for GPU recovery in amdgpu \| expand [RFC,v4,00/11] Define and use reset domain for GPU recovery in amdgpu [RFC,v4,01/11] drm/amdgpu: Introduce reset domain [RFC,v4,02/11] drm/amdgpu: Move scheduler init to after XGMI is ready [RFC,v4,03/11] drm/amdgpu: Serialize non TDR gpu recovery with TDRs [RFC,v4,04/11] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue. [RFC,v4,05/11] drm/amdgpu: Drop hive->in_reset [RFC,v4,06/11] drm/amdgpu: Drop concurrent GPU reset protection for device [RFC,v4,07/11] drm/amdgpu: Rework reset domain to be refcounted. [RFC,v4,08/11] drm/amdgpu: Move reset sem into reset_domain [RFC,v4,09/11] drm/amdgpu: Move in_gpu_reset into reset_domain [RFC,v4,10/11] drm/amdgpu: Rework amdgpu_device_lock_adev [RFC,v4,11/11] Revert 'drm/amdgpu: annotate a false positive recursive locking'

Message ID

20220209002320.6077-6-andrey.grodzovsky@amd.com (mailing list archive)

State

New, archived

Headers

Received-SPF: Pass (protection.outlook.com: domain of amd.com designates
 165.204.84.17 as permitted sender) receiver=protection.outlook.com;
 client-ip=165.204.84.17; helo=SATLEXMB04.amd.com;
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: <dri-devel@lists.freedesktop.org>, <amd-gfx@lists.freedesktop.org>
Subject: [RFC v4 05/11] drm/amdgpu: Drop hive->in_reset
Date: Tue, 8 Feb 2022 19:23:14 -0500
Message-ID: <20220209002320.6077-6-andrey.grodzovsky@amd.com>
In-Reply-To: <20220209002320.6077-1-andrey.grodzovsky@amd.com>
References: <20220209002320.6077-1-andrey.grodzovsky@amd.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 09 Feb 2022 00:23:50.9766 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 19a35e09-e753-497c-d137-08d9eb627299
X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
 TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17];
 Helo=[SATLEXMB04.amd.com]
X-MS-Exchange-CrossTenant-AuthSource: 
 CO1NAM11FT058.eop-nam11.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR12MB2743
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Cc: horace.chen@amd.com, lijo.lazar@amd.com, jingwech@amd.com,
 christian.koenig@amd.com, Monk.Liu@amd.com
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Series

Define and use reset domain for GPU recovery in amdgpu | expand

Commit Message

Andrey Grodzovsky Feb. 9, 2022, 12:23 a.m. UTC

Since we serialize all resets no need to protect from concurrent
resets.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 -
 3 files changed, 1 insertion(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 15e8fde3ac2d..7e92f2432087 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5067,26 +5067,10 @@  int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 	dev_info(adev->dev, "GPU %s begin!\n",
 		need_emergency_restart ? "jobs stop":"reset");
 
-	/*
-	 * Here we trylock to avoid chain of resets executing from
-	 * either trigger by jobs on different adevs in XGMI hive or jobs on
-	 * different schedulers for same device while this TO handler is running.
-	 * We always reset all schedulers for device and all devices for XGMI
-	 * hive so that should take care of them too.
-	 */
 	if (!amdgpu_sriov_vf(adev))
 		hive = amdgpu_get_xgmi_hive(adev);
-	if (hive) {
-		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
-			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
-				job ? job->base.id : -1, hive->hive_id);
-			amdgpu_put_xgmi_hive(hive);
-			if (job && job->vm)
-				drm_sched_increase_karma(&job->base);
-			return 0;
-		}
+	if (hive)
 		mutex_lock(&hive->hive_lock);
-	}
 
 	reset_context.method = AMD_RESET_METHOD_NONE;
 	reset_context.reset_req_dev = adev;
@@ -5282,7 +5266,6 @@  int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 
 skip_recovery:
 	if (hive) {
-		atomic_set(&hive->in_reset, 0);
 		mutex_unlock(&hive->hive_lock);
 		amdgpu_put_xgmi_hive(hive);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index d406897346d6..89b682afe821 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -410,7 +410,6 @@  struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 	INIT_LIST_HEAD(&hive->device_list);
 	INIT_LIST_HEAD(&hive->node);
 	mutex_init(&hive->hive_lock);
-	atomic_set(&hive->in_reset, 0);
 	atomic_set(&hive->number_devices, 0);
 	task_barrier_init(&hive->tb);
 	hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6121aaa292cb..2f2ce53645a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -33,7 +33,6 @@  struct amdgpu_hive_info {
 	struct list_head node;
 	atomic_t number_devices;
 	struct mutex hive_lock;
-	atomic_t in_reset;
 	int hi_req_count;
 	struct amdgpu_device *hi_req_gpu;
 	struct task_barrier tb;

[RFC,v4,05/11] drm/amdgpu: Drop hive->in_reset

Commit Message

Patch