From patchwork Thu Feb 3 09:09:04 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rajneesh Bhardwaj
X-Patchwork-Id: 12733970
From: Rajneesh Bhardwaj
To:
Subject: [Patch v5 10/24] drm/amdkfd: CRIU restore sdma id for queues
Date: Thu, 3 Feb 2022 04:09:04 -0500
Message-ID: <20220203090918.11520-11-rajneesh.bhardwaj@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220203090918.11520-1-rajneesh.bhardwaj@amd.com>
References: <20220203090918.11520-1-rajneesh.bhardwaj@amd.com>
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
Cc: alexander.deucher@amd.com, felix.kuehling@amd.com, david.yatsin@amd.com,
 Rajneesh Bhardwaj, dri-devel@lists.freedesktop.org
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"

From: David Yat Sin

When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.

Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++++++++++++++-----
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |  4 +-
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 4b6814949aad..15fa2dc6dcba 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -58,7 +58,7 @@ static inline void deallocate_hqd(struct device_queue_manager *dqm,
 				struct queue *q);
 static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q);
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-				struct queue *q);
+				struct queue *q, const uint32_t *restore_sdma_id);
 static void kfd_process_hw_exception(struct work_struct *work);
 
 static inline
@@ -299,7 +299,8 @@ static void deallocate_vmid(struct device_queue_manager *dqm,
 
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
 				struct queue *q,
-				struct qcm_process_device *qpd)
+				struct qcm_process_device *qpd,
+				const struct kfd_criu_queue_priv_data *qd)
 {
 	struct mqd_manager *mqd_mgr;
 	int retval;
@@ -339,7 +340,7 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 			q->pipe, q->queue);
 	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
-		retval = allocate_sdma_queue(dqm, q);
+		retval = allocate_sdma_queue(dqm, q, qd ? &qd->sdma_id : NULL);
 		if (retval)
 			goto deallocate_vmid;
 		dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1034,7 +1035,7 @@ static void pre_reset(struct device_queue_manager *dqm)
 }
 
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-				struct queue *q)
+				struct queue *q, const uint32_t *restore_sdma_id)
 {
 	int bit;
 
@@ -1044,9 +1045,21 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 			return -ENOMEM;
 		}
 
-		bit = __ffs64(dqm->sdma_bitmap);
-		dqm->sdma_bitmap &= ~(1ULL << bit);
-		q->sdma_id = bit;
+		if (restore_sdma_id) {
+			/* Re-use existing sdma_id */
+			if (!(dqm->sdma_bitmap & (1ULL << *restore_sdma_id))) {
+				pr_err("SDMA queue already in use\n");
+				return -EBUSY;
+			}
+			dqm->sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+			q->sdma_id = *restore_sdma_id;
+		} else {
+			/* Find first available sdma_id */
+			bit = __ffs64(dqm->sdma_bitmap);
+			dqm->sdma_bitmap &= ~(1ULL << bit);
+			q->sdma_id = bit;
+		}
+
 		q->properties.sdma_engine_id = q->sdma_id %
 				kfd_get_num_sdma_engines(dqm->dev);
 		q->properties.sdma_queue_id = q->sdma_id /
@@ -1056,9 +1069,19 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 			pr_err("No more XGMI SDMA queue to allocate\n");
 			return -ENOMEM;
 		}
-		bit = __ffs64(dqm->xgmi_sdma_bitmap);
-		dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
-		q->sdma_id = bit;
+		if (restore_sdma_id) {
+			/* Re-use existing sdma_id */
+			if (!(dqm->xgmi_sdma_bitmap & (1ULL << *restore_sdma_id))) {
+				pr_err("SDMA queue already in use\n");
+				return -EBUSY;
+			}
+			dqm->xgmi_sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+			q->sdma_id = *restore_sdma_id;
+		} else {
+			bit = __ffs64(dqm->xgmi_sdma_bitmap);
+			dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
+			q->sdma_id = bit;
+		}
 		/* sdma_engine_id is sdma id including
 		 * both PCIe-optimized SDMAs and XGMI-
 		 * optimized SDMAs. The calculation below
@@ -1288,7 +1311,8 @@ static void destroy_kernel_queue_cpsch(struct device_queue_manager *dqm,
 }
 
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
-			struct qcm_process_device *qpd)
+			struct qcm_process_device *qpd,
+			const struct kfd_criu_queue_priv_data *qd)
 {
 	int retval;
 	struct mqd_manager *mqd_mgr;
@@ -1303,7 +1327,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
 		dqm_lock(dqm);
-		retval = allocate_sdma_queue(dqm, q);
+		retval = allocate_sdma_queue(dqm, q, qd ? &qd->sdma_id : NULL);
 		dqm_unlock(dqm);
 		if (retval)
 			goto out;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index e145e4deb53a..76cc5e2a9733 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -88,7 +88,8 @@ struct device_process_node {
 struct device_queue_manager_ops {
 	int	(*create_queue)(struct device_queue_manager *dqm,
 				struct queue *q,
-				struct qcm_process_device *qpd);
+				struct qcm_process_device *qpd,
+				const struct kfd_criu_queue_priv_data *qd);
 
 	int	(*destroy_queue)(struct device_queue_manager *dqm,
 				struct qcm_process_device *qpd,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 75bad4381421..8afe6879b861 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -272,7 +272,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			goto err_create_queue;
 		pqn->q = q;
 		pqn->kq = NULL;
-		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd);
+		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data);
 		print_queue(q);
 		break;
 
@@ -292,7 +292,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			goto err_create_queue;
 		pqn->q = q;
 		pqn->kq = NULL;
-		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd);
+		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data);
 		print_queue(q);
 		break;
 	case KFD_QUEUE_TYPE_DIQ:
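
Reviewer note: the allocation rule this patch adds to allocate_sdma_queue() can be sketched in user space as follows. This is an illustration under stated assumptions, not the driver code. The function name alloc_sdma_id() and the out_id parameter are hypothetical, __builtin_ctzll (a GCC/Clang builtin) stands in for the kernel's __ffs64(), and the range check on the restored id is an added assumption that does not appear in the diff above. As in the patch, a set bit in the bitmap means the id is free.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the SDMA id allocator in the patch.
 * *bitmap has one bit per SDMA queue id; a set bit means "free".
 * restore_id == NULL selects the first free id (normal path),
 * otherwise the exact id recorded at CRIU dump time is claimed.
 */
static int alloc_sdma_id(uint64_t *bitmap, const uint32_t *restore_id,
			 uint32_t *out_id)
{
	uint32_t id;

	if (*bitmap == 0)
		return -ENOMEM;		/* no free id left */

	if (restore_id) {
		/* Assumption: reject ids that cannot index a 64-bit mask. */
		if (*restore_id >= 64)
			return -EINVAL;
		/* The requested id must still be free. */
		if (!(*bitmap & (1ULL << *restore_id)))
			return -EBUSY;
		id = *restore_id;
	} else {
		/* Lowest set bit, the role __ffs64() plays in the kernel. */
		id = (uint32_t)__builtin_ctzll(*bitmap);
	}

	*bitmap &= ~(1ULL << id);	/* mark the id as in use */
	*out_id = id;
	return 0;
}
```

With a bitmap of 0xF, requesting id 2 succeeds and leaves 0xB; requesting id 2 again returns -EBUSY; passing NULL then hands out id 0. The range check is included in the sketch because shifting 1ULL by 64 or more is undefined in C.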