From patchwork Tue Jan 7 05:58:46 2025
X-Patchwork-Submitter: Dave Airlie
X-Patchwork-Id: 13928263
From: Dave Airlie
To: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org, dakr@kernel.org
Subject: [PATCH] nouveau/fence: handle cross device fences properly.
Date: Tue, 7 Jan 2025 15:58:46 +1000
Message-ID: <20250107055846.536589-1-airlied@gmail.com>

If we have two nouveau-controlled devices and one passes a dma-fence to the other, then when we hit the sync path the second device can try to put a sync wait in its pushbuf for the seqno of the context on the first device.

Since fence contexts are vmm-bound, check that the vmms match between both users; this should ensure that fence seqnos don't get used on the wrong channels.

This seems to happen fairly spuriously; I found it while tracking down a multi-card regression report, which seems to have worked only by luck before this.
Signed-off-by: Dave Airlie
Cc: stable@vger.kernel.org
Reviewed-by: Ben Skeggs
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ee5e9d40c166f..5743c82f4094b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
 
 				rcu_read_lock();
 				prev = rcu_dereference(f->channel);
-				if (prev && (prev == chan ||
+				if (prev && (prev->vmm == chan->vmm) &&
+				    (prev == chan ||
 					     fctx->sync(f, prev, chan) == 0))
 					must_wait = false;
 				rcu_read_unlock();