From patchwork Mon Sep 26 06:03:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alistair Popple X-Patchwork-Id: 12988236 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3671AC32771 for ; Mon, 26 Sep 2022 06:04:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B79FF90000D; Mon, 26 Sep 2022 02:04:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B298E900002; Mon, 26 Sep 2022 02:04:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 97C2590000D; Mon, 26 Sep 2022 02:04:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 871B0900002 for ; Mon, 26 Sep 2022 02:04:47 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 64B8B40680 for ; Mon, 26 Sep 2022 06:04:47 +0000 (UTC) X-FDA: 79953197814.20.6CBAEF1 Received: from NAM10-BN7-obe.outbound.protection.outlook.com (mail-bn7nam10on2053.outbound.protection.outlook.com [40.107.92.53]) by imf19.hostedemail.com (Postfix) with ESMTP id 059C61A0007 for ; Mon, 26 Sep 2022 06:04:46 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=jw17skYtiFDCLZycDqF+cbJuvTdLoE63zex6fCsqzX3OMOT2q7qR1Qh5sj+p2CLvpA1FQkjhZVl3/EPJza5t4B/Dy+O4HAXzm5U6wYAFZ1UQ5lNoMH23LxouvAs+H6X8Kj8AUL87L5s8ObPWyk5GUchFUa0qZnc7FMLEoy3GojfYNIWONBFDuXG5TWaMfgZI5wfDAiyl28xERmUO5ZQrqzRTOW649auAuqCBle22YAZccgLHwxQPjtGXGmwZ2TfS1Ot2WwQhahvOMtdkAD6Z7yJRh26qm4vGNrrebKGXcFa63XFy/LisK4gMZVSA2jJQFP3PWSIrrWwr0fdf3SECrw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=wJXLJAmQHeMb/7dTH0Y7fbicwcRe1WcmHt6SvAWONwI=; b=iJMVPUkxXKFAXU2+xzMgIEn30GpvQSaWyHLT1mHOnsay9ziCfyXs+uxzt5T/H7BkryToP9lURLlFKPdvVGQ0dDB8Us3+ymY1V/ZdSWPoGhuiQZ6fjSFFuyh46sS0zXO0UBAgIrzbFFuWWbQ/+E1bN4VfgdN/di00UNIMWCC5tWxgy6izVCRdGo4UhQ1Oe/rS2wA75e4AK6pyw8QHuGnshYyPwhB47dSHN+KBJqZyx0Jw+zOBaCGxZwUnI0tR0CWnIMrp5xZq1L7mvRsV71aTHt8G+3qnRrwpCCmxdy/b+UaVT1uEWZfy3AuBrz0/pWnm+AJ0m41+6wjX7wyYRVYaAA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com; dkim=pass header.d=nvidia.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=wJXLJAmQHeMb/7dTH0Y7fbicwcRe1WcmHt6SvAWONwI=; b=VYdjxlsjtYnFjUzyigbFfeOTTf4eseosfC9xni/fMwTCWHFcS+1Y0JrX8K5zOkpEkLXiGigdfJyHMWxKFLJT8Gq2uHfNNkHYE+ckNx3UxE4bMgZkYxnGdVuYhtMIsgj8u8ISSttuDlOIIGq50RifBRYh703VomAdro7pFf9VeM9tc1l9vg7dTJbUgTiHRnvzW/DVchz+cTNx/Qfm9jKJptYUQf5J4nJy+fMi8M3iOAWiE/ZXMcGa3lS4IEgvHWUcMCKmqqtdckGBXYms1CNdxgHMocenxubnvcee1+herp+pgU1Mjg+SQVloABHWbYYXAaZUVU6WrHi/k5Gmh65Qxw== Received: from BYAPR12MB3176.namprd12.prod.outlook.com (2603:10b6:a03:134::26) by MN0PR12MB6030.namprd12.prod.outlook.com (2603:10b6:208:3ce::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5654.24; Mon, 26 Sep 2022 06:04:44 +0000 Received: from BYAPR12MB3176.namprd12.prod.outlook.com ([fe80::ed0:a520:ac8e:9966]) by BYAPR12MB3176.namprd12.prod.outlook.com ([fe80::ed0:a520:ac8e:9966%6]) with mapi id 15.20.5654.025; Mon, 26 Sep 2022 06:04:44 +0000 From: Alistair Popple To: linux-mm@kvack.org, Andrew Morton Cc: Michael Ellerman , Nicholas Piggin , Felix Kuehling , Alex Deucher , =?utf-8?q?Christian_K=C3=B6nig?= , "Pan, Xinhui" , David Airlie , Daniel Vetter , Ben Skeggs , Karol Herbst , Lyude Paul , Ralph Campbell , "Matthew Wilcox (Oracle)" , Alex Sierra , John Hubbard , linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org, nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Jason Gunthorpe , Dan Williams , Alistair Popple Subject: [PATCH 6/7] nouveau/dmem: Evict device private memory during release Date: Mon, 26 Sep 2022 16:03:10 +1000 Message-Id: <072e1ce590fe101a4cdbd5e91b1702efebb6d0fd.1664171943.git-series.apopple@nvidia.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: X-ClientProxiedBy: SY6PR01CA0127.ausprd01.prod.outlook.com (2603:10c6:10:1b9::16) To BYAPR12MB3176.namprd12.prod.outlook.com (2603:10b6:a03:134::26) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BYAPR12MB3176:EE_|MN0PR12MB6030:EE_ X-MS-Office365-Filtering-Correlation-Id: 549a2cae-c846-4f52-9793-08da9f8501f1 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 3OnXLR+0P7sMoIM3V+E+8u5wcWkai+R+ZZBCmG2h8RUQt9F5pKny9/rEuFrc2ikYEMZ3tDfZY/LLpq+WrqBY9JPzO+oiTmyomexCi12u5XBBsCr8L57QMdXe7VLP4kOpAEoq64EpTzH3opgQv534XEEzR7bcqq+muKY1wWUfdN0bFzcb1eBerSgcgePokb6g3OfN7bDrdttkasNFQ5mkYVkWXqGsVDKJD9YVyScV5Uuxk2L4vyfwVQAfXg03/X+Ejo6YAglHOzVsWEc6W2ug0XO2dzEEwJEeNnqIAHKxp5U23ewIglWklccMnDKPvrXaFAPYPa93tAI1ew/vwAimhotcQzVxfhEcsoEKvvh+Gk1NLncg01tg9fmwtC1mCRFm54B9rjCH6GjXHFqm3DEYNZ7vZHutjZm6peXYz6hFGZR92OoNSqH4iS/wIFUB1FMvs6UILBcyAmFhmoealWT/BmtyptHU0bjA1SLVzWICBXnBm/Y9ZpsRI/1tNmwFKk9SDS4ycD8q/mypbEVFLXMXdNp9eXR+FsF6C6HeiknWbvVtebcz6dTfIcZtrzB4NTMtgvcN76NiS8KSu0N62vsjef/cOgXard/42HBxkwPNrZSr5veQ/+T0ZsmUeo4i4zTqiDS444oXwfBJTN8lwHGhTLUtgrejW3UYHe5QEzH+4yPygYW/9997nd8eR5Wx3XSZjOwYNVuR4+j/i4jv/jf3pw== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BYAPR12MB3176.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(4636009)(346002)(366004)(396003)(376002)(136003)(39860400002)(451199015)(316002)(54906003)(6916009)(6486002)(66946007)(66556008)(36756003)(8676002)(66476007)(5660300002)(4326008)(38100700002)(6666004)(8936002)(26005)(6506007)(7416002)(478600001)(107886003)(6512007)(2906002)(41300700001)(86362001)(2616005)(186003)(83380400001);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: bzHQYHLx9JQ8xfi4ziYBBPbyK96nBccNfZ7OL9aqnECylcXChC7GLd5N4lYbUB3DeNpBeuken+pjaNVys7yqflBJkhe4ahlcWBNedXg100LZhvnGHuCWtPDRfRx0sNSYXS9qEJk3X/kZ0EE0M0Se/sTBOLMeXjN+xMSNNTZR7x9l7zdNjU0pZj6+Ofqsl+P0tpp53GHTP7Wl4MPmhjL+atUxRKOA5AMOZEklOdI48Ozn6lg/ykCXmHJvw1Hs0jEq2bHlXIFsw9UgF0q1ilz0JiiyTGyN8vAFRvJm1lmniUtMKpScblHJiqlkMcvfBfDpQoh7dZZD3SY9gNfa4TF29UaM93lj/hDUcMf0JzsbKt7XcbaQncFGgJT1u3QyIvJaoMevtLJzAfQI+/BgZcMKv8j1dxDJhLs5nxKyQkyuyln7Qby3C3FxDpbmRmbqznR4KmNIKi/nxtUUWC+kLxYjTPsMmiMD8q/BOgQtdA6VInknLMlJeG3PjeNWcMa8TKO2tvgmwny8Hnn28lupr+8542vEdZKzH/b0BDaDBkxvw1jxV2hdF+HFKzRumJ6FgY3Wl/5j/I/cD0u7Ovz2nAWFar2MsGzZNrw6rS+XogNRuyqebZPXBSZHJXKdLXxgN9nopzuo2l2QFmbIwI+jxM+TeHpyYFNlGC7jxxTcboUKfoRq7KlQITQlUonjabRpf1SW1t+hozzn7GSwh43w6R1dM72Xfc3K7l6bhFGNXU/JGplSLTf56AP/QhMkB/KFlMrVYjOAo6btmcVQtZqbZWbH69584cnGhCce0eUjXOpaH40OKlUY5WDqtlLB3t4iy7GvqlesX2aea0fHSRaKTWWUdv+2j3SughZMUNxOqBpw19zK2qbM3fdPh9GZkh9dnZQPmUXvKy/vTzCyBKFb0ffJkmC2NOJSbd9PGQhBIf/zT7sQIqRnBb5jSof7YBgnywr4EmTAQn6io6dvJWYgq1yMAg4GIadn15dLorYbFcVcDs252hcbbC6/u48lppm0mDeGDNYmlqnvw8k7RAiA55kZXkfH0CQyu1LqgVvH+bIzf7ErsLU8ZGc9VoyC/ZyOICZZ1NI3ISR0K5ZgwbbfXAx59jYbemoYlGvBXZ/zJy4pHH95yhQBgw7ytbrx+4q8jFYbjg330ennJzk6Qkpb7TyqdskJZXXFJboFQs5BxH7tWt7Q0zos0um9Eaa0JQGAzOoLeiebvez/hR+36/mkKG8gX/geKF2bnCsnKibeews95dQt1y6gdQhTzt+wETR2Qw3Ww9fJUhQbM1TRZYlw9WeRqWU00Mvx2rSj6T3SWj/fzarUgqRrNcXzc2Za1UD9WlZe3T56WGSJzCtfJTYXxuTGoqVdlweH2G+fqyHo4oQqNRm+JOHHHoKwZ3QEIlwYaZd+jqOJzkFibS4yFdEcS7R+yM7nEEUEuh29ztWx8/vkOYZcRMqVHe84hvxmyxkSoI/rQcCBBaIK2FxUABQ+EpxFkEkxqSEGenqadXUZkzH1sdrLKPS76f895y8qQAgVNIs6gq9DIJMx/LKMW2T4NVg+j/oHRMc+zZv1vzqAuZbCVPQykRqz3aAi13duTvGt+z7c X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-Network-Message-Id: 549a2cae-c846-4f52-9793-08da9f8501f1 X-MS-Exchange-CrossTenant-AuthSource: BYAPR12MB3176.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 26 Sep 2022 06:04:43.9985 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: J6770XvgGRZ1/LoDm7lsaEhMkFUvkNv84XmDzKydNSGyiO87ewGnGSDJkkVsS3K7EpWoHWR/KDNMQICdyoPTkQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN0PR12MB6030 ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1664172287; a=rsa-sha256; cv=pass; b=KzF28S0qjJKWeDbzlx6Tz+CjlQKztUhOTrnxpr6NScrZ/813SWJLf8kgCUG3RRpNiKOHQW PUV7zLppMZLq2ahcu8Jt0n8am8ag0QEK5rSWVhN677uBRiREKhhY5CtfmzuqvpF2YXXacN HrNWIIU0gRI0hj40cZ/BIDxEYREW6Eo= ARC-Authentication-Results: i=2; imf19.hostedemail.com; dkim=pass header.d=Nvidia.com header.s=selector2 header.b=VYdjxlsj; dmarc=pass (policy=reject) header.from=nvidia.com; spf=pass (imf19.hostedemail.com: domain of apopple@nvidia.com designates 40.107.92.53 as permitted sender) smtp.mailfrom=apopple@nvidia.com; arc=pass ("microsoft.com:s=arcselector9901:i=1") ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664172287; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=wJXLJAmQHeMb/7dTH0Y7fbicwcRe1WcmHt6SvAWONwI=; b=6Ok6evOk7gEuQw6OUYmr2p/bRRbi0K4TYOFLmrpYiFgeI4jhQpvsPyS1tF58/fxk0a4dtV zXUt0Z6iHTzEGxcYnMG7GiPW5Ua+m+evGerVmagXp50q6eIiLoIM6KARXOatOw+XdpV6Lt kwGAlfwo4N+omtgpzBGUVuNpyhlDL00= Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=Nvidia.com header.s=selector2 header.b=VYdjxlsj; dmarc=pass (policy=reject) header.from=nvidia.com; spf=pass (imf19.hostedemail.com: domain of apopple@nvidia.com designates 40.107.92.53 as permitted sender) smtp.mailfrom=apopple@nvidia.com; arc=pass ("microsoft.com:s=arcselector9901:i=1") X-Rspam-User: X-Stat-Signature: cn7hz8w968w6pj9qeu9i7jj3ydqdeu1c X-Rspamd-Queue-Id: 059C61A0007 X-Rspamd-Server: rspam09 X-HE-Tag: 1664172286-7011 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When the module is unloaded or a GPU is unbound from the module it is possible for device private pages to be left mapped in currently running processes. This leads to a kernel crash when the pages are either freed or accessed from the CPU because the GPU and associated data structures and callbacks have all been freed. Fix this by migrating any mappings back to normal CPU memory prior to freeing the GPU memory chunks and associated device private pages. Signed-off-by: Alistair Popple --- I assume the AMD driver might have a similar issue. However I can't see where device private (or coherent) pages actually get unmapped/freed during teardown as I couldn't find any relevant calls to devm_memunmap(), memunmap(), devm_release_mem_region() or release_mem_region(). So it appears that ZONE_DEVICE pages are not being properly freed during module unload, unless I'm missing something? --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++++++++++++++++++++++++++- 1 file changed, 48 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 66ebbd4..3b247b8 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm) mutex_unlock(&drm->dmem->mutex); } +/* + * Evict all pages mapping a chunk. + */ +void +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk) +{ + unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT; + unsigned long *src_pfns, *dst_pfns; + dma_addr_t *dma_addrs; + struct nouveau_fence *fence; + + src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL); + dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL); + dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL); + + migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT, + npages); + + for (i = 0; i < npages; i++) { + if (src_pfns[i] & MIGRATE_PFN_MIGRATE) { + struct page *dpage; + + /* + * _GFP_NOFAIL because the GPU is going away and there + * is nothing sensible we can do if we can't copy the + * data back. + */ + dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL); + dst_pfns[i] = migrate_pfn(page_to_pfn(dpage)); + nouveau_dmem_copy_one(chunk->drm, + migrate_pfn_to_page(src_pfns[i]), dpage, + &dma_addrs[i]); + } + } + + nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence); + migrate_device_pages(src_pfns, dst_pfns, npages); + nouveau_dmem_fence_done(&fence); + migrate_device_finalize(src_pfns, dst_pfns, npages); + kfree(src_pfns); + kfree(dst_pfns); + for (i = 0; i < npages; i++) + dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL); + kfree(dma_addrs); +} + void nouveau_dmem_fini(struct nouveau_drm *drm) { @@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm) mutex_lock(&drm->dmem->mutex); list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) { + nouveau_dmem_evict_chunk(chunk); nouveau_bo_unpin(chunk->bo); nouveau_bo_ref(NULL, &chunk->bo); + WARN_ON(chunk->callocated); list_del(&chunk->list); memunmap_pages(&chunk->pagemap); release_mem_region(chunk->pagemap.range.start,