From patchwork Wed Apr 5 18:01:29 2023
From: Ankit Agrawal
Subject: [PATCH v3 1/6] kvm: determine memory type from VMA
Date: Wed, 5 Apr 2023 11:01:29 -0700
Message-ID: <20230405180134.16932-2-ankita@nvidia.com>
In-Reply-To: <20230405180134.16932-1-ankita@nvidia.com>

From: Ankit Agrawal

Each VMA stores the required pgprot for its mappings in vma->vm_page_prot.
From it we can determine the desired MT_DEVICE_* memory type for the VMA
directly, instead of guessing with heuristics based on pfn_is_map_memory().
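
A minimal user-space sketch of this decode follows. It is not the kernel code. It extracts the stage-1 AttrIndx field (descriptor bits [4:2]) from a pgprot value using the same arithmetic as mapping_type() in the mmu.c hunk, then chooses a stage-2 attribute. The numeric MT_* values are assumed to mirror arch/arm64/include/asm/memory.h around v6.3; the decode only relies on them being distinct. S2_PROT_* and pgprot_with_attrindx() are hypothetical stand-ins for the KVM_PGTABLE_PROT_* bits and a test helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed AttrIndx values (mirroring arm64 memory.h around v6.3).
 * Only their distinctness matters to the decode below.
 */
#define MT_NORMAL		0
#define MT_NORMAL_TAGGED	1
#define MT_NORMAL_NC		2
#define MT_DEVICE_nGnRnE	3
#define MT_DEVICE_nGnRE		4

/* The stage-1 AttrIndx field lives in descriptor bits [4:2]. */
#define PTE_ATTRINDX_MASK	((uint64_t)7 << 2)

/* Hypothetical stand-ins for the stage-2 prot bits added by the patch. */
enum s2_prot {
	S2_PROT_NORMAL		= 0,
	S2_PROT_DEVICE_nGnRE	= 1u << 3,
	S2_PROT_DEVICE_nGnRnE	= 1u << 4,
};

/* Hypothetical helper: build a pgprot value carrying the given AttrIndx. */
static uint64_t pgprot_with_attrindx(unsigned int mt)
{
	assert(mt < 8);	/* AttrIndx is a 3-bit field */
	return (uint64_t)mt << 2;
}

/* Same arithmetic as mapping_type() in the mmu.c hunk. */
static unsigned long mapping_type(uint64_t pgprot)
{
	return (pgprot & PTE_ATTRINDX_MASK) >> 2;
}

/* Translate the VMA's stage-1 memory type into a stage-2 prot bit. */
static enum s2_prot s2_prot_from_pgprot(uint64_t pgprot)
{
	switch (mapping_type(pgprot)) {
	case MT_DEVICE_nGnRE:
		return S2_PROT_DEVICE_nGnRE;
	case MT_DEVICE_nGnRnE:
		return S2_PROT_DEVICE_nGnRnE;
	default:	/* MT_NORMAL, MT_NORMAL_TAGGED, MT_NORMAL_NC */
		return S2_PROT_NORMAL;
	}
}
```

Any type other than the two device types falls through to the Normal case. This mirrors the default: branch in user_mem_abort(). Other pgprot bits outside the AttrIndx mask do not affect the result.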

The following kinds of pgprot are available to userspace, with their
corresponding memory types:

  pgprot_noncached    -> MT_DEVICE_nGnRnE
  pgprot_writecombine -> MT_NORMAL_NC
  pgprot_device       -> MT_DEVICE_nGnRE
  pgprot_tagged       -> MT_NORMAL_TAGGED

Decode the relevant MT_* types in use and translate them into the
corresponding KVM_PGTABLE_PROT_* flags:

  - MT_DEVICE_nGnRE       -> KVM_PGTABLE_PROT_DEVICE_nGnRE (device)
  - MT_DEVICE_nGnRnE      -> KVM_PGTABLE_PROT_DEVICE_nGnRnE (noncached)
  - MT_NORMAL/_TAGGED/_NC -> 0

The stage-2 encoding of 0 (MT_S2_DEVICE_nGnRnE) for
KVM_PGTABLE_PROT_DEVICE_nGnRnE is based on [2].

Also worth noting is how the stage-1 and stage-2 attributes combine [3].
If FWB is not set, the resulting type is the more restrictive of the two.
From least to most restrictive:

  NORMAL/_TAGGED/_NC -> DEVICE_nGnRE -> DEVICE_nGnRnE

If FWB is set, the stage-2 mapping type overrides the stage-1 type [1].

This solves a problem where KVM cannot preserve the MT_NORMAL memory type
for non-struct-page-backed memory in the S2 mapping. Instead, the VMA
creator determines the MT type and the S2 mapping follows it.

[1] https://developer.arm.com/documentation/102376/0100/Combining-Stage-1-and-Stage-2-attributes
[2] ARMv8 reference manual: https://developer.arm.com/documentation/ddi0487/gb/
    Section D5.5.3, Table D5-38
[3] ARMv8 reference manual: https://developer.arm.com/documentation/ddi0487/gb/
    Table G5-20 on page G5-6330

Signed-off-by: Ankit Agrawal
---
 arch/arm64/include/asm/kvm_pgtable.h |  8 +++++---
 arch/arm64/include/asm/memory.h      |  6 ++++--
 arch/arm64/kvm/hyp/pgtable.c         | 16 +++++++++++-----
 arch/arm64/kvm/mmu.c                 | 27 ++++++++++++++++++++++-----
 4 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 4cd6762bda80..d3166b6e6329 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -150,7 +150,8 @@ enum kvm_pgtable_stage2_flags {
  * @KVM_PGTABLE_PROT_X:	Execute permission.
  * @KVM_PGTABLE_PROT_W:	Write permission.
  * @KVM_PGTABLE_PROT_R:	Read permission.
- * @KVM_PGTABLE_PROT_DEVICE:	Device attributes.
+ * @KVM_PGTABLE_PROT_DEVICE_nGnRE:	Device nGnRE attributes.
+ * @KVM_PGTABLE_PROT_DEVICE_nGnRnE:	Device nGnRnE attributes.
  * @KVM_PGTABLE_PROT_SW0:	Software bit 0.
  * @KVM_PGTABLE_PROT_SW1:	Software bit 1.
  * @KVM_PGTABLE_PROT_SW2:	Software bit 2.
@@ -161,7 +162,8 @@ enum kvm_pgtable_prot {
 	KVM_PGTABLE_PROT_W = BIT(1),
 	KVM_PGTABLE_PROT_R = BIT(2),
 
-	KVM_PGTABLE_PROT_DEVICE = BIT(3),
+	KVM_PGTABLE_PROT_DEVICE_nGnRE = BIT(3),
+	KVM_PGTABLE_PROT_DEVICE_nGnRnE = BIT(4),
 
 	KVM_PGTABLE_PROT_SW0 = BIT(55),
 	KVM_PGTABLE_PROT_SW1 = BIT(56),
@@ -178,7 +180,7 @@ enum kvm_pgtable_prot {
 #define PAGE_HYP	KVM_PGTABLE_PROT_RW
 #define PAGE_HYP_EXEC	(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_X)
 #define PAGE_HYP_RO	(KVM_PGTABLE_PROT_R)
-#define PAGE_HYP_DEVICE	(PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)
+#define PAGE_HYP_DEVICE	(PAGE_HYP | KVM_PGTABLE_PROT_DEVICE_nGnRE)
 
 typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
 					   enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 78e5163836a0..4ebbc4b1ba4d 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -147,14 +147,16 @@
  * Memory types for Stage-2 translation
  */
 #define MT_S2_NORMAL		0xf
+#define MT_S2_DEVICE_nGnRnE	0x0
 #define MT_S2_DEVICE_nGnRE	0x1
 
 /*
  * Memory types for Stage-2 translation when ID_AA64MMFR2_EL1.FWB is 0001
  * Stage-2 enforces Normal-WB and Device-nGnRE
  */
-#define MT_S2_FWB_NORMAL	6
-#define MT_S2_FWB_DEVICE_nGnRE	1
+#define MT_S2_FWB_NORMAL	0x6
+#define MT_S2_FWB_DEVICE_nGnRnE	0x0
+#define MT_S2_FWB_DEVICE_nGnRE	0x1
 
 #ifdef CONFIG_ARM64_4K_PAGES
 #define IOREMAP_MAX_ORDER	(PUD_SHIFT)
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 3d61bd3e591d..7a8238b41590 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -355,7 +355,7 @@ struct hyp_map_data {
 
 static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
 {
-	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
+	bool device = prot & KVM_PGTABLE_PROT_DEVICE_nGnRE;
 	u32 mtype = device ? MT_DEVICE_nGnRE : MT_NORMAL;
 	kvm_pte_t attr = FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX, mtype);
 	u32 sh = KVM_PTE_LEAF_ATTR_LO_S1_SH_IS;
@@ -636,14 +636,20 @@ static bool stage2_has_fwb(struct kvm_pgtable *pgt)
 static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
 				kvm_pte_t *ptep)
 {
-	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
-	kvm_pte_t attr = device ? KVM_S2_MEMATTR(pgt, DEVICE_nGnRE) :
-			    KVM_S2_MEMATTR(pgt, NORMAL);
 	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
+	kvm_pte_t attr;
+
+	if (prot & KVM_PGTABLE_PROT_DEVICE_nGnRE)
+		attr = KVM_S2_MEMATTR(pgt, DEVICE_nGnRE);
+	else if (prot & KVM_PGTABLE_PROT_DEVICE_nGnRnE)
+		attr = KVM_S2_MEMATTR(pgt, DEVICE_nGnRnE);
+	else
+		attr = KVM_S2_MEMATTR(pgt, NORMAL);
 
 	if (!(prot & KVM_PGTABLE_PROT_X))
 		attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
-	else if (device)
+	else if (prot & KVM_PGTABLE_PROT_DEVICE_nGnRE ||
+		 prot & KVM_PGTABLE_PROT_DEVICE_nGnRnE)
 		return -EINVAL;
 
 	if (prot & KVM_PGTABLE_PROT_R)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7113587222ff..8d63aa951c33 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -897,7 +897,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	int ret = 0;
 	struct kvm_mmu_memory_cache cache = { .gfp_zero = __GFP_ZERO };
 	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
-	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE |
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE_nGnRE |
 				     KVM_PGTABLE_PROT_R |
 				     (writable ? KVM_PGTABLE_PROT_W : 0);
 
@@ -1186,6 +1186,15 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+/*
+ * Determine the memory region cacheability from VMA's pgprot. This
+ * is used to set the stage 2 PTEs.
+ */
+static unsigned long mapping_type(pgprot_t page_prot)
+{
+	return ((pgprot_val(page_prot) & PTE_ATTRINDX_MASK) >> 2);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -1368,10 +1377,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (exec_fault)
 		prot |= KVM_PGTABLE_PROT_X;
 
-	if (device)
-		prot |= KVM_PGTABLE_PROT_DEVICE;
-	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
-		prot |= KVM_PGTABLE_PROT_X;
+	switch (mapping_type(vma->vm_page_prot)) {
+	case MT_DEVICE_nGnRE:
+		prot |= KVM_PGTABLE_PROT_DEVICE_nGnRE;
+		break;
+	case MT_DEVICE_nGnRnE:
+		prot |= KVM_PGTABLE_PROT_DEVICE_nGnRnE;
+		break;
+	/* MT_NORMAL/_TAGGED/_NC */
+	default:
+		if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
+			prot |= KVM_PGTABLE_PROT_X;
+	}
 
 	/*
 	 * Under the premise of getting a FSC_PERM fault, we just need to relax

From patchwork Wed Apr 5 18:01:30 2023
From: Ankit Agrawal
Subject: [PATCH v3 2/6] vfio/nvgpu: expose GPU device memory as BAR1
Date: Wed, 5 Apr 2023 11:01:30 -0700
Message-ID: <20230405180134.16932-3-ankita@nvidia.com>
In-Reply-To: <20230405180134.16932-1-ankita@nvidia.com>

From: Ankit Agrawal

The NVIDIA Grace Hopper superchip does not model the coherent GPU memory
aperture as a PCI config space BAR. Introduce an in-tree VFIO PCI variant
module (nvgpu-vfio-pci) that exposes the GPU memory to userspace as BAR1.

The GPU memory size and physical address are obtained from the ACPI DSD
properties nvidia,gpu-mem-size and nvidia,gpu-mem-base-pa using
device_property_read_u64(), and are reported to userspace as a VFIO region.
QEMU then naturally generates a PCI device in the VM whose BAR1 is the
cacheable aperture. QEMU fetches the region information with
VFIO_DEVICE_GET_REGION_INFO and mmaps the region. The mmap call is handled
by the mmap() callback of the nvgpu-vfio-pci module, which maps the GPU
memory using remap_pfn_range().
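
The BAR1 mmap path described above can be sketched in user-space C. This is a minimal sketch under stated assumptions, not the driver code:

- It assumes 4 KiB pages. SKETCH_PAGE_SHIFT, REGION_PGOFF_BITS and the helper names are hypothetical.
- It assumes VFIO_PCI_OFFSET_SHIFT is 40, as in the VFIO PCI core.

It shows how vm_pgoff splits into a region index and a page offset within the region. It also shows how the mapping length is clamped to the GPU memory remaining after that offset. Note that the page offset is converted to bytes before it is subtracted from the byte length.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages, VFIO PCI core offset shift. */
#define SKETCH_PAGE_SHIFT	12
#define VFIO_PCI_OFFSET_SHIFT	40
#define REGION_PGOFF_BITS	(VFIO_PCI_OFFSET_SHIFT - SKETCH_PAGE_SHIFT)

/* File offset userspace passes to mmap() for a region index. */
static uint64_t region_offset(unsigned int index)
{
	return (uint64_t)index << VFIO_PCI_OFFSET_SHIFT;
}

/* Region index encoded in the upper bits of vma->vm_pgoff. */
static unsigned int pgoff_to_index(uint64_t vm_pgoff)
{
	return (unsigned int)(vm_pgoff >> REGION_PGOFF_BITS);
}

/* Page offset inside the region (lower bits of vma->vm_pgoff). */
static uint64_t pgoff_in_region(uint64_t vm_pgoff)
{
	return vm_pgoff & ((1ULL << REGION_PGOFF_BITS) - 1);
}

/*
 * Hypothetical helper: bytes to remap for a BAR1 request. Returns 0
 * where the driver would fail with -EINVAL (pgoff past the end).
 */
static uint64_t bar1_map_len(uint64_t req_len, uint64_t mem_length,
			     uint64_t pgoff)
{
	uint64_t remaining;

	if (pgoff >= (mem_length >> SKETCH_PAGE_SHIFT))
		return 0;

	/* Convert pages to bytes before subtracting from a byte length. */
	remaining = mem_length - (pgoff << SKETCH_PAGE_SHIFT);
	return req_len < remaining ? req_len : remaining;
}
```

For example, take 10 pages of GPU memory and an mmap of the BAR1 region starting 3 pages in. At most the remaining 7 pages are mapped, however large the request is.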

Signed-off-by: Ankit Agrawal
---
 MAINTAINERS                     |   6 +
 drivers/vfio/pci/Kconfig        |   2 +
 drivers/vfio/pci/Makefile       |   2 +
 drivers/vfio/pci/nvgpu/Kconfig  |  10 ++
 drivers/vfio/pci/nvgpu/Makefile |   3 +
 drivers/vfio/pci/nvgpu/main.c   | 255 ++++++++++++++++++++++++++++++++
 6 files changed, 278 insertions(+)
 create mode 100644 drivers/vfio/pci/nvgpu/Kconfig
 create mode 100644 drivers/vfio/pci/nvgpu/Makefile
 create mode 100644 drivers/vfio/pci/nvgpu/main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 1dc8bd26b6cf..6b48756c30d3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21954,6 +21954,12 @@ L:	kvm@vger.kernel.org
 S:	Maintained
 F:	drivers/vfio/pci/mlx5/
 
+VFIO NVIDIA PCI DRIVER
+M:	Ankit Agrawal
+L:	kvm@vger.kernel.org
+S:	Maintained
+F:	drivers/vfio/pci/nvgpu/
+
 VGA_SWITCHEROO
 R:	Lukas Wunner
 S:	Maintained
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index f9d0c908e738..ade18b0ffb7b 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -59,4 +59,6 @@ source "drivers/vfio/pci/mlx5/Kconfig"
 
 source "drivers/vfio/pci/hisilicon/Kconfig"
 
+source "drivers/vfio/pci/nvgpu/Kconfig"
+
 endif
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 24c524224da5..0c93d452d0da 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -11,3 +11,5 @@ obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
 obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/
 
 obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
+
+obj-$(CONFIG_NVGPU_VFIO_PCI) += nvgpu/
diff --git a/drivers/vfio/pci/nvgpu/Kconfig b/drivers/vfio/pci/nvgpu/Kconfig
new file mode 100644
index 000000000000..066f764f7c5f
--- /dev/null
+++ b/drivers/vfio/pci/nvgpu/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config NVGPU_VFIO_PCI
+	tristate "VFIO support for the GPU in the NVIDIA Grace Hopper Superchip"
+	depends on ARM64 || (COMPILE_TEST && 64BIT)
+	select VFIO_PCI_CORE
+	help
+	  VFIO support for the GPU in the NVIDIA Grace Hopper Superchip is
+	  required to assign the GPU device to a VM using KVM/qemu/etc.
+
+	  If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/nvgpu/Makefile b/drivers/vfio/pci/nvgpu/Makefile
new file mode 100644
index 000000000000..00fd3a078218
--- /dev/null
+++ b/drivers/vfio/pci/nvgpu/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_NVGPU_VFIO_PCI) += nvgpu-vfio-pci.o
+nvgpu-vfio-pci-y := main.o
diff --git a/drivers/vfio/pci/nvgpu/main.c b/drivers/vfio/pci/nvgpu/main.c
new file mode 100644
index 000000000000..2dd8cc6e0145
--- /dev/null
+++ b/drivers/vfio/pci/nvgpu/main.c
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include
+#include
+
+#define DUMMY_PFN \
+	(((nvdev->mem_prop.hpa + nvdev->mem_prop.mem_length) >> PAGE_SHIFT) - 1)
+
+struct dev_mem_properties {
+	uint64_t hpa;
+	uint64_t mem_length;
+	int bar1_start_offset;
+};
+
+struct nvgpu_vfio_pci_core_device {
+	struct vfio_pci_core_device core_device;
+	struct dev_mem_properties mem_prop;
+};
+
+static int vfio_get_bar1_start_offset(struct vfio_pci_core_device *vdev)
+{
+	u8 val = 0;
+
+	pci_read_config_byte(vdev->pdev, PCI_BASE_ADDRESS_0, &val);
+	/*
+	 * The BAR1 index depends on the BAR0 size: a 64-bit BAR0 takes two
+	 * slots. Check if BAR0 is 64b and return the appropriate BAR1 index.
+	 */
+	if (val & PCI_BASE_ADDRESS_MEM_TYPE_64)
+		return VFIO_PCI_BAR2_REGION_INDEX;
+
+	return VFIO_PCI_BAR1_REGION_INDEX;
+}
+
+static int nvgpu_vfio_pci_open_device(struct vfio_device *core_vdev)
+{
+	struct nvgpu_vfio_pci_core_device *nvdev = container_of(
+		core_vdev, struct nvgpu_vfio_pci_core_device, core_device.vdev);
+	struct vfio_pci_core_device *vdev =
+		container_of(core_vdev, struct vfio_pci_core_device, vdev);
+	int ret;
+
+	ret = vfio_pci_core_enable(vdev);
+	if (ret)
+		return ret;
+
+	vfio_pci_core_finish_enable(vdev);
+
+	nvdev->mem_prop.bar1_start_offset = vfio_get_bar1_start_offset(vdev);
+
+	return ret;
+}
+
+static int nvgpu_vfio_pci_mmap(struct vfio_device *core_vdev,
+			       struct vm_area_struct *vma)
+{
+	struct nvgpu_vfio_pci_core_device *nvdev = container_of(
+		core_vdev, struct nvgpu_vfio_pci_core_device, core_device.vdev);
+
+	unsigned long start_pfn;
+	unsigned int index;
+	u64 req_len, pgoff;
+	int ret = 0;
+
+	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+	if (index != nvdev->mem_prop.bar1_start_offset)
+		return vfio_pci_core_mmap(core_vdev, vma);
+
+	/*
+	 * Request to mmap the BAR1. Map to the CPU accessible memory on the
+	 * GPU using the memory information gathered from the system ACPI
+	 * tables.
+	 */
+	start_pfn = nvdev->mem_prop.hpa >> PAGE_SHIFT;
+	req_len = vma->vm_end - vma->vm_start;
+	pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+	if (pgoff >= (nvdev->mem_prop.mem_length >> PAGE_SHIFT))
+		return -EINVAL;
+
+	/*
+	 * Perform a PFN map to the memory. The device BAR1 is backed by the
+	 * GPU memory now. Check that the mapping does not overflow out of
+	 * the GPU memory size.
+	 */
+	ret = remap_pfn_range(vma, vma->vm_start, start_pfn + pgoff,
+			      min(req_len, nvdev->mem_prop.mem_length - (pgoff << PAGE_SHIFT)),
+			      vma->vm_page_prot);
+	if (ret)
+		return ret;
+
+	vma->vm_pgoff = start_pfn + pgoff;
+
+	return 0;
+}
+
+static long nvgpu_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
+				 unsigned long arg)
+{
+	struct nvgpu_vfio_pci_core_device *nvdev = container_of(
+		core_vdev, struct nvgpu_vfio_pci_core_device, core_device.vdev);
+
+	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
+	struct vfio_region_info info;
+
+	switch (cmd) {
+	case VFIO_DEVICE_GET_REGION_INFO:
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		if (info.index == nvdev->mem_prop.bar1_start_offset) {
+			/*
+			 * Request to determine the BAR1 region information. Send the
+			 * GPU memory information.
+			 */
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = nvdev->mem_prop.mem_length;
+			info.flags = VFIO_REGION_INFO_FLAG_READ |
+				     VFIO_REGION_INFO_FLAG_WRITE |
+				     VFIO_REGION_INFO_FLAG_MMAP;
+			return copy_to_user((void __user *)arg, &info, minsz) ?
+ -EFAULT : 0; + } + + return vfio_pci_core_ioctl(core_vdev, cmd, arg); + + default: + return vfio_pci_core_ioctl(core_vdev, cmd, arg); + } +} + +static const struct vfio_device_ops nvgpu_vfio_pci_ops = { + .name = "nvgpu-vfio-pci", + .init = vfio_pci_core_init_dev, + .release = vfio_pci_core_release_dev, + .open_device = nvgpu_vfio_pci_open_device, + .close_device = vfio_pci_core_close_device, + .ioctl = nvgpu_vfio_pci_ioctl, + .read = vfio_pci_core_read, + .write = vfio_pci_core_write, + .mmap = nvgpu_vfio_pci_mmap, + .request = vfio_pci_core_request, + .match = vfio_pci_core_match, + .bind_iommufd = vfio_iommufd_physical_bind, + .unbind_iommufd = vfio_iommufd_physical_unbind, + .attach_ioas = vfio_iommufd_physical_attach_ioas, +}; + +static struct nvgpu_vfio_pci_core_device *nvgpu_drvdata(struct pci_dev *pdev) +{ + struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev); + + return container_of(core_device, struct nvgpu_vfio_pci_core_device, + core_device); +} + +static int +nvgpu_vfio_pci_fetch_memory_property(struct pci_dev *pdev, + struct nvgpu_vfio_pci_core_device *nvdev) +{ + int ret = 0; + + /* + * The memory information is present in the system ACPI tables as DSD + * properties nvidia,gpu-mem-base-pa and nvidia,gpu-mem-size. 
+	 */
+	ret = device_property_read_u64(&(pdev->dev), "nvidia,gpu-mem-base-pa",
+				       &(nvdev->mem_prop.hpa));
+	if (ret)
+		return ret;
+
+	ret = device_property_read_u64(&(pdev->dev), "nvidia,gpu-mem-size",
+				       &(nvdev->mem_prop.mem_length));
+	return ret;
+}
+
+static int nvgpu_vfio_pci_probe(struct pci_dev *pdev,
+				const struct pci_device_id *id)
+{
+	struct nvgpu_vfio_pci_core_device *nvdev;
+	int ret;
+
+	nvdev = vfio_alloc_device(nvgpu_vfio_pci_core_device, core_device.vdev,
+				  &pdev->dev, &nvgpu_vfio_pci_ops);
+	if (IS_ERR(nvdev))
+		return PTR_ERR(nvdev);
+
+	dev_set_drvdata(&pdev->dev, nvdev);
+
+	ret = nvgpu_vfio_pci_fetch_memory_property(pdev, nvdev);
+	if (ret)
+		goto out_put_vdev;
+
+	ret = vfio_pci_core_register_device(&nvdev->core_device);
+	if (ret)
+		goto out_put_vdev;
+
+	return ret;
+
+out_put_vdev:
+	vfio_put_device(&nvdev->core_device.vdev);
+	return ret;
+}
+
+static void nvgpu_vfio_pci_remove(struct pci_dev *pdev)
+{
+	struct nvgpu_vfio_pci_core_device *nvdev = nvgpu_drvdata(pdev);
+	struct vfio_pci_core_device *vdev = &nvdev->core_device;
+
+	vfio_pci_core_unregister_device(vdev);
+	vfio_put_device(&vdev->vdev);
+}
+
+static const struct pci_device_id nvgpu_vfio_pci_table[] = {
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2342) },
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2343) },
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2345) },
+	{}
+};
+
+MODULE_DEVICE_TABLE(pci, nvgpu_vfio_pci_table);
+
+static struct pci_driver nvgpu_vfio_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = nvgpu_vfio_pci_table,
+	.probe = nvgpu_vfio_pci_probe,
+	.remove = nvgpu_vfio_pci_remove,
+	.err_handler = &vfio_pci_core_err_handlers,
+	.driver_managed_dma = true,
+};
+
+module_pci_driver(nvgpu_vfio_pci_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Ankit Agrawal ");
+MODULE_AUTHOR("Aniket Agashe ");
+MODULE_DESCRIPTION(
+	"VFIO NVGPU PF - User Level driver for NVIDIA devices with CPU coherently accessible device memory");

From patchwork Wed Apr 5 18:01:31 2023
X-Patchwork-Submitter: Ankit Agrawal
X-Patchwork-Id: 13202343
Subject: [PATCH v3 3/6] mm: handle poisoning of pfn without struct pages
Date: Wed, 5 Apr 2023 11:01:31 -0700
Message-ID: <20230405180134.16932-4-ankita@nvidia.com>
In-Reply-To: <20230405180134.16932-1-ankita@nvidia.com>
References: <20230405180134.16932-1-ankita@nvidia.com>

From: Ankit Agrawal

The kernel MM does not currently handle ECC errors / poison on a memory
region that is not backed by struct pages. In this series, mapping
requests from QEMU to the device memory are served using
remap_pfn_range(). Hence, add a new mechanism to handle memory failure
on such memory.

Make the kernel MM expose a function that allows modules managing the
device memory to register a failure function and the address space
associated with the device memory. MM maintains this information in an
interval tree. MM uses the registered memory failure function to notify
the module of the poisoned PFN, so that the module may take any required
action; for example, the module may use the information to track the
poisoned pages.

In this implementation, kernel MM follows this sequence, which is
(mostly) similar to the memory_failure() handler for struct page backed
memory:
1. memory_failure() is triggered on reception of a poison error. The
   absence of a struct page is detected and, consequently,
   memory_failure_pfn() is executed.
2. memory_failure_pfn() calls the newly introduced failure handler
   exposed by the module managing the poisoned memory to notify it of
   the problematic PFN.
3. memory_failure_pfn() unmaps the stage-2 mapping to the PFN.
4. memory_failure_pfn() collects the processes mapped to the PFN.
5. memory_failure_pfn() sends SIGBUS (BUS_MCEERR_AO) to all the
   processes mapping the faulty PFN using kill_procs().
6. A later access to the faulty PFN by an operation in the VM is
   trapped, and user_mem_abort() is called.
7. user_mem_abort() calls __gfn_to_pfn_memslot() on the PFN, and the
   following execution path is followed: __gfn_to_pfn_memslot() ->
   hva_to_pfn() -> hva_to_pfn_remapped() -> fixup_user_fault() ->
   handle_mm_fault() -> handle_pte_fault() -> do_fault(). do_fault() is
   expected to return VM_FAULT_HWPOISON on the PFN (it currently does
   not; this is fixed by another patch in the series).
8. __gfn_to_pfn_memslot() then returns KVM_PFN_ERR_HWPOISON, which
   causes SIGBUS (BUS_MCEERR_AR) to be sent to the QEMU process through
   kvm_send_hwpoison_signal().
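The registration and notification flow above (the module registers a PFN range, and step 2 calls its failure handler) can be pictured with a small userspace model. This is a sketch under stated assumptions, not the kernel code: `struct pfn_range`, `register_range()`, `notify_pfn_failure()`, `range_bytes()` and `track_poison()` are hypothetical names, a linear scan stands in for the kernel's interval tree, and 4 KiB pages are assumed when sizing the byte range that register_pfn_address_space() passes to request_mem_region().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12). */
#define MODEL_PAGE_SHIFT 12
#define MAX_RANGES 8

/* Hypothetical stand-in for struct pfn_address_space: an inclusive
 * [start, last] PFN range plus the owning module's failure callback. */
struct pfn_range {
	unsigned long start;
	unsigned long last;
	void (*failure)(struct pfn_range *range, unsigned long pfn);
	unsigned long poisoned_count; /* module-side poison tracking */
	unsigned long last_poisoned;
};

static struct pfn_range *registry[MAX_RANGES];
static size_t nr_ranges;

/* Byte length of the range, computed like the request_mem_region() size. */
static unsigned long long range_bytes(const struct pfn_range *r)
{
	return (unsigned long long)(r->last - r->start + 1) << MODEL_PAGE_SHIFT;
}

/* Register a range; an overlapping range is rejected with -EBUSY,
 * loosely mirroring a failing request_mem_region(). */
static int register_range(struct pfn_range *r)
{
	assert(r->start <= r->last);
	if (nr_ranges == MAX_RANGES)
		return -ENOSPC;
	for (size_t i = 0; i < nr_ranges; i++)
		if (r->start <= registry[i]->last && registry[i]->start <= r->last)
			return -EBUSY;
	registry[nr_ranges++] = r;
	return 0;
}

/* Linear scan standing in for interval_tree_iter_first()/_next(): call
 * the failure callback of every range that contains @pfn. Returns
 * -EBUSY when no registered range claims the PFN. */
static int notify_pfn_failure(unsigned long pfn)
{
	int rc = -EBUSY;

	for (size_t i = 0; i < nr_ranges; i++) {
		struct pfn_range *r = registry[i];

		if (pfn >= r->start && pfn <= r->last) {
			r->failure(r, pfn);
			rc = 0;
		}
	}
	return rc;
}

/* Example module callback: record the poisoned PFN. */
static void track_poison(struct pfn_range *r, unsigned long pfn)
{
	r->poisoned_count++;
	r->last_poisoned = pfn;
}
```

An interval tree keeps lookups logarithmic as more regions are registered; the linear scan here is only reasonable for a handful of ranges.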
Signed-off-by: Ankit Agrawal
---
 include/linux/memory-failure.h |  22 +++++
 include/linux/mm.h             |   1 +
 include/ras/ras_event.h        |   1 +
 mm/memory-failure.c            | 148 +++++++++++++++++++++++++++++----
 4 files changed, 154 insertions(+), 18 deletions(-)
 create mode 100644 include/linux/memory-failure.h

diff --git a/include/linux/memory-failure.h b/include/linux/memory-failure.h
new file mode 100644
index 000000000000..9a579960972a
--- /dev/null
+++ b/include/linux/memory-failure.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MEMORY_FAILURE_H
+#define _LINUX_MEMORY_FAILURE_H
+
+#include
+
+struct pfn_address_space;
+
+struct pfn_address_space_ops {
+	void (*failure)(struct pfn_address_space *pfn_space, unsigned long pfn);
+};
+
+struct pfn_address_space {
+	struct interval_tree_node node;
+	const struct pfn_address_space_ops *ops;
+	struct address_space *mapping;
+};
+
+int register_pfn_address_space(struct pfn_address_space *pfn_space);
+void unregister_pfn_address_space(struct pfn_address_space *pfn_space);
+
+#endif /* _LINUX_MEMORY_FAILURE_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f79667824eb..e3ef52d3d45a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3530,6 +3530,7 @@ enum mf_action_page_type {
 	MF_MSG_BUDDY,
 	MF_MSG_DAX,
 	MF_MSG_UNSPLIT_THP,
+	MF_MSG_PFN,
 	MF_MSG_UNKNOWN,
 };
 
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index cbd3ddd7c33d..5c62a4d17172 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -373,6 +373,7 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_BUDDY, "free buddy page" )		\
 	EM ( MF_MSG_DAX, "dax page" )			\
 	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )	\
+	EM ( MF_MSG_PFN, "non struct page pfn" )	\
 	EMe ( MF_MSG_UNKNOWN, "unknown page" )
 
 /*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fae9baf3be16..2c1a2ec42f7b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -38,6 +38,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -62,6 +63,7 @@
 #include
 #include
 #include
+#include
 #include "swap.h"
 #include "internal.h"
 #include "ras/ras_event.h"
@@ -122,6 +124,10 @@ const struct attribute_group memory_failure_attr_group = {
 	.attrs = memory_failure_attr,
 };
 
+static struct rb_root_cached pfn_space_itree = RB_ROOT_CACHED;
+
+static DEFINE_MUTEX(pfn_space_lock);
+
 /*
  * Return values:
  * 1:   the page is dissolved (if needed) and taken off from buddy,
@@ -399,15 +405,14 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
  * Schedule a process for later kill.
  * Uses GFP_ATOMIC allocations to avoid potential recursions in the VM.
  *
- * Note: @fsdax_pgoff is used only when @p is a fsdax page and a
- * filesystem with a memory failure handler has claimed the
- * memory_failure event. In all other cases, page->index and
- * page->mapping are sufficient for mapping the page back to its
+ * Note: @pgoff is used when @p is a fsdax page and a filesystem with a
+ * memory failure handler has claimed the memory_failure event, or when
+ * the PFN is not backed by a struct page. In all other cases, page->index
+ * and page->mapping are sufficient for mapping the page back to its
  * corresponding user virtual address.
  */
-static void add_to_kill(struct task_struct *tsk, struct page *p,
-			pgoff_t fsdax_pgoff, struct vm_area_struct *vma,
-			struct list_head *to_kill)
+static void add_to_kill(struct task_struct *tsk, struct page *p, pgoff_t pgoff,
+			struct vm_area_struct *vma, struct list_head *to_kill)
 {
 	struct to_kill *tk;
 
@@ -417,13 +422,20 @@ static void add_to_kill(struct task_struct *tsk, struct page *p,
 		return;
 	}
 
-	tk->addr = page_address_in_vma(p, vma);
-	if (is_zone_device_page(p)) {
-		if (fsdax_pgoff != FSDAX_INVALID_PGOFF)
-			tk->addr = vma_pgoff_address(fsdax_pgoff, 1, vma);
+	if (vma->vm_flags & VM_PFNMAP) {
+		tk->addr =
+			vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+		tk->size_shift = PAGE_SHIFT;
+	} else if (is_zone_device_page(p)) {
+		if (pgoff != FSDAX_INVALID_PGOFF)
+			tk->addr = vma_pgoff_address(pgoff, 1, vma);
+		else
+			tk->addr = page_address_in_vma(p, vma);
 		tk->size_shift = dev_pagemap_mapping_shift(vma, tk->addr);
-	} else
+	} else {
+		tk->addr = page_address_in_vma(p, vma);
 		tk->size_shift = page_shift(compound_head(p));
+	}
 
 	/*
 	 * Send SIGKILL if "tk->addr == -EFAULT". Also, as
@@ -617,13 +629,12 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 	i_mmap_unlock_read(mapping);
 }
 
-#ifdef CONFIG_FS_DAX
 /*
  * Collect processes when the error hit a fsdax page.
  */
-static void collect_procs_fsdax(struct page *page,
-		struct address_space *mapping, pgoff_t pgoff,
-		struct list_head *to_kill)
+static void collect_procs_pgoff(struct page *page,
+				struct address_space *mapping, pgoff_t pgoff,
+				struct list_head *to_kill)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -643,7 +654,6 @@ static void collect_procs_fsdax(struct page *page,
 	read_unlock(&tasklist_lock);
 	i_mmap_unlock_read(mapping);
 }
-#endif /* CONFIG_FS_DAX */
 
 /*
  * Collect the processes who have the corrupted page mapped to kill.
@@ -835,6 +845,7 @@ static const char * const action_page_types[] = {
 	[MF_MSG_BUDDY]			= "free buddy page",
 	[MF_MSG_DAX]			= "dax page",
 	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
+	[MF_MSG_PFN]			= "non struct page pfn",
 	[MF_MSG_UNKNOWN]		= "unknown page",
 };
 
@@ -1745,7 +1756,7 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 
 		SetPageHWPoison(page);
 
-		collect_procs_fsdax(page, mapping, index, &to_kill);
+		collect_procs_pgoff(page, mapping, index, &to_kill);
 		unmap_and_kill(&to_kill, page_to_pfn(page), mapping,
 				index, mf_flags);
 unlock:
@@ -2052,6 +2063,99 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
+/**
+ * register_pfn_address_space - Register PA region for poison notification.
+ * @pfn_space: structure containing region range and callback function on
+ *             poison detection.
+ *
+ * This function is called by a kernel module to register a PA region and
+ * a callback function with the kernel. On detection of poison, the
+ * kernel code will go through all registered regions and call the
+ * appropriate callback function associated with the range. The kernel
+ * module is responsible for tracking the poisoned pages.
+ *
+ * Return: 0 if successfully registered,
+ *         -EBUSY if the region is already registered
+ */
+int register_pfn_address_space(struct pfn_address_space *pfn_space)
+{
+	if (!request_mem_region(pfn_space->node.start << PAGE_SHIFT,
+		(pfn_space->node.last - pfn_space->node.start + 1) << PAGE_SHIFT, ""))
+		return -EBUSY;
+
+	mutex_lock(&pfn_space_lock);
+	interval_tree_insert(&pfn_space->node, &pfn_space_itree);
+	mutex_unlock(&pfn_space_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_pfn_address_space);
+
+/**
+ * unregister_pfn_address_space - Unregister a PA region from poison
+ * notification.
+ * @pfn_space: structure containing region range to be unregistered.
+ *
+ * This function is called by a kernel module to remove the PA region
+ * from kernel poison tracking.
+ */
+void unregister_pfn_address_space(struct pfn_address_space *pfn_space)
+{
+	mutex_lock(&pfn_space_lock);
+	interval_tree_remove(&pfn_space->node, &pfn_space_itree);
+	mutex_unlock(&pfn_space_lock);
+	release_mem_region(pfn_space->node.start << PAGE_SHIFT,
+		(pfn_space->node.last - pfn_space->node.start + 1) << PAGE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(unregister_pfn_address_space);
+
+static int memory_failure_pfn(unsigned long pfn, int flags)
+{
+	struct interval_tree_node *node;
+	int rc = -EBUSY;
+	LIST_HEAD(tokill);
+
+	mutex_lock(&pfn_space_lock);
+	/*
+	 * Modules register with MM the address space that maps the device
+	 * memory they manage. Iterate to identify exactly which address
+	 * spaces have mapped this failing PFN.
+	 */
+	for (node = interval_tree_iter_first(&pfn_space_itree, pfn, pfn); node;
+	     node = interval_tree_iter_next(node, pfn, pfn)) {
+		struct pfn_address_space *pfn_space =
+			container_of(node, struct pfn_address_space, node);
+		rc = 0;
+
+		/*
+		 * Modules managing the device memory need to be notified of
+		 * the memory failure so that the poisoned PFN can be tracked.
+		 */
+		pfn_space->ops->failure(pfn_space, pfn);
+
+		collect_procs_pgoff(NULL, pfn_space->mapping, pfn, &tokill);
+
+		unmap_mapping_range(pfn_space->mapping, pfn << PAGE_SHIFT,
+				    PAGE_SIZE, 0);
+	}
+	mutex_unlock(&pfn_space_lock);
+
+	/*
+	 * Unlike System-RAM, there is no possibility to swap in a different
+	 * physical page at a given virtual address, so all userspace
+	 * consumption of direct PFN memory necessitates SIGBUS (i.e.
+	 * MF_MUST_KILL).
+	 */
+	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
+	kill_procs(&tokill, true, false, pfn, flags);
+
+	pr_err("%#lx: recovery action for %s: %s\n",
+		pfn, action_page_types[MF_MSG_PFN],
+		action_name[rc ? MF_FAILED : MF_RECOVERED]);
+
+	return rc;
+}
+
 static DEFINE_MUTEX(mf_mutex);
 
 /**
@@ -2093,6 +2197,11 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!(flags & MF_SW_SIMULATED))
 		hw_memory_failure = true;
 
+	if (!pfn_valid(pfn) && !arch_is_platform_page(PFN_PHYS(pfn))) {
+		res = memory_failure_pfn(pfn, flags);
+		goto unlock_mutex;
+	}
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
 		res = arch_memory_failure(pfn, flags);
@@ -2106,6 +2215,9 @@ int memory_failure(unsigned long pfn, int flags)
 								 pgmap);
 				goto unlock_mutex;
 			}
+
+			res = memory_failure_pfn(pfn, flags);
+			goto unlock_mutex;
 		}
 		pr_err("%#lx: memory outside kernel control\n", pfn);
 		res = -ENXIO;

From patchwork Wed Apr 5 18:01:32 2023
X-Patchwork-Submitter: Ankit Agrawal
X-Patchwork-Id: 13202338
Subject: [PATCH v3 4/6] mm: Add poison error check in fixup_user_fault() for mapped PFN
Date: Wed, 5 Apr 2023 11:01:32 -0700
Message-ID: <20230405180134.16932-5-ankita@nvidia.com>
In-Reply-To: <20230405180134.16932-1-ankita@nvidia.com>
References: <20230405180134.16932-1-ankita@nvidia.com>

From: Ankit Agrawal

fixup_user_fault() currently does not expect VM_FAULT_HWPOISON and
hence does not ask for it when calling vm_fault_to_errno(). Since this
series adds a new code path that can trigger such a fault, change
fixup_user_fault() to pass FOLL_HWPOISON so that VM_FAULT_HWPOISON is
reported as -EHWPOISON.
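The errno translation this change relies on can be modelled in userspace as below. It is a hedged sketch of how vm_fault_to_errno() treats its FOLL_HWPOISON argument, assuming the helper's precedence of OOM first, then hardware poison, then SIGBUS/SIGSEGV (VM_FAULT_HWPOISON_LARGE is left out). The MODEL_* bit values and the function name are hypothetical placeholders, not the kernel's encodings.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit values for this model only; they are NOT the kernel's
 * real VM_FAULT_* / FOLL_* encodings. */
#define MODEL_VM_FAULT_OOM      0x0001u
#define MODEL_VM_FAULT_SIGBUS   0x0002u
#define MODEL_VM_FAULT_HWPOISON 0x0010u
#define MODEL_VM_FAULT_SIGSEGV  0x0040u
#define MODEL_FOLL_HWPOISON     0x0100u

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux's value, in case the C library lacks it */
#endif

/* A hardware-poison fault becomes -EHWPOISON only when the caller asks
 * for it with FOLL_HWPOISON; otherwise it degrades to -EFAULT. */
static int model_vm_fault_to_errno(unsigned int vm_fault, unsigned int foll_flags)
{
	if (vm_fault & MODEL_VM_FAULT_OOM)
		return -ENOMEM;
	if (vm_fault & MODEL_VM_FAULT_HWPOISON)
		return (foll_flags & MODEL_FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
	if (vm_fault & (MODEL_VM_FAULT_SIGBUS | MODEL_VM_FAULT_SIGSEGV))
		return -EFAULT;
	return 0;
}
```

With foll_flags of 0, as before this patch, a poison fault collapses into -EFAULT and hva_to_pfn() can only report KVM_PFN_ERR_FAULT; passing FOLL_HWPOISON preserves -EHWPOISON so the poison can be reported as such.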
Also make hva_to_pfn() check for -EHWPOISON returned by
hva_to_pfn_remapped() and communicate the poison fault up to
user_mem_abort().

Signed-off-by: Ankit Agrawal
---
 mm/gup.c            | 2 +-
 virt/kvm/kvm_main.c | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index eab18ba045db..507a96e91bad 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1290,7 +1290,7 @@ int fixup_user_fault(struct mm_struct *mm,
 	}
 
 	if (ret & VM_FAULT_ERROR) {
-		int err = vm_fault_to_errno(ret, 0);
+		int err = vm_fault_to_errno(ret, FOLL_HWPOISON);
 
 		if (err)
 			return err;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d255964ec331..09b6973e679d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2688,6 +2688,12 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
 		r = hva_to_pfn_remapped(vma, addr, write_fault, writable, &pfn);
 		if (r == -EAGAIN)
 			goto retry;
+
+		if (r == -EHWPOISON) {
+			pfn = KVM_PFN_ERR_HWPOISON;
+			goto exit;
+		}
+
 		if (r < 0)
 			pfn = KVM_PFN_ERR_FAULT;
 	} else {

From patchwork Wed Apr 5 18:01:33 2023
X-Patchwork-Id: 13202340
Subject: [PATCH v3 5/6] mm: Change ghes code to allow poison of non-struct PFN
Date: Wed, 5 Apr 2023 11:01:33 -0700
Message-ID: <20230405180134.16932-6-ankita@nvidia.com>
In-Reply-To: <20230405180134.16932-1-ankita@nvidia.com>
From: Ankit Agrawal

The GHES code calls memory_failure() only on PFNs that pass the
pfn_valid() check. Remapped PFNs that are not backed by struct page fail
this check, so ghes_do_memory_failure() returns without triggering
memory_failure(). Update the code to allow the memory_failure() call on
PFNs that fail pfn_valid().

Signed-off-by: Ankit Agrawal
---
 drivers/acpi/apei/ghes.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 34ad071a64e9..2ab7fec8127d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -459,20 +459,10 @@ static void ghes_kick_task_work(struct callback_head *head)
 
 static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 {
-	unsigned long pfn;
-
 	if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
 		return false;
 
-	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
-		pr_warn_ratelimited(FW_WARN GHES_PFX
-		"Invalid address in generic error data: %#llx\n",
-		physical_addr);
-		return false;
-	}
-
-	memory_failure_queue(pfn, flags);
+	memory_failure_queue(PHYS_PFN(physical_addr), flags);
 	return true;
 }
 
From patchwork Wed Apr 5 18:01:34 2023
X-Patchwork-Id: 13202344
Subject: [PATCH v3 6/6] vfio/nvgpu: register device memory for poison handling
Date: Wed, 5 Apr 2023 11:01:34 -0700
Message-ID: <20230405180134.16932-7-ankita@nvidia.com>
In-Reply-To: <20230405180134.16932-1-ankita@nvidia.com>
From: Ankit Agrawal

The nvgpu-vfio-pci module maps the QEMU VMA to device memory through
remap_pfn_range(). This patch uses the new mechanism for handling poison
on memory that is not backed by struct page.

nvgpu-vfio-pci defines a function, nvgpu_vfio_pci_pfn_memory_failure(),
through which the kernel MM reports the PFN hit by an ECC error. The
function is registered with the kernel MM, together with the address
space and PFN range, through register_pfn_address_space().

Poisoned PFNs are tracked in the nvgpu-vfio-pci module in a bitmap with
one bit per PFN. The kernel MM communicates the PFN to the module through
the failure function, which sets the corresponding bit in the bitmap.

Register VMA fault ops for the module. The fault handler returns
VM_FAULT_HWPOISON if the bit for the faulting PFN is set in the bitmap.

Clear the bitmap on reset to reflect the clean state of the device memory
after reset.
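The bookkeeping described above can be modelled in userspace to make the indexing explicit. This is a hypothetical sketch, not code from the driver: poison_tracker and its helpers are invented names, calloc()/free() stand in for vzalloc()/vfree(), and plain word arithmetic stands in for __set_bit()/test_bit(). As in the patch, bit i corresponds to PFN start_pfn + i; the failure path sets a bit, the fault path tests it, and reset clears the whole map.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Bits per bitmap word, analogous to the kernel's BITS_PER_LONG. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace model of the per-device poison bitmap. */
struct poison_tracker {
	unsigned long start_pfn;  /* first PFN of the registered range */
	unsigned long nr_pages;   /* number of PFNs in the range */
	unsigned long *bitmap;    /* one bit per PFN; 1 means poisoned */
};

/* Number of unsigned long words needed to hold nr_pages bits. */
static size_t tracker_words(unsigned long nr_pages)
{
	return (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

static int poison_tracker_init(struct poison_tracker *t,
			       unsigned long start_pfn, unsigned long nr_pages)
{
	t->start_pfn = start_pfn;
	t->nr_pages = nr_pages;
	/* calloc() returns zeroed memory, as vzalloc() does in the driver. */
	t->bitmap = calloc(tracker_words(nr_pages), sizeof(unsigned long));
	return t->bitmap ? 0 : -1;
}

/* Counterpart of the .failure callback: record a poisoned PFN. */
static int poison_tracker_mark(struct poison_tracker *t, unsigned long pfn)
{
	unsigned long idx;

	if (pfn < t->start_pfn || pfn - t->start_pfn >= t->nr_pages)
		return -1; /* PFN outside the registered range */
	idx = pfn - t->start_pfn;
	t->bitmap[idx / BITS_PER_WORD] |= 1UL << (idx % BITS_PER_WORD);
	return 0;
}

/* Counterpart of the fault-handler check, keyed by page offset in the range. */
static bool poison_tracker_is_poisoned(const struct poison_tracker *t,
				       unsigned long offset)
{
	assert(t->bitmap != NULL);
	if (offset >= t->nr_pages)
		return false;
	return (t->bitmap[offset / BITS_PER_WORD] >> (offset % BITS_PER_WORD)) & 1UL;
}

/* Counterpart of the VFIO_DEVICE_RESET path: forget all recorded poison. */
static void poison_tracker_reset(struct poison_tracker *t)
{
	memset(t->bitmap, 0, tracker_words(t->nr_pages) * sizeof(unsigned long));
}

static void poison_tracker_destroy(struct poison_tracker *t)
{
	free(t->bitmap);
	t->bitmap = NULL;
}
```

A caller would mark PFNs as the failure callback fires, consult poison_tracker_is_poisoned() on each fault, and call poison_tracker_reset() after a device reset. Sizing the map in whole unsigned long words keeps the word-granular accesses in bounds for any page count.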
Signed-off-by: Ankit Agrawal
---
 drivers/vfio/pci/nvgpu/main.c | 116 ++++++++++++++++++++++++++++++++--
 1 file changed, 110 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/nvgpu/main.c b/drivers/vfio/pci/nvgpu/main.c
index 2dd8cc6e0145..8ccd3fe33a0f 100644
--- a/drivers/vfio/pci/nvgpu/main.c
+++ b/drivers/vfio/pci/nvgpu/main.c
@@ -5,6 +5,8 @@
 
 #include 
 #include 
+#include 
+#include 
 
 #define DUMMY_PFN \
 	(((nvdev->mem_prop.hpa + nvdev->mem_prop.mem_length) >> PAGE_SHIFT) - 1)
@@ -12,12 +14,78 @@
 struct dev_mem_properties {
 	uint64_t hpa;
 	uint64_t mem_length;
+	unsigned long *pfn_bitmap;
 	int bar1_start_offset;
 };
 
 struct nvgpu_vfio_pci_core_device {
 	struct vfio_pci_core_device core_device;
 	struct dev_mem_properties mem_prop;
+	struct pfn_address_space pfn_address_space;
+};
+
+void nvgpu_vfio_pci_pfn_memory_failure(struct pfn_address_space *pfn_space,
+		unsigned long pfn)
+{
+	struct nvgpu_vfio_pci_core_device *nvdev = container_of(
+		pfn_space, struct nvgpu_vfio_pci_core_device, pfn_address_space);
+
+	/*
+	 * MM has called to notify a poisoned page. Track that in the bitmap.
+	 */
+	__set_bit(pfn - (pfn_space->node.start), nvdev->mem_prop.pfn_bitmap);
+}
+
+struct pfn_address_space_ops nvgpu_vfio_pci_pas_ops = {
+	.failure = nvgpu_vfio_pci_pfn_memory_failure,
+};
+
+static int
+nvgpu_vfio_pci_register_pfn_range(struct nvgpu_vfio_pci_core_device *nvdev,
+				  struct vm_area_struct *vma)
+{
+	unsigned long nr_pages;
+	int ret = 0;
+
+	nr_pages = nvdev->mem_prop.mem_length >> PAGE_SHIFT;
+
+	nvdev->pfn_address_space.node.start = vma->vm_pgoff;
+	nvdev->pfn_address_space.node.last = vma->vm_pgoff + nr_pages - 1;
+	nvdev->pfn_address_space.ops = &nvgpu_vfio_pci_pas_ops;
+	nvdev->pfn_address_space.mapping = vma->vm_file->f_mapping;
+
+	ret = register_pfn_address_space(&(nvdev->pfn_address_space));
+
+	return ret;
+}
+
+static vm_fault_t nvgpu_vfio_pci_fault(struct vm_fault *vmf)
+{
+	unsigned long mem_offset = vmf->pgoff - vmf->vma->vm_pgoff;
+	struct nvgpu_vfio_pci_core_device *nvdev = container_of(
+		vmf->vma->vm_file->private_data,
+		struct nvgpu_vfio_pci_core_device, core_device.vdev);
+	int ret;
+
+	/*
+	 * Check if the page is poisoned.
+	 */
+	if (mem_offset < (nvdev->mem_prop.mem_length >> PAGE_SHIFT) &&
+	    test_bit(mem_offset, nvdev->mem_prop.pfn_bitmap))
+		return VM_FAULT_HWPOISON;
+
+	ret = remap_pfn_range(vmf->vma,
+			vmf->vma->vm_start + (mem_offset << PAGE_SHIFT),
+			DUMMY_PFN, PAGE_SIZE,
+			vmf->vma->vm_page_prot);
+	if (ret)
+		return vmf_error(ret);
+
+	return VM_FAULT_NOPAGE;
+}
+
+static const struct vm_operations_struct nvgpu_vfio_pci_mmap_ops = {
+	.fault = nvgpu_vfio_pci_fault,
 };
 
 static int vfio_get_bar1_start_offset(struct vfio_pci_core_device *vdev)
@@ -26,8 +94,9 @@ static int vfio_get_bar1_start_offset(struct vfio_pci_core_device *vdev)
 	pci_read_config_byte(vdev->pdev, 0x10, &val);
 
 	/*
-	 * The BAR1 start offset in the PCI config space depends on the BAR0size.
-	 * Check if the BAR0 is 64b and return the approproiate BAR1 offset.
+	 * The BAR1 start offset in the PCI config space depends on the BAR0
+	 * size. Check if the BAR0 is 64b and return the appropriate BAR1
+	 * offset.
 	 */
 	if (val & PCI_BASE_ADDRESS_MEM_TYPE_64)
 		return VFIO_PCI_BAR2_REGION_INDEX;
@@ -54,6 +123,16 @@ static int nvgpu_vfio_pci_open_device(struct vfio_device *core_vdev)
 	return ret;
 }
 
+void nvgpu_vfio_pci_close_device(struct vfio_device *core_vdev)
+{
+	struct nvgpu_vfio_pci_core_device *nvdev = container_of(
+		core_vdev, struct nvgpu_vfio_pci_core_device, core_device.vdev);
+
+	unregister_pfn_address_space(&(nvdev->pfn_address_space));
+
+	vfio_pci_core_close_device(core_vdev);
+}
+
 int nvgpu_vfio_pci_mmap(struct vfio_device *core_vdev,
 		struct vm_area_struct *vma)
 {
@@ -93,8 +172,11 @@ int nvgpu_vfio_pci_mmap(struct vfio_device *core_vdev,
 		return ret;
 
 	vma->vm_pgoff = start_pfn + pgoff;
+	vma->vm_ops = &nvgpu_vfio_pci_mmap_ops;
 
-	return 0;
+	ret = nvgpu_vfio_pci_register_pfn_range(nvdev, vma);
+
+	return ret;
 }
 
 long nvgpu_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
@@ -140,7 +222,14 @@ long nvgpu_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 		}
 
 		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
-
+	case VFIO_DEVICE_RESET:
+		/*
+		 * Resetting the GPU clears up the poisoned page. Reset the
+		 * poisoned page bitmap.
+		 */
+		memset(nvdev->mem_prop.pfn_bitmap, 0,
+		       nvdev->mem_prop.mem_length >> (PAGE_SHIFT + 3));
+		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
 	default:
 		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
 	}
@@ -151,7 +240,7 @@ static const struct vfio_device_ops nvgpu_vfio_pci_ops = {
 	.init = vfio_pci_core_init_dev,
 	.release = vfio_pci_core_release_dev,
 	.open_device = nvgpu_vfio_pci_open_device,
-	.close_device = vfio_pci_core_close_device,
+	.close_device = nvgpu_vfio_pci_close_device,
 	.ioctl = nvgpu_vfio_pci_ioctl,
 	.read = vfio_pci_core_read,
 	.write = vfio_pci_core_write,
@@ -188,7 +277,20 @@ nvgpu_vfio_pci_fetch_memory_property(struct pci_dev *pdev,
 
 	ret = device_property_read_u64(&(pdev->dev), "nvidia,gpu-mem-size",
 			&(nvdev->mem_prop.mem_length));
-	return ret;
+	if (ret)
+		return ret;
+
+	/*
+	 * A bitmap is maintained to track the pages that are poisoned. Each
+	 * page is represented by a bit. Allocation size in bytes is
+	 * determined by shifting the device memory size by PAGE_SHIFT to
+	 * determine the number of pages, and further shifted by 3 as each
+	 * byte tracks 8 pages.
+	 */
+	nvdev->mem_prop.pfn_bitmap
+		= vzalloc(nvdev->mem_prop.mem_length >> (PAGE_SHIFT + 3));
+
+	return nvdev->mem_prop.pfn_bitmap ? 0 : -ENOMEM;
 }
 
 static int nvgpu_vfio_pci_probe(struct pci_dev *pdev,
@@ -224,6 +326,8 @@ static void nvgpu_vfio_pci_remove(struct pci_dev *pdev)
 	struct nvgpu_vfio_pci_core_device *nvdev = nvgpu_drvdata(pdev);
 	struct vfio_pci_core_device *vdev = &nvdev->core_device;
 
+	vfree(nvdev->mem_prop.pfn_bitmap);
+
 	vfio_pci_core_unregister_device(vdev);
 	vfio_put_device(&vdev->vdev);
 }
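The bitmap sizing in nvgpu_vfio_pci_fetch_memory_property() can be checked with a little arithmetic. mem_length >> (PAGE_SHIFT + 3) is the page count divided by 8, rounded down, so it is exact only when the page count is a multiple of 8, while the kernel's bitmap helpers access the map in unsigned long words. A common kernel idiom for this is BITS_TO_LONGS(nr_pages) * sizeof(unsigned long). The sketch below compares the two computations in userspace; the function names are hypothetical, and MODEL_PAGE_SHIFT assumes 4 KiB pages as a stand-in, since arm64 kernels may also be built with 16 KiB or 64 KiB pages.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed 4 KiB pages for illustration only. */
#define MODEL_PAGE_SHIFT 12

/* Size as computed in the patch: number of pages / 8, rounded down. */
static size_t bitmap_bytes_patch(unsigned long long mem_length)
{
	return (size_t)(mem_length >> (MODEL_PAGE_SHIFT + 3));
}

/* Size rounded up to whole unsigned longs, like BITS_TO_LONGS() * sizeof(long). */
static size_t bitmap_bytes_rounded(unsigned long long mem_length)
{
	const unsigned long long bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	unsigned long long nr_pages = mem_length >> MODEL_PAGE_SHIFT;
	unsigned long long longs = (nr_pages + bits_per_long - 1) / bits_per_long;

	return (size_t)(longs * sizeof(unsigned long));
}
```

For a 16 GiB region both computations give 524288 bytes. For a hypothetical 9-page region the shift-based size is 1 byte, one bit short, while the rounded form yields one full unsigned long. vzalloc() allocates whole pages, which hides the difference for realistic sizes, but the rounded-up form does not rely on that.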