Message ID | 20250224093938.3934386-1-aneesh.kumar@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | KVM: arm64: Drop mte_allowed check during memslot creation | expand |
On 24/02/2025 09:39, Aneesh Kumar K.V (Arm) wrote: > Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in > memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only > rejected a memory slot if VM_SHARED was set. This commit unified the > checking with user_mem_abort(), with slots being rejected if either > VM_MTE_ALLOWED is not set or VM_SHARED set. A subsequent commit > c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE > enabled") dropped the VM_SHARED check, so we ended up with memory slots > being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before > the commit d89585fbb308. The rejection of the memory slot with VM_SHARED > set was done to avoid a race condition with the test/set of the > PG_mte_tagged flag. Before Commit d77e59a8fccd ("arm64: mte: Lock a page > for MTE tag initialization") the kernel avoided allowing MTE with shared > pages, thereby preventing two tasks sharing a page from setting up the > PG_mte_tagged flag racily. > > Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag > initialization") further updated the locking so that the kernel > allows VM_SHARED mapping with MTE. With this commit, we can enable > memslot creation with VM_SHARED VMA mapping. > > This patch results in a minor tweak to the ABI. We now allow creating > memslots that don't have the VM_MTE_ALLOWED flag set. If the guest uses > such a memslot with Allocation Tags, the kernel will generate -EFAULT. > ie, instead of failing early, we now fail later during KVM_RUN. > > This change is needed because, without it, users are not able to use MTE > with VFIO passthrough (currently the mapping is either Device or > NonCacheable for which tag access check is not applied.), as shown > below (kvmtool VMM). > > [ 617.921030] vfio-pci 0000:01:00.0: resetting > [ 618.024719] vfio-pci 0000:01:00.0: reset done > Error: 0000:01:00.0: failed to register region with KVM > Warning: [0abc:aced] Error activating emulation for BAR 0 > Error: 0000:01:00.0: failed to configure regions > Warning: Failed init: vfio__init > > Fatal: Initialisation failed > > Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
On Mon, Feb 24, 2025 at 03:09:38PM +0530, Aneesh Kumar K.V (Arm) wrote: > Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in > memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only > rejected a memory slot if VM_SHARED was set. This commit unified the > checking with user_mem_abort(), with slots being rejected if either > VM_MTE_ALLOWED is not set or VM_SHARED set. A subsequent commit > c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE > enabled") dropped the VM_SHARED check, so we ended up with memory slots > being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before > the commit d89585fbb308. The rejection of the memory slot with VM_SHARED > set was done to avoid a race condition with the test/set of the > PG_mte_tagged flag. Before Commit d77e59a8fccd ("arm64: mte: Lock a page > for MTE tag initialization") the kernel avoided allowing MTE with shared > pages, thereby preventing two tasks sharing a page from setting up the > PG_mte_tagged flag racily. > > Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag > initialization") further updated the locking so that the kernel > allows VM_SHARED mapping with MTE. With this commit, we can enable > memslot creation with VM_SHARED VMA mapping. > > This patch results in a minor tweak to the ABI. We now allow creating > memslots that don't have the VM_MTE_ALLOWED flag set. As I commented here: https://lore.kernel.org/r/Z4e04P1bQlFBDHo7@arm.com I'm fine with the change, we basically go back to the original ABI prior to relaxing this for VM_SHARED. > If the guest uses > such a memslot with Allocation Tags, the kernel will generate -EFAULT. > ie, instead of failing early, we now fail later during KVM_RUN. Nit: more like the kernel "will return -EFAULT" to the VMM rather than "generate". > This change is needed because, without it, users are not able to use MTE > with VFIO passthrough (currently the mapping is either Device or > NonCacheable for which tag access check is not applied.), as shown > below (kvmtool VMM). Another nit: "users are not able to user VFIO passthrough when MTE is enabled". At a first read, the above sounded to me like one wants to enable MTE for VFIO passthrough mappings. > [ 617.921030] vfio-pci 0000:01:00.0: resetting > [ 618.024719] vfio-pci 0000:01:00.0: reset done > Error: 0000:01:00.0: failed to register region with KVM > Warning: [0abc:aced] Error activating emulation for BAR 0 > Error: 0000:01:00.0: failed to configure regions > Warning: Failed init: vfio__init > > Fatal: Initialisation failed > > Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> > --- > arch/arm64/kvm/mmu.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index ef53af6df6de..1f1b5aa43d2d 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -2178,11 +2178,6 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, > if (!vma) > break; > > - if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) { > - ret = -EINVAL; > - break; > - } > - > if (vma->vm_flags & VM_PFNMAP) { > /* IO region dirty page logging not allowed */ > if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) { Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
On Mon, 24 Feb 2025 11:05:33 +0000, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Mon, Feb 24, 2025 at 03:09:38PM +0530, Aneesh Kumar K.V (Arm) wrote: > > Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in > > memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only > > rejected a memory slot if VM_SHARED was set. This commit unified the > > checking with user_mem_abort(), with slots being rejected if either > > VM_MTE_ALLOWED is not set or VM_SHARED set. A subsequent commit > > c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE > > enabled") dropped the VM_SHARED check, so we ended up with memory slots > > being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before > > the commit d89585fbb308. The rejection of the memory slot with VM_SHARED > > set was done to avoid a race condition with the test/set of the > > PG_mte_tagged flag. Before Commit d77e59a8fccd ("arm64: mte: Lock a page > > for MTE tag initialization") the kernel avoided allowing MTE with shared > > pages, thereby preventing two tasks sharing a page from setting up the > > PG_mte_tagged flag racily. > > > > Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag > > initialization") further updated the locking so that the kernel > > allows VM_SHARED mapping with MTE. With this commit, we can enable > > memslot creation with VM_SHARED VMA mapping. > > > > This patch results in a minor tweak to the ABI. We now allow creating > > memslots that don't have the VM_MTE_ALLOWED flag set. > > As I commented here: > > https://lore.kernel.org/r/Z4e04P1bQlFBDHo7@arm.com > > I'm fine with the change, we basically go back to the original ABI prior > to relaxing this for VM_SHARED. > > > If the guest uses > > such a memslot with Allocation Tags, the kernel will generate -EFAULT. > > ie, instead of failing early, we now fail later during KVM_RUN. > > Nit: more like the kernel "will return -EFAULT" to the VMM rather than > "generate". > > > This change is needed because, without it, users are not able to use MTE > > with VFIO passthrough (currently the mapping is either Device or > > NonCacheable for which tag access check is not applied.), as shown > > below (kvmtool VMM). > > Another nit: "users are not able to user VFIO passthrough when MTE is > enabled". At a first read, the above sounded to me like one wants to > enable MTE for VFIO passthrough mappings. What the commit message doesn't spell out is how MTE and VFIO are interacting here. I also don't understand the reference to Device or NC memory here. Isn't the issue that DMA doesn't check/update tags, and therefore it makes little sense to prevent non-tagged memory being associated with a memslot? My other concern is that this gives pretty poor consistency to the guest, which cannot know what can be tagged and what cannot, and breaks a guarantee that the guest should be able to rely on. Thanks, M.
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index ef53af6df6de..1f1b5aa43d2d 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -2178,11 +2178,6 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, if (!vma) break; - if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) { - ret = -EINVAL; - break; - } - if (vma->vm_flags & VM_PFNMAP) { /* IO region dirty page logging not allowed */ if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only rejected a memory slot if VM_SHARED was set. This commit unified the checking with user_mem_abort(), with slots being rejected if either VM_MTE_ALLOWED is not set or VM_SHARED set. A subsequent commit c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled") dropped the VM_SHARED check, so we ended up with memory slots being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before the commit d89585fbb308. The rejection of the memory slot with VM_SHARED set was done to avoid a race condition with the test/set of the PG_mte_tagged flag. Before Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag initialization") the kernel avoided allowing MTE with shared pages, thereby preventing two tasks sharing a page from setting up the PG_mte_tagged flag racily. Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag initialization") further updated the locking so that the kernel allows VM_SHARED mapping with MTE. With this commit, we can enable memslot creation with VM_SHARED VMA mapping. This patch results in a minor tweak to the ABI. We now allow creating memslots that don't have the VM_MTE_ALLOWED flag set. If the guest uses such a memslot with Allocation Tags, the kernel will generate -EFAULT. ie, instead of failing early, we now fail later during KVM_RUN. This change is needed because, without it, users are not able to use MTE with VFIO passthrough (currently the mapping is either Device or NonCacheable for which tag access check is not applied.), as shown below (kvmtool VMM). [ 617.921030] vfio-pci 0000:01:00.0: resetting [ 618.024719] vfio-pci 0000:01:00.0: reset done Error: 0000:01:00.0: failed to register region with KVM Warning: [0abc:aced] Error activating emulation for BAR 0 Error: 0000:01:00.0: failed to configure regions Warning: Failed init: vfio__init Fatal: Initialisation failed Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> --- arch/arm64/kvm/mmu.c | 5 ----- 1 file changed, 5 deletions(-)