diff mbox series

[1/1] KVM: MMU: Fix VM entry failure and OOPS for shdaow page table

Message ID 20220607074034.7109-1-yuan.yao@intel.com (mailing list archive)
State New, archived
Headers show
Series [1/1] KVM: MMU: Fix VM entry failure and OOPS for shdaow page table | expand

Commit Message

Yao Yuan June 7, 2022, 7:40 a.m. UTC
commit e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and
repurpose shadow_me_mask") repurposed below varables:

shadow_me_value: the memory encryption bit(s) that will be
set to the SPTE (the original shadow_me_mask).
shadow_me_mask: all possible memory encryption bits (which
is a super set of shadow_me_value).

So assign shadow_me_mask to pae root page is wrong, instead
using shadow_me_value.

Fixes: e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask")

----------------------
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX=00000000 EBX=00000000 ECX=00000000 EDX=000806f3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=8c 0a 14 28 3c 50 64 c8 66 90 66 90 66 90 66 90 66 90 66 90 <2e> 66 83 3e c8 61 00 0f 85 89 f0 31 d2 8e d2 66 bc 00 70 00 00 66 ba 63 fc 0e 00 e9 f3 ee

----------------------
[   80.806596] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
[  293.504118] BUG: unable to handle page fault for address: ffd43f00063049e8
[  293.515075] #PF: supervisor read access in kernel mode
[  293.524031] #PF: error_code(0x0000) - not-present page
[  293.532935] PGD 86dfd8067 P4D 0
[  293.539626] Oops: 0000 [#1] PREEMPT SMP
[  293.546958] CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G        W         5.18.0-rc6-kvm-upstream-workaround+ #82
[  293.565354] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021
[  293.583639] RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
[  293.592911] Code: 25 28 00 00 00 48 89 45 f0 31 c0 48 8b 06 48 83 f8 ff 74 4a 48 c1 e0 0c 48 89 f3 48 c1 e8 18 48 c1 e0 06 48 03 05 e4 08 20 c2 <48> 8b 70 28 48 85 f6 74 41 80 7e 20 00 75 17 83 6e 48 01 75 18 f6
[  293.624056] RSP: 0018:ffa000000b3afb88 EFLAGS: 00010286
[  293.633326] RAX: ffd43f00063049c0 RBX: ff110000777ff000 RCX: 0000000000000001
[  293.644758] RDX: ffa000000b3afbc8 RSI: ff110000777ff000 RDI: ffa000000c211000
[  293.656132] RBP: ffa000000b3afba0 R08: 0000000000000100 R09: ffa000000b3afbe0
[  293.667480] R10: ffa000000b3afbe0 R11: 0000000000000000 R12: ffa000000c211000
[  293.678771] R13: ffa000000b3afbc8 R14: ff11000112e9c290 R15: 00000000ffffffef
[  293.690069] FS:  0000000000000000(0000) GS:ff1100084e300000(0000) knlGS:0000000000000000
[  293.702514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  293.712338] CR2: ffd43f00063049e8 CR3: 000000000260a006 CR4: 0000000000773ee0
[  293.723802] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  293.735230] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  293.746597] PKRU: 55555554
[  293.752962] Call Trace:
[  293.758978]  <TASK>
[  293.764579]  kvm_mmu_free_roots+0xd1/0x200 [kvm]
[  293.773060]  __kvm_mmu_unload+0x29/0x70 [kvm]
[  293.781177]  kvm_mmu_unload+0x13/0x20 [kvm]
[  293.789012]  kvm_arch_destroy_vm+0x8a/0x190 [kvm]
[  293.797355]  kvm_put_kvm+0x197/0x2d0 [kvm]
[  293.804925]  kvm_vm_release+0x21/0x30 [kvm]
[  293.812499]  __fput+0x8e/0x260
[  293.818715]  ____fput+0xe/0x10
[  293.824822]  task_work_run+0x6f/0xb0
[  293.831433]  do_exit+0x327/0xa90
[  293.837586]  ? futex_unqueue+0x3f/0x70
[  293.844283]  do_group_exit+0x35/0xa0
[  293.850770]  get_signal+0x911/0x930
[  293.857137]  arch_do_signal_or_restart+0x37/0x720
[  293.864896]  ? do_futex+0xf9/0x1a0
[  293.871139]  ? __x64_sys_futex+0x66/0x160
[  293.878001]  exit_to_user_mode_prepare+0xb2/0x140
[  293.885576]  syscall_exit_to_user_mode+0x16/0x30
[  293.892973]  do_syscall_64+0x4e/0x90
[  293.899162]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  293.906972] RIP: 0033:0x7f6c844f752d
[  293.913050] Code: Unable to access opcode bytes at RIP 0x7f6c844f7503.
[  293.922442] RSP: 002b:00007f6c7fbfe648 EFLAGS: 00000212 ORIG_RAX: 00000000000000ca
[  293.933048] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f6c844f752d
[  293.943161] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000557dd5a1ac58
[  293.953281] RBP: 00007f6c7fbfe670 R08: 0000000000000000 R09: 0000000000000000
[  293.963401] R10: 0000000000000000 R11: 0000000000000212 R12: 00007ffd0f3001be
[  293.973542] R13: 00007ffd0f3001bf R14: 00007ffd0f300280 R15: 00007f6c7fbfe880
[  293.983683]  </TASK>
[  293.988216] Modules linked in: kvm_intel kvm x86_pkg_temp_thermal snd_pcm input_leds snd_timer joydev led_class snd irqbypass efi_pstore soundcore mac_hid button sch_fq_codel ip_tables x_tables ixgbe mdio mdio_devres libphy igc xfrm_algo ptp pps_core efivarfs [last unloaded: kvm]
[  294.022648] CR2: ffd43f00063049e8
[  294.028694] ---[ end trace 0000000000000000 ]---
[  294.042573] RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
[  294.050908] Code: 25 28 00 00 00 48 89 45 f0 31 c0 48 8b 06 48 83 f8 ff 74 4a 48 c1 e0 0c 48 89 f3 48 c1 e8 18 48 c1 e0 06 48 03 05 e4 08 20 c2 <48> 8b 70 28 48 85 f6 74 41 80 7e 20 00 75 17 83 6e 48 01 75 18 f6
[  294.079460] RSP: 0018:ffa000000b3afb88 EFLAGS: 00010286
[  294.087908] RAX: ffd43f00063049c0 RBX: ff110000777ff000 RCX: 0000000000000001
[  294.098558] RDX: ffa000000b3afbc8 RSI: ff110000777ff000 RDI: ffa000000c211000
[  294.109193] RBP: ffa000000b3afba0 R08: 0000000000000100 R09: ffa000000b3afbe0
[  294.119831] R10: ffa000000b3afbe0 R11: 0000000000000000 R12: ffa000000c211000
[  294.130449] R13: ffa000000b3afbc8 R14: ff11000112e9c290 R15: 00000000ffffffef
[  294.141090] FS:  0000000000000000(0000) GS:ff1100084e300000(0000) knlGS:0000000000000000
[  294.152867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  294.162035] CR2: ffd43f00063049e8 CR3: 000000000260a006 CR4: 0000000000773ee0
[  294.172825] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  294.183618] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  294.194390] PKRU: 55555554
[  294.200189] note: qemu-system-x86[4260] exited with preempt_count 1
[  294.210044] Fixing recursive fault but reboot is needed!
[  294.218854] BUG: scheduling while atomic: qemu-system-x86/4260/0x00000000
[  294.229357] Modules linked in: kvm_intel kvm x86_pkg_temp_thermal snd_pcm input_leds snd_timer joydev led_class snd irqbypass efi_pstore soundcore mac_hid button sch_fq_codel ip_tables x_tables ixgbe mdio mdio_devres libphy igc xfrm_algo ptp pps_core efivarfs [last unloaded: kvm]
[  294.266273] Preemption disabled at:
[  294.266273] [<ffffffff8109e404>] do_task_dead+0x24/0x50
[  294.282274] CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G      D W         5.18.0-rc6-kvm-upstream-workaround+ #82
[  294.300693] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021
[  294.319002] Call Trace:
[  294.325007]  <TASK>
[  294.330587]  dump_stack_lvl+0x38/0x49
[  294.337889]  ? do_task_dead+0x24/0x50
[  294.345102]  dump_stack+0x10/0x12
[  294.351836]  __schedule_bug.cold.156+0x7d/0x8e
[  294.359770]  __schedule+0x578/0x820
[  294.366552]  ? vprintk+0x52/0x80
[  294.373025]  ? _printk+0x58/0x6f
[  294.379449]  do_task_dead+0x44/0x50
[  294.386097]  make_task_dead.cold.48+0x50/0xaf
[  294.393650]  rewind_stack_and_make_dead+0x17/0x17
[  294.401549] RIP: 0033:0x7f6c844f752d
[  294.408147] Code: Unable to access opcode bytes at RIP 0x7f6c844f7503.
[  294.418086] RSP: 002b:00007f6c7fbfe648 EFLAGS: 00000212 ORIG_RAX: 00000000000000ca
[  294.429266] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f6c844f752d
[  294.439998] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000557dd5a1ac58
[  294.450748] RBP: 00007f6c7fbfe670 R08: 0000000000000000 R09: 0000000000000000
[  294.461498] R10: 0000000000000000 R11: 0000000000000212 R12: 00007ffd0f3001be
[  294.472199] R13: 00007ffd0f3001bf R14: 00007ffd0f300280 R15: 00007f6c7fbfe880
[  294.482869]  </TASK>

Signed-off-by: Yuan Yao <yuan.yao@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Kai Huang June 7, 2022, 7:44 a.m. UTC | #1
On Tue, 2022-06-07 at 15:40 +0800, Yuan Yao wrote:
> commit e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and
> repurpose shadow_me_mask") repurposed below varables:
> 
> shadow_me_value: the memory encryption bit(s) that will be
> set to the SPTE (the original shadow_me_mask).
> shadow_me_mask: all possible memory encryption bits (which
> is a super set of shadow_me_value).
> 
> So assign shadow_me_mask to pae root page is wrong, instead
> using shadow_me_value.
> 
> Fixes: e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask")

Sorry I missed this.

Reviewed-by: Kai Huang <kai.huang@intel.com>

> 
> ----------------------
> KVM: entry failed, hardware error 0x80000021
> 
> If you're running a guest on an Intel machine without unrestricted mode
> support, the failure can be most likely due to the guest entering an invalid
> state for Intel VT. For example, the guest maybe running in big real mode
> which is not supported on less recent Intel processors.
> 
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=000806f3
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 000f0000 0000ffff 00009b00
> SS =0000 00000000 0000ffff 00009300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT=     00000000 0000ffff
> IDT=     00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=8c 0a 14 28 3c 50 64 c8 66 90 66 90 66 90 66 90 66 90 66 90 <2e> 66 83 3e c8 61 00 0f 85 89 f0 31 d2 8e d2 66 bc 00 70 00 00 66 ba 63 fc 0e 00 e9 f3 ee
> 
> ----------------------
> [   80.806596] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
> [  293.504118] BUG: unable to handle page fault for address: ffd43f00063049e8
> [  293.515075] #PF: supervisor read access in kernel mode
> [  293.524031] #PF: error_code(0x0000) - not-present page
> [  293.532935] PGD 86dfd8067 P4D 0
> [  293.539626] Oops: 0000 [#1] PREEMPT SMP
> [  293.546958] CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G        W         5.18.0-rc6-kvm-upstream-workaround+ #82
> [  293.565354] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021
> [  293.583639] RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
> [  293.592911] Code: 25 28 00 00 00 48 89 45 f0 31 c0 48 8b 06 48 83 f8 ff 74 4a 48 c1 e0 0c 48 89 f3 48 c1 e8 18 48 c1 e0 06 48 03 05 e4 08 20 c2 <48> 8b 70 28 48 85 f6 74 41 80 7e 20 00 75 17 83 6e 48 01 75 18 f6
> [  293.624056] RSP: 0018:ffa000000b3afb88 EFLAGS: 00010286
> [  293.633326] RAX: ffd43f00063049c0 RBX: ff110000777ff000 RCX: 0000000000000001
> [  293.644758] RDX: ffa000000b3afbc8 RSI: ff110000777ff000 RDI: ffa000000c211000
> [  293.656132] RBP: ffa000000b3afba0 R08: 0000000000000100 R09: ffa000000b3afbe0
> [  293.667480] R10: ffa000000b3afbe0 R11: 0000000000000000 R12: ffa000000c211000
> [  293.678771] R13: ffa000000b3afbc8 R14: ff11000112e9c290 R15: 00000000ffffffef
> [  293.690069] FS:  0000000000000000(0000) GS:ff1100084e300000(0000) knlGS:0000000000000000
> [  293.702514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  293.712338] CR2: ffd43f00063049e8 CR3: 000000000260a006 CR4: 0000000000773ee0
> [  293.723802] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  293.735230] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [  293.746597] PKRU: 55555554
> [  293.752962] Call Trace:
> [  293.758978]  <TASK>
> [  293.764579]  kvm_mmu_free_roots+0xd1/0x200 [kvm]
> [  293.773060]  __kvm_mmu_unload+0x29/0x70 [kvm]
> [  293.781177]  kvm_mmu_unload+0x13/0x20 [kvm]
> [  293.789012]  kvm_arch_destroy_vm+0x8a/0x190 [kvm]
> [  293.797355]  kvm_put_kvm+0x197/0x2d0 [kvm]
> [  293.804925]  kvm_vm_release+0x21/0x30 [kvm]
> [  293.812499]  __fput+0x8e/0x260
> [  293.818715]  ____fput+0xe/0x10
> [  293.824822]  task_work_run+0x6f/0xb0
> [  293.831433]  do_exit+0x327/0xa90
> [  293.837586]  ? futex_unqueue+0x3f/0x70
> [  293.844283]  do_group_exit+0x35/0xa0
> [  293.850770]  get_signal+0x911/0x930
> [  293.857137]  arch_do_signal_or_restart+0x37/0x720
> [  293.864896]  ? do_futex+0xf9/0x1a0
> [  293.871139]  ? __x64_sys_futex+0x66/0x160
> [  293.878001]  exit_to_user_mode_prepare+0xb2/0x140
> [  293.885576]  syscall_exit_to_user_mode+0x16/0x30
> [  293.892973]  do_syscall_64+0x4e/0x90
> [  293.899162]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  293.906972] RIP: 0033:0x7f6c844f752d
> [  293.913050] Code: Unable to access opcode bytes at RIP 0x7f6c844f7503.
> [  293.922442] RSP: 002b:00007f6c7fbfe648 EFLAGS: 00000212 ORIG_RAX: 00000000000000ca
> [  293.933048] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f6c844f752d
> [  293.943161] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000557dd5a1ac58
> [  293.953281] RBP: 00007f6c7fbfe670 R08: 0000000000000000 R09: 0000000000000000
> [  293.963401] R10: 0000000000000000 R11: 0000000000000212 R12: 00007ffd0f3001be
> [  293.973542] R13: 00007ffd0f3001bf R14: 00007ffd0f300280 R15: 00007f6c7fbfe880
> [  293.983683]  </TASK>
> [  293.988216] Modules linked in: kvm_intel kvm x86_pkg_temp_thermal snd_pcm input_leds snd_timer joydev led_class snd irqbypass efi_pstore soundcore mac_hid button sch_fq_codel ip_tables x_tables ixgbe mdio mdio_devres libphy igc xfrm_algo ptp pps_core efivarfs [last unloaded: kvm]
> [  294.022648] CR2: ffd43f00063049e8
> [  294.028694] ---[ end trace 0000000000000000 ]---
> [  294.042573] RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
> [  294.050908] Code: 25 28 00 00 00 48 89 45 f0 31 c0 48 8b 06 48 83 f8 ff 74 4a 48 c1 e0 0c 48 89 f3 48 c1 e8 18 48 c1 e0 06 48 03 05 e4 08 20 c2 <48> 8b 70 28 48 85 f6 74 41 80 7e 20 00 75 17 83 6e 48 01 75 18 f6
> [  294.079460] RSP: 0018:ffa000000b3afb88 EFLAGS: 00010286
> [  294.087908] RAX: ffd43f00063049c0 RBX: ff110000777ff000 RCX: 0000000000000001
> [  294.098558] RDX: ffa000000b3afbc8 RSI: ff110000777ff000 RDI: ffa000000c211000
> [  294.109193] RBP: ffa000000b3afba0 R08: 0000000000000100 R09: ffa000000b3afbe0
> [  294.119831] R10: ffa000000b3afbe0 R11: 0000000000000000 R12: ffa000000c211000
> [  294.130449] R13: ffa000000b3afbc8 R14: ff11000112e9c290 R15: 00000000ffffffef
> [  294.141090] FS:  0000000000000000(0000) GS:ff1100084e300000(0000) knlGS:0000000000000000
> [  294.152867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  294.162035] CR2: ffd43f00063049e8 CR3: 000000000260a006 CR4: 0000000000773ee0
> [  294.172825] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  294.183618] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [  294.194390] PKRU: 55555554
> [  294.200189] note: qemu-system-x86[4260] exited with preempt_count 1
> [  294.210044] Fixing recursive fault but reboot is needed!
> [  294.218854] BUG: scheduling while atomic: qemu-system-x86/4260/0x00000000
> [  294.229357] Modules linked in: kvm_intel kvm x86_pkg_temp_thermal snd_pcm input_leds snd_timer joydev led_class snd irqbypass efi_pstore soundcore mac_hid button sch_fq_codel ip_tables x_tables ixgbe mdio mdio_devres libphy igc xfrm_algo ptp pps_core efivarfs [last unloaded: kvm]
> [  294.266273] Preemption disabled at:
> [  294.266273] [<ffffffff8109e404>] do_task_dead+0x24/0x50
> [  294.282274] CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G      D W         5.18.0-rc6-kvm-upstream-workaround+ #82
> [  294.300693] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021
> [  294.319002] Call Trace:
> [  294.325007]  <TASK>
> [  294.330587]  dump_stack_lvl+0x38/0x49
> [  294.337889]  ? do_task_dead+0x24/0x50
> [  294.345102]  dump_stack+0x10/0x12
> [  294.351836]  __schedule_bug.cold.156+0x7d/0x8e
> [  294.359770]  __schedule+0x578/0x820
> [  294.366552]  ? vprintk+0x52/0x80
> [  294.373025]  ? _printk+0x58/0x6f
> [  294.379449]  do_task_dead+0x44/0x50
> [  294.386097]  make_task_dead.cold.48+0x50/0xaf
> [  294.393650]  rewind_stack_and_make_dead+0x17/0x17
> [  294.401549] RIP: 0033:0x7f6c844f752d
> [  294.408147] Code: Unable to access opcode bytes at RIP 0x7f6c844f7503.
> [  294.418086] RSP: 002b:00007f6c7fbfe648 EFLAGS: 00000212 ORIG_RAX: 00000000000000ca
> [  294.429266] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f6c844f752d
> [  294.439998] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000557dd5a1ac58
> [  294.450748] RBP: 00007f6c7fbfe670 R08: 0000000000000000 R09: 0000000000000000
> [  294.461498] R10: 0000000000000000 R11: 0000000000000212 R12: 00007ffd0f3001be
> [  294.472199] R13: 00007ffd0f3001bf R14: 00007ffd0f300280 R15: 00007f6c7fbfe880
> [  294.482869]  </TASK>
> 
> Signed-off-by: Yuan Yao <yuan.yao@intel.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index efe5a3dca1e0..6bd144f1e60c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3411,7 +3411,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>  			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
>  					      i << 30, PT32_ROOT_LEVEL, true);
>  			mmu->pae_root[i] = root | PT_PRESENT_MASK |
> -					   shadow_me_mask;
> +					   shadow_me_value;
>  		}
>  		mmu->root.hpa = __pa(mmu->pae_root);
>  	} else {
Sean Christopherson June 7, 2022, 2:58 p.m. UTC | #2
"KVM: x86/mmu:" for the scope please (because I'm holding out hope that someday
we'll have a common "KVM: MMU:" that's shared by multiple architectures.

And there's a s/shdaow/shadow typo.  That said, I'd prefer to use the shortlog to
give the reader a hint as to what the underlying bug is, e.g. it's not immediately
obvious that hitting this bug requires a platform with MKTME enabled.  Maybe:

  KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs

On Tue, Jun 07, 2022, Yuan Yao wrote:
> commit e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and

Personal preference, I don't bother with a "commit ..." in the changelog if there's
a Fixes: that provides the same information.

> repurpose shadow_me_mask") repurposed below varables:

Please wrap closer to ~75 chars.

Please lead with the "what", then dive into the details, that way the reader has
an idea of what's changing.  And if there's a trace, finding the one sentence that
describes the actual change can be surprisingly difficult.  Somethine like:

  Assign shadow_me_value, not shadow_me_mask, to PAE root entries, a.k.a. shadow
  PDPTRs, when host memory encryption is supported.  The "mask" is the set of all
  possible memory encryption bits, e.g. MKTME KeyIDs, whereas "value" holds the
  actual value that needs to be stuffed into host page tables.

  Using shadow_me_mask results in a failed VM-Entry due to setting reserved PA
  bits in the PDPTRs, and ultimately causes an OOPS due to physical addresses
  with non-zero MKTME bits sending to_shadow_page() into the weeds.

> shadow_me_value: the memory encryption bit(s) that will be
> set to the SPTE (the original shadow_me_mask).
> shadow_me_mask: all possible memory encryption bits (which
> is a super set of shadow_me_value).
> 
> So assign shadow_me_mask to pae root page is wrong, instead
> using shadow_me_value.
> 
> Fixes: e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask")

Convention is to put Fixes: at the end of the changelog, i.e. after the trace and
next to the Cc list and SOB chain.

> ----------------------
> KVM: entry failed, hardware error 0x80000021

For this specific bug, I wouldn't bother with a dump of the failed VM-Entry,
nothing in the dump is relevant/interesting. 

> If you're running a guest on an Intel machine without unrestricted mode
> support, the failure can be most likely due to the guest entering an invalid
> state for Intel VT. For example, the guest maybe running in big real mode
> which is not supported on less recent Intel processors.
> 
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=000806f3
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 000f0000 0000ffff 00009b00
> SS =0000 00000000 0000ffff 00009300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT=     00000000 0000ffff
> IDT=     00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=8c 0a 14 28 3c 50 64 c8 66 90 66 90 66 90 66 90 66 90 66 90 <2e> 66 83 3e c8 61 00 0f 85 89 f0 31 d2 8e d2 66 bc 00 70 00 00 66 ba 63 fc 0e 00 e9 f3 ee
> 
> ----------------------
> [   80.806596] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
> [  293.504118] BUG: unable to handle page fault for address: ffd43f00063049e8
> [  293.515075] #PF: supervisor read access in kernel mode
> [  293.524031] #PF: error_code(0x0000) - not-present page
> [  293.532935] PGD 86dfd8067 P4D 0
> [  293.539626] Oops: 0000 [#1] PREEMPT SMP
> [  293.546958] CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G        W         5.18.0-rc6-kvm-upstream-workaround+ #82
> [  293.565354] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021

Thanks for the trace!  But please trim the superfluous information, e.g. the
timestamps, registers, code stream, etc... aren't necessary to understand why the
fault occured.

> [  293.583639] RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
> [  293.592911] Code: 25 28 00 00 00 48 89 45 f0 31 c0 48 8b 06 48 83 f8 ff 74 4a 48 c1 e0 0c 48 89 f3 48 c1 e8 18 48 c1 e0 06 48 03 05 e4 08 20 c2 <48> 8b 70 28 48 85 f6 74 41 80 7e 20 00 75 17 83 6e 48 01 75 18 f6
> [  293.624056] RSP: 0018:ffa000000b3afb88 EFLAGS: 00010286
> [  293.633326] RAX: ffd43f00063049c0 RBX: ff110000777ff000 RCX: 0000000000000001
> [  293.644758] RDX: ffa000000b3afbc8 RSI: ff110000777ff000 RDI: ffa000000c211000
> [  293.656132] RBP: ffa000000b3afba0 R08: 0000000000000100 R09: ffa000000b3afbe0
> [  293.667480] R10: ffa000000b3afbe0 R11: 0000000000000000 R12: ffa000000c211000
> [  293.678771] R13: ffa000000b3afbc8 R14: ff11000112e9c290 R15: 00000000ffffffef
> [  293.690069] FS:  0000000000000000(0000) GS:ff1100084e300000(0000) knlGS:0000000000000000
> [  293.702514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  293.712338] CR2: ffd43f00063049e8 CR3: 000000000260a006 CR4: 0000000000773ee0
> [  293.723802] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  293.735230] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [  293.746597] PKRU: 55555554
> [  293.752962] Call Trace:
> [  293.758978]  <TASK>
> [  293.764579]  kvm_mmu_free_roots+0xd1/0x200 [kvm]
> [  293.773060]  __kvm_mmu_unload+0x29/0x70 [kvm]
> [  293.781177]  kvm_mmu_unload+0x13/0x20 [kvm]
> [  293.789012]  kvm_arch_destroy_vm+0x8a/0x190 [kvm]
> [  293.797355]  kvm_put_kvm+0x197/0x2d0 [kvm]
> [  293.804925]  kvm_vm_release+0x21/0x30 [kvm]
> [  293.812499]  __fput+0x8e/0x260
> [  293.818715]  ____fput+0xe/0x10
> [  293.824822]  task_work_run+0x6f/0xb0
> [  293.831433]  do_exit+0x327/0xa90
> [  293.837586]  ? futex_unqueue+0x3f/0x70

Lines with leading '?' should be dropped, they're "guesses" from the unwinder.

> [  293.844283]  do_group_exit+0x35/0xa0
> [  293.850770]  get_signal+0x911/0x930
> [  293.857137]  arch_do_signal_or_restart+0x37/0x720
> [  293.864896]  ? do_futex+0xf9/0x1a0
> [  293.871139]  ? __x64_sys_futex+0x66/0x160
> [  293.878001]  exit_to_user_mode_prepare+0xb2/0x140
> [  293.885576]  syscall_exit_to_user_mode+0x16/0x30
> [  293.892973]  do_syscall_64+0x4e/0x90
> [  293.899162]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  293.906972] RIP: 0033:0x7f6c844f752d

Everything below here can be dropped as it's not relevant to the original bug.

E.g. the entire trace can be trimmed to:

  set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
  BUG: unable to handle page fault for address: ffd43f00063049e8
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 86dfd8067 P4D 0
  Oops: 0000 [#1] PREEMPT SMP
  CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G        W         5.18.0-rc6-kvm-upstream-workaround+ #82
  Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021
  RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
  Call Trace:
   <TASK>
   kvm_mmu_free_roots+0xd1/0x200 [kvm]
   __kvm_mmu_unload+0x29/0x70 [kvm]
   kvm_mmu_unload+0x13/0x20 [kvm]
   kvm_arch_destroy_vm+0x8a/0x190 [kvm]
   kvm_put_kvm+0x197/0x2d0 [kvm]
   kvm_vm_release+0x21/0x30 [kvm]
   __fput+0x8e/0x260
   ____fput+0xe/0x10
   task_work_run+0x6f/0xb0
   do_exit+0x327/0xa90
   do_group_exit+0x35/0xa0
   get_signal+0x911/0x930
   arch_do_signal_or_restart+0x37/0x720
   exit_to_user_mode_prepare+0xb2/0x140
   syscall_exit_to_user_mode+0x16/0x30
   do_syscall_64+0x4e/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

> [  293.913050] Code: Unable to access opcode bytes at RIP 0x7f6c844f7503.
> [  293.922442] RSP: 002b:00007f6c7fbfe648 EFLAGS: 00000212 ORIG_RAX: 00000000000000ca
> [  293.933048] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f6c844f752d
> [  293.943161] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000557dd5a1ac58
> [  293.953281] RBP: 00007f6c7fbfe670 R08: 0000000000000000 R09: 0000000000000000
> [  293.963401] R10: 0000000000000000 R11: 0000000000000212 R12: 00007ffd0f3001be
> [  293.973542] R13: 00007ffd0f3001bf R14: 00007ffd0f300280 R15: 00007f6c7fbfe880
> [  293.983683]  </TASK>
> [  293.988216] Modules linked in: kvm_intel kvm x86_pkg_temp_thermal snd_pcm input_leds snd_timer joydev led_class snd irqbypass efi_pstore soundcore mac_hid button sch_fq_codel ip_tables x_tables ixgbe mdio mdio_devres libphy igc xfrm_algo ptp pps_core efivarfs [last unloaded: kvm]
> [  294.022648] CR2: ffd43f00063049e8
> [  294.028694] ---[ end trace 0000000000000000 ]---
> [  294.042573] RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
> [  294.050908] Code: 25 28 00 00 00 48 89 45 f0 31 c0 48 8b 06 48 83 f8 ff 74 4a 48 c1 e0 0c 48 89 f3 48 c1 e8 18 48 c1 e0 06 48 03 05 e4 08 20 c2 <48> 8b 70 28 48 85 f6 74 41 80 7e 20 00 75 17 83 6e 48 01 75 18 f6
> [  294.079460] RSP: 0018:ffa000000b3afb88 EFLAGS: 00010286
> [  294.087908] RAX: ffd43f00063049c0 RBX: ff110000777ff000 RCX: 0000000000000001
> [  294.098558] RDX: ffa000000b3afbc8 RSI: ff110000777ff000 RDI: ffa000000c211000
> [  294.109193] RBP: ffa000000b3afba0 R08: 0000000000000100 R09: ffa000000b3afbe0
> [  294.119831] R10: ffa000000b3afbe0 R11: 0000000000000000 R12: ffa000000c211000
> [  294.130449] R13: ffa000000b3afbc8 R14: ff11000112e9c290 R15: 00000000ffffffef
> [  294.141090] FS:  0000000000000000(0000) GS:ff1100084e300000(0000) knlGS:0000000000000000
> [  294.152867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  294.162035] CR2: ffd43f00063049e8 CR3: 000000000260a006 CR4: 0000000000773ee0
> [  294.172825] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  294.183618] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [  294.194390] PKRU: 55555554
> [  294.200189] note: qemu-system-x86[4260] exited with preempt_count 1
> [  294.210044] Fixing recursive fault but reboot is needed!
> [  294.218854] BUG: scheduling while atomic: qemu-system-x86/4260/0x00000000
> [  294.229357] Modules linked in: kvm_intel kvm x86_pkg_temp_thermal snd_pcm input_leds snd_timer joydev led_class snd irqbypass efi_pstore soundcore mac_hid button sch_fq_codel ip_tables x_tables ixgbe mdio mdio_devres libphy igc xfrm_algo ptp pps_core efivarfs [last unloaded: kvm]
> [  294.266273] Preemption disabled at:
> [  294.266273] [<ffffffff8109e404>] do_task_dead+0x24/0x50
> [  294.282274] CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G      D W         5.18.0-rc6-kvm-upstream-workaround+ #82
> [  294.300693] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021
> [  294.319002] Call Trace:
> [  294.325007]  <TASK>
> [  294.330587]  dump_stack_lvl+0x38/0x49
> [  294.337889]  ? do_task_dead+0x24/0x50
> [  294.345102]  dump_stack+0x10/0x12
> [  294.351836]  __schedule_bug.cold.156+0x7d/0x8e
> [  294.359770]  __schedule+0x578/0x820
> [  294.366552]  ? vprintk+0x52/0x80
> [  294.373025]  ? _printk+0x58/0x6f
> [  294.379449]  do_task_dead+0x44/0x50
> [  294.386097]  make_task_dead.cold.48+0x50/0xaf
> [  294.393650]  rewind_stack_and_make_dead+0x17/0x17
> [  294.401549] RIP: 0033:0x7f6c844f752d
> [  294.408147] Code: Unable to access opcode bytes at RIP 0x7f6c844f7503.
> [  294.418086] RSP: 002b:00007f6c7fbfe648 EFLAGS: 00000212 ORIG_RAX: 00000000000000ca
> [  294.429266] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f6c844f752d
> [  294.439998] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000557dd5a1ac58
> [  294.450748] RBP: 00007f6c7fbfe670 R08: 0000000000000000 R09: 0000000000000000
> [  294.461498] R10: 0000000000000000 R11: 0000000000000212 R12: 00007ffd0f3001be
> [  294.472199] R13: 00007ffd0f3001bf R14: 00007ffd0f300280 R15: 00007f6c7fbfe880
> [  294.482869]  </TASK>
> 
> Signed-off-by: Yuan Yao <yuan.yao@intel.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index efe5a3dca1e0..6bd144f1e60c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3411,7 +3411,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>  			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
>  					      i << 30, PT32_ROOT_LEVEL, true);
>  			mmu->pae_root[i] = root | PT_PRESENT_MASK |
> -					   shadow_me_mask;
> +					   shadow_me_value;
>  		}
>  		mmu->root.hpa = __pa(mmu->pae_root);
>  	} else {
> -- 
> 2.27.0
>
Yuan Yao June 7, 2022, 11:30 p.m. UTC | #3
On Tue, Jun 07, 2022 at 02:58:57PM +0000, Sean Christopherson wrote:
> "KVM: x86/mmu:" for the scope please (because I'm holding out hope that someday
> we'll have a common "KVM: MMU:" that's shared by multiple architectures.
>
> And there's a s/shdaow/shadow typo.  That said, I'd prefer to use the shortlog to
> give the reader a hint as to what the underlying bug is, e.g. it's not immediately
> obvious that hitting this bug requires a platform with MKTME enabled.  Maybe:
>
>   KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs

Fair enough, This includes more useful information, I will use this in V2.

>
> On Tue, Jun 07, 2022, Yuan Yao wrote:
> > commit e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and
>
> Personal preference, I don't bother with a "commit ..." in the changelog if there's
> a Fixes: that provides the same information.
>
> > repurpose shadow_me_mask") repurposed below varables:
>
> Please wrap closer to ~75 chars.
>
> Please lead with the "what", then dive into the details, that way the reader has
> an idea of what's changing.  And if there's a trace, finding the one sentence that
> describes the actual change can be surprisingly difficult.  Somethine like:
>
>   Assign shadow_me_value, not shadow_me_mask, to PAE root entries, a.k.a. shadow
>   PDPTRs, when host memory encryption is supported.  The "mask" is the set of all
>   possible memory encryption bits, e.g. MKTME KeyIDs, whereas "value" holds the
>   actual value that needs to be stuffed into host page tables.
>
>   Using shadow_me_mask results in a failed VM-Entry due to setting reserved PA
>   bits in the PDPTRs, and ultimately causes an OOPS due to physical addresses
>   with non-zero MKTME bits sending to_shadow_page() into the weeds.
>

I got it, the part you wrote is much better and describes it clearly, I will
use this in v2.

> > shadow_me_value: the memory encryption bit(s) that will be
> > set to the SPTE (the original shadow_me_mask).
> > shadow_me_mask: all possible memory encryption bits (which
> > is a super set of shadow_me_value).
> >
> > So assign shadow_me_mask to pae root page is wrong, instead
> > using shadow_me_value.
> >
> > Fixes: e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask")
>
> Convention is to put Fixes: at the end of the changelog, i.e. after the trace and
> next to the Cc list and SOB chain.

OK, got it.

>
> > ----------------------
> > KVM: entry failed, hardware error 0x80000021
>
> For this specific bug, I wouldn't bother with a dump of the failed VM-Entry,
> nothing in the dump is relevant/interesting.
>
> > If you're running a guest on an Intel machine without unrestricted mode
> > support, the failure can be most likely due to the guest entering an invalid
> > state for Intel VT. For example, the guest maybe running in big real mode
> > which is not supported on less recent Intel processors.
> >
> > EAX=00000000 EBX=00000000 ECX=00000000 EDX=000806f3
> > ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> > EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0000 00000000 0000ffff 00009300
> > CS =f000 000f0000 0000ffff 00009b00
> > SS =0000 00000000 0000ffff 00009300
> > DS =0000 00000000 0000ffff 00009300
> > FS =0000 00000000 0000ffff 00009300
> > GS =0000 00000000 0000ffff 00009300
> > LDT=0000 00000000 0000ffff 00008200
> > TR =0000 00000000 0000ffff 00008b00
> > GDT=     00000000 0000ffff
> > IDT=     00000000 0000ffff
> > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000000
> > Code=8c 0a 14 28 3c 50 64 c8 66 90 66 90 66 90 66 90 66 90 66 90 <2e> 66 83 3e c8 61 00 0f 85 89 f0 31 d2 8e d2 66 bc 00 70 00 00 66 ba 63 fc 0e 00 e9 f3 ee
> >
> > ----------------------
> > [   80.806596] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
> > [  293.504118] BUG: unable to handle page fault for address: ffd43f00063049e8
> > [  293.515075] #PF: supervisor read access in kernel mode
> > [  293.524031] #PF: error_code(0x0000) - not-present page
> > [  293.532935] PGD 86dfd8067 P4D 0
> > [  293.539626] Oops: 0000 [#1] PREEMPT SMP
> > [  293.546958] CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G        W         5.18.0-rc6-kvm-upstream-workaround+ #82
> > [  293.565354] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021
>
> Thanks for the trace!  But please trim the superfluous information, e.g. the
> timestamps, registers, code stream, etc... aren't necessary to understand why the
> fault occured.
>
> > [  293.583639] RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
> > [  293.592911] Code: 25 28 00 00 00 48 89 45 f0 31 c0 48 8b 06 48 83 f8 ff 74 4a 48 c1 e0 0c 48 89 f3 48 c1 e8 18 48 c1 e0 06 48 03 05 e4 08 20 c2 <48> 8b 70 28 48 85 f6 74 41 80 7e 20 00 75 17 83 6e 48 01 75 18 f6
> > [  293.624056] RSP: 0018:ffa000000b3afb88 EFLAGS: 00010286
> > [  293.633326] RAX: ffd43f00063049c0 RBX: ff110000777ff000 RCX: 0000000000000001
> > [  293.644758] RDX: ffa000000b3afbc8 RSI: ff110000777ff000 RDI: ffa000000c211000
> > [  293.656132] RBP: ffa000000b3afba0 R08: 0000000000000100 R09: ffa000000b3afbe0
> > [  293.667480] R10: ffa000000b3afbe0 R11: 0000000000000000 R12: ffa000000c211000
> > [  293.678771] R13: ffa000000b3afbc8 R14: ff11000112e9c290 R15: 00000000ffffffef
> > [  293.690069] FS:  0000000000000000(0000) GS:ff1100084e300000(0000) knlGS:0000000000000000
> > [  293.702514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  293.712338] CR2: ffd43f00063049e8 CR3: 000000000260a006 CR4: 0000000000773ee0
> > [  293.723802] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  293.735230] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> > [  293.746597] PKRU: 55555554
> > [  293.752962] Call Trace:
> > [  293.758978]  <TASK>
> > [  293.764579]  kvm_mmu_free_roots+0xd1/0x200 [kvm]
> > [  293.773060]  __kvm_mmu_unload+0x29/0x70 [kvm]
> > [  293.781177]  kvm_mmu_unload+0x13/0x20 [kvm]
> > [  293.789012]  kvm_arch_destroy_vm+0x8a/0x190 [kvm]
> > [  293.797355]  kvm_put_kvm+0x197/0x2d0 [kvm]
> > [  293.804925]  kvm_vm_release+0x21/0x30 [kvm]
> > [  293.812499]  __fput+0x8e/0x260
> > [  293.818715]  ____fput+0xe/0x10
> > [  293.824822]  task_work_run+0x6f/0xb0
> > [  293.831433]  do_exit+0x327/0xa90
> > [  293.837586]  ? futex_unqueue+0x3f/0x70
>
> Lines with leading '?' should be dropped, they're "guesses" from the unwinder.
>
> > [  293.844283]  do_group_exit+0x35/0xa0
> > [  293.850770]  get_signal+0x911/0x930
> > [  293.857137]  arch_do_signal_or_restart+0x37/0x720
> > [  293.864896]  ? do_futex+0xf9/0x1a0
> > [  293.871139]  ? __x64_sys_futex+0x66/0x160
> > [  293.878001]  exit_to_user_mode_prepare+0xb2/0x140
> > [  293.885576]  syscall_exit_to_user_mode+0x16/0x30
> > [  293.892973]  do_syscall_64+0x4e/0x90
> > [  293.899162]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  293.906972] RIP: 0033:0x7f6c844f752d
>
> Everything below here can be dropped as it's not relevant to the original bug.
>
> E.g. the entire trace can be trimmed to:

Ah, I thought that the original trace carries most information
which maybe useful to other people. Let me trim them as you
suggested in V2, thanks.

>
>   set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
>   BUG: unable to handle page fault for address: ffd43f00063049e8
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 86dfd8067 P4D 0
>   Oops: 0000 [#1] PREEMPT SMP
>   CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G        W         5.18.0-rc6-kvm-upstream-workaround+ #82
>   Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021
>   RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
>   Call Trace:
>    <TASK>
>    kvm_mmu_free_roots+0xd1/0x200 [kvm]
>    __kvm_mmu_unload+0x29/0x70 [kvm]
>    kvm_mmu_unload+0x13/0x20 [kvm]
>    kvm_arch_destroy_vm+0x8a/0x190 [kvm]
>    kvm_put_kvm+0x197/0x2d0 [kvm]
>    kvm_vm_release+0x21/0x30 [kvm]
>    __fput+0x8e/0x260
>    ____fput+0xe/0x10
>    task_work_run+0x6f/0xb0
>    do_exit+0x327/0xa90
>    do_group_exit+0x35/0xa0
>    get_signal+0x911/0x930
>    arch_do_signal_or_restart+0x37/0x720
>    exit_to_user_mode_prepare+0xb2/0x140
>    syscall_exit_to_user_mode+0x16/0x30
>    do_syscall_64+0x4e/0x90
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> > [  293.913050] Code: Unable to access opcode bytes at RIP 0x7f6c844f7503.
> > [  293.922442] RSP: 002b:00007f6c7fbfe648 EFLAGS: 00000212 ORIG_RAX: 00000000000000ca
> > [  293.933048] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f6c844f752d
> > [  293.943161] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000557dd5a1ac58
> > [  293.953281] RBP: 00007f6c7fbfe670 R08: 0000000000000000 R09: 0000000000000000
> > [  293.963401] R10: 0000000000000000 R11: 0000000000000212 R12: 00007ffd0f3001be
> > [  293.973542] R13: 00007ffd0f3001bf R14: 00007ffd0f300280 R15: 00007f6c7fbfe880
> > [  293.983683]  </TASK>
> > [  293.988216] Modules linked in: kvm_intel kvm x86_pkg_temp_thermal snd_pcm input_leds snd_timer joydev led_class snd irqbypass efi_pstore soundcore mac_hid button sch_fq_codel ip_tables x_tables ixgbe mdio mdio_devres libphy igc xfrm_algo ptp pps_core efivarfs [last unloaded: kvm]
> > [  294.022648] CR2: ffd43f00063049e8
> > [  294.028694] ---[ end trace 0000000000000000 ]---
> > [  294.042573] RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
> > [  294.050908] Code: 25 28 00 00 00 48 89 45 f0 31 c0 48 8b 06 48 83 f8 ff 74 4a 48 c1 e0 0c 48 89 f3 48 c1 e8 18 48 c1 e0 06 48 03 05 e4 08 20 c2 <48> 8b 70 28 48 85 f6 74 41 80 7e 20 00 75 17 83 6e 48 01 75 18 f6
> > [  294.079460] RSP: 0018:ffa000000b3afb88 EFLAGS: 00010286
> > [  294.087908] RAX: ffd43f00063049c0 RBX: ff110000777ff000 RCX: 0000000000000001
> > [  294.098558] RDX: ffa000000b3afbc8 RSI: ff110000777ff000 RDI: ffa000000c211000
> > [  294.109193] RBP: ffa000000b3afba0 R08: 0000000000000100 R09: ffa000000b3afbe0
> > [  294.119831] R10: ffa000000b3afbe0 R11: 0000000000000000 R12: ffa000000c211000
> > [  294.130449] R13: ffa000000b3afbc8 R14: ff11000112e9c290 R15: 00000000ffffffef
> > [  294.141090] FS:  0000000000000000(0000) GS:ff1100084e300000(0000) knlGS:0000000000000000
> > [  294.152867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  294.162035] CR2: ffd43f00063049e8 CR3: 000000000260a006 CR4: 0000000000773ee0
> > [  294.172825] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  294.183618] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> > [  294.194390] PKRU: 55555554
> > [  294.200189] note: qemu-system-x86[4260] exited with preempt_count 1
> > [  294.210044] Fixing recursive fault but reboot is needed!
> > [  294.218854] BUG: scheduling while atomic: qemu-system-x86/4260/0x00000000
> > [  294.229357] Modules linked in: kvm_intel kvm x86_pkg_temp_thermal snd_pcm input_leds snd_timer joydev led_class snd irqbypass efi_pstore soundcore mac_hid button sch_fq_codel ip_tables x_tables ixgbe mdio mdio_devres libphy igc xfrm_algo ptp pps_core efivarfs [last unloaded: kvm]
> > [  294.266273] Preemption disabled at:
> > [  294.266273] [<ffffffff8109e404>] do_task_dead+0x24/0x50
> > [  294.282274] CPU: 164 PID: 4260 Comm: qemu-system-x86 Tainted: G      D W         5.18.0-rc6-kvm-upstream-workaround+ #82
> > [  294.300693] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0069.D14.2111291356 11/29/2021
> > [  294.319002] Call Trace:
> > [  294.325007]  <TASK>
> > [  294.330587]  dump_stack_lvl+0x38/0x49
> > [  294.337889]  ? do_task_dead+0x24/0x50
> > [  294.345102]  dump_stack+0x10/0x12
> > [  294.351836]  __schedule_bug.cold.156+0x7d/0x8e
> > [  294.359770]  __schedule+0x578/0x820
> > [  294.366552]  ? vprintk+0x52/0x80
> > [  294.373025]  ? _printk+0x58/0x6f
> > [  294.379449]  do_task_dead+0x44/0x50
> > [  294.386097]  make_task_dead.cold.48+0x50/0xaf
> > [  294.393650]  rewind_stack_and_make_dead+0x17/0x17
> > [  294.401549] RIP: 0033:0x7f6c844f752d
> > [  294.408147] Code: Unable to access opcode bytes at RIP 0x7f6c844f7503.
> > [  294.418086] RSP: 002b:00007f6c7fbfe648 EFLAGS: 00000212 ORIG_RAX: 00000000000000ca
> > [  294.429266] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f6c844f752d
> > [  294.439998] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000557dd5a1ac58
> > [  294.450748] RBP: 00007f6c7fbfe670 R08: 0000000000000000 R09: 0000000000000000
> > [  294.461498] R10: 0000000000000000 R11: 0000000000000212 R12: 00007ffd0f3001be
> > [  294.472199] R13: 00007ffd0f3001bf R14: 00007ffd0f300280 R15: 00007f6c7fbfe880
> > [  294.482869]  </TASK>
> >
> > Signed-off-by: Yuan Yao <yuan.yao@intel.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index efe5a3dca1e0..6bd144f1e60c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3411,7 +3411,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
> >  			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
> >  					      i << 30, PT32_ROOT_LEVEL, true);
> >  			mmu->pae_root[i] = root | PT_PRESENT_MASK |
> > -					   shadow_me_mask;
> > +					   shadow_me_value;
> >  		}
> >  		mmu->root.hpa = __pa(mmu->pae_root);
> >  	} else {
> > --
> > 2.27.0
> >
Sean Christopherson June 7, 2022, 11:38 p.m. UTC | #4
On Wed, Jun 08, 2022, Yuan Yao wrote:
> On Tue, Jun 07, 2022 at 02:58:57PM +0000, Sean Christopherson wrote:
> > Everything below here can be dropped as it's not relevant to the original bug.
> >
> > E.g. the entire trace can be trimmed to:
> 
> Ah, I thought that the original trace carries most information
> which maybe useful to other people. Let me trim them as you
> suggested in V2, thanks.

For bug reports, it's helpful to have the raw trace as the context is useful for
debug.  But for changelogs, the goal is only to document the failure signature,
e.g. so that reviewers understand what broke, users that encounter a similar splat
can find a possible fix, etc...
Yuan Yao June 8, 2022, 12:37 a.m. UTC | #5
On Tue, Jun 07, 2022 at 11:38:48PM +0000, Sean Christopherson wrote:
> On Wed, Jun 08, 2022, Yuan Yao wrote:
> > On Tue, Jun 07, 2022 at 02:58:57PM +0000, Sean Christopherson wrote:
> > > Everything below here can be dropped as it's not relevant to the original bug.
> > >
> > > E.g. the entire trace can be trimmed to:
> >
> > Ah, I thought that the original trace carries most information
> > which maybe useful to other people. Let me trim them as you
> > suggested in V2, thanks.
>
> For bug reports, it's helpful to have the raw trace as the context is useful for
> debug.  But for changelogs, the goal is only to document the failure signature,
> e.g. so that reviewers understand what broke, users that encounter a similar splat
> can find a possible fix, etc...

I got it, I will send a new v1 patch with new subject for this:
KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
diff mbox series

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efe5a3dca1e0..6bd144f1e60c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3411,7 +3411,7 @@  static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
 			mmu->pae_root[i] = root | PT_PRESENT_MASK |
-					   shadow_me_mask;
+					   shadow_me_value;
 		}
 		mmu->root.hpa = __pa(mmu->pae_root);
 	} else {