Message ID | 20211012072428.2569-1-dongli.zhang@oracle.com |
---|---|
Series | Fix the Xen HVM kdump/kexec boot panic issue |
On 12.10.21 09:24, Dongli Zhang wrote:
> When kdump/kexec is enabled in an HVM VM, a kernel panic traps
> to Xen with reason=soft_reset. As a result, Xen reboots the VM
> into the kdump kernel.
>
> Unfortunately, when the VM panics via the command line below ...
>
> "taskset -c 33 echo c > /proc/sysrq-trigger"
>
> ... the kdump kernel panics at an early stage ...
>
> PANIC: early exception 0x0e IP 10:ffffffffa8c66876 error 0 cr2 0x20
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc5xen #1
> [ 0.000000] Hardware name: Xen HVM domU
> [ 0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
> ... ...
> [ 0.000000] RSP: 0000:ffffffffaa203e20 EFLAGS: 00010082 ORIG_RAX: 0000000000000000
> [ 0.000000] RAX: 0000000000000003 RBX: 0000000000010000 RCX: 00000000ffffdfff
> [ 0.000000] RDX: 0000000000000003 RSI: 00000000ffffdfff RDI: 0000000000000020
> [ 0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
> [ 0.000000] R10: ffffffffaa203e00 R11: ffffffffaa203c70 R12: 0000000040000004
> [ 0.000000] R13: ffffffffaa203e5c R14: ffffffffaa203e58 R15: 0000000000000000
> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffffaa95e000(0000) knlGS:0000000000000000
> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.000000] CR2: 0000000000000020 CR3: 00000000ec9e0000 CR4: 00000000000406a0
> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 0.000000] Call Trace:
> [ 0.000000] ? xen_init_time_common+0x11/0x55
> [ 0.000000] ? xen_hvm_init_time_ops+0x23/0x45
> [ 0.000000] ? xen_hvm_guest_init+0x214/0x251
> [ 0.000000] ? 0xffffffffa8c00000
> [ 0.000000] ? setup_arch+0x440/0xbd6
> [ 0.000000] ? start_kernel+0x6a/0x689
> [ 0.000000] ? secondary_startup_64_no_verify+0xc2/0xcb
>
> This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
> structures embedded inside 'shared_info' during the early boot stage,
> until xen_vcpu_setup() allocates/relocates the boot CPU's 'vcpu_info'
> at an arbitrary address.
>
>
> The 1st patch fixes the issue on the VM kernel side. However, we may
> still observe clock drift in the VM because of an issue on the Xen
> hypervisor side: the PV vcpu_time_info is not updated on
> VCPUOP_register_vcpu_info.
>
> The 2nd patch calls force_update_vcpu_system_time() on the Xen side when
> handling VCPUOP_register_vcpu_info, to avoid VM clock drift during kdump
> kernel boot.

Please don't mix patches for multiple projects in one series.

In cases like this it is fine to mention the other project's patch
verbally instead.

Juergen
Hi Juergen,

On 10/12/21 1:47 AM, Juergen Gross wrote:
> On 12.10.21 09:24, Dongli Zhang wrote:
>> [...]
>>
>> The 1st patch fixes the issue on the VM kernel side. However, we may
>> still observe clock drift in the VM because of an issue on the Xen
>> hypervisor side: the PV vcpu_time_info is not updated on
>> VCPUOP_register_vcpu_info.
>>
>> The 2nd patch calls force_update_vcpu_system_time() on the Xen side when
>> handling VCPUOP_register_vcpu_info, to avoid VM clock drift during kdump
>> kernel boot.
>
> Please don't mix patches for multiple projects in one series.
>
> In cases like this it is fine to mention the other project's patch
> verbally instead.
>

I will split the patchset in v2 and send each patch to its respective project.

The core ideas of this combined patchset are:

1. Fix on the HVM domU side (kdump kernel panic).
2. Fix on the Xen hypervisor side (clock drift in the kdump kernel).
3. Report (and seek help with) the fact that soft_reset does not work with mainline Xen, so I am not able to test my patchset against the most recent mainline Xen.

Thank you very much!

Dongli Zhang