Message ID | 20230730234840.1989974-1-gshan@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | kvm: Fix crash by initializing kvm_state early | expand |
On 31.07.23 01:48, Gavin Shan wrote: > Runs into core dump on arm64 and the backtrace extracted from the > core dump is shown as below. It's caused by accessing @kvm_state which > isn't initialized at that point due to commit 176d073029 ("hw/arm/virt: > Use machine_memory_devices_init()"), where the machine's memory region > is added ealier than before. s/ealier/earlier/ > > main > qemu_init > configure_accelerators > qemu_opts_foreach > do_configure_accelerator > accel_init_machine > kvm_init > virt_kvm_type > virt_set_memmap > machine_memory_devices_init > memory_region_add_subregion > memory_region_add_subregion_common > memory_region_update_container_subregions > memory_region_transaction_begin > qemu_flush_coalesced_mmio_buffer > kvm_flush_coalesced_mmio_buffer > > Fix it by initializing @kvm_state early. With this applied, no crash > is observed on arm64. Interestingly, we register memory listeners in kvm_init() after setting kvm_state, so in theory it should have worked fine. But it's rather surprising that we see kvm_flush_coalesced_mmio_buffer() call even though we didn't even setup a listener with kvm_coalesce_mmio_region / kvm_uncoalesce_mmio_region. Such a notifier-specific flush might have been better placed in the MemoryListener->begin() call. But that needs more thought, as qemu_flush_coalesced_mmio_buffer() is called from a couple of places. > > Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()") > Signed-off-by: Gavin Shan <gshan@redhat.com> > --- > accel/kvm/kvm-all.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > index 373d876c05..c825cba12f 100644 > --- a/accel/kvm/kvm-all.c > +++ b/accel/kvm/kvm-all.c > @@ -2464,6 +2464,7 @@ static int kvm_init(MachineState *ms) > qemu_mutex_init(&kml_slots_lock); > > s = KVM_STATE(ms->accelerator); > + kvm_state = s; > > /* > * On systems where the kernel can support different base page > @@ -2695,8 +2696,6 @@ static int kvm_init(MachineState *ms) > #endif > } > > - kvm_state = s; > - > ret = kvm_arch_init(ms, s); > if (ret < 0) { > goto err; As an alternative, we might simply do nothing in kvm_flush_coalesced_mmio_buffer(), in case kvm_state is not setup yet. We don't have any notifier registered in that case. Thanks!
On Mon, 31 Jul 2023 at 08:18, David Hildenbrand <david@redhat.com> wrote: > > On 31.07.23 01:48, Gavin Shan wrote: > > Runs into core dump on arm64 and the backtrace extracted from the > > core dump is shown as below. It's caused by accessing @kvm_state which > > isn't initialized at that point due to commit 176d073029 ("hw/arm/virt: > > Use machine_memory_devices_init()"), where the machine's memory region > > is added ealier than before. > > s/ealier/earlier/ > > > > > main > > qemu_init > > configure_accelerators > > qemu_opts_foreach > > do_configure_accelerator > > accel_init_machine > > kvm_init > > virt_kvm_type > > virt_set_memmap > > machine_memory_devices_init > > memory_region_add_subregion > > memory_region_add_subregion_common > > memory_region_update_container_subregions > > memory_region_transaction_begin > > qemu_flush_coalesced_mmio_buffer > > kvm_flush_coalesced_mmio_buffer > > > > Fix it by initializing @kvm_state early. With this applied, no crash > > is observed on arm64. > As an alternative, we might simply do nothing in > kvm_flush_coalesced_mmio_buffer(), in case kvm_state is not setup yet. > We don't have any notifier registered in that case. Yes, this seems better I think -- conceptually kvm_init() probably ought to first set up the accelerator state and then set kvm_state last, so that other code that looks at the kvm_state global either sees NULL or else a completely valid state, not a possibly half-initialised one. (We should probably also NULL the global in the error-exit path, though I imagine we're about to exit in that case.) Is somebody able to write/test a patch for that today? Ideally we'd fix this for tomorrow's rc... thanks -- PMM
On 7/31/23 22:39, Peter Maydell wrote: > On Mon, 31 Jul 2023 at 08:18, David Hildenbrand <david@redhat.com> wrote: >> >> On 31.07.23 01:48, Gavin Shan wrote: >>> Runs into core dump on arm64 and the backtrace extracted from the >>> core dump is shown as below. It's caused by accessing @kvm_state which >>> isn't initialized at that point due to commit 176d073029 ("hw/arm/virt: >>> Use machine_memory_devices_init()"), where the machine's memory region >>> is added ealier than before. >> >> s/ealier/earlier/ >> >>> >>> main >>> qemu_init >>> configure_accelerators >>> qemu_opts_foreach >>> do_configure_accelerator >>> accel_init_machine >>> kvm_init >>> virt_kvm_type >>> virt_set_memmap >>> machine_memory_devices_init >>> memory_region_add_subregion >>> memory_region_add_subregion_common >>> memory_region_update_container_subregions >>> memory_region_transaction_begin >>> qemu_flush_coalesced_mmio_buffer >>> kvm_flush_coalesced_mmio_buffer >>> >>> Fix it by initializing @kvm_state early. With this applied, no crash >>> is observed on arm64. > >> As an alternative, we might simply do nothing in >> kvm_flush_coalesced_mmio_buffer(), in case kvm_state is not setup yet. >> We don't have any notifier registered in that case. > > Yes, this seems better I think -- conceptually kvm_init() > probably ought to first set up the accelerator state and > then set kvm_state last, so that other code that looks > at the kvm_state global either sees NULL or else a > completely valid state, not a possibly half-initialised > one. (We should probably also NULL the global in the > error-exit path, though I imagine we're about to exit > in that case.) > > Is somebody able to write/test a patch for that today? > Ideally we'd fix this for tomorrow's rc... > Thanks for your comments, David and Peter. v2 was posted for a quick merge. https://lists.nongnu.org/archive/html/qemu-arm/2023-07/msg00702.html Thanks, Gavin
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 373d876c05..c825cba12f 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -2464,6 +2464,7 @@ static int kvm_init(MachineState *ms) qemu_mutex_init(&kml_slots_lock); s = KVM_STATE(ms->accelerator); + kvm_state = s; /* * On systems where the kernel can support different base page @@ -2695,8 +2696,6 @@ static int kvm_init(MachineState *ms) #endif } - kvm_state = s; - ret = kvm_arch_init(ms, s); if (ret < 0) { goto err;
Runs into core dump on arm64 and the backtrace extracted from the core dump is shown as below. It's caused by accessing @kvm_state which isn't initialized at that point due to commit 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()"), where the machine's memory region is added ealier than before. main qemu_init configure_accelerators qemu_opts_foreach do_configure_accelerator accel_init_machine kvm_init virt_kvm_type virt_set_memmap machine_memory_devices_init memory_region_add_subregion memory_region_add_subregion_common memory_region_update_container_subregions memory_region_transaction_begin qemu_flush_coalesced_mmio_buffer kvm_flush_coalesced_mmio_buffer Fix it by initializing @kvm_state early. With this applied, no crash is observed on arm64. Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()") Signed-off-by: Gavin Shan <gshan@redhat.com> --- accel/kvm/kvm-all.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)