Message ID: 20201126215017.41156-3-agraf@csgraf.de (mailing list archive)
State: New, archived
Series: hvf: Implement Apple Silicon Support
On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote: > Until now, Hypervisor.framework has only been available on x86_64 systems. > With Apple Silicon shipping now, it extends its reach to aarch64. To > prepare for support for multiple architectures, let's move common code out > into its own accel directory. > > Signed-off-by: Alexander Graf <agraf@csgraf.de> > --- > MAINTAINERS | 9 +- > accel/hvf/hvf-all.c | 56 +++++ > accel/hvf/hvf-cpus.c | 468 ++++++++++++++++++++++++++++++++++++ > accel/hvf/meson.build | 7 + > accel/meson.build | 1 + > include/sysemu/hvf_int.h | 69 ++++++ > target/i386/hvf/hvf-cpus.c | 131 ---------- > target/i386/hvf/hvf-cpus.h | 25 -- > target/i386/hvf/hvf-i386.h | 48 +--- > target/i386/hvf/hvf.c | 360 +-------------------------- > target/i386/hvf/meson.build | 1 - > target/i386/hvf/x86hvf.c | 11 +- > target/i386/hvf/x86hvf.h | 2 - > 13 files changed, 619 insertions(+), 569 deletions(-) > create mode 100644 accel/hvf/hvf-all.c > create mode 100644 accel/hvf/hvf-cpus.c > create mode 100644 accel/hvf/meson.build > create mode 100644 include/sysemu/hvf_int.h > delete mode 100644 target/i386/hvf/hvf-cpus.c > delete mode 100644 target/i386/hvf/hvf-cpus.h > > diff --git a/MAINTAINERS b/MAINTAINERS > index 68bc160f41..ca4b6d9279 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com> > M: Roman Bolshakov <r.bolshakov@yadro.com> > W: https://wiki.qemu.org/Features/HVF > S: Maintained > -F: accel/stubs/hvf-stub.c There was a patch for that in the RFC series from Claudio. > F: target/i386/hvf/ > + > +HVF > +M: Cameron Esfahani <dirty@apple.com> > +M: Roman Bolshakov <r.bolshakov@yadro.com> > +W: https://wiki.qemu.org/Features/HVF > +S: Maintained > +F: accel/hvf/ > F: include/sysemu/hvf.h > +F: include/sysemu/hvf_int.h > > WHPX CPUs > M: Sunil Muthuswamy <sunilmut@microsoft.com> > diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c > new file mode 100644 > index 0000000000..47d77a472a > --- /dev/null > +++ b/accel/hvf/hvf-all.c > @@ -0,0 +1,56 @@ > +/* > + * QEMU Hypervisor.framework support > + * > + * This work is licensed under the terms of the GNU GPL, version 2. See > + * the COPYING file in the top-level directory. > + * > + * Contributions after 2012-01-13 are licensed under the terms of the > + * GNU GPL, version 2 or (at your option) any later version. 
> + */ > + > +#include "qemu/osdep.h" > +#include "qemu-common.h" > +#include "qemu/error-report.h" > +#include "sysemu/hvf.h" > +#include "sysemu/hvf_int.h" > +#include "sysemu/runstate.h" > + > +#include "qemu/main-loop.h" > +#include "sysemu/accel.h" > + > +#include <Hypervisor/Hypervisor.h> > + > +bool hvf_allowed; > +HVFState *hvf_state; > + > +void assert_hvf_ok(hv_return_t ret) > +{ > + if (ret == HV_SUCCESS) { > + return; > + } > + > + switch (ret) { > + case HV_ERROR: > + error_report("Error: HV_ERROR"); > + break; > + case HV_BUSY: > + error_report("Error: HV_BUSY"); > + break; > + case HV_BAD_ARGUMENT: > + error_report("Error: HV_BAD_ARGUMENT"); > + break; > + case HV_NO_RESOURCES: > + error_report("Error: HV_NO_RESOURCES"); > + break; > + case HV_NO_DEVICE: > + error_report("Error: HV_NO_DEVICE"); > + break; > + case HV_UNSUPPORTED: > + error_report("Error: HV_UNSUPPORTED"); > + break; > + default: > + error_report("Unknown Error"); > + } > + > + abort(); > +} > diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c > new file mode 100644 > index 0000000000..f9bb5502b7 > --- /dev/null > +++ b/accel/hvf/hvf-cpus.c > @@ -0,0 +1,468 @@ > +/* > + * Copyright 2008 IBM Corporation > + * 2008 Red Hat, Inc. > + * Copyright 2011 Intel Corporation > + * Copyright 2016 Veertu, Inc. > + * Copyright 2017 The Android Open Source Project > + * > + * QEMU Hypervisor.framework support > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of version 2 of the GNU General Public > + * License as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, see <http://www.gnu.org/licenses/>. > + * > + * This file contain code under public domain from the hvdos project: > + * https://github.com/mist64/hvdos > + * > + * Parts Copyright (c) 2011 NetApp, Inc. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > + * SUCH DAMAGE. 
> + */ > + > +#include "qemu/osdep.h" > +#include "qemu/error-report.h" > +#include "qemu/main-loop.h" > +#include "exec/address-spaces.h" > +#include "exec/exec-all.h" > +#include "sysemu/cpus.h" > +#include "sysemu/hvf.h" > +#include "sysemu/hvf_int.h" > +#include "sysemu/runstate.h" > +#include "qemu/guest-random.h" > + > +#include <Hypervisor/Hypervisor.h> > + > +/* Memory slots */ > + > +struct mac_slot { > + int present; > + uint64_t size; > + uint64_t gpa_start; > + uint64_t gva; > +}; > + > +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) > +{ > + hvf_slot *slot; > + int x; > + for (x = 0; x < hvf_state->num_slots; ++x) { > + slot = &hvf_state->slots[x]; > + if (slot->size && start < (slot->start + slot->size) && > + (start + size) > slot->start) { > + return slot; > + } > + } > + return NULL; > +} > + > +struct mac_slot mac_slots[32]; > + > +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) > +{ > + struct mac_slot *macslot; > + hv_return_t ret; > + > + macslot = &mac_slots[slot->slot_id]; > + > + if (macslot->present) { > + if (macslot->size != slot->size) { > + macslot->present = 0; > + ret = hv_vm_unmap(macslot->gpa_start, macslot->size); > + assert_hvf_ok(ret); > + } > + } > + > + if (!slot->size) { > + return 0; > + } > + > + macslot->present = 1; > + macslot->gpa_start = slot->start; > + macslot->size = slot->size; > + ret = hv_vm_map(slot->mem, slot->start, slot->size, flags); > + assert_hvf_ok(ret); > + return 0; > +} > + > +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add) > +{ > + hvf_slot *mem; > + MemoryRegion *area = section->mr; > + bool writeable = !area->readonly && !area->rom_device; > + hv_memory_flags_t flags; > + > + if (!memory_region_is_ram(area)) { > + if (writeable) { > + return; > + } else if (!memory_region_is_romd(area)) { > + /* > + * If the memory device is not in romd_mode, then we actually want > + * to remove the hvf memory slot so all accesses will trap. > + */ > + add = false; > + } > + } > + > + mem = hvf_find_overlap_slot( > + section->offset_within_address_space, > + int128_get64(section->size)); > + > + if (mem && add) { > + if (mem->size == int128_get64(section->size) && > + mem->start == section->offset_within_address_space && > + mem->mem == (memory_region_get_ram_ptr(area) + > + section->offset_within_region)) { > + return; /* Same region was attempted to register, go away. */ > + } > + } > + > + /* Region needs to be reset. set the size to 0 and remap it. */ > + if (mem) { > + mem->size = 0; > + if (do_hvf_set_memory(mem, 0)) { > + error_report("Failed to reset overlapping slot"); > + abort(); > + } > + } > + > + if (!add) { > + return; > + } > + > + if (area->readonly || > + (!memory_region_is_ram(area) && memory_region_is_romd(area))) { > + flags = HV_MEMORY_READ | HV_MEMORY_EXEC; > + } else { > + flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; > + } > + > + /* Now make a new slot. 
*/ > + int x; > + > + for (x = 0; x < hvf_state->num_slots; ++x) { > + mem = &hvf_state->slots[x]; > + if (!mem->size) { > + break; > + } > + } > + > + if (x == hvf_state->num_slots) { > + error_report("No free slots"); > + abort(); > + } > + > + mem->size = int128_get64(section->size); > + mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; > + mem->start = section->offset_within_address_space; > + mem->region = area; > + > + if (do_hvf_set_memory(mem, flags)) { > + error_report("Error registering new memory slot"); > + abort(); > + } > +} > + > +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) > +{ > + hvf_slot *slot; > + > + slot = hvf_find_overlap_slot( > + section->offset_within_address_space, > + int128_get64(section->size)); > + > + /* protect region against writes; begin tracking it */ > + if (on) { > + slot->flags |= HVF_SLOT_LOG; > + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, > + HV_MEMORY_READ); > + /* stop tracking region*/ > + } else { > + slot->flags &= ~HVF_SLOT_LOG; > + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, > + HV_MEMORY_READ | HV_MEMORY_WRITE); > + } > +} > + > +static void hvf_log_start(MemoryListener *listener, > + MemoryRegionSection *section, int old, int new) > +{ > + if (old != 0) { > + return; > + } > + > + hvf_set_dirty_tracking(section, 1); > +} > + > +static void hvf_log_stop(MemoryListener *listener, > + MemoryRegionSection *section, int old, int new) > +{ > + if (new != 0) { > + return; > + } > + > + hvf_set_dirty_tracking(section, 0); > +} > + > +static void hvf_log_sync(MemoryListener *listener, > + MemoryRegionSection *section) > +{ > + /* > + * sync of dirty pages is handled elsewhere; just make sure we keep > + * tracking the region. 
> + */ > + hvf_set_dirty_tracking(section, 1); > +} > + > +static void hvf_region_add(MemoryListener *listener, > + MemoryRegionSection *section) > +{ > + hvf_set_phys_mem(section, true); > +} > + > +static void hvf_region_del(MemoryListener *listener, > + MemoryRegionSection *section) > +{ > + hvf_set_phys_mem(section, false); > +} > + > +static MemoryListener hvf_memory_listener = { > + .priority = 10, > + .region_add = hvf_region_add, > + .region_del = hvf_region_del, > + .log_start = hvf_log_start, > + .log_stop = hvf_log_stop, > + .log_sync = hvf_log_sync, > +}; > + > +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) > +{ > + if (!cpu->vcpu_dirty) { > + hvf_get_registers(cpu); > + cpu->vcpu_dirty = true; > + } > +} > + > +static void hvf_cpu_synchronize_state(CPUState *cpu) > +{ > + if (!cpu->vcpu_dirty) { > + run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); > + } > +} > + > +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, > + run_on_cpu_data arg) > +{ > + hvf_put_registers(cpu); > + cpu->vcpu_dirty = false; > +} > + > +static void hvf_cpu_synchronize_post_reset(CPUState *cpu) > +{ > + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); > +} > + > +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, > + run_on_cpu_data arg) > +{ > + hvf_put_registers(cpu); > + cpu->vcpu_dirty = false; > +} > + > +static void hvf_cpu_synchronize_post_init(CPUState *cpu) > +{ > + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); > +} > + > +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, > + run_on_cpu_data arg) > +{ > + cpu->vcpu_dirty = true; > +} > + > +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) > +{ > + run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); > +} > + > +static void hvf_vcpu_destroy(CPUState *cpu) > +{ > + hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd); > + assert_hvf_ok(ret); > + > + hvf_arch_vcpu_destroy(cpu); > +} > + > +static void dummy_signal(int sig) > +{ > +} > + > +static int hvf_init_vcpu(CPUState *cpu) > +{ > + int r; > + > + /* init cpu signals */ > + sigset_t set; > + struct sigaction sigact; > + > + memset(&sigact, 0, sizeof(sigact)); > + sigact.sa_handler = dummy_signal; > + sigaction(SIG_IPI, &sigact, NULL); > + > + pthread_sigmask(SIG_BLOCK, NULL, &set); > + sigdelset(&set, SIG_IPI); > + > +#ifdef __aarch64__ > + r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL); > +#else > + r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); > +#endif I think the first __aarch64__ bit fits better in the ARM part of the series. > + cpu->vcpu_dirty = 1; > + assert_hvf_ok(r); > + > + return hvf_arch_init_vcpu(cpu); > +} > + > +/* > + * The HVF-specific vCPU thread function. This one should only run when the host > + * CPU supports the VMX "unrestricted guest" feature.
> + */ > +static void *hvf_cpu_thread_fn(void *arg) > +{ > + CPUState *cpu = arg; > + > + int r; > + > + assert(hvf_enabled()); > + > + rcu_register_thread(); > + > + qemu_mutex_lock_iothread(); > + qemu_thread_get_self(cpu->thread); > + > + cpu->thread_id = qemu_get_thread_id(); > + cpu->can_do_io = 1; > + current_cpu = cpu; > + > + hvf_init_vcpu(cpu); > + > + /* signal CPU creation */ > + cpu_thread_signal_created(cpu); > + qemu_guest_random_seed_thread_part2(cpu->random_seed); > + > + do { > + if (cpu_can_run(cpu)) { > + r = hvf_vcpu_exec(cpu); > + if (r == EXCP_DEBUG) { > + cpu_handle_guest_debug(cpu); > + } > + } > + qemu_wait_io_event(cpu); > + } while (!cpu->unplug || cpu_can_run(cpu)); > + > + hvf_vcpu_destroy(cpu); > + cpu_thread_signal_destroyed(cpu); > + qemu_mutex_unlock_iothread(); > + rcu_unregister_thread(); > + return NULL; > +} > + > +static void hvf_start_vcpu_thread(CPUState *cpu) > +{ > + char thread_name[VCPU_THREAD_NAME_SIZE]; > + > + /* > + * HVF currently does not support TCG, and only runs in > + * unrestricted-guest mode. > + */ > + assert(hvf_enabled()); > + > + cpu->thread = g_malloc0(sizeof(QemuThread)); > + cpu->halt_cond = g_malloc0(sizeof(QemuCond)); > + qemu_cond_init(cpu->halt_cond); > + > + snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", > + cpu->cpu_index); > + qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, > + cpu, QEMU_THREAD_JOINABLE); > +} > + > +static const CpusAccel hvf_cpus = { > + .create_vcpu_thread = hvf_start_vcpu_thread, > + > + .synchronize_post_reset = hvf_cpu_synchronize_post_reset, > + .synchronize_post_init = hvf_cpu_synchronize_post_init, > + .synchronize_state = hvf_cpu_synchronize_state, > + .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, > +}; > + > +static int hvf_accel_init(MachineState *ms) > +{ > + int x; > + hv_return_t ret; > + HVFState *s; > + > + ret = hv_vm_create(HV_VM_DEFAULT); > + assert_hvf_ok(ret); > + > + s = g_new0(HVFState, 1); > + > + s->num_slots = 32; > + for (x = 0; x < s->num_slots; ++x) { > + s->slots[x].size = 0; > + s->slots[x].slot_id = x; > + } > + > + hvf_state = s; > + memory_listener_register(&hvf_memory_listener, &address_space_memory); > + cpus_register_accel(&hvf_cpus); > + return 0; > +} > + > +static void hvf_accel_class_init(ObjectClass *oc, void *data) > +{ > + AccelClass *ac = ACCEL_CLASS(oc); > + ac->name = "HVF"; > + ac->init_machine = hvf_accel_init; > + ac->allowed = &hvf_allowed; > +} > + > +static const TypeInfo hvf_accel_type = { > + .name = TYPE_HVF_ACCEL, > + .parent = TYPE_ACCEL, > + .class_init = hvf_accel_class_init, > +}; > + > +static void hvf_type_init(void) > +{ > + type_register_static(&hvf_accel_type); > +} > + > +type_init(hvf_type_init); > diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build > new file mode 100644 > index 0000000000..dfd6b68dc7 > --- /dev/null > +++ b/accel/hvf/meson.build > @@ -0,0 +1,7 @@ > +hvf_ss = ss.source_set() > +hvf_ss.add(files( > + 'hvf-all.c', > + 'hvf-cpus.c', > +)) > + > +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss) > diff --git a/accel/meson.build b/accel/meson.build > index b26cca227a..6de12ce5d5 100644 > --- a/accel/meson.build > +++ b/accel/meson.build > @@ -1,5 +1,6 @@ > softmmu_ss.add(files('accel.c')) > > +subdir('hvf') > subdir('qtest') > subdir('kvm') > subdir('tcg') > diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h > new file mode 100644 > index 0000000000..de9bad23a8 > --- /dev/null > +++ b/include/sysemu/hvf_int.h > @@ -0,0 +1,69 @@ > +/* > + * QEMU 
Hypervisor.framework (HVF) support > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. > + * > + */ > + > +/* header to be included in HVF-specific code */ > + > +#ifndef HVF_INT_H > +#define HVF_INT_H > + > +#include <Hypervisor/Hypervisor.h> > + > +#define HVF_MAX_VCPU 0x10 > + > +extern struct hvf_state hvf_global; > + > +struct hvf_vm { > + int id; > + struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; > +}; > + > +struct hvf_state { > + uint32_t version; > + struct hvf_vm *vm; > + uint64_t mem_quota; > +}; > + > +/* hvf_slot flags */ > +#define HVF_SLOT_LOG (1 << 0) > + > +typedef struct hvf_slot { > + uint64_t start; > + uint64_t size; > + uint8_t *mem; > + int slot_id; > + uint32_t flags; > + MemoryRegion *region; > +} hvf_slot; > + > +typedef struct hvf_vcpu_caps { > + uint64_t vmx_cap_pinbased; > + uint64_t vmx_cap_procbased; > + uint64_t vmx_cap_procbased2; > + uint64_t vmx_cap_entry; > + uint64_t vmx_cap_exit; > + uint64_t vmx_cap_preemption_timer; > +} hvf_vcpu_caps; > + > +struct HVFState { > + AccelState parent; > + hvf_slot slots[32]; > + int num_slots; > + > + hvf_vcpu_caps *hvf_caps; > +}; > +extern HVFState *hvf_state; > + > +void assert_hvf_ok(hv_return_t ret); > +int hvf_get_registers(CPUState *cpu); > +int hvf_put_registers(CPUState *cpu); > +int hvf_arch_init_vcpu(CPUState *cpu); > +void hvf_arch_vcpu_destroy(CPUState *cpu); > +int hvf_vcpu_exec(CPUState *cpu); > +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); > + > +#endif > diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c > deleted file mode 100644 > index 817b3d7452..0000000000 > --- a/target/i386/hvf/hvf-cpus.c > +++ /dev/null > @@ -1,131 +0,0 @@ > -/* > - * Copyright 2008 IBM Corporation > - * 2008 Red Hat, Inc. > - * Copyright 2011 Intel Corporation > - * Copyright 2016 Veertu, Inc. > - * Copyright 2017 The Android Open Source Project > - * > - * QEMU Hypervisor.framework support > - * > - * This program is free software; you can redistribute it and/or > - * modify it under the terms of version 2 of the GNU General Public > - * License as published by the Free Software Foundation. > - * > - * This program is distributed in the hope that it will be useful, > - * but WITHOUT ANY WARRANTY; without even the implied warranty of > - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - * General Public License for more details. > - * > - * You should have received a copy of the GNU General Public License > - * along with this program; if not, see <http://www.gnu.org/licenses/>. > - * > - * This file contain code under public domain from the hvdos project: > - * https://github.com/mist64/hvdos > - * > - * Parts Copyright (c) 2011 NetApp, Inc. > - * All rights reserved. > - * > - * Redistribution and use in source and binary forms, with or without > - * modification, are permitted provided that the following conditions > - * are met: > - * 1. Redistributions of source code must retain the above copyright > - * notice, this list of conditions and the following disclaimer. > - * 2. Redistributions in binary form must reproduce the above copyright > - * notice, this list of conditions and the following disclaimer in the > - * documentation and/or other materials provided with the distribution. 
> - * > - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND > - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE > - * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE > - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS > - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT > - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY > - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > - * SUCH DAMAGE. > - */ > - > -#include "qemu/osdep.h" > -#include "qemu/error-report.h" > -#include "qemu/main-loop.h" > -#include "sysemu/hvf.h" > -#include "sysemu/runstate.h" > -#include "target/i386/cpu.h" > -#include "qemu/guest-random.h" > - > -#include "hvf-cpus.h" > - > -/* > - * The HVF-specific vCPU thread function. This one should only run when the host > - * CPU supports the VMX "unrestricted guest" feature. > - */ > -static void *hvf_cpu_thread_fn(void *arg) > -{ > - CPUState *cpu = arg; > - > - int r; > - > - assert(hvf_enabled()); > - > - rcu_register_thread(); > - > - qemu_mutex_lock_iothread(); > - qemu_thread_get_self(cpu->thread); > - > - cpu->thread_id = qemu_get_thread_id(); > - cpu->can_do_io = 1; > - current_cpu = cpu; > - > - hvf_init_vcpu(cpu); > - > - /* signal CPU creation */ > - cpu_thread_signal_created(cpu); > - qemu_guest_random_seed_thread_part2(cpu->random_seed); > - > - do { > - if (cpu_can_run(cpu)) { > - r = hvf_vcpu_exec(cpu); > - if (r == EXCP_DEBUG) { > - cpu_handle_guest_debug(cpu); > - } > - } > - qemu_wait_io_event(cpu); > - } while (!cpu->unplug || cpu_can_run(cpu)); > - > - hvf_vcpu_destroy(cpu); > - cpu_thread_signal_destroyed(cpu); > - qemu_mutex_unlock_iothread(); > - rcu_unregister_thread(); > - return NULL; > -} > - > -static void hvf_start_vcpu_thread(CPUState *cpu) > -{ > - char thread_name[VCPU_THREAD_NAME_SIZE]; > - > - /* > - * HVF currently does not support TCG, and only runs in > - * unrestricted-guest mode. > - */ > - assert(hvf_enabled()); > - > - cpu->thread = g_malloc0(sizeof(QemuThread)); > - cpu->halt_cond = g_malloc0(sizeof(QemuCond)); > - qemu_cond_init(cpu->halt_cond); > - > - snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", > - cpu->cpu_index); > - qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, > - cpu, QEMU_THREAD_JOINABLE); > -} > - > -const CpusAccel hvf_cpus = { > - .create_vcpu_thread = hvf_start_vcpu_thread, > - > - .synchronize_post_reset = hvf_cpu_synchronize_post_reset, > - .synchronize_post_init = hvf_cpu_synchronize_post_init, > - .synchronize_state = hvf_cpu_synchronize_state, > - .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, > -}; > diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h > deleted file mode 100644 > index ced31b82c0..0000000000 > --- a/target/i386/hvf/hvf-cpus.h > +++ /dev/null > @@ -1,25 +0,0 @@ > -/* > - * Accelerator CPUS Interface > - * > - * Copyright 2020 SUSE LLC > - * > - * This work is licensed under the terms of the GNU GPL, version 2 or later. > - * See the COPYING file in the top-level directory. 
> - */ > - > -#ifndef HVF_CPUS_H > -#define HVF_CPUS_H > - > -#include "sysemu/cpus.h" > - > -extern const CpusAccel hvf_cpus; > - > -int hvf_init_vcpu(CPUState *); > -int hvf_vcpu_exec(CPUState *); > -void hvf_cpu_synchronize_state(CPUState *); > -void hvf_cpu_synchronize_post_reset(CPUState *); > -void hvf_cpu_synchronize_post_init(CPUState *); > -void hvf_cpu_synchronize_pre_loadvm(CPUState *); > -void hvf_vcpu_destroy(CPUState *); > - > -#endif /* HVF_CPUS_H */ > diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h > index e0edffd077..6d56f8f6bb 100644 > --- a/target/i386/hvf/hvf-i386.h > +++ b/target/i386/hvf/hvf-i386.h > @@ -18,57 +18,11 @@ > > #include "sysemu/accel.h" > #include "sysemu/hvf.h" > +#include "sysemu/hvf_int.h" > #include "cpu.h" > #include "x86.h" > > -#define HVF_MAX_VCPU 0x10 > - > -extern struct hvf_state hvf_global; > - > -struct hvf_vm { > - int id; > - struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; > -}; > - > -struct hvf_state { > - uint32_t version; > - struct hvf_vm *vm; > - uint64_t mem_quota; > -}; > - > -/* hvf_slot flags */ > -#define HVF_SLOT_LOG (1 << 0) > - > -typedef struct hvf_slot { > - uint64_t start; > - uint64_t size; > - uint8_t *mem; > - int slot_id; > - uint32_t flags; > - MemoryRegion *region; > -} hvf_slot; > - > -typedef struct hvf_vcpu_caps { > - uint64_t vmx_cap_pinbased; > - uint64_t vmx_cap_procbased; > - uint64_t vmx_cap_procbased2; > - uint64_t vmx_cap_entry; > - uint64_t vmx_cap_exit; > - uint64_t vmx_cap_preemption_timer; > -} hvf_vcpu_caps; > - > -struct HVFState { > - AccelState parent; > - hvf_slot slots[32]; > - int num_slots; > - > - hvf_vcpu_caps *hvf_caps; > -}; > -extern HVFState *hvf_state; > - > -void hvf_set_phys_mem(MemoryRegionSection *, bool); > void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int); > -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); > > #ifdef NEED_CPU_H > /* Functions exported to host specific mode */ > diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c > index ed9356565c..8b96ecd619 100644 > --- a/target/i386/hvf/hvf.c > +++ b/target/i386/hvf/hvf.c > @@ -51,6 +51,7 @@ > #include "qemu/error-report.h" > > #include "sysemu/hvf.h" > +#include "sysemu/hvf_int.h" > #include "sysemu/runstate.h" > #include "hvf-i386.h" > #include "vmcs.h" > @@ -72,171 +73,6 @@ > #include "sysemu/accel.h" > #include "target/i386/cpu.h" > > -#include "hvf-cpus.h" > - > -HVFState *hvf_state; > - > -static void assert_hvf_ok(hv_return_t ret) > -{ > - if (ret == HV_SUCCESS) { > - return; > - } > - > - switch (ret) { > - case HV_ERROR: > - error_report("Error: HV_ERROR"); > - break; > - case HV_BUSY: > - error_report("Error: HV_BUSY"); > - break; > - case HV_BAD_ARGUMENT: > - error_report("Error: HV_BAD_ARGUMENT"); > - break; > - case HV_NO_RESOURCES: > - error_report("Error: HV_NO_RESOURCES"); > - break; > - case HV_NO_DEVICE: > - error_report("Error: HV_NO_DEVICE"); > - break; > - case HV_UNSUPPORTED: > - error_report("Error: HV_UNSUPPORTED"); > - break; > - default: > - error_report("Unknown Error"); > - } > - > - abort(); > -} > - > -/* Memory slots */ > -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) > -{ > - hvf_slot *slot; > - int x; > - for (x = 0; x < hvf_state->num_slots; ++x) { > - slot = &hvf_state->slots[x]; > - if (slot->size && start < (slot->start + slot->size) && > - (start + size) > slot->start) { > - return slot; > - } > - } > - return NULL; > -} > - > -struct mac_slot { > - int present; > - uint64_t size; > - uint64_t gpa_start; > - uint64_t gva; > -}; > 
- > -struct mac_slot mac_slots[32]; > - > -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) > -{ > - struct mac_slot *macslot; > - hv_return_t ret; > - > - macslot = &mac_slots[slot->slot_id]; > - > - if (macslot->present) { > - if (macslot->size != slot->size) { > - macslot->present = 0; > - ret = hv_vm_unmap(macslot->gpa_start, macslot->size); > - assert_hvf_ok(ret); > - } > - } > - > - if (!slot->size) { > - return 0; > - } > - > - macslot->present = 1; > - macslot->gpa_start = slot->start; > - macslot->size = slot->size; > - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags); > - assert_hvf_ok(ret); > - return 0; > -} > - > -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) > -{ > - hvf_slot *mem; > - MemoryRegion *area = section->mr; > - bool writeable = !area->readonly && !area->rom_device; > - hv_memory_flags_t flags; > - > - if (!memory_region_is_ram(area)) { > - if (writeable) { > - return; > - } else if (!memory_region_is_romd(area)) { > - /* > - * If the memory device is not in romd_mode, then we actually want > - * to remove the hvf memory slot so all accesses will trap. > - */ > - add = false; > - } > - } > - > - mem = hvf_find_overlap_slot( > - section->offset_within_address_space, > - int128_get64(section->size)); > - > - if (mem && add) { > - if (mem->size == int128_get64(section->size) && > - mem->start == section->offset_within_address_space && > - mem->mem == (memory_region_get_ram_ptr(area) + > - section->offset_within_region)) { > - return; /* Same region was attempted to register, go away. */ > - } > - } > - > - /* Region needs to be reset. set the size to 0 and remap it. */ > - if (mem) { > - mem->size = 0; > - if (do_hvf_set_memory(mem, 0)) { > - error_report("Failed to reset overlapping slot"); > - abort(); > - } > - } > - > - if (!add) { > - return; > - } > - > - if (area->readonly || > - (!memory_region_is_ram(area) && memory_region_is_romd(area))) { > - flags = HV_MEMORY_READ | HV_MEMORY_EXEC; > - } else { > - flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; > - } > - > - /* Now make a new slot. 
*/ > - int x; > - > - for (x = 0; x < hvf_state->num_slots; ++x) { > - mem = &hvf_state->slots[x]; > - if (!mem->size) { > - break; > - } > - } > - > - if (x == hvf_state->num_slots) { > - error_report("No free slots"); > - abort(); > - } > - > - mem->size = int128_get64(section->size); > - mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; > - mem->start = section->offset_within_address_space; > - mem->region = area; > - > - if (do_hvf_set_memory(mem, flags)) { > - error_report("Error registering new memory slot"); > - abort(); > - } > -} > - > void vmx_update_tpr(CPUState *cpu) > { > /* TODO: need integrate APIC handling */ > @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer, > } > } > > -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) > -{ > - if (!cpu->vcpu_dirty) { > - hvf_get_registers(cpu); > - cpu->vcpu_dirty = true; > - } > -} > - > -void hvf_cpu_synchronize_state(CPUState *cpu) > -{ > - if (!cpu->vcpu_dirty) { > - run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); > - } > -} > - > -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, > - run_on_cpu_data arg) > -{ > - hvf_put_registers(cpu); > - cpu->vcpu_dirty = false; > -} > - > -void hvf_cpu_synchronize_post_reset(CPUState *cpu) > -{ > - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); > -} > - > -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, > - run_on_cpu_data arg) > -{ > - hvf_put_registers(cpu); > - cpu->vcpu_dirty = false; > -} > - > -void hvf_cpu_synchronize_post_init(CPUState *cpu) > -{ > - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); > -} > - > -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, > - run_on_cpu_data arg) > -{ > - cpu->vcpu_dirty = true; > -} > - > -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) > -{ > - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); > -} > - > static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) > { > int read, write; > @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) > return false; > } > > -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) > -{ > - hvf_slot *slot; > - > - slot = hvf_find_overlap_slot( > - section->offset_within_address_space, > - int128_get64(section->size)); > - > - /* protect region against writes; begin tracking it */ > - if (on) { > - slot->flags |= HVF_SLOT_LOG; > - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, > - HV_MEMORY_READ); > - /* stop tracking region*/ > - } else { > - slot->flags &= ~HVF_SLOT_LOG; > - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, > - HV_MEMORY_READ | HV_MEMORY_WRITE); > - } > -} > - > -static void hvf_log_start(MemoryListener *listener, > - MemoryRegionSection *section, int old, int new) > -{ > - if (old != 0) { > - return; > - } > - > - hvf_set_dirty_tracking(section, 1); > -} > - > -static void hvf_log_stop(MemoryListener *listener, > - MemoryRegionSection *section, int old, int new) > -{ > - if (new != 0) { > - return; > - } > - > - hvf_set_dirty_tracking(section, 0); > -} > - > -static void hvf_log_sync(MemoryListener *listener, > - MemoryRegionSection *section) > -{ > - /* > - * sync of dirty pages is handled elsewhere; just make sure we keep > - * tracking the region. 
> - */ > - hvf_set_dirty_tracking(section, 1); > -} > - > -static void hvf_region_add(MemoryListener *listener, > - MemoryRegionSection *section) > -{ > - hvf_set_phys_mem(section, true); > -} > - > -static void hvf_region_del(MemoryListener *listener, > - MemoryRegionSection *section) > -{ > - hvf_set_phys_mem(section, false); > -} > - > -static MemoryListener hvf_memory_listener = { > - .priority = 10, > - .region_add = hvf_region_add, > - .region_del = hvf_region_del, > - .log_start = hvf_log_start, > - .log_stop = hvf_log_stop, > - .log_sync = hvf_log_sync, > -}; > - > -void hvf_vcpu_destroy(CPUState *cpu) > +void hvf_arch_vcpu_destroy(CPUState *cpu) > { > X86CPU *x86_cpu = X86_CPU(cpu); > CPUX86State *env = &x86_cpu->env; > > - hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd); > g_free(env->hvf_mmio_buf); > - assert_hvf_ok(ret); > -} > - > -static void dummy_signal(int sig) > -{ > } > > -int hvf_init_vcpu(CPUState *cpu) > +int hvf_arch_init_vcpu(CPUState *cpu) > { > > X86CPU *x86cpu = X86_CPU(cpu); > CPUX86State *env = &x86cpu->env; > - int r; > - > - /* init cpu signals */ > - sigset_t set; > - struct sigaction sigact; > - > - memset(&sigact, 0, sizeof(sigact)); > - sigact.sa_handler = dummy_signal; > - sigaction(SIG_IPI, &sigact, NULL); > - > - pthread_sigmask(SIG_BLOCK, NULL, &set); > - sigdelset(&set, SIG_IPI); > > init_emu(); > init_decoder(); > @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu) > hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1); > env->hvf_mmio_buf = g_new(char, 4096); > > - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); > - cpu->vcpu_dirty = 1; > - assert_hvf_ok(r); > - > if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED, > &hvf_state->hvf_caps->vmx_cap_pinbased)) { > abort(); > @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu) > > return ret; > } > - > -bool hvf_allowed; > - > -static int hvf_accel_init(MachineState *ms) > -{ > - int x; > - hv_return_t ret; > - HVFState *s; > - > - ret = hv_vm_create(HV_VM_DEFAULT); > - assert_hvf_ok(ret); > - > - s = g_new0(HVFState, 1); > - > - s->num_slots = 32; > - for (x = 0; x < s->num_slots; ++x) { > - s->slots[x].size = 0; > - s->slots[x].slot_id = x; > - } > - > - hvf_state = s; > - memory_listener_register(&hvf_memory_listener, &address_space_memory); > - cpus_register_accel(&hvf_cpus); > - return 0; > -} > - > -static void hvf_accel_class_init(ObjectClass *oc, void *data) > -{ > - AccelClass *ac = ACCEL_CLASS(oc); > - ac->name = "HVF"; > - ac->init_machine = hvf_accel_init; > - ac->allowed = &hvf_allowed; > -} > - > -static const TypeInfo hvf_accel_type = { > - .name = TYPE_HVF_ACCEL, > - .parent = TYPE_ACCEL, > - .class_init = hvf_accel_class_init, > -}; > - > -static void hvf_type_init(void) > -{ > - type_register_static(&hvf_accel_type); > -} > - > -type_init(hvf_type_init); > diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build > index 409c9a3f14..c8a43717ee 100644 > --- a/target/i386/hvf/meson.build > +++ b/target/i386/hvf/meson.build > @@ -1,6 +1,5 @@ > i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files( > 'hvf.c', > - 'hvf-cpus.c', > 'x86.c', > 'x86_cpuid.c', > 'x86_decode.c', > diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c > index bbec412b6c..89b8e9d87a 100644 > --- a/target/i386/hvf/x86hvf.c > +++ b/target/i386/hvf/x86hvf.c > @@ -20,6 +20,9 @@ > #include "qemu/osdep.h" > > #include "qemu-common.h" > +#include "sysemu/hvf.h" > +#include "sysemu/hvf_int.h" > +#include "sysemu/hw_accel.h" > #include "x86hvf.h" > #include 
"vmx.h" > #include "vmcs.h" > @@ -32,8 +35,6 @@ > #include <Hypervisor/hv.h> > #include <Hypervisor/hv_vmx.h> > > -#include "hvf-cpus.h" > - > void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, > SegmentCache *qseg, bool is_tr) > { > @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state) > env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); > > if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { > - hvf_cpu_synchronize_state(cpu_state); > + cpu_synchronize_state(cpu_state); > do_cpu_init(cpu); > } > > @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state) > cpu_state->halted = 0; > } > if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) { > - hvf_cpu_synchronize_state(cpu_state); > + cpu_synchronize_state(cpu_state); > do_cpu_sipi(cpu); > } > if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) { > cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR; > - hvf_cpu_synchronize_state(cpu_state); > + cpu_synchronize_state(cpu_state); The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should be a separate patch. It follows cpu/accel cleanups Claudio was doing the summer. Phillipe raised the idea that the patch might go ahead of ARM-specific part (which might involve some discussions) and I agree with that. Some sync between Claudio series (CC'd him) and the patch might be need. Thanks, Roman > apic_handle_tpr_access_report(cpu->apic_state, env->eip, > env->tpr_access_type); > } > diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h > index 635ab0f34e..99ed8d608d 100644 > --- a/target/i386/hvf/x86hvf.h > +++ b/target/i386/hvf/x86hvf.h > @@ -21,8 +21,6 @@ > #include "x86_descr.h" > > int hvf_process_events(CPUState *); > -int hvf_put_registers(CPUState *); > -int hvf_get_registers(CPUState *); > bool hvf_inject_interrupts(CPUState *); > void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, > SegmentCache *qseg, bool is_tr); > -- > 2.24.3 (Apple Git-128) > > >
On 27.11.20 21:00, Roman Bolshakov wrote:
> On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
>> Until now, Hypervisor.framework has only been available on x86_64 systems.
>> With Apple Silicon shipping now, it extends its reach to aarch64. To
>> prepare for support for multiple architectures, let's move common code out
>> into its own accel directory.
>>
>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
[...]
>> S: Maintained
>> -F: accel/stubs/hvf-stub.c
> There was a patch for that in the RFC series from Claudio.

Yeah, I'm not worried about this hunk :).

[...]
>> +#ifdef __aarch64__
>> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
>> +#else
>> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>> +#endif
> I think the first __aarch64__ bit fits better in the ARM part of the series.

Oops. Thanks for catching it! Yes, absolutely. It should be part of the ARM enablement.

[...]
+73,6 @@ >> #include "sysemu/accel.h" >> #include "target/i386/cpu.h" >> >> -#include "hvf-cpus.h" >> - >> -HVFState *hvf_state; >> - >> -static void assert_hvf_ok(hv_return_t ret) >> -{ >> - if (ret == HV_SUCCESS) { >> - return; >> - } >> - >> - switch (ret) { >> - case HV_ERROR: >> - error_report("Error: HV_ERROR"); >> - break; >> - case HV_BUSY: >> - error_report("Error: HV_BUSY"); >> - break; >> - case HV_BAD_ARGUMENT: >> - error_report("Error: HV_BAD_ARGUMENT"); >> - break; >> - case HV_NO_RESOURCES: >> - error_report("Error: HV_NO_RESOURCES"); >> - break; >> - case HV_NO_DEVICE: >> - error_report("Error: HV_NO_DEVICE"); >> - break; >> - case HV_UNSUPPORTED: >> - error_report("Error: HV_UNSUPPORTED"); >> - break; >> - default: >> - error_report("Unknown Error"); >> - } >> - >> - abort(); >> -} >> - >> -/* Memory slots */ >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) >> -{ >> - hvf_slot *slot; >> - int x; >> - for (x = 0; x < hvf_state->num_slots; ++x) { >> - slot = &hvf_state->slots[x]; >> - if (slot->size && start < (slot->start + slot->size) && >> - (start + size) > slot->start) { >> - return slot; >> - } >> - } >> - return NULL; >> -} >> - >> -struct mac_slot { >> - int present; >> - uint64_t size; >> - uint64_t gpa_start; >> - uint64_t gva; >> -}; >> - >> -struct mac_slot mac_slots[32]; >> - >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) >> -{ >> - struct mac_slot *macslot; >> - hv_return_t ret; >> - >> - macslot = &mac_slots[slot->slot_id]; >> - >> - if (macslot->present) { >> - if (macslot->size != slot->size) { >> - macslot->present = 0; >> - ret = hv_vm_unmap(macslot->gpa_start, macslot->size); >> - assert_hvf_ok(ret); >> - } >> - } >> - >> - if (!slot->size) { >> - return 0; >> - } >> - >> - macslot->present = 1; >> - macslot->gpa_start = slot->start; >> - macslot->size = slot->size; >> - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags); >> - assert_hvf_ok(ret); >> - return 0; >> -} >> - >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) >> -{ >> - hvf_slot *mem; >> - MemoryRegion *area = section->mr; >> - bool writeable = !area->readonly && !area->rom_device; >> - hv_memory_flags_t flags; >> - >> - if (!memory_region_is_ram(area)) { >> - if (writeable) { >> - return; >> - } else if (!memory_region_is_romd(area)) { >> - /* >> - * If the memory device is not in romd_mode, then we actually want >> - * to remove the hvf memory slot so all accesses will trap. >> - */ >> - add = false; >> - } >> - } >> - >> - mem = hvf_find_overlap_slot( >> - section->offset_within_address_space, >> - int128_get64(section->size)); >> - >> - if (mem && add) { >> - if (mem->size == int128_get64(section->size) && >> - mem->start == section->offset_within_address_space && >> - mem->mem == (memory_region_get_ram_ptr(area) + >> - section->offset_within_region)) { >> - return; /* Same region was attempted to register, go away. */ >> - } >> - } >> - >> - /* Region needs to be reset. set the size to 0 and remap it. */ >> - if (mem) { >> - mem->size = 0; >> - if (do_hvf_set_memory(mem, 0)) { >> - error_report("Failed to reset overlapping slot"); >> - abort(); >> - } >> - } >> - >> - if (!add) { >> - return; >> - } >> - >> - if (area->readonly || >> - (!memory_region_is_ram(area) && memory_region_is_romd(area))) { >> - flags = HV_MEMORY_READ | HV_MEMORY_EXEC; >> - } else { >> - flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; >> - } >> - >> - /* Now make a new slot. 
*/ >> - int x; >> - >> - for (x = 0; x < hvf_state->num_slots; ++x) { >> - mem = &hvf_state->slots[x]; >> - if (!mem->size) { >> - break; >> - } >> - } >> - >> - if (x == hvf_state->num_slots) { >> - error_report("No free slots"); >> - abort(); >> - } >> - >> - mem->size = int128_get64(section->size); >> - mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; >> - mem->start = section->offset_within_address_space; >> - mem->region = area; >> - >> - if (do_hvf_set_memory(mem, flags)) { >> - error_report("Error registering new memory slot"); >> - abort(); >> - } >> -} >> - >> void vmx_update_tpr(CPUState *cpu) >> { >> /* TODO: need integrate APIC handling */ >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer, >> } >> } >> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) >> -{ >> - if (!cpu->vcpu_dirty) { >> - hvf_get_registers(cpu); >> - cpu->vcpu_dirty = true; >> - } >> -} >> - >> -void hvf_cpu_synchronize_state(CPUState *cpu) >> -{ >> - if (!cpu->vcpu_dirty) { >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); >> - } >> -} >> - >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, >> - run_on_cpu_data arg) >> -{ >> - hvf_put_registers(cpu); >> - cpu->vcpu_dirty = false; >> -} >> - >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu) >> -{ >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); >> -} >> - >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, >> - run_on_cpu_data arg) >> -{ >> - hvf_put_registers(cpu); >> - cpu->vcpu_dirty = false; >> -} >> - >> -void hvf_cpu_synchronize_post_init(CPUState *cpu) >> -{ >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); >> -} >> - >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, >> - run_on_cpu_data arg) >> -{ >> - cpu->vcpu_dirty = true; >> -} >> - >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) >> -{ >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); >> -} >> - >> static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) >> { >> int read, write; >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) >> return false; >> } >> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) >> -{ >> - hvf_slot *slot; >> - >> - slot = hvf_find_overlap_slot( >> - section->offset_within_address_space, >> - int128_get64(section->size)); >> - >> - /* protect region against writes; begin tracking it */ >> - if (on) { >> - slot->flags |= HVF_SLOT_LOG; >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >> - HV_MEMORY_READ); >> - /* stop tracking region*/ >> - } else { >> - slot->flags &= ~HVF_SLOT_LOG; >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >> - HV_MEMORY_READ | HV_MEMORY_WRITE); >> - } >> -} >> - >> -static void hvf_log_start(MemoryListener *listener, >> - MemoryRegionSection *section, int old, int new) >> -{ >> - if (old != 0) { >> - return; >> - } >> - >> - hvf_set_dirty_tracking(section, 1); >> -} >> - >> -static void hvf_log_stop(MemoryListener *listener, >> - MemoryRegionSection *section, int old, int new) >> -{ >> - if (new != 0) { >> - return; >> - } >> - >> - hvf_set_dirty_tracking(section, 0); >> -} >> - >> -static void hvf_log_sync(MemoryListener *listener, >> - MemoryRegionSection *section) >> -{ >> - /* >> - * sync of dirty pages is handled elsewhere; just make sure we keep >> - * tracking the 
region. >> - */ >> - hvf_set_dirty_tracking(section, 1); >> -} >> - >> -static void hvf_region_add(MemoryListener *listener, >> - MemoryRegionSection *section) >> -{ >> - hvf_set_phys_mem(section, true); >> -} >> - >> -static void hvf_region_del(MemoryListener *listener, >> - MemoryRegionSection *section) >> -{ >> - hvf_set_phys_mem(section, false); >> -} >> - >> -static MemoryListener hvf_memory_listener = { >> - .priority = 10, >> - .region_add = hvf_region_add, >> - .region_del = hvf_region_del, >> - .log_start = hvf_log_start, >> - .log_stop = hvf_log_stop, >> - .log_sync = hvf_log_sync, >> -}; >> - >> -void hvf_vcpu_destroy(CPUState *cpu) >> +void hvf_arch_vcpu_destroy(CPUState *cpu) >> { >> X86CPU *x86_cpu = X86_CPU(cpu); >> CPUX86State *env = &x86_cpu->env; >> >> - hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd); >> g_free(env->hvf_mmio_buf); >> - assert_hvf_ok(ret); >> -} >> - >> -static void dummy_signal(int sig) >> -{ >> } >> >> -int hvf_init_vcpu(CPUState *cpu) >> +int hvf_arch_init_vcpu(CPUState *cpu) >> { >> >> X86CPU *x86cpu = X86_CPU(cpu); >> CPUX86State *env = &x86cpu->env; >> - int r; >> - >> - /* init cpu signals */ >> - sigset_t set; >> - struct sigaction sigact; >> - >> - memset(&sigact, 0, sizeof(sigact)); >> - sigact.sa_handler = dummy_signal; >> - sigaction(SIG_IPI, &sigact, NULL); >> - >> - pthread_sigmask(SIG_BLOCK, NULL, &set); >> - sigdelset(&set, SIG_IPI); >> >> init_emu(); >> init_decoder(); >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu) >> hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1); >> env->hvf_mmio_buf = g_new(char, 4096); >> >> - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); >> - cpu->vcpu_dirty = 1; >> - assert_hvf_ok(r); >> - >> if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED, >> &hvf_state->hvf_caps->vmx_cap_pinbased)) { >> abort(); >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu) >> >> return ret; >> } >> - >> -bool hvf_allowed; >> - >> -static int hvf_accel_init(MachineState *ms) >> -{ >> - int x; >> - hv_return_t ret; >> - HVFState *s; >> - >> - ret = hv_vm_create(HV_VM_DEFAULT); >> - assert_hvf_ok(ret); >> - >> - s = g_new0(HVFState, 1); >> - >> - s->num_slots = 32; >> - for (x = 0; x < s->num_slots; ++x) { >> - s->slots[x].size = 0; >> - s->slots[x].slot_id = x; >> - } >> - >> - hvf_state = s; >> - memory_listener_register(&hvf_memory_listener, &address_space_memory); >> - cpus_register_accel(&hvf_cpus); >> - return 0; >> -} >> - >> -static void hvf_accel_class_init(ObjectClass *oc, void *data) >> -{ >> - AccelClass *ac = ACCEL_CLASS(oc); >> - ac->name = "HVF"; >> - ac->init_machine = hvf_accel_init; >> - ac->allowed = &hvf_allowed; >> -} >> - >> -static const TypeInfo hvf_accel_type = { >> - .name = TYPE_HVF_ACCEL, >> - .parent = TYPE_ACCEL, >> - .class_init = hvf_accel_class_init, >> -}; >> - >> -static void hvf_type_init(void) >> -{ >> - type_register_static(&hvf_accel_type); >> -} >> - >> -type_init(hvf_type_init); >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build >> index 409c9a3f14..c8a43717ee 100644 >> --- a/target/i386/hvf/meson.build >> +++ b/target/i386/hvf/meson.build >> @@ -1,6 +1,5 @@ >> i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files( >> 'hvf.c', >> - 'hvf-cpus.c', >> 'x86.c', >> 'x86_cpuid.c', >> 'x86_decode.c', >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c >> index bbec412b6c..89b8e9d87a 100644 >> --- a/target/i386/hvf/x86hvf.c >> +++ b/target/i386/hvf/x86hvf.c >> @@ -20,6 +20,9 @@ >> #include "qemu/osdep.h" >> >> 
#include "qemu-common.h" >> +#include "sysemu/hvf.h" >> +#include "sysemu/hvf_int.h" >> +#include "sysemu/hw_accel.h" >> #include "x86hvf.h" >> #include "vmx.h" >> #include "vmcs.h" >> @@ -32,8 +35,6 @@ >> #include <Hypervisor/hv.h> >> #include <Hypervisor/hv_vmx.h> >> >> -#include "hvf-cpus.h" >> - >> void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, >> SegmentCache *qseg, bool is_tr) >> { >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state) >> env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); >> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { >> - hvf_cpu_synchronize_state(cpu_state); >> + cpu_synchronize_state(cpu_state); >> do_cpu_init(cpu); >> } >> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state) >> cpu_state->halted = 0; >> } >> if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) { >> - hvf_cpu_synchronize_state(cpu_state); >> + cpu_synchronize_state(cpu_state); >> do_cpu_sipi(cpu); >> } >> if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) { >> cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR; >> - hvf_cpu_synchronize_state(cpu_state); >> + cpu_synchronize_state(cpu_state); > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should > be a separate patch. It follows cpu/accel cleanups Claudio was doing the > summer. The only reason they're in here is because we no longer have access to the hvf_ functions from the file. I am perfectly happy to rebase the patch on top of Claudio's if his goes in first. I'm sure it'll be trivial for him to rebase on top of this too if my series goes in first. > > Phillipe raised the idea that the patch might go ahead of ARM-specific > part (which might involve some discussions) and I agree with that. > > Some sync between Claudio series (CC'd him) and the patch might be need. I would prefer not to hold back because of the sync. Claudio's cleanup is trivial enough to adjust for if it gets merged ahead of this. Alex
Hi all, +Peter Collingbourne <pcc@google.com>

I'm a developer on the Android Emulator, which is in a fork of QEMU. Peter and I have been working on an HVF Apple Silicon backend with an eye toward Android guests. We have already gotten things booting into Android userspace (logcat/shell and graphics available, at least).

Our strategy so far has been to import logic from the KVM implementation and hook into QEMU's software devices that previously assumed they would only run under TCG or that had KVM-specific paths.

Thanks to Alexander for the tip on the 36-bit address space limitation, btw; that works out to 64 GiB (2^36) of guest address space, so our way of addressing it is to still allow highmem but to place the PCI high MMIO window low enough to fit under the limit.

Also, note we have a sleep/signal-based mechanism to deal with WFx, which might be worth looking into for Alexander's implementation as well (a standalone sketch of the idea is at the bottom of this thread):

https://android-review.googlesource.com/c/platform/external/qemu/+/1512551

Patches so far, FYI:

https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3
https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a
https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b
https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01
https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228
https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102
https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6

Peter has also noticed that extra steps are needed on M1s to allow TCG to work, since it involves JIT:

https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9

We'd appreciate any feedback/comments :)

Best,
Frank

On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote: > > On 27.11.20 21:00, Roman Bolshakov wrote: > > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote: > >> Until now, Hypervisor.framework has only been available on x86_64 > systems. > >> With Apple Silicon shipping now, it extends its reach to aarch64. To > >> prepare for support for multiple architectures, let's move common code > out > >> into its own accel directory.
> >> > >> Signed-off-by: Alexander Graf <agraf@csgraf.de> > >> --- > >> MAINTAINERS | 9 +- > >> accel/hvf/hvf-all.c | 56 +++++ > >> accel/hvf/hvf-cpus.c | 468 ++++++++++++++++++++++++++++++++++++ > >> accel/hvf/meson.build | 7 + > >> accel/meson.build | 1 + > >> include/sysemu/hvf_int.h | 69 ++++++ > >> target/i386/hvf/hvf-cpus.c | 131 ---------- > >> target/i386/hvf/hvf-cpus.h | 25 -- > >> target/i386/hvf/hvf-i386.h | 48 +--- > >> target/i386/hvf/hvf.c | 360 +-------------------------- > >> target/i386/hvf/meson.build | 1 - > >> target/i386/hvf/x86hvf.c | 11 +- > >> target/i386/hvf/x86hvf.h | 2 - > >> 13 files changed, 619 insertions(+), 569 deletions(-) > >> create mode 100644 accel/hvf/hvf-all.c > >> create mode 100644 accel/hvf/hvf-cpus.c > >> create mode 100644 accel/hvf/meson.build > >> create mode 100644 include/sysemu/hvf_int.h > >> delete mode 100644 target/i386/hvf/hvf-cpus.c > >> delete mode 100644 target/i386/hvf/hvf-cpus.h > >> > >> diff --git a/MAINTAINERS b/MAINTAINERS > >> index 68bc160f41..ca4b6d9279 100644 > >> --- a/MAINTAINERS > >> +++ b/MAINTAINERS > >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com> > >> M: Roman Bolshakov <r.bolshakov@yadro.com> > >> W: https://wiki.qemu.org/Features/HVF > >> S: Maintained > >> -F: accel/stubs/hvf-stub.c > > There was a patch for that in the RFC series from Claudio. > > > Yeah, I'm not worried about this hunk :). > > > > > >> F: target/i386/hvf/ > >> + > >> +HVF > >> +M: Cameron Esfahani <dirty@apple.com> > >> +M: Roman Bolshakov <r.bolshakov@yadro.com> > >> +W: https://wiki.qemu.org/Features/HVF > >> +S: Maintained > >> +F: accel/hvf/ > >> F: include/sysemu/hvf.h > >> +F: include/sysemu/hvf_int.h > >> > >> WHPX CPUs > >> M: Sunil Muthuswamy <sunilmut@microsoft.com> > >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c > >> new file mode 100644 > >> index 0000000000..47d77a472a > >> --- /dev/null > >> +++ b/accel/hvf/hvf-all.c > >> @@ -0,0 +1,56 @@ > >> +/* > >> + * QEMU Hypervisor.framework support > >> + * > >> + * This work is licensed under the terms of the GNU GPL, version 2. > See > >> + * the COPYING file in the top-level directory. > >> + * > >> + * Contributions after 2012-01-13 are licensed under the terms of the > >> + * GNU GPL, version 2 or (at your option) any later version. 
> >> + */ > >> + > >> +#include "qemu/osdep.h" > >> +#include "qemu-common.h" > >> +#include "qemu/error-report.h" > >> +#include "sysemu/hvf.h" > >> +#include "sysemu/hvf_int.h" > >> +#include "sysemu/runstate.h" > >> + > >> +#include "qemu/main-loop.h" > >> +#include "sysemu/accel.h" > >> + > >> +#include <Hypervisor/Hypervisor.h> > >> + > >> +bool hvf_allowed; > >> +HVFState *hvf_state; > >> + > >> +void assert_hvf_ok(hv_return_t ret) > >> +{ > >> + if (ret == HV_SUCCESS) { > >> + return; > >> + } > >> + > >> + switch (ret) { > >> + case HV_ERROR: > >> + error_report("Error: HV_ERROR"); > >> + break; > >> + case HV_BUSY: > >> + error_report("Error: HV_BUSY"); > >> + break; > >> + case HV_BAD_ARGUMENT: > >> + error_report("Error: HV_BAD_ARGUMENT"); > >> + break; > >> + case HV_NO_RESOURCES: > >> + error_report("Error: HV_NO_RESOURCES"); > >> + break; > >> + case HV_NO_DEVICE: > >> + error_report("Error: HV_NO_DEVICE"); > >> + break; > >> + case HV_UNSUPPORTED: > >> + error_report("Error: HV_UNSUPPORTED"); > >> + break; > >> + default: > >> + error_report("Unknown Error"); > >> + } > >> + > >> + abort(); > >> +} > >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c > >> new file mode 100644 > >> index 0000000000..f9bb5502b7 > >> --- /dev/null > >> +++ b/accel/hvf/hvf-cpus.c > >> @@ -0,0 +1,468 @@ > >> +/* > >> + * Copyright 2008 IBM Corporation > >> + * 2008 Red Hat, Inc. > >> + * Copyright 2011 Intel Corporation > >> + * Copyright 2016 Veertu, Inc. > >> + * Copyright 2017 The Android Open Source Project > >> + * > >> + * QEMU Hypervisor.framework support > >> + * > >> + * This program is free software; you can redistribute it and/or > >> + * modify it under the terms of version 2 of the GNU General Public > >> + * License as published by the Free Software Foundation. > >> + * > >> + * This program is distributed in the hope that it will be useful, > >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of > >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > >> + * General Public License for more details. > >> + * > >> + * You should have received a copy of the GNU General Public License > >> + * along with this program; if not, see <http://www.gnu.org/licenses/ > >. > >> + * > >> + * This file contain code under public domain from the hvdos project: > >> + * https://github.com/mist64/hvdos > >> + * > >> + * Parts Copyright (c) 2011 NetApp, Inc. > >> + * All rights reserved. > >> + * > >> + * Redistribution and use in source and binary forms, with or without > >> + * modification, are permitted provided that the following conditions > >> + * are met: > >> + * 1. Redistributions of source code must retain the above copyright > >> + * notice, this list of conditions and the following disclaimer. > >> + * 2. Redistributions in binary form must reproduce the above copyright > >> + * notice, this list of conditions and the following disclaimer in > the > >> + * documentation and/or other materials provided with the > distribution. > >> + * > >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND > >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > THE > >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > PURPOSE > >> + * ARE DISCLAIMED. 
IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE > LIABLE > >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > CONSEQUENTIAL > >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE > GOODS > >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS > INTERRUPTION) > >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, > STRICT > >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN > ANY WAY > >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY > OF > >> + * SUCH DAMAGE. > >> + */ > >> + > >> +#include "qemu/osdep.h" > >> +#include "qemu/error-report.h" > >> +#include "qemu/main-loop.h" > >> +#include "exec/address-spaces.h" > >> +#include "exec/exec-all.h" > >> +#include "sysemu/cpus.h" > >> +#include "sysemu/hvf.h" > >> +#include "sysemu/hvf_int.h" > >> +#include "sysemu/runstate.h" > >> +#include "qemu/guest-random.h" > >> + > >> +#include <Hypervisor/Hypervisor.h> > >> + > >> +/* Memory slots */ > >> + > >> +struct mac_slot { > >> + int present; > >> + uint64_t size; > >> + uint64_t gpa_start; > >> + uint64_t gva; > >> +}; > >> + > >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) > >> +{ > >> + hvf_slot *slot; > >> + int x; > >> + for (x = 0; x < hvf_state->num_slots; ++x) { > >> + slot = &hvf_state->slots[x]; > >> + if (slot->size && start < (slot->start + slot->size) && > >> + (start + size) > slot->start) { > >> + return slot; > >> + } > >> + } > >> + return NULL; > >> +} > >> + > >> +struct mac_slot mac_slots[32]; > >> + > >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) > >> +{ > >> + struct mac_slot *macslot; > >> + hv_return_t ret; > >> + > >> + macslot = &mac_slots[slot->slot_id]; > >> + > >> + if (macslot->present) { > >> + if (macslot->size != slot->size) { > >> + macslot->present = 0; > >> + ret = hv_vm_unmap(macslot->gpa_start, macslot->size); > >> + assert_hvf_ok(ret); > >> + } > >> + } > >> + > >> + if (!slot->size) { > >> + return 0; > >> + } > >> + > >> + macslot->present = 1; > >> + macslot->gpa_start = slot->start; > >> + macslot->size = slot->size; > >> + ret = hv_vm_map(slot->mem, slot->start, slot->size, flags); > >> + assert_hvf_ok(ret); > >> + return 0; > >> +} > >> + > >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add) > >> +{ > >> + hvf_slot *mem; > >> + MemoryRegion *area = section->mr; > >> + bool writeable = !area->readonly && !area->rom_device; > >> + hv_memory_flags_t flags; > >> + > >> + if (!memory_region_is_ram(area)) { > >> + if (writeable) { > >> + return; > >> + } else if (!memory_region_is_romd(area)) { > >> + /* > >> + * If the memory device is not in romd_mode, then we > actually want > >> + * to remove the hvf memory slot so all accesses will trap. > >> + */ > >> + add = false; > >> + } > >> + } > >> + > >> + mem = hvf_find_overlap_slot( > >> + section->offset_within_address_space, > >> + int128_get64(section->size)); > >> + > >> + if (mem && add) { > >> + if (mem->size == int128_get64(section->size) && > >> + mem->start == section->offset_within_address_space && > >> + mem->mem == (memory_region_get_ram_ptr(area) + > >> + section->offset_within_region)) { > >> + return; /* Same region was attempted to register, go away. > */ > >> + } > >> + } > >> + > >> + /* Region needs to be reset. set the size to 0 and remap it. 
*/ > >> + if (mem) { > >> + mem->size = 0; > >> + if (do_hvf_set_memory(mem, 0)) { > >> + error_report("Failed to reset overlapping slot"); > >> + abort(); > >> + } > >> + } > >> + > >> + if (!add) { > >> + return; > >> + } > >> + > >> + if (area->readonly || > >> + (!memory_region_is_ram(area) && memory_region_is_romd(area))) { > >> + flags = HV_MEMORY_READ | HV_MEMORY_EXEC; > >> + } else { > >> + flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; > >> + } > >> + > >> + /* Now make a new slot. */ > >> + int x; > >> + > >> + for (x = 0; x < hvf_state->num_slots; ++x) { > >> + mem = &hvf_state->slots[x]; > >> + if (!mem->size) { > >> + break; > >> + } > >> + } > >> + > >> + if (x == hvf_state->num_slots) { > >> + error_report("No free slots"); > >> + abort(); > >> + } > >> + > >> + mem->size = int128_get64(section->size); > >> + mem->mem = memory_region_get_ram_ptr(area) + > section->offset_within_region; > >> + mem->start = section->offset_within_address_space; > >> + mem->region = area; > >> + > >> + if (do_hvf_set_memory(mem, flags)) { > >> + error_report("Error registering new memory slot"); > >> + abort(); > >> + } > >> +} > >> + > >> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool > on) > >> +{ > >> + hvf_slot *slot; > >> + > >> + slot = hvf_find_overlap_slot( > >> + section->offset_within_address_space, > >> + int128_get64(section->size)); > >> + > >> + /* protect region against writes; begin tracking it */ > >> + if (on) { > >> + slot->flags |= HVF_SLOT_LOG; > >> + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, > >> + HV_MEMORY_READ); > >> + /* stop tracking region*/ > >> + } else { > >> + slot->flags &= ~HVF_SLOT_LOG; > >> + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, > >> + HV_MEMORY_READ | HV_MEMORY_WRITE); > >> + } > >> +} > >> + > >> +static void hvf_log_start(MemoryListener *listener, > >> + MemoryRegionSection *section, int old, int > new) > >> +{ > >> + if (old != 0) { > >> + return; > >> + } > >> + > >> + hvf_set_dirty_tracking(section, 1); > >> +} > >> + > >> +static void hvf_log_stop(MemoryListener *listener, > >> + MemoryRegionSection *section, int old, int > new) > >> +{ > >> + if (new != 0) { > >> + return; > >> + } > >> + > >> + hvf_set_dirty_tracking(section, 0); > >> +} > >> + > >> +static void hvf_log_sync(MemoryListener *listener, > >> + MemoryRegionSection *section) > >> +{ > >> + /* > >> + * sync of dirty pages is handled elsewhere; just make sure we keep > >> + * tracking the region. 
> >> + */ > >> + hvf_set_dirty_tracking(section, 1); > >> +} > >> + > >> +static void hvf_region_add(MemoryListener *listener, > >> + MemoryRegionSection *section) > >> +{ > >> + hvf_set_phys_mem(section, true); > >> +} > >> + > >> +static void hvf_region_del(MemoryListener *listener, > >> + MemoryRegionSection *section) > >> +{ > >> + hvf_set_phys_mem(section, false); > >> +} > >> + > >> +static MemoryListener hvf_memory_listener = { > >> + .priority = 10, > >> + .region_add = hvf_region_add, > >> + .region_del = hvf_region_del, > >> + .log_start = hvf_log_start, > >> + .log_stop = hvf_log_stop, > >> + .log_sync = hvf_log_sync, > >> +}; > >> + > >> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, > run_on_cpu_data arg) > >> +{ > >> + if (!cpu->vcpu_dirty) { > >> + hvf_get_registers(cpu); > >> + cpu->vcpu_dirty = true; > >> + } > >> +} > >> + > >> +static void hvf_cpu_synchronize_state(CPUState *cpu) > >> +{ > >> + if (!cpu->vcpu_dirty) { > >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); > >> + } > >> +} > >> + > >> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, > >> + run_on_cpu_data arg) > >> +{ > >> + hvf_put_registers(cpu); > >> + cpu->vcpu_dirty = false; > >> +} > >> + > >> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu) > >> +{ > >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, > RUN_ON_CPU_NULL); > >> +} > >> + > >> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, > >> + run_on_cpu_data arg) > >> +{ > >> + hvf_put_registers(cpu); > >> + cpu->vcpu_dirty = false; > >> +} > >> + > >> +static void hvf_cpu_synchronize_post_init(CPUState *cpu) > >> +{ > >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); > >> +} > >> + > >> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, > >> + run_on_cpu_data arg) > >> +{ > >> + cpu->vcpu_dirty = true; > >> +} > >> + > >> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) > >> +{ > >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, > RUN_ON_CPU_NULL); > >> +} > >> + > >> +static void hvf_vcpu_destroy(CPUState *cpu) > >> +{ > >> + hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd); > >> + assert_hvf_ok(ret); > >> + > >> + hvf_arch_vcpu_destroy(cpu); > >> +} > >> + > >> +static void dummy_signal(int sig) > >> +{ > >> +} > >> + > >> +static int hvf_init_vcpu(CPUState *cpu) > >> +{ > >> + int r; > >> + > >> + /* init cpu signals */ > >> + sigset_t set; > >> + struct sigaction sigact; > >> + > >> + memset(&sigact, 0, sizeof(sigact)); > >> + sigact.sa_handler = dummy_signal; > >> + sigaction(SIG_IPI, &sigact, NULL); > >> + > >> + pthread_sigmask(SIG_BLOCK, NULL, &set); > >> + sigdelset(&set, SIG_IPI); > >> + > >> +#ifdef __aarch64__ > >> + r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t > **)&cpu->hvf_exit, NULL); > >> +#else > >> + r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); > >> +#endif > > I think the first __aarch64__ bit fits better in the ARM part of the series. > > > Oops. Thanks for catching it! Yes, absolutely. It should be part of the > ARM enablement. > > > > > >> + cpu->vcpu_dirty = 1; > >> + assert_hvf_ok(r); > >> + > >> + return hvf_arch_init_vcpu(cpu); > >> +} > >> + > >> +/* > >> + * The HVF-specific vCPU thread function. This one should only run > when the host > >> + * CPU supports the VMX "unrestricted guest" feature.
> >> + */ > >> +static void *hvf_cpu_thread_fn(void *arg) > >> +{ > >> + CPUState *cpu = arg; > >> + > >> + int r; > >> + > >> + assert(hvf_enabled()); > >> + > >> + rcu_register_thread(); > >> + > >> + qemu_mutex_lock_iothread(); > >> + qemu_thread_get_self(cpu->thread); > >> + > >> + cpu->thread_id = qemu_get_thread_id(); > >> + cpu->can_do_io = 1; > >> + current_cpu = cpu; > >> + > >> + hvf_init_vcpu(cpu); > >> + > >> + /* signal CPU creation */ > >> + cpu_thread_signal_created(cpu); > >> + qemu_guest_random_seed_thread_part2(cpu->random_seed); > >> + > >> + do { > >> + if (cpu_can_run(cpu)) { > >> + r = hvf_vcpu_exec(cpu); > >> + if (r == EXCP_DEBUG) { > >> + cpu_handle_guest_debug(cpu); > >> + } > >> + } > >> + qemu_wait_io_event(cpu); > >> + } while (!cpu->unplug || cpu_can_run(cpu)); > >> + > >> + hvf_vcpu_destroy(cpu); > >> + cpu_thread_signal_destroyed(cpu); > >> + qemu_mutex_unlock_iothread(); > >> + rcu_unregister_thread(); > >> + return NULL; > >> +} > >> + > >> +static void hvf_start_vcpu_thread(CPUState *cpu) > >> +{ > >> + char thread_name[VCPU_THREAD_NAME_SIZE]; > >> + > >> + /* > >> + * HVF currently does not support TCG, and only runs in > >> + * unrestricted-guest mode. > >> + */ > >> + assert(hvf_enabled()); > >> + > >> + cpu->thread = g_malloc0(sizeof(QemuThread)); > >> + cpu->halt_cond = g_malloc0(sizeof(QemuCond)); > >> + qemu_cond_init(cpu->halt_cond); > >> + > >> + snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", > >> + cpu->cpu_index); > >> + qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, > >> + cpu, QEMU_THREAD_JOINABLE); > >> +} > >> + > >> +static const CpusAccel hvf_cpus = { > >> + .create_vcpu_thread = hvf_start_vcpu_thread, > >> + > >> + .synchronize_post_reset = hvf_cpu_synchronize_post_reset, > >> + .synchronize_post_init = hvf_cpu_synchronize_post_init, > >> + .synchronize_state = hvf_cpu_synchronize_state, > >> + .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, > >> +}; > >> + > >> +static int hvf_accel_init(MachineState *ms) > >> +{ > >> + int x; > >> + hv_return_t ret; > >> + HVFState *s; > >> + > >> + ret = hv_vm_create(HV_VM_DEFAULT); > >> + assert_hvf_ok(ret); > >> + > >> + s = g_new0(HVFState, 1); > >> + > >> + s->num_slots = 32; > >> + for (x = 0; x < s->num_slots; ++x) { > >> + s->slots[x].size = 0; > >> + s->slots[x].slot_id = x; > >> + } > >> + > >> + hvf_state = s; > >> + memory_listener_register(&hvf_memory_listener, > &address_space_memory); > >> + cpus_register_accel(&hvf_cpus); > >> + return 0; > >> +} > >> + > >> +static void hvf_accel_class_init(ObjectClass *oc, void *data) > >> +{ > >> + AccelClass *ac = ACCEL_CLASS(oc); > >> + ac->name = "HVF"; > >> + ac->init_machine = hvf_accel_init; > >> + ac->allowed = &hvf_allowed; > >> +} > >> + > >> +static const TypeInfo hvf_accel_type = { > >> + .name = TYPE_HVF_ACCEL, > >> + .parent = TYPE_ACCEL, > >> + .class_init = hvf_accel_class_init, > >> +}; > >> + > >> +static void hvf_type_init(void) > >> +{ > >> + type_register_static(&hvf_accel_type); > >> +} > >> + > >> +type_init(hvf_type_init); > >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build > >> new file mode 100644 > >> index 0000000000..dfd6b68dc7 > >> --- /dev/null > >> +++ b/accel/hvf/meson.build > >> @@ -0,0 +1,7 @@ > >> +hvf_ss = ss.source_set() > >> +hvf_ss.add(files( > >> + 'hvf-all.c', > >> + 'hvf-cpus.c', > >> +)) > >> + > >> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss) > >> diff --git a/accel/meson.build b/accel/meson.build > >> index b26cca227a..6de12ce5d5 100644 > >> 
--- a/accel/meson.build > >> +++ b/accel/meson.build > >> @@ -1,5 +1,6 @@ > >> softmmu_ss.add(files('accel.c')) > >> > >> +subdir('hvf') > >> subdir('qtest') > >> subdir('kvm') > >> subdir('tcg') > >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h > >> new file mode 100644 > >> index 0000000000..de9bad23a8 > >> --- /dev/null > >> +++ b/include/sysemu/hvf_int.h > >> @@ -0,0 +1,69 @@ > >> +/* > >> + * QEMU Hypervisor.framework (HVF) support > >> + * > >> + * This work is licensed under the terms of the GNU GPL, version 2 or > later. > >> + * See the COPYING file in the top-level directory. > >> + * > >> + */ > >> + > >> +/* header to be included in HVF-specific code */ > >> + > >> +#ifndef HVF_INT_H > >> +#define HVF_INT_H > >> + > >> +#include <Hypervisor/Hypervisor.h> > >> + > >> +#define HVF_MAX_VCPU 0x10 > >> + > >> +extern struct hvf_state hvf_global; > >> + > >> +struct hvf_vm { > >> + int id; > >> + struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; > >> +}; > >> + > >> +struct hvf_state { > >> + uint32_t version; > >> + struct hvf_vm *vm; > >> + uint64_t mem_quota; > >> +}; > >> + > >> +/* hvf_slot flags */ > >> +#define HVF_SLOT_LOG (1 << 0) > >> + > >> +typedef struct hvf_slot { > >> + uint64_t start; > >> + uint64_t size; > >> + uint8_t *mem; > >> + int slot_id; > >> + uint32_t flags; > >> + MemoryRegion *region; > >> +} hvf_slot; > >> + > >> +typedef struct hvf_vcpu_caps { > >> + uint64_t vmx_cap_pinbased; > >> + uint64_t vmx_cap_procbased; > >> + uint64_t vmx_cap_procbased2; > >> + uint64_t vmx_cap_entry; > >> + uint64_t vmx_cap_exit; > >> + uint64_t vmx_cap_preemption_timer; > >> +} hvf_vcpu_caps; > >> + > >> +struct HVFState { > >> + AccelState parent; > >> + hvf_slot slots[32]; > >> + int num_slots; > >> + > >> + hvf_vcpu_caps *hvf_caps; > >> +}; > >> +extern HVFState *hvf_state; > >> + > >> +void assert_hvf_ok(hv_return_t ret); > >> +int hvf_get_registers(CPUState *cpu); > >> +int hvf_put_registers(CPUState *cpu); > >> +int hvf_arch_init_vcpu(CPUState *cpu); > >> +void hvf_arch_vcpu_destroy(CPUState *cpu); > >> +int hvf_vcpu_exec(CPUState *cpu); > >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); > >> + > >> +#endif > >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c > >> deleted file mode 100644 > >> index 817b3d7452..0000000000 > >> --- a/target/i386/hvf/hvf-cpus.c > >> +++ /dev/null > >> @@ -1,131 +0,0 @@ > >> -/* > >> - * Copyright 2008 IBM Corporation > >> - * 2008 Red Hat, Inc. > >> - * Copyright 2011 Intel Corporation > >> - * Copyright 2016 Veertu, Inc. > >> - * Copyright 2017 The Android Open Source Project > >> - * > >> - * QEMU Hypervisor.framework support > >> - * > >> - * This program is free software; you can redistribute it and/or > >> - * modify it under the terms of version 2 of the GNU General Public > >> - * License as published by the Free Software Foundation. > >> - * > >> - * This program is distributed in the hope that it will be useful, > >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of > >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > >> - * General Public License for more details. > >> - * > >> - * You should have received a copy of the GNU General Public License > >> - * along with this program; if not, see <http://www.gnu.org/licenses/ > >. > >> - * > >> - * This file contain code under public domain from the hvdos project: > >> - * https://github.com/mist64/hvdos > >> - * > >> - * Parts Copyright (c) 2011 NetApp, Inc. > >> - * All rights reserved. 
> >> - * > >> - * Redistribution and use in source and binary forms, with or without > >> - * modification, are permitted provided that the following conditions > >> - * are met: > >> - * 1. Redistributions of source code must retain the above copyright > >> - * notice, this list of conditions and the following disclaimer. > >> - * 2. Redistributions in binary form must reproduce the above copyright > >> - * notice, this list of conditions and the following disclaimer in > the > >> - * documentation and/or other materials provided with the > distribution. > >> - * > >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND > >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > THE > >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > PURPOSE > >> - * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE > LIABLE > >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > CONSEQUENTIAL > >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE > GOODS > >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS > INTERRUPTION) > >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, > STRICT > >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN > ANY WAY > >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY > OF > >> - * SUCH DAMAGE. > >> - */ > >> - > >> -#include "qemu/osdep.h" > >> -#include "qemu/error-report.h" > >> -#include "qemu/main-loop.h" > >> -#include "sysemu/hvf.h" > >> -#include "sysemu/runstate.h" > >> -#include "target/i386/cpu.h" > >> -#include "qemu/guest-random.h" > >> - > >> -#include "hvf-cpus.h" > >> - > >> -/* > >> - * The HVF-specific vCPU thread function. This one should only run > when the host > >> - * CPU supports the VMX "unrestricted guest" feature. > >> - */ > >> -static void *hvf_cpu_thread_fn(void *arg) > >> -{ > >> - CPUState *cpu = arg; > >> - > >> - int r; > >> - > >> - assert(hvf_enabled()); > >> - > >> - rcu_register_thread(); > >> - > >> - qemu_mutex_lock_iothread(); > >> - qemu_thread_get_self(cpu->thread); > >> - > >> - cpu->thread_id = qemu_get_thread_id(); > >> - cpu->can_do_io = 1; > >> - current_cpu = cpu; > >> - > >> - hvf_init_vcpu(cpu); > >> - > >> - /* signal CPU creation */ > >> - cpu_thread_signal_created(cpu); > >> - qemu_guest_random_seed_thread_part2(cpu->random_seed); > >> - > >> - do { > >> - if (cpu_can_run(cpu)) { > >> - r = hvf_vcpu_exec(cpu); > >> - if (r == EXCP_DEBUG) { > >> - cpu_handle_guest_debug(cpu); > >> - } > >> - } > >> - qemu_wait_io_event(cpu); > >> - } while (!cpu->unplug || cpu_can_run(cpu)); > >> - > >> - hvf_vcpu_destroy(cpu); > >> - cpu_thread_signal_destroyed(cpu); > >> - qemu_mutex_unlock_iothread(); > >> - rcu_unregister_thread(); > >> - return NULL; > >> -} > >> - > >> -static void hvf_start_vcpu_thread(CPUState *cpu) > >> -{ > >> - char thread_name[VCPU_THREAD_NAME_SIZE]; > >> - > >> - /* > >> - * HVF currently does not support TCG, and only runs in > >> - * unrestricted-guest mode. 
> >> - */ > >> - assert(hvf_enabled()); > >> - > >> - cpu->thread = g_malloc0(sizeof(QemuThread)); > >> - cpu->halt_cond = g_malloc0(sizeof(QemuCond)); > >> - qemu_cond_init(cpu->halt_cond); > >> - > >> - snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", > >> - cpu->cpu_index); > >> - qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, > >> - cpu, QEMU_THREAD_JOINABLE); > >> -} > >> - > >> -const CpusAccel hvf_cpus = { > >> - .create_vcpu_thread = hvf_start_vcpu_thread, > >> - > >> - .synchronize_post_reset = hvf_cpu_synchronize_post_reset, > >> - .synchronize_post_init = hvf_cpu_synchronize_post_init, > >> - .synchronize_state = hvf_cpu_synchronize_state, > >> - .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, > >> -}; > >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h > >> deleted file mode 100644 > >> index ced31b82c0..0000000000 > >> --- a/target/i386/hvf/hvf-cpus.h > >> +++ /dev/null > >> @@ -1,25 +0,0 @@ > >> -/* > >> - * Accelerator CPUS Interface > >> - * > >> - * Copyright 2020 SUSE LLC > >> - * > >> - * This work is licensed under the terms of the GNU GPL, version 2 or > later. > >> - * See the COPYING file in the top-level directory. > >> - */ > >> - > >> -#ifndef HVF_CPUS_H > >> -#define HVF_CPUS_H > >> - > >> -#include "sysemu/cpus.h" > >> - > >> -extern const CpusAccel hvf_cpus; > >> - > >> -int hvf_init_vcpu(CPUState *); > >> -int hvf_vcpu_exec(CPUState *); > >> -void hvf_cpu_synchronize_state(CPUState *); > >> -void hvf_cpu_synchronize_post_reset(CPUState *); > >> -void hvf_cpu_synchronize_post_init(CPUState *); > >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *); > >> -void hvf_vcpu_destroy(CPUState *); > >> - > >> -#endif /* HVF_CPUS_H */ > >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h > >> index e0edffd077..6d56f8f6bb 100644 > >> --- a/target/i386/hvf/hvf-i386.h > >> +++ b/target/i386/hvf/hvf-i386.h > >> @@ -18,57 +18,11 @@ > >> > >> #include "sysemu/accel.h" > >> #include "sysemu/hvf.h" > >> +#include "sysemu/hvf_int.h" > >> #include "cpu.h" > >> #include "x86.h" > >> > >> -#define HVF_MAX_VCPU 0x10 > >> - > >> -extern struct hvf_state hvf_global; > >> - > >> -struct hvf_vm { > >> - int id; > >> - struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; > >> -}; > >> - > >> -struct hvf_state { > >> - uint32_t version; > >> - struct hvf_vm *vm; > >> - uint64_t mem_quota; > >> -}; > >> - > >> -/* hvf_slot flags */ > >> -#define HVF_SLOT_LOG (1 << 0) > >> - > >> -typedef struct hvf_slot { > >> - uint64_t start; > >> - uint64_t size; > >> - uint8_t *mem; > >> - int slot_id; > >> - uint32_t flags; > >> - MemoryRegion *region; > >> -} hvf_slot; > >> - > >> -typedef struct hvf_vcpu_caps { > >> - uint64_t vmx_cap_pinbased; > >> - uint64_t vmx_cap_procbased; > >> - uint64_t vmx_cap_procbased2; > >> - uint64_t vmx_cap_entry; > >> - uint64_t vmx_cap_exit; > >> - uint64_t vmx_cap_preemption_timer; > >> -} hvf_vcpu_caps; > >> - > >> -struct HVFState { > >> - AccelState parent; > >> - hvf_slot slots[32]; > >> - int num_slots; > >> - > >> - hvf_vcpu_caps *hvf_caps; > >> -}; > >> -extern HVFState *hvf_state; > >> - > >> -void hvf_set_phys_mem(MemoryRegionSection *, bool); > >> void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int); > >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); > >> > >> #ifdef NEED_CPU_H > >> /* Functions exported to host specific mode */ > >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c > >> index ed9356565c..8b96ecd619 100644 > >> --- a/target/i386/hvf/hvf.c > 
>> +++ b/target/i386/hvf/hvf.c > >> @@ -51,6 +51,7 @@ > >> #include "qemu/error-report.h" > >> > >> #include "sysemu/hvf.h" > >> +#include "sysemu/hvf_int.h" > >> #include "sysemu/runstate.h" > >> #include "hvf-i386.h" > >> #include "vmcs.h" > >> @@ -72,171 +73,6 @@ > >> #include "sysemu/accel.h" > >> #include "target/i386/cpu.h" > >> > >> -#include "hvf-cpus.h" > >> - > >> -HVFState *hvf_state; > >> - > >> -static void assert_hvf_ok(hv_return_t ret) > >> -{ > >> - if (ret == HV_SUCCESS) { > >> - return; > >> - } > >> - > >> - switch (ret) { > >> - case HV_ERROR: > >> - error_report("Error: HV_ERROR"); > >> - break; > >> - case HV_BUSY: > >> - error_report("Error: HV_BUSY"); > >> - break; > >> - case HV_BAD_ARGUMENT: > >> - error_report("Error: HV_BAD_ARGUMENT"); > >> - break; > >> - case HV_NO_RESOURCES: > >> - error_report("Error: HV_NO_RESOURCES"); > >> - break; > >> - case HV_NO_DEVICE: > >> - error_report("Error: HV_NO_DEVICE"); > >> - break; > >> - case HV_UNSUPPORTED: > >> - error_report("Error: HV_UNSUPPORTED"); > >> - break; > >> - default: > >> - error_report("Unknown Error"); > >> - } > >> - > >> - abort(); > >> -} > >> - > >> -/* Memory slots */ > >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) > >> -{ > >> - hvf_slot *slot; > >> - int x; > >> - for (x = 0; x < hvf_state->num_slots; ++x) { > >> - slot = &hvf_state->slots[x]; > >> - if (slot->size && start < (slot->start + slot->size) && > >> - (start + size) > slot->start) { > >> - return slot; > >> - } > >> - } > >> - return NULL; > >> -} > >> - > >> -struct mac_slot { > >> - int present; > >> - uint64_t size; > >> - uint64_t gpa_start; > >> - uint64_t gva; > >> -}; > >> - > >> -struct mac_slot mac_slots[32]; > >> - > >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) > >> -{ > >> - struct mac_slot *macslot; > >> - hv_return_t ret; > >> - > >> - macslot = &mac_slots[slot->slot_id]; > >> - > >> - if (macslot->present) { > >> - if (macslot->size != slot->size) { > >> - macslot->present = 0; > >> - ret = hv_vm_unmap(macslot->gpa_start, macslot->size); > >> - assert_hvf_ok(ret); > >> - } > >> - } > >> - > >> - if (!slot->size) { > >> - return 0; > >> - } > >> - > >> - macslot->present = 1; > >> - macslot->gpa_start = slot->start; > >> - macslot->size = slot->size; > >> - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, > flags); > >> - assert_hvf_ok(ret); > >> - return 0; > >> -} > >> - > >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) > >> -{ > >> - hvf_slot *mem; > >> - MemoryRegion *area = section->mr; > >> - bool writeable = !area->readonly && !area->rom_device; > >> - hv_memory_flags_t flags; > >> - > >> - if (!memory_region_is_ram(area)) { > >> - if (writeable) { > >> - return; > >> - } else if (!memory_region_is_romd(area)) { > >> - /* > >> - * If the memory device is not in romd_mode, then we > actually want > >> - * to remove the hvf memory slot so all accesses will trap. > >> - */ > >> - add = false; > >> - } > >> - } > >> - > >> - mem = hvf_find_overlap_slot( > >> - section->offset_within_address_space, > >> - int128_get64(section->size)); > >> - > >> - if (mem && add) { > >> - if (mem->size == int128_get64(section->size) && > >> - mem->start == section->offset_within_address_space && > >> - mem->mem == (memory_region_get_ram_ptr(area) + > >> - section->offset_within_region)) { > >> - return; /* Same region was attempted to register, go away. > */ > >> - } > >> - } > >> - > >> - /* Region needs to be reset. set the size to 0 and remap it. 
*/ > >> - if (mem) { > >> - mem->size = 0; > >> - if (do_hvf_set_memory(mem, 0)) { > >> - error_report("Failed to reset overlapping slot"); > >> - abort(); > >> - } > >> - } > >> - > >> - if (!add) { > >> - return; > >> - } > >> - > >> - if (area->readonly || > >> - (!memory_region_is_ram(area) && memory_region_is_romd(area))) { > >> - flags = HV_MEMORY_READ | HV_MEMORY_EXEC; > >> - } else { > >> - flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; > >> - } > >> - > >> - /* Now make a new slot. */ > >> - int x; > >> - > >> - for (x = 0; x < hvf_state->num_slots; ++x) { > >> - mem = &hvf_state->slots[x]; > >> - if (!mem->size) { > >> - break; > >> - } > >> - } > >> - > >> - if (x == hvf_state->num_slots) { > >> - error_report("No free slots"); > >> - abort(); > >> - } > >> - > >> - mem->size = int128_get64(section->size); > >> - mem->mem = memory_region_get_ram_ptr(area) + > section->offset_within_region; > >> - mem->start = section->offset_within_address_space; > >> - mem->region = area; > >> - > >> - if (do_hvf_set_memory(mem, flags)) { > >> - error_report("Error registering new memory slot"); > >> - abort(); > >> - } > >> -} > >> - > >> void vmx_update_tpr(CPUState *cpu) > >> { > >> /* TODO: need integrate APIC handling */ > >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t > port, void *buffer, > >> } > >> } > >> > >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, > run_on_cpu_data arg) > >> -{ > >> - if (!cpu->vcpu_dirty) { > >> - hvf_get_registers(cpu); > >> - cpu->vcpu_dirty = true; > >> - } > >> -} > >> - > >> -void hvf_cpu_synchronize_state(CPUState *cpu) > >> -{ > >> - if (!cpu->vcpu_dirty) { > >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); > >> - } > >> -} > >> - > >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, > >> - run_on_cpu_data arg) > >> -{ > >> - hvf_put_registers(cpu); > >> - cpu->vcpu_dirty = false; > >> -} > >> - > >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu) > >> -{ > >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, > RUN_ON_CPU_NULL); > >> -} > >> - > >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, > >> - run_on_cpu_data arg) > >> -{ > >> - hvf_put_registers(cpu); > >> - cpu->vcpu_dirty = false; > >> -} > >> - > >> -void hvf_cpu_synchronize_post_init(CPUState *cpu) > >> -{ > >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); > >> -} > >> - > >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, > >> - run_on_cpu_data arg) > >> -{ > >> - cpu->vcpu_dirty = true; > >> -} > >> - > >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) > >> -{ > >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, > RUN_ON_CPU_NULL); > >> -} > >> - > >> static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, > uint64_t ept_qual) > >> { > >> int read, write; > >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, > uint64_t gpa, uint64_t ept_qual) > >> return false; > >> } > >> > >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool > on) > >> -{ > >> - hvf_slot *slot; > >> - > >> - slot = hvf_find_overlap_slot( > >> - section->offset_within_address_space, > >> - int128_get64(section->size)); > >> - > >> - /* protect region against writes; begin tracking it */ > >> - if (on) { > >> - slot->flags |= HVF_SLOT_LOG; > >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, > >> - HV_MEMORY_READ); > >> - /* stop tracking region*/ > >> - } else { > >> - slot->flags &= ~HVF_SLOT_LOG; > >> - 
hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, > >> - HV_MEMORY_READ | HV_MEMORY_WRITE); > >> - } > >> -} > >> - > >> -static void hvf_log_start(MemoryListener *listener, > >> - MemoryRegionSection *section, int old, int > new) > >> -{ > >> - if (old != 0) { > >> - return; > >> - } > >> - > >> - hvf_set_dirty_tracking(section, 1); > >> -} > >> - > >> -static void hvf_log_stop(MemoryListener *listener, > >> - MemoryRegionSection *section, int old, int > new) > >> -{ > >> - if (new != 0) { > >> - return; > >> - } > >> - > >> - hvf_set_dirty_tracking(section, 0); > >> -} > >> - > >> -static void hvf_log_sync(MemoryListener *listener, > >> - MemoryRegionSection *section) > >> -{ > >> - /* > >> - * sync of dirty pages is handled elsewhere; just make sure we keep > >> - * tracking the region. > >> - */ > >> - hvf_set_dirty_tracking(section, 1); > >> -} > >> - > >> -static void hvf_region_add(MemoryListener *listener, > >> - MemoryRegionSection *section) > >> -{ > >> - hvf_set_phys_mem(section, true); > >> -} > >> - > >> -static void hvf_region_del(MemoryListener *listener, > >> - MemoryRegionSection *section) > >> -{ > >> - hvf_set_phys_mem(section, false); > >> -} > >> - > >> -static MemoryListener hvf_memory_listener = { > >> - .priority = 10, > >> - .region_add = hvf_region_add, > >> - .region_del = hvf_region_del, > >> - .log_start = hvf_log_start, > >> - .log_stop = hvf_log_stop, > >> - .log_sync = hvf_log_sync, > >> -}; > >> - > >> -void hvf_vcpu_destroy(CPUState *cpu) > >> +void hvf_arch_vcpu_destroy(CPUState *cpu) > >> { > >> X86CPU *x86_cpu = X86_CPU(cpu); > >> CPUX86State *env = &x86_cpu->env; > >> > >> - hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd); > >> g_free(env->hvf_mmio_buf); > >> - assert_hvf_ok(ret); > >> -} > >> - > >> -static void dummy_signal(int sig) > >> -{ > >> } > >> > >> -int hvf_init_vcpu(CPUState *cpu) > >> +int hvf_arch_init_vcpu(CPUState *cpu) > >> { > >> > >> X86CPU *x86cpu = X86_CPU(cpu); > >> CPUX86State *env = &x86cpu->env; > >> - int r; > >> - > >> - /* init cpu signals */ > >> - sigset_t set; > >> - struct sigaction sigact; > >> - > >> - memset(&sigact, 0, sizeof(sigact)); > >> - sigact.sa_handler = dummy_signal; > >> - sigaction(SIG_IPI, &sigact, NULL); > >> - > >> - pthread_sigmask(SIG_BLOCK, NULL, &set); > >> - sigdelset(&set, SIG_IPI); > >> > >> init_emu(); > >> init_decoder(); > >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu) > >> hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1); > >> env->hvf_mmio_buf = g_new(char, 4096); > >> > >> - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); > >> - cpu->vcpu_dirty = 1; > >> - assert_hvf_ok(r); > >> - > >> if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED, > >> &hvf_state->hvf_caps->vmx_cap_pinbased)) { > >> abort(); > >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu) > >> > >> return ret; > >> } > >> - > >> -bool hvf_allowed; > >> - > >> -static int hvf_accel_init(MachineState *ms) > >> -{ > >> - int x; > >> - hv_return_t ret; > >> - HVFState *s; > >> - > >> - ret = hv_vm_create(HV_VM_DEFAULT); > >> - assert_hvf_ok(ret); > >> - > >> - s = g_new0(HVFState, 1); > >> - > >> - s->num_slots = 32; > >> - for (x = 0; x < s->num_slots; ++x) { > >> - s->slots[x].size = 0; > >> - s->slots[x].slot_id = x; > >> - } > >> - > >> - hvf_state = s; > >> - memory_listener_register(&hvf_memory_listener, > &address_space_memory); > >> - cpus_register_accel(&hvf_cpus); > >> - return 0; > >> -} > >> - > >> -static void hvf_accel_class_init(ObjectClass *oc, void *data) > >> 
-{ > >> - AccelClass *ac = ACCEL_CLASS(oc); > >> - ac->name = "HVF"; > >> - ac->init_machine = hvf_accel_init; > >> - ac->allowed = &hvf_allowed; > >> -} > >> - > >> -static const TypeInfo hvf_accel_type = { > >> - .name = TYPE_HVF_ACCEL, > >> - .parent = TYPE_ACCEL, > >> - .class_init = hvf_accel_class_init, > >> -}; > >> - > >> -static void hvf_type_init(void) > >> -{ > >> - type_register_static(&hvf_accel_type); > >> -} > >> - > >> -type_init(hvf_type_init); > >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build > >> index 409c9a3f14..c8a43717ee 100644 > >> --- a/target/i386/hvf/meson.build > >> +++ b/target/i386/hvf/meson.build > >> @@ -1,6 +1,5 @@ > >> i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files( > >> 'hvf.c', > >> - 'hvf-cpus.c', > >> 'x86.c', > >> 'x86_cpuid.c', > >> 'x86_decode.c', > >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c > >> index bbec412b6c..89b8e9d87a 100644 > >> --- a/target/i386/hvf/x86hvf.c > >> +++ b/target/i386/hvf/x86hvf.c > >> @@ -20,6 +20,9 @@ > >> #include "qemu/osdep.h" > >> > >> #include "qemu-common.h" > >> +#include "sysemu/hvf.h" > >> +#include "sysemu/hvf_int.h" > >> +#include "sysemu/hw_accel.h" > >> #include "x86hvf.h" > >> #include "vmx.h" > >> #include "vmcs.h" > >> @@ -32,8 +35,6 @@ > >> #include <Hypervisor/hv.h> > >> #include <Hypervisor/hv_vmx.h> > >> > >> -#include "hvf-cpus.h" > >> - > >> void hvf_set_segment(struct CPUState *cpu, struct vmx_segment > *vmx_seg, > >> SegmentCache *qseg, bool is_tr) > >> { > >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state) > >> env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); > >> > >> if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { > >> - hvf_cpu_synchronize_state(cpu_state); > >> + cpu_synchronize_state(cpu_state); > >> do_cpu_init(cpu); > >> } > >> > >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state) > >> cpu_state->halted = 0; > >> } > >> if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) { > >> - hvf_cpu_synchronize_state(cpu_state); > >> + cpu_synchronize_state(cpu_state); > >> do_cpu_sipi(cpu); > >> } > >> if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) { > >> cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR; > >> - hvf_cpu_synchronize_state(cpu_state); > >> + cpu_synchronize_state(cpu_state); > > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should > > be a separate patch. It follows the cpu/accel cleanups Claudio was doing over the > > summer. > > > The only reason they're in here is because we no longer have access to > the hvf_ functions from the file. I am perfectly happy to rebase the > patch on top of Claudio's if his goes in first. I'm sure it'll be > trivial for him to rebase on top of this too if my series goes in first. > > > > > > Philippe raised the idea that the patch might go ahead of the ARM-specific > > part (which might involve some discussions) and I agree with that. > > > > Some sync between Claudio's series (CC'd him) and this patch might be needed. > > > I would prefer not to hold back because of the sync. Claudio's cleanup > is trivial enough to adjust for if it gets merged ahead of this. > > > Alex > > > >
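For reference, the hvf_cpu_*() to cpu_*() switch discussed above works because cpus_register_accel(&hvf_cpus) installs the CpusAccel vtable that the generic cpu_synchronize_*() helpers dispatch through. A rough sketch of that dispatch path, simplified from the softmmu cpus code of this era (the exact checks in QEMU may differ):

    /* Sketch of the generic accel dispatch, not the verbatim QEMU code. */
    static const CpusAccel *cpus_accel;

    void cpus_register_accel(const CpusAccel *ca)
    {
        assert(ca != NULL);
        assert(ca->create_vcpu_thread != NULL); /* mandatory hook */
        cpus_accel = ca;
    }

    void cpu_synchronize_state(CPUState *cpu)
    {
        /* Ends up in hvf_cpu_synchronize_state() once HVF is registered. */
        if (cpus_accel && cpus_accel->synchronize_state) {
            cpus_accel->synchronize_state(cpu);
        }
    }

So x86hvf.c can call the generic helper and still reach the HVF-specific implementation, which is why the rename is safe once registration happens in hvf_accel_init().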
Update: We're not quite sure how to compare CNTV_CVAL and CNTVCT. But the high CPU usage seems to be mitigated by having a poll interval (like KVM does) in handling WFI: https://android-review.googlesource.com/c/platform/external/qemu/+/1512501 This is loosely inspired by https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766 which does seem to specify a poll interval. It would be cool if we could have a lightweight way to enter sleep and restart the vcpus precisely when CVAL passes, though. Frank On Fri, Nov 27, 2020 at 3:30 PM Frank Yang <lfy@google.com> wrote: > Hi all, > > +Peter Collingbourne <pcc@google.com> > > I'm a developer on the Android Emulator, which is in a fork of QEMU. > > Peter and I have been working on an HVF Apple Silicon backend with an eye > toward Android guests. > > We have gotten things to basically switch to Android userspace already > (logcat/shell and graphics available at least). > > Our strategy so far has been to import logic from the KVM implementation > and hook into QEMU's software devices that were previously assumed to work only > with TCG, or to have KVM-specific paths. > > Thanks to Alexander for the tip on the 36-bit address space limitation > btw; our way of addressing this is to still allow highmem but not put the PCI > high MMIO window so high. > > Also, note we have a sleep/signal-based mechanism to deal with WFx, which > might be worth looking into in Alexander's implementation as well: > > https://android-review.googlesource.com/c/platform/external/qemu/+/1512551 > > Patches so far, FYI: > > > https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1 > > https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3 > > https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3 > > https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3 > > https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3 > > > https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a > > https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b > > https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01 > > https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228 > > https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102 > > https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6 > > Peter has also noticed that there are extra steps needed on M1s to allow > TCG to work, as it involves JIT: > > > https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9 > > We'd appreciate any feedback/comments :) > > Best, > > Frank > > On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote: > >> >> On 27.11.20 21:00, Roman Bolshakov wrote: >> > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote: >> >> Until now, Hypervisor.framework has only been available on x86_64 >> systems. >> >> With Apple Silicon shipping now, it extends its reach to aarch64. To >> >> prepare for support for multiple architectures, let's move common code >> out >> >> into its own accel directory.
>> >> >> >> Signed-off-by: Alexander Graf <agraf@csgraf.de> >> >> --- >> >> MAINTAINERS | 9 +- >> >> accel/hvf/hvf-all.c | 56 +++++ >> >> accel/hvf/hvf-cpus.c | 468 >> ++++++++++++++++++++++++++++++++++++ >> >> accel/hvf/meson.build | 7 + >> >> accel/meson.build | 1 + >> >> include/sysemu/hvf_int.h | 69 ++++++ >> >> target/i386/hvf/hvf-cpus.c | 131 ---------- >> >> target/i386/hvf/hvf-cpus.h | 25 -- >> >> target/i386/hvf/hvf-i386.h | 48 +--- >> >> target/i386/hvf/hvf.c | 360 +-------------------------- >> >> target/i386/hvf/meson.build | 1 - >> >> target/i386/hvf/x86hvf.c | 11 +- >> >> target/i386/hvf/x86hvf.h | 2 - >> >> 13 files changed, 619 insertions(+), 569 deletions(-) >> >> create mode 100644 accel/hvf/hvf-all.c >> >> create mode 100644 accel/hvf/hvf-cpus.c >> >> create mode 100644 accel/hvf/meson.build >> >> create mode 100644 include/sysemu/hvf_int.h >> >> delete mode 100644 target/i386/hvf/hvf-cpus.c >> >> delete mode 100644 target/i386/hvf/hvf-cpus.h >> >> >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> >> index 68bc160f41..ca4b6d9279 100644 >> >> --- a/MAINTAINERS >> >> +++ b/MAINTAINERS >> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com> >> >> M: Roman Bolshakov <r.bolshakov@yadro.com> >> >> W: https://wiki.qemu.org/Features/HVF >> >> S: Maintained >> >> -F: accel/stubs/hvf-stub.c >> > There was a patch for that in the RFC series from Claudio. >> >> >> Yeah, I'm not worried about this hunk :). >> >> >> > >> >> F: target/i386/hvf/ >> >> + >> >> +HVF >> >> +M: Cameron Esfahani <dirty@apple.com> >> >> +M: Roman Bolshakov <r.bolshakov@yadro.com> >> >> +W: https://wiki.qemu.org/Features/HVF >> >> +S: Maintained >> >> +F: accel/hvf/ >> >> F: include/sysemu/hvf.h >> >> +F: include/sysemu/hvf_int.h >> >> >> >> WHPX CPUs >> >> M: Sunil Muthuswamy <sunilmut@microsoft.com> >> >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c >> >> new file mode 100644 >> >> index 0000000000..47d77a472a >> >> --- /dev/null >> >> +++ b/accel/hvf/hvf-all.c >> >> @@ -0,0 +1,56 @@ >> >> +/* >> >> + * QEMU Hypervisor.framework support >> >> + * >> >> + * This work is licensed under the terms of the GNU GPL, version 2. >> See >> >> + * the COPYING file in the top-level directory. >> >> + * >> >> + * Contributions after 2012-01-13 are licensed under the terms of the >> >> + * GNU GPL, version 2 or (at your option) any later version. 
>> >> + */ >> >> + >> >> +#include "qemu/osdep.h" >> >> +#include "qemu-common.h" >> >> +#include "qemu/error-report.h" >> >> +#include "sysemu/hvf.h" >> >> +#include "sysemu/hvf_int.h" >> >> +#include "sysemu/runstate.h" >> >> + >> >> +#include "qemu/main-loop.h" >> >> +#include "sysemu/accel.h" >> >> + >> >> +#include <Hypervisor/Hypervisor.h> >> >> + >> >> +bool hvf_allowed; >> >> +HVFState *hvf_state; >> >> + >> >> +void assert_hvf_ok(hv_return_t ret) >> >> +{ >> >> + if (ret == HV_SUCCESS) { >> >> + return; >> >> + } >> >> + >> >> + switch (ret) { >> >> + case HV_ERROR: >> >> + error_report("Error: HV_ERROR"); >> >> + break; >> >> + case HV_BUSY: >> >> + error_report("Error: HV_BUSY"); >> >> + break; >> >> + case HV_BAD_ARGUMENT: >> >> + error_report("Error: HV_BAD_ARGUMENT"); >> >> + break; >> >> + case HV_NO_RESOURCES: >> >> + error_report("Error: HV_NO_RESOURCES"); >> >> + break; >> >> + case HV_NO_DEVICE: >> >> + error_report("Error: HV_NO_DEVICE"); >> >> + break; >> >> + case HV_UNSUPPORTED: >> >> + error_report("Error: HV_UNSUPPORTED"); >> >> + break; >> >> + default: >> >> + error_report("Unknown Error"); >> >> + } >> >> + >> >> + abort(); >> >> +} >> >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c >> >> new file mode 100644 >> >> index 0000000000..f9bb5502b7 >> >> --- /dev/null >> >> +++ b/accel/hvf/hvf-cpus.c >> >> @@ -0,0 +1,468 @@ >> >> +/* >> >> + * Copyright 2008 IBM Corporation >> >> + * 2008 Red Hat, Inc. >> >> + * Copyright 2011 Intel Corporation >> >> + * Copyright 2016 Veertu, Inc. >> >> + * Copyright 2017 The Android Open Source Project >> >> + * >> >> + * QEMU Hypervisor.framework support >> >> + * >> >> + * This program is free software; you can redistribute it and/or >> >> + * modify it under the terms of version 2 of the GNU General Public >> >> + * License as published by the Free Software Foundation. >> >> + * >> >> + * This program is distributed in the hope that it will be useful, >> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> >> + * General Public License for more details. >> >> + * >> >> + * You should have received a copy of the GNU General Public License >> >> + * along with this program; if not, see <http://www.gnu.org/licenses/ >> >. >> >> + * >> >> + * This file contain code under public domain from the hvdos project: >> >> + * https://github.com/mist64/hvdos >> >> + * >> >> + * Parts Copyright (c) 2011 NetApp, Inc. >> >> + * All rights reserved. >> >> + * >> >> + * Redistribution and use in source and binary forms, with or without >> >> + * modification, are permitted provided that the following conditions >> >> + * are met: >> >> + * 1. Redistributions of source code must retain the above copyright >> >> + * notice, this list of conditions and the following disclaimer. >> >> + * 2. Redistributions in binary form must reproduce the above >> copyright >> >> + * notice, this list of conditions and the following disclaimer in >> the >> >> + * documentation and/or other materials provided with the >> distribution. >> >> + * >> >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND >> >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, >> THE >> >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR >> PURPOSE >> >> + * ARE DISCLAIMED. 
IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE >> LIABLE >> >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR >> CONSEQUENTIAL >> >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE >> GOODS >> >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS >> INTERRUPTION) >> >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN >> CONTRACT, STRICT >> >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN >> ANY WAY >> >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE >> POSSIBILITY OF >> >> + * SUCH DAMAGE. >> >> + */ >> >> + >> >> +#include "qemu/osdep.h" >> >> +#include "qemu/error-report.h" >> >> +#include "qemu/main-loop.h" >> >> +#include "exec/address-spaces.h" >> >> +#include "exec/exec-all.h" >> >> +#include "sysemu/cpus.h" >> >> +#include "sysemu/hvf.h" >> >> +#include "sysemu/hvf_int.h" >> >> +#include "sysemu/runstate.h" >> >> +#include "qemu/guest-random.h" >> >> + >> >> +#include <Hypervisor/Hypervisor.h> >> >> + >> >> +/* Memory slots */ >> >> + >> >> +struct mac_slot { >> >> + int present; >> >> + uint64_t size; >> >> + uint64_t gpa_start; >> >> + uint64_t gva; >> >> +}; >> >> + >> >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) >> >> +{ >> >> + hvf_slot *slot; >> >> + int x; >> >> + for (x = 0; x < hvf_state->num_slots; ++x) { >> >> + slot = &hvf_state->slots[x]; >> >> + if (slot->size && start < (slot->start + slot->size) && >> >> + (start + size) > slot->start) { >> >> + return slot; >> >> + } >> >> + } >> >> + return NULL; >> >> +} >> >> + >> >> +struct mac_slot mac_slots[32]; >> >> + >> >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) >> >> +{ >> >> + struct mac_slot *macslot; >> >> + hv_return_t ret; >> >> + >> >> + macslot = &mac_slots[slot->slot_id]; >> >> + >> >> + if (macslot->present) { >> >> + if (macslot->size != slot->size) { >> >> + macslot->present = 0; >> >> + ret = hv_vm_unmap(macslot->gpa_start, macslot->size); >> >> + assert_hvf_ok(ret); >> >> + } >> >> + } >> >> + >> >> + if (!slot->size) { >> >> + return 0; >> >> + } >> >> + >> >> + macslot->present = 1; >> >> + macslot->gpa_start = slot->start; >> >> + macslot->size = slot->size; >> >> + ret = hv_vm_map(slot->mem, slot->start, slot->size, flags); >> >> + assert_hvf_ok(ret); >> >> + return 0; >> >> +} >> >> + >> >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add) >> >> +{ >> >> + hvf_slot *mem; >> >> + MemoryRegion *area = section->mr; >> >> + bool writeable = !area->readonly && !area->rom_device; >> >> + hv_memory_flags_t flags; >> >> + >> >> + if (!memory_region_is_ram(area)) { >> >> + if (writeable) { >> >> + return; >> >> + } else if (!memory_region_is_romd(area)) { >> >> + /* >> >> + * If the memory device is not in romd_mode, then we >> actually want >> >> + * to remove the hvf memory slot so all accesses will >> trap. >> >> + */ >> >> + add = false; >> >> + } >> >> + } >> >> + >> >> + mem = hvf_find_overlap_slot( >> >> + section->offset_within_address_space, >> >> + int128_get64(section->size)); >> >> + >> >> + if (mem && add) { >> >> + if (mem->size == int128_get64(section->size) && >> >> + mem->start == section->offset_within_address_space && >> >> + mem->mem == (memory_region_get_ram_ptr(area) + >> >> + section->offset_within_region)) { >> >> + return; /* Same region was attempted to register, go >> away. */ >> >> + } >> >> + } >> >> + >> >> + /* Region needs to be reset. set the size to 0 and remap it. 
*/ >> >> + if (mem) { >> >> + mem->size = 0; >> >> + if (do_hvf_set_memory(mem, 0)) { >> >> + error_report("Failed to reset overlapping slot"); >> >> + abort(); >> >> + } >> >> + } >> >> + >> >> + if (!add) { >> >> + return; >> >> + } >> >> + >> >> + if (area->readonly || >> >> + (!memory_region_is_ram(area) && memory_region_is_romd(area))) >> { >> >> + flags = HV_MEMORY_READ | HV_MEMORY_EXEC; >> >> + } else { >> >> + flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; >> >> + } >> >> + >> >> + /* Now make a new slot. */ >> >> + int x; >> >> + >> >> + for (x = 0; x < hvf_state->num_slots; ++x) { >> >> + mem = &hvf_state->slots[x]; >> >> + if (!mem->size) { >> >> + break; >> >> + } >> >> + } >> >> + >> >> + if (x == hvf_state->num_slots) { >> >> + error_report("No free slots"); >> >> + abort(); >> >> + } >> >> + >> >> + mem->size = int128_get64(section->size); >> >> + mem->mem = memory_region_get_ram_ptr(area) + >> section->offset_within_region; >> >> + mem->start = section->offset_within_address_space; >> >> + mem->region = area; >> >> + >> >> + if (do_hvf_set_memory(mem, flags)) { >> >> + error_report("Error registering new memory slot"); >> >> + abort(); >> >> + } >> >> +} >> >> + >> >> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool >> on) >> >> +{ >> >> + hvf_slot *slot; >> >> + >> >> + slot = hvf_find_overlap_slot( >> >> + section->offset_within_address_space, >> >> + int128_get64(section->size)); >> >> + >> >> + /* protect region against writes; begin tracking it */ >> >> + if (on) { >> >> + slot->flags |= HVF_SLOT_LOG; >> >> + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, >> >> + HV_MEMORY_READ); >> >> + /* stop tracking region*/ >> >> + } else { >> >> + slot->flags &= ~HVF_SLOT_LOG; >> >> + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, >> >> + HV_MEMORY_READ | HV_MEMORY_WRITE); >> >> + } >> >> +} >> >> + >> >> +static void hvf_log_start(MemoryListener *listener, >> >> + MemoryRegionSection *section, int old, int >> new) >> >> +{ >> >> + if (old != 0) { >> >> + return; >> >> + } >> >> + >> >> + hvf_set_dirty_tracking(section, 1); >> >> +} >> >> + >> >> +static void hvf_log_stop(MemoryListener *listener, >> >> + MemoryRegionSection *section, int old, int >> new) >> >> +{ >> >> + if (new != 0) { >> >> + return; >> >> + } >> >> + >> >> + hvf_set_dirty_tracking(section, 0); >> >> +} >> >> + >> >> +static void hvf_log_sync(MemoryListener *listener, >> >> + MemoryRegionSection *section) >> >> +{ >> >> + /* >> >> + * sync of dirty pages is handled elsewhere; just make sure we >> keep >> >> + * tracking the region. 
>> >> + */ >> >> + hvf_set_dirty_tracking(section, 1); >> >> +} >> >> + >> >> +static void hvf_region_add(MemoryListener *listener, >> >> + MemoryRegionSection *section) >> >> +{ >> >> + hvf_set_phys_mem(section, true); >> >> +} >> >> + >> >> +static void hvf_region_del(MemoryListener *listener, >> >> + MemoryRegionSection *section) >> >> +{ >> >> + hvf_set_phys_mem(section, false); >> >> +} >> >> + >> >> +static MemoryListener hvf_memory_listener = { >> >> + .priority = 10, >> >> + .region_add = hvf_region_add, >> >> + .region_del = hvf_region_del, >> >> + .log_start = hvf_log_start, >> >> + .log_stop = hvf_log_stop, >> >> + .log_sync = hvf_log_sync, >> >> +}; >> >> + >> >> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, >> run_on_cpu_data arg) >> >> +{ >> >> + if (!cpu->vcpu_dirty) { >> >> + hvf_get_registers(cpu); >> >> + cpu->vcpu_dirty = true; >> >> + } >> >> +} >> >> + >> >> +static void hvf_cpu_synchronize_state(CPUState *cpu) >> >> +{ >> >> + if (!cpu->vcpu_dirty) { >> >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_state, >> RUN_ON_CPU_NULL); >> >> + } >> >> +} >> >> + >> >> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, >> >> + run_on_cpu_data arg) >> >> +{ >> >> + hvf_put_registers(cpu); >> >> + cpu->vcpu_dirty = false; >> >> +} >> >> + >> >> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu) >> >> +{ >> >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, >> RUN_ON_CPU_NULL); >> >> +} >> >> + >> >> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, >> >> + run_on_cpu_data arg) >> >> +{ >> >> + hvf_put_registers(cpu); >> >> + cpu->vcpu_dirty = false; >> >> +} >> >> + >> >> +static void hvf_cpu_synchronize_post_init(CPUState *cpu) >> >> +{ >> >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, >> RUN_ON_CPU_NULL); >> >> +} >> >> + >> >> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, >> >> + run_on_cpu_data arg) >> >> +{ >> >> + cpu->vcpu_dirty = true; >> >> +} >> >> + >> >> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) >> >> +{ >> >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, >> RUN_ON_CPU_NULL); >> >> +} >> >> + >> >> +static void hvf_vcpu_destroy(CPUState *cpu) >> >> +{ >> >> + hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd); >> >> + assert_hvf_ok(ret); >> >> + >> >> + hvf_arch_vcpu_destroy(cpu); >> >> +} >> >> + >> >> +static void dummy_signal(int sig) >> >> +{ >> >> +} >> >> + >> >> +static int hvf_init_vcpu(CPUState *cpu) >> >> +{ >> >> + int r; >> >> + >> >> + /* init cpu signals */ >> >> + sigset_t set; >> >> + struct sigaction sigact; >> >> + >> >> + memset(&sigact, 0, sizeof(sigact)); >> >> + sigact.sa_handler = dummy_signal; >> >> + sigaction(SIG_IPI, &sigact, NULL); >> >> + >> >> + pthread_sigmask(SIG_BLOCK, NULL, &set); >> >> + sigdelset(&set, SIG_IPI); >> >> + >> >> +#ifdef __aarch64__ >> >> + r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t >> **)&cpu->hvf_exit, NULL); >> >> +#else >> >> + r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); >> >> +#endif >> > I think the first __aarch64__ bit fits better in the ARM part of the series. >> >> >> Oops. Thanks for catching it! Yes, absolutely. It should be part of the >> ARM enablement. >> >> >> > >> >> + cpu->vcpu_dirty = 1; >> >> + assert_hvf_ok(r); >> >> + >> >> + return hvf_arch_init_vcpu(cpu); >> >> +} >> >> + >> >> +/* >> >> + * The HVF-specific vCPU thread function. This one should only run >> when the host >> >> + * CPU supports the VMX "unrestricted guest" feature.
>> >> + */ >> >> +static void *hvf_cpu_thread_fn(void *arg) >> >> +{ >> >> + CPUState *cpu = arg; >> >> + >> >> + int r; >> >> + >> >> + assert(hvf_enabled()); >> >> + >> >> + rcu_register_thread(); >> >> + >> >> + qemu_mutex_lock_iothread(); >> >> + qemu_thread_get_self(cpu->thread); >> >> + >> >> + cpu->thread_id = qemu_get_thread_id(); >> >> + cpu->can_do_io = 1; >> >> + current_cpu = cpu; >> >> + >> >> + hvf_init_vcpu(cpu); >> >> + >> >> + /* signal CPU creation */ >> >> + cpu_thread_signal_created(cpu); >> >> + qemu_guest_random_seed_thread_part2(cpu->random_seed); >> >> + >> >> + do { >> >> + if (cpu_can_run(cpu)) { >> >> + r = hvf_vcpu_exec(cpu); >> >> + if (r == EXCP_DEBUG) { >> >> + cpu_handle_guest_debug(cpu); >> >> + } >> >> + } >> >> + qemu_wait_io_event(cpu); >> >> + } while (!cpu->unplug || cpu_can_run(cpu)); >> >> + >> >> + hvf_vcpu_destroy(cpu); >> >> + cpu_thread_signal_destroyed(cpu); >> >> + qemu_mutex_unlock_iothread(); >> >> + rcu_unregister_thread(); >> >> + return NULL; >> >> +} >> >> + >> >> +static void hvf_start_vcpu_thread(CPUState *cpu) >> >> +{ >> >> + char thread_name[VCPU_THREAD_NAME_SIZE]; >> >> + >> >> + /* >> >> + * HVF currently does not support TCG, and only runs in >> >> + * unrestricted-guest mode. >> >> + */ >> >> + assert(hvf_enabled()); >> >> + >> >> + cpu->thread = g_malloc0(sizeof(QemuThread)); >> >> + cpu->halt_cond = g_malloc0(sizeof(QemuCond)); >> >> + qemu_cond_init(cpu->halt_cond); >> >> + >> >> + snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", >> >> + cpu->cpu_index); >> >> + qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, >> >> + cpu, QEMU_THREAD_JOINABLE); >> >> +} >> >> + >> >> +static const CpusAccel hvf_cpus = { >> >> + .create_vcpu_thread = hvf_start_vcpu_thread, >> >> + >> >> + .synchronize_post_reset = hvf_cpu_synchronize_post_reset, >> >> + .synchronize_post_init = hvf_cpu_synchronize_post_init, >> >> + .synchronize_state = hvf_cpu_synchronize_state, >> >> + .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, >> >> +}; >> >> + >> >> +static int hvf_accel_init(MachineState *ms) >> >> +{ >> >> + int x; >> >> + hv_return_t ret; >> >> + HVFState *s; >> >> + >> >> + ret = hv_vm_create(HV_VM_DEFAULT); >> >> + assert_hvf_ok(ret); >> >> + >> >> + s = g_new0(HVFState, 1); >> >> + >> >> + s->num_slots = 32; >> >> + for (x = 0; x < s->num_slots; ++x) { >> >> + s->slots[x].size = 0; >> >> + s->slots[x].slot_id = x; >> >> + } >> >> + >> >> + hvf_state = s; >> >> + memory_listener_register(&hvf_memory_listener, >> &address_space_memory); >> >> + cpus_register_accel(&hvf_cpus); >> >> + return 0; >> >> +} >> >> + >> >> +static void hvf_accel_class_init(ObjectClass *oc, void *data) >> >> +{ >> >> + AccelClass *ac = ACCEL_CLASS(oc); >> >> + ac->name = "HVF"; >> >> + ac->init_machine = hvf_accel_init; >> >> + ac->allowed = &hvf_allowed; >> >> +} >> >> + >> >> +static const TypeInfo hvf_accel_type = { >> >> + .name = TYPE_HVF_ACCEL, >> >> + .parent = TYPE_ACCEL, >> >> + .class_init = hvf_accel_class_init, >> >> +}; >> >> + >> >> +static void hvf_type_init(void) >> >> +{ >> >> + type_register_static(&hvf_accel_type); >> >> +} >> >> + >> >> +type_init(hvf_type_init); >> >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build >> >> new file mode 100644 >> >> index 0000000000..dfd6b68dc7 >> >> --- /dev/null >> >> +++ b/accel/hvf/meson.build >> >> @@ -0,0 +1,7 @@ >> >> +hvf_ss = ss.source_set() >> >> +hvf_ss.add(files( >> >> + 'hvf-all.c', >> >> + 'hvf-cpus.c', >> >> +)) >> >> + >> >> +specific_ss.add_all(when: 
'CONFIG_HVF', if_true: hvf_ss) >> >> diff --git a/accel/meson.build b/accel/meson.build >> >> index b26cca227a..6de12ce5d5 100644 >> >> --- a/accel/meson.build >> >> +++ b/accel/meson.build >> >> @@ -1,5 +1,6 @@ >> >> softmmu_ss.add(files('accel.c')) >> >> >> >> +subdir('hvf') >> >> subdir('qtest') >> >> subdir('kvm') >> >> subdir('tcg') >> >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h >> >> new file mode 100644 >> >> index 0000000000..de9bad23a8 >> >> --- /dev/null >> >> +++ b/include/sysemu/hvf_int.h >> >> @@ -0,0 +1,69 @@ >> >> +/* >> >> + * QEMU Hypervisor.framework (HVF) support >> >> + * >> >> + * This work is licensed under the terms of the GNU GPL, version 2 or >> later. >> >> + * See the COPYING file in the top-level directory. >> >> + * >> >> + */ >> >> + >> >> +/* header to be included in HVF-specific code */ >> >> + >> >> +#ifndef HVF_INT_H >> >> +#define HVF_INT_H >> >> + >> >> +#include <Hypervisor/Hypervisor.h> >> >> + >> >> +#define HVF_MAX_VCPU 0x10 >> >> + >> >> +extern struct hvf_state hvf_global; >> >> + >> >> +struct hvf_vm { >> >> + int id; >> >> + struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; >> >> +}; >> >> + >> >> +struct hvf_state { >> >> + uint32_t version; >> >> + struct hvf_vm *vm; >> >> + uint64_t mem_quota; >> >> +}; >> >> + >> >> +/* hvf_slot flags */ >> >> +#define HVF_SLOT_LOG (1 << 0) >> >> + >> >> +typedef struct hvf_slot { >> >> + uint64_t start; >> >> + uint64_t size; >> >> + uint8_t *mem; >> >> + int slot_id; >> >> + uint32_t flags; >> >> + MemoryRegion *region; >> >> +} hvf_slot; >> >> + >> >> +typedef struct hvf_vcpu_caps { >> >> + uint64_t vmx_cap_pinbased; >> >> + uint64_t vmx_cap_procbased; >> >> + uint64_t vmx_cap_procbased2; >> >> + uint64_t vmx_cap_entry; >> >> + uint64_t vmx_cap_exit; >> >> + uint64_t vmx_cap_preemption_timer; >> >> +} hvf_vcpu_caps; >> >> + >> >> +struct HVFState { >> >> + AccelState parent; >> >> + hvf_slot slots[32]; >> >> + int num_slots; >> >> + >> >> + hvf_vcpu_caps *hvf_caps; >> >> +}; >> >> +extern HVFState *hvf_state; >> >> + >> >> +void assert_hvf_ok(hv_return_t ret); >> >> +int hvf_get_registers(CPUState *cpu); >> >> +int hvf_put_registers(CPUState *cpu); >> >> +int hvf_arch_init_vcpu(CPUState *cpu); >> >> +void hvf_arch_vcpu_destroy(CPUState *cpu); >> >> +int hvf_vcpu_exec(CPUState *cpu); >> >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); >> >> + >> >> +#endif >> >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c >> >> deleted file mode 100644 >> >> index 817b3d7452..0000000000 >> >> --- a/target/i386/hvf/hvf-cpus.c >> >> +++ /dev/null >> >> @@ -1,131 +0,0 @@ >> >> -/* >> >> - * Copyright 2008 IBM Corporation >> >> - * 2008 Red Hat, Inc. >> >> - * Copyright 2011 Intel Corporation >> >> - * Copyright 2016 Veertu, Inc. >> >> - * Copyright 2017 The Android Open Source Project >> >> - * >> >> - * QEMU Hypervisor.framework support >> >> - * >> >> - * This program is free software; you can redistribute it and/or >> >> - * modify it under the terms of version 2 of the GNU General Public >> >> - * License as published by the Free Software Foundation. >> >> - * >> >> - * This program is distributed in the hope that it will be useful, >> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of >> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> >> - * General Public License for more details. 
>> >> - * >> >> - * You should have received a copy of the GNU General Public License >> >> - * along with this program; if not, see <http://www.gnu.org/licenses/ >> >. >> >> - * >> >> - * This file contain code under public domain from the hvdos project: >> >> - * https://github.com/mist64/hvdos >> >> - * >> >> - * Parts Copyright (c) 2011 NetApp, Inc. >> >> - * All rights reserved. >> >> - * >> >> - * Redistribution and use in source and binary forms, with or without >> >> - * modification, are permitted provided that the following conditions >> >> - * are met: >> >> - * 1. Redistributions of source code must retain the above copyright >> >> - * notice, this list of conditions and the following disclaimer. >> >> - * 2. Redistributions in binary form must reproduce the above >> copyright >> >> - * notice, this list of conditions and the following disclaimer in >> the >> >> - * documentation and/or other materials provided with the >> distribution. >> >> - * >> >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND >> >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, >> THE >> >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR >> PURPOSE >> >> - * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE >> LIABLE >> >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR >> CONSEQUENTIAL >> >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE >> GOODS >> >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS >> INTERRUPTION) >> >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN >> CONTRACT, STRICT >> >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN >> ANY WAY >> >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE >> POSSIBILITY OF >> >> - * SUCH DAMAGE. >> >> - */ >> >> - >> >> -#include "qemu/osdep.h" >> >> -#include "qemu/error-report.h" >> >> -#include "qemu/main-loop.h" >> >> -#include "sysemu/hvf.h" >> >> -#include "sysemu/runstate.h" >> >> -#include "target/i386/cpu.h" >> >> -#include "qemu/guest-random.h" >> >> - >> >> -#include "hvf-cpus.h" >> >> - >> >> -/* >> >> - * The HVF-specific vCPU thread function. This one should only run >> when the host >> >> - * CPU supports the VMX "unrestricted guest" feature. 
>> >> - */ >> >> -static void *hvf_cpu_thread_fn(void *arg) >> >> -{ >> >> - CPUState *cpu = arg; >> >> - >> >> - int r; >> >> - >> >> - assert(hvf_enabled()); >> >> - >> >> - rcu_register_thread(); >> >> - >> >> - qemu_mutex_lock_iothread(); >> >> - qemu_thread_get_self(cpu->thread); >> >> - >> >> - cpu->thread_id = qemu_get_thread_id(); >> >> - cpu->can_do_io = 1; >> >> - current_cpu = cpu; >> >> - >> >> - hvf_init_vcpu(cpu); >> >> - >> >> - /* signal CPU creation */ >> >> - cpu_thread_signal_created(cpu); >> >> - qemu_guest_random_seed_thread_part2(cpu->random_seed); >> >> - >> >> - do { >> >> - if (cpu_can_run(cpu)) { >> >> - r = hvf_vcpu_exec(cpu); >> >> - if (r == EXCP_DEBUG) { >> >> - cpu_handle_guest_debug(cpu); >> >> - } >> >> - } >> >> - qemu_wait_io_event(cpu); >> >> - } while (!cpu->unplug || cpu_can_run(cpu)); >> >> - >> >> - hvf_vcpu_destroy(cpu); >> >> - cpu_thread_signal_destroyed(cpu); >> >> - qemu_mutex_unlock_iothread(); >> >> - rcu_unregister_thread(); >> >> - return NULL; >> >> -} >> >> - >> >> -static void hvf_start_vcpu_thread(CPUState *cpu) >> >> -{ >> >> - char thread_name[VCPU_THREAD_NAME_SIZE]; >> >> - >> >> - /* >> >> - * HVF currently does not support TCG, and only runs in >> >> - * unrestricted-guest mode. >> >> - */ >> >> - assert(hvf_enabled()); >> >> - >> >> - cpu->thread = g_malloc0(sizeof(QemuThread)); >> >> - cpu->halt_cond = g_malloc0(sizeof(QemuCond)); >> >> - qemu_cond_init(cpu->halt_cond); >> >> - >> >> - snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", >> >> - cpu->cpu_index); >> >> - qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, >> >> - cpu, QEMU_THREAD_JOINABLE); >> >> -} >> >> - >> >> -const CpusAccel hvf_cpus = { >> >> - .create_vcpu_thread = hvf_start_vcpu_thread, >> >> - >> >> - .synchronize_post_reset = hvf_cpu_synchronize_post_reset, >> >> - .synchronize_post_init = hvf_cpu_synchronize_post_init, >> >> - .synchronize_state = hvf_cpu_synchronize_state, >> >> - .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, >> >> -}; >> >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h >> >> deleted file mode 100644 >> >> index ced31b82c0..0000000000 >> >> --- a/target/i386/hvf/hvf-cpus.h >> >> +++ /dev/null >> >> @@ -1,25 +0,0 @@ >> >> -/* >> >> - * Accelerator CPUS Interface >> >> - * >> >> - * Copyright 2020 SUSE LLC >> >> - * >> >> - * This work is licensed under the terms of the GNU GPL, version 2 or >> later. >> >> - * See the COPYING file in the top-level directory. 
>> >> - */ >> >> - >> >> -#ifndef HVF_CPUS_H >> >> -#define HVF_CPUS_H >> >> - >> >> -#include "sysemu/cpus.h" >> >> - >> >> -extern const CpusAccel hvf_cpus; >> >> - >> >> -int hvf_init_vcpu(CPUState *); >> >> -int hvf_vcpu_exec(CPUState *); >> >> -void hvf_cpu_synchronize_state(CPUState *); >> >> -void hvf_cpu_synchronize_post_reset(CPUState *); >> >> -void hvf_cpu_synchronize_post_init(CPUState *); >> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *); >> >> -void hvf_vcpu_destroy(CPUState *); >> >> - >> >> -#endif /* HVF_CPUS_H */ >> >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h >> >> index e0edffd077..6d56f8f6bb 100644 >> >> --- a/target/i386/hvf/hvf-i386.h >> >> +++ b/target/i386/hvf/hvf-i386.h >> >> @@ -18,57 +18,11 @@ >> >> >> >> #include "sysemu/accel.h" >> >> #include "sysemu/hvf.h" >> >> +#include "sysemu/hvf_int.h" >> >> #include "cpu.h" >> >> #include "x86.h" >> >> >> >> -#define HVF_MAX_VCPU 0x10 >> >> - >> >> -extern struct hvf_state hvf_global; >> >> - >> >> -struct hvf_vm { >> >> - int id; >> >> - struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; >> >> -}; >> >> - >> >> -struct hvf_state { >> >> - uint32_t version; >> >> - struct hvf_vm *vm; >> >> - uint64_t mem_quota; >> >> -}; >> >> - >> >> -/* hvf_slot flags */ >> >> -#define HVF_SLOT_LOG (1 << 0) >> >> - >> >> -typedef struct hvf_slot { >> >> - uint64_t start; >> >> - uint64_t size; >> >> - uint8_t *mem; >> >> - int slot_id; >> >> - uint32_t flags; >> >> - MemoryRegion *region; >> >> -} hvf_slot; >> >> - >> >> -typedef struct hvf_vcpu_caps { >> >> - uint64_t vmx_cap_pinbased; >> >> - uint64_t vmx_cap_procbased; >> >> - uint64_t vmx_cap_procbased2; >> >> - uint64_t vmx_cap_entry; >> >> - uint64_t vmx_cap_exit; >> >> - uint64_t vmx_cap_preemption_timer; >> >> -} hvf_vcpu_caps; >> >> - >> >> -struct HVFState { >> >> - AccelState parent; >> >> - hvf_slot slots[32]; >> >> - int num_slots; >> >> - >> >> - hvf_vcpu_caps *hvf_caps; >> >> -}; >> >> -extern HVFState *hvf_state; >> >> - >> >> -void hvf_set_phys_mem(MemoryRegionSection *, bool); >> >> void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int); >> >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); >> >> >> >> #ifdef NEED_CPU_H >> >> /* Functions exported to host specific mode */ >> >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c >> >> index ed9356565c..8b96ecd619 100644 >> >> --- a/target/i386/hvf/hvf.c >> >> +++ b/target/i386/hvf/hvf.c >> >> @@ -51,6 +51,7 @@ >> >> #include "qemu/error-report.h" >> >> >> >> #include "sysemu/hvf.h" >> >> +#include "sysemu/hvf_int.h" >> >> #include "sysemu/runstate.h" >> >> #include "hvf-i386.h" >> >> #include "vmcs.h" >> >> @@ -72,171 +73,6 @@ >> >> #include "sysemu/accel.h" >> >> #include "target/i386/cpu.h" >> >> >> >> -#include "hvf-cpus.h" >> >> - >> >> -HVFState *hvf_state; >> >> - >> >> -static void assert_hvf_ok(hv_return_t ret) >> >> -{ >> >> - if (ret == HV_SUCCESS) { >> >> - return; >> >> - } >> >> - >> >> - switch (ret) { >> >> - case HV_ERROR: >> >> - error_report("Error: HV_ERROR"); >> >> - break; >> >> - case HV_BUSY: >> >> - error_report("Error: HV_BUSY"); >> >> - break; >> >> - case HV_BAD_ARGUMENT: >> >> - error_report("Error: HV_BAD_ARGUMENT"); >> >> - break; >> >> - case HV_NO_RESOURCES: >> >> - error_report("Error: HV_NO_RESOURCES"); >> >> - break; >> >> - case HV_NO_DEVICE: >> >> - error_report("Error: HV_NO_DEVICE"); >> >> - break; >> >> - case HV_UNSUPPORTED: >> >> - error_report("Error: HV_UNSUPPORTED"); >> >> - break; >> >> - default: >> >> - 
error_report("Unknown Error"); >> >> - } >> >> - >> >> - abort(); >> >> -} >> >> - >> >> -/* Memory slots */ >> >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) >> >> -{ >> >> - hvf_slot *slot; >> >> - int x; >> >> - for (x = 0; x < hvf_state->num_slots; ++x) { >> >> - slot = &hvf_state->slots[x]; >> >> - if (slot->size && start < (slot->start + slot->size) && >> >> - (start + size) > slot->start) { >> >> - return slot; >> >> - } >> >> - } >> >> - return NULL; >> >> -} >> >> - >> >> -struct mac_slot { >> >> - int present; >> >> - uint64_t size; >> >> - uint64_t gpa_start; >> >> - uint64_t gva; >> >> -}; >> >> - >> >> -struct mac_slot mac_slots[32]; >> >> - >> >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) >> >> -{ >> >> - struct mac_slot *macslot; >> >> - hv_return_t ret; >> >> - >> >> - macslot = &mac_slots[slot->slot_id]; >> >> - >> >> - if (macslot->present) { >> >> - if (macslot->size != slot->size) { >> >> - macslot->present = 0; >> >> - ret = hv_vm_unmap(macslot->gpa_start, macslot->size); >> >> - assert_hvf_ok(ret); >> >> - } >> >> - } >> >> - >> >> - if (!slot->size) { >> >> - return 0; >> >> - } >> >> - >> >> - macslot->present = 1; >> >> - macslot->gpa_start = slot->start; >> >> - macslot->size = slot->size; >> >> - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, >> flags); >> >> - assert_hvf_ok(ret); >> >> - return 0; >> >> -} >> >> - >> >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) >> >> -{ >> >> - hvf_slot *mem; >> >> - MemoryRegion *area = section->mr; >> >> - bool writeable = !area->readonly && !area->rom_device; >> >> - hv_memory_flags_t flags; >> >> - >> >> - if (!memory_region_is_ram(area)) { >> >> - if (writeable) { >> >> - return; >> >> - } else if (!memory_region_is_romd(area)) { >> >> - /* >> >> - * If the memory device is not in romd_mode, then we >> actually want >> >> - * to remove the hvf memory slot so all accesses will >> trap. >> >> - */ >> >> - add = false; >> >> - } >> >> - } >> >> - >> >> - mem = hvf_find_overlap_slot( >> >> - section->offset_within_address_space, >> >> - int128_get64(section->size)); >> >> - >> >> - if (mem && add) { >> >> - if (mem->size == int128_get64(section->size) && >> >> - mem->start == section->offset_within_address_space && >> >> - mem->mem == (memory_region_get_ram_ptr(area) + >> >> - section->offset_within_region)) { >> >> - return; /* Same region was attempted to register, go >> away. */ >> >> - } >> >> - } >> >> - >> >> - /* Region needs to be reset. set the size to 0 and remap it. */ >> >> - if (mem) { >> >> - mem->size = 0; >> >> - if (do_hvf_set_memory(mem, 0)) { >> >> - error_report("Failed to reset overlapping slot"); >> >> - abort(); >> >> - } >> >> - } >> >> - >> >> - if (!add) { >> >> - return; >> >> - } >> >> - >> >> - if (area->readonly || >> >> - (!memory_region_is_ram(area) && memory_region_is_romd(area))) >> { >> >> - flags = HV_MEMORY_READ | HV_MEMORY_EXEC; >> >> - } else { >> >> - flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; >> >> - } >> >> - >> >> - /* Now make a new slot. 
*/ >> >> - int x; >> >> - >> >> - for (x = 0; x < hvf_state->num_slots; ++x) { >> >> - mem = &hvf_state->slots[x]; >> >> - if (!mem->size) { >> >> - break; >> >> - } >> >> - } >> >> - >> >> - if (x == hvf_state->num_slots) { >> >> - error_report("No free slots"); >> >> - abort(); >> >> - } >> >> - >> >> - mem->size = int128_get64(section->size); >> >> - mem->mem = memory_region_get_ram_ptr(area) + >> section->offset_within_region; >> >> - mem->start = section->offset_within_address_space; >> >> - mem->region = area; >> >> - >> >> - if (do_hvf_set_memory(mem, flags)) { >> >> - error_report("Error registering new memory slot"); >> >> - abort(); >> >> - } >> >> -} >> >> - >> >> void vmx_update_tpr(CPUState *cpu) >> >> { >> >> /* TODO: need integrate APIC handling */ >> >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t >> port, void *buffer, >> >> } >> >> } >> >> >> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, >> run_on_cpu_data arg) >> >> -{ >> >> - if (!cpu->vcpu_dirty) { >> >> - hvf_get_registers(cpu); >> >> - cpu->vcpu_dirty = true; >> >> - } >> >> -} >> >> - >> >> -void hvf_cpu_synchronize_state(CPUState *cpu) >> >> -{ >> >> - if (!cpu->vcpu_dirty) { >> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_state, >> RUN_ON_CPU_NULL); >> >> - } >> >> -} >> >> - >> >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, >> >> - run_on_cpu_data arg) >> >> -{ >> >> - hvf_put_registers(cpu); >> >> - cpu->vcpu_dirty = false; >> >> -} >> >> - >> >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu) >> >> -{ >> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, >> RUN_ON_CPU_NULL); >> >> -} >> >> - >> >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, >> >> - run_on_cpu_data arg) >> >> -{ >> >> - hvf_put_registers(cpu); >> >> - cpu->vcpu_dirty = false; >> >> -} >> >> - >> >> -void hvf_cpu_synchronize_post_init(CPUState *cpu) >> >> -{ >> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, >> RUN_ON_CPU_NULL); >> >> -} >> >> - >> >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, >> >> - run_on_cpu_data arg) >> >> -{ >> >> - cpu->vcpu_dirty = true; >> >> -} >> >> - >> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) >> >> -{ >> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, >> RUN_ON_CPU_NULL); >> >> -} >> >> - >> >> static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, >> uint64_t ept_qual) >> >> { >> >> int read, write; >> >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, >> uint64_t gpa, uint64_t ept_qual) >> >> return false; >> >> } >> >> >> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool >> on) >> >> -{ >> >> - hvf_slot *slot; >> >> - >> >> - slot = hvf_find_overlap_slot( >> >> - section->offset_within_address_space, >> >> - int128_get64(section->size)); >> >> - >> >> - /* protect region against writes; begin tracking it */ >> >> - if (on) { >> >> - slot->flags |= HVF_SLOT_LOG; >> >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >> >> - HV_MEMORY_READ); >> >> - /* stop tracking region*/ >> >> - } else { >> >> - slot->flags &= ~HVF_SLOT_LOG; >> >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >> >> - HV_MEMORY_READ | HV_MEMORY_WRITE); >> >> - } >> >> -} >> >> - >> >> -static void hvf_log_start(MemoryListener *listener, >> >> - MemoryRegionSection *section, int old, int >> new) >> >> -{ >> >> - if (old != 0) { >> >> - return; >> >> - } >> >> - >> >> - hvf_set_dirty_tracking(section, 1); >> >> -} >> >> - >> >> -static void 
hvf_log_stop(MemoryListener *listener, >> >> - MemoryRegionSection *section, int old, int >> new) >> >> -{ >> >> - if (new != 0) { >> >> - return; >> >> - } >> >> - >> >> - hvf_set_dirty_tracking(section, 0); >> >> -} >> >> - >> >> -static void hvf_log_sync(MemoryListener *listener, >> >> - MemoryRegionSection *section) >> >> -{ >> >> - /* >> >> - * sync of dirty pages is handled elsewhere; just make sure we >> keep >> >> - * tracking the region. >> >> - */ >> >> - hvf_set_dirty_tracking(section, 1); >> >> -} >> >> - >> >> -static void hvf_region_add(MemoryListener *listener, >> >> - MemoryRegionSection *section) >> >> -{ >> >> - hvf_set_phys_mem(section, true); >> >> -} >> >> - >> >> -static void hvf_region_del(MemoryListener *listener, >> >> - MemoryRegionSection *section) >> >> -{ >> >> - hvf_set_phys_mem(section, false); >> >> -} >> >> - >> >> -static MemoryListener hvf_memory_listener = { >> >> - .priority = 10, >> >> - .region_add = hvf_region_add, >> >> - .region_del = hvf_region_del, >> >> - .log_start = hvf_log_start, >> >> - .log_stop = hvf_log_stop, >> >> - .log_sync = hvf_log_sync, >> >> -}; >> >> - >> >> -void hvf_vcpu_destroy(CPUState *cpu) >> >> +void hvf_arch_vcpu_destroy(CPUState *cpu) >> >> { >> >> X86CPU *x86_cpu = X86_CPU(cpu); >> >> CPUX86State *env = &x86_cpu->env; >> >> >> >> - hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd); >> >> g_free(env->hvf_mmio_buf); >> >> - assert_hvf_ok(ret); >> >> -} >> >> - >> >> -static void dummy_signal(int sig) >> >> -{ >> >> } >> >> >> >> -int hvf_init_vcpu(CPUState *cpu) >> >> +int hvf_arch_init_vcpu(CPUState *cpu) >> >> { >> >> >> >> X86CPU *x86cpu = X86_CPU(cpu); >> >> CPUX86State *env = &x86cpu->env; >> >> - int r; >> >> - >> >> - /* init cpu signals */ >> >> - sigset_t set; >> >> - struct sigaction sigact; >> >> - >> >> - memset(&sigact, 0, sizeof(sigact)); >> >> - sigact.sa_handler = dummy_signal; >> >> - sigaction(SIG_IPI, &sigact, NULL); >> >> - >> >> - pthread_sigmask(SIG_BLOCK, NULL, &set); >> >> - sigdelset(&set, SIG_IPI); >> >> >> >> init_emu(); >> >> init_decoder(); >> >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu) >> >> hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1); >> >> env->hvf_mmio_buf = g_new(char, 4096); >> >> >> >> - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); >> >> - cpu->vcpu_dirty = 1; >> >> - assert_hvf_ok(r); >> >> - >> >> if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED, >> >> &hvf_state->hvf_caps->vmx_cap_pinbased)) { >> >> abort(); >> >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu) >> >> >> >> return ret; >> >> } >> >> - >> >> -bool hvf_allowed; >> >> - >> >> -static int hvf_accel_init(MachineState *ms) >> >> -{ >> >> - int x; >> >> - hv_return_t ret; >> >> - HVFState *s; >> >> - >> >> - ret = hv_vm_create(HV_VM_DEFAULT); >> >> - assert_hvf_ok(ret); >> >> - >> >> - s = g_new0(HVFState, 1); >> >> - >> >> - s->num_slots = 32; >> >> - for (x = 0; x < s->num_slots; ++x) { >> >> - s->slots[x].size = 0; >> >> - s->slots[x].slot_id = x; >> >> - } >> >> - >> >> - hvf_state = s; >> >> - memory_listener_register(&hvf_memory_listener, >> &address_space_memory); >> >> - cpus_register_accel(&hvf_cpus); >> >> - return 0; >> >> -} >> >> - >> >> -static void hvf_accel_class_init(ObjectClass *oc, void *data) >> >> -{ >> >> - AccelClass *ac = ACCEL_CLASS(oc); >> >> - ac->name = "HVF"; >> >> - ac->init_machine = hvf_accel_init; >> >> - ac->allowed = &hvf_allowed; >> >> -} >> >> - >> >> -static const TypeInfo hvf_accel_type = { >> >> - .name = TYPE_HVF_ACCEL, >> >> - 
.parent = TYPE_ACCEL, >> >> - .class_init = hvf_accel_class_init, >> >> -}; >> >> - >> >> -static void hvf_type_init(void) >> >> -{ >> >> - type_register_static(&hvf_accel_type); >> >> -} >> >> - >> >> -type_init(hvf_type_init); >> >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build >> >> index 409c9a3f14..c8a43717ee 100644 >> >> --- a/target/i386/hvf/meson.build >> >> +++ b/target/i386/hvf/meson.build >> >> @@ -1,6 +1,5 @@ >> >> i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files( >> >> 'hvf.c', >> >> - 'hvf-cpus.c', >> >> 'x86.c', >> >> 'x86_cpuid.c', >> >> 'x86_decode.c', >> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c >> >> index bbec412b6c..89b8e9d87a 100644 >> >> --- a/target/i386/hvf/x86hvf.c >> >> +++ b/target/i386/hvf/x86hvf.c >> >> @@ -20,6 +20,9 @@ >> >> #include "qemu/osdep.h" >> >> >> >> #include "qemu-common.h" >> >> +#include "sysemu/hvf.h" >> >> +#include "sysemu/hvf_int.h" >> >> +#include "sysemu/hw_accel.h" >> >> #include "x86hvf.h" >> >> #include "vmx.h" >> >> #include "vmcs.h" >> >> @@ -32,8 +35,6 @@ >> >> #include <Hypervisor/hv.h> >> >> #include <Hypervisor/hv_vmx.h> >> >> >> >> -#include "hvf-cpus.h" >> >> - >> >> void hvf_set_segment(struct CPUState *cpu, struct vmx_segment >> *vmx_seg, >> >> SegmentCache *qseg, bool is_tr) >> >> { >> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state) >> >> env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); >> >> >> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { >> >> - hvf_cpu_synchronize_state(cpu_state); >> >> + cpu_synchronize_state(cpu_state); >> >> do_cpu_init(cpu); >> >> } >> >> >> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state) >> >> cpu_state->halted = 0; >> >> } >> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) { >> >> - hvf_cpu_synchronize_state(cpu_state); >> >> + cpu_synchronize_state(cpu_state); >> >> do_cpu_sipi(cpu); >> >> } >> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) { >> >> cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR; >> >> - hvf_cpu_synchronize_state(cpu_state); >> >> + cpu_synchronize_state(cpu_state); >> > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should >> > be a separate patch. It follows the cpu/accel cleanups Claudio was doing over the >> > summer. >> >> >> The only reason they're in here is because we no longer have access to >> the hvf_ functions from the file. I am perfectly happy to rebase the >> patch on top of Claudio's if his goes in first. I'm sure it'll be >> trivial for him to rebase on top of this too if my series goes in first. >> >> >> > >> > Philippe raised the idea that the patch might go ahead of the ARM-specific >> > part (which might involve some discussions) and I agree with that. >> > >> > Some sync between Claudio's series (CC'd him) and this patch might be needed. >> >> >> I would prefer not to hold back because of the sync. Claudio's cleanup >> is trivial enough to adjust for if it gets merged ahead of this. >> >> >> Alex >> >> >> >>
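One detail worth spelling out about the dirty-tracking listener quoted above: hvf_set_dirty_tracking() only write-protects a logged slot. The other half of the mechanism has to live in the EPT-fault path of hvf_vcpu_exec(), which marks the touched page dirty and restores write access. A condensed sketch of that fault-side handling, with the function name and exact control flow assumed here from the x86 hvf.c behavior (it is not shown in the hunks quoted in this thread):

    /* Sketch only; the real handling sits in hvf_vcpu_exec()'s
     * EPT-violation path and does more validation first. */
    static bool hvf_handle_log_fault(hvf_slot *slot, uint64_t gpa)
    {
        if (!slot || !(slot->flags & HVF_SLOT_LOG)) {
            return false;                  /* not a dirty-logging fault */
        }
        /* Record the write, then lift the write protection so the
         * guest's retried access can succeed. */
        memory_region_set_dirty(slot->region, gpa - slot->start, 1);
        hv_vm_protect(slot->start, slot->size,
                      HV_MEMORY_READ | HV_MEMORY_WRITE);
        return true;                       /* let the guest retry */
    }

hvf_log_sync() then re-protects the slot, so each migration sync window starts with all pages clean again.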
Hi Frank, Thanks for the update :). Your previous email nudged me in the right direction. I had previously implemented WFI through the internal timer framework, which performed way worse. Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. Also, is there a particular reason you're working on this super interesting and useful code in a random downstream fork of QEMU? Wouldn't it be more helpful to contribute to the upstream code base instead? Alex On 30.11.20 21:15, Frank Yang wrote: > Update: We're not quite sure how to compare CNTV_CVAL and CNTVCT. > But the high CPU usage seems to be mitigated by having a poll interval > (like KVM does) in handling WFI: > > https://android-review.googlesource.com/c/platform/external/qemu/+/1512501 > > This is loosely inspired by > https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766 > which does seem to specify a poll interval. > > It would be cool if we could have a lightweight way to enter sleep and > restart the vcpus precisely when CVAL passes, though. > > Frank > > > On Fri, Nov 27, 2020 at 3:30 PM Frank Yang <lfy@google.com> wrote: > > Hi all, > > +Peter Collingbourne <pcc@google.com> > > I'm a developer on the Android Emulator, which is in a fork of QEMU. > > Peter and I have been working on an HVF Apple Silicon backend with > an eye toward Android guests. > > We have gotten things to basically switch to Android userspace > already (logcat/shell and graphics available at least). > > Our strategy so far has been to import logic from the KVM > implementation and hook into QEMU's software devices > that were previously assumed to work only with TCG, or to have > KVM-specific paths. > > Thanks to Alexander for the tip on the 36-bit address space > limitation btw; our way of addressing this is to still allow > highmem but not put the PCI high MMIO window so high.
> > Also, note we have a sleep/signal-based mechanism to deal with > WFx, which might be worth looking into in Alexander's > implementation as well: > > https://android-review.googlesource.com/c/platform/external/qemu/+/1512551 > > Patches so far, FYI: > > https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1 > https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3 > https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3 > https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3 > https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3 > > https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a > https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b > https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01 > https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228 > https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102 > https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6 > > Peter has also noticed that there are extra steps needed on M1s to > allow TCG to work, as it involves JIT: > > https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9 > > We'd appreciate any feedback/comments :) > > Best, > > Frank > > On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote: > > > On 27.11.20 21:00, Roman Bolshakov wrote: > > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote: > >> Until now, Hypervisor.framework has only been available on > x86_64 systems. > >> With Apple Silicon shipping now, it extends its reach to > aarch64. To > >> prepare for support for multiple architectures, let's move > common code out > >> into its own accel directory.
> >>
> >> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> >> ---
> >> [...]
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 68bc160f41..ca4b6d9279 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com>
> >>  M: Roman Bolshakov <r.bolshakov@yadro.com>
> >>  W: https://wiki.qemu.org/Features/HVF
> >>  S: Maintained
> >> -F: accel/stubs/hvf-stub.c
> > There was a patch for that in the RFC series from Claudio.
>
> Yeah, I'm not worried about this hunk :).
>
> >> [...]
> >>
> >> +static int hvf_init_vcpu(CPUState *cpu)
> >> +{
> >> +    int r;
> >> +
> >> +    /* init cpu signals */
> >> +    sigset_t set;
> >> +    struct sigaction sigact;
> >> +
> >> +    memset(&sigact, 0, sizeof(sigact));
> >> +    sigact.sa_handler = dummy_signal;
> >> +    sigaction(SIG_IPI, &sigact, NULL);
> >> +
> >> +    pthread_sigmask(SIG_BLOCK, NULL, &set);
> >> +    sigdelset(&set, SIG_IPI);
> >> +
> >> +#ifdef __aarch64__
> >> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
> >> +#else
> >> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
> >> +#endif
> > I think the first __aarch64__ bit fits better in the arm part of the
> > series.
>
> Oops. Thanks for catching it! Yes, absolutely. It should be part of the
> ARM enablement.
>
> >> [...]
> >>
> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
> >>      env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
> >>
> >>      if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
> >> -        hvf_cpu_synchronize_state(cpu_state);
> >> +        cpu_synchronize_state(cpu_state);
> >>          do_cpu_init(cpu);
> >>      }
> >> [...]
> > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should
> > be a separate patch. It follows the cpu/accel cleanups Claudio was
> > doing over the summer.
>
> The only reason they're in here is because we no longer have access to
> the hvf_ functions from the file. I am perfectly happy to rebase the
> patch on top of Claudio's if his goes in first. I'm sure it'll be
> trivial for him to rebase on top of this too if my series goes in first.
>
> > Philippe raised the idea that the patch might go ahead of the
> > ARM-specific part (which might involve some discussions) and I agree
> > with that.
> >
> > Some sync between Claudio's series (CC'd him) and the patch might be
> > needed.
>
> I would prefer not to hold back because of the sync. Claudio's cleanup
> is trivial enough to adjust for if it gets merged ahead of this.
>
>
> Alex
On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:

> Hi Frank,
>
> Thanks for the update :). Your previous email nudged me in the right
> direction. I had previously implemented WFI through the internal timer
> framework, which performed much worse.
>

Cool, glad it's helping. Also, Peter found out that the main thing
keeping us from just using cntpct_el0 on the host directly and comparing
it with cval is that if we sleep, cval is going to be much less than
cntpct_el0 by the sleep time. If we can get either the architecture or
macOS to read out the sleep time, then we might not have to use a poll
interval either!

> Along the way, I stumbled over a few issues though. For starters, the
> signal mask for SIG_IPI was not set correctly, so while pselect() would
> exit, the signal would never get delivered to the thread! For a fix,
> check out
>
> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
>

Thanks, we'll take a look :)

> Please also have a look at my latest stab at WFI emulation. It doesn't
> handle WFE (that's only relevant in overcommitted scenarios), but it
> does handle WFI and even does something similar to hlt polling, albeit
> not with an adaptive threshold.
>
> Also, is there a particular reason you're working on this super
> interesting and useful code in a random downstream fork of QEMU?
> Wouldn't it be more helpful to contribute to the upstream code base
> instead?
>

We'd actually like to contribute upstream too :) We do want to maintain
our own downstream though; the Android Emulator codebase needs to work
solidly on macOS and Windows, which has made keeping up with upstream
difficult, and staying on a previous version (2.12) with known quirks
easier. (There's also some Android-related customization relating to the
Qt UI, a different set of virtual devices, and snapshot support (incl.
snapshots of graphics devices with OpenGLES state tracking), which we
hope to separate into other libraries/processes, but it's not
insignificant.)

>
> Alex
>
> [...]
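To make the CVAL/CNTVCT comparison in this exchange concrete: the guest's
virtual timer fires once the counter passes the compare value, so the
longest a WFI sleep may last is the remaining tick delta converted
through the counter frequency. A rough sketch, under the assumption that
guest-visible values for cval, cntvct and cntfrq_hz are already in hand;
keeping cntvct consistent across a host sleep is, per the discussion
above, exactly the open problem.

    #include <stdint.h>
    #include <time.h>

    /* Sketch: bound a WFI sleep by the guest timer's compare value. */
    static struct timespec wfi_deadline(uint64_t cval, uint64_t cntvct,
                                        uint64_t cntfrq_hz)
    {
        struct timespec ts = { 0, 0 };
        uint64_t ticks, ns;

        if (cntvct >= cval) {
            return ts; /* timer already pending: don't sleep at all */
        }

        ticks = cval - cntvct;
        /* Split the conversion to avoid 64-bit overflow on large deltas. */
        ns = ticks / cntfrq_hz * 1000000000ULL
             + ticks % cntfrq_hz * 1000000000ULL / cntfrq_hz;

        ts.tv_sec = ns / 1000000000ULL;
        ts.tv_nsec = ns % 1000000000ULL;
        return ts;
    }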
>>> >>> >>> Oops. Thanks for catching it! Yes, absolutely. It should be part of the >>> ARM enablement. >>> >>> >>> > >>> >> + cpu->vcpu_dirty = 1; >>> >> + assert_hvf_ok(r); >>> >> + >>> >> + return hvf_arch_init_vcpu(cpu); >>> >> +} >>> >> + >>> >> +/* >>> >> + * The HVF-specific vCPU thread function. This one should only run >>> when the host >>> >> + * CPU supports the VMX "unrestricted guest" feature. >>> >> + */ >>> >> +static void *hvf_cpu_thread_fn(void *arg) >>> >> +{ >>> >> + CPUState *cpu = arg; >>> >> + >>> >> + int r; >>> >> + >>> >> + assert(hvf_enabled()); >>> >> + >>> >> + rcu_register_thread(); >>> >> + >>> >> + qemu_mutex_lock_iothread(); >>> >> + qemu_thread_get_self(cpu->thread); >>> >> + >>> >> + cpu->thread_id = qemu_get_thread_id(); >>> >> + cpu->can_do_io = 1; >>> >> + current_cpu = cpu; >>> >> + >>> >> + hvf_init_vcpu(cpu); >>> >> + >>> >> + /* signal CPU creation */ >>> >> + cpu_thread_signal_created(cpu); >>> >> + qemu_guest_random_seed_thread_part2(cpu->random_seed); >>> >> + >>> >> + do { >>> >> + if (cpu_can_run(cpu)) { >>> >> + r = hvf_vcpu_exec(cpu); >>> >> + if (r == EXCP_DEBUG) { >>> >> + cpu_handle_guest_debug(cpu); >>> >> + } >>> >> + } >>> >> + qemu_wait_io_event(cpu); >>> >> + } while (!cpu->unplug || cpu_can_run(cpu)); >>> >> + >>> >> + hvf_vcpu_destroy(cpu); >>> >> + cpu_thread_signal_destroyed(cpu); >>> >> + qemu_mutex_unlock_iothread(); >>> >> + rcu_unregister_thread(); >>> >> + return NULL; >>> >> +} >>> >> + >>> >> +static void hvf_start_vcpu_thread(CPUState *cpu) >>> >> +{ >>> >> + char thread_name[VCPU_THREAD_NAME_SIZE]; >>> >> + >>> >> + /* >>> >> + * HVF currently does not support TCG, and only runs in >>> >> + * unrestricted-guest mode. >>> >> + */ >>> >> + assert(hvf_enabled()); >>> >> + >>> >> + cpu->thread = g_malloc0(sizeof(QemuThread)); >>> >> + cpu->halt_cond = g_malloc0(sizeof(QemuCond)); >>> >> + qemu_cond_init(cpu->halt_cond); >>> >> + >>> >> + snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", >>> >> + cpu->cpu_index); >>> >> + qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, >>> >> + cpu, QEMU_THREAD_JOINABLE); >>> >> +} >>> >> + >>> >> +static const CpusAccel hvf_cpus = { >>> >> + .create_vcpu_thread = hvf_start_vcpu_thread, >>> >> + >>> >> + .synchronize_post_reset = hvf_cpu_synchronize_post_reset, >>> >> + .synchronize_post_init = hvf_cpu_synchronize_post_init, >>> >> + .synchronize_state = hvf_cpu_synchronize_state, >>> >> + .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, >>> >> +}; >>> >> + >>> >> +static int hvf_accel_init(MachineState *ms) >>> >> +{ >>> >> + int x; >>> >> + hv_return_t ret; >>> >> + HVFState *s; >>> >> + >>> >> + ret = hv_vm_create(HV_VM_DEFAULT); >>> >> + assert_hvf_ok(ret); >>> >> + >>> >> + s = g_new0(HVFState, 1); >>> >> + >>> >> + s->num_slots = 32; >>> >> + for (x = 0; x < s->num_slots; ++x) { >>> >> + s->slots[x].size = 0; >>> >> + s->slots[x].slot_id = x; >>> >> + } >>> >> + >>> >> + hvf_state = s; >>> >> + memory_listener_register(&hvf_memory_listener, >>> &address_space_memory); >>> >> + cpus_register_accel(&hvf_cpus); >>> >> + return 0; >>> >> +} >>> >> + >>> >> +static void hvf_accel_class_init(ObjectClass *oc, void *data) >>> >> +{ >>> >> + AccelClass *ac = ACCEL_CLASS(oc); >>> >> + ac->name = "HVF"; >>> >> + ac->init_machine = hvf_accel_init; >>> >> + ac->allowed = &hvf_allowed; >>> >> +} >>> >> + >>> >> +static const TypeInfo hvf_accel_type = { >>> >> + .name = TYPE_HVF_ACCEL, >>> >> + .parent = TYPE_ACCEL, >>> >> + .class_init = hvf_accel_class_init, >>> >> 
+}; >>> >> + >>> >> +static void hvf_type_init(void) >>> >> +{ >>> >> + type_register_static(&hvf_accel_type); >>> >> +} >>> >> + >>> >> +type_init(hvf_type_init); >>> >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build >>> >> new file mode 100644 >>> >> index 0000000000..dfd6b68dc7 >>> >> --- /dev/null >>> >> +++ b/accel/hvf/meson.build >>> >> @@ -0,0 +1,7 @@ >>> >> +hvf_ss = ss.source_set() >>> >> +hvf_ss.add(files( >>> >> + 'hvf-all.c', >>> >> + 'hvf-cpus.c', >>> >> +)) >>> >> + >>> >> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss) >>> >> diff --git a/accel/meson.build b/accel/meson.build >>> >> index b26cca227a..6de12ce5d5 100644 >>> >> --- a/accel/meson.build >>> >> +++ b/accel/meson.build >>> >> @@ -1,5 +1,6 @@ >>> >> softmmu_ss.add(files('accel.c')) >>> >> >>> >> +subdir('hvf') >>> >> subdir('qtest') >>> >> subdir('kvm') >>> >> subdir('tcg') >>> >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h >>> >> new file mode 100644 >>> >> index 0000000000..de9bad23a8 >>> >> --- /dev/null >>> >> +++ b/include/sysemu/hvf_int.h >>> >> @@ -0,0 +1,69 @@ >>> >> +/* >>> >> + * QEMU Hypervisor.framework (HVF) support >>> >> + * >>> >> + * This work is licensed under the terms of the GNU GPL, version 2 >>> or later. >>> >> + * See the COPYING file in the top-level directory. >>> >> + * >>> >> + */ >>> >> + >>> >> +/* header to be included in HVF-specific code */ >>> >> + >>> >> +#ifndef HVF_INT_H >>> >> +#define HVF_INT_H >>> >> + >>> >> +#include <Hypervisor/Hypervisor.h> >>> >> + >>> >> +#define HVF_MAX_VCPU 0x10 >>> >> + >>> >> +extern struct hvf_state hvf_global; >>> >> + >>> >> +struct hvf_vm { >>> >> + int id; >>> >> + struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; >>> >> +}; >>> >> + >>> >> +struct hvf_state { >>> >> + uint32_t version; >>> >> + struct hvf_vm *vm; >>> >> + uint64_t mem_quota; >>> >> +}; >>> >> + >>> >> +/* hvf_slot flags */ >>> >> +#define HVF_SLOT_LOG (1 << 0) >>> >> + >>> >> +typedef struct hvf_slot { >>> >> + uint64_t start; >>> >> + uint64_t size; >>> >> + uint8_t *mem; >>> >> + int slot_id; >>> >> + uint32_t flags; >>> >> + MemoryRegion *region; >>> >> +} hvf_slot; >>> >> + >>> >> +typedef struct hvf_vcpu_caps { >>> >> + uint64_t vmx_cap_pinbased; >>> >> + uint64_t vmx_cap_procbased; >>> >> + uint64_t vmx_cap_procbased2; >>> >> + uint64_t vmx_cap_entry; >>> >> + uint64_t vmx_cap_exit; >>> >> + uint64_t vmx_cap_preemption_timer; >>> >> +} hvf_vcpu_caps; >>> >> + >>> >> +struct HVFState { >>> >> + AccelState parent; >>> >> + hvf_slot slots[32]; >>> >> + int num_slots; >>> >> + >>> >> + hvf_vcpu_caps *hvf_caps; >>> >> +}; >>> >> +extern HVFState *hvf_state; >>> >> + >>> >> +void assert_hvf_ok(hv_return_t ret); >>> >> +int hvf_get_registers(CPUState *cpu); >>> >> +int hvf_put_registers(CPUState *cpu); >>> >> +int hvf_arch_init_vcpu(CPUState *cpu); >>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu); >>> >> +int hvf_vcpu_exec(CPUState *cpu); >>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); >>> >> + >>> >> +#endif >>> >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c >>> >> deleted file mode 100644 >>> >> index 817b3d7452..0000000000 >>> >> --- a/target/i386/hvf/hvf-cpus.c >>> >> +++ /dev/null >>> >> @@ -1,131 +0,0 @@ >>> >> -/* >>> >> - * Copyright 2008 IBM Corporation >>> >> - * 2008 Red Hat, Inc. >>> >> - * Copyright 2011 Intel Corporation >>> >> - * Copyright 2016 Veertu, Inc. 
>>> >> - * Copyright 2017 The Android Open Source Project >>> >> - * >>> >> - * QEMU Hypervisor.framework support >>> >> - * >>> >> - * This program is free software; you can redistribute it and/or >>> >> - * modify it under the terms of version 2 of the GNU General Public >>> >> - * License as published by the Free Software Foundation. >>> >> - * >>> >> - * This program is distributed in the hope that it will be useful, >>> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of >>> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>> >> - * General Public License for more details. >>> >> - * >>> >> - * You should have received a copy of the GNU General Public License >>> >> - * along with this program; if not, see < >>> http://www.gnu.org/licenses/>. >>> >> - * >>> >> - * This file contain code under public domain from the hvdos project: >>> >> - * https://github.com/mist64/hvdos >>> >> - * >>> >> - * Parts Copyright (c) 2011 NetApp, Inc. >>> >> - * All rights reserved. >>> >> - * >>> >> - * Redistribution and use in source and binary forms, with or without >>> >> - * modification, are permitted provided that the following conditions >>> >> - * are met: >>> >> - * 1. Redistributions of source code must retain the above copyright >>> >> - * notice, this list of conditions and the following disclaimer. >>> >> - * 2. Redistributions in binary form must reproduce the above >>> copyright >>> >> - * notice, this list of conditions and the following disclaimer >>> in the >>> >> - * documentation and/or other materials provided with the >>> distribution. >>> >> - * >>> >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND >>> >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, >>> THE >>> >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A >>> PARTICULAR PURPOSE >>> >> - * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE >>> LIABLE >>> >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR >>> CONSEQUENTIAL >>> >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE >>> GOODS >>> >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS >>> INTERRUPTION) >>> >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN >>> CONTRACT, STRICT >>> >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN >>> ANY WAY >>> >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE >>> POSSIBILITY OF >>> >> - * SUCH DAMAGE. >>> >> - */ >>> >> - >>> >> -#include "qemu/osdep.h" >>> >> -#include "qemu/error-report.h" >>> >> -#include "qemu/main-loop.h" >>> >> -#include "sysemu/hvf.h" >>> >> -#include "sysemu/runstate.h" >>> >> -#include "target/i386/cpu.h" >>> >> -#include "qemu/guest-random.h" >>> >> - >>> >> -#include "hvf-cpus.h" >>> >> - >>> >> -/* >>> >> - * The HVF-specific vCPU thread function. This one should only run >>> when the host >>> >> - * CPU supports the VMX "unrestricted guest" feature. 
>>> >> - */ >>> >> -static void *hvf_cpu_thread_fn(void *arg) >>> >> -{ >>> >> - CPUState *cpu = arg; >>> >> - >>> >> - int r; >>> >> - >>> >> - assert(hvf_enabled()); >>> >> - >>> >> - rcu_register_thread(); >>> >> - >>> >> - qemu_mutex_lock_iothread(); >>> >> - qemu_thread_get_self(cpu->thread); >>> >> - >>> >> - cpu->thread_id = qemu_get_thread_id(); >>> >> - cpu->can_do_io = 1; >>> >> - current_cpu = cpu; >>> >> - >>> >> - hvf_init_vcpu(cpu); >>> >> - >>> >> - /* signal CPU creation */ >>> >> - cpu_thread_signal_created(cpu); >>> >> - qemu_guest_random_seed_thread_part2(cpu->random_seed); >>> >> - >>> >> - do { >>> >> - if (cpu_can_run(cpu)) { >>> >> - r = hvf_vcpu_exec(cpu); >>> >> - if (r == EXCP_DEBUG) { >>> >> - cpu_handle_guest_debug(cpu); >>> >> - } >>> >> - } >>> >> - qemu_wait_io_event(cpu); >>> >> - } while (!cpu->unplug || cpu_can_run(cpu)); >>> >> - >>> >> - hvf_vcpu_destroy(cpu); >>> >> - cpu_thread_signal_destroyed(cpu); >>> >> - qemu_mutex_unlock_iothread(); >>> >> - rcu_unregister_thread(); >>> >> - return NULL; >>> >> -} >>> >> - >>> >> -static void hvf_start_vcpu_thread(CPUState *cpu) >>> >> -{ >>> >> - char thread_name[VCPU_THREAD_NAME_SIZE]; >>> >> - >>> >> - /* >>> >> - * HVF currently does not support TCG, and only runs in >>> >> - * unrestricted-guest mode. >>> >> - */ >>> >> - assert(hvf_enabled()); >>> >> - >>> >> - cpu->thread = g_malloc0(sizeof(QemuThread)); >>> >> - cpu->halt_cond = g_malloc0(sizeof(QemuCond)); >>> >> - qemu_cond_init(cpu->halt_cond); >>> >> - >>> >> - snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", >>> >> - cpu->cpu_index); >>> >> - qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, >>> >> - cpu, QEMU_THREAD_JOINABLE); >>> >> -} >>> >> - >>> >> -const CpusAccel hvf_cpus = { >>> >> - .create_vcpu_thread = hvf_start_vcpu_thread, >>> >> - >>> >> - .synchronize_post_reset = hvf_cpu_synchronize_post_reset, >>> >> - .synchronize_post_init = hvf_cpu_synchronize_post_init, >>> >> - .synchronize_state = hvf_cpu_synchronize_state, >>> >> - .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, >>> >> -}; >>> >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h >>> >> deleted file mode 100644 >>> >> index ced31b82c0..0000000000 >>> >> --- a/target/i386/hvf/hvf-cpus.h >>> >> +++ /dev/null >>> >> @@ -1,25 +0,0 @@ >>> >> -/* >>> >> - * Accelerator CPUS Interface >>> >> - * >>> >> - * Copyright 2020 SUSE LLC >>> >> - * >>> >> - * This work is licensed under the terms of the GNU GPL, version 2 >>> or later. >>> >> - * See the COPYING file in the top-level directory. 
>>> >> - */ >>> >> - >>> >> -#ifndef HVF_CPUS_H >>> >> -#define HVF_CPUS_H >>> >> - >>> >> -#include "sysemu/cpus.h" >>> >> - >>> >> -extern const CpusAccel hvf_cpus; >>> >> - >>> >> -int hvf_init_vcpu(CPUState *); >>> >> -int hvf_vcpu_exec(CPUState *); >>> >> -void hvf_cpu_synchronize_state(CPUState *); >>> >> -void hvf_cpu_synchronize_post_reset(CPUState *); >>> >> -void hvf_cpu_synchronize_post_init(CPUState *); >>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *); >>> >> -void hvf_vcpu_destroy(CPUState *); >>> >> - >>> >> -#endif /* HVF_CPUS_H */ >>> >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h >>> >> index e0edffd077..6d56f8f6bb 100644 >>> >> --- a/target/i386/hvf/hvf-i386.h >>> >> +++ b/target/i386/hvf/hvf-i386.h >>> >> @@ -18,57 +18,11 @@ >>> >> >>> >> #include "sysemu/accel.h" >>> >> #include "sysemu/hvf.h" >>> >> +#include "sysemu/hvf_int.h" >>> >> #include "cpu.h" >>> >> #include "x86.h" >>> >> >>> >> -#define HVF_MAX_VCPU 0x10 >>> >> - >>> >> -extern struct hvf_state hvf_global; >>> >> - >>> >> -struct hvf_vm { >>> >> - int id; >>> >> - struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; >>> >> -}; >>> >> - >>> >> -struct hvf_state { >>> >> - uint32_t version; >>> >> - struct hvf_vm *vm; >>> >> - uint64_t mem_quota; >>> >> -}; >>> >> - >>> >> -/* hvf_slot flags */ >>> >> -#define HVF_SLOT_LOG (1 << 0) >>> >> - >>> >> -typedef struct hvf_slot { >>> >> - uint64_t start; >>> >> - uint64_t size; >>> >> - uint8_t *mem; >>> >> - int slot_id; >>> >> - uint32_t flags; >>> >> - MemoryRegion *region; >>> >> -} hvf_slot; >>> >> - >>> >> -typedef struct hvf_vcpu_caps { >>> >> - uint64_t vmx_cap_pinbased; >>> >> - uint64_t vmx_cap_procbased; >>> >> - uint64_t vmx_cap_procbased2; >>> >> - uint64_t vmx_cap_entry; >>> >> - uint64_t vmx_cap_exit; >>> >> - uint64_t vmx_cap_preemption_timer; >>> >> -} hvf_vcpu_caps; >>> >> - >>> >> -struct HVFState { >>> >> - AccelState parent; >>> >> - hvf_slot slots[32]; >>> >> - int num_slots; >>> >> - >>> >> - hvf_vcpu_caps *hvf_caps; >>> >> -}; >>> >> -extern HVFState *hvf_state; >>> >> - >>> >> -void hvf_set_phys_mem(MemoryRegionSection *, bool); >>> >> void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int); >>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); >>> >> >>> >> #ifdef NEED_CPU_H >>> >> /* Functions exported to host specific mode */ >>> >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c >>> >> index ed9356565c..8b96ecd619 100644 >>> >> --- a/target/i386/hvf/hvf.c >>> >> +++ b/target/i386/hvf/hvf.c >>> >> @@ -51,6 +51,7 @@ >>> >> #include "qemu/error-report.h" >>> >> >>> >> #include "sysemu/hvf.h" >>> >> +#include "sysemu/hvf_int.h" >>> >> #include "sysemu/runstate.h" >>> >> #include "hvf-i386.h" >>> >> #include "vmcs.h" >>> >> @@ -72,171 +73,6 @@ >>> >> #include "sysemu/accel.h" >>> >> #include "target/i386/cpu.h" >>> >> >>> >> -#include "hvf-cpus.h" >>> >> - >>> >> -HVFState *hvf_state; >>> >> - >>> >> -static void assert_hvf_ok(hv_return_t ret) >>> >> -{ >>> >> - if (ret == HV_SUCCESS) { >>> >> - return; >>> >> - } >>> >> - >>> >> - switch (ret) { >>> >> - case HV_ERROR: >>> >> - error_report("Error: HV_ERROR"); >>> >> - break; >>> >> - case HV_BUSY: >>> >> - error_report("Error: HV_BUSY"); >>> >> - break; >>> >> - case HV_BAD_ARGUMENT: >>> >> - error_report("Error: HV_BAD_ARGUMENT"); >>> >> - break; >>> >> - case HV_NO_RESOURCES: >>> >> - error_report("Error: HV_NO_RESOURCES"); >>> >> - break; >>> >> - case HV_NO_DEVICE: >>> >> - error_report("Error: HV_NO_DEVICE"); >>> >> - break; >>> >> - 
case HV_UNSUPPORTED: >>> >> - error_report("Error: HV_UNSUPPORTED"); >>> >> - break; >>> >> - default: >>> >> - error_report("Unknown Error"); >>> >> - } >>> >> - >>> >> - abort(); >>> >> -} >>> >> - >>> >> -/* Memory slots */ >>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) >>> >> -{ >>> >> - hvf_slot *slot; >>> >> - int x; >>> >> - for (x = 0; x < hvf_state->num_slots; ++x) { >>> >> - slot = &hvf_state->slots[x]; >>> >> - if (slot->size && start < (slot->start + slot->size) && >>> >> - (start + size) > slot->start) { >>> >> - return slot; >>> >> - } >>> >> - } >>> >> - return NULL; >>> >> -} >>> >> - >>> >> -struct mac_slot { >>> >> - int present; >>> >> - uint64_t size; >>> >> - uint64_t gpa_start; >>> >> - uint64_t gva; >>> >> -}; >>> >> - >>> >> -struct mac_slot mac_slots[32]; >>> >> - >>> >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) >>> >> -{ >>> >> - struct mac_slot *macslot; >>> >> - hv_return_t ret; >>> >> - >>> >> - macslot = &mac_slots[slot->slot_id]; >>> >> - >>> >> - if (macslot->present) { >>> >> - if (macslot->size != slot->size) { >>> >> - macslot->present = 0; >>> >> - ret = hv_vm_unmap(macslot->gpa_start, macslot->size); >>> >> - assert_hvf_ok(ret); >>> >> - } >>> >> - } >>> >> - >>> >> - if (!slot->size) { >>> >> - return 0; >>> >> - } >>> >> - >>> >> - macslot->present = 1; >>> >> - macslot->gpa_start = slot->start; >>> >> - macslot->size = slot->size; >>> >> - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, >>> flags); >>> >> - assert_hvf_ok(ret); >>> >> - return 0; >>> >> -} >>> >> - >>> >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) >>> >> -{ >>> >> - hvf_slot *mem; >>> >> - MemoryRegion *area = section->mr; >>> >> - bool writeable = !area->readonly && !area->rom_device; >>> >> - hv_memory_flags_t flags; >>> >> - >>> >> - if (!memory_region_is_ram(area)) { >>> >> - if (writeable) { >>> >> - return; >>> >> - } else if (!memory_region_is_romd(area)) { >>> >> - /* >>> >> - * If the memory device is not in romd_mode, then we >>> actually want >>> >> - * to remove the hvf memory slot so all accesses will >>> trap. >>> >> - */ >>> >> - add = false; >>> >> - } >>> >> - } >>> >> - >>> >> - mem = hvf_find_overlap_slot( >>> >> - section->offset_within_address_space, >>> >> - int128_get64(section->size)); >>> >> - >>> >> - if (mem && add) { >>> >> - if (mem->size == int128_get64(section->size) && >>> >> - mem->start == section->offset_within_address_space && >>> >> - mem->mem == (memory_region_get_ram_ptr(area) + >>> >> - section->offset_within_region)) { >>> >> - return; /* Same region was attempted to register, go >>> away. */ >>> >> - } >>> >> - } >>> >> - >>> >> - /* Region needs to be reset. set the size to 0 and remap it. */ >>> >> - if (mem) { >>> >> - mem->size = 0; >>> >> - if (do_hvf_set_memory(mem, 0)) { >>> >> - error_report("Failed to reset overlapping slot"); >>> >> - abort(); >>> >> - } >>> >> - } >>> >> - >>> >> - if (!add) { >>> >> - return; >>> >> - } >>> >> - >>> >> - if (area->readonly || >>> >> - (!memory_region_is_ram(area) && >>> memory_region_is_romd(area))) { >>> >> - flags = HV_MEMORY_READ | HV_MEMORY_EXEC; >>> >> - } else { >>> >> - flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; >>> >> - } >>> >> - >>> >> - /* Now make a new slot. 
*/ >>> >> - int x; >>> >> - >>> >> - for (x = 0; x < hvf_state->num_slots; ++x) { >>> >> - mem = &hvf_state->slots[x]; >>> >> - if (!mem->size) { >>> >> - break; >>> >> - } >>> >> - } >>> >> - >>> >> - if (x == hvf_state->num_slots) { >>> >> - error_report("No free slots"); >>> >> - abort(); >>> >> - } >>> >> - >>> >> - mem->size = int128_get64(section->size); >>> >> - mem->mem = memory_region_get_ram_ptr(area) + >>> section->offset_within_region; >>> >> - mem->start = section->offset_within_address_space; >>> >> - mem->region = area; >>> >> - >>> >> - if (do_hvf_set_memory(mem, flags)) { >>> >> - error_report("Error registering new memory slot"); >>> >> - abort(); >>> >> - } >>> >> -} >>> >> - >>> >> void vmx_update_tpr(CPUState *cpu) >>> >> { >>> >> /* TODO: need integrate APIC handling */ >>> >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t >>> port, void *buffer, >>> >> } >>> >> } >>> >> >>> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, >>> run_on_cpu_data arg) >>> >> -{ >>> >> - if (!cpu->vcpu_dirty) { >>> >> - hvf_get_registers(cpu); >>> >> - cpu->vcpu_dirty = true; >>> >> - } >>> >> -} >>> >> - >>> >> -void hvf_cpu_synchronize_state(CPUState *cpu) >>> >> -{ >>> >> - if (!cpu->vcpu_dirty) { >>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_state, >>> RUN_ON_CPU_NULL); >>> >> - } >>> >> -} >>> >> - >>> >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, >>> >> - run_on_cpu_data arg) >>> >> -{ >>> >> - hvf_put_registers(cpu); >>> >> - cpu->vcpu_dirty = false; >>> >> -} >>> >> - >>> >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu) >>> >> -{ >>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, >>> RUN_ON_CPU_NULL); >>> >> -} >>> >> - >>> >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, >>> >> - run_on_cpu_data arg) >>> >> -{ >>> >> - hvf_put_registers(cpu); >>> >> - cpu->vcpu_dirty = false; >>> >> -} >>> >> - >>> >> -void hvf_cpu_synchronize_post_init(CPUState *cpu) >>> >> -{ >>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, >>> RUN_ON_CPU_NULL); >>> >> -} >>> >> - >>> >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, >>> >> - run_on_cpu_data arg) >>> >> -{ >>> >> - cpu->vcpu_dirty = true; >>> >> -} >>> >> - >>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) >>> >> -{ >>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, >>> RUN_ON_CPU_NULL); >>> >> -} >>> >> - >>> >> static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, >>> uint64_t ept_qual) >>> >> { >>> >> int read, write; >>> >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot >>> *slot, uint64_t gpa, uint64_t ept_qual) >>> >> return false; >>> >> } >>> >> >>> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, >>> bool on) >>> >> -{ >>> >> - hvf_slot *slot; >>> >> - >>> >> - slot = hvf_find_overlap_slot( >>> >> - section->offset_within_address_space, >>> >> - int128_get64(section->size)); >>> >> - >>> >> - /* protect region against writes; begin tracking it */ >>> >> - if (on) { >>> >> - slot->flags |= HVF_SLOT_LOG; >>> >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >>> >> - HV_MEMORY_READ); >>> >> - /* stop tracking region*/ >>> >> - } else { >>> >> - slot->flags &= ~HVF_SLOT_LOG; >>> >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >>> >> - HV_MEMORY_READ | HV_MEMORY_WRITE); >>> >> - } >>> >> -} >>> >> - >>> >> -static void hvf_log_start(MemoryListener *listener, >>> >> - MemoryRegionSection *section, int old, int >>> new) >>> >> -{ >>> >> - if 
(old != 0) { >>> >> - return; >>> >> - } >>> >> - >>> >> - hvf_set_dirty_tracking(section, 1); >>> >> -} >>> >> - >>> >> -static void hvf_log_stop(MemoryListener *listener, >>> >> - MemoryRegionSection *section, int old, int >>> new) >>> >> -{ >>> >> - if (new != 0) { >>> >> - return; >>> >> - } >>> >> - >>> >> - hvf_set_dirty_tracking(section, 0); >>> >> -} >>> >> - >>> >> -static void hvf_log_sync(MemoryListener *listener, >>> >> - MemoryRegionSection *section) >>> >> -{ >>> >> - /* >>> >> - * sync of dirty pages is handled elsewhere; just make sure we >>> keep >>> >> - * tracking the region. >>> >> - */ >>> >> - hvf_set_dirty_tracking(section, 1); >>> >> -} >>> >> - >>> >> -static void hvf_region_add(MemoryListener *listener, >>> >> - MemoryRegionSection *section) >>> >> -{ >>> >> - hvf_set_phys_mem(section, true); >>> >> -} >>> >> - >>> >> -static void hvf_region_del(MemoryListener *listener, >>> >> - MemoryRegionSection *section) >>> >> -{ >>> >> - hvf_set_phys_mem(section, false); >>> >> -} >>> >> - >>> >> -static MemoryListener hvf_memory_listener = { >>> >> - .priority = 10, >>> >> - .region_add = hvf_region_add, >>> >> - .region_del = hvf_region_del, >>> >> - .log_start = hvf_log_start, >>> >> - .log_stop = hvf_log_stop, >>> >> - .log_sync = hvf_log_sync, >>> >> -}; >>> >> - >>> >> -void hvf_vcpu_destroy(CPUState *cpu) >>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu) >>> >> { >>> >> X86CPU *x86_cpu = X86_CPU(cpu); >>> >> CPUX86State *env = &x86_cpu->env; >>> >> >>> >> - hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd); >>> >> g_free(env->hvf_mmio_buf); >>> >> - assert_hvf_ok(ret); >>> >> -} >>> >> - >>> >> -static void dummy_signal(int sig) >>> >> -{ >>> >> } >>> >> >>> >> -int hvf_init_vcpu(CPUState *cpu) >>> >> +int hvf_arch_init_vcpu(CPUState *cpu) >>> >> { >>> >> >>> >> X86CPU *x86cpu = X86_CPU(cpu); >>> >> CPUX86State *env = &x86cpu->env; >>> >> - int r; >>> >> - >>> >> - /* init cpu signals */ >>> >> - sigset_t set; >>> >> - struct sigaction sigact; >>> >> - >>> >> - memset(&sigact, 0, sizeof(sigact)); >>> >> - sigact.sa_handler = dummy_signal; >>> >> - sigaction(SIG_IPI, &sigact, NULL); >>> >> - >>> >> - pthread_sigmask(SIG_BLOCK, NULL, &set); >>> >> - sigdelset(&set, SIG_IPI); >>> >> >>> >> init_emu(); >>> >> init_decoder(); >>> >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu) >>> >> hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1); >>> >> env->hvf_mmio_buf = g_new(char, 4096); >>> >> >>> >> - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); >>> >> - cpu->vcpu_dirty = 1; >>> >> - assert_hvf_ok(r); >>> >> - >>> >> if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED, >>> >> &hvf_state->hvf_caps->vmx_cap_pinbased)) { >>> >> abort(); >>> >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu) >>> >> >>> >> return ret; >>> >> } >>> >> - >>> >> -bool hvf_allowed; >>> >> - >>> >> -static int hvf_accel_init(MachineState *ms) >>> >> -{ >>> >> - int x; >>> >> - hv_return_t ret; >>> >> - HVFState *s; >>> >> - >>> >> - ret = hv_vm_create(HV_VM_DEFAULT); >>> >> - assert_hvf_ok(ret); >>> >> - >>> >> - s = g_new0(HVFState, 1); >>> >> - >>> >> - s->num_slots = 32; >>> >> - for (x = 0; x < s->num_slots; ++x) { >>> >> - s->slots[x].size = 0; >>> >> - s->slots[x].slot_id = x; >>> >> - } >>> >> - >>> >> - hvf_state = s; >>> >> - memory_listener_register(&hvf_memory_listener, >>> &address_space_memory); >>> >> - cpus_register_accel(&hvf_cpus); >>> >> - return 0; >>> >> -} >>> >> - >>> >> -static void hvf_accel_class_init(ObjectClass *oc, void *data) >>> >> 
-{
>>> >> -    AccelClass *ac = ACCEL_CLASS(oc);
>>> >> -    ac->name = "HVF";
>>> >> -    ac->init_machine = hvf_accel_init;
>>> >> -    ac->allowed = &hvf_allowed;
>>> >> -}
>>> >> -
>>> >> -static const TypeInfo hvf_accel_type = {
>>> >> -    .name = TYPE_HVF_ACCEL,
>>> >> -    .parent = TYPE_ACCEL,
>>> >> -    .class_init = hvf_accel_class_init,
>>> >> -};
>>> >> -
>>> >> -static void hvf_type_init(void)
>>> >> -{
>>> >> -    type_register_static(&hvf_accel_type);
>>> >> -}
>>> >> -
>>> >> -type_init(hvf_type_init);
>>> >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
>>> >> index 409c9a3f14..c8a43717ee 100644
>>> >> --- a/target/i386/hvf/meson.build
>>> >> +++ b/target/i386/hvf/meson.build
>>> >> @@ -1,6 +1,5 @@
>>> >>  i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
>>> >>    'hvf.c',
>>> >> -  'hvf-cpus.c',
>>> >>    'x86.c',
>>> >>    'x86_cpuid.c',
>>> >>    'x86_decode.c',
>>> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
>>> >> index bbec412b6c..89b8e9d87a 100644
>>> >> --- a/target/i386/hvf/x86hvf.c
>>> >> +++ b/target/i386/hvf/x86hvf.c
>>> >> @@ -20,6 +20,9 @@
>>> >>  #include "qemu/osdep.h"
>>> >>
>>> >>  #include "qemu-common.h"
>>> >> +#include "sysemu/hvf.h"
>>> >> +#include "sysemu/hvf_int.h"
>>> >> +#include "sysemu/hw_accel.h"
>>> >>  #include "x86hvf.h"
>>> >>  #include "vmx.h"
>>> >>  #include "vmcs.h"
>>> >> @@ -32,8 +35,6 @@
>>> >>  #include <Hypervisor/hv.h>
>>> >>  #include <Hypervisor/hv_vmx.h>
>>> >>
>>> >> -#include "hvf-cpus.h"
>>> >> -
>>> >>  void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
>>> >>                       SegmentCache *qseg, bool is_tr)
>>> >>  {
>>> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
>>> >>      env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
>>> >>
>>> >>      if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>> >> +        cpu_synchronize_state(cpu_state);
>>> >>          do_cpu_init(cpu);
>>> >>      }
>>> >>
>>> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state)
>>> >>          cpu_state->halted = 0;
>>> >>      }
>>> >>      if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>> >> +        cpu_synchronize_state(cpu_state);
>>> >>          do_cpu_sipi(cpu);
>>> >>      }
>>> >>      if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
>>> >>          cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>> >> +        cpu_synchronize_state(cpu_state);
>>> > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should
>>> > be a separate patch. It follows the cpu/accel cleanups Claudio was doing
>>> > over the summer.
>>>
>>>
>>> The only reason they're in here is that we no longer have access to
>>> the hvf_ functions from the file. I am perfectly happy to rebase the
>>> patch on top of Claudio's if his goes in first. I'm sure it'll be
>>> trivial for him to rebase on top of this too if my series goes in first.
>>>
>>>
>>> >
>>> > Philippe raised the idea that the patch might go ahead of the
>>> > ARM-specific part (which might involve some discussions) and I agree
>>> > with that.
>>> >
>>> > Some sync between Claudio's series (CC'd him) and the patch might be
>>> > needed.
>>>
>>>
>>> I would prefer not to hold back because of the sync. Claudio's cleanup
>>> is trivial enough to adjust for if it gets merged ahead of this.
>>>
>>>
>>> Alex
>>>
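To make the hvf_cpu_*() -> cpu_*() point above concrete: after this patch,
x86hvf.c only calls the generic cpu_synchronize_state() helper, which
reaches the HVF implementation through the CpusAccel vtable that
hvf_accel_init() registers via cpus_register_accel(&hvf_cpus). A rough
sketch of that indirection (paraphrased for illustration; the real code
lives in QEMU's generic cpus layer and may differ in detail):

    /* Generic cpus layer (sketch): one accel vtable registered at init. */
    static const CpusAccel *cpus_accel;

    void cpus_register_accel(const CpusAccel *ca)
    {
        cpus_accel = ca;    /* hvf_accel_init() passes &hvf_cpus here */
    }

    void cpu_synchronize_state(CPUState *cpu)
    {
        /* Dispatches to hvf_cpu_synchronize_state() when HVF is active. */
        if (cpus_accel && cpus_accel->synchronize_state) {
            cpus_accel->synchronize_state(cpu);
        }
    }

This is why the rename does not change behavior: the same HVF function runs,
just behind the accel-neutral entry point.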
On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
>
>
> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
>>
>> Hi Frank,
>>
>> Thanks for the update :). Your previous email nudged me in the right
>> direction. I had previously implemented WFI through the internal timer
>> framework, which performed way worse.
>
> Cool, glad it's helping. Also, Peter found out that the main thing
> keeping us from just using cntpct_el0 on the host directly and comparing
> it with cval is that if we sleep, cval is going to be much < cntpct_el0
> by the sleep time. If we can get either the architecture or macOS to read
> out the sleep time, then we might not need a poll interval either!
>>
>> Along the way, I stumbled over a few issues though. For starters, the
>> signal mask for SIG_IPI was not set correctly, so while pselect() would
>> exit, the signal would never get delivered to the thread! For a fix,
>> check out
>>
>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
>>
>
> Thanks, we'll take a look :)
>
>>
>> Please also have a look at my latest stab at WFI emulation. It doesn't
>> handle WFE (that's only relevant in overcommitted scenarios). But it
>> does handle WFI and even does something similar to hlt polling, albeit
>> not with an adaptive threshold.

Sorry, I'm not subscribed to qemu-devel (I'll subscribe in a bit), so I'll
reply to your patch here. You have:

+    /* Set cpu->hvf->sleeping so that we get a SIG_IPI signal. */
+    cpu->hvf->sleeping = true;
+    smp_mb();
+
+    /* Bail out if we received an IRQ meanwhile */
+    if (cpu->thread_kicked || (cpu->interrupt_request &
+        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
+        cpu->hvf->sleeping = false;
+        break;
+    }
+
+    /* nanosleep returns on signal, so we wake up on kick. */
+    nanosleep(ts, NULL);

and then send the signal conditional on whether sleeping is true, but I
think this is racy. If the signal is sent after sleeping is set to true
but before entering nanosleep, then I think it will be ignored and we will
miss the wakeup. That's why in my implementation I block IPI on the CPU
thread at startup and then use pselect to atomically unblock and begin
sleeping. The signal is sent unconditionally, so there's no need to worry
about races between actually sleeping and the "we think we're sleeping"
state. It may lead to an extra wakeup, but that's better than missing it
entirely.

Peter

>>
>> Also, is there a particular reason you're working on this super
>> interesting and useful code in a random downstream fork of QEMU?
>> Wouldn't it be more helpful to contribute to the upstream code base
>> instead?
>
> We'd actually like to contribute upstream too :) We do want to maintain
> our own downstream though; the Android Emulator codebase needs to work
> solidly on macOS and Windows, which has made keeping up with upstream
> difficult, and staying on a previous version (2.12) with known quirks
> easier. (There's also some Android-related customization relating to the
> Qt UI + a different set of virtual devices, and snapshot support (incl.
> snapshots of graphics devices with OpenGL ES state tracking), which we
> hope to separate into other libraries/processes, but it's not
> insignificant.)
>>
>>
>> Alex
>>
>>
>> On 30.11.20 21:15, Frank Yang wrote:
>>
>> Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT.
>> But the high CPU usage seems to be mitigated by having a poll interval
>> (like KVM does) in handling WFI:
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512501
>>
>> This is loosely inspired by
>> https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766
>> which does seem to specify a poll interval.
>>
>> It would be cool if we could have a lightweight way to enter sleep and
>> restart the vcpus precisely when CVAL passes, though.
>>
>> Frank
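Both halves of this exchange — deriving a host timeout from the distance
between CNTV_CVAL and the current counter, and Peter's race-free pselect()
wakeup — can be combined into one sketch. This is an illustration only:
the helper and field names are made up, SIG_IPI is assumed to be the signal
QEMU uses to kick vCPU threads (SIGUSR1 on POSIX hosts), and a real
implementation would guard the tick-to-nanosecond conversion against
overflow:

    #include <signal.h>
    #include <sys/select.h>
    #include <time.h>
    #include <stdint.h>

    /* Assumption: QEMU kicks vCPU threads with this signal. */
    #define SIG_IPI SIGUSR1

    /*
     * Sleep until the virtual timer would fire or until we are kicked.
     * ipi_unblocked is the thread's signal mask with SIG_IPI removed;
     * SIG_IPI stays blocked everywhere else on this thread.
     */
    static void wfi_sleep(uint64_t cval, uint64_t cntpct, uint64_t cntfrq,
                          const sigset_t *ipi_unblocked)
    {
        struct timespec ts;
        uint64_t ns;

        if (cval <= cntpct) {
            return; /* timer already expired: don't sleep at all */
        }

        /* Convert the remaining timer ticks into a host timeout. */
        ns = (cval - cntpct) * 1000000000ULL / cntfrq;
        ts.tv_sec = ns / 1000000000ULL;
        ts.tv_nsec = ns % 1000000000ULL;

        /*
         * pselect() installs ipi_unblocked only for the duration of the
         * call; a SIG_IPI that is already pending, or that arrives while
         * sleeping, makes it return immediately with EINTR, so the wakeup
         * cannot be lost between the runnability check and the sleep.
         */
        (void)pselect(0, NULL, NULL, NULL, &ts, ipi_unblocked);
    }

With this shape the kicker can send SIG_IPI unconditionally; the worst case
is a spurious early wakeup, exactly as Peter describes above.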
>>
>>
>> On Fri, Nov 27, 2020 at 3:30 PM Frank Yang <lfy@google.com> wrote:
>>>
>>> Hi all,
>>>
>>> +Peter Collingbourne
>>>
>>> I'm a developer on the Android Emulator, which is in a fork of QEMU.
>>>
>>> Peter and I have been working on an HVF Apple Silicon backend with an
>>> eye toward Android guests.
>>>
>>> We have gotten things to basically switch to Android userspace already
>>> (logcat/shell and graphics available at least).
>>>
>>> Our strategy so far has been to import logic from the KVM
>>> implementation and hook into QEMU's software devices that previously
>>> assumed they would only work with TCG or have KVM-specific paths.
>>>
>>> Thanks to Alexander for the tip on the 36-bit address space limitation,
>>> btw; our way of addressing this is to still allow highmem but not to
>>> put the PCI high MMIO so high.
>>>
>>> Also, note we have a sleep/signal-based mechanism to deal with WFx,
>>> which might be worth looking into for Alexander's implementation as
>>> well:
>>>
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551
>>>
>>> Patches so far, FYI:
>>>
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3
>>>
>>> https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a
>>> https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b
>>> https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01
>>> https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228
>>> https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102
>>> https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6
>>>
>>> Peter also noticed that there are extra steps needed on M1s to allow
>>> TCG to work, as it involves JIT:
>>>
>>> https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9
>>>
>>> We'd appreciate any feedback/comments :)
>>>
>>> Best,
>>>
>>> Frank
>>>
>>> On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote:
>>>>
>>>>
>>>> On 27.11.20 21:00, Roman Bolshakov wrote:
>>>> > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
>>>> >> Until now, Hypervisor.framework has only been available on x86_64 systems.
>>>> >> With Apple Silicon shipping now, it extends its reach to aarch64. To
>>>> >> prepare for support for multiple architectures, let's move common code out
>>>> >> into its own accel directory.
>>>> >> >>>> >> Signed-off-by: Alexander Graf <agraf@csgraf.de> >>>> >> --- >>>> >> MAINTAINERS | 9 +- >>>> >> accel/hvf/hvf-all.c | 56 +++++ >>>> >> accel/hvf/hvf-cpus.c | 468 ++++++++++++++++++++++++++++++++++++ >>>> >> accel/hvf/meson.build | 7 + >>>> >> accel/meson.build | 1 + >>>> >> include/sysemu/hvf_int.h | 69 ++++++ >>>> >> target/i386/hvf/hvf-cpus.c | 131 ---------- >>>> >> target/i386/hvf/hvf-cpus.h | 25 -- >>>> >> target/i386/hvf/hvf-i386.h | 48 +--- >>>> >> target/i386/hvf/hvf.c | 360 +-------------------------- >>>> >> target/i386/hvf/meson.build | 1 - >>>> >> target/i386/hvf/x86hvf.c | 11 +- >>>> >> target/i386/hvf/x86hvf.h | 2 - >>>> >> 13 files changed, 619 insertions(+), 569 deletions(-) >>>> >> create mode 100644 accel/hvf/hvf-all.c >>>> >> create mode 100644 accel/hvf/hvf-cpus.c >>>> >> create mode 100644 accel/hvf/meson.build >>>> >> create mode 100644 include/sysemu/hvf_int.h >>>> >> delete mode 100644 target/i386/hvf/hvf-cpus.c >>>> >> delete mode 100644 target/i386/hvf/hvf-cpus.h >>>> >> >>>> >> diff --git a/MAINTAINERS b/MAINTAINERS >>>> >> index 68bc160f41..ca4b6d9279 100644 >>>> >> --- a/MAINTAINERS >>>> >> +++ b/MAINTAINERS >>>> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com> >>>> >> M: Roman Bolshakov <r.bolshakov@yadro.com> >>>> >> W: https://wiki.qemu.org/Features/HVF >>>> >> S: Maintained >>>> >> -F: accel/stubs/hvf-stub.c >>>> > There was a patch for that in the RFC series from Claudio. >>>> >>>> >>>> Yeah, I'm not worried about this hunk :). >>>> >>>> >>>> > >>>> >> F: target/i386/hvf/ >>>> >> + >>>> >> +HVF >>>> >> +M: Cameron Esfahani <dirty@apple.com> >>>> >> +M: Roman Bolshakov <r.bolshakov@yadro.com> >>>> >> +W: https://wiki.qemu.org/Features/HVF >>>> >> +S: Maintained >>>> >> +F: accel/hvf/ >>>> >> F: include/sysemu/hvf.h >>>> >> +F: include/sysemu/hvf_int.h >>>> >> >>>> >> WHPX CPUs >>>> >> M: Sunil Muthuswamy <sunilmut@microsoft.com> >>>> >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c >>>> >> new file mode 100644 >>>> >> index 0000000000..47d77a472a >>>> >> --- /dev/null >>>> >> +++ b/accel/hvf/hvf-all.c >>>> >> @@ -0,0 +1,56 @@ >>>> >> +/* >>>> >> + * QEMU Hypervisor.framework support >>>> >> + * >>>> >> + * This work is licensed under the terms of the GNU GPL, version 2. See >>>> >> + * the COPYING file in the top-level directory. >>>> >> + * >>>> >> + * Contributions after 2012-01-13 are licensed under the terms of the >>>> >> + * GNU GPL, version 2 or (at your option) any later version. 
>>>> >> + */ >>>> >> + >>>> >> +#include "qemu/osdep.h" >>>> >> +#include "qemu-common.h" >>>> >> +#include "qemu/error-report.h" >>>> >> +#include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> +#include "sysemu/runstate.h" >>>> >> + >>>> >> +#include "qemu/main-loop.h" >>>> >> +#include "sysemu/accel.h" >>>> >> + >>>> >> +#include <Hypervisor/Hypervisor.h> >>>> >> + >>>> >> +bool hvf_allowed; >>>> >> +HVFState *hvf_state; >>>> >> + >>>> >> +void assert_hvf_ok(hv_return_t ret) >>>> >> +{ >>>> >> + if (ret == HV_SUCCESS) { >>>> >> + return; >>>> >> + } >>>> >> + >>>> >> + switch (ret) { >>>> >> + case HV_ERROR: >>>> >> + error_report("Error: HV_ERROR"); >>>> >> + break; >>>> >> + case HV_BUSY: >>>> >> + error_report("Error: HV_BUSY"); >>>> >> + break; >>>> >> + case HV_BAD_ARGUMENT: >>>> >> + error_report("Error: HV_BAD_ARGUMENT"); >>>> >> + break; >>>> >> + case HV_NO_RESOURCES: >>>> >> + error_report("Error: HV_NO_RESOURCES"); >>>> >> + break; >>>> >> + case HV_NO_DEVICE: >>>> >> + error_report("Error: HV_NO_DEVICE"); >>>> >> + break; >>>> >> + case HV_UNSUPPORTED: >>>> >> + error_report("Error: HV_UNSUPPORTED"); >>>> >> + break; >>>> >> + default: >>>> >> + error_report("Unknown Error"); >>>> >> + } >>>> >> + >>>> >> + abort(); >>>> >> +} >>>> >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c >>>> >> new file mode 100644 >>>> >> index 0000000000..f9bb5502b7 >>>> >> --- /dev/null >>>> >> +++ b/accel/hvf/hvf-cpus.c >>>> >> @@ -0,0 +1,468 @@ >>>> >> +/* >>>> >> + * Copyright 2008 IBM Corporation >>>> >> + * 2008 Red Hat, Inc. >>>> >> + * Copyright 2011 Intel Corporation >>>> >> + * Copyright 2016 Veertu, Inc. >>>> >> + * Copyright 2017 The Android Open Source Project >>>> >> + * >>>> >> + * QEMU Hypervisor.framework support >>>> >> + * >>>> >> + * This program is free software; you can redistribute it and/or >>>> >> + * modify it under the terms of version 2 of the GNU General Public >>>> >> + * License as published by the Free Software Foundation. >>>> >> + * >>>> >> + * This program is distributed in the hope that it will be useful, >>>> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >>>> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>>> >> + * General Public License for more details. >>>> >> + * >>>> >> + * You should have received a copy of the GNU General Public License >>>> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>. >>>> >> + * >>>> >> + * This file contain code under public domain from the hvdos project: >>>> >> + * https://github.com/mist64/hvdos >>>> >> + * >>>> >> + * Parts Copyright (c) 2011 NetApp, Inc. >>>> >> + * All rights reserved. >>>> >> + * >>>> >> + * Redistribution and use in source and binary forms, with or without >>>> >> + * modification, are permitted provided that the following conditions >>>> >> + * are met: >>>> >> + * 1. Redistributions of source code must retain the above copyright >>>> >> + * notice, this list of conditions and the following disclaimer. >>>> >> + * 2. Redistributions in binary form must reproduce the above copyright >>>> >> + * notice, this list of conditions and the following disclaimer in the >>>> >> + * documentation and/or other materials provided with the distribution. 
>>>> >> + * >>>> >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND >>>> >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE >>>> >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE >>>> >> + * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE >>>> >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL >>>> >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS >>>> >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) >>>> >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT >>>> >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY >>>> >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF >>>> >> + * SUCH DAMAGE. >>>> >> + */ >>>> >> + >>>> >> +#include "qemu/osdep.h" >>>> >> +#include "qemu/error-report.h" >>>> >> +#include "qemu/main-loop.h" >>>> >> +#include "exec/address-spaces.h" >>>> >> +#include "exec/exec-all.h" >>>> >> +#include "sysemu/cpus.h" >>>> >> +#include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> +#include "sysemu/runstate.h" >>>> >> +#include "qemu/guest-random.h" >>>> >> + >>>> >> +#include <Hypervisor/Hypervisor.h> >>>> >> + >>>> >> +/* Memory slots */ >>>> >> + >>>> >> +struct mac_slot { >>>> >> + int present; >>>> >> + uint64_t size; >>>> >> + uint64_t gpa_start; >>>> >> + uint64_t gva; >>>> >> +}; >>>> >> + >>>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) >>>> >> +{ >>>> >> + hvf_slot *slot; >>>> >> + int x; >>>> >> + for (x = 0; x < hvf_state->num_slots; ++x) { >>>> >> + slot = &hvf_state->slots[x]; >>>> >> + if (slot->size && start < (slot->start + slot->size) && >>>> >> + (start + size) > slot->start) { >>>> >> + return slot; >>>> >> + } >>>> >> + } >>>> >> + return NULL; >>>> >> +} >>>> >> + >>>> >> +struct mac_slot mac_slots[32]; >>>> >> + >>>> >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) >>>> >> +{ >>>> >> + struct mac_slot *macslot; >>>> >> + hv_return_t ret; >>>> >> + >>>> >> + macslot = &mac_slots[slot->slot_id]; >>>> >> + >>>> >> + if (macslot->present) { >>>> >> + if (macslot->size != slot->size) { >>>> >> + macslot->present = 0; >>>> >> + ret = hv_vm_unmap(macslot->gpa_start, macslot->size); >>>> >> + assert_hvf_ok(ret); >>>> >> + } >>>> >> + } >>>> >> + >>>> >> + if (!slot->size) { >>>> >> + return 0; >>>> >> + } >>>> >> + >>>> >> + macslot->present = 1; >>>> >> + macslot->gpa_start = slot->start; >>>> >> + macslot->size = slot->size; >>>> >> + ret = hv_vm_map(slot->mem, slot->start, slot->size, flags); >>>> >> + assert_hvf_ok(ret); >>>> >> + return 0; >>>> >> +} >>>> >> + >>>> >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add) >>>> >> +{ >>>> >> + hvf_slot *mem; >>>> >> + MemoryRegion *area = section->mr; >>>> >> + bool writeable = !area->readonly && !area->rom_device; >>>> >> + hv_memory_flags_t flags; >>>> >> + >>>> >> + if (!memory_region_is_ram(area)) { >>>> >> + if (writeable) { >>>> >> + return; >>>> >> + } else if (!memory_region_is_romd(area)) { >>>> >> + /* >>>> >> + * If the memory device is not in romd_mode, then we actually want >>>> >> + * to remove the hvf memory slot so all accesses will trap. 
>>>> >> + */ >>>> >> + add = false; >>>> >> + } >>>> >> + } >>>> >> + >>>> >> + mem = hvf_find_overlap_slot( >>>> >> + section->offset_within_address_space, >>>> >> + int128_get64(section->size)); >>>> >> + >>>> >> + if (mem && add) { >>>> >> + if (mem->size == int128_get64(section->size) && >>>> >> + mem->start == section->offset_within_address_space && >>>> >> + mem->mem == (memory_region_get_ram_ptr(area) + >>>> >> + section->offset_within_region)) { >>>> >> + return; /* Same region was attempted to register, go away. */ >>>> >> + } >>>> >> + } >>>> >> + >>>> >> + /* Region needs to be reset. set the size to 0 and remap it. */ >>>> >> + if (mem) { >>>> >> + mem->size = 0; >>>> >> + if (do_hvf_set_memory(mem, 0)) { >>>> >> + error_report("Failed to reset overlapping slot"); >>>> >> + abort(); >>>> >> + } >>>> >> + } >>>> >> + >>>> >> + if (!add) { >>>> >> + return; >>>> >> + } >>>> >> + >>>> >> + if (area->readonly || >>>> >> + (!memory_region_is_ram(area) && memory_region_is_romd(area))) { >>>> >> + flags = HV_MEMORY_READ | HV_MEMORY_EXEC; >>>> >> + } else { >>>> >> + flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; >>>> >> + } >>>> >> + >>>> >> + /* Now make a new slot. */ >>>> >> + int x; >>>> >> + >>>> >> + for (x = 0; x < hvf_state->num_slots; ++x) { >>>> >> + mem = &hvf_state->slots[x]; >>>> >> + if (!mem->size) { >>>> >> + break; >>>> >> + } >>>> >> + } >>>> >> + >>>> >> + if (x == hvf_state->num_slots) { >>>> >> + error_report("No free slots"); >>>> >> + abort(); >>>> >> + } >>>> >> + >>>> >> + mem->size = int128_get64(section->size); >>>> >> + mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; >>>> >> + mem->start = section->offset_within_address_space; >>>> >> + mem->region = area; >>>> >> + >>>> >> + if (do_hvf_set_memory(mem, flags)) { >>>> >> + error_report("Error registering new memory slot"); >>>> >> + abort(); >>>> >> + } >>>> >> +} >>>> >> + >>>> >> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) >>>> >> +{ >>>> >> + hvf_slot *slot; >>>> >> + >>>> >> + slot = hvf_find_overlap_slot( >>>> >> + section->offset_within_address_space, >>>> >> + int128_get64(section->size)); >>>> >> + >>>> >> + /* protect region against writes; begin tracking it */ >>>> >> + if (on) { >>>> >> + slot->flags |= HVF_SLOT_LOG; >>>> >> + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, >>>> >> + HV_MEMORY_READ); >>>> >> + /* stop tracking region*/ >>>> >> + } else { >>>> >> + slot->flags &= ~HVF_SLOT_LOG; >>>> >> + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, >>>> >> + HV_MEMORY_READ | HV_MEMORY_WRITE); >>>> >> + } >>>> >> +} >>>> >> + >>>> >> +static void hvf_log_start(MemoryListener *listener, >>>> >> + MemoryRegionSection *section, int old, int new) >>>> >> +{ >>>> >> + if (old != 0) { >>>> >> + return; >>>> >> + } >>>> >> + >>>> >> + hvf_set_dirty_tracking(section, 1); >>>> >> +} >>>> >> + >>>> >> +static void hvf_log_stop(MemoryListener *listener, >>>> >> + MemoryRegionSection *section, int old, int new) >>>> >> +{ >>>> >> + if (new != 0) { >>>> >> + return; >>>> >> + } >>>> >> + >>>> >> + hvf_set_dirty_tracking(section, 0); >>>> >> +} >>>> >> + >>>> >> +static void hvf_log_sync(MemoryListener *listener, >>>> >> + MemoryRegionSection *section) >>>> >> +{ >>>> >> + /* >>>> >> + * sync of dirty pages is handled elsewhere; just make sure we keep >>>> >> + * tracking the region. 
>>>> >> + */ >>>> >> + hvf_set_dirty_tracking(section, 1); >>>> >> +} >>>> >> + >>>> >> +static void hvf_region_add(MemoryListener *listener, >>>> >> + MemoryRegionSection *section) >>>> >> +{ >>>> >> + hvf_set_phys_mem(section, true); >>>> >> +} >>>> >> + >>>> >> +static void hvf_region_del(MemoryListener *listener, >>>> >> + MemoryRegionSection *section) >>>> >> +{ >>>> >> + hvf_set_phys_mem(section, false); >>>> >> +} >>>> >> + >>>> >> +static MemoryListener hvf_memory_listener = { >>>> >> + .priority = 10, >>>> >> + .region_add = hvf_region_add, >>>> >> + .region_del = hvf_region_del, >>>> >> + .log_start = hvf_log_start, >>>> >> + .log_stop = hvf_log_stop, >>>> >> + .log_sync = hvf_log_sync, >>>> >> +}; >>>> >> + >>>> >> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) >>>> >> +{ >>>> >> + if (!cpu->vcpu_dirty) { >>>> >> + hvf_get_registers(cpu); >>>> >> + cpu->vcpu_dirty = true; >>>> >> + } >>>> >> +} >>>> >> + >>>> >> +static void hvf_cpu_synchronize_state(CPUState *cpu) >>>> >> +{ >>>> >> + if (!cpu->vcpu_dirty) { >>>> >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); >>>> >> + } >>>> >> +} >>>> >> + >>>> >> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, >>>> >> + run_on_cpu_data arg) >>>> >> +{ >>>> >> + hvf_put_registers(cpu); >>>> >> + cpu->vcpu_dirty = false; >>>> >> +} >>>> >> + >>>> >> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu) >>>> >> +{ >>>> >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); >>>> >> +} >>>> >> + >>>> >> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, >>>> >> + run_on_cpu_data arg) >>>> >> +{ >>>> >> + hvf_put_registers(cpu); >>>> >> + cpu->vcpu_dirty = false; >>>> >> +} >>>> >> + >>>> >> +static void hvf_cpu_synchronize_post_init(CPUState *cpu) >>>> >> +{ >>>> >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); >>>> >> +} >>>> >> + >>>> >> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, >>>> >> + run_on_cpu_data arg) >>>> >> +{ >>>> >> + cpu->vcpu_dirty = true; >>>> >> +} >>>> >> + >>>> >> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) >>>> >> +{ >>>> >> + run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); >>>> >> +} >>>> >> + >>>> >> +static void hvf_vcpu_destroy(CPUState *cpu) >>>> >> +{ >>>> >> + hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd); >>>> >> + assert_hvf_ok(ret); >>>> >> + >>>> >> + hvf_arch_vcpu_destroy(cpu); >>>> >> +} >>>> >> + >>>> >> +static void dummy_signal(int sig) >>>> >> +{ >>>> >> +} >>>> >> + >>>> >> +static int hvf_init_vcpu(CPUState *cpu) >>>> >> +{ >>>> >> + int r; >>>> >> + >>>> >> + /* init cpu signals */ >>>> >> + sigset_t set; >>>> >> + struct sigaction sigact; >>>> >> + >>>> >> + memset(&sigact, 0, sizeof(sigact)); >>>> >> + sigact.sa_handler = dummy_signal; >>>> >> + sigaction(SIG_IPI, &sigact, NULL); >>>> >> + >>>> >> + pthread_sigmask(SIG_BLOCK, NULL, &set); >>>> >> + sigdelset(&set, SIG_IPI); >>>> >> + >>>> >> +#ifdef __aarch64__ >>>> >> + r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL); >>>> >> +#else >>>> >> + r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); >>>> >> +#endif >>>> > I think the first __aarch64__ bit fits better to arm part of the series. >>>> >>>> >>>> Oops. Thanks for catching it! Yes, absolutely. It should be part of the >>>> ARM enablement. 
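For readers skimming the hunk under discussion: the reason the #ifdef is
needed at all is that the two platforms' Hypervisor.framework headers
declare hv_vcpu_create() differently, as the two calls in the patch show.
Roughly (paraphrased from Apple's SDK headers; exact type names may vary by
SDK version):

    /* x86_64 hosts: vCPUs are identified by an integer handle. */
    hv_return_t hv_vcpu_create(hv_vcpuid_t *vcpu, hv_vcpu_flags_t flags);

    /* Apple Silicon hosts: creation also hands back a shared exit-info
     * structure and takes an optional config object (NULL for defaults). */
    hv_return_t hv_vcpu_create(hv_vcpu_t *vcpu, hv_vcpu_exit_t **exit,
                               hv_vcpu_config_t config);

Moving the aarch64 branch into the ARM patch, as agreed above, keeps this
common-code patch a pure refactoring of the existing x86_64 call.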
>>>> >>>> >>>> > >>>> >> + cpu->vcpu_dirty = 1; >>>> >> + assert_hvf_ok(r); >>>> >> + >>>> >> + return hvf_arch_init_vcpu(cpu); >>>> >> +} >>>> >> + >>>> >> +/* >>>> >> + * The HVF-specific vCPU thread function. This one should only run when the host >>>> >> + * CPU supports the VMX "unrestricted guest" feature. >>>> >> + */ >>>> >> +static void *hvf_cpu_thread_fn(void *arg) >>>> >> +{ >>>> >> + CPUState *cpu = arg; >>>> >> + >>>> >> + int r; >>>> >> + >>>> >> + assert(hvf_enabled()); >>>> >> + >>>> >> + rcu_register_thread(); >>>> >> + >>>> >> + qemu_mutex_lock_iothread(); >>>> >> + qemu_thread_get_self(cpu->thread); >>>> >> + >>>> >> + cpu->thread_id = qemu_get_thread_id(); >>>> >> + cpu->can_do_io = 1; >>>> >> + current_cpu = cpu; >>>> >> + >>>> >> + hvf_init_vcpu(cpu); >>>> >> + >>>> >> + /* signal CPU creation */ >>>> >> + cpu_thread_signal_created(cpu); >>>> >> + qemu_guest_random_seed_thread_part2(cpu->random_seed); >>>> >> + >>>> >> + do { >>>> >> + if (cpu_can_run(cpu)) { >>>> >> + r = hvf_vcpu_exec(cpu); >>>> >> + if (r == EXCP_DEBUG) { >>>> >> + cpu_handle_guest_debug(cpu); >>>> >> + } >>>> >> + } >>>> >> + qemu_wait_io_event(cpu); >>>> >> + } while (!cpu->unplug || cpu_can_run(cpu)); >>>> >> + >>>> >> + hvf_vcpu_destroy(cpu); >>>> >> + cpu_thread_signal_destroyed(cpu); >>>> >> + qemu_mutex_unlock_iothread(); >>>> >> + rcu_unregister_thread(); >>>> >> + return NULL; >>>> >> +} >>>> >> + >>>> >> +static void hvf_start_vcpu_thread(CPUState *cpu) >>>> >> +{ >>>> >> + char thread_name[VCPU_THREAD_NAME_SIZE]; >>>> >> + >>>> >> + /* >>>> >> + * HVF currently does not support TCG, and only runs in >>>> >> + * unrestricted-guest mode. >>>> >> + */ >>>> >> + assert(hvf_enabled()); >>>> >> + >>>> >> + cpu->thread = g_malloc0(sizeof(QemuThread)); >>>> >> + cpu->halt_cond = g_malloc0(sizeof(QemuCond)); >>>> >> + qemu_cond_init(cpu->halt_cond); >>>> >> + >>>> >> + snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", >>>> >> + cpu->cpu_index); >>>> >> + qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, >>>> >> + cpu, QEMU_THREAD_JOINABLE); >>>> >> +} >>>> >> + >>>> >> +static const CpusAccel hvf_cpus = { >>>> >> + .create_vcpu_thread = hvf_start_vcpu_thread, >>>> >> + >>>> >> + .synchronize_post_reset = hvf_cpu_synchronize_post_reset, >>>> >> + .synchronize_post_init = hvf_cpu_synchronize_post_init, >>>> >> + .synchronize_state = hvf_cpu_synchronize_state, >>>> >> + .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, >>>> >> +}; >>>> >> + >>>> >> +static int hvf_accel_init(MachineState *ms) >>>> >> +{ >>>> >> + int x; >>>> >> + hv_return_t ret; >>>> >> + HVFState *s; >>>> >> + >>>> >> + ret = hv_vm_create(HV_VM_DEFAULT); >>>> >> + assert_hvf_ok(ret); >>>> >> + >>>> >> + s = g_new0(HVFState, 1); >>>> >> + >>>> >> + s->num_slots = 32; >>>> >> + for (x = 0; x < s->num_slots; ++x) { >>>> >> + s->slots[x].size = 0; >>>> >> + s->slots[x].slot_id = x; >>>> >> + } >>>> >> + >>>> >> + hvf_state = s; >>>> >> + memory_listener_register(&hvf_memory_listener, &address_space_memory); >>>> >> + cpus_register_accel(&hvf_cpus); >>>> >> + return 0; >>>> >> +} >>>> >> + >>>> >> +static void hvf_accel_class_init(ObjectClass *oc, void *data) >>>> >> +{ >>>> >> + AccelClass *ac = ACCEL_CLASS(oc); >>>> >> + ac->name = "HVF"; >>>> >> + ac->init_machine = hvf_accel_init; >>>> >> + ac->allowed = &hvf_allowed; >>>> >> +} >>>> >> + >>>> >> +static const TypeInfo hvf_accel_type = { >>>> >> + .name = TYPE_HVF_ACCEL, >>>> >> + .parent = TYPE_ACCEL, >>>> >> + .class_init = hvf_accel_class_init, 
>>>> >> +}; >>>> >> + >>>> >> +static void hvf_type_init(void) >>>> >> +{ >>>> >> + type_register_static(&hvf_accel_type); >>>> >> +} >>>> >> + >>>> >> +type_init(hvf_type_init); >>>> >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build >>>> >> new file mode 100644 >>>> >> index 0000000000..dfd6b68dc7 >>>> >> --- /dev/null >>>> >> +++ b/accel/hvf/meson.build >>>> >> @@ -0,0 +1,7 @@ >>>> >> +hvf_ss = ss.source_set() >>>> >> +hvf_ss.add(files( >>>> >> + 'hvf-all.c', >>>> >> + 'hvf-cpus.c', >>>> >> +)) >>>> >> + >>>> >> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss) >>>> >> diff --git a/accel/meson.build b/accel/meson.build >>>> >> index b26cca227a..6de12ce5d5 100644 >>>> >> --- a/accel/meson.build >>>> >> +++ b/accel/meson.build >>>> >> @@ -1,5 +1,6 @@ >>>> >> softmmu_ss.add(files('accel.c')) >>>> >> >>>> >> +subdir('hvf') >>>> >> subdir('qtest') >>>> >> subdir('kvm') >>>> >> subdir('tcg') >>>> >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h >>>> >> new file mode 100644 >>>> >> index 0000000000..de9bad23a8 >>>> >> --- /dev/null >>>> >> +++ b/include/sysemu/hvf_int.h >>>> >> @@ -0,0 +1,69 @@ >>>> >> +/* >>>> >> + * QEMU Hypervisor.framework (HVF) support >>>> >> + * >>>> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later. >>>> >> + * See the COPYING file in the top-level directory. >>>> >> + * >>>> >> + */ >>>> >> + >>>> >> +/* header to be included in HVF-specific code */ >>>> >> + >>>> >> +#ifndef HVF_INT_H >>>> >> +#define HVF_INT_H >>>> >> + >>>> >> +#include <Hypervisor/Hypervisor.h> >>>> >> + >>>> >> +#define HVF_MAX_VCPU 0x10 >>>> >> + >>>> >> +extern struct hvf_state hvf_global; >>>> >> + >>>> >> +struct hvf_vm { >>>> >> + int id; >>>> >> + struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; >>>> >> +}; >>>> >> + >>>> >> +struct hvf_state { >>>> >> + uint32_t version; >>>> >> + struct hvf_vm *vm; >>>> >> + uint64_t mem_quota; >>>> >> +}; >>>> >> + >>>> >> +/* hvf_slot flags */ >>>> >> +#define HVF_SLOT_LOG (1 << 0) >>>> >> + >>>> >> +typedef struct hvf_slot { >>>> >> + uint64_t start; >>>> >> + uint64_t size; >>>> >> + uint8_t *mem; >>>> >> + int slot_id; >>>> >> + uint32_t flags; >>>> >> + MemoryRegion *region; >>>> >> +} hvf_slot; >>>> >> + >>>> >> +typedef struct hvf_vcpu_caps { >>>> >> + uint64_t vmx_cap_pinbased; >>>> >> + uint64_t vmx_cap_procbased; >>>> >> + uint64_t vmx_cap_procbased2; >>>> >> + uint64_t vmx_cap_entry; >>>> >> + uint64_t vmx_cap_exit; >>>> >> + uint64_t vmx_cap_preemption_timer; >>>> >> +} hvf_vcpu_caps; >>>> >> + >>>> >> +struct HVFState { >>>> >> + AccelState parent; >>>> >> + hvf_slot slots[32]; >>>> >> + int num_slots; >>>> >> + >>>> >> + hvf_vcpu_caps *hvf_caps; >>>> >> +}; >>>> >> +extern HVFState *hvf_state; >>>> >> + >>>> >> +void assert_hvf_ok(hv_return_t ret); >>>> >> +int hvf_get_registers(CPUState *cpu); >>>> >> +int hvf_put_registers(CPUState *cpu); >>>> >> +int hvf_arch_init_vcpu(CPUState *cpu); >>>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu); >>>> >> +int hvf_vcpu_exec(CPUState *cpu); >>>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); >>>> >> + >>>> >> +#endif >>>> >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c >>>> >> deleted file mode 100644 >>>> >> index 817b3d7452..0000000000 >>>> >> --- a/target/i386/hvf/hvf-cpus.c >>>> >> +++ /dev/null >>>> >> @@ -1,131 +0,0 @@ >>>> >> -/* >>>> >> - * Copyright 2008 IBM Corporation >>>> >> - * 2008 Red Hat, Inc. >>>> >> - * Copyright 2011 Intel Corporation >>>> >> - * Copyright 2016 Veertu, Inc. 
>>>> >> - * Copyright 2017 The Android Open Source Project >>>> >> - * >>>> >> - * QEMU Hypervisor.framework support >>>> >> - * >>>> >> - * This program is free software; you can redistribute it and/or >>>> >> - * modify it under the terms of version 2 of the GNU General Public >>>> >> - * License as published by the Free Software Foundation. >>>> >> - * >>>> >> - * This program is distributed in the hope that it will be useful, >>>> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of >>>> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>>> >> - * General Public License for more details. >>>> >> - * >>>> >> - * You should have received a copy of the GNU General Public License >>>> >> - * along with this program; if not, see <http://www.gnu.org/licenses/>. >>>> >> - * >>>> >> - * This file contain code under public domain from the hvdos project: >>>> >> - * https://github.com/mist64/hvdos >>>> >> - * >>>> >> - * Parts Copyright (c) 2011 NetApp, Inc. >>>> >> - * All rights reserved. >>>> >> - * >>>> >> - * Redistribution and use in source and binary forms, with or without >>>> >> - * modification, are permitted provided that the following conditions >>>> >> - * are met: >>>> >> - * 1. Redistributions of source code must retain the above copyright >>>> >> - * notice, this list of conditions and the following disclaimer. >>>> >> - * 2. Redistributions in binary form must reproduce the above copyright >>>> >> - * notice, this list of conditions and the following disclaimer in the >>>> >> - * documentation and/or other materials provided with the distribution. >>>> >> - * >>>> >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND >>>> >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE >>>> >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE >>>> >> - * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE >>>> >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL >>>> >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS >>>> >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) >>>> >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT >>>> >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY >>>> >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF >>>> >> - * SUCH DAMAGE. >>>> >> - */ >>>> >> - >>>> >> -#include "qemu/osdep.h" >>>> >> -#include "qemu/error-report.h" >>>> >> -#include "qemu/main-loop.h" >>>> >> -#include "sysemu/hvf.h" >>>> >> -#include "sysemu/runstate.h" >>>> >> -#include "target/i386/cpu.h" >>>> >> -#include "qemu/guest-random.h" >>>> >> - >>>> >> -#include "hvf-cpus.h" >>>> >> - >>>> >> -/* >>>> >> - * The HVF-specific vCPU thread function. This one should only run when the host >>>> >> - * CPU supports the VMX "unrestricted guest" feature. 
>>>> >> - */ >>>> >> -static void *hvf_cpu_thread_fn(void *arg) >>>> >> -{ >>>> >> - CPUState *cpu = arg; >>>> >> - >>>> >> - int r; >>>> >> - >>>> >> - assert(hvf_enabled()); >>>> >> - >>>> >> - rcu_register_thread(); >>>> >> - >>>> >> - qemu_mutex_lock_iothread(); >>>> >> - qemu_thread_get_self(cpu->thread); >>>> >> - >>>> >> - cpu->thread_id = qemu_get_thread_id(); >>>> >> - cpu->can_do_io = 1; >>>> >> - current_cpu = cpu; >>>> >> - >>>> >> - hvf_init_vcpu(cpu); >>>> >> - >>>> >> - /* signal CPU creation */ >>>> >> - cpu_thread_signal_created(cpu); >>>> >> - qemu_guest_random_seed_thread_part2(cpu->random_seed); >>>> >> - >>>> >> - do { >>>> >> - if (cpu_can_run(cpu)) { >>>> >> - r = hvf_vcpu_exec(cpu); >>>> >> - if (r == EXCP_DEBUG) { >>>> >> - cpu_handle_guest_debug(cpu); >>>> >> - } >>>> >> - } >>>> >> - qemu_wait_io_event(cpu); >>>> >> - } while (!cpu->unplug || cpu_can_run(cpu)); >>>> >> - >>>> >> - hvf_vcpu_destroy(cpu); >>>> >> - cpu_thread_signal_destroyed(cpu); >>>> >> - qemu_mutex_unlock_iothread(); >>>> >> - rcu_unregister_thread(); >>>> >> - return NULL; >>>> >> -} >>>> >> - >>>> >> -static void hvf_start_vcpu_thread(CPUState *cpu) >>>> >> -{ >>>> >> - char thread_name[VCPU_THREAD_NAME_SIZE]; >>>> >> - >>>> >> - /* >>>> >> - * HVF currently does not support TCG, and only runs in >>>> >> - * unrestricted-guest mode. >>>> >> - */ >>>> >> - assert(hvf_enabled()); >>>> >> - >>>> >> - cpu->thread = g_malloc0(sizeof(QemuThread)); >>>> >> - cpu->halt_cond = g_malloc0(sizeof(QemuCond)); >>>> >> - qemu_cond_init(cpu->halt_cond); >>>> >> - >>>> >> - snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", >>>> >> - cpu->cpu_index); >>>> >> - qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, >>>> >> - cpu, QEMU_THREAD_JOINABLE); >>>> >> -} >>>> >> - >>>> >> -const CpusAccel hvf_cpus = { >>>> >> - .create_vcpu_thread = hvf_start_vcpu_thread, >>>> >> - >>>> >> - .synchronize_post_reset = hvf_cpu_synchronize_post_reset, >>>> >> - .synchronize_post_init = hvf_cpu_synchronize_post_init, >>>> >> - .synchronize_state = hvf_cpu_synchronize_state, >>>> >> - .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, >>>> >> -}; >>>> >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h >>>> >> deleted file mode 100644 >>>> >> index ced31b82c0..0000000000 >>>> >> --- a/target/i386/hvf/hvf-cpus.h >>>> >> +++ /dev/null >>>> >> @@ -1,25 +0,0 @@ >>>> >> -/* >>>> >> - * Accelerator CPUS Interface >>>> >> - * >>>> >> - * Copyright 2020 SUSE LLC >>>> >> - * >>>> >> - * This work is licensed under the terms of the GNU GPL, version 2 or later. >>>> >> - * See the COPYING file in the top-level directory. 
>>>> >> - */ >>>> >> - >>>> >> -#ifndef HVF_CPUS_H >>>> >> -#define HVF_CPUS_H >>>> >> - >>>> >> -#include "sysemu/cpus.h" >>>> >> - >>>> >> -extern const CpusAccel hvf_cpus; >>>> >> - >>>> >> -int hvf_init_vcpu(CPUState *); >>>> >> -int hvf_vcpu_exec(CPUState *); >>>> >> -void hvf_cpu_synchronize_state(CPUState *); >>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *); >>>> >> -void hvf_cpu_synchronize_post_init(CPUState *); >>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *); >>>> >> -void hvf_vcpu_destroy(CPUState *); >>>> >> - >>>> >> -#endif /* HVF_CPUS_H */ >>>> >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h >>>> >> index e0edffd077..6d56f8f6bb 100644 >>>> >> --- a/target/i386/hvf/hvf-i386.h >>>> >> +++ b/target/i386/hvf/hvf-i386.h >>>> >> @@ -18,57 +18,11 @@ >>>> >> >>>> >> #include "sysemu/accel.h" >>>> >> #include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> #include "cpu.h" >>>> >> #include "x86.h" >>>> >> >>>> >> -#define HVF_MAX_VCPU 0x10 >>>> >> - >>>> >> -extern struct hvf_state hvf_global; >>>> >> - >>>> >> -struct hvf_vm { >>>> >> - int id; >>>> >> - struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; >>>> >> -}; >>>> >> - >>>> >> -struct hvf_state { >>>> >> - uint32_t version; >>>> >> - struct hvf_vm *vm; >>>> >> - uint64_t mem_quota; >>>> >> -}; >>>> >> - >>>> >> -/* hvf_slot flags */ >>>> >> -#define HVF_SLOT_LOG (1 << 0) >>>> >> - >>>> >> -typedef struct hvf_slot { >>>> >> - uint64_t start; >>>> >> - uint64_t size; >>>> >> - uint8_t *mem; >>>> >> - int slot_id; >>>> >> - uint32_t flags; >>>> >> - MemoryRegion *region; >>>> >> -} hvf_slot; >>>> >> - >>>> >> -typedef struct hvf_vcpu_caps { >>>> >> - uint64_t vmx_cap_pinbased; >>>> >> - uint64_t vmx_cap_procbased; >>>> >> - uint64_t vmx_cap_procbased2; >>>> >> - uint64_t vmx_cap_entry; >>>> >> - uint64_t vmx_cap_exit; >>>> >> - uint64_t vmx_cap_preemption_timer; >>>> >> -} hvf_vcpu_caps; >>>> >> - >>>> >> -struct HVFState { >>>> >> - AccelState parent; >>>> >> - hvf_slot slots[32]; >>>> >> - int num_slots; >>>> >> - >>>> >> - hvf_vcpu_caps *hvf_caps; >>>> >> -}; >>>> >> -extern HVFState *hvf_state; >>>> >> - >>>> >> -void hvf_set_phys_mem(MemoryRegionSection *, bool); >>>> >> void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int); >>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); >>>> >> >>>> >> #ifdef NEED_CPU_H >>>> >> /* Functions exported to host specific mode */ >>>> >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c >>>> >> index ed9356565c..8b96ecd619 100644 >>>> >> --- a/target/i386/hvf/hvf.c >>>> >> +++ b/target/i386/hvf/hvf.c >>>> >> @@ -51,6 +51,7 @@ >>>> >> #include "qemu/error-report.h" >>>> >> >>>> >> #include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> #include "sysemu/runstate.h" >>>> >> #include "hvf-i386.h" >>>> >> #include "vmcs.h" >>>> >> @@ -72,171 +73,6 @@ >>>> >> #include "sysemu/accel.h" >>>> >> #include "target/i386/cpu.h" >>>> >> >>>> >> -#include "hvf-cpus.h" >>>> >> - >>>> >> -HVFState *hvf_state; >>>> >> - >>>> >> -static void assert_hvf_ok(hv_return_t ret) >>>> >> -{ >>>> >> - if (ret == HV_SUCCESS) { >>>> >> - return; >>>> >> - } >>>> >> - >>>> >> - switch (ret) { >>>> >> - case HV_ERROR: >>>> >> - error_report("Error: HV_ERROR"); >>>> >> - break; >>>> >> - case HV_BUSY: >>>> >> - error_report("Error: HV_BUSY"); >>>> >> - break; >>>> >> - case HV_BAD_ARGUMENT: >>>> >> - error_report("Error: HV_BAD_ARGUMENT"); >>>> >> - break; >>>> >> - case HV_NO_RESOURCES: >>>> >> - error_report("Error: 
HV_NO_RESOURCES"); >>>> >> - break; >>>> >> - case HV_NO_DEVICE: >>>> >> - error_report("Error: HV_NO_DEVICE"); >>>> >> - break; >>>> >> - case HV_UNSUPPORTED: >>>> >> - error_report("Error: HV_UNSUPPORTED"); >>>> >> - break; >>>> >> - default: >>>> >> - error_report("Unknown Error"); >>>> >> - } >>>> >> - >>>> >> - abort(); >>>> >> -} >>>> >> - >>>> >> -/* Memory slots */ >>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) >>>> >> -{ >>>> >> - hvf_slot *slot; >>>> >> - int x; >>>> >> - for (x = 0; x < hvf_state->num_slots; ++x) { >>>> >> - slot = &hvf_state->slots[x]; >>>> >> - if (slot->size && start < (slot->start + slot->size) && >>>> >> - (start + size) > slot->start) { >>>> >> - return slot; >>>> >> - } >>>> >> - } >>>> >> - return NULL; >>>> >> -} >>>> >> - >>>> >> -struct mac_slot { >>>> >> - int present; >>>> >> - uint64_t size; >>>> >> - uint64_t gpa_start; >>>> >> - uint64_t gva; >>>> >> -}; >>>> >> - >>>> >> -struct mac_slot mac_slots[32]; >>>> >> - >>>> >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) >>>> >> -{ >>>> >> - struct mac_slot *macslot; >>>> >> - hv_return_t ret; >>>> >> - >>>> >> - macslot = &mac_slots[slot->slot_id]; >>>> >> - >>>> >> - if (macslot->present) { >>>> >> - if (macslot->size != slot->size) { >>>> >> - macslot->present = 0; >>>> >> - ret = hv_vm_unmap(macslot->gpa_start, macslot->size); >>>> >> - assert_hvf_ok(ret); >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - if (!slot->size) { >>>> >> - return 0; >>>> >> - } >>>> >> - >>>> >> - macslot->present = 1; >>>> >> - macslot->gpa_start = slot->start; >>>> >> - macslot->size = slot->size; >>>> >> - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags); >>>> >> - assert_hvf_ok(ret); >>>> >> - return 0; >>>> >> -} >>>> >> - >>>> >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) >>>> >> -{ >>>> >> - hvf_slot *mem; >>>> >> - MemoryRegion *area = section->mr; >>>> >> - bool writeable = !area->readonly && !area->rom_device; >>>> >> - hv_memory_flags_t flags; >>>> >> - >>>> >> - if (!memory_region_is_ram(area)) { >>>> >> - if (writeable) { >>>> >> - return; >>>> >> - } else if (!memory_region_is_romd(area)) { >>>> >> - /* >>>> >> - * If the memory device is not in romd_mode, then we actually want >>>> >> - * to remove the hvf memory slot so all accesses will trap. >>>> >> - */ >>>> >> - add = false; >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - mem = hvf_find_overlap_slot( >>>> >> - section->offset_within_address_space, >>>> >> - int128_get64(section->size)); >>>> >> - >>>> >> - if (mem && add) { >>>> >> - if (mem->size == int128_get64(section->size) && >>>> >> - mem->start == section->offset_within_address_space && >>>> >> - mem->mem == (memory_region_get_ram_ptr(area) + >>>> >> - section->offset_within_region)) { >>>> >> - return; /* Same region was attempted to register, go away. */ >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - /* Region needs to be reset. set the size to 0 and remap it. 
*/ >>>> >> - if (mem) { >>>> >> - mem->size = 0; >>>> >> - if (do_hvf_set_memory(mem, 0)) { >>>> >> - error_report("Failed to reset overlapping slot"); >>>> >> - abort(); >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - if (!add) { >>>> >> - return; >>>> >> - } >>>> >> - >>>> >> - if (area->readonly || >>>> >> - (!memory_region_is_ram(area) && memory_region_is_romd(area))) { >>>> >> - flags = HV_MEMORY_READ | HV_MEMORY_EXEC; >>>> >> - } else { >>>> >> - flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; >>>> >> - } >>>> >> - >>>> >> - /* Now make a new slot. */ >>>> >> - int x; >>>> >> - >>>> >> - for (x = 0; x < hvf_state->num_slots; ++x) { >>>> >> - mem = &hvf_state->slots[x]; >>>> >> - if (!mem->size) { >>>> >> - break; >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - if (x == hvf_state->num_slots) { >>>> >> - error_report("No free slots"); >>>> >> - abort(); >>>> >> - } >>>> >> - >>>> >> - mem->size = int128_get64(section->size); >>>> >> - mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; >>>> >> - mem->start = section->offset_within_address_space; >>>> >> - mem->region = area; >>>> >> - >>>> >> - if (do_hvf_set_memory(mem, flags)) { >>>> >> - error_report("Error registering new memory slot"); >>>> >> - abort(); >>>> >> - } >>>> >> -} >>>> >> - >>>> >> void vmx_update_tpr(CPUState *cpu) >>>> >> { >>>> >> /* TODO: need integrate APIC handling */ >>>> >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer, >>>> >> } >>>> >> } >>>> >> >>>> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) >>>> >> -{ >>>> >> - if (!cpu->vcpu_dirty) { >>>> >> - hvf_get_registers(cpu); >>>> >> - cpu->vcpu_dirty = true; >>>> >> - } >>>> >> -} >>>> >> - >>>> >> -void hvf_cpu_synchronize_state(CPUState *cpu) >>>> >> -{ >>>> >> - if (!cpu->vcpu_dirty) { >>>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); >>>> >> - } >>>> >> -} >>>> >> - >>>> >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, >>>> >> - run_on_cpu_data arg) >>>> >> -{ >>>> >> - hvf_put_registers(cpu); >>>> >> - cpu->vcpu_dirty = false; >>>> >> -} >>>> >> - >>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu) >>>> >> -{ >>>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); >>>> >> -} >>>> >> - >>>> >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, >>>> >> - run_on_cpu_data arg) >>>> >> -{ >>>> >> - hvf_put_registers(cpu); >>>> >> - cpu->vcpu_dirty = false; >>>> >> -} >>>> >> - >>>> >> -void hvf_cpu_synchronize_post_init(CPUState *cpu) >>>> >> -{ >>>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); >>>> >> -} >>>> >> - >>>> >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, >>>> >> - run_on_cpu_data arg) >>>> >> -{ >>>> >> - cpu->vcpu_dirty = true; >>>> >> -} >>>> >> - >>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) >>>> >> -{ >>>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); >>>> >> -} >>>> >> - >>>> >> static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) >>>> >> { >>>> >> int read, write; >>>> >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) >>>> >> return false; >>>> >> } >>>> >> >>>> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) >>>> >> -{ >>>> >> - hvf_slot *slot; >>>> >> - >>>> >> - slot = hvf_find_overlap_slot( >>>> >> - section->offset_within_address_space, >>>> >> - 
int128_get64(section->size)); >>>> >> - >>>> >> - /* protect region against writes; begin tracking it */ >>>> >> - if (on) { >>>> >> - slot->flags |= HVF_SLOT_LOG; >>>> >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >>>> >> - HV_MEMORY_READ); >>>> >> - /* stop tracking region*/ >>>> >> - } else { >>>> >> - slot->flags &= ~HVF_SLOT_LOG; >>>> >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >>>> >> - HV_MEMORY_READ | HV_MEMORY_WRITE); >>>> >> - } >>>> >> -} >>>> >> - >>>> >> -static void hvf_log_start(MemoryListener *listener, >>>> >> - MemoryRegionSection *section, int old, int new) >>>> >> -{ >>>> >> - if (old != 0) { >>>> >> - return; >>>> >> - } >>>> >> - >>>> >> - hvf_set_dirty_tracking(section, 1); >>>> >> -} >>>> >> - >>>> >> -static void hvf_log_stop(MemoryListener *listener, >>>> >> - MemoryRegionSection *section, int old, int new) >>>> >> -{ >>>> >> - if (new != 0) { >>>> >> - return; >>>> >> - } >>>> >> - >>>> >> - hvf_set_dirty_tracking(section, 0); >>>> >> -} >>>> >> - >>>> >> -static void hvf_log_sync(MemoryListener *listener, >>>> >> - MemoryRegionSection *section) >>>> >> -{ >>>> >> - /* >>>> >> - * sync of dirty pages is handled elsewhere; just make sure we keep >>>> >> - * tracking the region. >>>> >> - */ >>>> >> - hvf_set_dirty_tracking(section, 1); >>>> >> -} >>>> >> - >>>> >> -static void hvf_region_add(MemoryListener *listener, >>>> >> - MemoryRegionSection *section) >>>> >> -{ >>>> >> - hvf_set_phys_mem(section, true); >>>> >> -} >>>> >> - >>>> >> -static void hvf_region_del(MemoryListener *listener, >>>> >> - MemoryRegionSection *section) >>>> >> -{ >>>> >> - hvf_set_phys_mem(section, false); >>>> >> -} >>>> >> - >>>> >> -static MemoryListener hvf_memory_listener = { >>>> >> - .priority = 10, >>>> >> - .region_add = hvf_region_add, >>>> >> - .region_del = hvf_region_del, >>>> >> - .log_start = hvf_log_start, >>>> >> - .log_stop = hvf_log_stop, >>>> >> - .log_sync = hvf_log_sync, >>>> >> -}; >>>> >> - >>>> >> -void hvf_vcpu_destroy(CPUState *cpu) >>>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu) >>>> >> { >>>> >> X86CPU *x86_cpu = X86_CPU(cpu); >>>> >> CPUX86State *env = &x86_cpu->env; >>>> >> >>>> >> - hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd); >>>> >> g_free(env->hvf_mmio_buf); >>>> >> - assert_hvf_ok(ret); >>>> >> -} >>>> >> - >>>> >> -static void dummy_signal(int sig) >>>> >> -{ >>>> >> } >>>> >> >>>> >> -int hvf_init_vcpu(CPUState *cpu) >>>> >> +int hvf_arch_init_vcpu(CPUState *cpu) >>>> >> { >>>> >> >>>> >> X86CPU *x86cpu = X86_CPU(cpu); >>>> >> CPUX86State *env = &x86cpu->env; >>>> >> - int r; >>>> >> - >>>> >> - /* init cpu signals */ >>>> >> - sigset_t set; >>>> >> - struct sigaction sigact; >>>> >> - >>>> >> - memset(&sigact, 0, sizeof(sigact)); >>>> >> - sigact.sa_handler = dummy_signal; >>>> >> - sigaction(SIG_IPI, &sigact, NULL); >>>> >> - >>>> >> - pthread_sigmask(SIG_BLOCK, NULL, &set); >>>> >> - sigdelset(&set, SIG_IPI); >>>> >> >>>> >> init_emu(); >>>> >> init_decoder(); >>>> >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu) >>>> >> hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1); >>>> >> env->hvf_mmio_buf = g_new(char, 4096); >>>> >> >>>> >> - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); >>>> >> - cpu->vcpu_dirty = 1; >>>> >> - assert_hvf_ok(r); >>>> >> - >>>> >> if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED, >>>> >> &hvf_state->hvf_caps->vmx_cap_pinbased)) { >>>> >> abort(); >>>> >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu) >>>> >> >>>> >> return ret; 
>>>> >> } >>>> >> - >>>> >> -bool hvf_allowed; >>>> >> - >>>> >> -static int hvf_accel_init(MachineState *ms) >>>> >> -{ >>>> >> - int x; >>>> >> - hv_return_t ret; >>>> >> - HVFState *s; >>>> >> - >>>> >> - ret = hv_vm_create(HV_VM_DEFAULT); >>>> >> - assert_hvf_ok(ret); >>>> >> - >>>> >> - s = g_new0(HVFState, 1); >>>> >> - >>>> >> - s->num_slots = 32; >>>> >> - for (x = 0; x < s->num_slots; ++x) { >>>> >> - s->slots[x].size = 0; >>>> >> - s->slots[x].slot_id = x; >>>> >> - } >>>> >> - >>>> >> - hvf_state = s; >>>> >> - memory_listener_register(&hvf_memory_listener, &address_space_memory); >>>> >> - cpus_register_accel(&hvf_cpus); >>>> >> - return 0; >>>> >> -} >>>> >> - >>>> >> -static void hvf_accel_class_init(ObjectClass *oc, void *data) >>>> >> -{ >>>> >> - AccelClass *ac = ACCEL_CLASS(oc); >>>> >> - ac->name = "HVF"; >>>> >> - ac->init_machine = hvf_accel_init; >>>> >> - ac->allowed = &hvf_allowed; >>>> >> -} >>>> >> - >>>> >> -static const TypeInfo hvf_accel_type = { >>>> >> - .name = TYPE_HVF_ACCEL, >>>> >> - .parent = TYPE_ACCEL, >>>> >> - .class_init = hvf_accel_class_init, >>>> >> -}; >>>> >> - >>>> >> -static void hvf_type_init(void) >>>> >> -{ >>>> >> - type_register_static(&hvf_accel_type); >>>> >> -} >>>> >> - >>>> >> -type_init(hvf_type_init); >>>> >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build >>>> >> index 409c9a3f14..c8a43717ee 100644 >>>> >> --- a/target/i386/hvf/meson.build >>>> >> +++ b/target/i386/hvf/meson.build >>>> >> @@ -1,6 +1,5 @@ >>>> >> i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files( >>>> >> 'hvf.c', >>>> >> - 'hvf-cpus.c', >>>> >> 'x86.c', >>>> >> 'x86_cpuid.c', >>>> >> 'x86_decode.c', >>>> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c >>>> >> index bbec412b6c..89b8e9d87a 100644 >>>> >> --- a/target/i386/hvf/x86hvf.c >>>> >> +++ b/target/i386/hvf/x86hvf.c >>>> >> @@ -20,6 +20,9 @@ >>>> >> #include "qemu/osdep.h" >>>> >> >>>> >> #include "qemu-common.h" >>>> >> +#include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> +#include "sysemu/hw_accel.h" >>>> >> #include "x86hvf.h" >>>> >> #include "vmx.h" >>>> >> #include "vmcs.h" >>>> >> @@ -32,8 +35,6 @@ >>>> >> #include <Hypervisor/hv.h> >>>> >> #include <Hypervisor/hv_vmx.h> >>>> >> >>>> >> -#include "hvf-cpus.h" >>>> >> - >>>> >> void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, >>>> >> SegmentCache *qseg, bool is_tr) >>>> >> { >>>> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state) >>>> >> env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); >>>> >> >>>> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { >>>> >> - hvf_cpu_synchronize_state(cpu_state); >>>> >> + cpu_synchronize_state(cpu_state); >>>> >> do_cpu_init(cpu); >>>> >> } >>>> >> >>>> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state) >>>> >> cpu_state->halted = 0; >>>> >> } >>>> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) { >>>> >> - hvf_cpu_synchronize_state(cpu_state); >>>> >> + cpu_synchronize_state(cpu_state); >>>> >> do_cpu_sipi(cpu); >>>> >> } >>>> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) { >>>> >> cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR; >>>> >> - hvf_cpu_synchronize_state(cpu_state); >>>> >> + cpu_synchronize_state(cpu_state); >>>> > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should >>>> > be a separate patch. It follows cpu/accel cleanups Claudio was doing over the >>>> > summer. 
>>>> >>>> >>>> The only reason they're in here is that we no longer have access to >>>> the hvf_ functions from the file. I am perfectly happy to rebase the >>>> patch on top of Claudio's if his goes in first. I'm sure it'll be >>>> trivial for him to rebase on top of this too if my series goes in first. >>>> >>>> >>>> > >>>> > Philippe raised the idea that the patch might go ahead of the ARM-specific >>>> > part (which might involve some discussions) and I agree with that. >>>> > >>>> > Some sync between Claudio's series (CC'd him) and the patch might be needed. >>>> >>>> >>>> I would prefer not to hold back because of the sync. Claudio's cleanup >>>> is trivial enough to adjust for if it gets merged ahead of this. >>>> >>>> >>>> Alex >>>> >>>> >>>>
Hi Peter, On 30.11.20 22:08, Peter Collingbourne wrote: > On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: >> >> >> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: >>> Hi Frank, >>> >>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. >> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! >>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out >>> >>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ >>> >> Thanks, we'll take a look :) >> >>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. > Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so > I'll reply to your patch here. You have: > > + /* Set cpu->hvf->sleeping so that we get a > SIG_IPI signal. */ > + cpu->hvf->sleeping = true; > + smp_mb(); > + > + /* Bail out if we received an IRQ meanwhile */ > + if (cpu->thread_kicked || (cpu->interrupt_request & > + (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) { > + cpu->hvf->sleeping = false; > + break; > + } > + > + /* nanosleep returns on signal, so we wake up on kick. */ > + nanosleep(ts, NULL); > > and then send the signal conditional on whether sleeping is true, but > I think this is racy. If the signal is sent after sleeping is set to > true but before entering nanosleep then I think it will be ignored and > we will miss the wakeup. That's why in my implementation I block IPI > on the CPU thread at startup and then use pselect to atomically > unblock and begin sleeping. The signal is sent unconditionally so > there's no need to worry about races between actually sleeping and the > "we think we're sleeping" state. It may lead to an extra wakeup but > that's better than missing it entirely. Thanks a bunch for the comment! So the trick I was using here is to modify the timespec from the kick function before sending the IPI signal. That way, we know that either we are inside the sleep (where the signal wakes it up) or we are outside the sleep (where timespec={} will make it return immediately). The only race I can think of is if nanosleep does calculations based on the timespec and we happen to send the signal right there and then. The problem with blocking IPIs is basically what Frank was describing earlier: How do you unset the IPI signal pending status? If the signal is never delivered, how can pselect differentiate "signal from last time is still pending" from "new signal because I got an IPI"? Alex
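For readers following the race discussion above, the pselect()-based pattern Peter describes can be sketched roughly as below. This is an illustration only, not code from either series: the helper name is invented, SIG_IPI is QEMU's vCPU kick signal (defined by QEMU, not the system), and the sketch assumes the signal was blocked on the vCPU thread at startup, as Peter describes.

#include <pthread.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>

#ifndef SIG_IPI
#define SIG_IPI SIGUSR1 /* QEMU defines SIG_IPI; SIGUSR1 here is only for illustration */
#endif

/* Hypothetical helper: wait until the timeout expires or a kick arrives. */
static void wait_for_kick_or_timeout(const struct timespec *timeout)
{
    sigset_t waitmask;

    /* Start from the thread's current mask, in which SIG_IPI is blocked, */
    pthread_sigmask(SIG_BLOCK, NULL, &waitmask);
    /* and unblock it only for the duration of the wait. */
    sigdelset(&waitmask, SIG_IPI);

    /*
     * pselect() installs waitmask and starts waiting atomically: a kick
     * sent before this point is still pending and makes the call return
     * immediately; a kick sent during the wait interrupts it. Either way,
     * the wakeup cannot fall into a gap between the check and the sleep.
     */
    pselect(0, NULL, NULL, NULL, timeout, &waitmask);
}

Compared with the timespec trick Alexander describes, which shrinks the window by zeroing the timeout before kicking, the atomic mask swap removes the window entirely; a stale pending kick costs at most one spurious wakeup, after which the loop re-checks cpu->interrupt_request and goes back to sleep.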
On Mon, 30 Nov 2020 at 20:56, Frank Yang <lfy@google.com> wrote: > We'd actually like to contribute upstream too :) We do want to maintain > our own downstream though; Android Emulator codebase needs to work > solidly on macos and windows which has made keeping up with upstream difficult One of the main reasons why OSX and Windows support upstream is not so great is that very few people are helping to develop, test and support it upstream. The way to fix that IMHO is for more people who do care about those platforms to actively engage with us upstream to help in making those platforms move closer to being first-class citizens. If you stay on a downstream fork forever, then I don't think you'll ever see things improve. thanks -- PMM
On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: > > > > On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: >> >> Hi Frank, >> >> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. > > Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! We tracked down the discrepancy between CNTPCT_EL0 on the guest and on the host: CNTPCT_EL0 on the guest does not increment while the system is asleep, and as such it corresponds to mach_absolute_time() on the host (if you read the XNU sources you will see that mach_absolute_time() is implemented as CNTPCT_EL0 plus a constant representing the time spent asleep), while CNTPCT_EL0 on the host does increment while asleep. This patch switches the implementation over to using mach_absolute_time() instead of reading CNTPCT_EL0 directly (a rough sketch of the counter arithmetic appears at the end of this thread): https://android-review.googlesource.com/c/platform/external/qemu/+/1514870 Peter >> >> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out >> >> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ >> > > Thanks, we'll take a look :) > >> >> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. >> >> Also, is there a particular reason you're working on this super interesting and useful code in a random downstream fork of QEMU? Wouldn't it be more helpful to contribute to the upstream code base instead? > > We'd actually like to contribute upstream too :) We do want to maintain our own downstream though; Android Emulator codebase needs to work solidly on macos and windows which has made keeping up with upstream difficult, and staying on a previous version (2.12) with known quirks easier. (there's also some Android-related customization relating to Qt UI + different set of virtual devices and snapshot support (incl. snapshots of graphics devices with OpenGLES state tracking), which we hope to separate into other libraries/processes, but it's not insignificant) >> >> >> Alex >> >> On 30.11.20 21:15, Frank Yang wrote: >> Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT. But the high CPU usage seems to be mitigated by having a poll interval (like KVM does) in handling WFI: >> >> https://android-review.googlesource.com/c/platform/external/qemu/+/1512501 >> >> This is loosely inspired by https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766 which does seem to specify a poll interval. >> >> It would be cool if we could have a lightweight way to enter sleep and restart the vcpus precisely when CVAL passes, though. 
>> >> Frank >> >> On Fri, Nov 27, 2020 at 3:30 PM Frank Yang <lfy@google.com> wrote: >>> >>> Hi all, >>> >>> +Peter Collingbourne >>> >>> I'm a developer on the Android Emulator, which is in a fork of QEMU. >>> >>> Peter and I have been working on an HVF Apple Silicon backend with an eye toward Android guests. >>> >>> We have gotten things to basically switch to Android userspace already (logcat/shell and graphics available at least). >>> >>> Our strategy so far has been to import logic from the KVM implementation and hook into QEMU's software devices that previously were assumed to work only with TCG, or that have KVM-specific paths. >>> >>> Thanks to Alexander for the tip on the 36-bit address space limitation btw; our way of addressing this is to still allow highmem but not put pci high mmio so high. >>> >>> Also, note we have a sleep/signal-based mechanism to deal with WFx, which might be worth looking into in Alexander's implementation as well: >>> >>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551 >>> >>> Patches so far, FYI: >>> >>> https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1 >>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3 >>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3 >>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3 >>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3 >>> >>> https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a >>> https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b >>> https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01 >>> https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228 >>> https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102 >>> https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6 >>> >>> Peter also noticed that there are extra steps needed for M1s to allow TCG to work, as it involves JIT: >>> >>> https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9 >>> >>> We'd appreciate any feedback/comments :) >>> >>> Best, >>> >>> Frank >>> >>> On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote: >>>> >>>> >>>> On 27.11.20 21:00, Roman Bolshakov wrote: >>>> > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote: >>>> >> Until now, Hypervisor.framework has only been available on x86_64 systems. >>>> >> With Apple Silicon shipping now, it extends its reach to aarch64. To >>>> >> prepare for support for multiple architectures, let's move common code out >>>> >> into its own accel directory. 
>>>> >> >>>> >> Signed-off-by: Alexander Graf <agraf@csgraf.de> >>>> >> --- >>>> >> MAINTAINERS | 9 +- >>>> >> accel/hvf/hvf-all.c | 56 +++++ >>>> >> accel/hvf/hvf-cpus.c | 468 ++++++++++++++++++++++++++++++++++++ >>>> >> accel/hvf/meson.build | 7 + >>>> >> accel/meson.build | 1 + >>>> >> include/sysemu/hvf_int.h | 69 ++++++ >>>> >> target/i386/hvf/hvf-cpus.c | 131 ---------- >>>> >> target/i386/hvf/hvf-cpus.h | 25 -- >>>> >> target/i386/hvf/hvf-i386.h | 48 +--- >>>> >> target/i386/hvf/hvf.c | 360 +-------------------------- >>>> >> target/i386/hvf/meson.build | 1 - >>>> >> target/i386/hvf/x86hvf.c | 11 +- >>>> >> target/i386/hvf/x86hvf.h | 2 - >>>> >> 13 files changed, 619 insertions(+), 569 deletions(-) >>>> >> create mode 100644 accel/hvf/hvf-all.c >>>> >> create mode 100644 accel/hvf/hvf-cpus.c >>>> >> create mode 100644 accel/hvf/meson.build >>>> >> create mode 100644 include/sysemu/hvf_int.h >>>> >> delete mode 100644 target/i386/hvf/hvf-cpus.c >>>> >> delete mode 100644 target/i386/hvf/hvf-cpus.h >>>> >> >>>> >> diff --git a/MAINTAINERS b/MAINTAINERS >>>> >> index 68bc160f41..ca4b6d9279 100644 >>>> >> --- a/MAINTAINERS >>>> >> +++ b/MAINTAINERS >>>> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com> >>>> >> M: Roman Bolshakov <r.bolshakov@yadro.com> >>>> >> W: https://wiki.qemu.org/Features/HVF >>>> >> S: Maintained >>>> >> -F: accel/stubs/hvf-stub.c >>>> > There was a patch for that in the RFC series from Claudio. >>>> >>>> >>>> Yeah, I'm not worried about this hunk :). >>>> >>>> >>>> > >>>> >> F: target/i386/hvf/ >>>> >> + >>>> >> +HVF >>>> >> +M: Cameron Esfahani <dirty@apple.com> >>>> >> +M: Roman Bolshakov <r.bolshakov@yadro.com> >>>> >> +W: https://wiki.qemu.org/Features/HVF >>>> >> +S: Maintained >>>> >> +F: accel/hvf/ >>>> >> F: include/sysemu/hvf.h >>>> >> +F: include/sysemu/hvf_int.h >>>> >> >>>> >> WHPX CPUs >>>> >> M: Sunil Muthuswamy <sunilmut@microsoft.com> >>>> >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c >>>> >> new file mode 100644 >>>> >> index 0000000000..47d77a472a >>>> >> --- /dev/null >>>> >> +++ b/accel/hvf/hvf-all.c >>>> >> @@ -0,0 +1,56 @@ >>>> >> +/* >>>> >> + * QEMU Hypervisor.framework support >>>> >> + * >>>> >> + * This work is licensed under the terms of the GNU GPL, version 2. See >>>> >> + * the COPYING file in the top-level directory. >>>> >> + * >>>> >> + * Contributions after 2012-01-13 are licensed under the terms of the >>>> >> + * GNU GPL, version 2 or (at your option) any later version. 
>>>> >> + */ >>>> >> + >>>> >> +#include "qemu/osdep.h" >>>> >> +#include "qemu-common.h" >>>> >> +#include "qemu/error-report.h" >>>> >> +#include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> +#include "sysemu/runstate.h" >>>> >> + >>>> >> +#include "qemu/main-loop.h" >>>> >> +#include "sysemu/accel.h" >>>> >> + >>>> >> +#include <Hypervisor/Hypervisor.h> >>>> >> + >>>> >> +bool hvf_allowed; >>>> >> +HVFState *hvf_state; >>>> >> + >>>> >> +void assert_hvf_ok(hv_return_t ret) >>>> >> +{ >>>> >> + if (ret == HV_SUCCESS) { >>>> >> + return; >>>> >> + } >>>> >> + >>>> >> + switch (ret) { >>>> >> + case HV_ERROR: >>>> >> + error_report("Error: HV_ERROR"); >>>> >> + break; >>>> >> + case HV_BUSY: >>>> >> + error_report("Error: HV_BUSY"); >>>> >> + break; >>>> >> + case HV_BAD_ARGUMENT: >>>> >> + error_report("Error: HV_BAD_ARGUMENT"); >>>> >> + break; >>>> >> + case HV_NO_RESOURCES: >>>> >> + error_report("Error: HV_NO_RESOURCES"); >>>> >> + break; >>>> >> + case HV_NO_DEVICE: >>>> >> + error_report("Error: HV_NO_DEVICE"); >>>> >> + break; >>>> >> + case HV_UNSUPPORTED: >>>> >> + error_report("Error: HV_UNSUPPORTED"); >>>> >> + break; >>>> >> + default: >>>> >> + error_report("Unknown Error"); >>>> >> + } >>>> >> + >>>> >> + abort(); >>>> >> +} >>>> >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c >>>> >> new file mode 100644 >>>> >> index 0000000000..f9bb5502b7 >>>> >> --- /dev/null >>>> >> +++ b/accel/hvf/hvf-cpus.c >>>> >> @@ -0,0 +1,468 @@ >>>> >> +/* >>>> >> + * Copyright 2008 IBM Corporation >>>> >> + * 2008 Red Hat, Inc. >>>> >> + * Copyright 2011 Intel Corporation >>>> >> + * Copyright 2016 Veertu, Inc. >>>> >> + * Copyright 2017 The Android Open Source Project >>>> >> + * >>>> >> + * QEMU Hypervisor.framework support >>>> >> + * >>>> >> + * This program is free software; you can redistribute it and/or >>>> >> + * modify it under the terms of version 2 of the GNU General Public >>>> >> + * License as published by the Free Software Foundation. >>>> >> + * >>>> >> + * This program is distributed in the hope that it will be useful, >>>> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >>>> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>>> >> + * General Public License for more details. >>>> >> + * >>>> >> + * You should have received a copy of the GNU General Public License >>>> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>. >>>> >> + * >>>> >> + * This file contain code under public domain from the hvdos project: >>>> >> + * https://github.com/mist64/hvdos >>>> >> + * >>>> >> + * Parts Copyright (c) 2011 NetApp, Inc. >>>> >> + * All rights reserved. >>>> >> + * >>>> >> + * Redistribution and use in source and binary forms, with or without >>>> >> + * modification, are permitted provided that the following conditions >>>> >> + * are met: >>>> >> + * 1. Redistributions of source code must retain the above copyright >>>> >> + * notice, this list of conditions and the following disclaimer. >>>> >> + * 2. Redistributions in binary form must reproduce the above copyright >>>> >> + * notice, this list of conditions and the following disclaimer in the >>>> >> + * documentation and/or other materials provided with the distribution. 
>>>> >> + * >>>> >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND >>>> >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE >>>> >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE >>>> >> + * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE >>>> >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL >>>> >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS >>>> >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) >>>> >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT >>>> >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY >>>> >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF >>>> >> + * SUCH DAMAGE. >>>> >> + */ >>>> >> + >>>> >> +#include "qemu/osdep.h" >>>> >> +#include "qemu/error-report.h" >>>> >> +#include "qemu/main-loop.h" >>>> >> +#include "exec/address-spaces.h" >>>> >> +#include "exec/exec-all.h" >>>> >> +#include "sysemu/cpus.h" >>>> >> +#include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> +#include "sysemu/runstate.h" >>>> >> +#include "qemu/guest-random.h" >>>> >> + >>>> >> +#include <Hypervisor/Hypervisor.h> >>>> >> + >>>> >> +/* Memory slots */ >>>> >> + >>>> >> +struct mac_slot { >>>> >> + int present; >>>> >> + uint64_t size; >>>> >> + uint64_t gpa_start; >>>> >> + uint64_t gva; >>>> >> +}; >>>> >> + >>>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) >>>> >> +{ >>>> >> + hvf_slot *slot; >>>> >> + int x; >>>> >> + for (x = 0; x < hvf_state->num_slots; ++x) { >>>> >> + slot = &hvf_state->slots[x]; >>>> >> + if (slot->size && start < (slot->start + slot->size) && >>>> >> + (start + size) > slot->start) { >>>> >> + return slot; >>>> >> + } >>>> >> + } >>>> >> + return NULL; >>>> >> +} >>>> >> + >>>> >> +struct mac_slot mac_slots[32]; >>>> >> + >>>> >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) >>>> >> +{ >>>> >> + struct mac_slot *macslot; >>>> >> + hv_return_t ret; >>>> >> + >>>> >> + macslot = &mac_slots[slot->slot_id]; >>>> >> + >>>> >> + if (macslot->present) { >>>> >> + if (macslot->size != slot->size) { >>>> >> + macslot->present = 0; >>>> >> + ret = hv_vm_unmap(macslot->gpa_start, macslot->size); >>>> >> + assert_hvf_ok(ret); >>>> >> + } >>>> >> + } >>>> >> + >>>> >> + if (!slot->size) { >>>> >> + return 0; >>>> >> + } >>>> >> + >>>> >> + macslot->present = 1; >>>> >> + macslot->gpa_start = slot->start; >>>> >> + macslot->size = slot->size; >>>> >> + ret = hv_vm_map(slot->mem, slot->start, slot->size, flags); >>>> >> + assert_hvf_ok(ret); >>>> >> + return 0; >>>> >> +} >>>> >> + >>>> >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add) >>>> >> +{ >>>> >> + hvf_slot *mem; >>>> >> + MemoryRegion *area = section->mr; >>>> >> + bool writeable = !area->readonly && !area->rom_device; >>>> >> + hv_memory_flags_t flags; >>>> >> + >>>> >> + if (!memory_region_is_ram(area)) { >>>> >> + if (writeable) { >>>> >> + return; >>>> >> + } else if (!memory_region_is_romd(area)) { >>>> >> + /* >>>> >> + * If the memory device is not in romd_mode, then we actually want >>>> >> + * to remove the hvf memory slot so all accesses will trap. 
>>>> >> [snip: the remainder of this quoted patch and review repeats, verbatim, the exchange quoted earlier in this thread]
>>>> >> - */ >>>> >> -static void *hvf_cpu_thread_fn(void *arg) >>>> >> -{ >>>> >> - CPUState *cpu = arg; >>>> >> - >>>> >> - int r; >>>> >> - >>>> >> - assert(hvf_enabled()); >>>> >> - >>>> >> - rcu_register_thread(); >>>> >> - >>>> >> - qemu_mutex_lock_iothread(); >>>> >> - qemu_thread_get_self(cpu->thread); >>>> >> - >>>> >> - cpu->thread_id = qemu_get_thread_id(); >>>> >> - cpu->can_do_io = 1; >>>> >> - current_cpu = cpu; >>>> >> - >>>> >> - hvf_init_vcpu(cpu); >>>> >> - >>>> >> - /* signal CPU creation */ >>>> >> - cpu_thread_signal_created(cpu); >>>> >> - qemu_guest_random_seed_thread_part2(cpu->random_seed); >>>> >> - >>>> >> - do { >>>> >> - if (cpu_can_run(cpu)) { >>>> >> - r = hvf_vcpu_exec(cpu); >>>> >> - if (r == EXCP_DEBUG) { >>>> >> - cpu_handle_guest_debug(cpu); >>>> >> - } >>>> >> - } >>>> >> - qemu_wait_io_event(cpu); >>>> >> - } while (!cpu->unplug || cpu_can_run(cpu)); >>>> >> - >>>> >> - hvf_vcpu_destroy(cpu); >>>> >> - cpu_thread_signal_destroyed(cpu); >>>> >> - qemu_mutex_unlock_iothread(); >>>> >> - rcu_unregister_thread(); >>>> >> - return NULL; >>>> >> -} >>>> >> - >>>> >> -static void hvf_start_vcpu_thread(CPUState *cpu) >>>> >> -{ >>>> >> - char thread_name[VCPU_THREAD_NAME_SIZE]; >>>> >> - >>>> >> - /* >>>> >> - * HVF currently does not support TCG, and only runs in >>>> >> - * unrestricted-guest mode. >>>> >> - */ >>>> >> - assert(hvf_enabled()); >>>> >> - >>>> >> - cpu->thread = g_malloc0(sizeof(QemuThread)); >>>> >> - cpu->halt_cond = g_malloc0(sizeof(QemuCond)); >>>> >> - qemu_cond_init(cpu->halt_cond); >>>> >> - >>>> >> - snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", >>>> >> - cpu->cpu_index); >>>> >> - qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, >>>> >> - cpu, QEMU_THREAD_JOINABLE); >>>> >> -} >>>> >> - >>>> >> -const CpusAccel hvf_cpus = { >>>> >> - .create_vcpu_thread = hvf_start_vcpu_thread, >>>> >> - >>>> >> - .synchronize_post_reset = hvf_cpu_synchronize_post_reset, >>>> >> - .synchronize_post_init = hvf_cpu_synchronize_post_init, >>>> >> - .synchronize_state = hvf_cpu_synchronize_state, >>>> >> - .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, >>>> >> -}; >>>> >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h >>>> >> deleted file mode 100644 >>>> >> index ced31b82c0..0000000000 >>>> >> --- a/target/i386/hvf/hvf-cpus.h >>>> >> +++ /dev/null >>>> >> @@ -1,25 +0,0 @@ >>>> >> -/* >>>> >> - * Accelerator CPUS Interface >>>> >> - * >>>> >> - * Copyright 2020 SUSE LLC >>>> >> - * >>>> >> - * This work is licensed under the terms of the GNU GPL, version 2 or later. >>>> >> - * See the COPYING file in the top-level directory. 
>>>> >> - */ >>>> >> - >>>> >> -#ifndef HVF_CPUS_H >>>> >> -#define HVF_CPUS_H >>>> >> - >>>> >> -#include "sysemu/cpus.h" >>>> >> - >>>> >> -extern const CpusAccel hvf_cpus; >>>> >> - >>>> >> -int hvf_init_vcpu(CPUState *); >>>> >> -int hvf_vcpu_exec(CPUState *); >>>> >> -void hvf_cpu_synchronize_state(CPUState *); >>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *); >>>> >> -void hvf_cpu_synchronize_post_init(CPUState *); >>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *); >>>> >> -void hvf_vcpu_destroy(CPUState *); >>>> >> - >>>> >> -#endif /* HVF_CPUS_H */ >>>> >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h >>>> >> index e0edffd077..6d56f8f6bb 100644 >>>> >> --- a/target/i386/hvf/hvf-i386.h >>>> >> +++ b/target/i386/hvf/hvf-i386.h >>>> >> @@ -18,57 +18,11 @@ >>>> >> >>>> >> #include "sysemu/accel.h" >>>> >> #include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> #include "cpu.h" >>>> >> #include "x86.h" >>>> >> >>>> >> -#define HVF_MAX_VCPU 0x10 >>>> >> - >>>> >> -extern struct hvf_state hvf_global; >>>> >> - >>>> >> -struct hvf_vm { >>>> >> - int id; >>>> >> - struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; >>>> >> -}; >>>> >> - >>>> >> -struct hvf_state { >>>> >> - uint32_t version; >>>> >> - struct hvf_vm *vm; >>>> >> - uint64_t mem_quota; >>>> >> -}; >>>> >> - >>>> >> -/* hvf_slot flags */ >>>> >> -#define HVF_SLOT_LOG (1 << 0) >>>> >> - >>>> >> -typedef struct hvf_slot { >>>> >> - uint64_t start; >>>> >> - uint64_t size; >>>> >> - uint8_t *mem; >>>> >> - int slot_id; >>>> >> - uint32_t flags; >>>> >> - MemoryRegion *region; >>>> >> -} hvf_slot; >>>> >> - >>>> >> -typedef struct hvf_vcpu_caps { >>>> >> - uint64_t vmx_cap_pinbased; >>>> >> - uint64_t vmx_cap_procbased; >>>> >> - uint64_t vmx_cap_procbased2; >>>> >> - uint64_t vmx_cap_entry; >>>> >> - uint64_t vmx_cap_exit; >>>> >> - uint64_t vmx_cap_preemption_timer; >>>> >> -} hvf_vcpu_caps; >>>> >> - >>>> >> -struct HVFState { >>>> >> - AccelState parent; >>>> >> - hvf_slot slots[32]; >>>> >> - int num_slots; >>>> >> - >>>> >> - hvf_vcpu_caps *hvf_caps; >>>> >> -}; >>>> >> -extern HVFState *hvf_state; >>>> >> - >>>> >> -void hvf_set_phys_mem(MemoryRegionSection *, bool); >>>> >> void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int); >>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); >>>> >> >>>> >> #ifdef NEED_CPU_H >>>> >> /* Functions exported to host specific mode */ >>>> >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c >>>> >> index ed9356565c..8b96ecd619 100644 >>>> >> --- a/target/i386/hvf/hvf.c >>>> >> +++ b/target/i386/hvf/hvf.c >>>> >> @@ -51,6 +51,7 @@ >>>> >> #include "qemu/error-report.h" >>>> >> >>>> >> #include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> #include "sysemu/runstate.h" >>>> >> #include "hvf-i386.h" >>>> >> #include "vmcs.h" >>>> >> @@ -72,171 +73,6 @@ >>>> >> #include "sysemu/accel.h" >>>> >> #include "target/i386/cpu.h" >>>> >> >>>> >> -#include "hvf-cpus.h" >>>> >> - >>>> >> -HVFState *hvf_state; >>>> >> - >>>> >> -static void assert_hvf_ok(hv_return_t ret) >>>> >> -{ >>>> >> - if (ret == HV_SUCCESS) { >>>> >> - return; >>>> >> - } >>>> >> - >>>> >> - switch (ret) { >>>> >> - case HV_ERROR: >>>> >> - error_report("Error: HV_ERROR"); >>>> >> - break; >>>> >> - case HV_BUSY: >>>> >> - error_report("Error: HV_BUSY"); >>>> >> - break; >>>> >> - case HV_BAD_ARGUMENT: >>>> >> - error_report("Error: HV_BAD_ARGUMENT"); >>>> >> - break; >>>> >> - case HV_NO_RESOURCES: >>>> >> - error_report("Error: 
HV_NO_RESOURCES"); >>>> >> - break; >>>> >> - case HV_NO_DEVICE: >>>> >> - error_report("Error: HV_NO_DEVICE"); >>>> >> - break; >>>> >> - case HV_UNSUPPORTED: >>>> >> - error_report("Error: HV_UNSUPPORTED"); >>>> >> - break; >>>> >> - default: >>>> >> - error_report("Unknown Error"); >>>> >> - } >>>> >> - >>>> >> - abort(); >>>> >> -} >>>> >> - >>>> >> -/* Memory slots */ >>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) >>>> >> -{ >>>> >> - hvf_slot *slot; >>>> >> - int x; >>>> >> - for (x = 0; x < hvf_state->num_slots; ++x) { >>>> >> - slot = &hvf_state->slots[x]; >>>> >> - if (slot->size && start < (slot->start + slot->size) && >>>> >> - (start + size) > slot->start) { >>>> >> - return slot; >>>> >> - } >>>> >> - } >>>> >> - return NULL; >>>> >> -} >>>> >> - >>>> >> -struct mac_slot { >>>> >> - int present; >>>> >> - uint64_t size; >>>> >> - uint64_t gpa_start; >>>> >> - uint64_t gva; >>>> >> -}; >>>> >> - >>>> >> -struct mac_slot mac_slots[32]; >>>> >> - >>>> >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) >>>> >> -{ >>>> >> - struct mac_slot *macslot; >>>> >> - hv_return_t ret; >>>> >> - >>>> >> - macslot = &mac_slots[slot->slot_id]; >>>> >> - >>>> >> - if (macslot->present) { >>>> >> - if (macslot->size != slot->size) { >>>> >> - macslot->present = 0; >>>> >> - ret = hv_vm_unmap(macslot->gpa_start, macslot->size); >>>> >> - assert_hvf_ok(ret); >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - if (!slot->size) { >>>> >> - return 0; >>>> >> - } >>>> >> - >>>> >> - macslot->present = 1; >>>> >> - macslot->gpa_start = slot->start; >>>> >> - macslot->size = slot->size; >>>> >> - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags); >>>> >> - assert_hvf_ok(ret); >>>> >> - return 0; >>>> >> -} >>>> >> - >>>> >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) >>>> >> -{ >>>> >> - hvf_slot *mem; >>>> >> - MemoryRegion *area = section->mr; >>>> >> - bool writeable = !area->readonly && !area->rom_device; >>>> >> - hv_memory_flags_t flags; >>>> >> - >>>> >> - if (!memory_region_is_ram(area)) { >>>> >> - if (writeable) { >>>> >> - return; >>>> >> - } else if (!memory_region_is_romd(area)) { >>>> >> - /* >>>> >> - * If the memory device is not in romd_mode, then we actually want >>>> >> - * to remove the hvf memory slot so all accesses will trap. >>>> >> - */ >>>> >> - add = false; >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - mem = hvf_find_overlap_slot( >>>> >> - section->offset_within_address_space, >>>> >> - int128_get64(section->size)); >>>> >> - >>>> >> - if (mem && add) { >>>> >> - if (mem->size == int128_get64(section->size) && >>>> >> - mem->start == section->offset_within_address_space && >>>> >> - mem->mem == (memory_region_get_ram_ptr(area) + >>>> >> - section->offset_within_region)) { >>>> >> - return; /* Same region was attempted to register, go away. */ >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - /* Region needs to be reset. set the size to 0 and remap it. 
*/ >>>> >> - if (mem) { >>>> >> - mem->size = 0; >>>> >> - if (do_hvf_set_memory(mem, 0)) { >>>> >> - error_report("Failed to reset overlapping slot"); >>>> >> - abort(); >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - if (!add) { >>>> >> - return; >>>> >> - } >>>> >> - >>>> >> - if (area->readonly || >>>> >> - (!memory_region_is_ram(area) && memory_region_is_romd(area))) { >>>> >> - flags = HV_MEMORY_READ | HV_MEMORY_EXEC; >>>> >> - } else { >>>> >> - flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; >>>> >> - } >>>> >> - >>>> >> - /* Now make a new slot. */ >>>> >> - int x; >>>> >> - >>>> >> - for (x = 0; x < hvf_state->num_slots; ++x) { >>>> >> - mem = &hvf_state->slots[x]; >>>> >> - if (!mem->size) { >>>> >> - break; >>>> >> - } >>>> >> - } >>>> >> - >>>> >> - if (x == hvf_state->num_slots) { >>>> >> - error_report("No free slots"); >>>> >> - abort(); >>>> >> - } >>>> >> - >>>> >> - mem->size = int128_get64(section->size); >>>> >> - mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; >>>> >> - mem->start = section->offset_within_address_space; >>>> >> - mem->region = area; >>>> >> - >>>> >> - if (do_hvf_set_memory(mem, flags)) { >>>> >> - error_report("Error registering new memory slot"); >>>> >> - abort(); >>>> >> - } >>>> >> -} >>>> >> - >>>> >> void vmx_update_tpr(CPUState *cpu) >>>> >> { >>>> >> /* TODO: need integrate APIC handling */ >>>> >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer, >>>> >> } >>>> >> } >>>> >> >>>> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) >>>> >> -{ >>>> >> - if (!cpu->vcpu_dirty) { >>>> >> - hvf_get_registers(cpu); >>>> >> - cpu->vcpu_dirty = true; >>>> >> - } >>>> >> -} >>>> >> - >>>> >> -void hvf_cpu_synchronize_state(CPUState *cpu) >>>> >> -{ >>>> >> - if (!cpu->vcpu_dirty) { >>>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); >>>> >> - } >>>> >> -} >>>> >> - >>>> >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, >>>> >> - run_on_cpu_data arg) >>>> >> -{ >>>> >> - hvf_put_registers(cpu); >>>> >> - cpu->vcpu_dirty = false; >>>> >> -} >>>> >> - >>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu) >>>> >> -{ >>>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); >>>> >> -} >>>> >> - >>>> >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, >>>> >> - run_on_cpu_data arg) >>>> >> -{ >>>> >> - hvf_put_registers(cpu); >>>> >> - cpu->vcpu_dirty = false; >>>> >> -} >>>> >> - >>>> >> -void hvf_cpu_synchronize_post_init(CPUState *cpu) >>>> >> -{ >>>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); >>>> >> -} >>>> >> - >>>> >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, >>>> >> - run_on_cpu_data arg) >>>> >> -{ >>>> >> - cpu->vcpu_dirty = true; >>>> >> -} >>>> >> - >>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) >>>> >> -{ >>>> >> - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); >>>> >> -} >>>> >> - >>>> >> static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) >>>> >> { >>>> >> int read, write; >>>> >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) >>>> >> return false; >>>> >> } >>>> >> >>>> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) >>>> >> -{ >>>> >> - hvf_slot *slot; >>>> >> - >>>> >> - slot = hvf_find_overlap_slot( >>>> >> - section->offset_within_address_space, >>>> >> - 
int128_get64(section->size)); >>>> >> - >>>> >> - /* protect region against writes; begin tracking it */ >>>> >> - if (on) { >>>> >> - slot->flags |= HVF_SLOT_LOG; >>>> >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >>>> >> - HV_MEMORY_READ); >>>> >> - /* stop tracking region*/ >>>> >> - } else { >>>> >> - slot->flags &= ~HVF_SLOT_LOG; >>>> >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, >>>> >> - HV_MEMORY_READ | HV_MEMORY_WRITE); >>>> >> - } >>>> >> -} >>>> >> - >>>> >> -static void hvf_log_start(MemoryListener *listener, >>>> >> - MemoryRegionSection *section, int old, int new) >>>> >> -{ >>>> >> - if (old != 0) { >>>> >> - return; >>>> >> - } >>>> >> - >>>> >> - hvf_set_dirty_tracking(section, 1); >>>> >> -} >>>> >> - >>>> >> -static void hvf_log_stop(MemoryListener *listener, >>>> >> - MemoryRegionSection *section, int old, int new) >>>> >> -{ >>>> >> - if (new != 0) { >>>> >> - return; >>>> >> - } >>>> >> - >>>> >> - hvf_set_dirty_tracking(section, 0); >>>> >> -} >>>> >> - >>>> >> -static void hvf_log_sync(MemoryListener *listener, >>>> >> - MemoryRegionSection *section) >>>> >> -{ >>>> >> - /* >>>> >> - * sync of dirty pages is handled elsewhere; just make sure we keep >>>> >> - * tracking the region. >>>> >> - */ >>>> >> - hvf_set_dirty_tracking(section, 1); >>>> >> -} >>>> >> - >>>> >> -static void hvf_region_add(MemoryListener *listener, >>>> >> - MemoryRegionSection *section) >>>> >> -{ >>>> >> - hvf_set_phys_mem(section, true); >>>> >> -} >>>> >> - >>>> >> -static void hvf_region_del(MemoryListener *listener, >>>> >> - MemoryRegionSection *section) >>>> >> -{ >>>> >> - hvf_set_phys_mem(section, false); >>>> >> -} >>>> >> - >>>> >> -static MemoryListener hvf_memory_listener = { >>>> >> - .priority = 10, >>>> >> - .region_add = hvf_region_add, >>>> >> - .region_del = hvf_region_del, >>>> >> - .log_start = hvf_log_start, >>>> >> - .log_stop = hvf_log_stop, >>>> >> - .log_sync = hvf_log_sync, >>>> >> -}; >>>> >> - >>>> >> -void hvf_vcpu_destroy(CPUState *cpu) >>>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu) >>>> >> { >>>> >> X86CPU *x86_cpu = X86_CPU(cpu); >>>> >> CPUX86State *env = &x86_cpu->env; >>>> >> >>>> >> - hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd); >>>> >> g_free(env->hvf_mmio_buf); >>>> >> - assert_hvf_ok(ret); >>>> >> -} >>>> >> - >>>> >> -static void dummy_signal(int sig) >>>> >> -{ >>>> >> } >>>> >> >>>> >> -int hvf_init_vcpu(CPUState *cpu) >>>> >> +int hvf_arch_init_vcpu(CPUState *cpu) >>>> >> { >>>> >> >>>> >> X86CPU *x86cpu = X86_CPU(cpu); >>>> >> CPUX86State *env = &x86cpu->env; >>>> >> - int r; >>>> >> - >>>> >> - /* init cpu signals */ >>>> >> - sigset_t set; >>>> >> - struct sigaction sigact; >>>> >> - >>>> >> - memset(&sigact, 0, sizeof(sigact)); >>>> >> - sigact.sa_handler = dummy_signal; >>>> >> - sigaction(SIG_IPI, &sigact, NULL); >>>> >> - >>>> >> - pthread_sigmask(SIG_BLOCK, NULL, &set); >>>> >> - sigdelset(&set, SIG_IPI); >>>> >> >>>> >> init_emu(); >>>> >> init_decoder(); >>>> >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu) >>>> >> hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1); >>>> >> env->hvf_mmio_buf = g_new(char, 4096); >>>> >> >>>> >> - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); >>>> >> - cpu->vcpu_dirty = 1; >>>> >> - assert_hvf_ok(r); >>>> >> - >>>> >> if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED, >>>> >> &hvf_state->hvf_caps->vmx_cap_pinbased)) { >>>> >> abort(); >>>> >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu) >>>> >> >>>> >> return ret; 
>>>> >> } >>>> >> - >>>> >> -bool hvf_allowed; >>>> >> - >>>> >> -static int hvf_accel_init(MachineState *ms) >>>> >> -{ >>>> >> - int x; >>>> >> - hv_return_t ret; >>>> >> - HVFState *s; >>>> >> - >>>> >> - ret = hv_vm_create(HV_VM_DEFAULT); >>>> >> - assert_hvf_ok(ret); >>>> >> - >>>> >> - s = g_new0(HVFState, 1); >>>> >> - >>>> >> - s->num_slots = 32; >>>> >> - for (x = 0; x < s->num_slots; ++x) { >>>> >> - s->slots[x].size = 0; >>>> >> - s->slots[x].slot_id = x; >>>> >> - } >>>> >> - >>>> >> - hvf_state = s; >>>> >> - memory_listener_register(&hvf_memory_listener, &address_space_memory); >>>> >> - cpus_register_accel(&hvf_cpus); >>>> >> - return 0; >>>> >> -} >>>> >> - >>>> >> -static void hvf_accel_class_init(ObjectClass *oc, void *data) >>>> >> -{ >>>> >> - AccelClass *ac = ACCEL_CLASS(oc); >>>> >> - ac->name = "HVF"; >>>> >> - ac->init_machine = hvf_accel_init; >>>> >> - ac->allowed = &hvf_allowed; >>>> >> -} >>>> >> - >>>> >> -static const TypeInfo hvf_accel_type = { >>>> >> - .name = TYPE_HVF_ACCEL, >>>> >> - .parent = TYPE_ACCEL, >>>> >> - .class_init = hvf_accel_class_init, >>>> >> -}; >>>> >> - >>>> >> -static void hvf_type_init(void) >>>> >> -{ >>>> >> - type_register_static(&hvf_accel_type); >>>> >> -} >>>> >> - >>>> >> -type_init(hvf_type_init); >>>> >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build >>>> >> index 409c9a3f14..c8a43717ee 100644 >>>> >> --- a/target/i386/hvf/meson.build >>>> >> +++ b/target/i386/hvf/meson.build >>>> >> @@ -1,6 +1,5 @@ >>>> >> i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files( >>>> >> 'hvf.c', >>>> >> - 'hvf-cpus.c', >>>> >> 'x86.c', >>>> >> 'x86_cpuid.c', >>>> >> 'x86_decode.c', >>>> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c >>>> >> index bbec412b6c..89b8e9d87a 100644 >>>> >> --- a/target/i386/hvf/x86hvf.c >>>> >> +++ b/target/i386/hvf/x86hvf.c >>>> >> @@ -20,6 +20,9 @@ >>>> >> #include "qemu/osdep.h" >>>> >> >>>> >> #include "qemu-common.h" >>>> >> +#include "sysemu/hvf.h" >>>> >> +#include "sysemu/hvf_int.h" >>>> >> +#include "sysemu/hw_accel.h" >>>> >> #include "x86hvf.h" >>>> >> #include "vmx.h" >>>> >> #include "vmcs.h" >>>> >> @@ -32,8 +35,6 @@ >>>> >> #include <Hypervisor/hv.h> >>>> >> #include <Hypervisor/hv_vmx.h> >>>> >> >>>> >> -#include "hvf-cpus.h" >>>> >> - >>>> >> void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, >>>> >> SegmentCache *qseg, bool is_tr) >>>> >> { >>>> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state) >>>> >> env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); >>>> >> >>>> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { >>>> >> - hvf_cpu_synchronize_state(cpu_state); >>>> >> + cpu_synchronize_state(cpu_state); >>>> >> do_cpu_init(cpu); >>>> >> } >>>> >> >>>> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state) >>>> >> cpu_state->halted = 0; >>>> >> } >>>> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) { >>>> >> - hvf_cpu_synchronize_state(cpu_state); >>>> >> + cpu_synchronize_state(cpu_state); >>>> >> do_cpu_sipi(cpu); >>>> >> } >>>> >> if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) { >>>> >> cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR; >>>> >> - hvf_cpu_synchronize_state(cpu_state); >>>> >> + cpu_synchronize_state(cpu_state); >>>> > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should >>>> > be a separate patch. It follows cpu/accel cleanups Claudio was doing the >>>> > summer. 
>>>> >>>> >>>> The only reason they're in here is because we no longer have access to >>>> the hvf_ functions from the file. I am perfectly happy to rebase the >>>> patch on top of Claudio's if his goes in first. I'm sure it'll be >>>> trivial for him to rebase on top of this too if my series goes in first. >>>> >>>> >>>> > >>>> > Philippe raised the idea that the patch might go ahead of the ARM-specific >>>> > part (which might involve some discussions) and I agree with that. >>>> > >>>> > Some sync between Claudio's series (CC'd him) and the patch might be needed. >>>> >>>> >>>> I would prefer not to hold back because of the sync. Claudio's cleanup >>>> is trivial enough to adjust for if it gets merged ahead of this. >>>> >>>> >>>> Alex
On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote: > > Hi Peter, > > On 30.11.20 22:08, Peter Collingbourne wrote: > > On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: > >> > >> > >> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: > >>> Hi Frank, > >>> > >>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. > >> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! > >>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out > >>> > >>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ > >>> > >> Thanks, we'll take a look :) > >> > >>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. > > Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so > > I'll reply to your patch here. You have: > > > > + /* Set cpu->hvf->sleeping so that we get a > > SIG_IPI signal. */ > > + cpu->hvf->sleeping = true; > > + smp_mb(); > > + > > + /* Bail out if we received an IRQ meanwhile */ > > + if (cpu->thread_kicked || (cpu->interrupt_request & > > + (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) { > > + cpu->hvf->sleeping = false; > > + break; > > + } > > + > > + /* nanosleep returns on signal, so we wake up on kick. */ > > + nanosleep(ts, NULL); > > > > and then send the signal conditional on whether sleeping is true, but > > I think this is racy. If the signal is sent after sleeping is set to > > true but before entering nanosleep then I think it will be ignored and > > we will miss the wakeup. That's why in my implementation I block IPI > > on the CPU thread at startup and then use pselect to atomically > > unblock and begin sleeping. The signal is sent unconditionally so > > there's no need to worry about races between actually sleeping and the > > "we think we're sleeping" state. It may lead to an extra wakeup but > > that's better than missing it entirely. > > > Thanks a bunch for the comment! So the trick I was using here is to > modify the timespec from the kick function before sending the IPI > signal. That way, we know that either we are inside the sleep (where the > signal wakes it up) or we are outside the sleep (where timespec={} will > make it return immediately). > > The only race I can think of is if nanosleep does calculations based on > the timespec and we happen to send the signal right there and then. Yes that's the race I was thinking of. Admittedly it's a small window but it's theoretically possible and part of the reason why pselect was created. > The problem with blocking IPIs is basically what Frank was describing > earlier: How do you unset the IPI signal pending status? 
If the signal > is never delivered, how can pselect differentiate "signal from last time > is still pending" from "new signal because I got an IPI"? In this case we would take the additional wakeup, which should be harmless since we will take the WFx exit again and end up in the correct state. But that's a lot better than busy looping. I reckon that you could improve things a little by unblocking the signal and then reblocking it before unlocking iothread (e.g. with a pselect with zero time interval), which would flush any pending signals. Since any such signal would correspond to a signal from last time (because we still have the iothread lock) we know that any future signals should correspond to new IPIs. Peter
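A minimal sketch (not code from the series) of the pselect()-based wait Peter describes: SIG_IPI, QEMU's vCPU kick signal (an alias for SIGUSR1), stays blocked while the wakeup condition is tested, and pselect() unblocks it and begins sleeping in one atomic step, so a kick sent between the test and the sleep cannot be lost. The helper name wait_for_ipi() is made up for illustration:

#include <pthread.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>

#ifndef SIG_IPI
#define SIG_IPI SIGUSR1   /* QEMU's vCPU kick signal */
#endif

static void wait_for_ipi(const struct timespec *ts)
{
    sigset_t waitmask;

    /* Start from the current mask, which has SIG_IPI blocked. */
    pthread_sigmask(SIG_BLOCK, NULL, &waitmask);
    sigdelset(&waitmask, SIG_IPI);

    /*
     * Atomically install waitmask (SIG_IPI unblocked) and sleep. A
     * SIG_IPI that was already pending, or that arrives during the
     * sleep, is delivered here and makes pselect() return with EINTR;
     * the original mask is restored on return.
     */
    pselect(0, NULL, NULL, NULL, ts, &waitmask);
}

Because the kick is sent unconditionally in this scheme, the worst case is a spurious wakeup, never a missed one.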
On 01.12.20 00:01, Peter Collingbourne wrote: > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote: >> Hi Peter, >> >> On 30.11.20 22:08, Peter Collingbourne wrote: >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: >>>> >>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: >>>>> Hi Frank, >>>>> >>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. >>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! >>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out >>>>> >>>>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ >>>>> >>>> Thanks, we'll take a look :) >>>> >>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so >>> I'll reply to your patch here. You have: >>> >>> + /* Set cpu->hvf->sleeping so that we get a >>> SIG_IPI signal. */ >>> + cpu->hvf->sleeping = true; >>> + smp_mb(); >>> + >>> + /* Bail out if we received an IRQ meanwhile */ >>> + if (cpu->thread_kicked || (cpu->interrupt_request & >>> + (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) { >>> + cpu->hvf->sleeping = false; >>> + break; >>> + } >>> + >>> + /* nanosleep returns on signal, so we wake up on kick. */ >>> + nanosleep(ts, NULL); >>> >>> and then send the signal conditional on whether sleeping is true, but >>> I think this is racy. If the signal is sent after sleeping is set to >>> true but before entering nanosleep then I think it will be ignored and >>> we will miss the wakeup. That's why in my implementation I block IPI >>> on the CPU thread at startup and then use pselect to atomically >>> unblock and begin sleeping. The signal is sent unconditionally so >>> there's no need to worry about races between actually sleeping and the >>> "we think we're sleeping" state. It may lead to an extra wakeup but >>> that's better than missing it entirely. >> >> Thanks a bunch for the comment! So the trick I was using here is to >> modify the timespec from the kick function before sending the IPI >> signal. That way, we know that either we are inside the sleep (where the >> signal wakes it up) or we are outside the sleep (where timespec={} will >> make it return immediately). >> >> The only race I can think of is if nanosleep does calculations based on >> the timespec and we happen to send the signal right there and then. > Yes that's the race I was thinking of. Admittedly it's a small window > but it's theoretically possible and part of the reason why pselect was > created. > >> The problem with blocking IPIs is basically what Frank was describing >> earlier: How do you unset the IPI signal pending status? 
If the signal >> is never delivered, how can pselect differentiate "signal from last time >> is still pending" from "new signal because I got an IPI"? > In this case we would take the additional wakeup which should be > harmless since we will take the WFx exit again and end up in the > correct state. But that's a lot better than busy looping. I'm not sure I follow. I'm thinking of the following scenario:

 - trap into WFI handler
 - go to sleep with blocked SIG_IPI
 - SIG_IPI arrives, pselect() exits
 - signal is still pending because it's blocked
 - enter guest
 - trap into WFI handler
 - run pselect(), but it immediately exits because SIG_IPI is still pending

This was the loop I was seeing when running with SIG_IPI blocked. That's part of the reason why I switched to a different model. > I reckon that you could improve things a little by unblocking the > signal and then reblocking it before unlocking iothread (e.g. with a > pselect with zero time interval), which would flush any pending > signals. Since any such signal would correspond to a signal from last > time (because we still have the iothread lock) we know that any future > signals should correspond to new IPIs. Yeah, I think you actually *have* to do exactly that, because otherwise pselect() will always return after 0ns because the signal is still pending. And yes, I agree that that starts to sound a bit less racy now. But it means we can probably also just do

 - WFI handler
 - block SIG_IPI
 - set hvf->sleeping = true
 - check for pending interrupts
 - pselect()
 - unblock SIG_IPI

which means we run with SIG_IPI unmasked by default. I don't think the number of signal mask changes is any different with that compared to running with SIG_IPI always masked, right? Alex
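A sketch of that ordering, assuming the cpu->hvf->sleeping flag, SIG_IPI and smp_mb() from the patches under discussion; hvf_wfi_wait() itself is a hypothetical helper, not a function from the series. The point is that SIG_IPI is blocked before the interrupt check, so a kick that lands after the check stays pending and makes the pselect() return immediately instead of being lost:

static void hvf_wfi_wait(CPUState *cpu, const struct timespec *ts)
{
    sigset_t ipimask, oldmask;

    sigemptyset(&ipimask);
    sigaddset(&ipimask, SIG_IPI);

    /* SIG_IPI is unmasked by default; block it only around the wait. */
    pthread_sigmask(SIG_BLOCK, &ipimask, &oldmask);

    cpu->hvf->sleeping = true;
    smp_mb();

    /* Bail out if an IRQ or kick snuck in before SIG_IPI was blocked. */
    if (!cpu->thread_kicked &&
        !(cpu->interrupt_request &
          (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
        /* oldmask has SIG_IPI unblocked: unblock and sleep atomically. */
        pselect(0, NULL, NULL, NULL, ts, &oldmask);
    }

    cpu->hvf->sleeping = false;
    smp_mb();
    pthread_sigmask(SIG_SETMASK, &oldmask, NULL);
}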
On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote: > > > On 01.12.20 00:01, Peter Collingbourne wrote: > > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote: > >> Hi Peter, > >> > >> On 30.11.20 22:08, Peter Collingbourne wrote: > >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: > >>>> > >>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: > >>>>> Hi Frank, > >>>>> > >>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. > >>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! > >>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out > >>>>> > >>>>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ > >>>>> > >>>> Thanks, we'll take a look :) > >>>> > >>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. > >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so > >>> I'll reply to your patch here. You have: > >>> > >>> + /* Set cpu->hvf->sleeping so that we get a > >>> SIG_IPI signal. */ > >>> + cpu->hvf->sleeping = true; > >>> + smp_mb(); > >>> + > >>> + /* Bail out if we received an IRQ meanwhile */ > >>> + if (cpu->thread_kicked || (cpu->interrupt_request & > >>> + (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) { > >>> + cpu->hvf->sleeping = false; > >>> + break; > >>> + } > >>> + > >>> + /* nanosleep returns on signal, so we wake up on kick. */ > >>> + nanosleep(ts, NULL); > >>> > >>> and then send the signal conditional on whether sleeping is true, but > >>> I think this is racy. If the signal is sent after sleeping is set to > >>> true but before entering nanosleep then I think it will be ignored and > >>> we will miss the wakeup. That's why in my implementation I block IPI > >>> on the CPU thread at startup and then use pselect to atomically > >>> unblock and begin sleeping. The signal is sent unconditionally so > >>> there's no need to worry about races between actually sleeping and the > >>> "we think we're sleeping" state. It may lead to an extra wakeup but > >>> that's better than missing it entirely. > >> > >> Thanks a bunch for the comment! So the trick I was using here is to > >> modify the timespec from the kick function before sending the IPI > >> signal. That way, we know that either we are inside the sleep (where the > >> signal wakes it up) or we are outside the sleep (where timespec={} will > >> make it return immediately). > >> > >> The only race I can think of is if nanosleep does calculations based on > >> the timespec and we happen to send the signal right there and then. > > Yes that's the race I was thinking of. 
Admittedly it's a small window > > but it's theoretically possible and part of the reason why pselect was > > created. > > > >> The problem with blocking IPIs is basically what Frank was describing > >> earlier: How do you unset the IPI signal pending status? If the signal > >> is never delivered, how can pselect differentiate "signal from last time > >> is still pending" from "new signal because I got an IPI"? > > In this case we would take the additional wakeup which should be > > harmless since we will take the WFx exit again and put us in the > > correct state. But that's a lot better than busy looping. > > > I'm not sure I follow. I'm thinking of the following scenario: > > - trap into WFI handler > - go to sleep with blocked SIG_IPI > - SIG_IPI arrives, pselect() exits > - signal is still pending because it's blocked > - enter guest > - trap into WFI handler > - run pselect(), but it immediate exits because SIG_IPI is still pending > > This was the loop I was seeing when running with SIG_IPI blocked. That's > part of the reason why I switched to a different model. What I observe is that when returning from a pending signal pselect consumes the signal (which is also consistent with my understanding of what pselect does). That means that it doesn't matter if we take a second WFx exit because once we reach the pselect in the second WFx exit the signal will have been consumed by the pselect in the first exit and we will just wait for the next one. I don't know why things may have been going wrong in your implementation but it may be related to the issue with mach_absolute_time() which I posted about separately and was also causing busy loops for us in some cases. Once that issue was fixed in our implementation we started seeing sleep until VTIMER due work properly. > > > > I reckon that you could improve things a little by unblocking the > > signal and then reblocking it before unlocking iothread (e.g. with a > > pselect with zero time interval), which would flush any pending > > signals. Since any such signal would correspond to a signal from last > > time (because we still have the iothread lock) we know that any future > > signals should correspond to new IPIs. > > > Yeah, I think you actually *have* to do exactly that, because otherwise > pselect() will always return after 0ns because the signal is still pending. > > And yes, I agree that that starts to sound a bit less racy now. But it > means we can probably also just do > > - WFI handler > - block SIG_IPI > - set hvf->sleeping = true > - check for pending interrupts > - pselect() > - unblock SIG_IPI > > which means we run with SIG_IPI unmasked by default. I don't think the > number of signal mask changes is any different with that compared to > running with SIG_IPI always masked, right? And unlock/lock iothread around the pselect? I suppose that could work but as I mentioned it would just be an optimization. Maybe I can try to make my approach work on top of your series, or if you already have a patch I can try to debug it. Let me know. Peter
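For reference, the zero-timeout flush discussed above could look something like this sketch (flush_stale_ipi() is a hypothetical name). Run while the iothread lock is still held, it consumes any SIG_IPI left over from an earlier kick, so a later pselect() only wakes up for new IPIs:

static void flush_stale_ipi(void)
{
    struct timespec zero = { 0, 0 };
    sigset_t unblocked;

    pthread_sigmask(SIG_BLOCK, NULL, &unblocked);
    sigdelset(&unblocked, SIG_IPI);

    /*
     * Returns right away; if a stale SIG_IPI was pending it is
     * delivered meanwhile, and the blocked mask is restored on return.
     */
    pselect(0, NULL, NULL, NULL, &zero, &unblocked);
}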
On 01.12.20 01:00, Peter Collingbourne wrote: > On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote: >> >> On 01.12.20 00:01, Peter Collingbourne wrote: >>> On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote: >>>> Hi Peter, >>>> >>>> On 30.11.20 22:08, Peter Collingbourne wrote: >>>>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: >>>>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: >>>>>>> Hi Frank, >>>>>>> >>>>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. >>>>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! >>>>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out >>>>>>> >>>>>>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ >>>>>>> >>>>>> Thanks, we'll take a look :) >>>>>> >>>>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. >>>>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so >>>>> I'll reply to your patch here. You have: >>>>> >>>>> + /* Set cpu->hvf->sleeping so that we get a >>>>> SIG_IPI signal. */ >>>>> + cpu->hvf->sleeping = true; >>>>> + smp_mb(); >>>>> + >>>>> + /* Bail out if we received an IRQ meanwhile */ >>>>> + if (cpu->thread_kicked || (cpu->interrupt_request & >>>>> + (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) { >>>>> + cpu->hvf->sleeping = false; >>>>> + break; >>>>> + } >>>>> + >>>>> + /* nanosleep returns on signal, so we wake up on kick. */ >>>>> + nanosleep(ts, NULL); >>>>> >>>>> and then send the signal conditional on whether sleeping is true, but >>>>> I think this is racy. If the signal is sent after sleeping is set to >>>>> true but before entering nanosleep then I think it will be ignored and >>>>> we will miss the wakeup. That's why in my implementation I block IPI >>>>> on the CPU thread at startup and then use pselect to atomically >>>>> unblock and begin sleeping. The signal is sent unconditionally so >>>>> there's no need to worry about races between actually sleeping and the >>>>> "we think we're sleeping" state. It may lead to an extra wakeup but >>>>> that's better than missing it entirely. >>>> Thanks a bunch for the comment! So the trick I was using here is to >>>> modify the timespec from the kick function before sending the IPI >>>> signal. That way, we know that either we are inside the sleep (where the >>>> signal wakes it up) or we are outside the sleep (where timespec={} will >>>> make it return immediately). >>>> >>>> The only race I can think of is if nanosleep does calculations based on >>>> the timespec and we happen to send the signal right there and then. >>> Yes that's the race I was thinking of. 
Admittedly it's a small window >>> but it's theoretically possible and part of the reason why pselect was >>> created. >>> >>>> The problem with blocking IPIs is basically what Frank was describing >>>> earlier: How do you unset the IPI signal pending status? If the signal >>>> is never delivered, how can pselect differentiate "signal from last time >>>> is still pending" from "new signal because I got an IPI"? >>> In this case we would take the additional wakeup which should be >>> harmless since we will take the WFx exit again and put us in the >>> correct state. But that's a lot better than busy looping. >> >> I'm not sure I follow. I'm thinking of the following scenario: >> >> - trap into WFI handler >> - go to sleep with blocked SIG_IPI >> - SIG_IPI arrives, pselect() exits >> - signal is still pending because it's blocked >> - enter guest >> - trap into WFI handler >> - run pselect(), but it immediate exits because SIG_IPI is still pending >> >> This was the loop I was seeing when running with SIG_IPI blocked. That's >> part of the reason why I switched to a different model. > What I observe is that when returning from a pending signal pselect > consumes the signal (which is also consistent with my understanding of > what pselect does). That means that it doesn't matter if we take a > second WFx exit because once we reach the pselect in the second WFx > exit the signal will have been consumed by the pselect in the first > exit and we will just wait for the next one. > > I don't know why things may have been going wrong in your > implementation but it may be related to the issue with > mach_absolute_time() which I posted about separately and was also > causing busy loops for us in some cases. Once that issue was fixed in > our implementation we started seeing sleep until VTIMER due work > properly. > >> >>> I reckon that you could improve things a little by unblocking the >>> signal and then reblocking it before unlocking iothread (e.g. with a >>> pselect with zero time interval), which would flush any pending >>> signals. Since any such signal would correspond to a signal from last >>> time (because we still have the iothread lock) we know that any future >>> signals should correspond to new IPIs. >> >> Yeah, I think you actually *have* to do exactly that, because otherwise >> pselect() will always return after 0ns because the signal is still pending. >> >> And yes, I agree that that starts to sound a bit less racy now. But it >> means we can probably also just do >> >> - WFI handler >> - block SIG_IPI >> - set hvf->sleeping = true >> - check for pending interrupts >> - pselect() >> - unblock SIG_IPI >> >> which means we run with SIG_IPI unmasked by default. I don't think the >> number of signal mask changes is any different with that compared to >> running with SIG_IPI always masked, right? > And unlock/lock iothread around the pselect? I suppose that could work > but as I mentioned it would just be an optimization. > > Maybe I can try to make my approach work on top of your series, or if > you already have a patch I can try to debug it. Let me know. I would love to take a patch from you here :). I'll still be stuck for a while with the sysreg sync rework that Peter asked for before I can look at WFI again. Alex
On Mon, Nov 30, 2020 at 10:40:49PM +0100, Alexander Graf wrote: > Hi Peter, > > On 30.11.20 22:08, Peter Collingbourne wrote: > > On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: > > > > > > > > > On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: > > > > Hi Frank, > > > > > > > > Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. > > > Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! > > > > Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out > > > > > > > > https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ > > > > > > > Thanks, we'll take a look :) > > > > > > > Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. > > Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so > > I'll reply to your patch here. You have: > > > > + /* Set cpu->hvf->sleeping so that we get a > > SIG_IPI signal. */ > > + cpu->hvf->sleeping = true; > > + smp_mb(); > > + > > + /* Bail out if we received an IRQ meanwhile */ > > + if (cpu->thread_kicked || (cpu->interrupt_request & > > + (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) { > > + cpu->hvf->sleeping = false; > > + break; > > + } > > + > > + /* nanosleep returns on signal, so we wake up on kick. */ > > + nanosleep(ts, NULL); > > > > and then send the signal conditional on whether sleeping is true, but > > I think this is racy. If the signal is sent after sleeping is set to > > true but before entering nanosleep then I think it will be ignored and > > we will miss the wakeup. That's why in my implementation I block IPI > > on the CPU thread at startup and then use pselect to atomically > > unblock and begin sleeping. The signal is sent unconditionally so > > there's no need to worry about races between actually sleeping and the > > "we think we're sleeping" state. It may lead to an extra wakeup but > > that's better than missing it entirely. > > > Thanks a bunch for the comment! So the trick I was using here is to modify > the timespec from the kick function before sending the IPI signal. That way, > we know that either we are inside the sleep (where the signal wakes it up) > or we are outside the sleep (where timespec={} will make it return > immediately). > > The only race I can think of is if nanosleep does calculations based on the > timespec and we happen to send the signal right there and then. > > The problem with blocking IPIs is basically what Frank was describing > earlier: How do you unset the IPI signal pending status? If the signal is > never delivered, how can pselect differentiate "signal from last time is > still pending" from "new signal because I got an IPI"? 
> > Hi Alex, There was a patch for x86 HVF that implements CPU kick and it wasn't merged (mostly because of my laziness). It has some changes like you introduced in the series and VMX-specific handling of the preemption timer to guarantee interrupt delivery without kick loss: https://patchwork.kernel.org/project/qemu-devel/patch/20200729124832.79375-1-r.bolshakov@yadro.com/ I wonder if it'd be possible to have common handling of kicks for both x86 and arm (given that arch-specific bits are wrapped)? Thanks, Roman
On Mon, Nov 30, 2020 at 2:10 PM Peter Maydell <peter.maydell@linaro.org> wrote: > On Mon, 30 Nov 2020 at 20:56, Frank Yang <lfy@google.com> wrote: > > We'd actually like to contribute upstream too :) We do want to maintain > > our own downstream though; Android Emulator codebase needs to work > > solidly on macos and windows which has made keeping up with upstream > difficult > > One of the main reasons why OSX and Windows support upstream is > not so great is because very few people are helping to develop, > test and support it upstream. The way to fix that IMHO is for more > people who do care about those platforms to actively engage > with us upstream to help in making those platforms move closer to > being first class citizens. If you stay on a downstream fork > forever then I don't think you'll ever see things improve. > > thanks > -- PMM > That's a really good point. I'll definitely be more active about sending comments upstream in the future :) Frank
On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote: > On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote: > > > > > > On 01.12.20 00:01, Peter Collingbourne wrote: > > > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote: > > >> Hi Peter, > > >> > > >> On 30.11.20 22:08, Peter Collingbourne wrote: > > >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: > > >>>> > > >>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: > > >>>>> Hi Frank, > > >>>>> > > >>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. > > >>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! > > >>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out > > >>>>> > > >>>>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ > > >>>>> > > >>>> Thanks, we'll take a look :) > > >>>> > > >>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. > > >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so > > >>> I'll reply to your patch here. You have: > > >>> > > >>> + /* Set cpu->hvf->sleeping so that we get a > > >>> SIG_IPI signal. */ > > >>> + cpu->hvf->sleeping = true; > > >>> + smp_mb(); > > >>> + > > >>> + /* Bail out if we received an IRQ meanwhile */ > > >>> + if (cpu->thread_kicked || (cpu->interrupt_request & > > >>> + (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) { > > >>> + cpu->hvf->sleeping = false; > > >>> + break; > > >>> + } > > >>> + > > >>> + /* nanosleep returns on signal, so we wake up on kick. */ > > >>> + nanosleep(ts, NULL); > > >>> > > >>> and then send the signal conditional on whether sleeping is true, but > > >>> I think this is racy. If the signal is sent after sleeping is set to > > >>> true but before entering nanosleep then I think it will be ignored and > > >>> we will miss the wakeup. That's why in my implementation I block IPI > > >>> on the CPU thread at startup and then use pselect to atomically > > >>> unblock and begin sleeping. The signal is sent unconditionally so > > >>> there's no need to worry about races between actually sleeping and the > > >>> "we think we're sleeping" state. It may lead to an extra wakeup but > > >>> that's better than missing it entirely. > > >> > > >> Thanks a bunch for the comment! So the trick I was using here is to > > >> modify the timespec from the kick function before sending the IPI > > >> signal. That way, we know that either we are inside the sleep (where the > > >> signal wakes it up) or we are outside the sleep (where timespec={} will > > >> make it return immediately). 
> > >> > > >> The only race I can think of is if nanosleep does calculations based on > > >> the timespec and we happen to send the signal right there and then. > > > Yes that's the race I was thinking of. Admittedly it's a small window > > > but it's theoretically possible and part of the reason why pselect was > > > created. > > > > > >> The problem with blocking IPIs is basically what Frank was describing > > >> earlier: How do you unset the IPI signal pending status? If the signal > > >> is never delivered, how can pselect differentiate "signal from last time > > >> is still pending" from "new signal because I got an IPI"? > > > In this case we would take the additional wakeup which should be > > > harmless since we will take the WFx exit again and put us in the > > > correct state. But that's a lot better than busy looping. > > > > > > I'm not sure I follow. I'm thinking of the following scenario: > > > > - trap into WFI handler > > - go to sleep with blocked SIG_IPI > > - SIG_IPI arrives, pselect() exits > > - signal is still pending because it's blocked > > - enter guest > > - trap into WFI handler > > - run pselect(), but it immediate exits because SIG_IPI is still pending > > > > This was the loop I was seeing when running with SIG_IPI blocked. That's > > part of the reason why I switched to a different model. > > What I observe is that when returning from a pending signal pselect > consumes the signal (which is also consistent with my understanding of > what pselect does). That means that it doesn't matter if we take a > second WFx exit because once we reach the pselect in the second WFx > exit the signal will have been consumed by the pselect in the first > exit and we will just wait for the next one. > Aha! Thanks for the explanation. So, the first WFI in the series of guest WFIs will likely wake up immediately? After a period without WFIs there must be a pending SIG_IPI... It shouldn't be a critical issue though because (as defined in D1.16.2) "the architecture permits a PE to leave the low-power state for any reason, it is permissible for a PE to treat WFI as a NOP, but this is not recommended for lowest power operation." BTW. I think a bit from the thread should go into the description of patch 8, because it's not trivial and it would really be helpful to keep in repo history. At least something like this (taken from an earlier reply in the thread): In this implementation IPI is blocked on the CPU thread at startup and pselect() is used to atomically unblock the signal and begin sleeping. The signal is sent unconditionally so there's no need to worry about races between actually sleeping and the "we think we're sleeping" state. It may lead to an extra wakeup but that's better than missing it entirely. Thanks, Roman > I don't know why things may have been going wrong in your > implementation but it may be related to the issue with > mach_absolute_time() which I posted about separately and was also > causing busy loops for us in some cases. Once that issue was fixed in > our implementation we started seeing sleep until VTIMER due work > properly. > > > > > > > > I reckon that you could improve things a little by unblocking the > > > signal and then reblocking it before unlocking iothread (e.g. with a > > > pselect with zero time interval), which would flush any pending > > > signals. Since any such signal would correspond to a signal from last > > > time (because we still have the iothread lock) we know that any future > > > signals should correspond to new IPIs. 
> > > > > > Yeah, I think you actually *have* to do exactly that, because otherwise > > pselect() will always return after 0ns because the signal is still pending. > > > > And yes, I agree that that starts to sound a bit less racy now. But it > > means we can probably also just do > > > > - WFI handler > > - block SIG_IPI > > - set hvf->sleeping = true > > - check for pending interrupts > > - pselect() > > - unblock SIG_IPI > > > > which means we run with SIG_IPI unmasked by default. I don't think the > > number of signal mask changes is any different with that compared to > > running with SIG_IPI always masked, right? > P.S. Just found that Alex already raised my concern. Pending signals have to be consumed or there should be no pending signals to start sleeping on the very first WFI. > And unlock/lock iothread around the pselect? I suppose that could work > but as I mentioned it would just be an optimization. > > Maybe I can try to make my approach work on top of your series, or if > you already have a patch I can try to debug it. Let me know. > > Peter
On Thu, Dec 3, 2020 at 1:41 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote: > > On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote: > > On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote: > > > > > > > > > On 01.12.20 00:01, Peter Collingbourne wrote: > > > > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote: > > > >> Hi Peter, > > > >> > > > >> On 30.11.20 22:08, Peter Collingbourne wrote: > > > >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: > > > >>>> > > > >>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: > > > >>>>> Hi Frank, > > > >>>>> > > > >>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. > > > >>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! > > > >>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out > > > >>>>> > > > >>>>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ > > > >>>>> > > > >>>> Thanks, we'll take a look :) > > > >>>> > > > >>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. > > > >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so > > > >>> I'll reply to your patch here. You have: > > > >>> > > > >>> + /* Set cpu->hvf->sleeping so that we get a > > > >>> SIG_IPI signal. */ > > > >>> + cpu->hvf->sleeping = true; > > > >>> + smp_mb(); > > > >>> + > > > >>> + /* Bail out if we received an IRQ meanwhile */ > > > >>> + if (cpu->thread_kicked || (cpu->interrupt_request & > > > >>> + (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) { > > > >>> + cpu->hvf->sleeping = false; > > > >>> + break; > > > >>> + } > > > >>> + > > > >>> + /* nanosleep returns on signal, so we wake up on kick. */ > > > >>> + nanosleep(ts, NULL); > > > >>> > > > >>> and then send the signal conditional on whether sleeping is true, but > > > >>> I think this is racy. If the signal is sent after sleeping is set to > > > >>> true but before entering nanosleep then I think it will be ignored and > > > >>> we will miss the wakeup. That's why in my implementation I block IPI > > > >>> on the CPU thread at startup and then use pselect to atomically > > > >>> unblock and begin sleeping. The signal is sent unconditionally so > > > >>> there's no need to worry about races between actually sleeping and the > > > >>> "we think we're sleeping" state. It may lead to an extra wakeup but > > > >>> that's better than missing it entirely. > > > >> > > > >> Thanks a bunch for the comment! So the trick I was using here is to > > >> modify the timespec from the kick function before sending the IPI > > > >> signal. 
That way, we know that either we are inside the sleep (where the > > > >> signal wakes it up) or we are outside the sleep (where timespec={} will > > > >> make it return immediately). > > > >> > > > >> The only race I can think of is if nanosleep does calculations based on > > > >> the timespec and we happen to send the signal right there and then. > > > > Yes that's the race I was thinking of. Admittedly it's a small window > > > > but it's theoretically possible and part of the reason why pselect was > > > > created. > > > > > > > >> The problem with blocking IPIs is basically what Frank was describing > > > >> earlier: How do you unset the IPI signal pending status? If the signal > > > >> is never delivered, how can pselect differentiate "signal from last time > > > >> is still pending" from "new signal because I got an IPI"? > > > > In this case we would take the additional wakeup which should be > > > > harmless since we will take the WFx exit again and put us in the > > > > correct state. But that's a lot better than busy looping. > > > > > > > > > I'm not sure I follow. I'm thinking of the following scenario: > > > > > > - trap into WFI handler > > > - go to sleep with blocked SIG_IPI > > > - SIG_IPI arrives, pselect() exits > > > - signal is still pending because it's blocked > > > - enter guest > > > - trap into WFI handler > > > - run pselect(), but it immediate exits because SIG_IPI is still pending > > > > > > This was the loop I was seeing when running with SIG_IPI blocked. That's > > > part of the reason why I switched to a different model. > > > > What I observe is that when returning from a pending signal pselect > > consumes the signal (which is also consistent with my understanding of > > what pselect does). That means that it doesn't matter if we take a > > second WFx exit because once we reach the pselect in the second WFx > > exit the signal will have been consumed by the pselect in the first > > exit and we will just wait for the next one. > > > > Aha! Thanks for the explanation. So, the first WFI in the series of > guest WFIs will likely wake up immediately? After a period without WFIs > there must be a pending SIG_IPI... > > It shouldn't be a critical issue though because (as defined in D1.16.2) > "the architecture permits a PE to leave the low-power state for any > reason, it is permissible for a PE to treat WFI as a NOP, but this is > not recommended for lowest power operation." > > BTW. I think a bit from the thread should go into the description of > patch 8, because it's not trivial and it would really be helpful to keep > in repo history. At least something like this (taken from an earlier > reply in the thread): > > In this implementation IPI is blocked on the CPU thread at startup and > pselect() is used to atomically unblock the signal and begin sleeping. > The signal is sent unconditionally so there's no need to worry about > races between actually sleeping and the "we think we're sleeping" > state. It may lead to an extra wakeup but that's better than missing > it entirely. Okay, I'll add something like that to the next version of the patch I send out. Peter > > > Thanks, > Roman > > > I don't know why things may have been going wrong in your > > implementation but it may be related to the issue with > > mach_absolute_time() which I posted about separately and was also > > causing busy loops for us in some cases. Once that issue was fixed in > > our implementation we started seeing sleep until VTIMER due work > > properly. 
> > > > > > > > > > > > I reckon that you could improve things a little by unblocking the > > > > signal and then reblocking it before unlocking iothread (e.g. with a > > > > pselect with zero time interval), which would flush any pending > > > > signals. Since any such signal would correspond to a signal from last > > > > time (because we still have the iothread lock) we know that any future > > > > signals should correspond to new IPIs. > > > > > > > > > Yeah, I think you actually *have* to do exactly that, because otherwise > > > pselect() will always return after 0ns because the signal is still pending. > > > > > > And yes, I agree that that starts to sound a bit less racy now. But it > > > means we can probably also just do > > > > > > - WFI handler > > > - block SIG_IPI > > > - set hvf->sleeping = true > > > - check for pending interrupts > > > - pselect() > > > - unblock SIG_IPI > > > > > > which means we run with SIG_IPI unmasked by default. I don't think the > > > number of signal mask changes is any different with that compared to > > > running with SIG_IPI always masked, right? > > > > P.S. Just found that Alex already raised my concern. Pending signals > have to be consumed or there should be no pending signals to start > sleeping on the very first WFI. > > > And unlock/lock iothread around the pselect? I suppose that could work > > but as I mentioned it would just be an optimization. > > > > Maybe I can try to make my approach work on top of your series, or if > > you already have a patch I can try to debug it. Let me know. > > > > Peter
On 03.12.20 19:42, Peter Collingbourne wrote: > On Thu, Dec 3, 2020 at 1:41 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote: >> On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote: >>> On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote: >>>> >>>> On 01.12.20 00:01, Peter Collingbourne wrote: >>>>> On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote: >>>>>> Hi Peter, >>>>>> >>>>>> On 30.11.20 22:08, Peter Collingbourne wrote: >>>>>>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote: >>>>>>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote: >>>>>>>>> Hi Frank, >>>>>>>>> >>>>>>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse. >>>>>>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either! >>>>>>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out >>>>>>>>> >>>>>>>>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/ >>>>>>>>> >>>>>>>> Thanks, we'll take a look :) >>>>>>>> >>>>>>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold. >>>>>>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so >>>>>>> I'll reply to your patch here. You have: >>>>>>> >>>>>>> + /* Set cpu->hvf->sleeping so that we get a >>>>>>> SIG_IPI signal. */ >>>>>>> + cpu->hvf->sleeping = true; >>>>>>> + smp_mb(); >>>>>>> + >>>>>>> + /* Bail out if we received an IRQ meanwhile */ >>>>>>> + if (cpu->thread_kicked || (cpu->interrupt_request & >>>>>>> + (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) { >>>>>>> + cpu->hvf->sleeping = false; >>>>>>> + break; >>>>>>> + } >>>>>>> + >>>>>>> + /* nanosleep returns on signal, so we wake up on kick. */ >>>>>>> + nanosleep(ts, NULL); >>>>>>> >>>>>>> and then send the signal conditional on whether sleeping is true, but >>>>>>> I think this is racy. If the signal is sent after sleeping is set to >>>>>>> true but before entering nanosleep then I think it will be ignored and >>>>>>> we will miss the wakeup. That's why in my implementation I block IPI >>>>>>> on the CPU thread at startup and then use pselect to atomically >>>>>>> unblock and begin sleeping. The signal is sent unconditionally so >>>>>>> there's no need to worry about races between actually sleeping and the >>>>>>> "we think we're sleeping" state. It may lead to an extra wakeup but >>>>>>> that's better than missing it entirely. >>>>>> Thanks a bunch for the comment! So the trick I was using here is to > > >> modify the timespec from the kick function before sending the IPI >>>>>> signal. That way, we know that either we are inside the sleep (where the >>>>>> signal wakes it up) or we are outside the sleep (where timespec={} will >>>>>> make it return immediately). 
>>>>>> >>>>>> The only race I can think of is if nanosleep does calculations based on >>>>>> the timespec and we happen to send the signal right there and then. >>>>> Yes that's the race I was thinking of. Admittedly it's a small window >>>>> but it's theoretically possible and part of the reason why pselect was >>>>> created. >>>>> >>>>>> The problem with blocking IPIs is basically what Frank was describing >>>>>> earlier: How do you unset the IPI signal pending status? If the signal >>>>>> is never delivered, how can pselect differentiate "signal from last time >>>>>> is still pending" from "new signal because I got an IPI"? >>>>> In this case we would take the additional wakeup which should be >>>>> harmless since we will take the WFx exit again and put us in the >>>>> correct state. But that's a lot better than busy looping. >>>> >>>> I'm not sure I follow. I'm thinking of the following scenario: >>>> >>>> - trap into WFI handler >>>> - go to sleep with blocked SIG_IPI >>>> - SIG_IPI arrives, pselect() exits >>>> - signal is still pending because it's blocked >>>> - enter guest >>>> - trap into WFI handler >>>> - run pselect(), but it immediate exits because SIG_IPI is still pending >>>> >>>> This was the loop I was seeing when running with SIG_IPI blocked. That's >>>> part of the reason why I switched to a different model. >>> What I observe is that when returning from a pending signal pselect >>> consumes the signal (which is also consistent with my understanding of >>> what pselect does). That means that it doesn't matter if we take a >>> second WFx exit because once we reach the pselect in the second WFx >>> exit the signal will have been consumed by the pselect in the first >>> exit and we will just wait for the next one. >>> >> Aha! Thanks for the explanation. So, the first WFI in the series of >> guest WFIs will likely wake up immediately? After a period without WFIs >> there must be a pending SIG_IPI... >> >> It shouldn't be a critical issue though because (as defined in D1.16.2) >> "the architecture permits a PE to leave the low-power state for any >> reason, it is permissible for a PE to treat WFI as a NOP, but this is >> not recommended for lowest power operation." >> >> BTW. I think a bit from the thread should go into the description of >> patch 8, because it's not trivial and it would really be helpful to keep >> in repo history. At least something like this (taken from an earlier >> reply in the thread): >> >> In this implementation IPI is blocked on the CPU thread at startup and >> pselect() is used to atomically unblock the signal and begin sleeping. >> The signal is sent unconditionally so there's no need to worry about >> races between actually sleeping and the "we think we're sleeping" >> state. It may lead to an extra wakeup but that's better than missing >> it entirely. > Okay, I'll add something like that to the next version of the patch I send out. If this is the only change, I've already added it for v4. If you want me to change it further, just let me know what to replace the patch description with. Alex
On Thu, Dec 03, 2020 at 11:13:35PM +0100, Alexander Graf wrote: > > On 03.12.20 19:42, Peter Collingbourne wrote: > > On Thu, Dec 3, 2020 at 1:41 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote: > > > On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote: > > > > What I observe is that when returning from a pending signal pselect > > > > consumes the signal (which is also consistent with my understanding of > > > > what pselect does). That means that it doesn't matter if we take a > > > > second WFx exit because once we reach the pselect in the second WFx > > > > exit the signal will have been consumed by the pselect in the first > > > > exit and we will just wait for the next one. > > > > > > > Aha! Thanks for the explanation. So, the first WFI in the series of > > > guest WFIs will likely wake up immediately? After a period without WFIs > > > there must be a pending SIG_IPI... > > > > > > It shouldn't be a critical issue though because (as defined in D1.16.2) > > > "the architecture permits a PE to leave the low-power state for any > > > reason, it is permissible for a PE to treat WFI as a NOP, but this is > > > not recommended for lowest power operation." > > > > > > BTW. I think a bit from the thread should go into the description of > > > patch 8, because it's not trivial and it would really be helpful to keep > > > in repo history. At least something like this (taken from an earlier > > > reply in the thread): > > > > > > In this implementation IPI is blocked on the CPU thread at startup and > > > pselect() is used to atomically unblock the signal and begin sleeping. > > > The signal is sent unconditionally so there's no need to worry about > > > races between actually sleeping and the "we think we're sleeping" > > > state. It may lead to an extra wakeup but that's better than missing > > > it entirely. > > Okay, I'll add something like that to the next version of the patch I send out. > > > If this is the only change, I've already added it for v4. If you want me to > change it further, just let me know what to replace the patch description > with. > > Thanks, Alex. I'm fine with the description and all set. -Roman
diff --git a/MAINTAINERS b/MAINTAINERS index 68bc160f41..ca4b6d9279 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com> M: Roman Bolshakov <r.bolshakov@yadro.com> W: https://wiki.qemu.org/Features/HVF S: Maintained -F: accel/stubs/hvf-stub.c F: target/i386/hvf/ + +HVF +M: Cameron Esfahani <dirty@apple.com> +M: Roman Bolshakov <r.bolshakov@yadro.com> +W: https://wiki.qemu.org/Features/HVF +S: Maintained +F: accel/hvf/ F: include/sysemu/hvf.h +F: include/sysemu/hvf_int.h WHPX CPUs M: Sunil Muthuswamy <sunilmut@microsoft.com> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c new file mode 100644 index 0000000000..47d77a472a --- /dev/null +++ b/accel/hvf/hvf-all.c @@ -0,0 +1,56 @@ +/* + * QEMU Hypervisor.framework support + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Contributions after 2012-01-13 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. + */ + +#include "qemu/osdep.h" +#include "qemu-common.h" +#include "qemu/error-report.h" +#include "sysemu/hvf.h" +#include "sysemu/hvf_int.h" +#include "sysemu/runstate.h" + +#include "qemu/main-loop.h" +#include "sysemu/accel.h" + +#include <Hypervisor/Hypervisor.h> + +bool hvf_allowed; +HVFState *hvf_state; + +void assert_hvf_ok(hv_return_t ret) +{ + if (ret == HV_SUCCESS) { + return; + } + + switch (ret) { + case HV_ERROR: + error_report("Error: HV_ERROR"); + break; + case HV_BUSY: + error_report("Error: HV_BUSY"); + break; + case HV_BAD_ARGUMENT: + error_report("Error: HV_BAD_ARGUMENT"); + break; + case HV_NO_RESOURCES: + error_report("Error: HV_NO_RESOURCES"); + break; + case HV_NO_DEVICE: + error_report("Error: HV_NO_DEVICE"); + break; + case HV_UNSUPPORTED: + error_report("Error: HV_UNSUPPORTED"); + break; + default: + error_report("Unknown Error"); + } + + abort(); +} diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c new file mode 100644 index 0000000000..f9bb5502b7 --- /dev/null +++ b/accel/hvf/hvf-cpus.c @@ -0,0 +1,468 @@ +/* + * Copyright 2008 IBM Corporation + * 2008 Red Hat, Inc. + * Copyright 2011 Intel Corporation + * Copyright 2016 Veertu, Inc. + * Copyright 2017 The Android Open Source Project + * + * QEMU Hypervisor.framework support + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + * + * This file contain code under public domain from the hvdos project: + * https://github.com/mist64/hvdos + * + * Parts Copyright (c) 2011 NetApp, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include "qemu/osdep.h" +#include "qemu/error-report.h" +#include "qemu/main-loop.h" +#include "exec/address-spaces.h" +#include "exec/exec-all.h" +#include "sysemu/cpus.h" +#include "sysemu/hvf.h" +#include "sysemu/hvf_int.h" +#include "sysemu/runstate.h" +#include "qemu/guest-random.h" + +#include <Hypervisor/Hypervisor.h> + +/* Memory slots */ + +struct mac_slot { + int present; + uint64_t size; + uint64_t gpa_start; + uint64_t gva; +}; + +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) +{ + hvf_slot *slot; + int x; + for (x = 0; x < hvf_state->num_slots; ++x) { + slot = &hvf_state->slots[x]; + if (slot->size && start < (slot->start + slot->size) && + (start + size) > slot->start) { + return slot; + } + } + return NULL; +} + +struct mac_slot mac_slots[32]; + +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) +{ + struct mac_slot *macslot; + hv_return_t ret; + + macslot = &mac_slots[slot->slot_id]; + + if (macslot->present) { + if (macslot->size != slot->size) { + macslot->present = 0; + ret = hv_vm_unmap(macslot->gpa_start, macslot->size); + assert_hvf_ok(ret); + } + } + + if (!slot->size) { + return 0; + } + + macslot->present = 1; + macslot->gpa_start = slot->start; + macslot->size = slot->size; + ret = hv_vm_map(slot->mem, slot->start, slot->size, flags); + assert_hvf_ok(ret); + return 0; +} + +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add) +{ + hvf_slot *mem; + MemoryRegion *area = section->mr; + bool writeable = !area->readonly && !area->rom_device; + hv_memory_flags_t flags; + + if (!memory_region_is_ram(area)) { + if (writeable) { + return; + } else if (!memory_region_is_romd(area)) { + /* + * If the memory device is not in romd_mode, then we actually want + * to remove the hvf memory slot so all accesses will trap. + */ + add = false; + } + } + + mem = hvf_find_overlap_slot( + section->offset_within_address_space, + int128_get64(section->size)); + + if (mem && add) { + if (mem->size == int128_get64(section->size) && + mem->start == section->offset_within_address_space && + mem->mem == (memory_region_get_ram_ptr(area) + + section->offset_within_region)) { + return; /* Same region was attempted to register, go away. */ + } + } + + /* Region needs to be reset. set the size to 0 and remap it. 
*/ + if (mem) { + mem->size = 0; + if (do_hvf_set_memory(mem, 0)) { + error_report("Failed to reset overlapping slot"); + abort(); + } + } + + if (!add) { + return; + } + + if (area->readonly || + (!memory_region_is_ram(area) && memory_region_is_romd(area))) { + flags = HV_MEMORY_READ | HV_MEMORY_EXEC; + } else { + flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; + } + + /* Now make a new slot. */ + int x; + + for (x = 0; x < hvf_state->num_slots; ++x) { + mem = &hvf_state->slots[x]; + if (!mem->size) { + break; + } + } + + if (x == hvf_state->num_slots) { + error_report("No free slots"); + abort(); + } + + mem->size = int128_get64(section->size); + mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; + mem->start = section->offset_within_address_space; + mem->region = area; + + if (do_hvf_set_memory(mem, flags)) { + error_report("Error registering new memory slot"); + abort(); + } +} + +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) +{ + hvf_slot *slot; + + slot = hvf_find_overlap_slot( + section->offset_within_address_space, + int128_get64(section->size)); + + /* protect region against writes; begin tracking it */ + if (on) { + slot->flags |= HVF_SLOT_LOG; + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, + HV_MEMORY_READ); + /* stop tracking region*/ + } else { + slot->flags &= ~HVF_SLOT_LOG; + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, + HV_MEMORY_READ | HV_MEMORY_WRITE); + } +} + +static void hvf_log_start(MemoryListener *listener, + MemoryRegionSection *section, int old, int new) +{ + if (old != 0) { + return; + } + + hvf_set_dirty_tracking(section, 1); +} + +static void hvf_log_stop(MemoryListener *listener, + MemoryRegionSection *section, int old, int new) +{ + if (new != 0) { + return; + } + + hvf_set_dirty_tracking(section, 0); +} + +static void hvf_log_sync(MemoryListener *listener, + MemoryRegionSection *section) +{ + /* + * sync of dirty pages is handled elsewhere; just make sure we keep + * tracking the region. 
+ */ + hvf_set_dirty_tracking(section, 1); +} + +static void hvf_region_add(MemoryListener *listener, + MemoryRegionSection *section) +{ + hvf_set_phys_mem(section, true); +} + +static void hvf_region_del(MemoryListener *listener, + MemoryRegionSection *section) +{ + hvf_set_phys_mem(section, false); +} + +static MemoryListener hvf_memory_listener = { + .priority = 10, + .region_add = hvf_region_add, + .region_del = hvf_region_del, + .log_start = hvf_log_start, + .log_stop = hvf_log_stop, + .log_sync = hvf_log_sync, +}; + +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) +{ + if (!cpu->vcpu_dirty) { + hvf_get_registers(cpu); + cpu->vcpu_dirty = true; + } +} + +static void hvf_cpu_synchronize_state(CPUState *cpu) +{ + if (!cpu->vcpu_dirty) { + run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); + } +} + +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, + run_on_cpu_data arg) +{ + hvf_put_registers(cpu); + cpu->vcpu_dirty = false; +} + +static void hvf_cpu_synchronize_post_reset(CPUState *cpu) +{ + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); +} + +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, + run_on_cpu_data arg) +{ + hvf_put_registers(cpu); + cpu->vcpu_dirty = false; +} + +static void hvf_cpu_synchronize_post_init(CPUState *cpu) +{ + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); +} + +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, + run_on_cpu_data arg) +{ + cpu->vcpu_dirty = true; +} + +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) +{ + run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); +} + +static void hvf_vcpu_destroy(CPUState *cpu) +{ + hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd); + assert_hvf_ok(ret); + + hvf_arch_vcpu_destroy(cpu); +} + +static void dummy_signal(int sig) +{ +} + +static int hvf_init_vcpu(CPUState *cpu) +{ + int r; + + /* init cpu signals */ + sigset_t set; + struct sigaction sigact; + + memset(&sigact, 0, sizeof(sigact)); + sigact.sa_handler = dummy_signal; + sigaction(SIG_IPI, &sigact, NULL); + + pthread_sigmask(SIG_BLOCK, NULL, &set); + sigdelset(&set, SIG_IPI); + +#ifdef __aarch64__ + r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL); +#else + r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); +#endif + cpu->vcpu_dirty = 1; + assert_hvf_ok(r); + + return hvf_arch_init_vcpu(cpu); +} + +/* + * The HVF-specific vCPU thread function. This one should only run when the host + * CPU supports the VMX "unrestricted guest" feature. 
+ */ +static void *hvf_cpu_thread_fn(void *arg) +{ + CPUState *cpu = arg; + + int r; + + assert(hvf_enabled()); + + rcu_register_thread(); + + qemu_mutex_lock_iothread(); + qemu_thread_get_self(cpu->thread); + + cpu->thread_id = qemu_get_thread_id(); + cpu->can_do_io = 1; + current_cpu = cpu; + + hvf_init_vcpu(cpu); + + /* signal CPU creation */ + cpu_thread_signal_created(cpu); + qemu_guest_random_seed_thread_part2(cpu->random_seed); + + do { + if (cpu_can_run(cpu)) { + r = hvf_vcpu_exec(cpu); + if (r == EXCP_DEBUG) { + cpu_handle_guest_debug(cpu); + } + } + qemu_wait_io_event(cpu); + } while (!cpu->unplug || cpu_can_run(cpu)); + + hvf_vcpu_destroy(cpu); + cpu_thread_signal_destroyed(cpu); + qemu_mutex_unlock_iothread(); + rcu_unregister_thread(); + return NULL; +} + +static void hvf_start_vcpu_thread(CPUState *cpu) +{ + char thread_name[VCPU_THREAD_NAME_SIZE]; + + /* + * HVF currently does not support TCG, and only runs in + * unrestricted-guest mode. + */ + assert(hvf_enabled()); + + cpu->thread = g_malloc0(sizeof(QemuThread)); + cpu->halt_cond = g_malloc0(sizeof(QemuCond)); + qemu_cond_init(cpu->halt_cond); + + snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", + cpu->cpu_index); + qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, + cpu, QEMU_THREAD_JOINABLE); +} + +static const CpusAccel hvf_cpus = { + .create_vcpu_thread = hvf_start_vcpu_thread, + + .synchronize_post_reset = hvf_cpu_synchronize_post_reset, + .synchronize_post_init = hvf_cpu_synchronize_post_init, + .synchronize_state = hvf_cpu_synchronize_state, + .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, +}; + +static int hvf_accel_init(MachineState *ms) +{ + int x; + hv_return_t ret; + HVFState *s; + + ret = hv_vm_create(HV_VM_DEFAULT); + assert_hvf_ok(ret); + + s = g_new0(HVFState, 1); + + s->num_slots = 32; + for (x = 0; x < s->num_slots; ++x) { + s->slots[x].size = 0; + s->slots[x].slot_id = x; + } + + hvf_state = s; + memory_listener_register(&hvf_memory_listener, &address_space_memory); + cpus_register_accel(&hvf_cpus); + return 0; +} + +static void hvf_accel_class_init(ObjectClass *oc, void *data) +{ + AccelClass *ac = ACCEL_CLASS(oc); + ac->name = "HVF"; + ac->init_machine = hvf_accel_init; + ac->allowed = &hvf_allowed; +} + +static const TypeInfo hvf_accel_type = { + .name = TYPE_HVF_ACCEL, + .parent = TYPE_ACCEL, + .class_init = hvf_accel_class_init, +}; + +static void hvf_type_init(void) +{ + type_register_static(&hvf_accel_type); +} + +type_init(hvf_type_init); diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build new file mode 100644 index 0000000000..dfd6b68dc7 --- /dev/null +++ b/accel/hvf/meson.build @@ -0,0 +1,7 @@ +hvf_ss = ss.source_set() +hvf_ss.add(files( + 'hvf-all.c', + 'hvf-cpus.c', +)) + +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss) diff --git a/accel/meson.build b/accel/meson.build index b26cca227a..6de12ce5d5 100644 --- a/accel/meson.build +++ b/accel/meson.build @@ -1,5 +1,6 @@ softmmu_ss.add(files('accel.c')) +subdir('hvf') subdir('qtest') subdir('kvm') subdir('tcg') diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h new file mode 100644 index 0000000000..de9bad23a8 --- /dev/null +++ b/include/sysemu/hvf_int.h @@ -0,0 +1,69 @@ +/* + * QEMU Hypervisor.framework (HVF) support + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ * + */ + +/* header to be included in HVF-specific code */ + +#ifndef HVF_INT_H +#define HVF_INT_H + +#include <Hypervisor/Hypervisor.h> + +#define HVF_MAX_VCPU 0x10 + +extern struct hvf_state hvf_global; + +struct hvf_vm { + int id; + struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; +}; + +struct hvf_state { + uint32_t version; + struct hvf_vm *vm; + uint64_t mem_quota; +}; + +/* hvf_slot flags */ +#define HVF_SLOT_LOG (1 << 0) + +typedef struct hvf_slot { + uint64_t start; + uint64_t size; + uint8_t *mem; + int slot_id; + uint32_t flags; + MemoryRegion *region; +} hvf_slot; + +typedef struct hvf_vcpu_caps { + uint64_t vmx_cap_pinbased; + uint64_t vmx_cap_procbased; + uint64_t vmx_cap_procbased2; + uint64_t vmx_cap_entry; + uint64_t vmx_cap_exit; + uint64_t vmx_cap_preemption_timer; +} hvf_vcpu_caps; + +struct HVFState { + AccelState parent; + hvf_slot slots[32]; + int num_slots; + + hvf_vcpu_caps *hvf_caps; +}; +extern HVFState *hvf_state; + +void assert_hvf_ok(hv_return_t ret); +int hvf_get_registers(CPUState *cpu); +int hvf_put_registers(CPUState *cpu); +int hvf_arch_init_vcpu(CPUState *cpu); +void hvf_arch_vcpu_destroy(CPUState *cpu); +int hvf_vcpu_exec(CPUState *cpu); +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); + +#endif diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c deleted file mode 100644 index 817b3d7452..0000000000 --- a/target/i386/hvf/hvf-cpus.c +++ /dev/null @@ -1,131 +0,0 @@ -/* - * Copyright 2008 IBM Corporation - * 2008 Red Hat, Inc. - * Copyright 2011 Intel Corporation - * Copyright 2016 Veertu, Inc. - * Copyright 2017 The Android Open Source Project - * - * QEMU Hypervisor.framework support - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of version 2 of the GNU General Public - * License as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, see <http://www.gnu.org/licenses/>. - * - * This file contain code under public domain from the hvdos project: - * https://github.com/mist64/hvdos - * - * Parts Copyright (c) 2011 NetApp, Inc. - * All rights reserved. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. 
IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF - * SUCH DAMAGE. - */ - -#include "qemu/osdep.h" -#include "qemu/error-report.h" -#include "qemu/main-loop.h" -#include "sysemu/hvf.h" -#include "sysemu/runstate.h" -#include "target/i386/cpu.h" -#include "qemu/guest-random.h" - -#include "hvf-cpus.h" - -/* - * The HVF-specific vCPU thread function. This one should only run when the host - * CPU supports the VMX "unrestricted guest" feature. - */ -static void *hvf_cpu_thread_fn(void *arg) -{ - CPUState *cpu = arg; - - int r; - - assert(hvf_enabled()); - - rcu_register_thread(); - - qemu_mutex_lock_iothread(); - qemu_thread_get_self(cpu->thread); - - cpu->thread_id = qemu_get_thread_id(); - cpu->can_do_io = 1; - current_cpu = cpu; - - hvf_init_vcpu(cpu); - - /* signal CPU creation */ - cpu_thread_signal_created(cpu); - qemu_guest_random_seed_thread_part2(cpu->random_seed); - - do { - if (cpu_can_run(cpu)) { - r = hvf_vcpu_exec(cpu); - if (r == EXCP_DEBUG) { - cpu_handle_guest_debug(cpu); - } - } - qemu_wait_io_event(cpu); - } while (!cpu->unplug || cpu_can_run(cpu)); - - hvf_vcpu_destroy(cpu); - cpu_thread_signal_destroyed(cpu); - qemu_mutex_unlock_iothread(); - rcu_unregister_thread(); - return NULL; -} - -static void hvf_start_vcpu_thread(CPUState *cpu) -{ - char thread_name[VCPU_THREAD_NAME_SIZE]; - - /* - * HVF currently does not support TCG, and only runs in - * unrestricted-guest mode. - */ - assert(hvf_enabled()); - - cpu->thread = g_malloc0(sizeof(QemuThread)); - cpu->halt_cond = g_malloc0(sizeof(QemuCond)); - qemu_cond_init(cpu->halt_cond); - - snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF", - cpu->cpu_index); - qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn, - cpu, QEMU_THREAD_JOINABLE); -} - -const CpusAccel hvf_cpus = { - .create_vcpu_thread = hvf_start_vcpu_thread, - - .synchronize_post_reset = hvf_cpu_synchronize_post_reset, - .synchronize_post_init = hvf_cpu_synchronize_post_init, - .synchronize_state = hvf_cpu_synchronize_state, - .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm, -}; diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h deleted file mode 100644 index ced31b82c0..0000000000 --- a/target/i386/hvf/hvf-cpus.h +++ /dev/null @@ -1,25 +0,0 @@ -/* - * Accelerator CPUS Interface - * - * Copyright 2020 SUSE LLC - * - * This work is licensed under the terms of the GNU GPL, version 2 or later. - * See the COPYING file in the top-level directory. 
- */ - -#ifndef HVF_CPUS_H -#define HVF_CPUS_H - -#include "sysemu/cpus.h" - -extern const CpusAccel hvf_cpus; - -int hvf_init_vcpu(CPUState *); -int hvf_vcpu_exec(CPUState *); -void hvf_cpu_synchronize_state(CPUState *); -void hvf_cpu_synchronize_post_reset(CPUState *); -void hvf_cpu_synchronize_post_init(CPUState *); -void hvf_cpu_synchronize_pre_loadvm(CPUState *); -void hvf_vcpu_destroy(CPUState *); - -#endif /* HVF_CPUS_H */ diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h index e0edffd077..6d56f8f6bb 100644 --- a/target/i386/hvf/hvf-i386.h +++ b/target/i386/hvf/hvf-i386.h @@ -18,57 +18,11 @@ #include "sysemu/accel.h" #include "sysemu/hvf.h" +#include "sysemu/hvf_int.h" #include "cpu.h" #include "x86.h" -#define HVF_MAX_VCPU 0x10 - -extern struct hvf_state hvf_global; - -struct hvf_vm { - int id; - struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU]; -}; - -struct hvf_state { - uint32_t version; - struct hvf_vm *vm; - uint64_t mem_quota; -}; - -/* hvf_slot flags */ -#define HVF_SLOT_LOG (1 << 0) - -typedef struct hvf_slot { - uint64_t start; - uint64_t size; - uint8_t *mem; - int slot_id; - uint32_t flags; - MemoryRegion *region; -} hvf_slot; - -typedef struct hvf_vcpu_caps { - uint64_t vmx_cap_pinbased; - uint64_t vmx_cap_procbased; - uint64_t vmx_cap_procbased2; - uint64_t vmx_cap_entry; - uint64_t vmx_cap_exit; - uint64_t vmx_cap_preemption_timer; -} hvf_vcpu_caps; - -struct HVFState { - AccelState parent; - hvf_slot slots[32]; - int num_slots; - - hvf_vcpu_caps *hvf_caps; -}; -extern HVFState *hvf_state; - -void hvf_set_phys_mem(MemoryRegionSection *, bool); void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int); -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); #ifdef NEED_CPU_H /* Functions exported to host specific mode */ diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c index ed9356565c..8b96ecd619 100644 --- a/target/i386/hvf/hvf.c +++ b/target/i386/hvf/hvf.c @@ -51,6 +51,7 @@ #include "qemu/error-report.h" #include "sysemu/hvf.h" +#include "sysemu/hvf_int.h" #include "sysemu/runstate.h" #include "hvf-i386.h" #include "vmcs.h" @@ -72,171 +73,6 @@ #include "sysemu/accel.h" #include "target/i386/cpu.h" -#include "hvf-cpus.h" - -HVFState *hvf_state; - -static void assert_hvf_ok(hv_return_t ret) -{ - if (ret == HV_SUCCESS) { - return; - } - - switch (ret) { - case HV_ERROR: - error_report("Error: HV_ERROR"); - break; - case HV_BUSY: - error_report("Error: HV_BUSY"); - break; - case HV_BAD_ARGUMENT: - error_report("Error: HV_BAD_ARGUMENT"); - break; - case HV_NO_RESOURCES: - error_report("Error: HV_NO_RESOURCES"); - break; - case HV_NO_DEVICE: - error_report("Error: HV_NO_DEVICE"); - break; - case HV_UNSUPPORTED: - error_report("Error: HV_UNSUPPORTED"); - break; - default: - error_report("Unknown Error"); - } - - abort(); -} - -/* Memory slots */ -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) -{ - hvf_slot *slot; - int x; - for (x = 0; x < hvf_state->num_slots; ++x) { - slot = &hvf_state->slots[x]; - if (slot->size && start < (slot->start + slot->size) && - (start + size) > slot->start) { - return slot; - } - } - return NULL; -} - -struct mac_slot { - int present; - uint64_t size; - uint64_t gpa_start; - uint64_t gva; -}; - -struct mac_slot mac_slots[32]; - -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) -{ - struct mac_slot *macslot; - hv_return_t ret; - - macslot = &mac_slots[slot->slot_id]; - - if (macslot->present) { - if (macslot->size != slot->size) { - macslot->present = 0; - ret = 
hv_vm_unmap(macslot->gpa_start, macslot->size); - assert_hvf_ok(ret); - } - } - - if (!slot->size) { - return 0; - } - - macslot->present = 1; - macslot->gpa_start = slot->start; - macslot->size = slot->size; - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags); - assert_hvf_ok(ret); - return 0; -} - -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) -{ - hvf_slot *mem; - MemoryRegion *area = section->mr; - bool writeable = !area->readonly && !area->rom_device; - hv_memory_flags_t flags; - - if (!memory_region_is_ram(area)) { - if (writeable) { - return; - } else if (!memory_region_is_romd(area)) { - /* - * If the memory device is not in romd_mode, then we actually want - * to remove the hvf memory slot so all accesses will trap. - */ - add = false; - } - } - - mem = hvf_find_overlap_slot( - section->offset_within_address_space, - int128_get64(section->size)); - - if (mem && add) { - if (mem->size == int128_get64(section->size) && - mem->start == section->offset_within_address_space && - mem->mem == (memory_region_get_ram_ptr(area) + - section->offset_within_region)) { - return; /* Same region was attempted to register, go away. */ - } - } - - /* Region needs to be reset. set the size to 0 and remap it. */ - if (mem) { - mem->size = 0; - if (do_hvf_set_memory(mem, 0)) { - error_report("Failed to reset overlapping slot"); - abort(); - } - } - - if (!add) { - return; - } - - if (area->readonly || - (!memory_region_is_ram(area) && memory_region_is_romd(area))) { - flags = HV_MEMORY_READ | HV_MEMORY_EXEC; - } else { - flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; - } - - /* Now make a new slot. */ - int x; - - for (x = 0; x < hvf_state->num_slots; ++x) { - mem = &hvf_state->slots[x]; - if (!mem->size) { - break; - } - } - - if (x == hvf_state->num_slots) { - error_report("No free slots"); - abort(); - } - - mem->size = int128_get64(section->size); - mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; - mem->start = section->offset_within_address_space; - mem->region = area; - - if (do_hvf_set_memory(mem, flags)) { - error_report("Error registering new memory slot"); - abort(); - } -} - void vmx_update_tpr(CPUState *cpu) { /* TODO: need integrate APIC handling */ @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer, } } -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) -{ - if (!cpu->vcpu_dirty) { - hvf_get_registers(cpu); - cpu->vcpu_dirty = true; - } -} - -void hvf_cpu_synchronize_state(CPUState *cpu) -{ - if (!cpu->vcpu_dirty) { - run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); - } -} - -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, - run_on_cpu_data arg) -{ - hvf_put_registers(cpu); - cpu->vcpu_dirty = false; -} - -void hvf_cpu_synchronize_post_reset(CPUState *cpu) -{ - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); -} - -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, - run_on_cpu_data arg) -{ - hvf_put_registers(cpu); - cpu->vcpu_dirty = false; -} - -void hvf_cpu_synchronize_post_init(CPUState *cpu) -{ - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); -} - -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, - run_on_cpu_data arg) -{ - cpu->vcpu_dirty = true; -} - -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) -{ - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); -} - static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t 
ept_qual) { int read, write; @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) return false; } -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) -{ - hvf_slot *slot; - - slot = hvf_find_overlap_slot( - section->offset_within_address_space, - int128_get64(section->size)); - - /* protect region against writes; begin tracking it */ - if (on) { - slot->flags |= HVF_SLOT_LOG; - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, - HV_MEMORY_READ); - /* stop tracking region*/ - } else { - slot->flags &= ~HVF_SLOT_LOG; - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, - HV_MEMORY_READ | HV_MEMORY_WRITE); - } -} - -static void hvf_log_start(MemoryListener *listener, - MemoryRegionSection *section, int old, int new) -{ - if (old != 0) { - return; - } - - hvf_set_dirty_tracking(section, 1); -} - -static void hvf_log_stop(MemoryListener *listener, - MemoryRegionSection *section, int old, int new) -{ - if (new != 0) { - return; - } - - hvf_set_dirty_tracking(section, 0); -} - -static void hvf_log_sync(MemoryListener *listener, - MemoryRegionSection *section) -{ - /* - * sync of dirty pages is handled elsewhere; just make sure we keep - * tracking the region. - */ - hvf_set_dirty_tracking(section, 1); -} - -static void hvf_region_add(MemoryListener *listener, - MemoryRegionSection *section) -{ - hvf_set_phys_mem(section, true); -} - -static void hvf_region_del(MemoryListener *listener, - MemoryRegionSection *section) -{ - hvf_set_phys_mem(section, false); -} - -static MemoryListener hvf_memory_listener = { - .priority = 10, - .region_add = hvf_region_add, - .region_del = hvf_region_del, - .log_start = hvf_log_start, - .log_stop = hvf_log_stop, - .log_sync = hvf_log_sync, -}; - -void hvf_vcpu_destroy(CPUState *cpu) +void hvf_arch_vcpu_destroy(CPUState *cpu) { X86CPU *x86_cpu = X86_CPU(cpu); CPUX86State *env = &x86_cpu->env; - hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd); g_free(env->hvf_mmio_buf); - assert_hvf_ok(ret); -} - -static void dummy_signal(int sig) -{ } -int hvf_init_vcpu(CPUState *cpu) +int hvf_arch_init_vcpu(CPUState *cpu) { X86CPU *x86cpu = X86_CPU(cpu); CPUX86State *env = &x86cpu->env; - int r; - - /* init cpu signals */ - sigset_t set; - struct sigaction sigact; - - memset(&sigact, 0, sizeof(sigact)); - sigact.sa_handler = dummy_signal; - sigaction(SIG_IPI, &sigact, NULL); - - pthread_sigmask(SIG_BLOCK, NULL, &set); - sigdelset(&set, SIG_IPI); init_emu(); init_decoder(); @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu) hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1); env->hvf_mmio_buf = g_new(char, 4096); - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); - cpu->vcpu_dirty = 1; - assert_hvf_ok(r); - if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED, &hvf_state->hvf_caps->vmx_cap_pinbased)) { abort(); @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu) return ret; } - -bool hvf_allowed; - -static int hvf_accel_init(MachineState *ms) -{ - int x; - hv_return_t ret; - HVFState *s; - - ret = hv_vm_create(HV_VM_DEFAULT); - assert_hvf_ok(ret); - - s = g_new0(HVFState, 1); - - s->num_slots = 32; - for (x = 0; x < s->num_slots; ++x) { - s->slots[x].size = 0; - s->slots[x].slot_id = x; - } - - hvf_state = s; - memory_listener_register(&hvf_memory_listener, &address_space_memory); - cpus_register_accel(&hvf_cpus); - return 0; -} - -static void hvf_accel_class_init(ObjectClass *oc, void *data) -{ - AccelClass *ac = ACCEL_CLASS(oc); - ac->name = "HVF"; - 
ac->init_machine = hvf_accel_init; - ac->allowed = &hvf_allowed; -} - -static const TypeInfo hvf_accel_type = { - .name = TYPE_HVF_ACCEL, - .parent = TYPE_ACCEL, - .class_init = hvf_accel_class_init, -}; - -static void hvf_type_init(void) -{ - type_register_static(&hvf_accel_type); -} - -type_init(hvf_type_init); diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build index 409c9a3f14..c8a43717ee 100644 --- a/target/i386/hvf/meson.build +++ b/target/i386/hvf/meson.build @@ -1,6 +1,5 @@ i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files( 'hvf.c', - 'hvf-cpus.c', 'x86.c', 'x86_cpuid.c', 'x86_decode.c', diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c index bbec412b6c..89b8e9d87a 100644 --- a/target/i386/hvf/x86hvf.c +++ b/target/i386/hvf/x86hvf.c @@ -20,6 +20,9 @@ #include "qemu/osdep.h" #include "qemu-common.h" +#include "sysemu/hvf.h" +#include "sysemu/hvf_int.h" +#include "sysemu/hw_accel.h" #include "x86hvf.h" #include "vmx.h" #include "vmcs.h" @@ -32,8 +35,6 @@ #include <Hypervisor/hv.h> #include <Hypervisor/hv_vmx.h> -#include "hvf-cpus.h" - void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, SegmentCache *qseg, bool is_tr) { @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state) env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { - hvf_cpu_synchronize_state(cpu_state); + cpu_synchronize_state(cpu_state); do_cpu_init(cpu); } @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state) cpu_state->halted = 0; } if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) { - hvf_cpu_synchronize_state(cpu_state); + cpu_synchronize_state(cpu_state); do_cpu_sipi(cpu); } if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) { cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR; - hvf_cpu_synchronize_state(cpu_state); + cpu_synchronize_state(cpu_state); apic_handle_tpr_access_report(cpu->apic_state, env->eip, env->tpr_access_type); } diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h index 635ab0f34e..99ed8d608d 100644 --- a/target/i386/hvf/x86hvf.h +++ b/target/i386/hvf/x86hvf.h @@ -21,8 +21,6 @@ #include "x86_descr.h" int hvf_process_events(CPUState *); -int hvf_put_registers(CPUState *); -int hvf_get_registers(CPUState *); bool hvf_inject_interrupts(CPUState *); void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, SegmentCache *qseg, bool is_tr);
Until now, Hypervisor.framework has only been available on x86_64 systems. With Apple Silicon shipping now, it extends its reach to aarch64. To prepare for support for multiple architectures, let's move common code out into its own accel directory. Signed-off-by: Alexander Graf <agraf@csgraf.de> --- MAINTAINERS | 9 +- accel/hvf/hvf-all.c | 56 +++++ accel/hvf/hvf-cpus.c | 468 ++++++++++++++++++++++++++++++++++++ accel/hvf/meson.build | 7 + accel/meson.build | 1 + include/sysemu/hvf_int.h | 69 ++++++ target/i386/hvf/hvf-cpus.c | 131 ---------- target/i386/hvf/hvf-cpus.h | 25 -- target/i386/hvf/hvf-i386.h | 48 +--- target/i386/hvf/hvf.c | 360 +-------------------------- target/i386/hvf/meson.build | 1 - target/i386/hvf/x86hvf.c | 11 +- target/i386/hvf/x86hvf.h | 2 - 13 files changed, 619 insertions(+), 569 deletions(-) create mode 100644 accel/hvf/hvf-all.c create mode 100644 accel/hvf/hvf-cpus.c create mode 100644 accel/hvf/meson.build create mode 100644 include/sysemu/hvf_int.h delete mode 100644 target/i386/hvf/hvf-cpus.c delete mode 100644 target/i386/hvf/hvf-cpus.h