Message ID | 1578483143-14905-9-git-send-email-gengdongjiu@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add ARMv8 RAS virtualization support in QEMU |
On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:

> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> +{
> +    ram_addr_t ram_addr;
> +    hwaddr paddr;
> +
> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> +
> +    if (acpi_enabled && addr &&
> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> +        ram_addr = qemu_ram_addr_from_host(addr);
> +        if (ram_addr != RAM_ADDR_INVALID &&
> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> +            kvm_hwpoison_page_add(ram_addr);
> +            /*
> +             * Asynchronous signal will be masked by main thread, so
> +             * only handle synchronous signal.
> +             */

I don't understand this comment. (I think we've had discussions
about it before, but it's still not clear to me.)

This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:

(1) in the vcpu thread:
  * the real SIGBUS handler sigbus_handler() sets a flag and arranges
    for an immediate vcpu exit
  * the vcpu thread reads the flag on exit from KVM_RUN and
    calls kvm_arch_on_sigbus_vcpu() directly
  * the error could be MCEERR_AR or MCEERR_AO
(2) MCE errors on other threads:
  * here SIGBUS is blocked, so MCEERR_AR (action-required)
    errors will cause the kernel to just kill the QEMU process
  * MCEERR_AO errors will be handled via the iothread's use
    of signalfd(), so kvm_on_sigbus() will get called from
    the main thread, and it will call kvm_arch_on_sigbus_vcpu()
  * in this case the passed in CPUState will (arbitrarily) be that
    for the first vCPU

For MCEERR_AR, the code here looks correct -- we know this is
only going to happen for the relevant vCPU so we can go ahead
and do the "record it in the ACPI table and tell the guest" bit.

But shouldn't we be doing something with the MCEERR_AO too?
That of course will be trickier because we're not necessarily
in the vcpu thread, but it would be nice to let the guest
know about it.

One comment that would work with the current code would be:

/*
 * If this is a BUS_MCEERR_AR, we know we have been called
 * synchronously from the vCPU thread, so we can easily
 * synchronize the state and inject an error.
 *
 * TODO: we currently don't tell the guest at all about BUS_MCEERR_AO.
 * In that case we might either be being called synchronously from
 * the vCPU thread, or a bit later from the main thread, so doing
 * the injection of the error would be more complicated.
 */

but I don't know if that's what you meant to say/implement...

> +            if (code == BUS_MCEERR_AR) {
> +                kvm_cpu_synchronize_state(c);
> +                if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
> +                    kvm_inject_arm_sea(c);
> +                } else {
> +                    error_report("failed to record the error");
> +                    abort();
> +                }
> +            }
> +            return;
> +        }
> +        if (code == BUS_MCEERR_AO) {
> +            error_report("Hardware memory error at addr %p for memory used by "
> +                "QEMU itself instead of guest system!", addr);
> +        }
> +    }
> +
> +    if (code == BUS_MCEERR_AR) {
> +        error_report("Hardware memory error!");
> +        exit(1);
> +    }
> +}
>

thanks
-- PMM
On Thu, 16 Jan 2020 at 16:28, Peter Maydell <peter.maydell@linaro.org> wrote:
> This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
>
> (1) in the vcpu thread:
>   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
>     for an immediate vcpu exit
>   * the vcpu thread reads the flag on exit from KVM_RUN and
>     calls kvm_arch_on_sigbus_vcpu() directly
>   * the error could be MCEERR_AR or MCEERR_AO
> (2) MCE errors on other threads:
>   * here SIGBUS is blocked, so MCEERR_AR (action-required)
>     errors will cause the kernel to just kill the QEMU process
>   * MCEERR_AO errors will be handled via the iothread's use
>     of signalfd(), so kvm_on_sigbus() will get called from
>     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
>   * in this case the passed in CPUState will (arbitrarily) be that
>     for the first vCPU
>
> For MCEERR_AR, the code here looks correct -- we know this is
> only going to happen for the relevant vCPU so we can go ahead
> and do the "record it in the ACPI table and tell the guest" bit.
>
> But shouldn't we be doing something with the MCEERR_AO too?
> That of course will be trickier because we're not necessarily
> in the vcpu thread, but it would be nice to let the guest
> know about it.

An IRC discussion with Paolo came to the conclusion that
the nicest approach here would be for kvm_on_sigbus() to
use run_on_cpu() to call the whole of kvm_arch_on_sigbus_vcpu()
in the vcpu thread for the cpu it gets passed. Then the code
here would not have to worry about the "not on the right thread"
case. This would be a refactoring of the x86 code, which currently
does the run_on_cpu inside its implementation, in
cpu_x86_inject_mce().

thanks
-- PMM
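The delivery cases discussed in this message can be condensed into a small decision table. What follows is only an illustrative model in plain C, not QEMU code: the enum names and the function sigbus_outcome() are hypothetical, and the outcomes simply restate what the message says happens for each combination of thread and error code.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this illustration. */
enum mce_code {
    MCE_ACTION_REQUIRED,   /* stands in for BUS_MCEERR_AR */
    MCE_ACTION_OPTIONAL    /* stands in for BUS_MCEERR_AO */
};

enum sigbus_outcome {
    /* vcpu thread calls kvm_arch_on_sigbus_vcpu() after leaving KVM_RUN */
    OUTCOME_VCPU_DIRECT_CALL,
    /* SIGBUS is blocked and action is required: the kernel kills QEMU */
    OUTCOME_PROCESS_KILLED,
    /* main thread: signalfd() -> kvm_on_sigbus() with the first vCPU */
    OUTCOME_MAIN_VIA_SIGNALFD
};

/* Model of the two calling contexts described in the message above. */
enum sigbus_outcome sigbus_outcome(bool on_vcpu_thread, enum mce_code code)
{
    if (on_vcpu_thread) {
        /* Either error code can arrive here. */
        return OUTCOME_VCPU_DIRECT_CALL;
    }
    if (code == MCE_ACTION_REQUIRED) {
        return OUTCOME_PROCESS_KILLED;
    }
    return OUTCOME_MAIN_VIA_SIGNALFD;
}
```

Under the run_on_cpu() approach suggested in this message, the OUTCOME_MAIN_VIA_SIGNALFD case would hand the work to a vcpu thread instead of running it on the main thread.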
On 2020/1/17 0:28, Peter Maydell wrote:
> On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:
>
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +    ram_addr_t ram_addr;
>> +    hwaddr paddr;
>> +
>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>> +
>> +    if (acpi_enabled && addr &&
>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>> +        ram_addr = qemu_ram_addr_from_host(addr);
>> +        if (ram_addr != RAM_ADDR_INVALID &&
>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +            kvm_hwpoison_page_add(ram_addr);
>> +            /*
>> +             * Asynchronous signal will be masked by main thread, so
>> +             * only handle synchronous signal.
>> +             */
>
> I don't understand this comment. (I think we've had discussions
> about it before, but it's still not clear to me.)
>
> This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
>
> (1) in the vcpu thread:
>   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
>     for an immediate vcpu exit
>   * the vcpu thread reads the flag on exit from KVM_RUN and
>     calls kvm_arch_on_sigbus_vcpu() directly
>   * the error could be MCEERR_AR or MCEERR_AO

For the vcpu thread, the error can be MCEERR_AR or MCEERR_AO, but the
kernel/KVM usually uses MCEERR_AR (action required) instead of MCEERR_AO,
because it needs action to be taken immediately. For an MCEERR_AO error,
the action is optional and the error can be ignored.
At least, I have not found the Linux kernel/KVM delivering MCEERR_AO to
the vcpu threads.

> (2) MCE errors on other threads:
>   * here SIGBUS is blocked, so MCEERR_AR (action-required)
>     errors will cause the kernel to just kill the QEMU process
>   * MCEERR_AO errors will be handled via the iothread's use
>     of signalfd(), so kvm_on_sigbus() will get called from
>     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
>   * in this case the passed in CPUState will (arbitrarily) be that
>     for the first vCPU

For MCE errors on other threads, QEMU can only handle MCEERR_AO. If it
is MCEERR_AR, QEMU will assert and exit [2].

Case 1: Another app can indeed send MCEERR_AO to QEMU, and QEMU handles
        it via the iothread's use of signalfd() through the above path.
Case 2: But if the MCEERR_AO is delivered by the kernel, I see QEMU
        ignore it because SIGBUS is masked in the main thread [3]. In
        this case, in my test I do not see QEMU handle MCEERR_AO errors
        via signalfd().

For Case 1, I think we should not let the guest know about it, because
it is not triggered by the guest; another app merely sends SIGBUS to
tell QEMU to do something.
For Case 2, it does not call kvm_arch_on_sigbus_vcpu().

[1]:
/* Called synchronously (via signalfd) in main thread. */
int kvm_on_sigbus(int code, void *addr)
{
#ifdef KVM_HAVE_MCE_INJECTION
    /* Action required MCE kills the process if SIGBUS is blocked. Because
     * that's what happens in the I/O thread, where we handle MCE via signalfd,
     * we can only get action optional here.
     */
[2]: assert(code != BUS_MCEERR_AR);
    kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
    return 0;
#else
    return 1;
#endif
}

[3]: https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03575.html

>
> For MCEERR_AR, the code here looks correct -- we know this is
> only going to happen for the relevant vCPU so we can go ahead
> and do the "record it in the ACPI table and tell the guest" bit.
>
> But shouldn't we be doing something with the MCEERR_AO too?

Above all, in my test, an MCEERR_AO error triggered by the guest does
not call kvm_arch_on_sigbus_vcpu(), so I think for now we can just
report the error. If later an MCEERR_AO error does reach
kvm_arch_on_sigbus_vcpu(), we can let the guest know about it.

> That of course will be trickier because we're not necessarily
> in the vcpu thread, but it would be nice to let the guest
> know about it.
>
> One comment that would work with the current code would be:
>
> /*
>  * If this is a BUS_MCEERR_AR, we know we have been called
>  * synchronously from the vCPU thread, so we can easily
>  * synchronize the state and inject an error.
>  *
>  * TODO: we currently don't tell the guest at all about BUS_MCEERR_AO.
>  * In that case we might either be being called synchronously from
>  * the vCPU thread, or a bit later from the main thread, so doing

At least, I have not found the Linux kernel/KVM delivering MCEERR_AO to
the vcpu threads. In the main thread, signalfd() is not called for a
BUS_MCEERR_AO triggered by the guest.

>  * the injection of the error would be more complicated.
>  */
>
> but I don't know if that's what you meant to say/implement...

We can implement the MCEERR_AO logic once QEMU can receive an MCEERR_AO
error triggered by the guest.

>
>> +            if (code == BUS_MCEERR_AR) {
>> +                kvm_cpu_synchronize_state(c);
>> +                if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
>> +                    kvm_inject_arm_sea(c);
>> +                } else {
>> +                    error_report("failed to record the error");
>> +                    abort();
>> +                }
>> +            }
>> +            return;
>> +        }
>> +        if (code == BUS_MCEERR_AO) {
>> +            error_report("Hardware memory error at addr %p for memory used by "
>> +                "QEMU itself instead of guest system!", addr);
>> +        }
>> +    }
>> +
>> +    if (code == BUS_MCEERR_AR) {
>> +        error_report("Hardware memory error!");
>> +        exit(1);
>> +    }
>> +}
>>
>
> thanks
> -- PMM
> .
>
On Fri, 17 Jan 2020 at 10:05, gengdongjiu <gengdongjiu@huawei.com> wrote:
>
> On 2020/1/17 0:28, Peter Maydell wrote:
> > On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:
> >
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +    ram_addr_t ram_addr;
> >> +    hwaddr paddr;
> >> +
> >> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> >> +
> >> +    if (acpi_enabled && addr &&
> >> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +        ram_addr = qemu_ram_addr_from_host(addr);
> >> +        if (ram_addr != RAM_ADDR_INVALID &&
> >> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> >> +            kvm_hwpoison_page_add(ram_addr);
> >> +            /*
> >> +             * Asynchronous signal will be masked by main thread, so
> >> +             * only handle synchronous signal.
> >> +             */
> >
> > I don't understand this comment. (I think we've had discussions
> > about it before, but it's still not clear to me.)
> >
> > This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
> >
> > (1) in the vcpu thread:
> >   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
> >     for an immediate vcpu exit
> >   * the vcpu thread reads the flag on exit from KVM_RUN and
> >     calls kvm_arch_on_sigbus_vcpu() directly
> >   * the error could be MCEERR_AR or MCEERR_AO
>
> For the vcpu thread, the error can be MCEERR_AR or MCEERR_AO, but the
> kernel/KVM usually uses MCEERR_AR (action required) instead of MCEERR_AO,
> because it needs action to be taken immediately. For an MCEERR_AO error,
> the action is optional and the error can be ignored.
> At least, I have not found the Linux kernel/KVM delivering MCEERR_AO to
> the vcpu threads.
>
> > (2) MCE errors on other threads:
> >   * here SIGBUS is blocked, so MCEERR_AR (action-required)
> >     errors will cause the kernel to just kill the QEMU process
> >   * MCEERR_AO errors will be handled via the iothread's use
> >     of signalfd(), so kvm_on_sigbus() will get called from
> >     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
> >   * in this case the passed in CPUState will (arbitrarily) be that
> >     for the first vCPU
>
> For MCE errors on other threads, QEMU can only handle MCEERR_AO. If it
> is MCEERR_AR, QEMU will assert and exit [2].
>
> Case 1: Another app can indeed send MCEERR_AO to QEMU, and QEMU handles
>         it via the iothread's use of signalfd() through the above path.
> Case 2: But if the MCEERR_AO is delivered by the kernel, I see QEMU
>         ignore it because SIGBUS is masked in the main thread [3]. In
>         this case, in my test I do not see QEMU handle MCEERR_AO errors
>         via signalfd().

SIGBUS is blocked in the main thread because we use signalfd().
The function sigfd_handler() should be called and it will then
manually invoke the correct function for the signal.

> For Case 1, I think we should not let the guest know about it, because
> it is not triggered by the guest; another app merely sends SIGBUS to
> tell QEMU to do something.

I don't understand what you mean here by "other app" or
"guest" triggering of MCEERR. I thought that an MCEERR meant
"the hardware has detected that there is a problem with the
RAM". If there's a problem with the RAM and it's the RAM that's
being used as guest RAM, we need to tell the guest, surely?

> For Case 2, it does not call kvm_arch_on_sigbus_vcpu().

It should do. The code you quote calls that function
for that case:

> [1]:
> /* Called synchronously (via signalfd) in main thread. */
> int kvm_on_sigbus(int code, void *addr)
> {
> #ifdef KVM_HAVE_MCE_INJECTION
>     /* Action required MCE kills the process if SIGBUS is blocked. Because
>      * that's what happens in the I/O thread, where we handle MCE via signalfd,
>      * we can only get action optional here.
>      */
> [2]: assert(code != BUS_MCEERR_AR);
>     kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
>     return 0;
> #else
>     return 1;
> #endif
> }

> Above all, in my test, an MCEERR_AO error triggered by the guest does
> not call kvm_arch_on_sigbus_vcpu().

I'm not sure what you mean by "triggered by guest". I assume that
exactly what kind of errors the kernel can report and when will
depend to some extent on the underlying hardware/firmware
implementation of reporting of memory errors, but in principle
the ABI allows the kernel to send SIGBUS_(BUS_MCEERR_AO) to the
main thread, the signal should be handled by signalfd, our code
for working with multiple fds should mean that the main thread
calls sigfd_handler() to deal with reading bytes from the signalfd
fd, and that function should then call sigbus_handler(), which
calls kvm_on_sigbus(), which calls kvm_arch_on_sigbus_vcpu().
If something in that code path is not working then we need to
find out what it is.

thanks
-- PMM
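The first step of the path described in this message (a blocked SIGBUS being picked up through a signalfd rather than by a signal handler) can be tried in isolation. The following is a minimal Linux-only sketch, not QEMU code: the function name read_sigbus_via_signalfd() is hypothetical, and the signal is sent with kill(), so it models a user-sent SIGBUS rather than a kernel-reported memory error.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block SIGBUS, send one to this process, and read it back through a
 * signalfd. Returns the signal number that was read, or -1 on failure.
 */
int read_sigbus_via_signalfd(void)
{
    sigset_t mask, old;
    struct signalfd_siginfo info;
    int fd, result = -1;

    sigemptyset(&mask);
    sigaddset(&mask, SIGBUS);
    /* The signal must be blocked, otherwise its default action would run. */
    if (sigprocmask(SIG_BLOCK, &mask, &old) < 0) {
        return -1;
    }

    fd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (fd >= 0) {
        /* Process-directed SIGBUS, comparable to "kill -s SIGBUS <pid>". */
        if (kill(getpid(), SIGBUS) == 0 &&
            read(fd, &info, sizeof(info)) == (ssize_t)sizeof(info)) {
            result = (int)info.ssi_signo;
        }
        close(fd);
    }

    /* Reading the signalfd consumed the pending signal; restore the mask. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

In a single-threaded test program the function should return SIGBUS. Whether a kernel-delivered BUS_MCEERR_AO reaches QEMU's main thread the same way is exactly the open question in this thread, and this sketch does not settle it.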
On 2020/1/20 20:15, Peter Maydell wrote:
> On Fri, 17 Jan 2020 at 10:05, gengdongjiu <gengdongjiu@huawei.com> wrote:
>>
>> On 2020/1/17 0:28, Peter Maydell wrote:
>>> On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:
>>>
>>>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>>> +{
>>>> +    ram_addr_t ram_addr;
>>>> +    hwaddr paddr;
>>>> +
>>>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>>>> +
>>>> +    if (acpi_enabled && addr &&
>>>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>>>> +        ram_addr = qemu_ram_addr_from_host(addr);
>>>> +        if (ram_addr != RAM_ADDR_INVALID &&
>>>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>>>> +            kvm_hwpoison_page_add(ram_addr);
>>>> +            /*
>>>> +             * Asynchronous signal will be masked by main thread, so
>>>> +             * only handle synchronous signal.
>>>> +             */
>>>
>>> I don't understand this comment. (I think we've had discussions
>>> about it before, but it's still not clear to me.)
>>>
>>> This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
>>>
>>> (1) in the vcpu thread:
>>>   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
>>>     for an immediate vcpu exit
>>>   * the vcpu thread reads the flag on exit from KVM_RUN and
>>>     calls kvm_arch_on_sigbus_vcpu() directly
>>>   * the error could be MCEERR_AR or MCEERR_AO
>>
>> For the vcpu thread, the error can be MCEERR_AR or MCEERR_AO, but the
>> kernel/KVM usually uses MCEERR_AR (action required) instead of MCEERR_AO,
>> because it needs action to be taken immediately. For an MCEERR_AO error,
>> the action is optional and the error can be ignored.
>> At least, I have not found the Linux kernel/KVM delivering MCEERR_AO to
>> the vcpu threads.
>>
>>> (2) MCE errors on other threads:
>>>   * here SIGBUS is blocked, so MCEERR_AR (action-required)
>>>     errors will cause the kernel to just kill the QEMU process
>>>   * MCEERR_AO errors will be handled via the iothread's use
>>>     of signalfd(), so kvm_on_sigbus() will get called from
>>>     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
>>>   * in this case the passed in CPUState will (arbitrarily) be that
>>>     for the first vCPU
>>
>> For MCE errors on other threads, QEMU can only handle MCEERR_AO. If it
>> is MCEERR_AR, QEMU will assert and exit [2].
>>
>> Case 1: Another app can indeed send MCEERR_AO to QEMU, and QEMU handles
>>         it via the iothread's use of signalfd() through the above path.
>> Case 2: But if the MCEERR_AO is delivered by the kernel, I see QEMU
>>         ignore it because SIGBUS is masked in the main thread [3]. In
>>         this case, in my test I do not see QEMU handle MCEERR_AO errors
>>         via signalfd().
>
> SIGBUS is blocked in the main thread because we use signalfd().
> The function sigfd_handler() should be called and it will then
> manually invoke the correct function for the signal.
>
>> For Case 1, I think we should not let the guest know about it, because
>> it is not triggered by the guest; another app merely sends SIGBUS to
>> tell QEMU to do something.
>
> I don't understand what you mean here by "other app" or
> "guest" triggering of MCEERR. I thought that an MCEERR meant
> "the hardware has detected that there is a problem with the
> RAM". If there's a problem with the RAM and it's the RAM that's
> being used as guest RAM, we need to tell the guest, surely?

Sure, if the error is in guest RAM, we need to tell the guest.
I mean that if the affected RAM is being used by QEMU itself (not as
guest RAM), we should not tell the guest.
Or, if another user-space process manually sends SIGBUS to QEMU, for
example with a "kill -s SIGBUS xxx" command, we should not tell the guest.

>
>> For Case 2, it does not call kvm_arch_on_sigbus_vcpu().
>
> It should do. The code you quote calls that function
> for that case:

According to our analysis, I also think it should call the function in
that case. But in my test, I see that kvm_arch_on_sigbus_vcpu() is not
called when KVM/the kernel delivers SIGBUS to the QEMU main thread, so I
am also confused. I have not dug into the reason yet. If anyone has run
this test or knows the reason, comments are welcome.

>
>> [1]:
>> /* Called synchronously (via signalfd) in main thread. */
>> int kvm_on_sigbus(int code, void *addr)
>> {
>> #ifdef KVM_HAVE_MCE_INJECTION
>>     /* Action required MCE kills the process if SIGBUS is blocked. Because
>>      * that's what happens in the I/O thread, where we handle MCE via signalfd,
>>      * we can only get action optional here.
>>      */
>> [2]: assert(code != BUS_MCEERR_AR);
>>     kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
>>     return 0;
>> #else
>>     return 1;
>> #endif
>> }
>
>
>> Above all, in my test, an MCEERR_AO error triggered by the guest does
>> not call kvm_arch_on_sigbus_vcpu().
>
> I'm not sure what you mean by "triggered by guest". I assume that
> exactly what kind of errors the kernel can report and when will
> depend to some extent on the underlying hardware/firmware
> implementation of reporting of memory errors, but in principle
> the ABI allows the kernel to send SIGBUS_(BUS_MCEERR_AO) to the
> main thread, the signal should be handled by signalfd, our code
> for working with multiple fds should mean that the main thread
> calls sigfd_handler() to deal with reading bytes from the signalfd
> fd, and that function should then call sigbus_handler(), which
> calls kvm_on_sigbus(), which calls kvm_arch_on_sigbus_vcpu().
> If something in that code path is not working then we need to
> find out what it is.

I agree with you: we need to check why sigbus_handler() is not called
for the SIGBUS delivered by the kernel/KVM. But I think that can go in
another series; in this series we would only handle
SIGBUS_(BUS_MCEERR_AR). Do you think that is OK?
Of course, I will update the comments you mentioned.

By the way, if the "kill -s SIGBUS xxx" command is used to send SIGBUS
to the QEMU main thread, it is indeed handled by signalfd.

>
> thanks
> -- PMM
> .
>
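The distinction drawn in this message, between a SIGBUS that reports a hardware memory error and one sent by hand with "kill -s SIGBUS", is visible in the si_code field of the signal information: a signal sent with kill carries SI_USER, while memory errors carry BUS_MCEERR_AR or BUS_MCEERR_AO. A minimal sketch of that check follows; the helper name sigbus_is_memory_error() is hypothetical, and the numeric fallbacks are the Linux values, assumed only for C libraries that do not expose the constants.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>

/* Fallbacks (Linux values), assumed for libcs that lack these constants. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/*
 * Hypothetical helper: true only when the SIGBUS si_code says the kernel
 * is reporting a hardware memory error, false for e.g. SI_USER.
 */
bool sigbus_is_memory_error(int si_code)
{
    return si_code == BUS_MCEERR_AR || si_code == BUS_MCEERR_AO;
}
```

A handler could use such a check to decide whether an error should be considered for reporting to the guest at all; whether the affected page is guest RAM or QEMU's own memory is a separate question, which the patch answers with qemu_ram_addr_from_host().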
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 141342d..3b22504 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -379,8 +379,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
 /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
 unsigned long kvm_arch_vcpu_id(CPUState *cpu);
 
-#ifdef TARGET_I386
-#define KVM_HAVE_MCE_INJECTION 1
+#ifdef KVM_HAVE_MCE_INJECTION
 void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
 #endif
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 5f70e9e..723bdb9 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -28,6 +28,10 @@
 /* ARM processors have a weak memory model */
 #define TCG_GUEST_DEFAULT_MO (0)
 
+#ifdef TARGET_AARCH64
+#define KVM_HAVE_MCE_INJECTION 1
+#endif
+
 #define EXCP_UDEF 1 /* undefined instruction */
 #define EXCP_SWI 2 /* software interrupt */
 #define EXCP_PREFETCH_ABORT 3
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 5074b5f..05bffd3 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3045,7 +3045,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
              * Report exception with ESR indicating a fault due to a
              * translation table walk for a cache maintenance instruction.
              */
-            syn = syn_data_abort_no_iss(current_el == target_el,
+            syn = syn_data_abort_no_iss(current_el == target_el, 0,
                                         fi.ea, 1, fi.s1ptw, 1, fsc);
             env->exception.vaddress = value;
             env->exception.fsr = fsr;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index f5313dd..28b8451 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
         | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
 }
 
-static inline uint32_t syn_data_abort_no_iss(int same_el,
+static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
                                              int ea, int cm, int s1ptw,
                                              int wnr, int fsc)
 {
     return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
            | ARM_EL_IL
-           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
+           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
+           | (wnr << 6) | fsc;
 }
 
 static inline uint32_t syn_data_abort_with_iss(int same_el,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 876184b..f3b05c1 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -28,6 +28,8 @@
 #include "kvm_arm.h"
 #include "hw/boards.h"
 #include "internals.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/ghes.h"
 
 static bool have_guest_debug;
 
@@ -843,6 +845,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
     return KVM_PUT_RUNTIME_STATE;
 }
 
+/* Callers must hold the iothread mutex lock */
+static void kvm_inject_arm_sea(CPUState *c)
+{
+    ARMCPU *cpu = ARM_CPU(c);
+    CPUARMState *env = &cpu->env;
+    CPUClass *cc = CPU_GET_CLASS(c);
+    uint32_t esr;
+    bool same_el;
+
+    c->exception_index = EXCP_DATA_ABORT;
+    env->exception.target_el = 1;
+
+    /*
+     * Set the DFSC to synchronous external abort and set FnV to not valid,
+     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
+     */
+    same_el = arm_current_el(env) == env->exception.target_el;
+    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
+
+    env->exception.syndrome = esr;
+
+    cc->do_interrupt(c);
+}
+
 #define AARCH64_CORE_REG(x) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
                  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
@@ -1295,6 +1321,46 @@ int kvm_arch_get_registers(CPUState *cs)
     return ret;
 }
 
+void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
+{
+    ram_addr_t ram_addr;
+    hwaddr paddr;
+
+    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
+
+    if (acpi_enabled && addr &&
+            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
+        ram_addr = qemu_ram_addr_from_host(addr);
+        if (ram_addr != RAM_ADDR_INVALID &&
+            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
+            kvm_hwpoison_page_add(ram_addr);
+            /*
+             * Asynchronous signal will be masked by main thread, so
+             * only handle synchronous signal.
+             */
+            if (code == BUS_MCEERR_AR) {
+                kvm_cpu_synchronize_state(c);
+                if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
+                    kvm_inject_arm_sea(c);
+                } else {
+                    error_report("failed to record the error");
+                    abort();
+                }
+            }
+            return;
+        }
+        if (code == BUS_MCEERR_AO) {
+            error_report("Hardware memory error at addr %p for memory used by "
+                "QEMU itself instead of guest system!", addr);
+        }
+    }
+
+    if (code == BUS_MCEERR_AR) {
+        error_report("Hardware memory error!");
+        exit(1);
+    }
+}
+
 /* C6.6.29 BRK instruction */
 static const uint32_t brk_insn = 0xd4200000;
 
diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
index 5feb312..499672e 100644
--- a/target/arm/tlb_helper.c
+++ b/target/arm/tlb_helper.c
@@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
      * ISV field.
      */
     if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
-        syn = syn_data_abort_no_iss(same_el,
+        syn = syn_data_abort_no_iss(same_el, 0,
                                     ea, 0, s1ptw, is_write, fsc);
     } else {
         /*
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index af28293..1a0dbc1 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -29,6 +29,8 @@
 /* The x86 has a strong memory model with some store-after-load re-ordering */
 #define TCG_GUEST_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
+#define KVM_HAVE_MCE_INJECTION 1
+
 /* Maximum instruction code size */
 #define TARGET_MAX_INSN_SIZE 16
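To see what syndrome value kvm_inject_arm_sea() ends up with, the bit packing of syn_data_abort_no_iss() from the patch can be reproduced standalone. This is a sketch under stated assumptions: the three constants below (exception class 0x24 for a data abort, EC field at bit 26, IL at bit 25) are assumed to match QEMU's definitions and should be checked against the tree; unsigned types are used here instead of the patch's int parameters to keep the shifts well defined.

```c
#include <stdint.h>

/* Assumed values; verify against QEMU's ARM syndrome definitions. */
#define ARM_EL_EC_SHIFT 26
#define ARM_EL_IL       (UINT32_C(1) << 25)
#define EC_DATAABORT    UINT32_C(0x24)

/* Same field layout as the patched helper, with the new FnV bit at 10. */
uint32_t syn_data_abort_no_iss(uint32_t same_el, uint32_t fnv,
                               uint32_t ea, uint32_t cm, uint32_t s1ptw,
                               uint32_t wnr, uint32_t fsc)
{
    return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
           | ARM_EL_IL
           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
           | (wnr << 6) | fsc;
}
```

With the arguments used in the patch (fnv = 1, fsc = 0x10, everything else 0), this yields 0x92000410 when the abort is taken from a lower exception level and 0x96000410 when same_el is 1, under the constants assumed above.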