[v22,8/9] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

Message ID 1578483143-14905-9-git-send-email-gengdongjiu@huawei.com (mailing list archive)
State New, archived
Series Add ARMv8 RAS virtualization support in QEMU

Commit Message

Dongjiu Geng Jan. 8, 2020, 11:32 a.m. UTC
Add a SIGBUS signal handler. The handler checks the SIGBUS type,
translates the host VA delivered by the host into a guest PA, records
this PA in the guest APEI GHES memory, and then notifies the guest
according to the SIGBUS type.
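
The host-VA-to-guest-PA step can be pictured as a lookup over memory
slots. The following is only a minimal sketch under stated assumptions:
struct mem_slot and guest_pa_from_host_va() are hypothetical names
invented for illustration and are not QEMU's data structures; in the
patch the translation is done by kvm_physical_memory_addr_from_host().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one guest memory slot. */
struct mem_slot {
    uintptr_t host_base;   /* start of the mapping in the host process */
    uint64_t  guest_base;  /* guest physical address backed by host_base */
    uint64_t  size;        /* length of the slot in bytes */
};

/*
 * Translate a host virtual address into a guest physical address.
 * Returns true and stores the result in *guest_pa if host_va falls
 * inside one of the slots, false otherwise (e.g. memory that belongs
 * to the emulator itself rather than to the guest).
 */
bool guest_pa_from_host_va(const struct mem_slot *slots, size_t nslots,
                           const void *host_va, uint64_t *guest_pa)
{
    uintptr_t va = (uintptr_t)host_va;

    assert(slots != NULL || nslots == 0);
    assert(guest_pa != NULL);

    for (size_t i = 0; i < nslots; i++) {
        const struct mem_slot *s = &slots[i];

        /* The subtraction is only evaluated when va >= host_base. */
        if (va >= s->host_base && va - s->host_base < s->size) {
            *guest_pa = s->guest_base + (va - s->host_base);
            return true;
        }
    }
    return false;
}
```

A "not found" result corresponds to the case in the patch where the
faulting address is not guest RAM, so nothing is recorded for the guest.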

When the guest accesses poisoned memory, it generates a Synchronous
External Abort (SEA). The host kernel then gets an APEI notification and
calls memory_failure() to unmap the affected page in stage 2, and finally
returns to the guest.

If the guest continues to access the PG_hwpoison page, the access traps
to KVM as a stage 2 fault, and a synchronous SIGBUS_MCEERR_AR signal is
delivered to QEMU. QEMU records the error address in the guest APEI GHES
memory and notifies the guest using a Synchronous External Abort (SEA).

In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function,
in which we set up the exception type and the syndrome information.
When switching to the guest, the target vcpu jumps to the synchronous
external abort vector table entry.

ESR_ELx.DFSC is set to synchronous external abort (0x10), and
ESR_ELx.FnV is set to not valid (0x1), which tells the guest that FAR is
not valid and holds an UNKNOWN value. These values are written to the KVM
register structures through the KVM_SET_ONE_REG ioctl.
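
As a rough illustration of the syndrome value described above, the
sketch below assembles the same fields that syn_data_abort_no_iss() sets
in this patch (EC, IL, FnV at bit 10, DFSC in the low bits). The helper
names sea_syndrome(), esr_fnv() and esr_dfsc() are hypothetical and exist
only for this example; the constants mirror the shifts visible in the
diff and the architectural EC encoding for data aborts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EC_DATAABORT             0x24u       /* data abort, lower EL */
#define ARM_EL_EC_SHIFT          26
#define ARM_EL_IL                (1u << 25)  /* 32-bit instruction length */
#define ESR_FNV                  (1u << 10)  /* FAR not valid */
#define DFSC_SYNC_EXTERNAL_ABORT 0x10u       /* synchronous external abort */

/* Build the ESR value for an injected SEA with FnV set (illustrative). */
uint32_t sea_syndrome(bool same_el)
{
    /* EC 0x25 is the "same exception level" variant of EC 0x24. */
    uint32_t ec = EC_DATAABORT | (same_el ? 1u : 0u);

    /* DFSC occupies bits [5:0]; the constant must fit in that field. */
    assert((DFSC_SYNC_EXTERNAL_ABORT & ~0x3fu) == 0);

    return (ec << ARM_EL_EC_SHIFT) | ARM_EL_IL | ESR_FNV
           | DFSC_SYNC_EXTERNAL_ABORT;
}

/* Field accessors a guest-side reader of the syndrome might use. */
bool esr_fnv(uint32_t esr)
{
    return (esr & ESR_FNV) != 0;
}

uint32_t esr_dfsc(uint32_t esr)
{
    return esr & 0x3fu;
}
```

With FnV set, a guest handler following this encoding would be expected
to ignore FAR_ELx and rely on the address recorded in the GHES instead.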

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Xiang Zheng <zhengxiang9@huawei.com>
---
 include/sysemu/kvm.h    |  3 +--
 target/arm/cpu.h        |  4 +++
 target/arm/helper.c     |  2 +-
 target/arm/internals.h  |  5 ++--
 target/arm/kvm64.c      | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/tlb_helper.c |  2 +-
 target/i386/cpu.h       |  2 ++
 7 files changed, 78 insertions(+), 6 deletions(-)

Comments

Peter Maydell Jan. 16, 2020, 4:28 p.m. UTC | #1
On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:

> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> +{
> +    ram_addr_t ram_addr;
> +    hwaddr paddr;
> +
> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> +
> +    if (acpi_enabled && addr &&
> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> +        ram_addr = qemu_ram_addr_from_host(addr);
> +        if (ram_addr != RAM_ADDR_INVALID &&
> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> +            kvm_hwpoison_page_add(ram_addr);
> +            /*
> +             * Asynchronous signal will be masked by main thread, so
> +             * only handle synchronous signal.
> +             */

I don't understand this comment. (I think we've had discussions
about it before, but it's still not clear to me.)

This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:

(1) in the vcpu thread:
  * the real SIGBUS handler sigbus_handler() sets a flag and arranges
    for an immediate vcpu exit
  * the vcpu thread reads the flag on exit from KVM_RUN and
    calls kvm_arch_on_sigbus_vcpu() directly
  * the error could be MCEERR_AR or MCEERR_AO
(2) MCE errors on other threads:
  * here SIGBUS is blocked, so MCEERR_AR (action-required)
    errors will cause the kernel to just kill the QEMU process
  * MCEERR_AO errors will be handled via the iothread's use
    of signalfd(), so kvm_on_sigbus() will get called from
    the main thread, and it will call kvm_arch_on_sigbus_vcpu()
  * in this case the passed in CPUState will (arbitrarily) be that
    for the first vCPU

For MCEERR_AR, the code here looks correct -- we know this is
only going to happen for the relevant vCPU so we can go ahead
and do the "record it in the ACPI table and tell the guest" bit.

But shouldn't we be doing something with the MCEERR_AO too?
That of course will be trickier because we're not necessarily
in the vcpu thread, but it would be nice to let the guest
know about it.

One comment that would work with the current code would be:

/*
 * If this is a BUS_MCEERR_AR, we know we have been called
 * synchronously from the vCPU thread, so we can easily
 * synchronize the state and inject an error.
 *
 * TODO: we currently don't tell the guest at all about BUS_MCEERR_AO.
 * In that case we might either be being called synchronously from
 * the vCPU thread, or a bit later from the main thread, so doing
 * the injection of the error would be more complicated.
 */

but I don't know if that's what you meant to say/implement...
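
To make the behaviour under discussion easier to compare against the
proposed comment, here is a minimal model of the decisions the posted
kvm_arch_on_sigbus_vcpu() takes. It is a sketch, not QEMU code: the
enums and classify_sigbus() are hypothetical names, ras_enabled stands
in for the combined "acpi_enabled && addr && ras property" test, and the
abort() on a failed GHES record is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BUS_MCEERR_AR / BUS_MCEERR_AO. */
enum mce_kind {
    MCE_ACTION_REQUIRED,
    MCE_ACTION_OPTIONAL,
};

/* What the handler in the patch ends up doing (illustrative). */
enum sigbus_action {
    ACT_INJECT_SEA,       /* poison page, record in GHES, inject SEA */
    ACT_POISON_ONLY,      /* poison page, guest is not told */
    ACT_LOG_QEMU_MEMORY,  /* report an error in QEMU's own memory */
    ACT_IGNORE,           /* nothing is done */
    ACT_EXIT,             /* report the error and exit(1) */
};

enum sigbus_action classify_sigbus(enum mce_kind kind, bool ras_enabled,
                                   bool is_guest_ram)
{
    assert(kind == MCE_ACTION_REQUIRED || kind == MCE_ACTION_OPTIONAL);

    if (ras_enabled) {
        if (is_guest_ram) {
            /* Only the action-required case notifies the guest. */
            return kind == MCE_ACTION_REQUIRED ? ACT_INJECT_SEA
                                               : ACT_POISON_ONLY;
        }
        if (kind == MCE_ACTION_OPTIONAL) {
            return ACT_LOG_QEMU_MEMORY;
        }
    }
    /* Falls through here for non-guest memory or when RAS is off. */
    return kind == MCE_ACTION_REQUIRED ? ACT_EXIT : ACT_IGNORE;
}
```

The ACT_POISON_ONLY row is the case the TODO in the suggested comment
refers to: the page is remembered as poisoned, but no error is reported
to the guest.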

> +            if (code == BUS_MCEERR_AR) {
> +                kvm_cpu_synchronize_state(c);
> +                if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
> +                    kvm_inject_arm_sea(c);
> +                } else {
> +                    error_report("failed to record the error");
> +                    abort();
> +                }
> +            }
> +            return;
> +        }
> +        if (code == BUS_MCEERR_AO) {
> +            error_report("Hardware memory error at addr %p for memory used by "
> +                "QEMU itself instead of guest system!", addr);
> +        }
> +    }
> +
> +    if (code == BUS_MCEERR_AR) {
> +        error_report("Hardware memory error!");
> +        exit(1);
> +    }
> +}
>

thanks
-- PMM
Peter Maydell Jan. 16, 2020, 4:40 p.m. UTC | #2
On Thu, 16 Jan 2020 at 16:28, Peter Maydell <peter.maydell@linaro.org> wrote:
> This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
>
> (1) in the vcpu thread:
>   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
>     for an immediate vcpu exit
>   * the vcpu thread reads the flag on exit from KVM_RUN and
>     calls kvm_arch_on_sigbus_vcpu() directly
>   * the error could be MCEERR_AR or MCEERR_AO
> (2) MCE errors on other threads:
>   * here SIGBUS is blocked, so MCEERR_AR (action-required)
>     errors will cause the kernel to just kill the QEMU process
>   * MCEERR_AO errors will be handled via the iothread's use
>     of signalfd(), so kvm_on_sigbus() will get called from
>     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
>   * in this case the passed in CPUState will (arbitrarily) be that
>     for the first vCPU
>
> For MCEERR_AR, the code here looks correct -- we know this is
> only going to happen for the relevant vCPU so we can go ahead
> and do the "record it in the ACPI table and tell the guest" bit.
>
> But shouldn't we be doing something with the MCEERR_AO too?
> That of course will be trickier because we're not necessarily
> in the vcpu thread, but it would be nice to let the guest
> know about it.

An IRC discussion with Paolo came to the conclusion that
the nicest approach here would be for kvm_on_sigbus() to
use run_on_cpu() to call the whole of kvm_arch_on_sigbus_vcpu()
in the vcpu thread for the cpu it gets passed. Then the code
here would not have to worry about the "not on the right thread"
case. This would be a refactoring of the x86 code, which currently
does the run_on_cpu inside its implementation, in
cpu_x86_inject_mce().

thanks
-- PMM
Dongjiu Geng Jan. 17, 2020, 10:04 a.m. UTC | #3
On 2020/1/17 0:28, Peter Maydell wrote:
> On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:
> 
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +    ram_addr_t ram_addr;
>> +    hwaddr paddr;
>> +
>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>> +
>> +    if (acpi_enabled && addr &&
>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>> +        ram_addr = qemu_ram_addr_from_host(addr);
>> +        if (ram_addr != RAM_ADDR_INVALID &&
>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +            kvm_hwpoison_page_add(ram_addr);
>> +            /*
>> +             * Asynchronous signal will be masked by main thread, so
>> +             * only handle synchronous signal.
>> +             */
> 
> I don't understand this comment. (I think we've had discussions
> about it before, but it's still not clear to me.)
> 
> This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
> 
> (1) in the vcpu thread:
>   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
>     for an immediate vcpu exit
>   * the vcpu thread reads the flag on exit from KVM_RUN and
>     calls kvm_arch_on_sigbus_vcpu() directly
>   * the error could be MCEERR_AR or MCEERR_AO

For the vcpu thread, the error can be MCEERR_AR or MCEERR_AO, but the kernel/KVM usually uses MCEERR_AR (action required) instead of MCEERR_AO, because it needs to act immediately. For an MCEERR_AO error, the action is optional and the error can be ignored.
At least, I have not found the Linux kernel/KVM delivering MCEERR_AO in the vcpu threads.

> (2) MCE errors on other threads:
>   * here SIGBUS is blocked, so MCEERR_AR (action-required)
>     errors will cause the kernel to just kill the QEMU process
>   * MCEERR_AO errors will be handled via the iothread's use
>     of signalfd(), so kvm_on_sigbus() will get called from
>     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
>   * in this case the passed in CPUState will (arbitrarily) be that
>     for the first vCPU

For MCE errors on other threads, QEMU can only handle MCEERR_AO. If it is MCEERR_AR, QEMU will assert and exit[2].

Case 1: Another application can indeed send MCEERR_AO to QEMU, and QEMU handles it via the iothread's use of signalfd() through the above path.
Case 2: But if the MCEERR_AO is delivered by the kernel, I see QEMU ignore it because SIGBUS is masked in the main thread[3]. In my test, I do not see QEMU handle MCEERR_AO errors via signalfd() in this case.

For Case 1, I think we should not let the guest know about it, because it is not triggered by the guest; it is only another application sending SIGBUS to tell QEMU to do something.
For Case 2, it does not call kvm_arch_on_sigbus_vcpu().


[1]:
/* Called synchronously (via signalfd) in main thread.  */
int kvm_on_sigbus(int code, void *addr)
{
#ifdef KVM_HAVE_MCE_INJECTION
    /* Action required MCE kills the process if SIGBUS is blocked.  Because
     * that's what happens in the I/O thread, where we handle MCE via signalfd,
     * we can only get action optional here.
     */
[2]: assert(code != BUS_MCEERR_AR);
    kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
    return 0;
#else
    return 1;
#endif
}

[3]: https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03575.html


> 
> For MCEERR_AR, the code here looks correct -- we know this is
> only going to happen for the relevant vCPU so we can go ahead
> and do the "record it in the ACPI table and tell the guest" bit.
> 
> But shouldn't we be doing something with the MCEERR_AO too?

Above all, in my test, an MCEERR_AO error triggered by the guest does not call kvm_arch_on_sigbus_vcpu(),
so I think for now we can just report the error. If an MCEERR_AO error can later reach kvm_arch_on_sigbus_vcpu(), we can let the guest know about it.

> That of course will be trickier because we're not necessarily
> in the vcpu thread, but it would be nice to let the guest
> know about it.
> 
> One comment that would work with the current code would be:
> 
> /*
>  * If this is a BUS_MCEERR_AR, we know we have been called
>  * synchronously from the vCPU thread, so we can easily
>  * synchronize the state and inject an error.
>  *
>  * TODO: we currently don't tell the guest at all about BUS_MCEERR_AO.
>  * In that case we might either be being called synchronously from
>  * the vCPU thread, or a bit later from the main thread, so doing
At least, I have not found the Linux kernel/KVM delivering MCEERR_AO in the vcpu threads.
In the main thread, signalfd() is not called when the error is a BUS_MCEERR_AO triggered by the guest.

>  * the injection of the error would be more complicated.
>  */
> 
> but I don't know if that's what you meant to say/implement...
We can implement the MCEERR_AO logic once QEMU can receive an MCEERR_AO error triggered by the guest.

> 
>> +            if (code == BUS_MCEERR_AR) {
>> +                kvm_cpu_synchronize_state(c);
>> +                if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
>> +                    kvm_inject_arm_sea(c);
>> +                } else {
>> +                    error_report("failed to record the error");
>> +                    abort();
>> +                }
>> +            }
>> +            return;
>> +        }
>> +        if (code == BUS_MCEERR_AO) {
>> +            error_report("Hardware memory error at addr %p for memory used by "
>> +                "QEMU itself instead of guest system!", addr);
>> +        }
>> +    }
>> +
>> +    if (code == BUS_MCEERR_AR) {
>> +        error_report("Hardware memory error!");
>> +        exit(1);
>> +    }
>> +}
>>
> 
> thanks
> -- PMM
> .
>
Peter Maydell Jan. 20, 2020, 12:15 p.m. UTC | #4
On Fri, 17 Jan 2020 at 10:05, gengdongjiu <gengdongjiu@huawei.com> wrote:
>
> On 2020/1/17 0:28, Peter Maydell wrote:
> > On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:
> >
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +    ram_addr_t ram_addr;
> >> +    hwaddr paddr;
> >> +
> >> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> >> +
> >> +    if (acpi_enabled && addr &&
> >> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +        ram_addr = qemu_ram_addr_from_host(addr);
> >> +        if (ram_addr != RAM_ADDR_INVALID &&
> >> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> >> +            kvm_hwpoison_page_add(ram_addr);
> >> +            /*
> >> +             * Asynchronous signal will be masked by main thread, so
> >> +             * only handle synchronous signal.
> >> +             */
> >
> > I don't understand this comment. (I think we've had discussions
> > about it before, but it's still not clear to me.)
> >
> > This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
> >
> > (1) in the vcpu thread:
> >   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
> >     for an immediate vcpu exit
> >   * the vcpu thread reads the flag on exit from KVM_RUN and
> >     calls kvm_arch_on_sigbus_vcpu() directly
> > >   * the error could be MCEERR_AR or MCEERR_AO
>
> For the vcpu thread, the error can be MCEERR_AR or MCEERR_AO,
> but kernel/KVM usually uses MCEERR_AR(action required) instead of MCEERR_AO, because it needs do action immediately. For MCEERR_AO error, the action is optional and the error can be ignored.
> At least I do not find Linux kernel/KVM deliver MCEERR_AO in the vcpu threads.
>
> > (2) MCE errors on other threads:
> >   * here SIGBUS is blocked, so MCEERR_AR (action-required)
> >     errors will cause the kernel to just kill the QEMU process
> >   * MCEERR_AO errors will be handled via the iothread's use
> >     of signalfd(), so kvm_on_sigbus() will get called from
> >     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
> >   * in this case the passed in CPUState will (arbitrarily) be that
> >     for the first vCPU
>
> For the MCE errors on other threads, it can only handle MCEERR_AO. If it is MCEERR_AR, the QEMU will assert and exit[2].
>
> Case1: Other APP indeed can send MCEERR_AO to QEMU, QEMU handle it via the iothread's use of signalfd() through above path.
> Case2: But if the MCEERR_AO is delivered by kernel, I see QEMU ignore it because SIGBUS is masked in main thread[3], for this case, I do not see QEMU handle it via signalfd() for MCEERR_AO errors from my test.

SIGBUS is blocked in the main thread because we use signalfd().
The function sigfd_handler() should be called and it will then
manually invoke the correct function for the signal.
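
The blocked-signal-plus-signalfd() mechanism described here can be tried
in isolation. The sketch below is Linux-specific and uses a hypothetical
helper, take_signal_via_signalfd(); it blocks a signal, sends it to the
current process with kill(), and reads it back from a signalfd. Note the
assumption: a user-sent signal carries si_code SI_USER, so this only
exercises the delivery path, not a real BUS_MCEERR_AO from the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block signo, queue one instance of it for this process, and consume
 * it through a signalfd instead of a signal handler.
 * Returns 0 and fills *info on success, -1 on failure.
 */
int take_signal_via_signalfd(int signo, struct signalfd_siginfo *info)
{
    sigset_t mask, old;
    ssize_t n;
    int fd;

    assert(info != NULL);

    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* The signal must be blocked, or default delivery would win. */
    if (sigprocmask(SIG_BLOCK, &mask, &old) < 0) {
        return -1;
    }

    fd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (fd < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* Same effect as "kill -s <signal> <pid>" from a shell. */
    if (kill(getpid(), signo) < 0) {
        close(fd);
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* Reading the fd dequeues the pending signal. */
    n = read(fd, info, sizeof(*info));
    close(fd);
    if (n != (ssize_t)sizeof(*info)) {
        return -1; /* leave the signal blocked: it may still be pending */
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return 0;
}
```

Calling take_signal_via_signalfd(SIGBUS, &info) should leave
info.ssi_signo equal to SIGBUS; a dispatcher in the style of
sigfd_handler() would then use that field to pick the function to call.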

> For Case1,I think we should not let guest know it, because it is not triggered by guest. only other APP send SIGBUS to tell QEMU do somethings.

I don't understand what you mean here by "other app" or
"guest" triggering of MCEERR. I thought that an MCEERR meant
"the hardware has detected that there is a problem with the
RAM". If there's a problem with the RAM and it's the RAM that's
being used as guest RAM, we need to tell the guest, surely ?

> For Case2,it does not call call kvm_arch_on_sigbus_vcpu().

It should do. The code you quote calls that function
for that case:

> [1]:
> /* Called synchronously (via signalfd) in main thread.  */
> int kvm_on_sigbus(int code, void *addr)
> {
> #ifdef KVM_HAVE_MCE_INJECTION
>     /* Action required MCE kills the process if SIGBUS is blocked.  Because
>      * that's what happens in the I/O thread, where we handle MCE via signalfd,
>      * we can only get action optional here.
>      */
> [2]: assert(code != BUS_MCEERR_AR);
>     kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
>     return 0;
> #else
>     return 1;
> #endif
> }


> Above all, from my test, for MCEERR_AO error which is triggered by guest, it not call
> kvm_arch_on_sigbus_vcpu().

I'm not sure what you mean by "triggered by guest". I assume that
exactly what kind of errors the kernel can report and when will
depend to some extent on the underlying hardware/firmware
implementation of reporting of memory errors, but in principle
the ABI allows the kernel to send SIGBUS_(BUS_MCEERR_AO) to the
main thread, the signal should be handled by signalfd, our code
for working with multiple fds should mean that the main thread
calls sigfd_handler() to deal with reading bytes from the signalfd
fd, and that function should then call sigbus_handler(), which
calls kvm_on_sigbus(), which calls kvm_arch_on_sigbus_vcpu().
If something in that code path is not working then we need to
find out what it is.

thanks
-- PMM
Dongjiu Geng Jan. 22, 2020, 3:30 p.m. UTC | #5
On 2020/1/20 20:15, Peter Maydell wrote:
> On Fri, 17 Jan 2020 at 10:05, gengdongjiu <gengdongjiu@huawei.com> wrote:
>>
>> On 2020/1/17 0:28, Peter Maydell wrote:
>>> On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:
>>>
>>>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>>> +{
>>>> +    ram_addr_t ram_addr;
>>>> +    hwaddr paddr;
>>>> +
>>>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>>>> +
>>>> +    if (acpi_enabled && addr &&
>>>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>>>> +        ram_addr = qemu_ram_addr_from_host(addr);
>>>> +        if (ram_addr != RAM_ADDR_INVALID &&
>>>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>>>> +            kvm_hwpoison_page_add(ram_addr);
>>>> +            /*
>>>> +             * Asynchronous signal will be masked by main thread, so
>>>> +             * only handle synchronous signal.
>>>> +             */
>>>
>>> I don't understand this comment. (I think we've had discussions
>>> about it before, but it's still not clear to me.)
>>>
>>> This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
>>>
>>> (1) in the vcpu thread:
>>>   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
>>>     for an immediate vcpu exit
>>>   * the vcpu thread reads the flag on exit from KVM_RUN and
>>>     calls kvm_arch_on_sigbus_vcpu() directly
>>>   * the error could be MCEERR_AR or MCEERR_AO
>>
>> For the vcpu thread, the error can be MCEERR_AR or MCEERR_AO,
>> but kernel/KVM usually uses MCEERR_AR(action required) instead of MCEERR_AO, because it needs do action immediately. For MCEERR_AO error, the action is optional and the error can be ignored.
>> At least I do not find Linux kernel/KVM deliver MCEERR_AO in the vcpu threads.
>>
>>> (2) MCE errors on other threads:
>>>   * here SIGBUS is blocked, so MCEERR_AR (action-required)
>>>     errors will cause the kernel to just kill the QEMU process
>>>   * MCEERR_AO errors will be handled via the iothread's use
>>>     of signalfd(), so kvm_on_sigbus() will get called from
>>>     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
>>>   * in this case the passed in CPUState will (arbitrarily) be that
>>>     for the first vCPU
>>
>> For the MCE errors on other threads, it can only handle MCEERR_AO. If it is MCEERR_AR, the QEMU will assert and exit[2].
>>
>> Case1: Other APP indeed can send MCEERR_AO to QEMU, QEMU handle it via the iothread's use of signalfd() through above path.
>> Case2: But if the MCEERR_AO is delivered by kernel, I see QEMU ignore it because SIGBUS is masked in main thread[3], for this case, I do not see QEMU handle it via signalfd() for MCEERR_AO errors from my test.
> 
> SIGBUS is blocked in the main thread because we use signalfd().
> The function sigfd_handler() should be called and it will then
> manually invoke the correct function for the signal.
> 
>> For Case1,I think we should not let guest know it, because it is not triggered by guest. only other APP send SIGBUS to tell QEMU do somethings.
> 
> I don't understand what you mean here by "other app" or
> "guest" triggering of MCEERR. I thought that an MCEERR meant
> "the hardware has detected that there is a problem with the
> RAM". If there's a problem with the RAM and it's the RAM that's
> being used as guest RAM, we need to tell the guest, surely ?

  Sure, if the error is in guest RAM, we need to tell the guest.
  I mean that if the RAM is being used as QEMU RAM (not guest RAM), we should not tell the guest.
  Or if another user-space program manually sends SIGBUS to QEMU, such as with a "kill -s SIGBUS xxx" command, we should not tell the guest.

> 
>> For Case2,it does not call call kvm_arch_on_sigbus_vcpu().
> 
> It should do. The code you quote calls that function
> for that case:
  According to our analysis, I also think it should call the function in that case.
  But in my test, I see that kvm_arch_on_sigbus_vcpu() is not called when KVM/the kernel delivers SIGBUS to the QEMU main thread.
  So I am also confused. I have not dug into the reason yet.

  If anyone has done this test or knows the reason, comments are welcome.


> 
>> [1]:
>> /* Called synchronously (via signalfd) in main thread.  */
>> int kvm_on_sigbus(int code, void *addr)
>> {
>> #ifdef KVM_HAVE_MCE_INJECTION
>>     /* Action required MCE kills the process if SIGBUS is blocked.  Because
>>      * that's what happens in the I/O thread, where we handle MCE via signalfd,
>>      * we can only get action optional here.
>>      */
>> [2]: assert(code != BUS_MCEERR_AR);
>>     kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
>>     return 0;
>> #else
>>     return 1;
>> #endif
>> }
> 
> 
>> Above all, from my test, for MCEERR_AO error which is triggered by guest, it not call
>> kvm_arch_on_sigbus_vcpu().
> 
> I'm not sure what you mean by "triggered by guest". I assume that
> exactly what kind of errors the kernel can report and when will
> depend to some extent on the underlying hardware/firmware
> implementation of reporting of memory errors, but in principle
> the ABI allows the kernel to send SIGBUS_(BUS_MCEERR_AO) to the
> main thread, the signal should be handled by signalfd, our code
> for working with multiple fds should mean that the main thread
> calls sigfd_handler() to deal with reading bytes from the signalfd
> fd, and that function should then call sigbus_handler(), which
> calls kvm_on_sigbus(), which calls kvm_arch_on_sigbus_vcpu().
> If something in that code path is not working then we need to
> find out what it is.

  I agree with you, we need to check why sigbus_handler() is not called for the SIGBUS delivered by the kernel/KVM.
  But I think that can go in another series; in this series we only handle SIGBUS_(BUS_MCEERR_AR). Do you think that is OK?
  Of course I will update the comments that you mentioned.

  By the way, if the "kill -s SIGBUS xxx" command is used to send SIGBUS to the QEMU main thread, it is indeed handled by signalfd.

> 
> thanks
> -- PMM
> .
>
Patch

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 141342d..3b22504 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -379,8 +379,7 @@  bool kvm_vcpu_id_is_valid(int vcpu_id);
 /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
 unsigned long kvm_arch_vcpu_id(CPUState *cpu);
 
-#ifdef TARGET_I386
-#define KVM_HAVE_MCE_INJECTION 1
+#ifdef KVM_HAVE_MCE_INJECTION
 void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
 #endif
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 5f70e9e..723bdb9 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -28,6 +28,10 @@ 
 /* ARM processors have a weak memory model */
 #define TCG_GUEST_DEFAULT_MO      (0)
 
+#ifdef TARGET_AARCH64
+#define KVM_HAVE_MCE_INJECTION 1
+#endif
+
 #define EXCP_UDEF            1   /* undefined instruction */
 #define EXCP_SWI             2   /* software interrupt */
 #define EXCP_PREFETCH_ABORT  3
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 5074b5f..05bffd3 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3045,7 +3045,7 @@  static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
              * Report exception with ESR indicating a fault due to a
              * translation table walk for a cache maintenance instruction.
              */
-            syn = syn_data_abort_no_iss(current_el == target_el,
+            syn = syn_data_abort_no_iss(current_el == target_el, 0,
                                         fi.ea, 1, fi.s1ptw, 1, fsc);
             env->exception.vaddress = value;
             env->exception.fsr = fsr;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index f5313dd..28b8451 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -451,13 +451,14 @@  static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
         | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
 }
 
-static inline uint32_t syn_data_abort_no_iss(int same_el,
+static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
                                              int ea, int cm, int s1ptw,
                                              int wnr, int fsc)
 {
     return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
            | ARM_EL_IL
-           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
+           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
+           | (wnr << 6) | fsc;
 }
 
 static inline uint32_t syn_data_abort_with_iss(int same_el,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 876184b..f3b05c1 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -28,6 +28,8 @@ 
 #include "kvm_arm.h"
 #include "hw/boards.h"
 #include "internals.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/ghes.h"
 
 static bool have_guest_debug;
 
@@ -843,6 +845,30 @@  int kvm_arm_cpreg_level(uint64_t regidx)
     return KVM_PUT_RUNTIME_STATE;
 }
 
+/* Callers must hold the iothread mutex lock */
+static void kvm_inject_arm_sea(CPUState *c)
+{
+    ARMCPU *cpu = ARM_CPU(c);
+    CPUARMState *env = &cpu->env;
+    CPUClass *cc = CPU_GET_CLASS(c);
+    uint32_t esr;
+    bool same_el;
+
+    c->exception_index = EXCP_DATA_ABORT;
+    env->exception.target_el = 1;
+
+    /*
+     * Set the DFSC to synchronous external abort and set FnV to not valid,
+     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
+     */
+    same_el = arm_current_el(env) == env->exception.target_el;
+    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
+
+    env->exception.syndrome = esr;
+
+    cc->do_interrupt(c);
+}
+
 #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
                  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
@@ -1295,6 +1321,46 @@  int kvm_arch_get_registers(CPUState *cs)
     return ret;
 }
 
+void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
+{
+    ram_addr_t ram_addr;
+    hwaddr paddr;
+
+    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
+
+    if (acpi_enabled && addr &&
+            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
+        ram_addr = qemu_ram_addr_from_host(addr);
+        if (ram_addr != RAM_ADDR_INVALID &&
+            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
+            kvm_hwpoison_page_add(ram_addr);
+            /*
+             * Asynchronous signal will be masked by main thread, so
+             * only handle synchronous signal.
+             */
+            if (code == BUS_MCEERR_AR) {
+                kvm_cpu_synchronize_state(c);
+                if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
+                    kvm_inject_arm_sea(c);
+                } else {
+                    error_report("failed to record the error");
+                    abort();
+                }
+            }
+            return;
+        }
+        if (code == BUS_MCEERR_AO) {
+            error_report("Hardware memory error at addr %p for memory used by "
+                "QEMU itself instead of guest system!", addr);
+        }
+    }
+
+    if (code == BUS_MCEERR_AR) {
+        error_report("Hardware memory error!");
+        exit(1);
+    }
+}
+
 /* C6.6.29 BRK instruction */
 static const uint32_t brk_insn = 0xd4200000;
 
diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
index 5feb312..499672e 100644
--- a/target/arm/tlb_helper.c
+++ b/target/arm/tlb_helper.c
@@ -33,7 +33,7 @@  static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
      * ISV field.
      */
     if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
-        syn = syn_data_abort_no_iss(same_el,
+        syn = syn_data_abort_no_iss(same_el, 0,
                                     ea, 0, s1ptw, is_write, fsc);
     } else {
         /*
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index af28293..1a0dbc1 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -29,6 +29,8 @@ 
 /* The x86 has a strong memory model with some store-after-load re-ordering */
 #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
+#define KVM_HAVE_MCE_INJECTION 1
+
 /* Maximum instruction code size */
 #define TARGET_MAX_INSN_SIZE 16