Message ID | 1502122117-6276-1-git-send-email-gengdongjiu@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
+James, since he deals with all things RAS. Please keep him on CC at all times. On 07/08/17 17:08, Dongjiu Geng wrote: > For the firmware-first RAS solution, SEA and SEI is injected > by the user space, user space needs to know the esr_el2 and > far_el2's value, so add them to sysreg. user space uses > the IOCTL KVM_GET_ONE_REG can get their value. No. This has zero purpose being exposed to userspace. Userspace sees a VM that runs at EL1, and nothing else, so exposing EL2 registers doesn't make *any* sense. If you want something to be exposed to userspace, it has to be properly abstracted and describe something that is relevant to the VM. An EL2 register satisfies none of these conditions. > > Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> > Signed-off-by: Quanming Wu <wuquanming@huawei.com> > --- > arch/arm64/include/asm/kvm_host.h | 6 ++++-- > arch/arm64/kvm/sys_regs.c | 6 ++++++ > 2 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index b6242fb..6063eec 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -103,10 +103,12 @@ enum vcpu_sysreg { > TTBR0_EL1, /* Translation Table Base Register 0 */ > TTBR1_EL1, /* Translation Table Base Register 1 */ > TCR_EL1, /* Translation Control Register */ > - ESR_EL1, /* Exception Syndrome Register */ > + ESR_EL1, /* Exception Syndrome Register for EL1 */ > + ESR_EL2, /* Exception Syndrome Register for EL2 */ > AFSR0_EL1, /* Auxiliary Fault Status Register 0 */ > AFSR1_EL1, /* Auxiliary Fault Status Register 1 */ > - FAR_EL1, /* Fault Address Register */ > + FAR_EL1, /* Fault Address Register for EL1 */ > + FAR_EL2, /* Fault Address Register for EL2 */ > MAIR_EL1, /* Memory Attribute Indirection Register */ > VBAR_EL1, /* Vector Base Address Register */ > CONTEXTIDR_EL1, /* Context ID Register */ > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c > index 0e26f8c..0c286bf 100644 > --- a/arch/arm64/kvm/sys_regs.c > +++ b/arch/arm64/kvm/sys_regs.c > @@ -987,9 +987,15 @@ static const struct sys_reg_desc sys_reg_descs[] = { > /* ESR_EL1 */ > { Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0010), Op2(0b000), > access_vm_reg, reset_unknown, ESR_EL1 }, > + /* ESR_EL2 */ > + { Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000), > + access_vm_reg, reset_unknown, ESR_EL2 }, > /* FAR_EL1 */ > { Op0(0b11), Op1(0b000), CRn(0b0110), CRm(0b0000), Op2(0b000), > access_vm_reg, reset_unknown, FAR_EL1 }, > + /* FAR_EL2 */ > + { Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000), > + access_vm_reg, reset_unknown, FAR_EL2 }, > /* PAR_EL1 */ > { Op0(0b11), Op1(0b000), CRn(0b0111), CRm(0b0100), Op2(0b000), > NULL, reset_unknown, PAR_EL1 }, > Also, what do you return here? All you're doing to return to userspace is 0x1de7ec7edbadc0deULL (which perfectly matches this patch). So for all intents and purposes, this patch is pretty useless. Thanks, M.
Hi Marc, As James's suggestion, I move injection SEA Error logic to the user space(Qemu), Qemu sets the related guest OS esr/elr/pstate/spsr through IOCTL KVM_SET_ONE_REG. For the SEA, when Qemu sets the esr_el1.IL bit, it needs to refer to esr_el2.IL, else Qemu does not know the trapped instruction was a 16-bit or a 32-bit instruction, also it needs to set far_el1 using far_el2, because this is synchronization abort. On 2017/8/7 23:57, Marc Zyngier wrote: > +James, since he deals with all things RAS. Please keep him on CC at all > times. > > On 07/08/17 17:08, Dongjiu Geng wrote: >> For the firmware-first RAS solution, SEA and SEI is injected >> by the user space, user space needs to know the esr_el2 and >> far_el2's value, so add them to sysreg. user space uses >> the IOCTL KVM_GET_ONE_REG can get their value. > > No. > > This has zero purpose being exposed to userspace. Userspace sees a VM > that runs at EL1, and nothing else, so exposing EL2 registers doesn't > make *any* sense. > > If you want something to be exposed to userspace, it has to be properly > abstracted and describe something that is relevant to the VM. An EL2 > register satisfies none of these conditions. > >> >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> >> Signed-off-by: Quanming Wu <wuquanming@huawei.com> >> --- >> arch/arm64/include/asm/kvm_host.h | 6 ++++-- >> arch/arm64/kvm/sys_regs.c | 6 ++++++ >> 2 files changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h >> index b6242fb..6063eec 100644 >> --- a/arch/arm64/include/asm/kvm_host.h >> +++ b/arch/arm64/include/asm/kvm_host.h >> @@ -103,10 +103,12 @@ enum vcpu_sysreg { >> TTBR0_EL1, /* Translation Table Base Register 0 */ >> TTBR1_EL1, /* Translation Table Base Register 1 */ >> TCR_EL1, /* Translation Control Register */ >> - ESR_EL1, /* Exception Syndrome Register */ >> + ESR_EL1, /* Exception Syndrome Register for EL1 */ >> + ESR_EL2, /* Exception Syndrome Register for EL2 */ >> AFSR0_EL1, /* Auxiliary Fault Status Register 0 */ >> AFSR1_EL1, /* Auxiliary Fault Status Register 1 */ >> - FAR_EL1, /* Fault Address Register */ >> + FAR_EL1, /* Fault Address Register for EL1 */ >> + FAR_EL2, /* Fault Address Register for EL2 */ >> MAIR_EL1, /* Memory Attribute Indirection Register */ >> VBAR_EL1, /* Vector Base Address Register */ >> CONTEXTIDR_EL1, /* Context ID Register */ >> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c >> index 0e26f8c..0c286bf 100644 >> --- a/arch/arm64/kvm/sys_regs.c >> +++ b/arch/arm64/kvm/sys_regs.c >> @@ -987,9 +987,15 @@ static const struct sys_reg_desc sys_reg_descs[] = { >> /* ESR_EL1 */ >> { Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0010), Op2(0b000), >> access_vm_reg, reset_unknown, ESR_EL1 }, >> + /* ESR_EL2 */ >> + { Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000), >> + access_vm_reg, reset_unknown, ESR_EL2 }, >> /* FAR_EL1 */ >> { Op0(0b11), Op1(0b000), CRn(0b0110), CRm(0b0000), Op2(0b000), >> access_vm_reg, reset_unknown, FAR_EL1 }, >> + /* FAR_EL2 */ >> + { Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000), >> + access_vm_reg, reset_unknown, FAR_EL2 }, >> /* PAR_EL1 */ >> { Op0(0b11), Op1(0b000), CRn(0b0111), CRm(0b0100), Op2(0b000), >> NULL, reset_unknown, PAR_EL1 }, >> > > Also, what do you return here? All you're doing to return to userspace > is 0x1de7ec7edbadc0deULL (which perfectly matches this patch). > > So for all intents and purposes, this patch is pretty useless. > > Thanks, > > M. >
On 07/08/17 17:23, gengdongjiu wrote: > Hi Marc, > As James's suggestion, I move injection SEA Error logic to the user space(Qemu), Qemu sets the related guest OS esr/elr/pstate/spsr > through IOCTL KVM_SET_ONE_REG. For the SEA, when Qemu sets the esr_el1.IL bit, it needs to refer to esr_el2.IL, else Qemu does not know the trapped > instruction was a 16-bit or a 32-bit instruction, also it needs to set far_el1 using far_el2, because this is synchronization abort. Usespace may need some fault information, but certainly not the full set of FAR_EL2/ESR_EL2. What it needs is a very small set of well defined information, properly abstracted, and not data that is completely private to the hypervisor. Thanks, M. > > > > > On 2017/8/7 23:57, Marc Zyngier wrote: >> +James, since he deals with all things RAS. Please keep him on CC at all >> times. >> >> On 07/08/17 17:08, Dongjiu Geng wrote: >>> For the firmware-first RAS solution, SEA and SEI is injected >>> by the user space, user space needs to know the esr_el2 and >>> far_el2's value, so add them to sysreg. user space uses >>> the IOCTL KVM_GET_ONE_REG can get their value. >> >> No. >> >> This has zero purpose being exposed to userspace. Userspace sees a VM >> that runs at EL1, and nothing else, so exposing EL2 registers doesn't >> make *any* sense. >> >> If you want something to be exposed to userspace, it has to be properly >> abstracted and describe something that is relevant to the VM. An EL2 >> register satisfies none of these conditions. >> >>> >>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> >>> Signed-off-by: Quanming Wu <wuquanming@huawei.com> >>> --- >>> arch/arm64/include/asm/kvm_host.h | 6 ++++-- >>> arch/arm64/kvm/sys_regs.c | 6 ++++++ >>> 2 files changed, 10 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h >>> index b6242fb..6063eec 100644 >>> --- a/arch/arm64/include/asm/kvm_host.h >>> +++ b/arch/arm64/include/asm/kvm_host.h >>> @@ -103,10 +103,12 @@ enum vcpu_sysreg { >>> TTBR0_EL1, /* Translation Table Base Register 0 */ >>> TTBR1_EL1, /* Translation Table Base Register 1 */ >>> TCR_EL1, /* Translation Control Register */ >>> - ESR_EL1, /* Exception Syndrome Register */ >>> + ESR_EL1, /* Exception Syndrome Register for EL1 */ >>> + ESR_EL2, /* Exception Syndrome Register for EL2 */ >>> AFSR0_EL1, /* Auxiliary Fault Status Register 0 */ >>> AFSR1_EL1, /* Auxiliary Fault Status Register 1 */ >>> - FAR_EL1, /* Fault Address Register */ >>> + FAR_EL1, /* Fault Address Register for EL1 */ >>> + FAR_EL2, /* Fault Address Register for EL2 */ >>> MAIR_EL1, /* Memory Attribute Indirection Register */ >>> VBAR_EL1, /* Vector Base Address Register */ >>> CONTEXTIDR_EL1, /* Context ID Register */ >>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c >>> index 0e26f8c..0c286bf 100644 >>> --- a/arch/arm64/kvm/sys_regs.c >>> +++ b/arch/arm64/kvm/sys_regs.c >>> @@ -987,9 +987,15 @@ static const struct sys_reg_desc sys_reg_descs[] = { >>> /* ESR_EL1 */ >>> { Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0010), Op2(0b000), >>> access_vm_reg, reset_unknown, ESR_EL1 }, >>> + /* ESR_EL2 */ >>> + { Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000), >>> + access_vm_reg, reset_unknown, ESR_EL2 }, >>> /* FAR_EL1 */ >>> { Op0(0b11), Op1(0b000), CRn(0b0110), CRm(0b0000), Op2(0b000), >>> access_vm_reg, reset_unknown, FAR_EL1 }, >>> + /* FAR_EL2 */ >>> + { Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000), >>> + access_vm_reg, reset_unknown, FAR_EL2 }, >>> /* PAR_EL1 */ >>> { Op0(0b11), Op1(0b000), CRn(0b0111), CRm(0b0100), Op2(0b000), >>> NULL, reset_unknown, PAR_EL1 }, >>> >> >> Also, what do you return here? All you're doing to return to userspace >> is 0x1de7ec7edbadc0deULL (which perfectly matches this patch). >> >> So for all intents and purposes, this patch is pretty useless. >> >> Thanks, >> >> M. >> >
Hi gengdongjiu, On 07/08/17 17:23, gengdongjiu wrote: > As James's suggestion, I move injection SEA Error logic to the user space(Qemu), Qemu sets the related guest OS esr/elr/pstate/spsr (because for firmware-first its the CPER records that matter, and only QEMU knows where it reserved the memory for these, and what it told the guest it would use as the notification method). > through IOCTL KVM_SET_ONE_REG. For the SEA, when Qemu sets the esr_el1.IL bit, it needs to refer to esr_el2.IL, else Qemu does not know the trapped > instruction was a 16-bit or a 32-bit instruction, also it needs to set far_el1 using far_el2, because this is synchronization abort. The 32bit kernel doesn't support ACPI firmware first, and aarch64 doesn't support 16-bit instructions. James
Ok, thanks for James's confirmation. Another question, For the SEI, I want to also use SIGBUS both for the KVM user and non-kvm user, if SEA and SEI Error all use the SIGBUS to notify user space(Qemu), the user space(Qemu) will be confused, and do not know whether this is SEA or SEI error. so here I pass the sysreg ESR_EL2(vcpu->arch.fault.esr_el2) to the user space, let user space judges the (vcpu->arch.fault.esr_el2)'s value to know this is a SEA or SEI Error. do you agree my solution for the SEI? thanks. because the vcpu->arch.fault.esr_el2 can not directly passed to userspace, so I defined the vcpu->arch.fault.esr_el2 to sysreg ESR_EL2/FAR_EL2, sysreg register can pass to user space. + vcpu_sys_reg(vcpu,ESR_EL2) = kvm_vcpu_get_hsr(vcpu); + vcpu_sys_reg(vcpu,FAR_EL2) = kvm_vcpu_get_hfar(vcpu); On 2017/8/8 0:59, James Morse wrote: > Hi gengdongjiu, > > On 07/08/17 17:23, gengdongjiu wrote: >> As James's suggestion, I move injection SEA Error logic to the user space(Qemu), Qemu sets the related guest OS esr/elr/pstate/spsr > > (because for firmware-first its the CPER records that matter, and only QEMU > knows where it reserved the memory for these, and what it told the guest it > would use as the notification method). > >> through IOCTL KVM_SET_ONE_REG. For the SEA, when Qemu sets the esr_el1.IL bit, it needs to refer to esr_el2.IL, else Qemu does not know the trapped >> instruction was a 16-bit or a 32-bit instruction, also it needs to set far_el1 using far_el2, because this is synchronization abort. > > The 32bit kernel doesn't support ACPI firmware first, and aarch64 doesn't > support 16-bit instructions. thanks, so how about the SEA's error FAR_EL1's value? may be FAR_EL1's value get from FAR_EL2's value. > > > James > > > > . >
Marc, On 2017/8/8 0:56, Marc Zyngier wrote: > On 07/08/17 17:23, gengdongjiu wrote: >> Hi Marc, >> As James's suggestion, I move injection SEA Error logic to the user space(Qemu), Qemu sets the related guest OS esr/elr/pstate/spsr >> through IOCTL KVM_SET_ONE_REG. For the SEA, when Qemu sets the esr_el1.IL bit, it needs to refer to esr_el2.IL, else Qemu does not know the trapped >> instruction was a 16-bit or a 32-bit instruction, also it needs to set far_el1 using far_el2, because this is synchronization abort. > > Usespace may need some fault information, but certainly not the full set > of FAR_EL2/ESR_EL2. What it needs is a very small set of well defined > information, properly abstracted, and not data that is completely > private to the hypervisor. Marc, just now I update the patch, may be use the vcpu->arch.fault.esr_el2 and vcpu->arch.fault.far_el2 to set the vcpu_sys_reg(vcpu,FAR_EL2)/vcpu_sys_reg(vcpu,ESR_EL2) can be better. Now the user space can not directly get the vcpu->arch.faul.xxxx value, so need use vcpu_sys_reg to pass. > > Thanks, > > M. >> >> >> >> >> On 2017/8/7 23:57, Marc Zyngier wrote: >>> +James, since he deals with all things RAS. Please keep him on CC at all >>> times. >>> >>> On 07/08/17 17:08, Dongjiu Geng wrote: >>>> For the firmware-first RAS solution, SEA and SEI is injected >>>> by the user space, user space needs to know the esr_el2 and >>>> far_el2's value, so add them to sysreg. user space uses >>>> the IOCTL KVM_GET_ONE_REG can get their value. >>> >>> No. >>> >>> This has zero purpose being exposed to userspace. Userspace sees a VM >>> that runs at EL1, and nothing else, so exposing EL2 registers doesn't >>> make *any* sense. >>> >>> If you want something to be exposed to userspace, it has to be properly >>> abstracted and describe something that is relevant to the VM. An EL2 >>> register satisfies none of these conditions. >>> >>>> >>>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> >>>> Signed-off-by: Quanming Wu <wuquanming@huawei.com> >>>> --- >>>> arch/arm64/include/asm/kvm_host.h | 6 ++++-- >>>> arch/arm64/kvm/sys_regs.c | 6 ++++++ >>>> 2 files changed, 10 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h >>>> index b6242fb..6063eec 100644 >>>> --- a/arch/arm64/include/asm/kvm_host.h >>>> +++ b/arch/arm64/include/asm/kvm_host.h >>>> @@ -103,10 +103,12 @@ enum vcpu_sysreg { >>>> TTBR0_EL1, /* Translation Table Base Register 0 */ >>>> TTBR1_EL1, /* Translation Table Base Register 1 */ >>>> TCR_EL1, /* Translation Control Register */ >>>> - ESR_EL1, /* Exception Syndrome Register */ >>>> + ESR_EL1, /* Exception Syndrome Register for EL1 */ >>>> + ESR_EL2, /* Exception Syndrome Register for EL2 */ >>>> AFSR0_EL1, /* Auxiliary Fault Status Register 0 */ >>>> AFSR1_EL1, /* Auxiliary Fault Status Register 1 */ >>>> - FAR_EL1, /* Fault Address Register */ >>>> + FAR_EL1, /* Fault Address Register for EL1 */ >>>> + FAR_EL2, /* Fault Address Register for EL2 */ >>>> MAIR_EL1, /* Memory Attribute Indirection Register */ >>>> VBAR_EL1, /* Vector Base Address Register */ >>>> CONTEXTIDR_EL1, /* Context ID Register */ >>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c >>>> index 0e26f8c..0c286bf 100644 >>>> --- a/arch/arm64/kvm/sys_regs.c >>>> +++ b/arch/arm64/kvm/sys_regs.c >>>> @@ -987,9 +987,15 @@ static const struct sys_reg_desc sys_reg_descs[] = { >>>> /* ESR_EL1 */ >>>> { Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0010), Op2(0b000), >>>> access_vm_reg, reset_unknown, ESR_EL1 }, >>>> + /* ESR_EL2 */ >>>> + { Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000), >>>> + access_vm_reg, reset_unknown, ESR_EL2 }, >>>> /* FAR_EL1 */ >>>> { Op0(0b11), Op1(0b000), CRn(0b0110), CRm(0b0000), Op2(0b000), >>>> access_vm_reg, reset_unknown, FAR_EL1 }, >>>> + /* FAR_EL2 */ >>>> + { Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000), >>>> + access_vm_reg, reset_unknown, FAR_EL2 }, >>>> /* PAR_EL1 */ >>>> { Op0(0b11), Op1(0b000), CRn(0b0111), CRm(0b0100), Op2(0b000), >>>> NULL, reset_unknown, PAR_EL1 }, >>>> >>> >>> Also, what do you return here? All you're doing to return to userspace >>> is 0x1de7ec7edbadc0deULL (which perfectly matches this patch). >>> >>> So for all intents and purposes, this patch is pretty useless. >>> >>> Thanks, >>> >>> M. >>> >> > >
Hi gengdongjiu, On 07/08/17 18:43, gengdongjiu wrote: > Another question, For the SEI, I want to also use SIGBUS both for the KVM user and non-kvm user, > if SEA and SEI Error all use the SIGBUS to notify user space(Qemu), User-space shouldn't necessarily be notified about Synchronous External Aborts or SError Interrupts. You're really asking about RAS firmware-first notifications that use these as the notification mechanism. We should not notify user-space that the guest happened to be interrupted by a RAS firmware-first notification. It may not be relevant, and we can't know until we parse the CPER records. The notification mechanism is between firmware and the host kernel, we should never expose anything about it to user space or a guest. Linux should act on the CPER records first to determine if the host kernel can keep running. Once it has done this it can deliver signals to affected processes, but which signal and its properties depends on the CPER records. The example here is BUS_MCEERR_AO and BUS_MCEERR_AR. These notify userspace that si_addr_lsb bits of memory are corrupt at si_addr, this is either Action-Optional or Action-Required. For arm64 we just needed to turn this code on, it already presents the minimum necessary information to user-space in an architecture-agnostic way. We didn't need to do anything to this code to support NOTIFY_SEA, the notification mechanism is irrelevant, this is all driven by the CPER records. If you have a class of error that isn't covered by the memory-failure code, then we need to add something similar. This should be based on the CPER records, and should work in exactly the same way for all processes on all ACPI platforms. > do you agree my solution for the SEI? thanks. No, you are trying to notify userspace that firmware notified the host. This creates an ABI between EL3 firmware and EL0 user space that we can't possibly support. I think you've come to this because you are merging two steps together: 1. The OS uses the v8.2 RAS extensions to isolate errors and notify firmware. 2. If firmware has to tell the OS about the error, firmware generates CPER records. 3. Firmware triggers the GHES notification mechanism for this error source. 4. Linux receives the notification and calls ghes_proc(), (if KVM gets the notification because a guest happened to be running, it should switch back to the host and arrange for ghes_proc() to be called). 5. ghes_proc() parses the CPER records and calls other kernel helpers to handle the specific type of error, e.g. memory_failure(). 6. If the helper knows the kernel can keep running, the error is visible to user-space and user space could do further processing to correct the error, an error-specific signal is sent. 7. User-space reloads the webpage, notifies the guest or whatever is appropriate. You are merging steps 3 and 7. The notification method is an abstraction that only matters to steps 3&4, this lets us add more without rewriting the world. User-space signals are an abstraction between steps 6&7, this works across architectures, even those not using APEI firmware first. There are no shortcuts here. Doing anything else creates more work for user space, another platform or another architecture. Thanks, James
Hi James, On 2017/8/9 0:27, James Morse wrote: > Hi gengdongjiu, > > On 07/08/17 18:43, gengdongjiu wrote: >> Another question, For the SEI, I want to also use SIGBUS both for the KVM user and non-kvm user, >> if SEA and SEI Error all use the SIGBUS to notify user space(Qemu), > > User-space shouldn't necessarily be notified about Synchronous External Aborts > or SError Interrupts. You're really asking about RAS firmware-first > notifications that use these as the notification mechanism. Firstly, we am talking the RAS firmware-first solution. I mainly want to let user space to know what is the Error type for this hardware error(Synchronous or asynchronous). we do not care the notification mechanism. As our agreement before , Qemu will record the CPER for the guest OS. if Qemu does not know the Error type, it can not record the CPER. because in the ghes there is a field to fill the error type. I paste the APEI table layout: https://wiki.linaro.org/LEG/Engineering/Kernel/RAS/APEITables usual the notification type is classified by the hardware type > > We should not notify user-space that the guest happened to be interrupted by a > RAS firmware-first notification. It may not be relevant, and we can't know until > we parse the CPER records. The notification mechanism is between firmware and > the host kernel, we should never expose anything about it to user space or a guest. I agree with you this sentence, for the hardware error, host kernel will firstly deal with, and then decided whether to notified Qemu/KVM tools. In this process, we do not care what is the notification mechanism between firmware and host kernel. we only concern the hardware error type. different type, Qemu/KVM tools will have different behavior. > > Linux should act on the CPER records first to determine if the host kernel can > keep running. Once it has done this it can deliver signals to affected > processes, but which signal and its properties depends on the CPER records. if want Qemu to handle this Error, I think qemu/kvmtools should know hardware error type, else it will be confused and do not know how to deal with. > > The example here is BUS_MCEERR_AO and BUS_MCEERR_AR. These notify userspace that > si_addr_lsb bits of memory are corrupt at si_addr, this is either > Action-Optional or Action-Required. > > For arm64 we just needed to turn this code on, it already presents the minimum > necessary information to user-space in an architecture-agnostic way. We didn't > need to do anything to this code to support NOTIFY_SEA, the notification > mechanism is irrelevant, this is all driven by the CPER records. we do not care the notification type, we are only care the hardware error type. if user-space do not know the error type, it can not record the CPER and can not inject the proper Error to guest OS. because record CPER and inject the Error to guest OS need this hardware error type. different Error type, there is different behavior. For different hardware error type, X86 Qemu/kvm tools code also have different behaviour in > > If you have a class of error that isn't covered by the memory-failure code, then > we need to add something similar. This should be based on the CPER records, and > should work in exactly the same way for all processes on all ACPI platforms. > > >> do you agree my solution for the SEI? thanks. > > No, you are trying to notify userspace that firmware notified the host. This > creates an ABI between EL3 firmware and EL0 user space that we can't possibly > support. you may misunderstand I mentioned solution here. I mean using memory-failure code to signal user space for the SError(SEI), this way does not creates any ABI. SEA/SEI all use same method. Qemu can judge the ESR to know the hardware type. > > I think you've come to this because you are merging two steps together: > 1. The OS uses the v8.2 RAS extensions to isolate errors and notify firmware. > 2. If firmware has to tell the OS about the error, firmware generates CPER records. > 3. Firmware triggers the GHES notification mechanism for this error source. > 4. Linux receives the notification and calls ghes_proc(), (if KVM gets the > notification because a guest happened to be running, it should switch back to > the host and arrange for ghes_proc() to be called). > 5. ghes_proc() parses the CPER records and calls other kernel helpers to handle > the specific type of error, e.g. memory_failure(). > 6. If the helper knows the kernel can keep running, the error is visible to > user-space and user space could do further processing to correct the error, an > error-specific signal is sent. > 7. User-space reloads the webpage, notifies the guest or whatever is appropriate. > > You are merging steps 3 and 7. No, not merge them. in above steps, the steps 7 needs to know the hardware type. if it does not know the hardware error type, user space can not correctly do further processing. we do not care the notification type. we only care what is the hardware error type. > > The notification method is an abstraction that only matters to steps 3&4, this > lets us add more without rewriting the world. > User-space signals are an abstraction between steps 6&7, this works across > architectures, even those not using APEI firmware first. so here can I understand that you also agree that SError type can also use signals to be delivered to user space? > There are no shortcuts here. Doing anything else creates more work for user > space, another platform or another architecture. > > > Thanks, > > James > > . >
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index b6242fb..6063eec 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -103,10 +103,12 @@ enum vcpu_sysreg { TTBR0_EL1, /* Translation Table Base Register 0 */ TTBR1_EL1, /* Translation Table Base Register 1 */ TCR_EL1, /* Translation Control Register */ - ESR_EL1, /* Exception Syndrome Register */ + ESR_EL1, /* Exception Syndrome Register for EL1 */ + ESR_EL2, /* Exception Syndrome Register for EL2 */ AFSR0_EL1, /* Auxiliary Fault Status Register 0 */ AFSR1_EL1, /* Auxiliary Fault Status Register 1 */ - FAR_EL1, /* Fault Address Register */ + FAR_EL1, /* Fault Address Register for EL1 */ + FAR_EL2, /* Fault Address Register for EL2 */ MAIR_EL1, /* Memory Attribute Indirection Register */ VBAR_EL1, /* Vector Base Address Register */ CONTEXTIDR_EL1, /* Context ID Register */ diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 0e26f8c..0c286bf 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -987,9 +987,15 @@ static const struct sys_reg_desc sys_reg_descs[] = { /* ESR_EL1 */ { Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0010), Op2(0b000), access_vm_reg, reset_unknown, ESR_EL1 }, + /* ESR_EL2 */ + { Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000), + access_vm_reg, reset_unknown, ESR_EL2 }, /* FAR_EL1 */ { Op0(0b11), Op1(0b000), CRn(0b0110), CRm(0b0000), Op2(0b000), access_vm_reg, reset_unknown, FAR_EL1 }, + /* FAR_EL2 */ + { Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000), + access_vm_reg, reset_unknown, FAR_EL2 }, /* PAR_EL1 */ { Op0(0b11), Op1(0b000), CRn(0b0111), CRm(0b0100), Op2(0b000), NULL, reset_unknown, PAR_EL1 },