Message ID | 1503916701-13516-5-git-send-email-gengdongjiu@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Dongjiu Geng, On 28/08/17 11:38, Dongjiu Geng wrote: > In ARMV8.2 RAS extension, a virtual SError exception syndrome > register(VSESR_EL2) is added. This value may be specified from > userspace. I agree that the CPU support for injecting SErrors with a specified ESR should be exposed to KVM's user space... > Userspace will want to check if the CPU has the RAS > extension. ... but user-space wants to know if it can inject SErrors with a specified ESR. What if we gain another way of doing this that isn't via the RAS-extensions, now user-space has to check for two capabilities. > If it has, it wil specify the virtual SError syndrome > value, otherwise it will not be set. This patch adds support for > querying the availability of this extension. I'm against telling user-space what features the CPU has unless it can use them directly. In this case we are talking about a KVM API, so we should describe the API not the CPU. > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index 3256b9228e75..b7313ee028e9 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_ARM_PMU_V3: > r = kvm_arm_support_pmu_v3(); > break; > + case KVM_CAP_ARM_RAS_EXTENSION: This should be called something more like 'KVM_CAP_ARM_INJECT_SERROR_ESR' > + r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN); > + break; > case KVM_CAP_SET_GUEST_DEBUG: > case KVM_CAP_VCPU_ATTRIBUTES: > r = 1; We can inject SError on systems without the RAS extensions using just the HCR_EL2.VSE bit. We may want to make the 'ESR' part of the API optional, or expose '1' for the without-ESR version and '2 for with-ESR, (however we choose to implement that). The risk is if we want to add a without-ESR version later, and the name we make ABI now turned out to be a mistake. Marc or Christoffer probably have the best view of this. (no-one has needed it so far...) Thanks, James
James, On 2017/9/1 2:04, James Morse wrote: > Hi Dongjiu Geng, > > On 28/08/17 11:38, Dongjiu Geng wrote: >> In ARMV8.2 RAS extension, a virtual SError exception syndrome >> register(VSESR_EL2) is added. This value may be specified from >> userspace. > > I agree that the CPU support for injecting SErrors with a specified ESR should > be exposed to KVM's user space... Ok, thanks. > > >> Userspace will want to check if the CPU has the RAS >> extension. > > ... but user-space wants to know if it can inject SErrors with a specified ESR. > > What if we gain another way of doing this that isn't via the RAS-extensions, now > user-space has to check for two capabilities. > > >> If it has, it wil specify the virtual SError syndrome >> value, otherwise it will not be set. This patch adds support for >> querying the availability of this extension. > > I'm against telling user-space what features the CPU has unless it can use them > directly. In this case we are talking about a KVM API, so we should describe the > API not the CPU. shenglong (zhaoshenglong@huawei.com) who is Qemu maintainer suggested checking the CPU RAS-extensions to decide whether generate the APEI table and record CPER for the guest OS in the user space. he means if the host does not support RAS, user space may also not support RAS. > > >> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c >> index 3256b9228e75..b7313ee028e9 100644 >> --- a/arch/arm64/kvm/reset.c >> +++ b/arch/arm64/kvm/reset.c >> @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext) >> case KVM_CAP_ARM_PMU_V3: >> r = kvm_arm_support_pmu_v3(); >> break; >> + case KVM_CAP_ARM_RAS_EXTENSION: > > This should be called something more like 'KVM_CAP_ARM_INJECT_SERROR_ESR' I understand your suggestion. > > >> + r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN); >> + break; >> case KVM_CAP_SET_GUEST_DEBUG: >> case KVM_CAP_VCPU_ATTRIBUTES: >> r = 1; > > > We can inject SError on systems without the RAS extensions using just the > HCR_EL2.VSE bit. We may want to make the 'ESR' part of the API optional, or > expose '1' for the without-ESR version and '2 for with-ESR, (however we choose > to implement that). > > The risk is if we want to add a without-ESR version later, and the name we make > ABI now turned out to be a mistake. Marc or Christoffer probably have the best > view of this. (no-one has needed it so far...) > > > Thanks, > > James > > > . >
Hi gengdongjiu, On 05/09/17 08:18, gengdongjiu wrote: > On 2017/9/1 2:04, James Morse wrote: >> On 28/08/17 11:38, Dongjiu Geng wrote: >>> Userspace will want to check if the CPU has the RAS >>> extension. >> >> ... but user-space wants to know if it can inject SErrors with a specified ESR. >> >> What if we gain another way of doing this that isn't via the RAS-extensions, now >> user-space has to check for two capabilities. >> >> >>> If it has, it wil specify the virtual SError syndrome >>> value, otherwise it will not be set. This patch adds support for >>> querying the availability of this extension. >> >> I'm against telling user-space what features the CPU has unless it can use them >> directly. In this case we are talking about a KVM API, so we should describe the >> API not the CPU. > > shenglong (zhaoshenglong@huawei.com) who is Qemu maintainer suggested checking the CPU RAS-extensions > to decide whether generate the APEI table and record CPER for the guest OS in the user space. > he means if the host does not support RAS, user space may also not support RAS. The code to signal memory-failure to user-space doesn't depend on the CPU's RAS-extensions. If Qemu supports notifying the guest about RAS errors using CPER records, it should generate a HEST describing firmware first. It can then choose the notification methods, some of which may require optional KVM APIs to support. Seattle has a HEST, it doesn't support the CPU RAS-extensions. The kernel can notify user-space about memory_failure() on this machine. I would expect Qemu to be able to receive signals and describe memory errors to a guest (1). The question should be: 'How can Qemu know it can use SEI as a firmware-first notification?' It needs a KVM API to trigger an SError in the guest with a specified ESR. The name of the KVM CAP needs to reflect the API (2). Just because this is the first KVM API that needs the CPU to have the RAS extensions doesn't mean we should call it 'has RAS' and be done with it. We will eventually need another KVM API to configure trapping and emulating values in the RAS ERR registers so that Qemu can emulate a machine without firmware-first. (This is likely to be a page of memory that backs the registers, there will need to be another KVM CAP to describe this support (3)). Exposing the CPUs support for RAS-extensions to support (2) means having per-platform support for (1). This is either creating extra work, or not supporting as many platforms as we could. Both are bad. Once we have (3) as well, any developer needs to know that 'has RAS' just meant the first API KVM implemented using RAS, and doesn't mean later APIs also using RAS are supported by the kernel. Thanks, James
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 3256b9228e75..b7313ee028e9 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_PMU_V3: r = kvm_arm_support_pmu_v3(); break; + case KVM_CAP_ARM_RAS_EXTENSION: + r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN); + break; case KVM_CAP_SET_GUEST_DEBUG: case KVM_CAP_VCPU_ATTRIBUTES: r = 1; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6cd63c18708a..5a2a338cae57 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -929,6 +929,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_PPC_SMT_POSSIBLE 147 #define KVM_CAP_HYPERV_SYNIC2 148 #define KVM_CAP_HYPERV_VP_INDEX 149 +#define KVM_CAP_ARM_RAS_EXTENSION 150 #ifdef KVM_CAP_IRQ_ROUTING
In ARMV8.2 RAS extension, a virtual SError exception syndrome register(VSESR_EL2) is added. This value may be specified from userspace. Userspace will want to check if the CPU has the RAS extension. If it has, it wil specify the virtual SError syndrome value, otherwise it will not be set. This patch adds support for querying the availability of this extension. change since v5: 1. modify some patch description Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> --- arch/arm64/kvm/reset.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+)