From patchwork Mon Sep 30 11:57:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 13815930 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1CC79CF6497 for ; Mon, 30 Sep 2024 11:58:11 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.807306.1218673 (Exim 4.92) (envelope-from ) id 1svF2O-0007p2-EU; Mon, 30 Sep 2024 11:58:00 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 807306.1218673; Mon, 30 Sep 2024 11:58:00 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF2O-0007ov-Ao; Mon, 30 Sep 2024 11:58:00 +0000 Received: by outflank-mailman (input) for mailman id 807306; Mon, 30 Sep 2024 11:57:59 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF2M-0007YI-Sh for xen-devel@lists.xenproject.org; Mon, 30 Sep 2024 11:57:59 +0000 Received: from mail-ed1-x533.google.com (mail-ed1-x533.google.com [2a00:1450:4864:20::533]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 3c41286c-7f23-11ef-a0ba-8be0dac302b0; Mon, 30 Sep 2024 13:57:58 +0200 (CEST) Received: by mail-ed1-x533.google.com with SMTP id 4fb4d7f45d1cf-5c87c7d6ad4so4070404a12.3 for ; Mon, 30 Sep 2024 04:57:58 -0700 (PDT) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a93c2945f2esm519701466b.117.2024.09.30.04.57.57 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 30 Sep 2024 04:57:57 -0700 (PDT) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 3c41286c-7f23-11ef-a0ba-8be0dac302b0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1727697478; x=1728302278; darn=lists.xenproject.org; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=IwGX/SkbzLq4zJqJalyoWTF+TyvScc7TAFVMyq5kwXg=; b=gKybBAt1GwN4RUcd7hpuWgKOhgLTTCOLGGEtHpGCP9ahc6CeL/aqSSBvz8DUlsrkDJ eriH+CG14LZGjkCw5aiyRrbH9nJ8lfeQmsjD07rXcWaPDK7QfC6t9Uz1Gt4o4I+hl+lD 6Ft4qpE0Qx7EqqIA3mxsO75tOYsEldOj9NkrwRAj10NtlyrLGkOEDXuLIlh4f2rkcCxk KbZmt1bTgCPPcfWHnh56ATYv0/q3vNVk7KQWgS5RDoAfuBnJYhfhvl/dlW3Q/aX0UAEH hIIdNLKP/ZBBO1SBWlAYyaY6jRl8doqkm40ZKMDbo7QmV4d7fuE3onk73pdF8kZ21TaZ ErxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727697478; x=1728302278; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=IwGX/SkbzLq4zJqJalyoWTF+TyvScc7TAFVMyq5kwXg=; b=xOU/KS/nMwhtTNQycHa1kevM08ayyHtHVhnFVBVUZblQK1ISH4Z5LrNbY9Vf+rCFil 8lLVobW9L4ffUehCiAkvsrdX/PPsI9StIwyk3VI3a+heYpBaUFMvDp/HHdsnnna1/6ll hNAR/nG6nEefVCm8PeS99vh/Ql27jf/P8+IUrb/NkLNT9k0Wmt4gqQYNJC1leB+vIqrW SCnVNdxcLguO8xszBfTZkNgRgKSHqEE7n0OC7LLRT5QxeYs+gcQX2OQ4lD9dKti96mjq pCJKadaw/+y3Rf++JWDcDW2FjNbjabJAyLqVxYeyAN+GqjpaAHGedzKU5/2ZqrzKOe+s H6Mw== X-Gm-Message-State: AOJu0Yz69GnsGRnEPyU9VAdf3SZF1hlSECaRL1A8rSay7eR4PCEZYjDQ Cwf8INF458BhYv7vIUexpFHuLbc7Uh4Gh83v3ovPy6qSMW7Vul98+bmzb3reKWOhg39F0ehTdng = X-Google-Smtp-Source: AGHT+IFLuerBbh6NdD+AmEvpPMlz1l2JojqYKRPb2jM+uS7+oCECo8D2UA7aNLMmFTJLo7o0B+7dCg== X-Received: by 2002:a17:907:3203:b0:a86:851e:3a2b with SMTP id a640c23a62f3a-a93c49299e7mr1247644966b.29.1727697477453; Mon, 30 Sep 2024 04:57:57 -0700 (PDT) Message-ID: <7c186e8f-1902-4688-a29a-d59cf5c84709@suse.com> Date: Mon, 30 Sep 2024 13:57:56 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: [PATCH v6 1/5] x86emul: support LKGS From: Jan Beulich To: "xen-devel@lists.xenproject.org" Cc: Andrew Cooper , Wei Liu , =?utf-8?q?Roger_Pau_Monn=C3=A9?= References: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> Content-Language: en-US Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> Provide support for this insn, which is a prereq to FRED. CPUID-wise introduce both its, FRED's, and the NMI_SRC bit at this occasion, thus allowing to also express the dependency right away. While adding a testcase, also add a SWAPGS one. In order to not affect the behavior of pre-existing tests, install write_{segment,msr} hooks only transiently. Signed-off-by: Jan Beulich --- Instead of ->read_segment() we could of course also use ->read_msr() to fetch the original GS base. I don't think I can see a clear advantage of either approach; the way it's done it matches how we handle SWAPGS. For PV save_segments() would need adjustment, but the insn being restricted to ring 0 means PV guests can't use it anyway (unless we wanted to emulate it as another privileged insn). --- v6: Use MSR constants in test harness. S->s in cpufeatureset.h. Add NMI_SRC feature bits. Re-base. v5: Re-base. v3: Add dependency on LM. Re-base. v2: Use X86_EXC_*. Add comments. --- a/tools/tests/x86_emulator/predicates.c +++ b/tools/tests/x86_emulator/predicates.c @@ -326,6 +326,7 @@ static const struct { { { 0x00, 0x18 }, { 2, 2 }, T, R }, /* ltr */ { { 0x00, 0x20 }, { 2, 2 }, T, R }, /* verr */ { { 0x00, 0x28 }, { 2, 2 }, T, R }, /* verw */ + { { 0x00, 0x30 }, { 0, 2 }, T, R, pfx_f2 }, /* lkgs */ { { 0x01, 0x00 }, { 2, 2 }, F, W }, /* sgdt */ { { 0x01, 0x08 }, { 2, 2 }, F, W }, /* sidt */ { { 0x01, 0x10 }, { 2, 2 }, F, R }, /* lgdt */ --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -672,6 +672,10 @@ static int blk( return x86_emul_blk((void *)offset, p_data, bytes, eflags, state, ctxt); } +#ifdef __x86_64__ +static unsigned long gs_base, gs_base_shadow; +#endif + static int read_segment( enum x86_segment seg, struct segment_register *reg, @@ -681,8 +685,30 @@ static int read_segment( return X86EMUL_UNHANDLEABLE; memset(reg, 0, sizeof(*reg)); reg->p = 1; + +#ifdef __x86_64__ + if ( seg == x86_seg_gs ) + reg->base = gs_base; +#endif + + return X86EMUL_OKAY; +} + +#ifdef __x86_64__ +static int write_segment( + enum x86_segment seg, + const struct segment_register *reg, + struct x86_emulate_ctxt *ctxt) +{ + if ( !is_x86_user_segment(seg) ) + return X86EMUL_UNHANDLEABLE; + + if ( seg == x86_seg_gs ) + gs_base = reg->base; + return X86EMUL_OKAY; } +#endif static int read_msr( unsigned int reg, @@ -695,6 +721,20 @@ static int read_msr( *val = ctxt->addr_size > 32 ? EFER_LME | EFER_LMA : 0; return X86EMUL_OKAY; +#ifdef __x86_64__ + case MSR_GS_BASE: + if ( ctxt->addr_size < 64 ) + break; + *val = gs_base; + return X86EMUL_OKAY; + + case MSR_SHADOW_GS_BASE: + if ( ctxt->addr_size < 64 ) + break; + *val = gs_base_shadow; + return X86EMUL_OKAY; +#endif + case MSR_TSC_AUX: #define TSC_AUX_VALUE 0xCACACACA *val = TSC_AUX_VALUE; @@ -704,6 +744,31 @@ static int read_msr( return X86EMUL_UNHANDLEABLE; } +#ifdef __x86_64__ +static int write_msr( + unsigned int reg, + uint64_t val, + struct x86_emulate_ctxt *ctxt) +{ + switch ( reg ) + { + case MSR_GS_BASE: + if ( ctxt->addr_size < 64 || !is_canonical_address(val) ) + break; + gs_base = val; + return X86EMUL_OKAY; + + case MSR_SHADOW_GS_BASE: + if ( ctxt->addr_size < 64 || !is_canonical_address(val) ) + break; + gs_base_shadow = val; + return X86EMUL_OKAY; + } + + return X86EMUL_UNHANDLEABLE; +} +#endif + #define INVPCID_ADDR 0x12345678 #define INVPCID_PCID 0x123 @@ -1338,6 +1403,41 @@ int main(int argc, char **argv) printf("%u bytes read - ", bytes_read); goto fail; } + printf("okay\n"); + + emulops.write_segment = write_segment; + emulops.write_msr = write_msr; + + printf("%-40s", "Testing swapgs..."); + instr[0] = 0x0f; instr[1] = 0x01; instr[2] = 0xf8; + regs.eip = (unsigned long)&instr[0]; + gs_base = 0xffffeeeecccc8888UL; + gs_base_shadow = 0x0000111122224444UL; + rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_OKAY) || + (regs.eip != (unsigned long)&instr[3]) || + (gs_base != 0x0000111122224444UL) || + (gs_base_shadow != 0xffffeeeecccc8888UL) ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing lkgs 2(%rdx)..."); + instr[0] = 0xf2; instr[1] = 0x0f; instr[2] = 0x00; instr[3] = 0x72; instr[4] = 0x02; + regs.eip = (unsigned long)&instr[0]; + regs.edx = (unsigned long)res; + res[0] = 0x00004444; + res[1] = 0x8888cccc; + i = cpu_policy.extd.nscb; cpu_policy.extd.nscb = true; /* for AMD */ + rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_OKAY) || + (regs.eip != (unsigned long)&instr[5]) || + (gs_base != 0x0000111122224444UL) || + gs_base_shadow ) + goto fail; + + cpu_policy.extd.nscb = i; + emulops.write_segment = NULL; + emulops.write_msr = NULL; #endif printf("okay\n"); --- a/tools/tests/x86_emulator/x86-emulate.c +++ b/tools/tests/x86_emulator/x86-emulate.c @@ -85,6 +85,7 @@ bool emul_test_init(void) cpu_policy.feat.invpcid = true; cpu_policy.feat.adx = true; cpu_policy.feat.rdpid = true; + cpu_policy.feat.lkgs = true; cpu_policy.feat.wrmsrns = true; cpu_policy.extd.clzero = true; --- a/xen/arch/x86/x86_emulate/decode.c +++ b/xen/arch/x86/x86_emulate/decode.c @@ -744,8 +744,12 @@ decode_twobyte(struct x86_emulate_state case 0: s->desc |= DstMem | SrcImplicit | Mov; break; + case 6: + if ( !(s->modrm_reg & 1) && mode_64bit() ) + { case 2: case 4: - s->desc |= SrcMem16; + s->desc |= SrcMem16; + } break; } break; --- a/xen/arch/x86/x86_emulate/private.h +++ b/xen/arch/x86/x86_emulate/private.h @@ -594,6 +594,7 @@ amd_like(const struct x86_emulate_ctxt * #define vcpu_has_avx_vnni() (ctxt->cpuid->feat.avx_vnni) #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16) #define vcpu_has_cmpccxadd() (ctxt->cpuid->feat.cmpccxadd) +#define vcpu_has_lkgs() (ctxt->cpuid->feat.lkgs) #define vcpu_has_wrmsrns() (ctxt->cpuid->feat.wrmsrns) #define vcpu_has_avx_ifma() (ctxt->cpuid->feat.avx_ifma) #define vcpu_has_avx_vnni_int8() (ctxt->cpuid->feat.avx_vnni_int8) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -2871,8 +2871,35 @@ x86_emulate( break; } break; - default: - generate_exception_if(true, X86_EXC_UD); + case 6: /* lkgs */ + generate_exception_if((modrm_reg & 1) || vex.pfx != vex_f2, + X86_EXC_UD); + generate_exception_if(!mode_64bit() || !mode_ring0(), X86_EXC_UD); + vcpu_must_have(lkgs); + fail_if(!ops->read_segment || !ops->read_msr || + !ops->write_segment || !ops->write_msr); + if ( (rc = ops->read_msr(MSR_SHADOW_GS_BASE, &msr_val, + ctxt)) != X86EMUL_OKAY || + (rc = ops->read_segment(x86_seg_gs, &sreg, + ctxt)) != X86EMUL_OKAY ) + goto done; + dst.orig_val = sreg.base; /* Preserve full GS Base. */ + if ( (rc = protmode_load_seg(x86_seg_gs, src.val, false, &sreg, + ctxt, ops)) != X86EMUL_OKAY || + /* Write (32-bit) base into SHADOW_GS. */ + (rc = ops->write_msr(MSR_SHADOW_GS_BASE, sreg.base, + ctxt)) != X86EMUL_OKAY ) + goto done; + sreg.base = dst.orig_val; /* Reinstate full GS Base. */ + if ( (rc = ops->write_segment(x86_seg_gs, &sreg, + ctxt)) != X86EMUL_OKAY ) + { + /* Best effort unwind (i.e. no real error checking). */ + if ( ops->write_msr(MSR_SHADOW_GS_BASE, msr_val, + ctxt) == X86EMUL_EXCEPTION ) + x86_emul_reset_event(ctxt); + goto done; + } break; } break; --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -307,7 +307,10 @@ XEN_CPUFEATURE(CMPCCXADD, 10*32+ 7) / XEN_CPUFEATURE(FZRM, 10*32+10) /*A Fast Zero-length REP MOVSB */ XEN_CPUFEATURE(FSRS, 10*32+11) /*A Fast Short REP STOSB */ XEN_CPUFEATURE(FSRCS, 10*32+12) /*A Fast Short REP CMPSB/SCASB */ +XEN_CPUFEATURE(FRED, 10*32+17) /* Flexible Return and Event Delivery */ +XEN_CPUFEATURE(LKGS, 10*32+18) /*s Load Kernel GS Base */ XEN_CPUFEATURE(WRMSRNS, 10*32+19) /*S WRMSR Non-Serialising */ +XEN_CPUFEATURE(NMI_SRC, 10*32+20) /* NMI-source reporting */ XEN_CPUFEATURE(AMX_FP16, 10*32+21) /* AMX FP16 instruction */ XEN_CPUFEATURE(AVX_IFMA, 10*32+23) /*A AVX-IFMA Instructions */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -274,7 +274,8 @@ def crunch_numbers(state): # superpages, PCID and PKU are only available in 4 level paging. # NO_LMSL indicates the absense of Long Mode Segment Limits, which # have been dropped in hardware. - LM: [CX16, PCID, LAHF_LM, PAGE1GB, PKU, NO_LMSL, AMX_TILE, CMPCCXADD], + LM: [CX16, PCID, LAHF_LM, PAGE1GB, PKU, NO_LMSL, AMX_TILE, CMPCCXADD, + LKGS], # AMD K6-2+ and K6-III processors shipped with 3DNow+, beyond the # standard 3DNow in the earlier K6 processors. @@ -343,6 +344,9 @@ def crunch_numbers(state): # computational instructions. All further AMX features are built on top # of AMX-TILE. AMX_TILE: [AMX_BF16, AMX_INT8, AMX_FP16, AMX_COMPLEX], + + # FRED builds on the LKGS instruction. + LKGS: [FRED], } deep_features = tuple(sorted(deps.keys())) From patchwork Mon Sep 30 11:58:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 13815931 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 87377CF6497 for ; Mon, 30 Sep 2024 11:58:41 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.807312.1218683 (Exim 4.92) (envelope-from ) id 1svF2v-0008Kw-Pg; Mon, 30 Sep 2024 11:58:33 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 807312.1218683; Mon, 30 Sep 2024 11:58:33 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF2v-0008Kp-MQ; Mon, 30 Sep 2024 11:58:33 +0000 Received: by outflank-mailman (input) for mailman id 807312; Mon, 30 Sep 2024 11:58:32 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF2u-0008J1-Oc for xen-devel@lists.xenproject.org; Mon, 30 Sep 2024 11:58:32 +0000 Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [2a00:1450:4864:20::62f]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 501e7f40-7f23-11ef-a0ba-8be0dac302b0; Mon, 30 Sep 2024 13:58:31 +0200 (CEST) Received: by mail-ej1-x62f.google.com with SMTP id a640c23a62f3a-a8d6ac24a3bso793142166b.1 for ; Mon, 30 Sep 2024 04:58:31 -0700 (PDT) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a93c299af9esm519986366b.216.2024.09.30.04.58.30 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 30 Sep 2024 04:58:30 -0700 (PDT) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 501e7f40-7f23-11ef-a0ba-8be0dac302b0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1727697511; x=1728302311; darn=lists.xenproject.org; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=Dktc4kuIueOUcCc4ig9K+pf1aA2fhfDgPtIRsL/YWI4=; b=VDap9SfkSESTZpOulDFsnhJX+/A1VrBAW/LYPk5vClVNKiKeWr6LdgQFufigFJWIvD QBtXF5p+Q6/kdrIPRZiqiB8sIG6C7f7yXgZnAmGyIoOo8yIOM5kWV0P+rVALpb9uILhR XliKCafww5it2/0mPMDJ9oZw3co1XxNwRPxeEp5x7ObFz+zoKhy6c2+fnV3alFIjCfXm 12LnwnYiggSlXRi4uVOJM5LrFCBEwM44tsfuc1pUSHN6vxJnZ9Zr25oVRjoAm2ZIoSpB URwn3vx+L4RHc9udvRTY+QadFbME4ZWaGkLQUNT1iqijaglSuyy7FKfl59RLMNWnSqjA WRpw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727697511; x=1728302311; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Dktc4kuIueOUcCc4ig9K+pf1aA2fhfDgPtIRsL/YWI4=; b=b4nknsJBBODwG9MPYrethxaMRAO3icjA4xgoAAKU22YuxySyFXnhqux3/geLwFf0Xq HsclV2iP0jMZ8X5hxjMPRmADvRmbzHBHiQ/7zzNMOvcDgqDbKl7UBI1zae1axjOD9qS2 8V7UjcShVCovj6SOkCC1S4I6jImNP1UkZ1VJj4wl/H6rzmZkuKL9/imZiBMh3r/V/gfM Xw84H4XKgGAdonSxr4ntZVj3qP5PoN/EYqyqCzC5kgYu/5GxTdzrhYUrfQ4V3VthT+tJ 9uz7iDjshBMlV7otbvea2o/r2pSSnic+rHUDJM158/6iiNbI6cyT+pNZ2tM9SZuX7e5B ZE8w== X-Gm-Message-State: AOJu0YwP6Q35aD3ogywJUHrkBHt9LkVgXrHe4izYGXrI17yJ440AGYRH 4xzWscb1uRejokDe3yqJdVCJacNzmZ0taidKrLf0xYe0QDRThD/pny6jattz0RRhkQvTkjROc/4 = X-Google-Smtp-Source: AGHT+IFgv0LYCY7sAqZeOwYvJkSFMj7c0xeDlYsvMdS7YTyjn4kirgdc6FOcGVV+oJkr/ZtnkOlRPQ== X-Received: by 2002:a17:907:d1a:b0:a8d:592d:f56 with SMTP id a640c23a62f3a-a93b167a9ffmr1588593466b.31.1727697510979; Mon, 30 Sep 2024 04:58:30 -0700 (PDT) Message-ID: <3b8403dd-3b0c-4e00-8e20-be39178f0af6@suse.com> Date: Mon, 30 Sep 2024 13:58:30 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: [PATCH v6 2/5] x86emul+VMX: support {RD,WR}MSRLIST From: Jan Beulich To: "xen-devel@lists.xenproject.org" Cc: Andrew Cooper , Wei Liu , =?utf-8?q?Roger_Pau_Monn=C3=A9?= References: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> Content-Language: en-US Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> These are "compound" instructions to issue a series of RDMSR / WRMSR respectively. In the emulator we can therefore implement them by using the existing msr_{read,write}() hooks. The memory accesses utilize that the HVM ->read() / ->write() hooks are already linear-address (x86_seg_none) aware (by way of hvmemul_virtual_to_linear() handling this case). Preemption is being checked for in WRMSRLIST handling only, as only MSR writes are expected to possibly take long. Signed-off-by: Jan Beulich --- RFC: In vmx_vmexit_handler() handling is forwarded to the emulator blindly. Alternatively we could consult the exit qualification and process just a single MSR at a time (without involving the emulator), exiting back to the guest after every iteration. (I don't think a mix of both models makes a lot of sense.) The precise behavior of MSR_BARRIER is still not spelled out in ISE 050, so the (minimal) implementation continues to be a guess for now. Wouldn't calculate_hvm_max_policy() for MPX better behave the same way as done here, at least from an abstract perspective (assuming that AMD won't add such functionality now that Intel have deprecated it)? --- v6: Use MSR constants in test harness. Re-base. v5: Add missing vmx_init_vmcs_config() and construct_vmcs() adjustments. Avoid unnecessary uses of r(). Re-base. v3: Add dependency on LM. Limit exposure to HVM. Utilize new info from ISE 050. Re-base. v2: Use X86_EXC_*. Add preemption checking to WRMSRLIST handling. Remove the feature from "max" when the VMX counterpart isn't available. --- a/tools/tests/x86_emulator/predicates.c +++ b/tools/tests/x86_emulator/predicates.c @@ -342,6 +342,8 @@ static const struct { { { 0x01, 0xc4 }, { 2, 2 }, F, N }, /* vmxoff */ { { 0x01, 0xc5 }, { 2, 2 }, F, N }, /* pconfig */ { { 0x01, 0xc6 }, { 2, 2 }, F, N }, /* wrmsrns */ + { { 0x01, 0xc6 }, { 0, 2 }, F, W, pfx_f2 }, /* rdmsrlist */ + { { 0x01, 0xc6 }, { 0, 2 }, F, R, pfx_f3 }, /* wrmsrlist */ { { 0x01, 0xc8 }, { 2, 2 }, F, N }, /* monitor */ { { 0x01, 0xc9 }, { 2, 2 }, F, N }, /* mwait */ { { 0x01, 0xca }, { 2, 2 }, F, N }, /* clac */ --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -625,7 +625,7 @@ static int write( if ( verbose ) printf("** %s(%u, %p,, %u,)\n", __func__, seg, (void *)offset, bytes); - if ( !is_x86_user_segment(seg) ) + if ( !is_x86_user_segment(seg) && seg != x86_seg_none ) return X86EMUL_UNHANDLEABLE; memcpy((void *)offset, p_data, bytes); return X86EMUL_OKAY; @@ -717,6 +717,10 @@ static int read_msr( { switch ( reg ) { + case MSR_BARRIER: + *val = 0; + return X86EMUL_OKAY; + case MSR_EFER: *val = ctxt->addr_size > 32 ? EFER_LME | EFER_LMA : 0; return X86EMUL_OKAY; @@ -1434,9 +1438,53 @@ int main(int argc, char **argv) (gs_base != 0x0000111122224444UL) || gs_base_shadow ) goto fail; + printf("okay\n"); cpu_policy.extd.nscb = i; emulops.write_segment = NULL; + + printf("%-40s", "Testing rdmsrlist..."); + instr[0] = 0xf2; instr[1] = 0x0f; instr[2] = 0x01; instr[3] = 0xc6; + regs.rip = (unsigned long)&instr[0]; + regs.rsi = (unsigned long)(res + 0x80); + regs.rdi = (unsigned long)(res + 0x80 + 0x40 * 2); + regs.rcx = 0x0002000100008000UL; + gs_base_shadow = 0x0000222244446666UL; + memset(res + 0x80, ~0, 0x40 * 8 * 2); + res[0x80 + 0x0f * 2] = MSR_GS_BASE; + res[0x80 + 0x0f * 2 + 1] = 0; + res[0x80 + 0x20 * 2] = MSR_SHADOW_GS_BASE; + res[0x80 + 0x20 * 2 + 1] = 0; + res[0x80 + 0x31 * 2] = MSR_BARRIER; + res[0x80 + 0x31 * 2 + 1] = 0; + rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_OKAY) || + (regs.rip != (unsigned long)&instr[4]) || + regs.rcx || + (res[0x80 + (0x40 + 0x0f) * 2] != (unsigned int)gs_base) || + (res[0x80 + (0x40 + 0x0f) * 2 + 1] != (gs_base >> (8 * sizeof(int)))) || + (res[0x80 + (0x40 + 0x20) * 2] != (unsigned int)gs_base_shadow) || + (res[0x80 + (0x40 + 0x20) * 2 + 1] != (gs_base_shadow >> (8 * sizeof(int)))) || + res[0x80 + (0x40 + 0x31) * 2] || res[0x80 + (0x40 + 0x31) * 2 + 1] ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing wrmsrlist..."); + instr[0] = 0xf3; instr[1] = 0x0f; instr[2] = 0x01; instr[3] = 0xc6; + regs.eip = (unsigned long)&instr[0]; + regs.rsi -= 0x11 * 8; + regs.rdi -= 0x11 * 8; + regs.rcx = 0x0002000100000000UL; + res[0x80 + 0x0f * 2] = MSR_SHADOW_GS_BASE; + res[0x80 + 0x20 * 2] = MSR_GS_BASE; + rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_OKAY) || + (regs.rip != (unsigned long)&instr[4]) || + regs.rcx || + (gs_base != 0x0000222244446666UL) || + (gs_base_shadow != 0x0000111122224444UL) ) + goto fail; + emulops.write_msr = NULL; #endif printf("okay\n"); --- a/tools/tests/x86_emulator/x86-emulate.c +++ b/tools/tests/x86_emulator/x86-emulate.c @@ -87,6 +87,7 @@ bool emul_test_init(void) cpu_policy.feat.rdpid = true; cpu_policy.feat.lkgs = true; cpu_policy.feat.wrmsrns = true; + cpu_policy.feat.msrlist = true; cpu_policy.extd.clzero = true; if ( cpu_has_xsave ) --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -745,6 +745,9 @@ static void __init calculate_hvm_max_pol __clear_bit(X86_FEATURE_XSAVES, fs); } + if ( !cpu_has_vmx_msrlist ) + __clear_bit(X86_FEATURE_MSRLIST, fs); + /* * Xen doesn't use PKS, so the guest support for it has opted to not use * the VMCS load/save controls for efficiency reasons. This depends on --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -366,7 +366,8 @@ static int vmx_init_vmcs_config(bool bsp if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS ) { - uint64_t opt = TERTIARY_EXEC_VIRT_SPEC_CTRL; + uint64_t opt = TERTIARY_EXEC_ENABLE_MSRLIST | + TERTIARY_EXEC_VIRT_SPEC_CTRL; _vmx_tertiary_exec_control = adjust_vmx_controls2( "Tertiary Exec Control", 0, opt, @@ -1119,7 +1120,8 @@ static int construct_vmcs(struct vcpu *v v->arch.hvm.vmx.exec_control |= CPU_BASED_RDTSC_EXITING; v->arch.hvm.vmx.secondary_exec_control = vmx_secondary_exec_control; - v->arch.hvm.vmx.tertiary_exec_control = vmx_tertiary_exec_control; + v->arch.hvm.vmx.tertiary_exec_control = vmx_tertiary_exec_control & + ~TERTIARY_EXEC_ENABLE_MSRLIST; /* * Disable features which we don't want active by default: --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -871,6 +871,20 @@ static void cf_check vmx_cpuid_policy_ch else vmx_set_msr_intercept(v, MSR_PKRS, VMX_MSR_RW); + if ( cp->feat.msrlist ) + { + vmx_clear_msr_intercept(v, MSR_BARRIER, VMX_MSR_RW); + v->arch.hvm.vmx.tertiary_exec_control |= TERTIARY_EXEC_ENABLE_MSRLIST; + vmx_update_tertiary_exec_control(v); + } + else if ( v->arch.hvm.vmx.tertiary_exec_control & + TERTIARY_EXEC_ENABLE_MSRLIST ) + { + vmx_set_msr_intercept(v, MSR_BARRIER, VMX_MSR_RW); + v->arch.hvm.vmx.tertiary_exec_control &= ~TERTIARY_EXEC_ENABLE_MSRLIST; + vmx_update_tertiary_exec_control(v); + } + out: vmx_vmcs_exit(v); @@ -3732,6 +3746,22 @@ gp_fault: return X86EMUL_EXCEPTION; } +static bool cf_check is_msrlist( + const struct x86_emulate_state *state, const struct x86_emulate_ctxt *ctxt) +{ + + if ( ctxt->opcode == X86EMUL_OPC(0x0f, 0x01) ) + { + unsigned int rm, reg; + int mode = x86_insn_modrm(state, &rm, ®); + + /* This also includes WRMSRNS; should be okay. */ + return mode == 3 && rm == 6 && !reg; + } + + return false; +} + static void vmx_do_extint(struct cpu_user_regs *regs) { unsigned long vector; @@ -4539,6 +4569,17 @@ void asmlinkage vmx_vmexit_handler(struc } break; + case EXIT_REASON_RDMSRLIST: + case EXIT_REASON_WRMSRLIST: + if ( vmx_guest_x86_mode(v) != 8 || !currd->arch.cpuid->feat.msrlist ) + { + ASSERT_UNREACHABLE(); + hvm_inject_hw_exception(X86_EXC_UD, X86_EVENT_NO_EC); + } + else if ( !hvm_emulate_one_insn(is_msrlist, "MSR list") ) + hvm_inject_hw_exception(X86_EXC_GP, 0); + break; + case EXIT_REASON_VMXOFF: case EXIT_REASON_VMXON: case EXIT_REASON_VMCLEAR: --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -267,6 +267,7 @@ extern u32 vmx_secondary_exec_control; #define TERTIARY_EXEC_EPT_PAGING_WRITE BIT(2, UL) #define TERTIARY_EXEC_GUEST_PAGING_VERIFY BIT(3, UL) #define TERTIARY_EXEC_IPI_VIRT BIT(4, UL) +#define TERTIARY_EXEC_ENABLE_MSRLIST BIT(6, UL) #define TERTIARY_EXEC_VIRT_SPEC_CTRL BIT(7, UL) extern uint64_t vmx_tertiary_exec_control; @@ -391,6 +392,9 @@ extern u64 vmx_ept_vpid_cap; #define cpu_has_vmx_notify_vm_exiting \ (IS_ENABLED(CONFIG_INTEL_VMX) && \ vmx_secondary_exec_control & SECONDARY_EXEC_NOTIFY_VM_EXITING) +#define cpu_has_vmx_msrlist \ + (IS_ENABLED(CONFIG_INTEL_VMX) && \ + (vmx_tertiary_exec_control & TERTIARY_EXEC_ENABLE_MSRLIST)) #define VMCS_RID_TYPE_MASK 0x80000000U --- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h @@ -201,6 +201,8 @@ static inline void pi_clear_sn(struct pi #define EXIT_REASON_XRSTORS 64 #define EXIT_REASON_BUS_LOCK 74 #define EXIT_REASON_NOTIFY 75 +#define EXIT_REASON_RDMSRLIST 78 +#define EXIT_REASON_WRMSRLIST 79 /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */ /* --- a/xen/arch/x86/include/asm/msr-index.h +++ b/xen/arch/x86/include/asm/msr-index.h @@ -24,6 +24,8 @@ #define APIC_BASE_ENABLE (_AC(1, ULL) << 11) #define APIC_BASE_ADDR_MASK _AC(0x000ffffffffff000, ULL) +#define MSR_BARRIER 0x0000002f + #define MSR_TEST_CTRL 0x00000033 #define TEST_CTRL_SPLITLOCK_DETECT (_AC(1, ULL) << 29) #define TEST_CTRL_SPLITLOCK_DISABLE (_AC(1, ULL) << 31) --- a/xen/arch/x86/include/asm/perfc_defn.h +++ b/xen/arch/x86/include/asm/perfc_defn.h @@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions, #ifdef CONFIG_HVM -#define VMX_PERF_EXIT_REASON_SIZE 76 +#define VMX_PERF_EXIT_REASON_SIZE 80 #define VMEXIT_NPF_PERFC 143 #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1) PERFCOUNTER_ARRAY(vmexits, "vmexits", --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -74,6 +74,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t case MSR_AMD_PPIN: goto gp_fault; + case MSR_BARRIER: + if ( !cp->feat.msrlist ) + goto gp_fault; + *val = 0; + break; + case MSR_IA32_FEATURE_CONTROL: /* * Architecturally, availability of this MSR is enumerated by the @@ -347,6 +353,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t uint64_t rsvd; /* Read-only */ + case MSR_BARRIER: case MSR_IA32_PLATFORM_ID: case MSR_CORE_CAPABILITIES: case MSR_INTEL_CORE_THREAD_COUNT: --- a/xen/arch/x86/x86_emulate/0f01.c +++ b/xen/arch/x86/x86_emulate/0f01.c @@ -11,6 +11,7 @@ #include "private.h" #ifdef __XEN__ +#include #include #endif @@ -28,6 +29,7 @@ int x86emul_0f01(struct x86_emulate_stat switch ( s->modrm ) { unsigned long base, limit, cr0, cr0w, cr4; + unsigned int n; struct segment_register sreg; uint64_t msr_val; @@ -42,6 +44,64 @@ int x86emul_0f01(struct x86_emulate_stat ((uint64_t)regs->r(dx) << 32) | regs->eax, ctxt); goto done; + + case vex_f3: /* wrmsrlist */ + vcpu_must_have(msrlist); + generate_exception_if(!mode_64bit(), X86_EXC_UD); + generate_exception_if(!mode_ring0() || (regs->esi & 7) || + (regs->edi & 7), + X86_EXC_GP, 0); + fail_if(!ops->write_msr); + while ( regs->r(cx) ) + { + n = __builtin_ffsl(regs->r(cx)) - 1; + if ( (rc = ops->read(x86_seg_none, regs->r(si) + n * 8, + &msr_val, 8, ctxt)) != X86EMUL_OKAY ) + break; + generate_exception_if(msr_val != (uint32_t)msr_val, + X86_EXC_GP, 0); + base = msr_val; + if ( (rc = ops->read(x86_seg_none, regs->r(di) + n * 8, + &msr_val, 8, ctxt)) != X86EMUL_OKAY || + (rc = ops->write_msr(base, msr_val, ctxt)) != X86EMUL_OKAY ) + break; + regs->r(cx) &= ~(1UL << n); + +#ifdef __XEN__ + if ( regs->r(cx) && local_events_need_delivery() ) + { + rc = X86EMUL_RETRY; + break; + } +#endif + } + goto done; + + case vex_f2: /* rdmsrlist */ + vcpu_must_have(msrlist); + generate_exception_if(!mode_64bit(), X86_EXC_UD); + generate_exception_if(!mode_ring0() || (regs->esi & 7) || + (regs->edi & 7), + X86_EXC_GP, 0); + fail_if(!ops->read_msr || !ops->write); + while ( regs->r(cx) ) + { + n = __builtin_ffsl(regs->r(cx)) - 1; + if ( (rc = ops->read(x86_seg_none, regs->r(si) + n * 8, + &msr_val, 8, ctxt)) != X86EMUL_OKAY ) + break; + generate_exception_if(msr_val != (uint32_t)msr_val, + X86_EXC_GP, 0); + if ( (rc = ops->read_msr(msr_val, &msr_val, + ctxt)) != X86EMUL_OKAY || + (rc = ops->write(x86_seg_none, regs->r(di) + n * 8, + &msr_val, 8, ctxt)) != X86EMUL_OKAY ) + break; + regs->r(cx) &= ~(1UL << n); + } + if ( rc != X86EMUL_OKAY ) + ctxt->regs->r(cx) = regs->r(cx); + goto done; } generate_exception(X86_EXC_UD); --- a/xen/arch/x86/x86_emulate/private.h +++ b/xen/arch/x86/x86_emulate/private.h @@ -597,6 +597,7 @@ amd_like(const struct x86_emulate_ctxt * #define vcpu_has_lkgs() (ctxt->cpuid->feat.lkgs) #define vcpu_has_wrmsrns() (ctxt->cpuid->feat.wrmsrns) #define vcpu_has_avx_ifma() (ctxt->cpuid->feat.avx_ifma) +#define vcpu_has_msrlist() (ctxt->cpuid->feat.msrlist) #define vcpu_has_avx_vnni_int8() (ctxt->cpuid->feat.avx_vnni_int8) #define vcpu_has_avx_ne_convert() (ctxt->cpuid->feat.avx_ne_convert) #define vcpu_has_avx_vnni_int16() (ctxt->cpuid->feat.avx_vnni_int16) --- a/xen/arch/x86/x86_emulate/util.c +++ b/xen/arch/x86/x86_emulate/util.c @@ -100,6 +100,9 @@ bool cf_check x86_insn_is_mem_access(con break; case X86EMUL_OPC(0x0f, 0x01): + /* {RD,WR}MSRLIST */ + if ( mode_64bit() && s->modrm == 0xc6 ) + return s->vex.pfx >= vex_f3; /* Cover CLZERO. */ return (s->modrm_rm & 7) == 4 && (s->modrm_reg & 7) == 7; } @@ -160,7 +163,11 @@ bool cf_check x86_insn_is_mem_write(cons case 0xff: /* Grp5 */ break; - case X86EMUL_OPC(0x0f, 0x01): /* CLZERO is the odd one. */ + case X86EMUL_OPC(0x0f, 0x01): + /* RDMSRLIST */ + if ( mode_64bit() && s->modrm == 0xc6 ) + return s->vex.pfx == vex_f2; + /* CLZERO is another odd one. */ return (s->modrm_rm & 7) == 4 && (s->modrm_reg & 7) == 7; default: --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -313,6 +313,7 @@ XEN_CPUFEATURE(WRMSRNS, 10*32+19) / XEN_CPUFEATURE(NMI_SRC, 10*32+20) /* NMI-source reporting */ XEN_CPUFEATURE(AMX_FP16, 10*32+21) /* AMX FP16 instruction */ XEN_CPUFEATURE(AVX_IFMA, 10*32+23) /*A AVX-IFMA Instructions */ +XEN_CPUFEATURE(MSRLIST, 10*32+27) /*s MSR list instructions */ /* AMD-defined CPU features, CPUID level 0x80000021.eax, word 11 */ XEN_CPUFEATURE(NO_NEST_BP, 11*32+ 0) /*A No Nested Data Breakpoints */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -275,7 +275,7 @@ def crunch_numbers(state): # NO_LMSL indicates the absense of Long Mode Segment Limits, which # have been dropped in hardware. LM: [CX16, PCID, LAHF_LM, PAGE1GB, PKU, NO_LMSL, AMX_TILE, CMPCCXADD, - LKGS], + LKGS, MSRLIST], # AMD K6-2+ and K6-III processors shipped with 3DNow+, beyond the # standard 3DNow in the earlier K6 processors. From patchwork Mon Sep 30 11:59:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 13815932 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 99254CF6499 for ; Mon, 30 Sep 2024 11:59:27 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.807317.1218693 (Exim 4.92) (envelope-from ) id 1svF3g-0000TV-1r; Mon, 30 Sep 2024 11:59:20 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 807317.1218693; Mon, 30 Sep 2024 11:59:20 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF3f-0000TM-VW; Mon, 30 Sep 2024 11:59:19 +0000 Received: by outflank-mailman (input) for mailman id 807317; Mon, 30 Sep 2024 11:59:18 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF3e-0008Ek-CV for xen-devel@lists.xenproject.org; Mon, 30 Sep 2024 11:59:18 +0000 Received: from mail-lj1-x234.google.com (mail-lj1-x234.google.com [2a00:1450:4864:20::234]) by se1-gles-flk1.inumbo.com (Halon) with ESMTPS id 6ae61580-7f23-11ef-99a2-01e77a169b0f; Mon, 30 Sep 2024 13:59:16 +0200 (CEST) Received: by mail-lj1-x234.google.com with SMTP id 38308e7fff4ca-2fad100dd9fso9314411fa.3 for ; Mon, 30 Sep 2024 04:59:16 -0700 (PDT) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a93c2776878sm520190766b.28.2024.09.30.04.59.15 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 30 Sep 2024 04:59:15 -0700 (PDT) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 6ae61580-7f23-11ef-99a2-01e77a169b0f DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1727697556; x=1728302356; darn=lists.xenproject.org; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=ENniFr9qlLBeT3V1gIuC9oU1+s63zDimPcM+gsH4aVE=; b=HrXEKi4QL9vxJFBLUq7RSTsDjyKrvUoJAdtcxvwEiZgTIfeyDcxf43KjGYO12U9Gcq rrx9ny5tvjV6O06u9dHUB0sUZ89rVtHRpVv1sWg2NuoLxx0DR3T+97RSZsmNHrTC8Du0 vPL2mstCNVB0bbXctpGEaD96IUdARQoXtmGlwe9dxTebcmGJa4uSl8UpZzs8x47AUoNs +a78b6C3pJGN9MvOqqwgZNirDrA6KZ1FwFHMSUIYnasp74DZbH6ppc9kML8k3chJWHTe JHC+j+SEEOY5Lbf24wspe8EHQ8mJAb3+FszRTqcYVR7Y1WSuV2P+CE5/UCUtVZokNqPk Hubw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727697556; x=1728302356; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ENniFr9qlLBeT3V1gIuC9oU1+s63zDimPcM+gsH4aVE=; b=jOBzKGH4DTu+wnDR/QENHExl9pcBf2U8/zo8lrVutU0RCU9u8jN1M/wkMr4wFITqrC z3FpYGc5W18qiDmWuTGHT46hnp+FtnUg7hRgRugvhVLp7a2FPC1djf5trF67zrlgrQyb GVUypRmNuOhcGvcZxLcWRy4nHb9csuXSNkQXyEMKhP/+ZfJ5QXCiIMeleKAXnl3haLY1 0aRxY3k01p3/LKRgRpRVYXuZPqYXPPcBQItapSjfS6bSPDMSrPWTpyLnhnNTS8GPi+kZ ZDp2rJnwJ2jXWWaU9NYxALWjkTuaBAa31eCpb2eqePgtkKlFUTKEch5/hfEjR6R6qDTy Ko6A== X-Gm-Message-State: AOJu0YwW6QWBdnaXo1x7a28YfXYEQm/wLPargMIFwG11edApw5/wKqB9 g2AEb2qvDDS+RykSTc89TI77XKHzCQtIgDfnb2wUghg0fJkNddGgr/0O+5gFj9evihTlfYwJa68 = X-Google-Smtp-Source: AGHT+IGhXT8EFRJLUa1azU+EGZPcrwn6e2lD2NySOEp4b4NNLd5sfV2JfWtgJlB9MCcvRkdB3FWL9w== X-Received: by 2002:a2e:f19:0:b0:2f7:4c9d:7a8c with SMTP id 38308e7fff4ca-2f9d3f90e0amr60368041fa.29.1727697555721; Mon, 30 Sep 2024 04:59:15 -0700 (PDT) Message-ID: Date: Mon, 30 Sep 2024 13:59:14 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: [PATCH v6 3/5] x86emul: support USER_MSR instructions From: Jan Beulich To: "xen-devel@lists.xenproject.org" Cc: Andrew Cooper , Wei Liu , =?utf-8?q?Roger_Pau_Monn=C3=A9?= References: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> Content-Language: en-US Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> While UWRMSR probably isn't of much use as long as we don't support UINTR, URDMSR may well be useful to guests even without that (depending on what OSes are willing to permit access to). Since the two VEX encodings introduce a lonely opcode point in map 7, for now don't bother introducing a full 256-entry table. Signed-off-by: Jan Beulich --- The retaining of (possible) #PF from the bitmap access is "speculative" (the spec doesn't mention #PF as a possible exception; conceivably this might also need converting to #GP). I'm a little wary of the "MSRs Writeable by UWRMSR" table that the spec has, and that our code thus also enforces: As new MSRs are added to that table, we'll need piecemeal updates to that switch() statement. --- v6: Add MSR_UINTR_TIMER to header. Use MSR constants in test harness. Re-base. v5: Correct ModR/M.reg check for VEX-encoded forms. Cosmetic test harness adjustment. Re-base. v4: MSR index input regs are 64-bit (albeit only the APX spec has it this way for now). v3: New. --- a/tools/tests/x86_emulator/predicates.c +++ b/tools/tests/x86_emulator/predicates.c @@ -864,7 +864,9 @@ static const struct { { { 0xf6 }, { 2, 2 }, T, R, pfx_66 }, /* adcx */ { { 0xf6 }, { 2, 2 }, T, R, pfx_f3 }, /* adox */ { { 0xf8 }, { 2, 2 }, F, W, pfx_66 }, /* movdir64b */ + { { 0xf8, 0xc0 }, { 0, 2 }, F, N, pfx_f3 }, /* uwrmsr */ { { 0xf8 }, { 2, 2 }, F, W, pfx_f3 }, /* enqcmds */ + { { 0xf8, 0xc0 }, { 0, 2 }, F, N, pfx_f2 }, /* urdmsr */ { { 0xf8 }, { 2, 2 }, F, W, pfx_f2 }, /* enqcmd */ { { 0xf9 }, { 2, 2 }, F, W }, /* movdiri */ }; @@ -1516,6 +1518,9 @@ static const struct vex { { { 0xde }, 3, T, R, pfx_66, W0, L0 }, /* vsm3rnds2 */ { { 0xdf }, 3, T, R, pfx_66, WIG, Ln }, /* vaeskeygenassist */ { { 0xf0 }, 3, T, R, pfx_f2, Wn, L0 }, /* rorx */ +}, vex_map7[] = { + { { 0xf8, 0xc0 }, 6, F, N, pfx_f3, W0, L0 }, /* uwrmsr */ + { { 0xf8, 0xc0 }, 6, F, N, pfx_f2, W0, L0 }, /* urdmsr */ }; static const struct { @@ -1525,6 +1530,10 @@ static const struct { { vex_0f, ARRAY_SIZE(vex_0f) }, { vex_0f38, ARRAY_SIZE(vex_0f38) }, { vex_0f3a, ARRAY_SIZE(vex_0f3a) }, + { NULL, 0 }, /* map 4 */ + { NULL, 0 }, /* map 5 */ + { NULL, 0 }, /* map 6 */ + { vex_map7, ARRAY_SIZE(vex_map7) }, }; static const struct xop { @@ -2425,7 +2434,8 @@ void predicates_test(void *instr, struct if ( vex[x].tbl[t].w == WIG || (vex[x].tbl[t].w & W0) ) { - memcpy(ptr, vex[x].tbl[t].opc, vex[x].tbl[t].len); + memcpy(ptr, vex[x].tbl[t].opc, + MIN(vex[x].tbl[t].len, ARRAY_SIZE(vex->tbl->opc))); if ( vex[x].tbl[t].l == LIG || (vex[x].tbl[t].l & L0) ) do_test(instr, vex[x].tbl[t].len + ((void *)ptr - instr), @@ -2435,7 +2445,8 @@ void predicates_test(void *instr, struct if ( vex[x].tbl[t].l == LIG || (vex[x].tbl[t].l & L1) ) { ptr[-1] |= 4; - memcpy(ptr, vex[x].tbl[t].opc, vex[x].tbl[t].len); + memcpy(ptr, vex[x].tbl[t].opc, + MIN(vex[x].tbl[t].len, ARRAY_SIZE(vex->tbl->opc))); do_test(instr, vex[x].tbl[t].len + ((void *)ptr - instr), vex[x].tbl[t].modrm ? (void *)ptr - instr + 1 : 0, @@ -2446,7 +2457,8 @@ void predicates_test(void *instr, struct if ( vex[x].tbl[t].w == WIG || (vex[x].tbl[t].w & W1) ) { ptr[-1] = 0xf8 | vex[x].tbl[t].pfx; - memcpy(ptr, vex[x].tbl[t].opc, vex[x].tbl[t].len); + memcpy(ptr, vex[x].tbl[t].opc, + MIN(vex[x].tbl[t].len, ARRAY_SIZE(vex->tbl->opc))); if ( vex[x].tbl[t].l == LIG || (vex[x].tbl[t].l & L0) ) do_test(instr, vex[x].tbl[t].len + ((void *)ptr - instr), @@ -2456,7 +2468,8 @@ void predicates_test(void *instr, struct if ( vex[x].tbl[t].l == LIG || (vex[x].tbl[t].l & L1) ) { ptr[-1] |= 4; - memcpy(ptr, vex[x].tbl[t].opc, vex[x].tbl[t].len); + memcpy(ptr, vex[x].tbl[t].opc, + MIN(vex[x].tbl[t].len, ARRAY_SIZE(vex->tbl->opc))); do_test(instr, vex[x].tbl[t].len + ((void *)ptr - instr), vex[x].tbl[t].modrm ? (void *)ptr - instr + 1 : 0, --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -674,6 +674,7 @@ static int blk( #ifdef __x86_64__ static unsigned long gs_base, gs_base_shadow; +static unsigned long uintr_timer; #endif static int read_segment( @@ -708,6 +709,15 @@ static int write_segment( return X86EMUL_OKAY; } + +static const uint8_t __attribute__((aligned(0x1000))) umsr_bitmap[0x1000] = { +#define RD(msr) [(msr) >> 3] = 1 << ((msr) & 7) +#define WR(msr) [0x800 + ((msr) >> 3)] = 1 << ((msr) & 7) + RD(MSR_IA32_APERF), + WR(MSR_UINTR_TIMER), +#undef WR +#undef RD +}; #endif static int read_msr( @@ -717,10 +727,22 @@ static int read_msr( { switch ( reg ) { +#ifdef __x86_64__ + case MSR_USER_MSR_CTL: + *val = (unsigned long)umsr_bitmap | 1; + return X86EMUL_OKAY; +#endif + case MSR_BARRIER: *val = 0; return X86EMUL_OKAY; + case MSR_IA32_APERF: +#define APERF_LO_VALUE 0xAEAEAEAE +#define APERF_HI_VALUE 0xEAEAEAEA + *val = ((uint64_t)APERF_HI_VALUE << 32) | APERF_LO_VALUE; + return X86EMUL_OKAY; + case MSR_EFER: *val = ctxt->addr_size > 32 ? EFER_LME | EFER_LMA : 0; return X86EMUL_OKAY; @@ -756,6 +778,12 @@ static int write_msr( { switch ( reg ) { + case MSR_UINTR_TIMER: + if ( ctxt->addr_size < 64 ) + break; + uintr_timer = val; + return X86EMUL_OKAY; + case MSR_GS_BASE: if ( ctxt->addr_size < 64 || !is_canonical_address(val) ) break; @@ -1484,6 +1512,63 @@ int main(int argc, char **argv) (gs_base != 0x0000222244446666UL) || (gs_base_shadow != 0x0000111122224444UL) ) goto fail; + printf("okay\n"); + + printf("%-40s", "Testing urdmsr %rdx,%rcx..."); + instr[0] = 0xf2; instr[1] = 0x0f; instr[2] = 0x38; instr[3] = 0xf8; instr[4] = 0xd1; + regs.rip = (unsigned long)&instr[0]; + regs.rdx = 0x000000e8UL; /* APERF */ + rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_OKAY) || + (regs.rip != (unsigned long)&instr[5]) || + (regs.rcx != (((uint64_t)APERF_HI_VALUE << 32) | APERF_LO_VALUE)) ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing urdmsr $0xe8,%rdx..."); + instr[0] = 0xc4; instr[1] = 0xe7; instr[2] = 0x7b; instr[3] = 0xf8; instr[4] = 0xc2; + instr[5] = 0xe8; instr[6] = 0; instr[7] = 0; instr[8] = 0; + regs.rip = (unsigned long)&instr[0]; + rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_OKAY) || + (regs.rip != (unsigned long)&instr[9]) || + (regs.rdx != (((uint64_t)APERF_HI_VALUE << 32) | APERF_LO_VALUE)) ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing uwrmsr %rdi,%rsi..."); + instr[0] = 0xf3; instr[1] = 0x0f; instr[2] = 0x38; instr[3] = 0xf8; instr[4] = 0xf7; + regs.rip = (unsigned long)&instr[0]; + regs.rsi = 0x00001b00UL; /* UINTR_TIMER */ + regs.rdi = 0x0011223344556677UL; + rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_OKAY) || + (regs.rip != (unsigned long)&instr[5]) || + (uintr_timer != 0x0011223344556677UL) ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing uwrmsr %rsi,$0x1b00..."); + instr[0] = 0xc4; instr[1] = 0xe7; instr[2] = 0x7a; instr[3] = 0xf8; instr[4] = 0xc6; + instr[5] = 0x00; instr[6] = 0x1b; instr[7] = 0; instr[8] = 0; + regs.rip = (unsigned long)&instr[0]; + regs.rsi = 0x8877665544332211UL; + rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_OKAY) || + (regs.rip != (unsigned long)&instr[9]) || + (uintr_timer != 0x8877665544332211UL) ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing uwrmsr %rsi,$0x1b01..."); + instr[5] = 0x01; /* UARCH_MISC_CTL (derived from UINTR_TIMER) */ + regs.rip = (unsigned long)&instr[0]; + regs.rsi = 0; + rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_EXCEPTION) || + (regs.rip != (unsigned long)&instr[0]) || + (uintr_timer != 0x8877665544332211UL) ) + goto fail; emulops.write_msr = NULL; #endif --- a/xen/arch/x86/include/asm/msr-index.h +++ b/xen/arch/x86/include/asm/msr-index.h @@ -24,6 +24,10 @@ #define APIC_BASE_ENABLE (_AC(1, ULL) << 11) #define APIC_BASE_ADDR_MASK _AC(0x000ffffffffff000, ULL) +#define MSR_USER_MSR_CTL 0x0000001c +#define USER_MSR_ENABLE (_AC(1, ULL) << 0) +#define USER_MSR_ADDR_MASK 0xfffffffffffff000ULL + #define MSR_BARRIER 0x0000002f #define MSR_TEST_CTRL 0x00000033 @@ -192,6 +196,8 @@ #define MCU_CONTROL_DIS_MCU_LOAD (_AC(1, ULL) << 1) #define MCU_CONTROL_EN_SMM_BYPASS (_AC(1, ULL) << 2) +#define MSR_UINTR_TIMER 0x00001b00 + #define MSR_UARCH_MISC_CTRL 0x00001b01 #define UARCH_CTRL_DOITM (_AC(1, ULL) << 0) --- a/xen/arch/x86/x86_emulate/decode.c +++ b/xen/arch/x86/x86_emulate/decode.c @@ -903,7 +903,7 @@ decode_0f38(struct x86_emulate_state *s, { case 0x00 ... 0xef: case 0xf2 ... 0xf5: - case 0xf7 ... 0xf8: + case 0xf7: case 0xfa ... 0xff: s->op_bytes = 0; /* fall through */ @@ -948,6 +948,18 @@ decode_0f38(struct x86_emulate_state *s, case X86EMUL_OPC_VEX_F2(0, 0xf7): /* shrx */ break; + case 0xf8: + if ( s->modrm_mod == 3 ) /* u{rd,wr}msr */ + { + s->desc = DstMem | SrcReg | Mov; + s->op_bytes = 8; + s->simd_size = simd_none; + } + else /* movdir64b / enqcmd{,s} */ + s->op_bytes = 0; + ctxt->opcode |= MASK_INSR(s->vex.pfx, X86EMUL_OPC_PFX_MASK); + break; + default: s->op_bytes = 0; break; @@ -1246,6 +1258,16 @@ int x86emul_decode(struct x86_emulate_st */ d = twobyte_table[0x38].desc; break; + + case vex_map7: + opcode |= MASK_INSR(7, X86EMUL_OPC_EXT_MASK); + /* + * No table lookup here for now, as there's only a single + * opcode point (0xf8) populated in map 7. + */ + d = DstMem | SrcImm | ModRM | Mov; + s->op_bytes = 8; + break; } } else if ( s->ext < ext_8f08 + ARRAY_SIZE(xop_table) ) @@ -1600,6 +1622,7 @@ int x86emul_decode(struct x86_emulate_st s->simd_size = ext8f09_table[b].simd_size; break; + case ext_map7: case ext_8f08: case ext_8f0a: /* @@ -1814,6 +1837,7 @@ int x86emul_decode(struct x86_emulate_st case ext_map5: case ext_map6: + case ext_map7: case ext_8f09: case ext_8f0a: break; --- a/xen/arch/x86/x86_emulate/private.h +++ b/xen/arch/x86/x86_emulate/private.h @@ -189,6 +189,7 @@ enum vex_opcx { vex_0f3a, evex_map5 = 5, evex_map6, + vex_map7, }; enum vex_pfx { @@ -245,6 +246,7 @@ struct x86_emulate_state { ext_0f3a = vex_0f3a, ext_map5 = evex_map5, ext_map6 = evex_map6, + ext_map7 = vex_map7, /* * For XOP use values such that the respective instruction field * can be used without adjustment. --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -7029,10 +7029,67 @@ x86_emulate( state->simd_size = simd_none; break; - case X86EMUL_OPC_F2(0x0f38, 0xf8): /* enqcmd r,m512 */ - case X86EMUL_OPC_F3(0x0f38, 0xf8): /* enqcmds r,m512 */ + case X86EMUL_OPC_F2(0x0f38, 0xf8): /* enqcmd r,m512 / urdmsr r32,r64 */ + case X86EMUL_OPC_F3(0x0f38, 0xf8): /* enqcmds r,m512 / uwrmsr r64,r32 */ + if ( ea.type == OP_MEM ) + goto enqcmd; + imm1 = src.val; + /* fall through */ + case X86EMUL_OPC_VEX_F2(7, 0xf8): /* urdmsr imm32,r64 */ + case X86EMUL_OPC_VEX_F3(7, 0xf8): /* uwrmsr r64,imm32 */ + generate_exception_if(!mode_64bit() || ea.type != OP_REG, X86_EXC_UD); + generate_exception_if(vex.l || vex.w, X86_EXC_UD); + generate_exception_if(vex.opcx && ((modrm_reg & 7) || vex.reg != 0xf), + X86_EXC_UD); + fail_if(!ops->read_msr); + if ( ops->read_msr(MSR_USER_MSR_CTL, &msr_val, ctxt) != X86EMUL_OKAY ) + { + x86_emul_reset_event(ctxt); + msr_val = 0; + } + generate_exception_if(!(msr_val & USER_MSR_ENABLE), X86_EXC_UD); + generate_exception_if(imm1 & ~0x3fff, X86_EXC_GP, 0); + + /* Check the corresponding bitmap. */ + ea.mem.off = msr_val & ~0xfff; + if ( vex.pfx != vex_f2 ) + ea.mem.off += 0x800; + ea.mem.off += imm1 >> 3; + if ( (rc = ops->read(x86_seg_sys, ea.mem.off, &b, 1, + ctxt)) != X86EMUL_OKAY ) + goto done; + generate_exception_if(!(b & (1 << (imm1 & 7))), X86_EXC_GP, 0); + + /* Carry out the actual MSR access. */ + if ( vex.pfx == vex_f2 ) + { + /* urdmsr */ + if ( (rc = ops->read_msr(imm1, &msr_val, ctxt)) != X86EMUL_OKAY ) + goto done; + dst.val = msr_val; + ASSERT(dst.type == OP_REG); + dst.bytes = 8; + } + else + { + /* uwrmsr */ + switch ( imm1 ) + { + case 0x1b00: /* UINTR_TIMER */ + case 0x1b01: /* UARCH_MISC_CTL */ + break; + default: + generate_exception(X86_EXC_GP, 0); + } + fail_if(!ops->write_msr); + if ( (rc = ops->write_msr(imm1, dst.val, ctxt)) != X86EMUL_OKAY ) + goto done; + dst.type = OP_NONE; + } + break; + + enqcmd: host_and_vcpu_must_have(enqcmd); - generate_exception_if(ea.type != OP_MEM, X86_EXC_UD); generate_exception_if(vex.pfx != vex_f2 && !mode_ring0(), X86_EXC_GP, 0); src.val = truncate_ea(*dst.reg); generate_exception_if(!is_aligned(x86_seg_es, src.val, 64, ctxt, ops), --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -350,6 +350,7 @@ XEN_CPUFEATURE(AVX_NE_CONVERT, 15*32 XEN_CPUFEATURE(AMX_COMPLEX, 15*32+ 8) /* AMX Complex Instructions */ XEN_CPUFEATURE(AVX_VNNI_INT16, 15*32+10) /*A AVX-VNNI-INT16 Instructions */ XEN_CPUFEATURE(PREFETCHI, 15*32+14) /*A PREFETCHIT{0,1} Instructions */ +XEN_CPUFEATURE(USER_MSR, 15*32+15) /* U{RD,WR}MSR Instructions */ XEN_CPUFEATURE(CET_SSS, 15*32+18) /* CET Supervisor Shadow Stacks safe to use */ /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.eax, word 16 */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -275,7 +275,7 @@ def crunch_numbers(state): # NO_LMSL indicates the absense of Long Mode Segment Limits, which # have been dropped in hardware. LM: [CX16, PCID, LAHF_LM, PAGE1GB, PKU, NO_LMSL, AMX_TILE, CMPCCXADD, - LKGS, MSRLIST], + LKGS, MSRLIST, USER_MSR], # AMD K6-2+ and K6-III processors shipped with 3DNow+, beyond the # standard 3DNow in the earlier K6 processors. From patchwork Mon Sep 30 11:59:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 13815933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 62924CF6499 for ; Mon, 30 Sep 2024 11:59:49 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.807320.1218702 (Exim 4.92) (envelope-from ) id 1svF43-0000yS-EB; Mon, 30 Sep 2024 11:59:43 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 807320.1218702; Mon, 30 Sep 2024 11:59:43 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF43-0000yL-Bh; Mon, 30 Sep 2024 11:59:43 +0000 Received: by outflank-mailman (input) for mailman id 807320; Mon, 30 Sep 2024 11:59:42 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF42-0008Ek-Eh for xen-devel@lists.xenproject.org; Mon, 30 Sep 2024 11:59:42 +0000 Received: from mail-ed1-x531.google.com (mail-ed1-x531.google.com [2a00:1450:4864:20::531]) by se1-gles-flk1.inumbo.com (Halon) with ESMTPS id 794d64cb-7f23-11ef-99a2-01e77a169b0f; Mon, 30 Sep 2024 13:59:40 +0200 (CEST) Received: by mail-ed1-x531.google.com with SMTP id 4fb4d7f45d1cf-5c40aea5c40so7768290a12.0 for ; Mon, 30 Sep 2024 04:59:40 -0700 (PDT) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5c882405127sm4417642a12.5.2024.09.30.04.59.39 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 30 Sep 2024 04:59:40 -0700 (PDT) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 794d64cb-7f23-11ef-99a2-01e77a169b0f DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1727697580; x=1728302380; darn=lists.xenproject.org; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=vIEJMiqmSBTZ/FiAO2g/0mH+P7xBgHAYPgSV3v/hQKo=; b=GT9mfQ3V2Eo4hLxa+M9Wi+Sse5En5cfGjbB0BlN7FUM8W43lPlOhgFc3TwyiKi5rVQ ayFk4xlSLSvkxfhIvFyferO64DhrgqZucFibbh2gfU1fY5HuPZ3YoNWtEzJtGwDHPv4h dRsa1Y/zhp5XF2sR9bPRIR1MoFeOpMppwN6IMVpVbGE2w6KTIr50Rjp/xWQA8HbKDu9P +4sbXRqKp9xR58VGthe5qhQgqeIJKmiPO8Dl/n6DVHAvfpEGwf2ndbu7/CW0P1gI/w+k v6PLtXviLjN3D8AMxbIubaMBmtQrrJJL7u1sIW1Pgwq0uytJuvPR+TKFZ4+P19rMj1Xg x/xw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727697580; x=1728302380; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=vIEJMiqmSBTZ/FiAO2g/0mH+P7xBgHAYPgSV3v/hQKo=; b=txLm7n5xtNKUaI3Irg3yraYljcYJ1Z0xlCfyTW5bgEum1SoWi8Xyg+tQ0u4mLWv3rW ns9tzP4hnd1ltxxMvM4+pZ5Cl3FVspvfY8rCbIC475STzf7sfhe/UULXcC+Uku5r+Cvj VOKceqOnQrp4PB9u+vKHPCD46b9bXUhLcEz9/B+UZP6hw4CcQDBbbxYitAWeNdLdRe7m LqvIXfHhdeSgmjuoWtZS75CkbegJ9ijhlj1DZJZBCAkhNhB9GKxpWFvR5D0IBt3jeDwd 3riQgxoXwEZfdbPPZYSyGRRpEdiCXTYn1B9NwLrwmKI2lYkSc7vGeACu3BT40ZOTSSSy zN7Q== X-Gm-Message-State: AOJu0Yyk9tdecYe74jfCMtC8se1mcCQ78SA+lBvprWYa8KCz4Bc0gW+t 8STALdTULJ6Aes15NMbFvro+HjKyLEGWBOCs52MgVJwazvcZgMghIFy+JpKMgmFAsTWQjFFbZls = X-Google-Smtp-Source: AGHT+IFFXsK+mXcA0Bqcml1G+O48fs/M7FJDVXEFiIfS5lnYrOedoKnMtIKgMqWAenOV1BTwEcIDFw== X-Received: by 2002:a05:6402:500c:b0:5c2:5f31:8888 with SMTP id 4fb4d7f45d1cf-5c8777e7944mr19342402a12.15.1727697580318; Mon, 30 Sep 2024 04:59:40 -0700 (PDT) Message-ID: <0e31ed35-1dd2-433d-9c15-b851afd3bc0f@suse.com> Date: Mon, 30 Sep 2024 13:59:39 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: [PATCH v6 4/5] x86/cpu-policy: re-arrange no-VMX logic From: Jan Beulich To: "xen-devel@lists.xenproject.org" Cc: Andrew Cooper , Wei Liu , =?utf-8?q?Roger_Pau_Monn=C3=A9?= References: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> Content-Language: en-US Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> Move the PKS check into an "else" for the corresponding "if()", such that further adjustments (like for USER_MSR) can easily be put there as well. Signed-off-by: Jan Beulich --- v5: Re-base. v4: New. --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -744,19 +744,20 @@ static void __init calculate_hvm_max_pol if ( !cpu_has_vmx_xsaves ) __clear_bit(X86_FEATURE_XSAVES, fs); } + else + { + /* + * Xen doesn't use PKS, so the guest support for it has opted to not use + * the VMCS load/save controls for efficiency reasons. This depends on + * the exact vmentry/exit behaviour, so don't expose PKS in other + * situations until someone has cross-checked the behaviour for safety. + */ + __clear_bit(X86_FEATURE_PKS, fs); + } if ( !cpu_has_vmx_msrlist ) __clear_bit(X86_FEATURE_MSRLIST, fs); - /* - * Xen doesn't use PKS, so the guest support for it has opted to not use - * the VMCS load/save controls for efficiency reasons. This depends on - * the exact vmentry/exit behaviour, so don't expose PKS in other - * situations until someone has cross-checked the behaviour for safety. - */ - if ( !cpu_has_vmx ) - __clear_bit(X86_FEATURE_PKS, fs); - /* * Make adjustments to possible (nested) virtualization features exposed * to the guest From patchwork Mon Sep 30 12:00:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 13815934 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0D2B5CF64A0 for ; Mon, 30 Sep 2024 12:00:34 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.807328.1218713 (Exim 4.92) (envelope-from ) id 1svF4i-0002VQ-RX; Mon, 30 Sep 2024 12:00:24 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 807328.1218713; Mon, 30 Sep 2024 12:00:24 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF4i-0002VJ-No; Mon, 30 Sep 2024 12:00:24 +0000 Received: by outflank-mailman (input) for mailman id 807328; Mon, 30 Sep 2024 12:00:23 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1svF4h-0000sV-Ag for xen-devel@lists.xenproject.org; Mon, 30 Sep 2024 12:00:23 +0000 Received: from mail-ed1-x530.google.com (mail-ed1-x530.google.com [2a00:1450:4864:20::530]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 925ac380-7f23-11ef-a0ba-8be0dac302b0; Mon, 30 Sep 2024 14:00:22 +0200 (CEST) Received: by mail-ed1-x530.google.com with SMTP id 4fb4d7f45d1cf-5c896b9b4e0so1776968a12.3 for ; Mon, 30 Sep 2024 05:00:22 -0700 (PDT) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a93c294747asm517006966b.107.2024.09.30.05.00.21 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 30 Sep 2024 05:00:21 -0700 (PDT) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 925ac380-7f23-11ef-a0ba-8be0dac302b0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1727697622; x=1728302422; darn=lists.xenproject.org; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=VQlv8ySD1zPxJt3KBwvQyjwVCyff2lekIKVwRPdVFo4=; b=FBXRPKtdugw9SdRfDorbEptlzc31v6iorB6lYHkbNk4zeauvS/MKe4jTs/W/MeeBke Dff6MboCoEtgGYhUQsEtpLTYm8I2uFsitHARACMuCdlg27RymcFt9wp9DJ9rvS1hQFg1 DA1Vrvf62Vo/PnYEG8s/Th2ybdnXIalh2jnc904NclQlNxzPzK/xrB7GtS4rFvK/EzOb 9IFXeqkVQG/41xofvE80tKpmh1JMhIoLdNgUUs3WOCvo5fTt+bRhQ8nLc9Rdx/ajPKWS sXtnRcyUhtewOcRHDoGvovN/hFlieVE47DwHEAzoszySwhRV70aZjehaoZWjFJvKYJub UGNA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727697622; x=1728302422; h=content-transfer-encoding:in-reply-to:autocrypt:content-language :references:cc:to:from:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=VQlv8ySD1zPxJt3KBwvQyjwVCyff2lekIKVwRPdVFo4=; b=vYY5YqqHS+8kKKXqILZf+xbGx36NyPIhyKDoiQ+WXpbg5XfctVji2jvY/V/hoJEZ8y bgQ59hPWvJDUngKRkuxt+T9isSHQmakU29w9KDw2TkcrCBfGn66ltKaBP3WFfYRqIwHA 1yKW0lZkrxy8bC96+7KxvFnr+vtIv8EN54k2WQsfEqPkYYotRxFE8xZzlZYTk/VLlOxA ONwwYCzegrs3PiFNBUKmjcUq7Sms/8ftn4e054/ofaWbFtfcitceJgFUs/yz6p/rJOYR kQcL7rFvLA2F8jOc2wZNnngw4salcjazlKKlAC31hn7ILnwRQzmpSpp7BOqC3TVI7NUg QrnQ== X-Gm-Message-State: AOJu0YxB54t+6S+QcIE96EClXcuMbMnujTAqABC3i84/w4Nwyi3RyjSv w5COg45SyI5USa0NhCxQ1ff/eKwldj2yKGBVM3OuCVGJ1JJiQmgPyOA1Lm9ZX280Vu783NBIhGU = X-Google-Smtp-Source: AGHT+IE46kNwJDJsaWaz6Lwvn0Di06EHabrBIgPllyZE6Jaf1zOo1ooI7RLGHAHAVHL1+8tIK7cnjQ== X-Received: by 2002:a17:907:9346:b0:a8a:9054:8394 with SMTP id a640c23a62f3a-a93c47d88e8mr1440879966b.0.1727697622021; Mon, 30 Sep 2024 05:00:22 -0700 (PDT) Message-ID: <4341e6c9-e0b8-4d2b-9547-e02e51e60f09@suse.com> Date: Mon, 30 Sep 2024 14:00:21 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: [PATCH v6 5/5] VMX: support USER_MSR From: Jan Beulich To: "xen-devel@lists.xenproject.org" Cc: Andrew Cooper , Wei Liu , =?utf-8?q?Roger_Pau_Monn=C3=A9?= References: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> Content-Language: en-US Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: <88b00dbf-b095-4ce2-9365-2a195ca0d65c@suse.com> Hook up the new VM exit codes and handle guest accesses, context switch, and save/restore. At least for now don't allow the guest direct access to the control MSR; this may need changing if guests were to frequently access it (e.g. on their own context switch path). While there also correct a one-off in union ldt_or_tr_instr_info's comment. Signed-off-by: Jan Beulich --- Needing to change two places in hvm.c continues to be unhelpful; I recall I already did forget to also adjust hvm_load_cpu_msrs() for XFD. Considering that MSRs typically arrive in the order the table has it, couldn't we incrementally look up the incoming MSR index there, falling back to a full lookup only when the incremental lookup failed (and thus not normally re-iterating through the initial part of the array)? Said comment in union ldt_or_tr_instr_info is further odd (same for union gdt_or_idt_instr_info's) in that Instruction Information is only a 32-bit field. Hence bits 32-63 aren't undefined, but simply don't exist. RFC: The wee attempt to "deal" with nested is likely wrong, but I'm afraid I simply don't know where such enforcement would be done properly. Returning an error there is also commented out, for domain_cpu_policy_changed() returning void without "x86/xstate: re-size save area when CPUID policy changes" in place. --- v5: Introduce user_msr_gpr(). v4: New. --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -753,6 +753,12 @@ static void __init calculate_hvm_max_pol * situations until someone has cross-checked the behaviour for safety. */ __clear_bit(X86_FEATURE_PKS, fs); + + /* + * Don't expose USER_MSR until it is known how (if at all) it is + * virtualized on SVM. + */ + __clear_bit(X86_FEATURE_USER_MSR, fs); } if ( !cpu_has_vmx_msrlist ) --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -447,6 +447,10 @@ void domain_cpu_policy_changed(struct do } } + /* Nested doesn't have the necessary processing, yet. */ + if ( nestedhvm_enabled(d) && p->feat.user_msr ) + return /* -EINVAL */; + for_each_vcpu ( d, v ) { cpu_policy_updated(v); --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1370,6 +1370,7 @@ static int cf_check hvm_load_cpu_xsave_s #define HVM_CPU_MSR_SIZE(cnt) offsetof(struct hvm_msr, msr[cnt]) static const uint32_t msrs_to_send[] = { + MSR_USER_MSR_CTL, MSR_SPEC_CTRL, MSR_INTEL_MISC_FEATURES_ENABLES, MSR_PKRS, @@ -1524,6 +1525,7 @@ static int cf_check hvm_load_cpu_msrs(st { int rc; + case MSR_USER_MSR_CTL: case MSR_SPEC_CTRL: case MSR_INTEL_MISC_FEATURES_ENABLES: case MSR_PKRS: --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -676,13 +676,18 @@ static void cf_check vmx_vcpu_destroy(st } /* - * To avoid MSR save/restore at every VM exit/entry time, we restore - * the x86_64 specific MSRs at domain switch time. Since these MSRs - * are not modified once set for para domains, we don't save them, - * but simply reset them to values set in percpu_traps_init(). + * To avoid MSR save/restore at every VM exit/entry time, we restore the + * x86_64 specific MSRs at vcpu switch time. Since these MSRs are not + * modified once set for para domains, we don't save them, but simply clear + * them or reset them to values set in percpu_traps_init(). */ -static void vmx_restore_host_msrs(void) +static void vmx_restore_host_msrs(const struct vcpu *v) { + const struct vcpu_msrs *msrs = v->arch.msrs; + + if ( msrs->user_msr_ctl.enable ) + wrmsrl(MSR_USER_MSR_CTL, 0); + /* No PV guests? No need to restore host SYSCALL infrastructure. */ if ( !IS_ENABLED(CONFIG_PV) ) return; @@ -736,6 +741,9 @@ static void vmx_restore_guest_msrs(struc if ( cp->feat.pks ) wrpkrs(msrs->pkrs); + + if ( msrs->user_msr_ctl.enable ) + wrmsrl(MSR_USER_MSR_CTL, msrs->user_msr_ctl.raw); } void vmx_update_cpu_exec_control(struct vcpu *v) @@ -1178,7 +1186,7 @@ static void cf_check vmx_ctxt_switch_fro if ( !v->arch.fully_eager_fpu ) vmx_fpu_leave(v); vmx_save_guest_msrs(v); - vmx_restore_host_msrs(); + vmx_restore_host_msrs(v); vmx_save_dr(v); if ( v->domain->arch.hvm.pi_ops.flags & PI_CSW_FROM ) @@ -4080,6 +4088,14 @@ static int vmx_handle_apic_write(void) return vlapic_apicv_write(current, exit_qualification & 0xfff); } +static unsigned int user_msr_gpr(void) +{ + user_msr_instr_info_t info; + + __vmread(VMX_INSTRUCTION_INFO, &info.raw); + return info.gpr; +} + static void undo_nmis_unblocked_by_iret(void) { unsigned long guest_info; @@ -4580,6 +4596,41 @@ void asmlinkage vmx_vmexit_handler(struc hvm_inject_hw_exception(X86_EXC_GP, 0); break; + case EXIT_REASON_URDMSR: + { + uint64_t msr_content = 0; + + __vmread(EXIT_QUALIFICATION, &exit_qualification); + switch ( hvm_msr_read_intercept(exit_qualification, &msr_content) ) + { + case X86EMUL_OKAY: + *decode_gpr(regs, user_msr_gpr()) = msr_content; + update_guest_eip(); /* Safe: URDMSR */ + break; + + case X86EMUL_EXCEPTION: + hvm_inject_hw_exception(X86_EXC_GP, 0); + break; + } + break; + } + + case EXIT_REASON_UWRMSR: + __vmread(EXIT_QUALIFICATION, &exit_qualification); + switch ( hvm_msr_write_intercept(exit_qualification, + *decode_gpr(regs, user_msr_gpr()), + true) ) + { + case X86EMUL_OKAY: + update_guest_eip(); /* Safe: UWRMSR */ + break; + + case X86EMUL_EXCEPTION: + hvm_inject_hw_exception(X86_EXC_GP, 0); + break; + } + break; + case EXIT_REASON_VMXOFF: case EXIT_REASON_VMXON: case EXIT_REASON_VMCLEAR: --- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h @@ -203,6 +203,8 @@ static inline void pi_clear_sn(struct pi #define EXIT_REASON_NOTIFY 75 #define EXIT_REASON_RDMSRLIST 78 #define EXIT_REASON_WRMSRLIST 79 +#define EXIT_REASON_URDMSR 80 +#define EXIT_REASON_UWRMSR 81 /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */ /* @@ -674,8 +676,18 @@ typedef union ldt_or_tr_instr_info { base_reg_invalid :1, /* bit 27 - Base register invalid */ instr_identity :1, /* bit 28 - 0:LDT, 1:TR */ instr_write :1, /* bit 29 - 0:store, 1:load */ - :34; /* bits 31:63 - Undefined */ + :34; /* bits 30:63 - Undefined */ }; } ldt_or_tr_instr_info_t; +/* VM-Exit instruction info for URDMSR and UWRMSR */ +typedef union user_msr_instr_info { + unsigned long raw; + struct { + unsigned int :3, /* Bits 0:2 - Undefined */ + gpr :4, /* Bits 3:6 - Source/Destination register */ + :25; /* bits 7:31 - Undefined */ + }; +} user_msr_instr_info_t; + #endif /* __ASM_X86_HVM_VMX_VMX_H__ */ --- a/xen/arch/x86/include/asm/msr.h +++ b/xen/arch/x86/include/asm/msr.h @@ -296,6 +296,20 @@ uint64_t msr_spec_ctrl_valid_bits(const struct vcpu_msrs { /* + * 0x0000001c - MSR_USER_MSR_CTL + * + * Value is guest chosen, and always loaded in vcpu context. + */ + union { + uint64_t raw; + struct { + bool enable:1; + unsigned int :11; + unsigned long bitmap:52; + }; + } user_msr_ctl; + + /* * 0x00000048 - MSR_SPEC_CTRL * 0xc001011f - MSR_VIRT_SPEC_CTRL (if X86_FEATURE_AMD_SSBD) * --- a/xen/arch/x86/include/asm/perfc_defn.h +++ b/xen/arch/x86/include/asm/perfc_defn.h @@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions, #ifdef CONFIG_HVM -#define VMX_PERF_EXIT_REASON_SIZE 80 +#define VMX_PERF_EXIT_REASON_SIZE 82 #define VMEXIT_NPF_PERFC 143 #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1) PERFCOUNTER_ARRAY(vmexits, "vmexits", --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -206,6 +206,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t *val = msrs->xss.raw; break; + case MSR_USER_MSR_CTL: + if ( !cp->feat.user_msr ) + goto gp_fault; + *val = msrs->user_msr_ctl.raw; + break; + case 0x40000000 ... 0x400001ff: if ( is_viridian_domain(d) ) { @@ -536,6 +542,19 @@ int guest_wrmsr(struct vcpu *v, uint32_t msrs->xss.raw = val; break; + case MSR_USER_MSR_CTL: + if ( !cp->feat.user_msr ) + goto gp_fault; + + if ( (val & ~(USER_MSR_ENABLE | USER_MSR_ADDR_MASK)) || + !is_canonical_address(val) ) + goto gp_fault; + + msrs->user_msr_ctl.raw = val; + if ( v == curr ) + wrmsrl(MSR_USER_MSR_CTL, val); + break; + case 0x40000000 ... 0x400001ff: if ( is_viridian_domain(d) ) { --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -350,7 +350,7 @@ XEN_CPUFEATURE(AVX_NE_CONVERT, 15*32 XEN_CPUFEATURE(AMX_COMPLEX, 15*32+ 8) /* AMX Complex Instructions */ XEN_CPUFEATURE(AVX_VNNI_INT16, 15*32+10) /*A AVX-VNNI-INT16 Instructions */ XEN_CPUFEATURE(PREFETCHI, 15*32+14) /*A PREFETCHIT{0,1} Instructions */ -XEN_CPUFEATURE(USER_MSR, 15*32+15) /* U{RD,WR}MSR Instructions */ +XEN_CPUFEATURE(USER_MSR, 15*32+15) /*s U{RD,WR}MSR Instructions */ XEN_CPUFEATURE(CET_SSS, 15*32+18) /* CET Supervisor Shadow Stacks safe to use */ /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.eax, word 16 */