Message ID | e379a005-a9ee-41ae-a797-9768c6aeb52e@suse.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/MSR: improve code gen for rdmsr_safe() and rdtsc() | expand |
On 30/09/2024 4:15 pm, Jan Beulich wrote: > To fold two 32-bit outputs from the asm()-s into a single 64-bit value > the compiler needs to emit a zero-extension insn for the low half. Both > RDMSR and RDTSC clear the upper halves of their output registers anyway, > though. So despite that zero-extending insn (a simple MOV) being cheap, > we can do better: Without one, by declaring the local variables as 64- > bit ones. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Hmm. This is generally better, but not universally so. xen$ ../scripts/bloat-o-meter xen-syms-before xen-syms-after add/remove: 0/0 grow/shrink: 2/29 up/down: 24/-193 (-169) Function old new delta pci_cfg_ok 335 351 +16 arch_ioreq_server_get_type_addr 251 259 +8 udelay 81 79 -2 tsc_get_info 258 256 -2 trace 1320 1318 -2 time_latch_stamps 76 74 -2 set_legacy_ssbd.isra 146 144 -2 read_hyperv_timer 81 79 -2 read_counter 47 45 -2 read_clocks_slave 135 133 -2 mcheck_mca_logout 2059 2057 -2 hwp_write_request 105 103 -2 get_s_time_fixed 91 89 -2 get_s_time 77 75 -2 cpu_init 251 249 -2 amd_mcheck_init 754 752 -2 _svm_cpu_up 467 465 -2 tsc_check_writability 187 184 -3 _probe_mask_msr 106 99 -7 time_calibration_tsc_rendezvous 541 533 -8 time_calibration_std_rendezvous 197 189 -8 probe_cpuid_faulting 180 172 -8 hvm_save 343 335 -8 amd_check_erratum_1474 140 132 -8 calibrate_tsc 382 372 -10 wait_for_nmis 134 119 -15 read_msr 1220 1204 -16 guest_rdmsr 1869 1853 -16 check_tsc_warp.constprop 201 185 -16 hwp_init_msrs 440 420 -20 amd_log_freq.cold 519 499 -20 Total: Before=3895379, After=3895210, chg -0.00% arch_ioreq_server_get_type_addr goes from: and $0x40,%dh je ... to shl $0x20,%rdx or %rax,%rdx bt $0x2e,%rdx jae ... and pci_cfg_ok gets a "movabs $0x400000000000,%rcx" and 64bit test, where previously it used "and $0x40,%dh" So, the use of 64bit variables as opposed to 32bit does seem to break GCC's ability to substitute %dh. Oh well - it's minor either way. Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
--- a/xen/arch/x86/include/asm/msr.h +++ b/xen/arch/x86/include/asm/msr.h @@ -54,17 +54,17 @@ static inline void wrmsr_ns(uint32_t msr /* rdmsr with exception handling */ #define rdmsr_safe(msr,val) ({\ int rc_; \ - uint32_t lo_, hi_; \ + uint64_t lo_, hi_; \ __asm__ __volatile__( \ "1: rdmsr\n2:\n" \ ".section .fixup,\"ax\"\n" \ - "3: xorl %0,%0\n; xorl %1,%1\n" \ + "3: xorl %k0,%k0\n; xorl %k1,%k1\n" \ " movl %5,%2\n; jmp 2b\n" \ ".previous\n" \ _ASM_EXTABLE(1b, 3b) \ : "=a" (lo_), "=d" (hi_), "=&r" (rc_) \ : "c" (msr), "2" (0), "i" (-EFAULT)); \ - val = lo_ | ((uint64_t)hi_ << 32); \ + val = lo_ | (hi_ << 32); \ rc_; }) /* wrmsr with exception handling */ @@ -99,11 +99,11 @@ static inline void weak_wrmsr_fence(bool static inline uint64_t rdtsc(void) { - uint32_t low, high; + uint64_t low, high; __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high)); - return ((uint64_t)high << 32) | low; + return (high << 32) | low; } static inline uint64_t rdtsc_ordered(void)
To fold two 32-bit outputs from the asm()-s into a single 64-bit value the compiler needs to emit a zero-extension insn for the low half. Both RDMSR and RDTSC clear the upper halves of their output registers anyway, though. So despite that zero-extending insn (a simple MOV) being cheap, we can do better: Without one, by declaring the local variables as 64- bit ones. Signed-off-by: Jan Beulich <jbeulich@suse.com>