Message ID | ZM59CkNZg5n4WXO3@p100 (mailing list archive)
---|---
State | New, archived
Series | [RFC] Reduce generated code by 3% by increasing MMU indices
On 8/5/23 09:47, Helge Deller wrote:
> Do we want to enable such a performance optimization?
> If so, I see two possibilities:
>
> a) Re-define NB_MMU_MODES per target

No, we've just gotten rid of per-target definitions of NB_MMU_MODES, on the
way to being able to support multiple targets simultaneously.

This only affects x86, and for only 6 bytes per memory access.  While saving
code size is a nice goal, I sincerely doubt you can measure any performance
difference.

If there were a way to change no more than two lines of code, that would be
fine.  But otherwise I don't see this as being worth making the rest of the
code base any more complex.


r~
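To illustrate where the 6-byte figure comes from: hand-assembled x86-64 encodings, not actual TCG output. The x86-64 TCG backend keeps the env pointer in %rbp, and a TLB lookup loads both the tlb mask and the table pointer relative to env; each load whose negative displacement fits in a signed byte uses the 4-byte disp8 form instead of the 7-byte disp32 form, so presumably 2 × 3 = 6 bytes per guest memory access.

    48 8b 45 88              mov  -0x78(%rbp),%rax    # disp8 form, 4 bytes
    48 8b 85 78 ff ff ff     mov  -0x88(%rbp),%rax    # disp32 form, 7 bytes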
* Richard Henderson <richard.henderson@linaro.org>:
> On 8/5/23 09:47, Helge Deller wrote:
> > Do we want to enable such a performance optimization?
> > If so, I see two possibilities:
> >
> > a) Re-define NB_MMU_MODES per target
>
> No, we've just gotten rid of per-target definitions of NB_MMU_MODES, on the
> way to being able to support multiple targets simultaneously.

Ok, I assumed that answer :-)

> This only affects x86, and for only 6 bytes per memory access.  While saving
> code size is a nice goal, I sincerely doubt you can measure any performance
> difference.

Maybe. I don't know. I'm sure the gain is small, but the patch is small too.

> If there were a way to change no more than two lines of code, that would be
> fine.  But otherwise I don't see this as being worth making the rest of the
> code base any more complex.

Ok. What about the 6-line patch below for x86?
It's trivial and all that's needed for x86.
Btw, any index which is >= 9 will use the shorter code sequence.

Helge

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e0771a1043..3e71e666db 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2251,11 +2251,11 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define cpu_list x86_cpu_list
 
 /* MMU modes definitions */
-#define MMU_KSMAP_IDX   0
-#define MMU_USER_IDX    1
-#define MMU_KNOSMAP_IDX 2
-#define MMU_NESTED_IDX  3
-#define MMU_PHYS_IDX    4
+#define MMU_KSMAP_IDX   11
+#define MMU_USER_IDX    12
+#define MMU_KNOSMAP_IDX 13
+#define MMU_NESTED_IDX  14
+#define MMU_PHYS_IDX    15
 
 static inline int cpu_mmu_index(CPUX86State *env, bool ifetch)
 {
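To see where the ">= 9" threshold comes from, here is a standalone sketch of the offset arithmetic. The layout is simplified from QEMU's cpu-defs.h (only the distance of each entry from the end of the negative-offset block, where env begins, matters); the 8-byte trailing member and a 64-bit host are assumptions:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define NB_MMU_MODES 16

    typedef struct {
        uintptr_t mask;
        void *table;
    } CPUTLBDescFast;                    /* 16 bytes on a 64-bit host */

    typedef struct {
        CPUTLBDescFast f[NB_MMU_MODES];
        int64_t icount_decr;             /* assumed 8-byte trailing member */
    } CPUNegativeOffsetState;            /* env begins right after this */

    int main(void)
    {
        for (int idx = 0; idx < NB_MMU_MODES; idx++) {
            /* Displacement of f[idx] relative to the env pointer. */
            long ofs = (long)(offsetof(CPUNegativeOffsetState, f)
                              + idx * sizeof(CPUTLBDescFast))
                     - (long)sizeof(CPUNegativeOffsetState);
            printf("idx %2d: ofs %4ld -> %s\n", idx, ofs,
                   ofs >= -128 ? "disp8 (short)" : "disp32 (+3 bytes per load)");
        }
        return 0;
    }

Every index whose displacement stays within [-128, 127] gets the short x86 encoding; with this layout that is exactly idx >= 9.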
On 8/5/23 10:43, Helge Deller wrote:
>> If there were a way to change no more than two lines of code, that would be
>> fine.  But otherwise I don't see this as being worth making the rest of the
>> code base any more complex.
>
> Ok. What about the 6-line patch below for x86?
> It's trivial and all that's needed for x86.
> Btw, any index which is >= 9 will use the shorter code sequence.
>
> Helge
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index e0771a1043..3e71e666db 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -2251,11 +2251,11 @@ uint64_t cpu_get_tsc(CPUX86State *env);
>  #define cpu_list x86_cpu_list
>
>  /* MMU modes definitions */
> -#define MMU_KSMAP_IDX   0
> -#define MMU_USER_IDX    1
> -#define MMU_KNOSMAP_IDX 2
> -#define MMU_NESTED_IDX  3
> -#define MMU_PHYS_IDX    4
> +#define MMU_KSMAP_IDX   11
> +#define MMU_USER_IDX    12
> +#define MMU_KNOSMAP_IDX 13
> +#define MMU_NESTED_IDX  14
> +#define MMU_PHYS_IDX    15

No.  The small patch would need to apply to all guests.

Perhaps something to handle indexing of CPUTLBDescFast, e.g.

static inline CPUTLBDescFast *cputlb_fast(CPUTLB *tlb, unsigned idx)
{
    return &tlb->f[NB_MMU_MODES - 1 - idx];
}

There's already tlb_mask_table_ofs, which handles all tcg backends;
you just need to adjust that and cputlb.c.

Introduce cputlb_fast with normal indexing in one patch, and then the
second patch to invert the indexing may well be exactly two lines.  :-)


r~
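For reference, the backend hook mentioned above boils down to an offset computation. A sketch of where the inversion could land on that side, spelled with the older TLB_MASK_TABLE_OFS macro name against QEMU's own types (the in-tree tlb_mask_table_ofs helper may be shaped differently; this is a fragment, not the posted patch):

    /* Offset of the CPUTLBDescFast for IDX, relative to the env pointer.
     * CPUNegativeOffsetState sits immediately before CPUArchState, so all
     * of these displacements are negative; inverting the index here moves
     * the low-numbered, frequently used MMU modes closest to env. */
    #define TLB_MASK_TABLE_OFS(IDX) \
        (offsetof(CPUNegativeOffsetState, tlb.f[NB_MMU_MODES - 1 - (IDX)]) \
         - sizeof(CPUNegativeOffsetState))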
On 8/5/23 19:58, Richard Henderson wrote:
> On 8/5/23 10:43, Helge Deller wrote:
>>> If there were a way to change no more than two lines of code,
>>> that would be fine.  But otherwise I don't see this as being
>>> worth making the rest of the code base any more complex.
>>
>> Ok. What about the 6-line patch below for x86?  It's trivial and
>> all that's needed for x86.  Btw, any index which is >= 9 will use
>> the shorter code sequence.
>>
>> Helge
>>
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index e0771a1043..3e71e666db 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -2251,11 +2251,11 @@ uint64_t cpu_get_tsc(CPUX86State *env);
>>  #define cpu_list x86_cpu_list
>>
>>  /* MMU modes definitions */
>> -#define MMU_KSMAP_IDX   0
>> -#define MMU_USER_IDX    1
>> -#define MMU_KNOSMAP_IDX 2
>> -#define MMU_NESTED_IDX  3
>> -#define MMU_PHYS_IDX    4
>> +#define MMU_KSMAP_IDX   11
>> +#define MMU_USER_IDX    12
>> +#define MMU_KNOSMAP_IDX 13
>> +#define MMU_NESTED_IDX  14
>> +#define MMU_PHYS_IDX    15
>
> No.  The small patch would need to apply to all guests.

Yes.

> Perhaps something to handle indexing of CPUTLBDescFast, e.g.
>
> static inline CPUTLBDescFast *cputlb_fast(CPUTLB *tlb, unsigned idx)
> {
>     return &tlb->f[NB_MMU_MODES - 1 - idx];
> }
>
> There's already tlb_mask_table_ofs, which handles all tcg backends;
> you just need to adjust that and cputlb.c.
>
> Introduce cputlb_fast with normal indexing in one patch, and then the
> second patch to invert the indexing may well be exactly two lines.  :-)

You're cheating :-)
But ok, that's an easy one and I can come up with both patches.

One last idea which came to my mind and which may be worth
asking before I start hacking on the patch above...:

include/exec/cpu-defs.h:
   /* add some comment here why we use this transformation: */
   #define MMU_INDEX(nr)  (NB_MMU_MODES - 1 - (nr))

target/*/cpu.h:
   /* MMU modes definitions */
   #define MMU_KSMAP_IDX    MMU_INDEX(0)
   #define MMU_USER_IDX     MMU_INDEX(1)
   #define MMU_KNOSMAP_IDX  MMU_INDEX(2)
   #define MMU_NESTED_IDX   MMU_INDEX(3)
   ...

Downside:
- of course it's a lot more than the 2 lines you asked for
Upsides:
- no additional subtraction at tcg compile time/runtime
- clear indication that this is an MMU index, easy to grep
- easy to use

Helge
On 8/5/23 21:40, Helge Deller wrote:
> On 8/5/23 19:58, Richard Henderson wrote:
>> On 8/5/23 10:43, Helge Deller wrote:
>>>> If there were a way to change no more than two lines of code,
>>>> that would be fine.  But otherwise I don't see this as being
>>>> worth making the rest of the code base any more complex.
>>>
>>> Ok. What about the 6-line patch below for x86?  It's trivial and
>>> all that's needed for x86.  Btw, any index which is >= 9 will use
>>> the shorter code sequence.
>>>
>>> Helge
>>>
>>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>>> index e0771a1043..3e71e666db 100644
>>> --- a/target/i386/cpu.h
>>> +++ b/target/i386/cpu.h
>>> @@ -2251,11 +2251,11 @@ uint64_t cpu_get_tsc(CPUX86State *env);
>>>  #define cpu_list x86_cpu_list
>>>
>>>  /* MMU modes definitions */
>>> -#define MMU_KSMAP_IDX   0
>>> -#define MMU_USER_IDX    1
>>> -#define MMU_KNOSMAP_IDX 2
>>> -#define MMU_NESTED_IDX  3
>>> -#define MMU_PHYS_IDX    4
>>> +#define MMU_KSMAP_IDX   11
>>> +#define MMU_USER_IDX    12
>>> +#define MMU_KNOSMAP_IDX 13
>>> +#define MMU_NESTED_IDX  14
>>> +#define MMU_PHYS_IDX    15
>>
>> No.  The small patch would need to apply to all guests.
>
> Yes.
>
>> Perhaps something to handle indexing of CPUTLBDescFast, e.g.
>>
>> static inline CPUTLBDescFast *cputlb_fast(CPUTLB *tlb, unsigned idx)
>> {
>>     return &tlb->f[NB_MMU_MODES - 1 - idx];
>> }
>>
>> There's already tlb_mask_table_ofs, which handles all tcg backends;
>> you just need to adjust that and cputlb.c.
>>
>> Introduce cputlb_fast with normal indexing in one patch, and then the
>> second patch to invert the indexing may well be exactly two lines.  :-)
>
> You're cheating :-)
> But ok, that's an easy one and I can come up with both patches.
>
> One last idea which came to my mind and which may be worth
> asking before I start hacking on the patch above...:
>
> include/exec/cpu-defs.h:
>    /* add some comment here why we use this transformation: */
>    #define MMU_INDEX(nr)  (NB_MMU_MODES - 1 - (nr))
>
> target/*/cpu.h:
>    /* MMU modes definitions */
>    #define MMU_KSMAP_IDX    MMU_INDEX(0)
>    #define MMU_USER_IDX     MMU_INDEX(1)
>    #define MMU_KNOSMAP_IDX  MMU_INDEX(2)
>    #define MMU_NESTED_IDX   MMU_INDEX(3)
>    ...
>
> Downside:
> - of course it's a lot more than the 2 lines you asked for
> Upsides:
> - no additional subtraction at tcg compile time/runtime
> - clear indication that this is an MMU index, easy to grep
> - easy to use

and it's actually a 1-line patch as you requested :-)
similar to your approach above (multiple preparation patches,
one last patch which just changes
   #define MMU_INDEX(nr)  (nr)
to
   #define MMU_INDEX(nr)  (NB_MMU_MODES - 1 - (nr))

;-)

Helge
On 8/5/23 13:04, Helge Deller wrote:
> On 8/5/23 21:40, Helge Deller wrote:
>> On 8/5/23 19:58, Richard Henderson wrote:
>>> On 8/5/23 10:43, Helge Deller wrote:
>>>>> If there were a way to change no more than two lines of code,
>>>>> that would be fine.  But otherwise I don't see this as being
>>>>> worth making the rest of the code base any more complex.
>>>>
>>>> Ok. What about the 6-line patch below for x86?  It's trivial and
>>>> all that's needed for x86.  Btw, any index which is >= 9 will use
>>>> the shorter code sequence.
>>>>
>>>> Helge
>>>>
>>>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>>>> index e0771a1043..3e71e666db 100644
>>>> --- a/target/i386/cpu.h
>>>> +++ b/target/i386/cpu.h
>>>> @@ -2251,11 +2251,11 @@ uint64_t cpu_get_tsc(CPUX86State *env);
>>>>  #define cpu_list x86_cpu_list
>>>>
>>>>  /* MMU modes definitions */
>>>> -#define MMU_KSMAP_IDX   0
>>>> -#define MMU_USER_IDX    1
>>>> -#define MMU_KNOSMAP_IDX 2
>>>> -#define MMU_NESTED_IDX  3
>>>> -#define MMU_PHYS_IDX    4
>>>> +#define MMU_KSMAP_IDX   11
>>>> +#define MMU_USER_IDX    12
>>>> +#define MMU_KNOSMAP_IDX 13
>>>> +#define MMU_NESTED_IDX  14
>>>> +#define MMU_PHYS_IDX    15
>>>
>>> No.  The small patch would need to apply to all guests.
>>
>> Yes.
>>
>>> Perhaps something to handle indexing of CPUTLBDescFast, e.g.
>>>
>>> static inline CPUTLBDescFast *cputlb_fast(CPUTLB *tlb, unsigned idx)
>>> {
>>>     return &tlb->f[NB_MMU_MODES - 1 - idx];
>>> }
>>>
>>> There's already tlb_mask_table_ofs, which handles all tcg backends;
>>> you just need to adjust that and cputlb.c.
>>>
>>> Introduce cputlb_fast with normal indexing in one patch, and then the
>>> second patch to invert the indexing may well be exactly two lines.  :-)
>>
>> You're cheating :-)
>> But ok, that's an easy one and I can come up with both patches.
>>
>> One last idea which came to my mind and which may be worth
>> asking before I start hacking on the patch above...:
>>
>> include/exec/cpu-defs.h:
>>    /* add some comment here why we use this transformation: */
>>    #define MMU_INDEX(nr)  (NB_MMU_MODES - 1 - (nr))
>>
>> target/*/cpu.h:
>>    /* MMU modes definitions */
>>    #define MMU_KSMAP_IDX    MMU_INDEX(0)
>>    #define MMU_USER_IDX     MMU_INDEX(1)
>>    #define MMU_KNOSMAP_IDX  MMU_INDEX(2)
>>    #define MMU_NESTED_IDX   MMU_INDEX(3)
>>    ...
>>
>> Downside:
>> - of course it's a lot more than the 2 lines you asked for
>> Upsides:
>> - no additional subtraction at tcg compile time/runtime
>> - clear indication that this is an MMU index, easy to grep
>> - easy to use
>
> and it's actually a 1-line patch as you requested :-)
> similar to your approach above (multiple preparation patches,
> one last patch which just changes
>    #define MMU_INDEX(nr)  (nr)
> to
>    #define MMU_INDEX(nr)  (NB_MMU_MODES - 1 - (nr))
>
> ;-)

:-)  Plausible.  Worth a go, anyway.


r~
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e0771a1043..d4aa6e7bee 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2251,11 +2251,11 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define cpu_list x86_cpu_list
 
 /* MMU modes definitions */
-#define MMU_KSMAP_IDX   0
-#define MMU_USER_IDX    1
-#define MMU_KNOSMAP_IDX 2
-#define MMU_NESTED_IDX  3
-#define MMU_PHYS_IDX    4
+#define MMU_KSMAP_IDX   (NB_MMU_MODES - 1)
+#define MMU_USER_IDX    (NB_MMU_MODES - 2)
+#define MMU_KNOSMAP_IDX (NB_MMU_MODES - 3)
+#define MMU_NESTED_IDX  (NB_MMU_MODES - 4)
+#define MMU_PHYS_IDX    (NB_MMU_MODES - 5)
 
 static inline int cpu_mmu_index(CPUX86State *env, bool ifetch)
 {
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 25fac9577a..a2a56781eb 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1474,13 +1474,14 @@ int ppc_dcr_write(ppc_dcr_t *dcr_env, int dcrn, uint32_t val);
 #define cpu_list ppc_cpu_list
 
 /* MMU modes definitions */
-#define MMU_USER_IDX 0
+#define MMU_USER_IDX (NB_MMU_MODES - 1)
 
 static inline int cpu_mmu_index(CPUPPCState *env, bool ifetch)
 {
 #ifdef CONFIG_USER_ONLY
     return MMU_USER_IDX;
 #else
-    return (env->hflags >> (ifetch ? HFLAGS_IMMU_IDX : HFLAGS_DMMU_IDX)) & 7;
+    return NB_MMU_MODES - 2
+        - ((env->hflags >> (ifetch ? HFLAGS_IMMU_IDX : HFLAGS_DMMU_IDX)) & 7);
 #endif
 }
 
diff --git a/target/alpha/cpu.h b/target/alpha/cpu.h
index 13306665af..f25cf33e25 100644
--- a/target/alpha/cpu.h
+++ b/target/alpha/cpu.h
@@ -194,9 +194,9 @@ enum {
    PALcode cheats and uses the KSEG mapping for its code+data rather than
    physical addresses.  */
 
-#define MMU_KERNEL_IDX  0
-#define MMU_USER_IDX    1
-#define MMU_PHYS_IDX    2
+#define MMU_KERNEL_IDX  (NB_MMU_MODES - 1)
+#define MMU_USER_IDX    (NB_MMU_MODES - 2)
+#define MMU_PHYS_IDX    (NB_MMU_MODES - 3)
 
 typedef struct CPUArchState {
     uint64_t ir[31];
diff --git a/target/hppa/cpu.h b/target/hppa/cpu.h
index 75c5c0ccf7..1c09602d0b 100644
--- a/target/hppa/cpu.h
+++ b/target/hppa/cpu.h
@@ -30,9 +30,9 @@
    basis.  It's probably easier to fall back to a strong memory model.  */
 #define TCG_GUEST_DEFAULT_MO  TCG_MO_ALL
 
-#define MMU_KERNEL_IDX   0
-#define MMU_USER_IDX     3
-#define MMU_PHYS_IDX     4
+#define MMU_KERNEL_IDX   (NB_MMU_MODES - 1)
+#define MMU_USER_IDX     (NB_MMU_MODES - 2)
+#define MMU_PHYS_IDX     (NB_MMU_MODES - 3)
 
 #define TARGET_INSN_START_EXTRA_WORDS 1
 
 /* Hardware exceptions, interrupts, faults, and traps.  */
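A quick standalone sanity check of the inverted definitions (NB_MMU_MODES == 16 is an assumption, matching its value at the time; the constants mirror the x86 hunk above and the loop mirrors the ppc arithmetic):

    #include <assert.h>

    #define NB_MMU_MODES 16

    /* x86 definitions from the patch above */
    #define MMU_KSMAP_IDX   (NB_MMU_MODES - 1)
    #define MMU_USER_IDX    (NB_MMU_MODES - 2)
    #define MMU_KNOSMAP_IDX (NB_MMU_MODES - 3)
    #define MMU_NESTED_IDX  (NB_MMU_MODES - 4)
    #define MMU_PHYS_IDX    (NB_MMU_MODES - 5)

    int main(void)
    {
        int idx[] = { MMU_KSMAP_IDX, MMU_USER_IDX, MMU_KNOSMAP_IDX,
                      MMU_NESTED_IDX, MMU_PHYS_IDX };

        /* All five x86 modes stay distinct and inside [0, NB_MMU_MODES). */
        for (int i = 0; i < 5; i++) {
            assert(idx[i] >= 0 && idx[i] < NB_MMU_MODES);
            for (int j = i + 1; j < 5; j++) {
                assert(idx[i] != idx[j]);
            }
        }

        /* ppc system mode: the 3-bit hflags field maps to 14 down to 7,
         * never colliding with ppc's MMU_USER_IDX (NB_MMU_MODES - 1 == 15). */
        for (int h = 0; h < 8; h++) {
            int mmu = NB_MMU_MODES - 2 - h;
            assert(mmu >= 0 && mmu < NB_MMU_MODES - 1);
        }
        return 0;
    }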