diff mbox

read_cpuid_id() in arch/arm/kernel/setup.c

Message ID 550312BB.8020902@free.fr (mailing list archive)
State New, archived
Headers show

Commit Message

Mason March 13, 2015, 4:39 p.m. UTC
On 13/03/2015 17:19, Russell King - ARM Linux wrote:
> On Fri, Mar 13, 2015 at 05:03:52PM +0100, Mason wrote:
>> Hello everyone,
>>
>> As far as I can tell, read_cpuid_id() resolves to read_cpuid(CPUID_ID)
>> which resolves to mrc 15, 0, rN, cr0, cr0, {0}
>>
>> Consider this:
>>
>> /*
>>   * The CPU ID never changes at run time, so we might as well tell the
>>   * compiler that it's constant.  Use this function to read the CPU ID
>>   * rather than directly reading processor_id or read_cpuid() directly.
>>   */
>> static inline unsigned int __attribute_const__ read_cpuid_id(void)
>> {
>> 	return read_cpuid(CPUID_ID);
>> }
>>
>> Despite the comment and attribute, my compiler(*) still reloads the
>> value every time.
>>
>> (*) gcc version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11)
>>
>> e.g.
>>
>> static int __get_cpu_architecture(void)
>> {
>> 	int cpu_arch;
>>
>> 	unsigned int id = read_cpuid_id();
>> 	if ((id & 0x0008f000) == 0) {
>> 		cpu_arch = CPU_ARCH_UNKNOWN;
>> 	} else if ((id & 0x0008f000) == 0x00007000) {
>> 		cpu_arch = (id & (1 << 23)) ? CPU_ARCH_ARMv4T : CPU_ARCH_ARMv3;
>> 	} else if ((id & 0x00080000) == 0x00000000) {
>> 		cpu_arch = (id >> 16) & 7;
>> 		if (cpu_arch)
>> 			cpu_arch += CPU_ARCH_ARMv3;
>> 	} else if ((id & 0x000f0000) == 0x000f0000) {
>>
>> resolves to
>>
>> c01fec74:       ee10cf10        mrc     15, 0, ip, cr0, cr0, {0}
>> c01fec78:       e21cca8f        ands    ip, ip, #585728 ; 0x8f000
>> c01fec7c:       e34c3023        movt    r3, #49187      ; 0xc023
>> c01fec80:       e5837008        str     r7, [r3, #8]
>> c01fec84:       e50b304c        str     r3, [fp, #-76]  ; 0x4c
>> c01fec88:       0a000022        beq     c01fed18 <setup_arch+0xe4>
>> c01fec8c:       ee103f10        mrc     15, 0, r3, cr0, cr0, {0}
>> c01fec90:       e2033a8f        and     r3, r3, #585728 ; 0x8f000
>> c01fec94:       e3530a07        cmp     r3, #28672      ; 0x7000
>> c01fec98:       1a000004        bne     c01fecb0 <setup_arch+0x7c>
>> c01fec9c:       ee103f10        mrc     15, 0, r3, cr0, cr0, {0}
>> c01feca0:       e3130502        tst     r3, #8388608    ; 0x800000
>> c01feca4:       13a0c003        movne   ip, #3
>> c01feca8:       03a0c001        moveq   ip, #1
>> c01fecac:       ea000019        b       c01fed18 <setup_arch+0xe4>
>> c01fecb0:       ee103f10        mrc     15, 0, r3, cr0, cr0, {0}
>> c01fecb4:       e3130702        tst     r3, #524288     ; 0x80000
>>
>>
>> So I thought it would be nice to give the poor compiler a break,
>> and just stuff the result in a local variable:
>
> NAK.
>
> Your compiler behaviour is different to mine (stock gcc 4.7.4):
>
>   4f8:   e1a0c00d        mov     ip, sp
>   4fc:   e92ddff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
>   500:   e24cb004        sub     fp, ip, #4
>   504:   e24dd00c        sub     sp, sp, #12
>   508:   ee105f10        mrc     15, 0, r5, cr0, cr0, {0} <== load id
>   50c:   e1a07000        mov     r7, r0
>   510:   e1a00005        mov     r0, r5			<== r5 = id
>   514:   ebfffffe        bl      0 <lookup_processor_type>
>   518:   e2504000        subs    r4, r0, #0
>   51c:   1a000003        bne     530 <setup_arch+0x38>
>   520:   e59f063c        ldr     r0, [pc, #1596] ; b64 <setup_arch+0x66c>
>   524:   e1a01005        mov     r1, r5
>   528:   ebfffffe        bl      0 <printk>
>   52c:   eafffffe        b       52c <setup_arch+0x34>
>   530:   e5948020        ldr     r8, [r4, #32]
>   534:   e2153a8f        ands    r3, r5, #585728 ; 0x8f000 <== r5 = id
>   538:   e59f2628        ldr     r2, [pc, #1576] ; b68 <setup_arch+0x670>
>   538:   e59f2628        ldr     r2, [pc, #1576] ; b68 <setup_arch+0x670>
>   53c:   e5828000        str     r8, [r2]
>   540:   0a00001f        beq     5c4 <setup_arch+0xcc>
>   544:   e3530a07        cmp     r3, #28672      ; 0x7000
>   548:   1a000003        bne     55c <setup_arch+0x64>
>   54c:   e3150502        tst     r5, #8388608    ; 0x800000 <== r5 = id
>   550:   03a03001        moveq   r3, #1
>   554:   13a03003        movne   r3, #3
>   558:   ea000019        b       5c4 <setup_arch+0xcc>
>   55c:   e3150702        tst     r5, #524288     ; 0x80000 <== r5 = id
>   560:   1a000003        bne     574 <setup_arch+0x7c>
>   564:   e7e23855        ubfx    r3, r5, #16, #3		<== r5 = id
>   568:   e3530000        cmp     r3, #0
>   56c:   12833001        addne   r3, r3, #1
>   570:   ea000013        b       5c4 <setup_arch+0xcc>
>   574:   e205580f        and     r5, r5, #983040 ; 0xf0000 <== r5 = id
>   578:   e355080f        cmp     r5, #983040     ; 0xf0000
>   57c:   13a03000        movne   r3, #0
>   580:   1a00000f        bne     5c4 <setup_arch+0xcc>
> ...
>
> The point here is that the compiler is free to optimise the code as it
> sees fit - if it decides that the register pressure from having the
> value cached in a register is too high, it can decide to spill the
> cached value, and reload it from CP15 as and when it needs to.  That
> is an advantage.

Good point. I hadn't thought of that.

Do you know the latency of an mrc instruction? (compared to a mov)

> It seems that GCC 4.7.4 optimises better than Linaro's 4.9.3.  In fact,
> it looks like Linaro's 4.9.3 is rather buggy as far as optimisation
> goes.
>
> Later compilers aren't always better.

I did NOT expect that. Compiler optimizations passes are so fragile.

Anyway, here's another proposed nano-improvement ;-)
(This one is code factorization)



This one looks good, doesn't it? :-)

Regards.

Comments

Russell King - ARM Linux March 13, 2015, 4:45 p.m. UTC | #1
On Fri, Mar 13, 2015 at 05:39:23PM +0100, Mason wrote:
> Good point. I hadn't thought of that.
> 
> Do you know the latency of an mrc instruction? (compared to a mov)

Not offhand.  It'll be different for different CPUs, but it's probably
not far off mov.

> >It seems that GCC 4.7.4 optimises better than Linaro's 4.9.3.  In fact,
> >it looks like Linaro's 4.9.3 is rather buggy as far as optimisation
> >goes.
> >
> >Later compilers aren't always better.
> 
> I did NOT expect that. Compiler optimizations passes are so fragile.

You're learning :)

> Anyway, here's another proposed nano-improvement ;-)
> (This one is code factorization)
> 
> --- setup.c	2015-03-03 18:04:59.000000000 +0100
> +++ setup.bar.c	2015-03-13 17:29:23.800668429 +0100
> @@ -246,12 +246,9 @@
>  		if (cpu_arch)
>  			cpu_arch += CPU_ARCH_ARMv3;
>  	} else if ((read_cpuid_id() & 0x000f0000) == 0x000f0000) {
> -		unsigned int mmfr0;
> -
>  		/* Revised CPUID format. Read the Memory Model Feature
>  		 * Register 0 and check for VMSAv7 or PMSAv7 */
> -		asm("mrc	p15, 0, %0, c0, c1, 4"
> -		    : "=r" (mmfr0));
> +		unsigned int mmfr0 = read_cpuid_ext(CPUID_EXT_MMFR0);
>  		if ((mmfr0 & 0x0000000f) >= 0x00000003 ||
>  		    (mmfr0 & 0x000000f0) >= 0x00000030)
>  			cpu_arch = CPU_ARCH_ARMv7;
> 
> 
> This one looks good, doesn't it? :-)

Yes, this one I like - and it probably fixes a potential latent bug
where the compiler was free to re-order that mrc outside of the if()
statement.

Please wrap it up as a normal submission, thanks.
Mason March 13, 2015, 5:06 p.m. UTC | #2
On 13/03/2015 17:45, Russell King - ARM Linux wrote:
> On Fri, Mar 13, 2015 at 05:39:23PM +0100, Mason wrote:
>> Good point. I hadn't thought of that.
>>
>> Do you know the latency of an mrc instruction? (compared to a mov)
>
> Not offhand.  It'll be different for different CPUs, but it's probably
> not far off mov.
>
>>> It seems that GCC 4.7.4 optimises better than Linaro's 4.9.3.  In fact,
>>> it looks like Linaro's 4.9.3 is rather buggy as far as optimisation
>>> goes.
>>>
>>> Later compilers aren't always better.
>>
>> I did NOT expect that. Compiler optimizations passes are so fragile.
>
> You're learning :)
>
>> Anyway, here's another proposed nano-improvement ;-)
>> (This one is code factorization)
>>
>> --- setup.c	2015-03-03 18:04:59.000000000 +0100
>> +++ setup.bar.c	2015-03-13 17:29:23.800668429 +0100
>> @@ -246,12 +246,9 @@
>>   		if (cpu_arch)
>>   			cpu_arch += CPU_ARCH_ARMv3;
>>   	} else if ((read_cpuid_id() & 0x000f0000) == 0x000f0000) {
>> -		unsigned int mmfr0;
>> -
>>   		/* Revised CPUID format. Read the Memory Model Feature
>>   		 * Register 0 and check for VMSAv7 or PMSAv7 */
>> -		asm("mrc	p15, 0, %0, c0, c1, 4"
>> -		    : "=r" (mmfr0));
>> +		unsigned int mmfr0 = read_cpuid_ext(CPUID_EXT_MMFR0);
>>   		if ((mmfr0 & 0x0000000f) >= 0x00000003 ||
>>   		    (mmfr0 & 0x000000f0) >= 0x00000030)
>>   			cpu_arch = CPU_ARCH_ARMv7;
>>
>>
>> This one looks good, doesn't it? :-)
>
> Yes, this one I like - and it probably fixes a potential latent bug
> where the compiler was free to re-order that mrc outside of the if()
> statement.
>
> Please wrap it up as a normal submission, thanks.

How exciting, my first kernel patch :-)

I'll take a closer look at other similar refactoring opportunities.

Can I send the patch inline, with "scissors and perforation" delimiter?

Do I have to base it on the latest 4.0 release candidate, or can you
rebase it as needed?

Also, you didn't like "caching" the value of read_cpuid_id() in
a local variable, but it's done in feat_v6_fixup. Do you want me
to submit a patch changing that, or is it not worth it?

Regards.
diff mbox

Patch

--- setup.c	2015-03-03 18:04:59.000000000 +0100
+++ setup.bar.c	2015-03-13 17:29:23.800668429 +0100
@@ -246,12 +246,9 @@ 
  		if (cpu_arch)
  			cpu_arch += CPU_ARCH_ARMv3;
  	} else if ((read_cpuid_id() & 0x000f0000) == 0x000f0000) {
-		unsigned int mmfr0;
-
  		/* Revised CPUID format. Read the Memory Model Feature
  		 * Register 0 and check for VMSAv7 or PMSAv7 */
-		asm("mrc	p15, 0, %0, c0, c1, 4"
-		    : "=r" (mmfr0));
+		unsigned int mmfr0 = read_cpuid_ext(CPUID_EXT_MMFR0);
  		if ((mmfr0 & 0x0000000f) >= 0x00000003 ||
  		    (mmfr0 & 0x000000f0) >= 0x00000030)
  			cpu_arch = CPU_ARCH_ARMv7;