diff mbox

[v2,2/2] target-ppc: mcrfs should always update FEX/VX and only clear exception bits

Message ID 1454092821-46795-3-git-send-email-jrtc27@jrtc27.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jessica Clarke Jan. 29, 2016, 6:40 p.m. UTC
Here is the description of the mcrfs instruction from the PowerPC Architecture
Book, Version 2.02, Book I: PowerPC User Instruction Set Architecture
(http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html), found
on page 120:

    The contents of FPSCR field BFA are copied to Condition Register field BF.
    All exception bits copied are set to 0 in the FPSCR. If the FX bit is
    copied, it is set to 0 in the FPSCR.

    Special Registers Altered:
        CR field BF
        FX OX                        (if BFA=0)
        UX ZX XX VXSNAN              (if BFA=1)
        VXISI VXIDI VXZDZ VXIMZ      (if BFA=2)
        VXVC                         (if BFA=3)
        VXSOFT VXSQRT VXCVI          (if BFA=5)

However, currently every bit in FPSCR field BFA is set to 0, including ones not
on that list.

This can be seen in the following simple C program:

    #include <fenv.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int ret;
        ret = fegetround();
        printf("Current rounding: %d\n", ret);
        ret = fesetround(FE_UPWARD);
        printf("Setting to FE_UPWARD (%d): %d\n", FE_UPWARD, ret);
        ret = fegetround();
        printf("Current rounding: %d\n", ret);
        ret = fegetround();
        printf("Current rounding: %d\n", ret);
        return 0;
    }

which gave the output (before this commit):

    Current rounding: 0
    Setting to FE_UPWARD (2): 0
    Current rounding: 2
    Current rounding: 0

instead of (after this commit):

    Current rounding: 0
    Setting to FE_UPWARD (2): 0
    Current rounding: 2
    Current rounding: 2

The relevant disassembly is in fegetround(), which, on my system, is:

    __GI___fegetround:
    <+0>:   mcrfs  cr7, cr7
    <+4>:   mfcr   r3
    <+8>:   clrldi r3, r3, 62
    <+12>:  blr

What happens is that, the first time fegetround() is called, FPSCR field 7 is
retrieved. However, because of the bug in mcrfs, the entirety of field 7 is set
to 0, which includes the rounding mode.

There are other issues this will fix, such as condition flags not persisting
when they should if read, and if you were to read a specific field with some
exception bits set, but no others were set in the entire register, then the
bits would be cleared correctly, but FEX/VX would not be updated to 0 as they
should be.

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
---
 target-ppc/cpu.h       |  6 ++++++
 target-ppc/translate.c | 21 +++++++++++++++++----
 2 files changed, 23 insertions(+), 4 deletions(-)

Comments

David Gibson Jan. 31, 2016, 11:50 p.m. UTC | #1
On Fri, Jan 29, 2016 at 06:40:21PM +0000, James Clarke wrote:
> Here is the description of the mcrfs instruction from the PowerPC Architecture
> Book, Version 2.02, Book I: PowerPC User Instruction Set Architecture
> (http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html), found
> on page 120:
> 
>     The contents of FPSCR field BFA are copied to Condition Register field BF.
>     All exception bits copied are set to 0 in the FPSCR. If the FX bit is
>     copied, it is set to 0 in the FPSCR.
> 
>     Special Registers Altered:
>         CR field BF
>         FX OX                        (if BFA=0)
>         UX ZX XX VXSNAN              (if BFA=1)
>         VXISI VXIDI VXZDZ VXIMZ      (if BFA=2)
>         VXVC                         (if BFA=3)
>         VXSOFT VXSQRT VXCVI          (if BFA=5)
> 
> However, currently every bit in FPSCR field BFA is set to 0, including ones not
> on that list.
> 
> This can be seen in the following simple C program:
> 
>     #include <fenv.h>
>     #include <stdio.h>
> 
>     int main(int argc, char **argv) {
>         int ret;
>         ret = fegetround();
>         printf("Current rounding: %d\n", ret);
>         ret = fesetround(FE_UPWARD);
>         printf("Setting to FE_UPWARD (%d): %d\n", FE_UPWARD, ret);
>         ret = fegetround();
>         printf("Current rounding: %d\n", ret);
>         ret = fegetround();
>         printf("Current rounding: %d\n", ret);
>         return 0;
>     }
> 
> which gave the output (before this commit):
> 
>     Current rounding: 0
>     Setting to FE_UPWARD (2): 0
>     Current rounding: 2
>     Current rounding: 0
> 
> instead of (after this commit):
> 
>     Current rounding: 0
>     Setting to FE_UPWARD (2): 0
>     Current rounding: 2
>     Current rounding: 2
> 
> The relevant disassembly is in fegetround(), which, on my system, is:
> 
>     __GI___fegetround:
>     <+0>:   mcrfs  cr7, cr7
>     <+4>:   mfcr   r3
>     <+8>:   clrldi r3, r3, 62
>     <+12>:  blr
> 
> What happens is that, the first time fegetround() is called, FPSCR field 7 is
> retrieved. However, because of the bug in mcrfs, the entirety of field 7 is set
> to 0, which includes the rounding mode.
> 
> There are other issues this will fix, such as condition flags not persisting
> when they should if read, and if you were to read a specific field with some
> exception bits set, but no others were set in the entire register, then the
> bits would be cleared correctly, but FEX/VX would not be updated to 0 as they
> should be.
> 
> Signed-off-by: James Clarke <jrtc27@jrtc27.com>

Thanks for the fixup.  It actually looks like helper_store_fpscr()
should really take a target_ulong instead of u64 and have the (single)
caller which wants to pass a 64 do the truncate.  But that can be a
cleanup for another day.

Applied to ppc-for-2.6.

> ---
>  target-ppc/cpu.h       |  6 ++++++
>  target-ppc/translate.c | 21 +++++++++++++++++----
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 3a967b7..d811bc9 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -718,6 +718,12 @@ enum {
>  #define FP_RN1		(1ull << FPSCR_RN1)
>  #define FP_RN		(1ull << FPSCR_RN)
>  
> +/* the exception bits which can be cleared by mcrfs - includes FX */
> +#define FP_EX_CLEAR_BITS (FP_FX     | FP_OX     | FP_UX     | FP_ZX     | \
> +                          FP_XX     | FP_VXSNAN | FP_VXISI  | FP_VXIDI  | \
> +                          FP_VXZDZ  | FP_VXIMZ  | FP_VXVC   | FP_VXSOFT | \
> +                          FP_VXSQRT | FP_VXCVI)
> +
>  /*****************************************************************************/
>  /* Vector status and control register */
>  #define VSCR_NJ		16 /* Vector non-java */
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 4be7eaa..ca10bd1 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -2500,18 +2500,31 @@ static void gen_fmrgow(DisasContext *ctx)
>  static void gen_mcrfs(DisasContext *ctx)
>  {
>      TCGv tmp = tcg_temp_new();
> +    TCGv_i32 tmask;
> +    TCGv_i64 tnew_fpscr = tcg_temp_new_i64();
>      int bfa;
> +    int nibble;
> +    int shift;
>  
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    bfa = 4 * (7 - crfS(ctx->opcode));
> -    tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
> +    bfa = crfS(ctx->opcode);
> +    nibble = 7 - bfa;
> +    shift = 4 * nibble;
> +    tcg_gen_shri_tl(tmp, cpu_fpscr, shift);
>      tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
> -    tcg_temp_free(tmp);
>      tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)], 0xf);
> -    tcg_gen_andi_tl(cpu_fpscr, cpu_fpscr, ~(0xF << bfa));
> +    tcg_temp_free(tmp);
> +    tcg_gen_extu_tl_i64(tnew_fpscr, cpu_fpscr);
> +    /* Only the exception bits (including FX) should be cleared if read */
> +    tcg_gen_andi_i64(tnew_fpscr, tnew_fpscr, ~((0xF << shift) & FP_EX_CLEAR_BITS));
> +    /* FEX and VX need to be updated, so don't set fpscr directly */
> +    tmask = tcg_const_i32(1 << nibble);
> +    gen_helper_store_fpscr(cpu_env, tnew_fpscr, tmask);
> +    tcg_temp_free_i32(tmask);
> +    tcg_temp_free_i64(tnew_fpscr);
>  }
>  
>  /* mffs */
Jessica Clarke Feb. 1, 2016, 12:04 a.m. UTC | #2
> On 31 Jan 2016, at 23:50, David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Jan 29, 2016 at 06:40:21PM +0000, James Clarke wrote:
>> Here is the description of the mcrfs instruction from the PowerPC Architecture
>> Book, Version 2.02, Book I: PowerPC User Instruction Set Architecture
>> (http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html), found
>> on page 120:
>> 
>>    The contents of FPSCR field BFA are copied to Condition Register field BF.
>>    All exception bits copied are set to 0 in the FPSCR. If the FX bit is
>>    copied, it is set to 0 in the FPSCR.
>> 
>>    Special Registers Altered:
>>        CR field BF
>>        FX OX                        (if BFA=0)
>>        UX ZX XX VXSNAN              (if BFA=1)
>>        VXISI VXIDI VXZDZ VXIMZ      (if BFA=2)
>>        VXVC                         (if BFA=3)
>>        VXSOFT VXSQRT VXCVI          (if BFA=5)
>> 
>> However, currently every bit in FPSCR field BFA is set to 0, including ones not
>> on that list.
>> 
>> This can be seen in the following simple C program:
>> 
>>    #include <fenv.h>
>>    #include <stdio.h>
>> 
>>    int main(int argc, char **argv) {
>>        int ret;
>>        ret = fegetround();
>>        printf("Current rounding: %d\n", ret);
>>        ret = fesetround(FE_UPWARD);
>>        printf("Setting to FE_UPWARD (%d): %d\n", FE_UPWARD, ret);
>>        ret = fegetround();
>>        printf("Current rounding: %d\n", ret);
>>        ret = fegetround();
>>        printf("Current rounding: %d\n", ret);
>>        return 0;
>>    }
>> 
>> which gave the output (before this commit):
>> 
>>    Current rounding: 0
>>    Setting to FE_UPWARD (2): 0
>>    Current rounding: 2
>>    Current rounding: 0
>> 
>> instead of (after this commit):
>> 
>>    Current rounding: 0
>>    Setting to FE_UPWARD (2): 0
>>    Current rounding: 2
>>    Current rounding: 2
>> 
>> The relevant disassembly is in fegetround(), which, on my system, is:
>> 
>>    __GI___fegetround:
>>    <+0>:   mcrfs  cr7, cr7
>>    <+4>:   mfcr   r3
>>    <+8>:   clrldi r3, r3, 62
>>    <+12>:  blr
>> 
>> What happens is that, the first time fegetround() is called, FPSCR field 7 is
>> retrieved. However, because of the bug in mcrfs, the entirety of field 7 is set
>> to 0, which includes the rounding mode.
>> 
>> There are other issues this will fix, such as condition flags not persisting
>> when they should if read, and if you were to read a specific field with some
>> exception bits set, but no others were set in the entire register, then the
>> bits would be cleared correctly, but FEX/VX would not be updated to 0 as they
>> should be.
>> 
>> Signed-off-by: James Clarke <jrtc27@jrtc27.com>
> 
> Thanks for the fixup.  It actually looks like helper_store_fpscr()
> should really take a target_ulong instead of u64 and have the (single)
> caller which wants to pass a 64 do the truncate.  But that can be a
> cleanup for another day.
> 
> Applied to ppc-for-2.6.

Great, thanks. I agree it seems odd, especially given the argument is cast to
target_ulong, but that’s a more invasive change.

> 
>> ---
>> target-ppc/cpu.h       |  6 ++++++
>> target-ppc/translate.c | 21 +++++++++++++++++----
>> 2 files changed, 23 insertions(+), 4 deletions(-)
>> 
>> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
>> index 3a967b7..d811bc9 100644
>> --- a/target-ppc/cpu.h
>> +++ b/target-ppc/cpu.h
>> @@ -718,6 +718,12 @@ enum {
>> #define FP_RN1		(1ull << FPSCR_RN1)
>> #define FP_RN		(1ull << FPSCR_RN)
>> 
>> +/* the exception bits which can be cleared by mcrfs - includes FX */
>> +#define FP_EX_CLEAR_BITS (FP_FX     | FP_OX     | FP_UX     | FP_ZX     | \
>> +                          FP_XX     | FP_VXSNAN | FP_VXISI  | FP_VXIDI  | \
>> +                          FP_VXZDZ  | FP_VXIMZ  | FP_VXVC   | FP_VXSOFT | \
>> +                          FP_VXSQRT | FP_VXCVI)
>> +
>> /*****************************************************************************/
>> /* Vector status and control register */
>> #define VSCR_NJ		16 /* Vector non-java */
>> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
>> index 4be7eaa..ca10bd1 100644
>> --- a/target-ppc/translate.c
>> +++ b/target-ppc/translate.c
>> @@ -2500,18 +2500,31 @@ static void gen_fmrgow(DisasContext *ctx)
>> static void gen_mcrfs(DisasContext *ctx)
>> {
>>     TCGv tmp = tcg_temp_new();
>> +    TCGv_i32 tmask;
>> +    TCGv_i64 tnew_fpscr = tcg_temp_new_i64();
>>     int bfa;
>> +    int nibble;
>> +    int shift;
>> 
>>     if (unlikely(!ctx->fpu_enabled)) {
>>         gen_exception(ctx, POWERPC_EXCP_FPU);
>>         return;
>>     }
>> -    bfa = 4 * (7 - crfS(ctx->opcode));
>> -    tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
>> +    bfa = crfS(ctx->opcode);
>> +    nibble = 7 - bfa;
>> +    shift = 4 * nibble;
>> +    tcg_gen_shri_tl(tmp, cpu_fpscr, shift);
>>     tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
>> -    tcg_temp_free(tmp);
>>     tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)], 0xf);
>> -    tcg_gen_andi_tl(cpu_fpscr, cpu_fpscr, ~(0xF << bfa));
>> +    tcg_temp_free(tmp);
>> +    tcg_gen_extu_tl_i64(tnew_fpscr, cpu_fpscr);
>> +    /* Only the exception bits (including FX) should be cleared if read */
>> +    tcg_gen_andi_i64(tnew_fpscr, tnew_fpscr, ~((0xF << shift) & FP_EX_CLEAR_BITS));
>> +    /* FEX and VX need to be updated, so don't set fpscr directly */
>> +    tmask = tcg_const_i32(1 << nibble);
>> +    gen_helper_store_fpscr(cpu_env, tnew_fpscr, tmask);
>> +    tcg_temp_free_i32(tmask);
>> +    tcg_temp_free_i64(tnew_fpscr);
>> }
>> 
>> /* mffs */
> 
> --
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

Regards,
James
diff mbox

Patch

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 3a967b7..d811bc9 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -718,6 +718,12 @@  enum {
 #define FP_RN1		(1ull << FPSCR_RN1)
 #define FP_RN		(1ull << FPSCR_RN)
 
+/* the exception bits which can be cleared by mcrfs - includes FX */
+#define FP_EX_CLEAR_BITS (FP_FX     | FP_OX     | FP_UX     | FP_ZX     | \
+                          FP_XX     | FP_VXSNAN | FP_VXISI  | FP_VXIDI  | \
+                          FP_VXZDZ  | FP_VXIMZ  | FP_VXVC   | FP_VXSOFT | \
+                          FP_VXSQRT | FP_VXCVI)
+
 /*****************************************************************************/
 /* Vector status and control register */
 #define VSCR_NJ		16 /* Vector non-java */
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 4be7eaa..ca10bd1 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2500,18 +2500,31 @@  static void gen_fmrgow(DisasContext *ctx)
 static void gen_mcrfs(DisasContext *ctx)
 {
     TCGv tmp = tcg_temp_new();
+    TCGv_i32 tmask;
+    TCGv_i64 tnew_fpscr = tcg_temp_new_i64();
     int bfa;
+    int nibble;
+    int shift;
 
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    bfa = 4 * (7 - crfS(ctx->opcode));
-    tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
+    bfa = crfS(ctx->opcode);
+    nibble = 7 - bfa;
+    shift = 4 * nibble;
+    tcg_gen_shri_tl(tmp, cpu_fpscr, shift);
     tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
-    tcg_temp_free(tmp);
     tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)], 0xf);
-    tcg_gen_andi_tl(cpu_fpscr, cpu_fpscr, ~(0xF << bfa));
+    tcg_temp_free(tmp);
+    tcg_gen_extu_tl_i64(tnew_fpscr, cpu_fpscr);
+    /* Only the exception bits (including FX) should be cleared if read */
+    tcg_gen_andi_i64(tnew_fpscr, tnew_fpscr, ~((0xF << shift) & FP_EX_CLEAR_BITS));
+    /* FEX and VX need to be updated, so don't set fpscr directly */
+    tmask = tcg_const_i32(1 << nibble);
+    gen_helper_store_fpscr(cpu_env, tnew_fpscr, tmask);
+    tcg_temp_free_i32(tmask);
+    tcg_temp_free_i64(tnew_fpscr);
 }
 
 /* mffs */