
[v1,09/10] target-ppc: add vextu[bhw]lx instructions

Message ID 1479901039-7113-10-git-send-email-nikunj@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

Nikunj A. Dadhania Nov. 23, 2016, 11:37 a.m. UTC
From: Avinesh Kumar <avinesku@linux.vnet.ibm.com>

vextublx:  Vector Extract Unsigned Byte Left
vextuhlx:  Vector Extract Unsigned Halfword Left
vextuwlx:  Vector Extract Unsigned Word Left

Signed-off-by: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/helper.h                 |  3 ++
 target-ppc/int_helper.c             | 65 +++++++++++++++++++++++++++++++++++++
 target-ppc/translate/vmx-impl.inc.c | 18 ++++++++++
 target-ppc/translate/vmx-ops.inc.c  |  4 ++-
 4 files changed, 89 insertions(+), 1 deletion(-)

Comments

David Gibson Nov. 24, 2016, 1:02 a.m. UTC | #1
On Wed, Nov 23, 2016 at 05:07:18PM +0530, Nikunj A Dadhania wrote:
> From: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
> 
> vextublx:  Vector Extract Unsigned Byte Left
> vextuhlx:  Vector Extract Unsigned Halfword Left
> vextuwlx:  Vector Extract Unsigned Word Left
> 
> Signed-off-by: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

So, when I suggested doing these without helpers before, I had
forgotten that the non-byte versions can straddle the word boundary.
Given that the offset is in a register, not in the instruction, that
does make it complicated.

But this version also relies on working 128-bit arithmetic; AFAICT
this will just fail to build if CONFIG_INT128 isn't defined.  It
really shouldn't be that hard to make a helper that works just in
terms of 64-bit arithmetic - there are only 3 cases (all in the upper
word, all in the lower, and straddling).  I'd prefer to see it done
that way, rather than increasing reliance on CONFIG_INT128.

Nikunj A. Dadhania Nov. 24, 2016, 5:53 a.m. UTC | #2
David Gibson <david@gibson.dropbear.id.au> writes:

> On Wed, Nov 23, 2016 at 05:07:18PM +0530, Nikunj A Dadhania wrote:
>> From: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
>> 
>> vextublx:  Vector Extract Unsigned Byte Left
>> vextuhlx:  Vector Extract Unsigned Halfword Left
>> vextuwlx:  Vector Extract Unsigned Word Left
>> 
>> Signed-off-by: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>
> So, when I suggested doing these without helpers before, I had
> forgotten that the non-byte versions can straddle the word boundary.
> Given that the offset is in a register, not in the instruction, that
> does make it complicated.
>
> But this version also relies on working 128-bit arithmetic; AFAICT
> this will just fail to build if CONFIG_INT128 isn't defined.

It has both implementations; the defines may have
confused you:

#if defined(HOST_WORDS_BIGENDIAN)

#  if defined(CONFIG_INT128)
#  else
#  endif

#else /* !defined (HOST_WORDS_BIGENDIAN) */

#  if defined(CONFIG_INT128)
#  else
#  endif

#endif 

> It really shouldn't be that hard to make a helper that works just in
> terms of 64-bit arithmetic - there are only 3 cases (all in the upper
> word, all in the lower, and straddling).

Currently, it's being done using a byte array:

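 +#  define VEXTULX_DO(name, elem)                                \
 +target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
 +{                                                               \
 +    target_ulong r = 0;                                         \
 +    int i;                                                      \
 +    int index = a & 0xf;                                        \
 +    for (i = 0; i < elem; i++) {                                \
 +        r = r << 8;                                             \
 +        if (index + i <= 15) {                                  \
 +            r = r | b->u8[index + i];                           \
 +        }                                                       \
 +    }                                                           \
 +    return r;                                                   \
 +}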

> I'd prefer to see it done that way, rather than increasing reliance on
> CONFIG_INT128.

Regards
Nikunj

Richard Henderson Nov. 24, 2016, 8:14 a.m. UTC | #3
On 11/24/2016 06:53 AM, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
>
>> On Wed, Nov 23, 2016 at 05:07:18PM +0530, Nikunj A Dadhania wrote:
>>> From: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
>>>
>>> vextublx:  Vector Extract Unsigned Byte Left
>>> vextuhlx:  Vector Extract Unsigned Halfword Left
>>> vextuwlx:  Vector Extract Unsigned Word Left
>>>
>>> Signed-off-by: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
>>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>>
>> So, when I suggested doing these without helpers before, I had
>> forgotten that the non-byte versions can straddle the word boundary.
>> Given that the offset is in a register, not in the instruction, that
>> does make it complicated.
>>
>> But this version also relies on working 128-bit arithmetic; AFAICT
>> this will just fail to build if CONFIG_INT128 isn't defined.
>
> It has both implementations; the defines may have
> confused you:
>
> #if defined(HOST_WORDS_BIGENDIAN)
>
> #  if defined(CONFIG_INT128)
> #  else
> #  endif
>
> #else /* !defined (HOST_WORDS_BIGENDIAN) */
>
> #  if defined(CONFIG_INT128)
> #  else
> #  endif
>
> #endif

In include/qemu/int128.h, we do have int128_rshift.  So you don't *really* have 
to do this by hand, exactly.


r~

Nikunj A. Dadhania Nov. 24, 2016, 8:22 a.m. UTC | #4
Richard Henderson <rth@twiddle.net> writes:

> On 11/24/2016 06:53 AM, Nikunj A Dadhania wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>>
>>> On Wed, Nov 23, 2016 at 05:07:18PM +0530, Nikunj A Dadhania wrote:
>>>> From: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
>>>>
>>>> vextublx:  Vector Extract Unsigned Byte Left
>>>> vextuhlx:  Vector Extract Unsigned Halfword Left
>>>> vextuwlx:  Vector Extract Unsigned Word Left
>>>>
>>>> Signed-off-by: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
>>>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>>>
>>> So, when I suggested doing these without helpers before, I had
>>> forgotten that the non-byte versions can straddle the word boundary.
>>> Given that the offset is in a register, not in the instruction, that
>>> does make it complicated.
>>>
>>> But this version also relies on working 128-bit arithmetic; AFAICT
>>> this will just fail to build if CONFIG_INT128 isn't defined.
>>
>> It has both implementations; the defines may have
>> confused you:
>>
>> #if defined(HOST_WORDS_BIGENDIAN)
>>
>> #  if defined(CONFIG_INT128)
>> #  else
>> #  endif
>>
>> #else /* !defined (HOST_WORDS_BIGENDIAN) */
>>
>> #  if defined(CONFIG_INT128)
>> #  else
>> #  endif
>>
>> #endif
>
> In include/qemu/int128.h, we do have int128_rshift.  So you don't *really* have 
> to do this by hand, exactly.

Sure, I'll add int128_extract as well; it will be helpful.

Regards
Nikunj

Patch

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 3b26678..d0a8fb2 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -366,6 +366,9 @@  DEF_HELPER_3(vpmsumb, void, avr, avr, avr)
 DEF_HELPER_3(vpmsumh, void, avr, avr, avr)
 DEF_HELPER_3(vpmsumw, void, avr, avr, avr)
 DEF_HELPER_3(vpmsumd, void, avr, avr, avr)
+DEF_HELPER_2(vextublx, tl, tl, avr)
+DEF_HELPER_2(vextuhlx, tl, tl, avr)
+DEF_HELPER_2(vextuwlx, tl, tl, avr)
 
 DEF_HELPER_2(vsbox, void, avr, avr)
 DEF_HELPER_3(vcipher, void, avr, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index fbf477f..ce6cff1 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1805,6 +1805,71 @@  void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
     }
 }
 
+#ifdef CONFIG_INT128
+#define EXTRACT128(value, start, length)                      \
+    (((value) >> (start)) & (~(__uint128_t)0 >> (128 - (length))))
+#endif
+
+#if defined(HOST_WORDS_BIGENDIAN)
+#  if defined(CONFIG_INT128)
+#  define VEXTULX_DO(name, elem)                                \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
+{                                                               \
+    target_ulong r = 0;                                         \
+    int index = (16 - (a & 0xf) - elem) * 8;                    \
+    r = EXTRACT128(b->u128, index, elem * 8);                   \
+    return r;                                                   \
+}
+#  else
+#  define VEXTULX_DO(name, elem)                                \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
+{                                                               \
+    target_ulong r = 0;                                         \
+    int i;                                                      \
+    int index = a & 0xf;                                        \
+    for (i = 0; i < elem; i++) {                                \
+        r = r << 8;                                             \
+        if (index + i <= 15) {                                  \
+            r = r | b->u8[index + i];                           \
+        }                                                       \
+    }                                                           \
+    return r;                                                   \
+}
+#  endif
+#else
+#  if defined(CONFIG_INT128)
+#  define VEXTULX_DO(name, elem)                                \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
+{                                                               \
+    target_ulong r = 0;                                         \
+    int size = elem * 8;                                        \
+    int index = (16 - (a & 0xf)) * 8;                           \
+    r = EXTRACT128(b->u128, (index - size), size);              \
+    return r;                                                   \
+}
+#  else
+#  define VEXTULX_DO(name, elem)                                \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
+{                                                               \
+    target_ulong r = 0;                                         \
+    int i;                                                      \
+    int index = 15 - (a & 0xf);                                 \
+    for (i = 0; i < elem; i++) {                                \
+        r = r << 8;                                             \
+        if (index - i >= 0) {                                   \
+            r = r | b->u8[index - i];                           \
+        }                                                       \
+    }                                                           \
+    return r;                                                   \
+}
+#  endif
+#endif
+
+VEXTULX_DO(vextublx, 1)
+VEXTULX_DO(vextuhlx, 2)
+VEXTULX_DO(vextuwlx, 4)
+#undef VEXTULX_DO
+
 /* The specification says that the results are undefined if all of the
  * shift counts are not identical.  We check to make sure that they are
  * to conform to what real hardware appears to do.  */
diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index 7143eb3..e91d10b 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -340,6 +340,19 @@  static void glue(gen_, name0##_##name1)(DisasContext *ctx)              \
     }                                                                   \
 }
 
+#define GEN_VXFORM_HETRO(name, opc2, opc3)                              \
+static void glue(gen_, name)(DisasContext *ctx)                         \
+{                                                                       \
+    TCGv_ptr rb;                                                        \
+    if (unlikely(!ctx->altivec_enabled)) {                              \
+        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
+        return;                                                         \
+    }                                                                   \
+    rb = gen_avr_ptr(rB(ctx->opcode));                                  \
+    gen_helper_##name(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)], rb); \
+    tcg_temp_free_ptr(rb);                                              \
+}
+
 GEN_VXFORM(vaddubm, 0, 0);
 GEN_VXFORM_DUAL_EXT(vaddubm, PPC_ALTIVEC, PPC_NONE, 0,       \
                     vmul10cuq, PPC_NONE, PPC2_ISA300, 0x0000F800)
@@ -525,6 +538,11 @@  GEN_VXFORM_ENV(vaddfp, 5, 0);
 GEN_VXFORM_ENV(vsubfp, 5, 1);
 GEN_VXFORM_ENV(vmaxfp, 5, 16);
 GEN_VXFORM_ENV(vminfp, 5, 17);
+GEN_VXFORM_HETRO(vextublx, 6, 24)
+GEN_VXFORM_HETRO(vextuhlx, 6, 25)
+GEN_VXFORM_HETRO(vextuwlx, 6, 26)
+GEN_VXFORM_DUAL(vmrgow, PPC_NONE, PPC2_ALTIVEC_207,
+                vextuwlx, PPC_NONE, PPC2_ISA300)
 
 #define GEN_VXRFORM1(opname, name, str, opc2, opc3)                     \
 static void glue(gen_, name)(DisasContext *ctx)                         \
diff --git a/target-ppc/translate/vmx-ops.inc.c b/target-ppc/translate/vmx-ops.inc.c
index f02b3be..e62e564 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -91,8 +91,10 @@  GEN_VXFORM(vmrghw, 6, 2),
 GEN_VXFORM(vmrglb, 6, 4),
 GEN_VXFORM(vmrglh, 6, 5),
 GEN_VXFORM(vmrglw, 6, 6),
+GEN_VXFORM_300(vextublx, 6, 24),
+GEN_VXFORM_300(vextuhlx, 6, 25),
+GEN_VXFORM_DUAL(vmrgow, vextuwlx, 6, 26, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_207(vmrgew, 6, 30),
-GEN_VXFORM_207(vmrgow, 6, 26),
 GEN_VXFORM(vmuloub, 4, 0),
 GEN_VXFORM(vmulouh, 4, 1),
 GEN_VXFORM_DUAL(vmulouw, vmuluwm, 4, 2, PPC_ALTIVEC, PPC_NONE),