[38/67] target/arm: Introduce gen_gvec_cnt, gen_gvec_rbit

Message ID	20241201150607.12812-39-richard.henderson@linaro.org (mailing list archive)
State	New
Headers	show Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org> From: Richard Henderson <richard.henderson@linaro.org> To: qemu-devel@nongnu.org Cc: qemu-arm@nongnu.org Subject: [PATCH 38/67] target/arm: Introduce gen_gvec_cnt, gen_gvec_rbit Date: Sun, 1 Dec 2024 09:05:37 -0600 Message-ID: <20241201150607.12812-39-richard.henderson@linaro.org> In-Reply-To: <20241201150607.12812-1-richard.henderson@linaro.org> References: <20241201150607.12812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2607:f8b0:4864:20::236; envelope-from=richard.henderson@linaro.org; helo=mail-oi1-x236.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Series	target/arm: AArch64 decodetree conversion, final part \| expand [00/67] target/arm: AArch64 decodetree conversion, final part [01/67] target/arm: Use ### to separate 3rd-level sections in a64.decode [02/67] target/arm: Convert UDIV, SDIV to decodetree [03/67] target/arm: Convert LSLV, LSRV, ASRV, RORV to decodetree [04/67] target/arm: Convert CRC32, CRC32C to decodetree [05/67] target/arm: Convert SUBP, IRG, GMI to decodetree [06/67] target/arm: Convert PACGA to decodetree [07/67] target/arm: Convert RBIT, REV16, REV32, REV64 to decodetree [08/67] target/arm: Convert CLZ, CLS to decodetree [09/67] target/arm: Convert PAC[ID], AUT[ID] to decodetree [10/67] target/arm: Convert XPAC[ID] to decodetree [11/67] target/arm: Convert disas_logic_reg to decodetree [12/67] target/arm: Convert disas_add_sub_ext_reg to decodetree [13/67] target/arm: Convert disas_add_sub_reg to decodetree [14/67] target/arm: Convert disas_data_proc_3src to decodetree [15/67] target/arm: Convert disas_adc_sbc to decodetree [16/67] target/arm: Convert RMIF to decodetree [17/67] target/arm: Convert SETF8, SETF16 to decodetree [18/67] target/arm: Convert CCMP, CCMN to decodetree [19/67] target/arm: Convert disas_cond_select to decodetree [20/67] target/arm: Introduce fp_access_check_scalar_hsd [21/67] target/arm: Introduce fp_access_check_vector_hsd [22/67] target/arm: Convert FCMP, FCMPE, FCCMP, FCCMPE to decodetree [23/67] target/arm: Convert FMOV, FABS, FNEG (scalar) to decodetree [24/67] target/arm: Pass fpstatus to vfp_sqrt* [25/67] target/arm: Remove helper_sqrt_f16 [26/67] target/arm: Convert FSQRT (scalar) to decodetree [27/67] target/arm: Convert FRINT[NPMSAXI] (scalar) to decodetree [28/67] target/arm: Convert BFCVT to decodetree [29/67] target/arm: Convert FRINT{32, 64}[ZX] (scalar) to decodetree [30/67] target/arm: Convert FCVT (scalar) to decodetree [31/67] target/arm: Convert handle_fpfpcvt to decodetree [32/67] target/arm: Convert FJCVTZS to decodetree [33/67] target/arm: Convert handle_fmov to decodetree [34/67] target/arm: Convert SQABS, SQNEG to decodetree [35/67] target/arm: Convert ABS, NEG to decodetree [36/67] target/arm: Introduce gen_gvec_cls, gen_gvec_clz [37/67] target/arm: Convert CLS, CLZ (vector) to decodetree [38/67] target/arm: Introduce gen_gvec_cnt, gen_gvec_rbit [39/67] target/arm: Convert CNT, NOT, RBIT (vector) to decodetree [40/67] target/arm: Convert CMGT, CMGE, GMLT, GMLE, CMEQ (zero) to decodetree [41/67] target/arm: Introduce gen_gvec_rev{16,32,64} [42/67] target/arm: Convert handle_rev to decodetree [43/67] target/arm: Move helper_neon_addlp_{s8, s16} to neon_helper.c [44/67] target/arm: Introduce gen_gvec_{s,u}{add,ada}lp [45/67] target/arm: Convert handle_2misc_pairwise to decodetree [46/67] target/arm: Remove helper_neon_{add,sub}l_u{16,32} [47/67] target/arm: Introduce clear_vec [48/67] target/arm: Convert XTN, SQXTUN, SQXTN, UQXTN to decodetree [49/67] target/arm: Convert FCVTN, BFCVTN to decodetree [50/67] target/arm: Convert FCVTXN to decodetree [51/67] target/arm: Convert SHLL to decodetree [52/67] target/arm: Convert FABS, FNEG (vector) to decodetree [53/67] target/arm: Convert FSQRT (vector) to decodetree [54/67] target/arm: Convert FRINT* (vector) to decodetree [55/67] target/arm: Convert FCVT* (vector, integer) scalar to decodetree [56/67] target/arm: Convert FCVT* (vector, fixed-point) scalar to decodetree [57/67] target/arm: Convert [US]CVTF (vector, integer) scalar to decodetree [58/67] target/arm: Convert [US]CVTF (vector, fixed-point) scalar to decodetree [59/67] target/arm: Rename helper_gvec_vcvt_[hf][su] with _rz [60/67] target/arm: Convert [US]CVTF (vector) to decodetree [61/67] target/arm: Convert FCVTZ[SU] (vector, fixed-point) to decodetree [62/67] target/arm: Convert FCVT* (vector, integer) to decodetree [63/67] target/arm: Convert handle_2misc_fcmp_zero to decodetree [64/67] target/arm: Convert FRECPE, FRECPX, FRSQRTE to decodetree [65/67] target/arm: Introduce gen_gvec_urecpe, gen_gvec_ursqrte [66/67] target/arm: Convert URECPE and URSQRTE to decodetree [67/67] target/arm: Convert FCVTL to decodetree

Message ID

20241201150607.12812-39-richard.henderson@linaro.org (mailing list archive)

State

New

Headers

From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org
Subject: [PATCH 38/67] target/arm: Introduce gen_gvec_cnt, gen_gvec_rbit
Date: Sun,  1 Dec 2024 09:05:37 -0600
Message-ID: <20241201150607.12812-39-richard.henderson@linaro.org>
In-Reply-To: <20241201150607.12812-1-richard.henderson@linaro.org>
References: <20241201150607.12812-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::236;
 envelope-from=richard.henderson@linaro.org; helo=mail-oi1-x236.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org

Series

target/arm: AArch64 decodetree conversion, final part | expand

Commit Message

Richard Henderson Dec. 1, 2024, 3:05 p.m. UTC

Add gvec interfaces for CNT and RBIT operations.
Use ctpop8 for CNT and revbit+bswap for RBIT.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h             |  4 ++--
 target/arm/tcg/translate.h      |  4 ++++
 target/arm/tcg/gengvec.c        | 16 ++++++++++++++++
 target/arm/tcg/neon_helper.c    | 21 ---------------------
 target/arm/tcg/translate-a64.c  | 32 +++++++++-----------------------
 target/arm/tcg/translate-neon.c | 16 ++++++++--------
 target/arm/tcg/vec_helper.c     | 24 ++++++++++++++++++++++++
 7 files changed, 63 insertions(+), 54 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 0a697e752b..167e331a83 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -363,8 +363,8 @@  DEF_HELPER_1(neon_clz_u16, i32, i32)
 DEF_HELPER_1(neon_cls_s8, i32, i32)
 DEF_HELPER_1(neon_cls_s16, i32, i32)
 DEF_HELPER_1(neon_cls_s32, i32, i32)
-DEF_HELPER_1(neon_cnt_u8, i32, i32)
-DEF_HELPER_FLAGS_1(neon_rbit_u8, TCG_CALL_NO_RWG_SE, i32, i32)
+DEF_HELPER_FLAGS_3(gvec_cnt_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_rbit_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
 DEF_HELPER_3(neon_qdmulh_s16, i32, env, i32, i32)
 DEF_HELPER_3(neon_qrdmulh_s16, i32, env, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 5c6c24f057..cb8e1b2586 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -582,6 +582,10 @@  void gen_gvec_cls(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
                   uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_clz(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
                   uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_cnt(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                  uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                   uint32_t opr_sz, uint32_t max_sz);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 834b2961c0..85a0b50496 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -2393,3 +2393,19 @@  void gen_gvec_clz(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
     assert(vece <= MO_32);
     tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]);
 }
+
+void gen_gvec_cnt(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                  uint32_t opr_sz, uint32_t max_sz)
+{
+    assert(vece == MO_8);
+    tcg_gen_gvec_2_ool(rd_ofs, rn_ofs, opr_sz, max_sz, 0,
+                       gen_helper_gvec_cnt_b);
+}
+
+void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                  uint32_t opr_sz, uint32_t max_sz)
+{
+    assert(vece == MO_8);
+    tcg_gen_gvec_2_ool(rd_ofs, rn_ofs, opr_sz, max_sz, 0,
+                       gen_helper_gvec_rbit_b);
+}
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index 93b2076c64..4e501925de 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -525,27 +525,6 @@  uint32_t HELPER(neon_cls_s32)(uint32_t x)
     return count - 1;
 }
 
-/* Bit count.  */
-uint32_t HELPER(neon_cnt_u8)(uint32_t x)
-{
-    x = (x & 0x55555555) + ((x >>  1) & 0x55555555);
-    x = (x & 0x33333333) + ((x >>  2) & 0x33333333);
-    x = (x & 0x0f0f0f0f) + ((x >>  4) & 0x0f0f0f0f);
-    return x;
-}
-
-/* Reverse bits in each 8 bit word */
-uint32_t HELPER(neon_rbit_u8)(uint32_t x)
-{
-    x =  ((x & 0xf0f0f0f0) >> 4)
-       | ((x & 0x0f0f0f0f) << 4);
-    x =  ((x & 0x88888888) >> 3)
-       | ((x & 0x44444444) >> 1)
-       | ((x & 0x22222222) << 1)
-       | ((x & 0x11111111) << 3);
-    return x;
-}
-
 #define NEON_QDMULH16(dest, src1, src2, round) do { \
     uint32_t tmp = (int32_t)(int16_t) src1 * (int16_t) src2; \
     if ((tmp ^ (tmp << 1)) & SIGNBIT) { \
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 312bf48020..f96b29e5a9 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10328,12 +10328,15 @@  static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
     }
 
     switch (opcode) {
-    case 0x5:
-        if (u && size == 0) { /* NOT */
+    case 0x5: /* CNT, NOT, RBIT */
+        if (!u) {
+            gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cnt, 0);
+        } else if (size) {
+            gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_rbit, 0);
+        } else {
             gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_not, 0);
-            return;
         }
-        break;
+        return;
     case 0x8: /* CMGT, CMGE */
         if (u) {
             gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cge0, size);
@@ -10378,13 +10381,14 @@  static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
     } else {
         int pass;
 
+        assert(size == 2);
         for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
             TCGv_i32 tcg_op = tcg_temp_new_i32();
             TCGv_i32 tcg_res = tcg_temp_new_i32();
 
             read_vec_element_i32(s, tcg_op, rn, pass, MO_32);
 
-            if (size == 2) {
+            {
                 /* Special cases for 32 bit elements */
                 switch (opcode) {
                 case 0x2f: /* FABS */
@@ -10438,25 +10442,7 @@  static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
                 case 0x7: /* SQABS, SQNEG */
                     g_assert_not_reached();
                 }
-            } else {
-                /* Use helpers for 8 and 16 bit elements */
-                switch (opcode) {
-                case 0x5: /* CNT, RBIT */
-                    /* For these two insns size is part of the opcode specifier
-                     * (handled earlier); they always operate on byte elements.
-                     */
-                    if (u) {
-                        gen_helper_neon_rbit_u8(tcg_res, tcg_op);
-                    } else {
-                        gen_helper_neon_cnt_u8(tcg_res, tcg_op);
-                    }
-                    break;
-                default:
-                case 0x7: /* SQABS, SQNEG */
-                    g_assert_not_reached();
-                }
             }
-
             write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
         }
     }
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 1c89a53272..50d0bf7753 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -3131,6 +3131,14 @@  static bool trans_VMVN(DisasContext *s, arg_2misc *a)
     return do_2misc_vec(s, a, tcg_gen_gvec_not);
 }
 
+static bool trans_VCNT(DisasContext *s, arg_2misc *a)
+{
+    if (a->size != 0) {
+        return false;
+    }
+    return do_2misc_vec(s, a, gen_gvec_cnt);
+}
+
 #define WRAP_2M_3_OOL_FN(WRAPNAME, FUNC, DATA)                          \
     static void WRAPNAME(unsigned vece, uint32_t rd_ofs,                \
                          uint32_t rm_ofs, uint32_t oprsz,               \
@@ -3229,14 +3237,6 @@  static bool trans_VREV16(DisasContext *s, arg_2misc *a)
     return do_2misc(s, a, gen_rev16);
 }
 
-static bool trans_VCNT(DisasContext *s, arg_2misc *a)
-{
-    if (a->size != 0) {
-        return false;
-    }
-    return do_2misc(s, a, gen_helper_neon_cnt_u8);
-}
-
 static void gen_VABS_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
                        uint32_t oprsz, uint32_t maxsz)
 {
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index e825d501a2..60381258cf 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -3072,3 +3072,27 @@  DO_CLAMP(gvec_uclamp_b, uint8_t)
 DO_CLAMP(gvec_uclamp_h, uint16_t)
 DO_CLAMP(gvec_uclamp_s, uint32_t)
 DO_CLAMP(gvec_uclamp_d, uint64_t)
+
+/* Bit count in each 8-bit word. */
+void HELPER(gvec_cnt_b)(void *vd, void *vn, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint8_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; ++i) {
+        d[i] = ctpop8(n[i]);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+/* Reverse bits in each 8 bit word */
+void HELPER(gvec_rbit_b)(void *vd, void *vn, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint64_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz / 8; ++i) {
+        d[i] = revbit64(bswap64(n[i]));
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}

[38/67] target/arm: Introduce gen_gvec_cnt, gen_gvec_rbit

Commit Message

Patch