
[1/3] target/arm: Correct LDRD atomicity and fault behaviour

Message ID: 20250227142746.1698904-2-peter.maydell@linaro.org
State: New
Series: target/arm: Fix LDRD, STRD atomicity, fault behaviour

Commit Message

Peter Maydell Feb. 27, 2025, 2:27 p.m. UTC
Our LDRD implementation is wrong in two respects:

 * if the address is 4-aligned and the load crosses a page boundary
   and the second load faults and the first load was to the
   base register (as in cases like "ldrd r2, r3, [r2]"), then we
   must not update the base register before taking the fault
 * if the address is 8-aligned the access must be a 64-bit
   single-copy atomic access, not two 32-bit accesses

Rewrite the handling of the loads in LDRD to use a single
tcg_gen_qemu_ld_i64() and split the result into the destination
registers. This allows us to get the atomicity requirements
right, and also implicitly means that we won't update the
base register too early for the page-crossing case.

Note that because we no longer increment 'addr' by 4 in the course of
performing the LDRD we must change the adjustment value we pass to
op_addr_ri_post() and op_addr_rr_post(): it no longer needs to
subtract 4 to get the correct value to use if doing base register
writeback.

STRD has the same problem with not getting the atomicity right;
we will deal with that in the following commit.

Cc: qemu-stable@nongnu.org
Reported-by: Stu Grossman <stu.grossman@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate.c | 64 ++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 24 deletions(-)

Comments

Richard Henderson Feb. 27, 2025, 5:40 p.m. UTC | #1
On 2/27/25 06:27, Peter Maydell wrote:
> Our LDRD implementation is wrong in two respects:
> 
>   * if the address is 4-aligned and the load crosses a page boundary
>     and the second load faults and the first load was to the
>     base register (as in cases like "ldrd r2, r3, [r2]"), then we
>     must not update the base register before taking the fault
>   * if the address is 8-aligned the access must be a 64-bit
>     single-copy atomic access, not two 32-bit accesses
> 
> Rewrite the handling of the loads in LDRD to use a single
> tcg_gen_qemu_ld_i64() and split the result into the destination
> registers. This allows us to get the atomicity requirements
> right, and also implicitly means that we won't update the
> base register too early for the page-crossing case.
> 
> Note that because we no longer increment 'addr' by 4 in the course of
> performing the LDRD we must change the adjustment value we pass to
> op_addr_ri_post() and op_addr_rr_post(): it no longer needs to
> subtract 4 to get the correct value to use if doing base register
> writeback.
> 
> STRD has the same problem with not getting the atomicity right;
> we will deal with that in the following commit.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Stu Grossman <stu.grossman@gmail.com>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate.c | 64 ++++++++++++++++++++++++--------------
>   1 file changed, 40 insertions(+), 24 deletions(-)
> 
> diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
> index d8225b77c8c..e10a1240c17 100644
> --- a/target/arm/tcg/translate.c
> +++ b/target/arm/tcg/translate.c
> @@ -5003,10 +5003,43 @@ static bool op_store_rr(DisasContext *s, arg_ldst_rr *a,
>       return true;
>   }
>   
> +static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
> +{
> +    /*
> +     * LDRD is required to be an atomic 64-bit access if the
> +     * address is 8-aligned, two atomic 32-bit accesses if
> +     * it's only 4-aligned, and to give an alignment fault
> +     * if it's not 4-aligned.
> +     * Rt is always the word from the lower address, and Rt2 the
> +     * data from the higher address, regardless of endianness.
> +     * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
> +     * so we don't get its SCTLR_B check, and instead do a 64-bit access
> +     * using MO_BE if appropriate and then split the two halves.
> +     *
> +     * This also gives us the correct behaviour of not updating
> +     * rt if the load of rt2 faults; this is required for cases
> +     * like "ldrd r2, r3, [r2]" where rt is also the base register.
> +     */
> +    int mem_idx = get_mem_index(s);
> +    MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;

The 64-bit atomicity begins with Armv7 + LPAE, and is not present for any M-profile.
Worth checking ARM_FEATURE_LPAE, or at least adding to the comment?

Getting 2 x 4-byte atomicity, without requiring 8-byte atomicity, would use 
MO_ATOM_IFALIGN_PAIR.
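
Something along these lines (an untested sketch, reusing the
arm_dc_feature() / ARM_FEATURE_LPAE check translate.c already has
elsewhere):

    /*
     * Sketch: only require a single-copy atomic 64-bit access when
     * the CPU has LPAE; otherwise only the two 32-bit halves need
     * to be single-copy atomic.
     */
    MemOp opc = MO_64 | MO_ALIGN_4 | s->be_data;
    if (arm_dc_feature(s, ARM_FEATURE_LPAE)) {
        opc |= MO_ATOM_SUBALIGN;
    } else {
        opc |= MO_ATOM_IFALIGN_PAIR;
    }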

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
Peter Maydell Feb. 27, 2025, 5:58 p.m. UTC | #2
On Thu, 27 Feb 2025 at 17:41, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/27/25 06:27, Peter Maydell wrote:
> > +static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
> > +{
> > +    /*
> > +     * LDRD is required to be an atomic 64-bit access if the
> > +     * address is 8-aligned, two atomic 32-bit accesses if
> > +     * it's only 4-aligned, and to give an alignment fault
> > +     * if it's not 4-aligned.
> > +     * Rt is always the word from the lower address, and Rt2 the
> > +     * data from the higher address, regardless of endianness.
> > +     * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
> > +     * so we don't get its SCTLR_B check, and instead do a 64-bit access
> > +     * using MO_BE if appropriate and then split the two halves.
> > +     *
> > +     * This also gives us the correct behaviour of not updating
> > +     * rt if the load of rt2 faults; this is required for cases
> > +     * like "ldrd r2, r3, [r2]" where rt is also the base register.
> > +     */
> > +    int mem_idx = get_mem_index(s);
> > +    MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
>
> The 64-bit atomicity begins with Armv7 + LPAE, and is not present for any M-profile.
> Worth checking ARM_FEATURE_LPAE, or at least adding to the comment?
>
> Getting 2 x 4-byte atomicity, without requiring 8-byte atomicity, would use
> MO_ATOM_IFALIGN_PAIR.

Definitely worth a comment at minimum. Do we generate better
code for MO_ATOM_IFALIGN_PAIR? (If not, then providing higher
atomicity than the architecture mandates seems harmless.)


For the comment in memop.h that currently reads
     * MO_ATOM_SUBALIGN: the operation is single-copy atomic by parts
     *    by the alignment.  E.g. if the address is 0 mod 4, then each
     *    4-byte subobject is single-copy atomic.
     *    This is the atomicity e.g. of IBM Power.

maybe we could expand the example:

  E.g. if an 8-byte value is accessed at an address which is 0 mod 8,
  then the whole 8-byte access is single-copy atomic; otherwise,
  if it is accessed at 0 mod 4 then each 4-byte subobject is
  single-copy atomic; otherwise if it is accessed at 0 mod 2
  then the four 2-byte subobjects are single-copy atomic.

? I wasn't sure when reading what we currently have whether
it provided the 8-byte-aligned guarantee, rather than merely
the 4-byte-aligned one.

thanks
-- PMM

Patch

diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index d8225b77c8c..e10a1240c17 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -5003,10 +5003,43 @@  static bool op_store_rr(DisasContext *s, arg_ldst_rr *a,
     return true;
 }
 
+static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
+{
+    /*
+     * LDRD is required to be an atomic 64-bit access if the
+     * address is 8-aligned, two atomic 32-bit accesses if
+     * it's only 4-aligned, and to give an alignment fault
+     * if it's not 4-aligned.
+     * Rt is always the word from the lower address, and Rt2 the
+     * data from the higher address, regardless of endianness.
+     * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
+     * so we don't get its SCTLR_B check, and instead do a 64-bit access
+     * using MO_BE if appropriate and then split the two halves.
+     *
+     * This also gives us the correct behaviour of not updating
+     * rt if the load of rt2 faults; this is required for cases
+     * like "ldrd r2, r3, [r2]" where rt is also the base register.
+     */
+    int mem_idx = get_mem_index(s);
+    MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
+    TCGv taddr = gen_aa32_addr(s, addr, opc);
+    TCGv_i64 t64 = tcg_temp_new_i64();
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGv_i32 tmp2 = tcg_temp_new_i32();
+
+    tcg_gen_qemu_ld_i64(t64, taddr, mem_idx, opc);
+    if (s->be_data == MO_BE) {
+        tcg_gen_extr_i64_i32(tmp2, tmp, t64);
+    } else {
+        tcg_gen_extr_i64_i32(tmp, tmp2, t64);
+    }
+    store_reg(s, rt, tmp);
+    store_reg(s, rt2, tmp2);
+}
+
 static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
 {
-    int mem_idx = get_mem_index(s);
-    TCGv_i32 addr, tmp;
+    TCGv_i32 addr;
 
     if (!ENABLE_ARCH_5TE) {
         return false;
@@ -5017,18 +5050,10 @@  static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
     }
     addr = op_addr_rr_pre(s, a);
 
-    tmp = tcg_temp_new_i32();
-    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-    store_reg(s, a->rt, tmp);
-
-    tcg_gen_addi_i32(addr, addr, 4);
-
-    tmp = tcg_temp_new_i32();
-    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-    store_reg(s, a->rt + 1, tmp);
+    do_ldrd_load(s, addr, a->rt, a->rt + 1);
 
     /* LDRD w/ base writeback is undefined if the registers overlap.  */
-    op_addr_rr_post(s, a, addr, -4);
+    op_addr_rr_post(s, a, addr, 0);
     return true;
 }
 
@@ -5152,23 +5177,14 @@  static bool op_store_ri(DisasContext *s, arg_ldst_ri *a,
 
 static bool op_ldrd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
 {
-    int mem_idx = get_mem_index(s);
-    TCGv_i32 addr, tmp;
+    TCGv_i32 addr;
 
     addr = op_addr_ri_pre(s, a);
 
-    tmp = tcg_temp_new_i32();
-    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-    store_reg(s, a->rt, tmp);
-
-    tcg_gen_addi_i32(addr, addr, 4);
-
-    tmp = tcg_temp_new_i32();
-    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-    store_reg(s, rt2, tmp);
+    do_ldrd_load(s, addr, a->rt, rt2);
 
     /* LDRD w/ base writeback is undefined if the registers overlap.  */
-    op_addr_ri_post(s, a, addr, -4);
+    op_addr_ri_post(s, a, addr, 0);
     return true;
 }