Message ID | 20250227142746.1698904-2-peter.maydell@linaro.org (mailing list archive)
---|---
State | New
Series | target/arm: Fix LDRD, STRD atomicity, fault behaviour
On 2/27/25 06:27, Peter Maydell wrote:
> Our LDRD implementation is wrong in two respects:
>
>  * if the address is 4-aligned and the load crosses a page boundary
>    and the second load faults and the first load was to the
>    base register (as in cases like "ldrd r2, r3, [r2]"), then we
>    must not update the base register before taking the fault
>  * if the address is 8-aligned the access must be a 64-bit
>    single-copy atomic access, not two 32-bit accesses
>
> Rewrite the handling of the loads in LDRD to use a single
> tcg_gen_qemu_ld_i64() and split the result into the destination
> registers. This allows us to get the atomicity requirements
> right, and also implicitly means that we won't update the
> base register too early for the page-crossing case.
>
> Note that because we no longer increment 'addr' by 4 in the course of
> performing the LDRD we must change the adjustment value we pass to
> op_addr_ri_post() and op_addr_rr_post(): it no longer needs to
> subtract 4 to get the correct value to use if doing base register
> writeback.
>
> STRD has the same problem with not getting the atomicity right;
> we will deal with that in the following commit.
>
> Cc: qemu-stable@nongnu.org
> Reported-by: Stu Grossman <stu.grossman@gmail.com>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/tcg/translate.c | 64 ++++++++++++++++++++++++--------------
>  1 file changed, 40 insertions(+), 24 deletions(-)
>
> diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
> index d8225b77c8c..e10a1240c17 100644
> --- a/target/arm/tcg/translate.c
> +++ b/target/arm/tcg/translate.c
> @@ -5003,10 +5003,43 @@ static bool op_store_rr(DisasContext *s, arg_ldst_rr *a,
>      return true;
>  }
>
> +static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
> +{
> +    /*
> +     * LDRD is required to be an atomic 64-bit access if the
> +     * address is 8-aligned, two atomic 32-bit accesses if
> +     * it's only 4-aligned, and to give an alignment fault
> +     * if it's not 4-aligned.
> +     * Rt is always the word from the lower address, and Rt2 the
> +     * data from the higher address, regardless of endianness.
> +     * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
> +     * so we don't get its SCTLR_B check, and instead do a 64-bit access
> +     * using MO_BE if appropriate and then split the two halves.
> +     *
> +     * This also gives us the correct behaviour of not updating
> +     * rt if the load of rt2 faults; this is required for cases
> +     * like "ldrd r2, r3, [r2]" where rt is also the base register.
> +     */
> +    int mem_idx = get_mem_index(s);
> +    MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;

The 64-bit atomicity begins with armv7 + LPAE, and is not present for
any m-profile. Worth checking ARM_FEATURE_LPAE, or at least adding to
the comment?

Getting 2 x 4-byte atomicity, but not requiring 8-byte atomicity, would
use MO_ATOM_IFALIGN_PAIR.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
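The requirement described in the quoted comment, combined with Richard's LPAE caveat, can be summarized in a small standalone sketch. This is a toy model for illustration only, not QEMU code; the `lpae` flag and the return convention are inventions of this example:

```c
/*
 * Toy model of LDRD's alignment/atomicity requirement as discussed
 * above -- illustrative only, not QEMU code. Returns -1 for an
 * alignment fault, otherwise the required single-copy-atomic access
 * size in bytes. The 'lpae' flag reflects Richard's point that the
 * 64-bit guarantee only exists from Armv7 with LPAE (and is absent
 * on M-profile).
 */
static int ldrd_required_atom(unsigned long addr, int lpae)
{
    if (addr % 4 != 0) {
        return -1;              /* alignment fault */
    }
    if (addr % 8 == 0 && lpae) {
        return 8;               /* one 64-bit single-copy atomic access */
    }
    return 4;                   /* two 32-bit single-copy atomic accesses */
}
```

Under this model, providing 8-byte atomicity on a non-LPAE core (as MO_ATOM_SUBALIGN does) is stronger than required, which is the trade-off being discussed.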
On Thu, 27 Feb 2025 at 17:41, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/27/25 06:27, Peter Maydell wrote:
> > +static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
> > +{
> > +    /*
> > +     * LDRD is required to be an atomic 64-bit access if the
> > +     * address is 8-aligned, two atomic 32-bit accesses if
> > +     * it's only 4-aligned, and to give an alignment fault
> > +     * if it's not 4-aligned.
> > +     * Rt is always the word from the lower address, and Rt2 the
> > +     * data from the higher address, regardless of endianness.
> > +     * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
> > +     * so we don't get its SCTLR_B check, and instead do a 64-bit access
> > +     * using MO_BE if appropriate and then split the two halves.
> > +     *
> > +     * This also gives us the correct behaviour of not updating
> > +     * rt if the load of rt2 faults; this is required for cases
> > +     * like "ldrd r2, r3, [r2]" where rt is also the base register.
> > +     */
> > +    int mem_idx = get_mem_index(s);
> > +    MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
>
> The 64-bit atomicity begins with armv7 + LPAE, and is not present for
> any m-profile. Worth checking ARM_FEATURE_LPAE, or at least adding to
> the comment?
>
> Getting 2 x 4-byte atomicity, but not requiring 8-byte atomicity, would
> use MO_ATOM_IFALIGN_PAIR.

Definitely worth a comment at minimum. Do we generate better code
for MO_ATOM_IFALIGN_PAIR ? (If not, then providing higher atomicity
than the architecture mandates seems harmless.)

For the comment in memop.h that currently reads

 * MO_ATOM_SUBALIGN: the operation is single-copy atomic by parts
 *   by the alignment. E.g. if the address is 0 mod 4, then each
 *   4-byte subobject is single-copy atomic.
 *   This is the atomicity e.g. of IBM Power.

maybe we could expand the e.g.:

  E.g. if an 8-byte value is accessed at an address which is 0 mod 8,
  then the whole 8-byte access is single-copy atomic; otherwise, if it
  is accessed at 0 mod 4 then each 4-byte subobject is single-copy
  atomic; otherwise if it is accessed at 0 mod 2 then the four 2-byte
  subobjects are single-copy atomic.

?

I wasn't sure when reading what we currently have whether it provided
the 8-byte-aligned guarantee, rather than merely the 4-byte-aligned one.

thanks
-- PMM
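The proposed expanded wording can be written down as a small model. This is a hypothetical illustration of the MO_ATOM_SUBALIGN decomposition for an 8-byte access only, not QEMU's actual implementation:

```c
#include <stddef.h>

/*
 * Hypothetical model of the proposed MO_ATOM_SUBALIGN wording for an
 * 8-byte access: writes the single-copy-atomic subobject size to
 * *atom and returns how many such subobjects the access splits into.
 * Illustration only; QEMU's real logic is elsewhere.
 */
static int subalign_parts(unsigned long addr, size_t *atom)
{
    if (addr % 8 == 0) {
        *atom = 8;              /* whole 8-byte access is single-copy atomic */
    } else if (addr % 4 == 0) {
        *atom = 4;              /* two 4-byte subobjects */
    } else if (addr % 2 == 0) {
        *atom = 2;              /* four 2-byte subobjects */
    } else {
        *atom = 1;              /* no guarantee beyond single bytes */
    }
    return (int)(8 / *atom);
}
```

By contrast, MO_ATOM_IFALIGN_PAIR as Richard describes it would cap the guarantee at two 4-byte halves even for an 8-aligned address, which is exactly the 8-byte-aligned distinction Peter is asking about.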
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index d8225b77c8c..e10a1240c17 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -5003,10 +5003,43 @@ static bool op_store_rr(DisasContext *s, arg_ldst_rr *a,
     return true;
 }
 
+static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
+{
+    /*
+     * LDRD is required to be an atomic 64-bit access if the
+     * address is 8-aligned, two atomic 32-bit accesses if
+     * it's only 4-aligned, and to give an alignment fault
+     * if it's not 4-aligned.
+     * Rt is always the word from the lower address, and Rt2 the
+     * data from the higher address, regardless of endianness.
+     * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
+     * so we don't get its SCTLR_B check, and instead do a 64-bit access
+     * using MO_BE if appropriate and then split the two halves.
+     *
+     * This also gives us the correct behaviour of not updating
+     * rt if the load of rt2 faults; this is required for cases
+     * like "ldrd r2, r3, [r2]" where rt is also the base register.
+     */
+    int mem_idx = get_mem_index(s);
+    MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
+    TCGv taddr = gen_aa32_addr(s, addr, opc);
+    TCGv_i64 t64 = tcg_temp_new_i64();
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGv_i32 tmp2 = tcg_temp_new_i32();
+
+    tcg_gen_qemu_ld_i64(t64, taddr, mem_idx, opc);
+    if (s->be_data == MO_BE) {
+        tcg_gen_extr_i64_i32(tmp2, tmp, t64);
+    } else {
+        tcg_gen_extr_i64_i32(tmp, tmp2, t64);
+    }
+    store_reg(s, rt, tmp);
+    store_reg(s, rt2, tmp2);
+}
+
 static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
 {
-    int mem_idx = get_mem_index(s);
-    TCGv_i32 addr, tmp;
+    TCGv_i32 addr;
 
     if (!ENABLE_ARCH_5TE) {
         return false;
@@ -5017,18 +5050,10 @@ static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
     }
 
     addr = op_addr_rr_pre(s, a);
 
-    tmp = tcg_temp_new_i32();
-    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-    store_reg(s, a->rt, tmp);
-
-    tcg_gen_addi_i32(addr, addr, 4);
-
-    tmp = tcg_temp_new_i32();
-    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-    store_reg(s, a->rt + 1, tmp);
+    do_ldrd_load(s, addr, a->rt, a->rt + 1);
 
     /* LDRD w/ base writeback is undefined if the registers overlap. */
-    op_addr_rr_post(s, a, addr, -4);
+    op_addr_rr_post(s, a, addr, 0);
 
     return true;
 }
@@ -5152,23 +5177,14 @@ static bool op_store_ri(DisasContext *s, arg_ldst_ri *a,
 
 static bool op_ldrd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
 {
-    int mem_idx = get_mem_index(s);
-    TCGv_i32 addr, tmp;
+    TCGv_i32 addr;
 
     addr = op_addr_ri_pre(s, a);
 
-    tmp = tcg_temp_new_i32();
-    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-    store_reg(s, a->rt, tmp);
-
-    tcg_gen_addi_i32(addr, addr, 4);
-
-    tmp = tcg_temp_new_i32();
-    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-    store_reg(s, rt2, tmp);
+    do_ldrd_load(s, addr, a->rt, rt2);
 
     /* LDRD w/ base writeback is undefined if the registers overlap. */
-    op_addr_ri_post(s, a, addr, -4);
+    op_addr_ri_post(s, a, addr, 0);
 
     return true;
 }
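The endianness handling in do_ldrd_load() — tcg_gen_extr_i64_i32() with swapped destinations for MO_BE — can be mirrored in a standalone C sketch. This is a toy model of the register split, not QEMU code:

```c
#include <stdint.h>

/*
 * Toy model of the register split in do_ldrd_load(): t64 is the
 * 64-bit value as loaded with the target's endianness; Rt must
 * receive the word from the lower address and Rt2 the word from the
 * higher address, regardless of endianness.
 */
static void ldrd_split(uint64_t t64, int big_endian,
                       uint32_t *rt, uint32_t *rt2)
{
    uint32_t lo = (uint32_t)t64;           /* bits [31:0] of t64 */
    uint32_t hi = (uint32_t)(t64 >> 32);   /* bits [63:32] of t64 */

    if (big_endian) {
        /* a big-endian load puts the lower-addressed word in the high half */
        *rt = hi;
        *rt2 = lo;
    } else {
        *rt = lo;
        *rt2 = hi;
    }
}
```

This mirrors why the patch passes (tmp2, tmp) to the extract for MO_BE but (tmp, tmp2) for little-endian: the extract always produces (low bits, high bits), and only the mapping to Rt/Rt2 changes.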
Our LDRD implementation is wrong in two respects:

 * if the address is 4-aligned and the load crosses a page boundary
   and the second load faults and the first load was to the
   base register (as in cases like "ldrd r2, r3, [r2]"), then we
   must not update the base register before taking the fault
 * if the address is 8-aligned the access must be a 64-bit
   single-copy atomic access, not two 32-bit accesses

Rewrite the handling of the loads in LDRD to use a single
tcg_gen_qemu_ld_i64() and split the result into the destination
registers. This allows us to get the atomicity requirements
right, and also implicitly means that we won't update the
base register too early for the page-crossing case.

Note that because we no longer increment 'addr' by 4 in the course of
performing the LDRD we must change the adjustment value we pass to
op_addr_ri_post() and op_addr_rr_post(): it no longer needs to
subtract 4 to get the correct value to use if doing base register
writeback.

STRD has the same problem with not getting the atomicity right;
we will deal with that in the following commit.

Cc: qemu-stable@nongnu.org
Reported-by: Stu Grossman <stu.grossman@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate.c | 64 ++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 24 deletions(-)