Message ID | 20210904203516.2570119-4-philipp.tomsich@vrull.eu (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | target/riscv: Update QEmu for Zb[abcs] 1.0.0 | expand |
On 9/4/21 10:35 PM, Philipp Tomsich wrote: > Assume clzw being executed on a register that is not sign-extended, such > as for the following sequence that uses (1ULL << 63) | 392 as the operand > to clzw: > bseti a2, zero, 63 > addi a2, a2, 392 > clzw a3, a2 > The correct result of clzw would be 23, but the current implementation > returns -32 (as it performs a 64bit clz, which results in 0 leading zero > bits, and then subtracts 32). > > Fix this by changing the implementation to: > 1. shift the original register up by 32 > 2. performs a target-length (64bit) clz > 3. return 32 if no bits are set > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> > --- > > Changes in v10: > - New patch, fixing correctnes for clzw called on a register with undefined > (as in: not properly sign-extended) upper bits. But we have return gen_unary(ctx, a, EXT_ZERO, gen_clzw); should *not* be undefined. Ah, what's missing is ctx->w = true; within trans_clzw to cause the extend to take effect. There are a few other "w" functions that are missing that set, though they use EXT_NONE so there is no visible bug, it would probably be best to set w anyway. r~
On Sun 5. Sep 2021 at 11:11, Richard Henderson <richard.henderson@linaro.org> wrote: > > On 9/4/21 10:35 PM, Philipp Tomsich wrote: > > Assume clzw being executed on a register that is not sign-extended, such > > as for the following sequence that uses (1ULL << 63) | 392 as the operand > > to clzw: > > bseti a2, zero, 63 > > addi a2, a2, 392 > > clzw a3, a2 > > The correct result of clzw would be 23, but the current implementation > > returns -32 (as it performs a 64bit clz, which results in 0 leading zero > > bits, and then subtracts 32). > > > > Fix this by changing the implementation to: > > 1. shift the original register up by 32 > > 2. performs a target-length (64bit) clz > > 3. return 32 if no bits are set > > > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> > > --- > > > > Changes in v10: > > - New patch, fixing correctnes for clzw called on a register with undefined > > (as in: not properly sign-extended) upper bits. > > But we have > > return gen_unary(ctx, a, EXT_ZERO, gen_clzw); > > should *not* be undefined. Ah, what's missing is > > ctx->w = true; > > within trans_clzw to cause the extend to take effect. > > There are a few other "w" functions that are missing that set, though they use EXT_NONE so > there is no visible bug, it would probably be best to set w anyway. I had originally considered that (it would have to be ctx->w = true; and EXT_SIGN), but that has the side-effect of performing an extension on the result of the clzw as well (i.e. bot get_gpr and set_gpr are extended). However, this is not what clzw does: it only ignores the upper bits and then produces a result in the value-range [0..32]. An extension on the result of either a clz or clzw is never needed (we have been fighting that problem in GCC and had to use patterns for the combiner), so I don't want to introduce this inefficiency in qemu. If you have EXT_SIGN (or _ZERO), we will end up with 2 additional extensions (one on the argument, one on the result) in addition to the 2 other tcg operations that we need (i.e. clzi/subi for the extending case, shli/clzi for the proposed fix). So I believe that this commit is the best fix: 1. It doesn't mark this as a w-form (i.e. ignores the high-bits on the input _and_ extends the output), as it only treats the input as 32bit, but the output is xlen. 2. It doesn't introduce any unnecessary extensions, but handles the case in-place. Philipp.
Richard, Did you have a chance to consider what to do with clzw? I would prefer to avoid the extra extension instructions and change the implementation (and would update the commit message to provide more context), but if you insist on setting 'ctx->w' I'll just have the extra extensions emitted than delay this series further… Thanks, Philipp. On Sun, 5 Sept 2021 at 12:01, Philipp Tomsich <philipp.tomsich@vrull.eu> wrote: > On Sun 5. Sep 2021 at 11:11, Richard Henderson > <richard.henderson@linaro.org> wrote: > > > > On 9/4/21 10:35 PM, Philipp Tomsich wrote: > > > Assume clzw being executed on a register that is not sign-extended, > such > > > as for the following sequence that uses (1ULL << 63) | 392 as the > operand > > > to clzw: > > > bseti a2, zero, 63 > > > addi a2, a2, 392 > > > clzw a3, a2 > > > The correct result of clzw would be 23, but the current implementation > > > returns -32 (as it performs a 64bit clz, which results in 0 leading > zero > > > bits, and then subtracts 32). > > > > > > Fix this by changing the implementation to: > > > 1. shift the original register up by 32 > > > 2. performs a target-length (64bit) clz > > > 3. return 32 if no bits are set > > > > > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> > > > --- > > > > > > Changes in v10: > > > - New patch, fixing correctnes for clzw called on a register with > undefined > > > (as in: not properly sign-extended) upper bits. > > > > But we have > > > > return gen_unary(ctx, a, EXT_ZERO, gen_clzw); > > > > should *not* be undefined. Ah, what's missing is > > > > ctx->w = true; > > > > within trans_clzw to cause the extend to take effect. > > > > There are a few other "w" functions that are missing that set, though > they use EXT_NONE so > > there is no visible bug, it would probably be best to set w anyway. > > > I had originally considered that (it would have to be ctx->w = true; > and EXT_SIGN), > but that has the side-effect of performing an extension on the result > of the clzw as > well (i.e. bot get_gpr and set_gpr are extended). > > However, this is not what clzw does: it only ignores the upper bits > and then produces > a result in the value-range [0..32]. An extension on the result of > either a clz or clzw > is never needed (we have been fighting that problem in GCC and had to > use patterns > for the combiner), so I don't want to introduce this inefficiency in qemu. > > If you have EXT_SIGN (or _ZERO), we will end up with 2 additional > extensions (one > on the argument, one on the result) in addition to the 2 other tcg > operations that we > need (i.e. clzi/subi for the extending case, shli/clzi for the proposed > fix). > > So I believe that this commit is the best fix: > 1. It doesn't mark this as a w-form (i.e. ignores the high-bits on the > input _and_ > extends the output), as it only treats the input as 32bit, but the > output is xlen. > 2. It doesn't introduce any unnecessary extensions, but handles the > case in-place. > > Philipp. >
On 9/10/21 3:36 PM, Philipp Tomsich wrote: > Richard, > > Did you have a chance to consider what to do with clzw? > I would prefer to avoid the extra extension instructions and change the implementation > (and would update the commit message to provide more context), but if you insist on > setting 'ctx->w' I'll just have the extra extensions emitted than delay this series further… I don't mind not setting ctx->w, but bear in mind that UXL is going to automatically set this flag when executing RV32 on RV64. That's why I have written a tcg patch set to eliminate unnecessary sign-extensions. Do swap out EXT_ZERO for EXT_NONE in the gen_unary call if you're not going to make use of any kind of source extension. r~
On Fri, 10 Sept 2021 at 16:40, Richard Henderson < richard.henderson@linaro.org> wrote: > On 9/10/21 3:36 PM, Philipp Tomsich wrote: > > Richard, > > > > Did you have a chance to consider what to do with clzw? > > I would prefer to avoid the extra extension instructions and change the > implementation > > (and would update the commit message to provide more context), but if > you insist on > > setting 'ctx->w' I'll just have the extra extensions emitted than delay > this series further… > > I don't mind not setting ctx->w, but bear in mind that UXL is going to > automatically set > this flag when executing RV32 on RV64. That's why I have written a tcg > patch set to > eliminate unnecessary sign-extensions. > Ok, thanks! Updated patches follow, once all test workloads have run… Just wondering regarding the UXL-comment: the clzw instruction will be an illegal encoding for RV32 (the w-form instructions are present on RV64 only), so it should never be encountered in a RV32 instruction stream. Did you mean that clz (the instruction operating on xlen-registers) would have ctx->w set for RV32 executing on RV64 ... or am I missing something fundamental? Philipp.
On 9/10/21 3:47 PM, Philipp Tomsich wrote: > Just wondering regarding the UXL-comment: the clzw instruction will be an illegal encoding > for RV32 (the w-form instructions are present on RV64 only), so it should never be > encountered in a RV32 instruction stream. Correct. > Did you mean that clz (the instruction operating on xlen-registers) would have ctx->w > set for RV32 executing on RV64 ... or am I missing something fundamental? Yes. Or, as some test patches I was planning to post this weekend, replacing "w" as boolean with an "operation length" (ol) as an enum of MXL_RV*, so that we can represent "d" operations on RV128 with the same mechanism. r~
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc index 6c85c89f6d..8d29cadd20 100644 --- a/target/riscv/insn_trans/trans_rvb.c.inc +++ b/target/riscv/insn_trans/trans_rvb.c.inc @@ -349,8 +349,10 @@ GEN_TRANS_SHADD(3) static void gen_clzw(TCGv ret, TCGv arg1) { - tcg_gen_clzi_tl(ret, arg1, 64); - tcg_gen_subi_tl(ret, ret, 32); + TCGv t = tcg_temp_new(); + tcg_gen_shli_tl(t, arg1, 32); + tcg_gen_clzi_tl(ret, t, 32); + tcg_temp_free(t); } static bool trans_clzw(DisasContext *ctx, arg_clzw *a)
Assume clzw being executed on a register that is not sign-extended, such as for the following sequence that uses (1ULL << 63) | 392 as the operand to clzw: bseti a2, zero, 63 addi a2, a2, 392 clzw a3, a2 The correct result of clzw would be 23, but the current implementation returns -32 (as it performs a 64bit clz, which results in 0 leading zero bits, and then subtracts 32). Fix this by changing the implementation to: 1. shift the original register up by 32 2. performs a target-length (64bit) clz 3. return 32 if no bits are set Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> --- Changes in v10: - New patch, fixing correctnes for clzw called on a register with undefined (as in: not properly sign-extended) upper bits. target/riscv/insn_trans/trans_rvb.c.inc | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)