Message ID | 20201122152731.10994-2-luc.vanoostenryck@gmail.com (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | simplify logical negation | expand |
I thought we already canonicalized the pseudo ordering, but clearly not.. Anyway., looks good to me. Btw, if you want to do another simplification, one that I've actually seen multiple times in the kernel is this one: if (val1 & BITx) val2 |= BITy; and turning it into val2 |= (val1 & BITx) .. shift left or right by (BITx-BITy); and while actually testing the above, I note that sparse seems to have problems with even simple if-conversion: #define BIT1 4 #define BIT2 8 int fn(int x, int y) { if (x & BIT1) y |= BIT2; return y; } linearizes to a nasty mess of actual phi nodes and conditional jumps rather than just a 'select' op. Never mind the actual unconditional version, of course. I didn't check why the if-conversion doesn't happen. Linus
On Sun, Nov 22, 2020 at 11:00:26AM -0800, Linus Torvalds wrote: > I thought we already canonicalized the pseudo ordering, but clearly not.. > > Anyway., looks good to me. > > Btw, if you want to do another simplification, one that I've actually > seen multiple times in the kernel is this one: > > if (val1 & BITx) > val2 |= BITy; > > and turning it into > > val2 |= (val1 & BITx) .. shift left or right by (BITx-BITy); Mmmm, yes, interesting. I'll look at this but ... > and while actually testing the above, I note that sparse seems to have > problems with even simple if-conversion: > > #define BIT1 4 > #define BIT2 8 > > int fn(int x, int y) > { > if (x & BIT1) > y |= BIT2; > return y; > } > > linearizes to a nasty mess of actual phi nodes and conditional jumps > rather than just a 'select' op. Never mind the actual unconditional > version, of course. > > I didn't check why the if-conversion doesn't happen. Yes, I know. I'm currently working on it the last days but it's nasty. Very often, a (good) change blocks another one and I'm often bitten by the CFG changing and thus invalidating things like dominance relation- ships. In the case here with your example, the if-conversion doesn't happen because the phi-sources is not defined in the top block because of the OR: fn: and.32 %r2 <- %arg1, $4 phisrc.32 %phi2(y) <- %arg2 cbr %r2, .L1, .L2 .L1: or.32 %r5 <- %arg2, $8 phisrc.32 %phi3(y) <- %r5 br .L2 .L2: phi.32 %r8(y) <- %phi2(y), %phi3(y) ret.32 %r8(y) It's quite frequent and the obvious solution is to move up such instructions (and maybe, if the if-conversion succeed, the BBs will be merged and there won't be a top and bottom block anymore). I've a very crude patch moving all such instructions up if possible, it gives some positive results for the context imbalance but nothing miraculous (IIRC something like 35 warnings less on a total of ~1000). Another interesting example is something like: int foo(int a) { int e; if (a) { __context__(1); e = 1; } else { e = 0; } if (!e) return 1; __context__(-1); return 0; } which produces the following IR: foo: .L0: cbr %arg1, .L1, .L3 .L1: context 1 br .L3 .L3: seteq.32 %r5 <- %arg1, $0 cbr %arg1, .L5, .L6 .L5: context -1 br .L6 .L6: ret.32 %r5 Sparse issues a "context imbalance" warnings on it but in fact everything is OK. Three different things could unblock things: 1) the seteq move up to L0, then L1 can directly jump to L5 2) the seteq move down to L6, then again L1 can jump to L5 3) the context in L1 is changed into a conditional (a new instruction, essentially a select but for contexts) it can then be moved up to L0, L1 disappears and L3 can be merged with L0. None of these are really hard but the problem, I'm sure you can guess it, is that none bring a definitive, clear solution because, very often, moving things up or down just displaces the problem to somewhere else. But well, it's life. -- Luc
On Sun, Nov 22, 2020 at 11:57 AM Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote: > > In the case here with your example, the if-conversion doesn't happen > because the phi-sources is not defined in the top block because of > the OR: Ahh. Yes, it's more obvious with a more realistic test-case that actually translates one set of bits to another set of bits (which is something we do in the kernel for various reasons - different bit "namespaces", for example user interfaces etc): #define BIT1 4 #define BIT1x 16 #define BIT2 8 #define BIT2x 32 int translate_bits(int x) { int y = 0; if (x & BIT1) y |= BIT1x; if (x & BIT2) y |= BIT2x; return y; } and the first one gets nicely translated as and.32 %r2 <- %arg1, $4 select.32 %r14(y) <- %r2, $16, $0 but then the second one doesn't for the reason you mention. Honestly, particularly in the conditional form, the OP_SEL optimization might not even be the right thing. It adds register pressure. So maybe a better model would be to not try to do jump-conversion, but have some kind of general "can we simplify phi nodes", where jump conversion to OP_SEL is just one of the options. Linus
On Sun, Nov 22, 2020 at 12:13:46PM -0800, Linus Torvalds wrote: > > Honestly, particularly in the conditional form, the OP_SEL > optimization might not even be the right thing. It adds register > pressure. > > So maybe a better model would be to not try to do jump-conversion, but > have some kind of general "can we simplify phi nodes", where jump > conversion to OP_SEL is just one of the options. Yes, for the "context imbalance" problem, it's best to eliminate the most possible conditional branches (and if-conversion and jump threading help each other at this (when not blocking each other)). So, yes, probably what's missing the most is some strategy to attack the problem. -- Luc
On Sun, Nov 22, 2020 at 12:13:46PM -0800, Linus Torvalds wrote: > On Sun, Nov 22, 2020 at 11:57 AM Luc Van Oostenryck > <luc.vanoostenryck@gmail.com> wrote: > > > > In the case here with your example, the if-conversion doesn't happen > > because the phi-sources is not defined in the top block because of > > the OR: > > Ahh. Yes, it's more obvious with a more realistic test-case that > actually translates one set of bits to another set of bits (which is > something we do in the kernel for various reasons - different bit > "namespaces", for example user interfaces etc): > > #define BIT1 4 > #define BIT1x 16 > > #define BIT2 8 > #define BIT2x 32 > > int translate_bits(int x) > { > int y = 0; > if (x & BIT1) > y |= BIT1x; > if (x & BIT2) > y |= BIT2x; > return y; > } > > and the first one gets nicely translated as > > and.32 %r2 <- %arg1, $4 > select.32 %r14(y) <- %r2, $16, $0 > > but then the second one doesn't for the reason you mention. One thing that can be done (but probably excessive) is something like the patch below: each time a phi-source is 'blocked' by it's defining instruction, try to move this instruction up. With this, we get the expected result for your patch: translate_bits: and.32 %r2 <- %arg1, $4 select.32 %r14(y) <- %r2, $16, $0 and.32 %r7 <- %arg1, $8 or.32 %r10 <- %r14(y), $32 select.32 %r13(y) <- %r7, %r10, %r14(y) ret.32 %r13(y) It gives very good result in all sort of code but sadly it doesn't improve the context imbalance warnings, it even make them (very) slightly worse. From be38aca2a4ca3fc0911da9187e2ea58a379824cc Mon Sep 17 00:00:00 2001 From: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Date: Sun, 22 Nov 2020 22:49:13 +0100 Subject: [PATCH] move up instructions blocking if-conversion --- linearize.h | 6 +++++ simplify.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 69 insertions(+), 2 deletions(-) diff --git a/linearize.h b/linearize.h index 31c754e200c2..dcd4359eb2a6 100644 --- a/linearize.h +++ b/linearize.h @@ -300,6 +300,12 @@ static inline void replace_bb_in_list(struct basic_block_list **list, replace_ptr_list_entry((struct ptr_list **)list, old, new, count); } +static inline void replace_insn(struct instruction *old, struct instruction *new) +{ + replace_ptr_list_entry((struct ptr_list **)&old->bb->insns, old, new, 1); +} + + struct entrypoint { struct symbol *name; struct symbol_list *syms; diff --git a/simplify.c b/simplify.c index a0e23d6de01f..7d5e0d9bcb3a 100644 --- a/simplify.c +++ b/simplify.c @@ -46,11 +46,69 @@ #include "linearize.h" #include "flow.h" #include "symbol.h" +#include "flowgraph.h" /// // Utilities // ^^^^^^^^^ +/// +// check if a pseudo is defined in a BB +// +// :note: this could also use the liveness information if available. +// +static inline bool is_defined_at(pseudo_t src, struct basic_block *bb) +{ + if (src->type != PSEUDO_REG) + return true; + return domtree_dominates(src->def->bb, bb); +} + +/// +// move an instruction at the end of the given BB +static void move_insn(struct instruction *insn, struct basic_block *bb) +{ + struct instruction *last = delete_last_instruction(&bb->insns); + insn->bb = bb; + add_instruction(&bb->insns, insn); + add_instruction(&bb->insns, last); +} + +/// +// move an instruction into a BB if all its arguments are defined there +static bool try_to_move_insn(struct instruction *insn, struct basic_block *bb) +{ + static struct instruction NOP; // will later be ignore because !ep + + if (!bb || bb == insn->bb) + return 0; + + switch (insn->opcode) { + case OP_SEL: + if (!is_defined_at(insn->src3, bb)) + return 0; + case OP_ADD: case OP_SUB: + case OP_ASR: case OP_LSR: case OP_SHL: + case OP_AND: case OP_XOR: case OP_OR: + case OP_BINCMP ... OP_BINCMP_END: + case OP_FPCMP ... OP_FPCMP_END: + case OP_FADD: case OP_FSUB: + // basically, all the binops but multiply and divide, just because + if (!is_defined_at(insn->src2, bb)) + return 0; + case OP_UNOP ... OP_UNOP_END: + if (!is_defined_at(insn->src1, bb)) + return 0; + break; + default: + return 0; + } + replace_insn(insn, &NOP); + move_insn(insn, bb); + repeat_phase |= REPEAT_CSE; + return 1; +} + /// // find the trivial parent for a phi-source static struct basic_block *phi_parent(struct basic_block *source, pseudo_t pseudo) @@ -58,8 +116,11 @@ static struct basic_block *phi_parent(struct basic_block *source, pseudo_t pseud /* Can't go upwards if the pseudo is defined in the bb it came from.. */ if (pseudo->type == PSEUDO_REG) { struct instruction *def = pseudo->def; - if (def->bb == source) - return source; + if (def->bb == source) { + // unless we can move the defining instruction up + if (!try_to_move_insn(def, source->idom)) + return source; + } } if (bb_list_size(source->children) != 1 || bb_list_size(source->parents) != 1) return source;
diff --git a/validation/optim/canonical-arg.c b/validation/optim/canonical-arg.c new file mode 100644 index 000000000000..a8ecc9bd0083 --- /dev/null +++ b/validation/optim/canonical-arg.c @@ -0,0 +1,20 @@ +int def(void); + +int canon_arg_arg(int a, int b) +{ + return (a + b) == (b + a); +} + +int canon_arg_reg(int a) +{ + int b = def(); + return (a + b) == (b + a); +} + +/* + * check-name: canonical-arg + * check-command: test-linearize -Wno-decl $file + * + * check-output-ignore + * check-output-returns: 1 + */ diff --git a/validation/optim/canonical-not.c b/validation/optim/canonical-not.c new file mode 100644 index 000000000000..9698590fd245 --- /dev/null +++ b/validation/optim/canonical-not.c @@ -0,0 +1,9 @@ +int canon_not(int a, int b) { return (a & ~b) == (~b & a); } + +/* + * check-name: canonical-not + * check-command: test-linearize -Wno-decl $file + * + * check-output-ignore + * check-output-returns: 1 + */ diff --git a/validation/optim/cse-arg01.c b/validation/optim/cse-arg01.c new file mode 100644 index 000000000000..c3f2963ffdeb --- /dev/null +++ b/validation/optim/cse-arg01.c @@ -0,0 +1,10 @@ +int foo(int a, int b) { return (a < b) == (b > a); } + +/* + * check-name: cse-arg01 + * check-command: test-linearize -Wno-decl $file + * check-known-to-fail + * + * check-output-ignore + * check-output-returns: 1 + */ diff --git a/validation/optim/cse-not01.c b/validation/optim/cse-not01.c new file mode 100644 index 000000000000..f87123f14f13 --- /dev/null +++ b/validation/optim/cse-not01.c @@ -0,0 +1,12 @@ +int and(int a) { return (~a & a) == 0; } +int ior(int a) { return (~a | a) == ~0; } +int xor(int a) { return (~a ^ a) == ~0; } + +/* + * check-name: cse-not01 + * check-command: test-linearize -Wno-decl $file + * check-known-to-fail + * + * check-output-ignore + * check-output-returns: 1 + */ diff --git a/validation/optim/cse-not02.c b/validation/optim/cse-not02.c new file mode 100644 index 000000000000..aa54a375a9ea --- /dev/null +++ b/validation/optim/cse-not02.c @@ -0,0 +1,12 @@ +int and(int a, int b) { return ((a == b) & (a != b)) == 0; } +int ior(int a, int b) { return ((a == b) | (a != b)) == 1; } +int xor(int a, int b) { return ((a == b) ^ (a != b)) == 1; } + +/* + * check-name: cse-not02 + * check-command: test-linearize -Wno-decl $file + * check-known-to-fail + * + * check-output-ignore + * check-output-returns: 1 + */ diff --git a/validation/optim/cse-reg01.c b/validation/optim/cse-reg01.c new file mode 100644 index 000000000000..938858f4649b --- /dev/null +++ b/validation/optim/cse-reg01.c @@ -0,0 +1,10 @@ +int foo(int a, int b) { int x = a + b, y = ~b; return (x < y) == (y > x); } + +/* + * check-name: cse-reg01 + * check-command: test-linearize -Wno-decl $file + * check-known-to-fail + * + * check-output-ignore + * check-output-returns: 1 + */
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> --- validation/optim/canonical-arg.c | 20 ++++++++++++++++++++ validation/optim/canonical-not.c | 9 +++++++++ validation/optim/cse-arg01.c | 10 ++++++++++ validation/optim/cse-not01.c | 12 ++++++++++++ validation/optim/cse-not02.c | 12 ++++++++++++ validation/optim/cse-reg01.c | 10 ++++++++++ 6 files changed, 73 insertions(+) create mode 100644 validation/optim/canonical-arg.c create mode 100644 validation/optim/canonical-not.c create mode 100644 validation/optim/cse-arg01.c create mode 100644 validation/optim/cse-not01.c create mode 100644 validation/optim/cse-not02.c create mode 100644 validation/optim/cse-reg01.c