Message ID | 20210124021229.25987-3-avarab@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | grep: better support invalid UTF-8 haystacks |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> NOT(A && B) is Equivalent to (NOT(A) OR NOT(B))

At this level, however, the left one looks much simpler than the
right one ;-)

>  	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
> -	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
> +	    (opt->ignore_case || !(p->fixed || p->is_fixed)))
>  		options |= PCRE2_UTF;

In the context of this expression, well, I guess the rewritten one
is probably simpler but can we explain the whole condition in fewer
than three lines?  With or without the rewrite, it still looks too
complicated to me.
On 24.01.21 at 06:33, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> NOT(A && B) is Equivalent to (NOT(A) OR NOT(B))
>
> At this level, however, the left one looks much simpler than the
> right one ;-)
>
>>  	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
>> -	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
>> +	    (opt->ignore_case || !(p->fixed || p->is_fixed)))
>>  		options |= PCRE2_UTF;
>
> In the context of this expression, well, I guess the rewritten one
> is probably simpler but can we explain the whole condition in fewer
> than three lines?  With or without the rewrite, it still looks too
> complicated to me.

Make the condition

	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
	    (opt->ignore_case || (!p->fixed && !p->is_fixed))) {
		options |= PCRE2_UTF;
	}

With the knowledge of the equivalence

	(A => B) <=> (NOT(A) OR B)

(A => B means "if A then B"), the condition makes a lot of sense when
read aloud:

	if NOT ignore locale
	   AND is UTF8
	   AND has non-ASCII
	   AND if NOT ignore case then
	       if also NOT fixed
	          AND NOT is fixed
	then ...

The condition amounts to extending a series of conjunctions with more
conjunctions IF a condition is satisfied. That's quite sensible.

You have to swap the polarity of the first condition of || in your
head, though, to achieve that meaning. That works with every OR
condition, BTW.

-- Hannes
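To make the "if A then B" reading concrete, here is a minimal standalone C sketch. Everything in it is illustrative: `struct utf_inputs`, `implies()` and the other function names are hypothetical and do not exist in grep.c; the booleans merely stand in for `opt->ignore_locale`, `is_utf8_locale()`, `has_non_ascii(p->pattern)`, `opt->ignore_case`, `p->fixed` and `p->is_fixed`. It compares the `||` form of the condition with a version spelled as an implication over all 64 input combinations.

```c
#include <stdbool.h>

/* "if a then b": (a => b) <=> (!a || b) */
static bool implies(bool a, bool b)
{
	return !a || b;
}

/* Hypothetical stand-in for the values the real condition reads. */
struct utf_inputs {
	bool ignore_locale, utf8_locale, non_ascii;
	bool ignore_case, fixed, is_fixed;
};

/* The condition written with ||, as suggested in the mail above. */
static bool want_utf_or_form(const struct utf_inputs *in)
{
	return !in->ignore_locale && in->utf8_locale && in->non_ascii &&
	       (in->ignore_case || (!in->fixed && !in->is_fixed));
}

/* Same condition, read as "if NOT ignore case then NOT fixed AND NOT is fixed". */
static bool want_utf_implies_form(const struct utf_inputs *in)
{
	return !in->ignore_locale && in->utf8_locale && in->non_ascii &&
	       implies(!in->ignore_case, !in->fixed && !in->is_fixed);
}

/* Try all 64 combinations of the six booleans; return how many disagree. */
static int count_implication_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 64; bits++) {
		struct utf_inputs in = {
			.ignore_locale = bits & 1,
			.utf8_locale = bits & 2,
			.non_ascii = bits & 4,
			.ignore_case = bits & 8,
			.fixed = bits & 16,
			.is_fixed = bits & 32,
		};

		if (want_utf_or_form(&in) != want_utf_implies_form(&in))
			mismatches++;
	}
	return mismatches;
}
```

The two forms agree on every input, so the choice between them is purely about readability; whether a helper like `implies()` would be welcome in the actual code base is a separate question that this sketch does not settle.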
diff --git a/grep.c b/grep.c
index efeb6dc58d..0bb772f727 100644
--- a/grep.c
+++ b/grep.c
@@ -491,7 +491,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		options |= PCRE2_CASELESS;
 	}
 	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
-	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
+	    (opt->ignore_case || !(p->fixed || p->is_fixed)))
 		options |= PCRE2_UTF;
 
 	p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
Simplify an expression I added in 870eea8166 (grep: do not enter
PCRE2_UTF mode on fixed matching, 2019-07-26) by using a simple
application of De Morgan's laws[1]. I.e.:

    NOT(A && B) is Equivalent to (NOT(A) OR NOT(B))

1. https://en.wikipedia.org/wiki/De_Morgan%27s_laws

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
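One way to check that this application of De Morgan's laws preserves behavior is to compare the old and new sub-expressions exhaustively. The sketch below is a standalone illustration, not part of the patch: the function names are hypothetical, and the three booleans stand in for `opt->ignore_case`, `p->fixed` and `p->is_fixed`. Here A is `!ignore_case` and B is `(fixed || is_fixed)`, so NOT(A) becomes plain `ignore_case`.

```c
#include <stdbool.h>

/* Sub-expression before the patch: NOT(A && B). */
static bool utf_cond_old(bool ignore_case, bool fixed, bool is_fixed)
{
	return !(!ignore_case && (fixed || is_fixed));
}

/* Sub-expression after the patch: NOT(A) OR NOT(B). */
static bool utf_cond_new(bool ignore_case, bool fixed, bool is_fixed)
{
	return ignore_case || !(fixed || is_fixed);
}

/* Walk all 8 input combinations; return how many disagree. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool ignore_case = bits & 1;
		bool fixed = bits & 2;
		bool is_fixed = bits & 4;

		if (utf_cond_old(ignore_case, fixed, is_fixed) !=
		    utf_cond_new(ignore_case, fixed, is_fixed))
			mismatches++;
	}
	return mismatches;
}
```

With only three inputs the truth table is small enough to enumerate in full, which is cheaper than reasoning about the nested negations by hand.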