Message ID | 20190818201727.31505-1-dev+git@drbeat.li (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | grep: under --debug, show whether PCRE JIT is enabled | expand |
Beat Bolli <dev+git@drbeat.li> writes: > This information is useful and not visible anywhere else, so show it. > > Signed-off-by: Beat Bolli <dev+git@drbeat.li> > Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de> > > --- > This applies on top of 'ab/pcre-jit-fixes', currently in pu. Thanks. We saw a few people exchanging patches on the list and discussing topics aroud PCRE and especially JIT, but the discussion petered out during the prerelease freeze. I'd like to see the topics solidify early in this cycle. IIRC, some key points/issues addressed by various patches we saw during the last cycle were: - A binary may be prepared to be JIT capable, but on a particular system (e.g. SELinux) JIT may not work. Should we write off such a configuration as "broken"? Should we just fall back on non-JIT? Should we fall back with loud warning? - JIT and non-JIT codepath may validate UTF-8 differently without care, but we should make sure JIT codepath behave identically to non JIT (only faster). - We should not be validating strict UTF-8 when we do not even know if the payload is UTF-8. What mechanism, if any, do we have to let us say "this must be UTF-8 or otherwise it is an error" with confidence? Should we error out in the middle of "git log" session upon seeing a binary haystack while looking for UTF-8 needle (I think not)? There may be others I am missing. Is ab/pcre-jit-fixes a good base to collectively work on to finish the topics floated around PCRE during the last cycle? I'll queue this debugging aid on top in the meantime. Thanks.
On Mon, Aug 19, 2019 at 3:23 PM Junio C Hamano <gitster@pobox.com> wrote: > > There may be others I am missing. should we still support PCRE1? I think in this case the problem is compounded by the fact that unless we do something like [1], the real fix for those UTF-8 validation issues will require a yet unreleased version of PCRE2 and will never be available for PCRE1, making the user experience suboptimal. and explained in [1] there was a series to cleanup (both for maintainability and to mitigate regressions) the PCRE1 code that is yet to be formally reviewed in [2] there is also the question of if we should provide knobs so users can "tune" their pcre library to workaround some of the quirks or if we should do more work ourselves to handle those quirks and improve the error reporting. one example of that is as you pointed out JIT, but also applies to other things like PCRE's stack size, or depending on our solution for PCRE1, accepting the risk (which already exist anyway) to accept problems with matching because of corrupted UTF-8 in the haystack > Is ab/pcre-jit-fixes a good base to collectively work on to finish > the topics floated around PCRE during the last cycle? V3 of that (which was never sent) might be better IMHO, I had to also admit I was surprised to see the whole no-kwset series this depended on being dropped but would seem it was just partially merged with pcre-jit-fixes (which is missing the patches that address the UTF-8 issues with PCRE2's unreleased flag and that should be part of that V3) it might be worth also rebasing pcre2-chartables-leakfix on top of this to avoid conflicts, eventhough I had to admit that I was expanding on integrating [3], to avoid having to squash a fix into René's patch (as he suggested) and that would be part of a reroll from that series. Carlo [1] https://public-inbox.org/git/CAPUEspgStVxL=0SoAg82vxRMRGLSEKdHrT-xq6nCW1sNq7nLsw@mail.gmail.com/ [2] https://public-inbox.org/git/20190726202642.7986-1-carenas@gmail.com/ [3] https://public-inbox.org/git/20190721194052.15440-1-carenas@gmail.com/
Hi Carlo, On Sat, 24 Aug 2019, Carlo Arenas wrote: > On Mon, Aug 19, 2019 at 3:23 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > There may be others I am missing. > > should we still support PCRE1? While Git for Windows has no problem to just drop PCRE1 support, I would like to take a longer road in Git. Like, deprecate it first (I don't know how to do that effectively, though, as packagers usually ignore compile warnings, maybe we need to add a Makefile knob YES_I_WANT_TO_BUILD_WITH_PCRE1_INSTEAD_OF_PCRE2 and YES_I_WILL_STATE_MY_REASONS_ON_THE_GIT_MAILING_LIST or something like that). Ciao, Dscho
On Mon, Aug 26, 2019 at 7:29 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > On Sat, 24 Aug 2019, Carlo Arenas wrote: > > > On Mon, Aug 19, 2019 at 3:23 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > > > There may be others I am missing. > > > > should we still support PCRE1? > > While Git for Windows has no problem to just drop PCRE1 support, FWIW I don't want to drop PCRE1 support, I was not advocating for it, but in the contrary trying to find a way to keep it working as best as possible until we really can't. > I would > like to take a longer road in Git. Like, deprecate it first (I don't > know how to do that effectively, though, as packagers usually ignore > compile warnings, maybe we need to add a Makefile knob > YES_I_WANT_TO_BUILD_WITH_PCRE1_INSTEAD_OF_PCRE2 and > YES_I_WILL_STATE_MY_REASONS_ON_THE_GIT_MAILING_LIST or something like > that). FWIW e6c531b808 (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1, 2018-03-11) put a "deprecattion" warning in the Makefile by Ævar, but last time this was discussed[1] Junio made an IMHO sound argument for why that should be removed instead but ab/pcre-jit-fixes and UTF-8 validation are likely to make that more difficult (even if it is a mostly self inflicted wound AFAIK) Carlo [1] https://public-inbox.org/git/xmqqlg4xkh28.fsf@gitster-ct.c.googlers.com/
Carlo Arenas <carenas@gmail.com> writes: > ... but > ab/pcre-jit-fixes and UTF-8 validation are likely to make that more > difficult (even if it is a mostly self inflicted wound AFAIK) Hmm, in what way? Do you mean that we'd be invested even more in pcre1 in an effort to keep supporting, that the sunk cost would dissuade us from deprecating the support even more, or something?
On Mon, Aug 26, 2019 at 9:02 AM Junio C Hamano <gitster@pobox.com> wrote: > > Carlo Arenas <carenas@gmail.com> writes: > > > ... but > > ab/pcre-jit-fixes and UTF-8 validation are likely to make that more > > difficult (even if it is a mostly self inflicted wound AFAIK) > > Hmm, in what way? Do you mean that we'd be invested even more in > pcre1 in an effort to keep supporting, that the sunk cost would > dissuade us from deprecating the support even more, or something? on the contrary, PCRE1 works fine but our recent changes make it worst unnecessarily (IMHO) for example 685668faaa (grep: stop using a custom JIT stack with PCRE v1, 2019-07-26) adds 2 regressions as discussed in [1] * git grep -P will now throw an error if there are non UTF-8 documents in the haystack (even if JIT is available) * git grep -P '^([/](?!/)|[^/])*~/.*' will now fail with a cryptic PCRE error instead of succeeding (but at least will be consistent and show the same error with PCRE2) Carlo [1] https://public-inbox.org/git/CAPUEspgStVxL=0SoAg82vxRMRGLSEKdHrT-xq6nCW1sNq7nLsw@mail.gmail.com/
Carlo Arenas <carenas@gmail.com> writes: >> > ... but >> > ab/pcre-jit-fixes and UTF-8 validation are likely to make that more >> > difficult (even if it is a mostly self inflicted wound AFAIK) >> ... > for example 685668faaa (grep: stop using a custom JIT stack with PCRE > v1, 2019-07-26) adds 2 regressions as discussed in [1] > > * git grep -P will now throw an error if there are non UTF-8 documents > in the haystack (even if JIT is available) > * git grep -P '^([/](?!/)|[^/])*~/.*' will now fail with a cryptic > PCRE error instead of succeeding (but at least will be consistent and > show the same error with PCRE2) Thanks. This was exactly the kind of thing I wanted to hear when I asked if ab/pcre-jit-fixes is a good base to further build on. As far as I am concerned, ab/pcre-jit-fixes and anything pcre that are still in flight are fair game for a reboot during this cycle (I hope I did not merge any of them to 'next' by mistake yet). If there are bad apples queued on 'pu', let's make sure we eject them and rethink the way to address the issue(s) they wanted to address.
diff --git a/grep.c b/grep.c index 9bc589720b..96272b3cfc 100644 --- a/grep.c +++ b/grep.c @@ -433,6 +433,8 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) #ifdef GIT_PCRE1_USE_JIT pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on); + if (opt->debug) + fprintf(stderr, "pcre1_jit_on=%d\n", p->pcre1_jit_on); #endif } @@ -534,6 +536,8 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt } pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); + if (opt->debug) + fprintf(stderr, "pcre2_jit_on=%d\n", p->pcre2_jit_on); if (p->pcre2_jit_on) { jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); if (jitret) @@ -559,6 +563,9 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt BUG("pcre2_pattern_info() failed: %d", patinforet); if (jitsizearg == 0) { p->pcre2_jit_on = 0; + if (opt->debug) + fprintf(stderr, "pcre2_jit_on=%d: (*NO_JIT) in regex\n", + p->pcre2_jit_on); return; } }
This information is useful and not visible anywhere else, so show it. Signed-off-by: Beat Bolli <dev+git@drbeat.li> Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- This applies on top of 'ab/pcre-jit-fixes', currently in pu. grep.c | 7 +++++++ 1 file changed, 7 insertions(+)