pretty: colorize pattern matches in commit messages

Message ID 20210901121616.2109658-1-someguy@effective-light.com (mailing list archive)
State Superseded

Commit Message

Hamza Mahfooz Sept. 1, 2021, 12:16 p.m. UTC
Currently, for example when

  git log --grep=pattern

is executed, the outputted commits that are matched by the pattern do not
have the relevant substring matches highlighted.

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
---
 pretty.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 98 insertions(+), 11 deletions(-)

Comments

Felipe Contreras Sept. 1, 2021, 5:17 p.m. UTC | #1
Hamza Mahfooz wrote:
> Currently, for example when
> 
>   git log --grep=pattern
> 
> is executed, the outputted commits that are matched by the pattern do not
> have the relevant substring matches highlighted.
> 
> Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
> ---
>  pretty.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++------

Can you add some tests?

Junio C Hamano Sept. 1, 2021, 11:26 p.m. UTC | #2
Hamza Mahfooz <someguy@effective-light.com> writes:

> Currently, for example when
>
>   git log --grep=pattern
>
> is executed, the outputted commits that are matched by the pattern do not
> have the relevant substring matches highlighted.

A proposed log message that stops after a description of the current
status alone invites a "so what?" response.  While it may be so
obvious to you why the current state is undesirable (after all, it
motivated you enough to write a patch to improve the situation), the
job of the proposed message is to explain, and convince others to
agree on, what is wrong in the current state and what is a reasonable
design to improve it.  Among these three ingredients, the latter two
are missing from the above.

Because it is our convention to talk about the current status in
present tense first, you do not have to start the description with
"Currently,".

Taking the above two paragraphs together, perhaps:

    The "git log" command limits its output to the commits that
    contain strings that matched the "--grep=<pattern>" option, but
    unlike output from "git grep -e <pattern>", the matches are not
    highlighted, making them harder to spot.

    Teach the pretty-printer code to highlight matches from the
    "--grep=<pattern>", "--author=<pattern>" and
    "--committer=<pattern>" options (to view the last one, you may
    have to ask for --pretty=fuller).

Or something like that.

>  pretty.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 98 insertions(+), 11 deletions(-)

This new feature deserves to be tested.  I am surprised that we do
not have "git log --grep" tests whose expected output we need to
adjust for this change.  Perhaps it is because we are not otherwise
showing colors at all in "git log" tests?

It also needs a documentation update, if only to tell readers how to
customize the color used to paint the hits.

> +static void append_matched_line(struct grep_opt *opt, const char *line,
> +				size_t linelen, enum grep_pat_token token,
> +				int field, struct strbuf *sb)
> +{
> +	struct grep_pat *pat;
> +	struct strbuf tmp_sb;
> +	regmatch_t tmp_match, match;
> +	char *buf, *eol, *color;
> +	int cflags = 0;
> +
> +	strbuf_init(&tmp_sb, linelen + 1);
> +	strbuf_add(&tmp_sb, line, linelen);
> +	buf = tmp_sb.buf;
> +	eol = buf + linelen;

This copy of the whole line is wasted when ...

> +	if (!opt || !want_color(opt->color))
> +		goto skip;

... we are not doing any coloring.  Can we avoid it?
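
As a rough illustration of the point above (not code from the patch or
from git itself), the color check can come first, so that the
NUL-terminated scratch copy, which is presumably there only because
regexec() needs one, is made only when painting is requested.  The
names outbuf, outbuf_add(), append_line() and scratch_copies are all
hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf: a small fixed-size buffer. */
struct outbuf {
	char data[256];
	size_t len;
};

static void outbuf_add(struct outbuf *ob, const char *s, size_t n)
{
	assert(ob->len + n < sizeof(ob->data));
	memcpy(ob->data + ob->len, s, n);
	ob->len += n;
	ob->data[ob->len] = '\0';
}

/* Counts NUL-terminated scratch copies, only to make the cost visible. */
static int scratch_copies;

static void append_line(struct outbuf *ob, const char *line, size_t len,
			int use_color)
{
	char *tmp;

	if (!use_color) {
		/* Fast path: nothing to paint, so no scratch copy at all. */
		outbuf_add(ob, line, len);
		return;
	}

	/* regexec() wants a NUL-terminated string, hence the copy here. */
	tmp = malloc(len + 1);
	assert(tmp);
	memcpy(tmp, line, len);
	tmp[len] = '\0';
	scratch_copies++;

	/* ... matching and painting of tmp would go here ... */
	outbuf_add(ob, tmp, len);
	free(tmp);
}
```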

> +	color = opt->colors[GREP_COLOR_MATCH_CONTEXT];

Why is the context (as opposed to selected) color used here?  

In general, when you allow the foreground color to be customized,
you must also make the background color customizable, so that the
end user can avoid low contrast combinations.  If we look at how
opt->color is handled in grep.c::show_line(), we can tell how both
match_color (foreground) and line_color (background) are taken from
the palette that the end user can customize.  We should do the same
here, without assuming that 'color' here will have a good contrast
against the COLOR_RESET backdrop.
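
A minimal sketch of that two-color idea, assuming the match offsets are
already known: the matched span is wrapped in one color and the text
around it in another, each followed by a reset.  This is not
grep.c::show_line() itself; emit_colored(), paint_line() and RESET are
hypothetical names, and the escape sequences in the usage below are
arbitrary examples.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RESET "\033[m"

/*
 * Append `len` bytes of `s` to the NUL-terminated string in `out`,
 * wrapped in `color` ... RESET.  With an empty color string the bytes
 * are appended unpainted.  Empty segments emit nothing.
 */
static void emit_colored(char *out, size_t outsz, const char *color,
			 const char *s, size_t len)
{
	size_t used = strlen(out);

	assert(used < outsz);
	if (!len)
		return;
	if (*color)
		snprintf(out + used, outsz - used, "%s%.*s%s",
			 color, (int)len, s, RESET);
	else
		snprintf(out + used, outsz - used, "%.*s", (int)len, s);
}

/*
 * Paint one line whose match spans [so, eo): the match gets
 * match_color, everything around it gets line_color.
 */
static void paint_line(char *out, size_t outsz, const char *line,
		       size_t so, size_t eo,
		       const char *line_color, const char *match_color)
{
	size_t len = strlen(line);

	assert(so <= eo && eo <= len);
	emit_colored(out, outsz, line_color, line, so);
	emit_colored(out, outsz, match_color, line + so, eo - so);
	emit_colored(out, outsz, line_color, line + eo, len - eo);
}
```

Calling paint_line(out, sizeof(out), "fix the parser", 8, 14, ...) with
a line color and a match color paints "fix the " in the former and
"parser" in the latter; passing two empty strings leaves the line
untouched.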

> +	for (;;) {
> +		match.rm_so = match.rm_eo = -1;
> +
> +		for (pat = (token == GREP_PATTERN_HEAD ?
> +			    opt->header_list : opt->pattern_list);
> +		     pat; pat = pat->next) {
> +			if (pat->token == token &&
> +			    (field == -1 || pat->field == field) &&
> +			    !regexec(&pat->regexp, buf, 1, &tmp_match,
> +				     cflags)) {
> +
> +				if ((match.rm_so >= 0 && match.rm_eo >= 0) &&
> +				    (tmp_match.rm_so > match.rm_so ||
> +				     (tmp_match.rm_so == match.rm_so &&
> +				      tmp_match.rm_eo < match.rm_eo)))
> +					continue;
> +
> +				match.rm_so = tmp_match.rm_so;
> +				match.rm_eo = tmp_match.rm_eo;
> +			}
> +		}

For the current commit to come this far, "git log --grep=<pattern>"
must have done the above exact regexec() to decide if we need to
call this function in the first place, right?

We must be redoing the same computation here, which is unfortunate
for two reasons.  Performance and maintainability.

How many extra cycles are we looking at with this additional code?
Depending on how inefficient this code turns out to be, we may need
to make it an optional feature, turned off by default.

Worse yet, this can easily become a source of future bugs, since the
above matching logic must be kept in sync with the existing matching
code elsewhere.  I would not be surprised if the above logic is
already broken when various options that affect how pattern
matching works with "log --grep" (e.g. "--invert-grep", "-E", "-i")
are in use.
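
For reference, the selection rule in the quoted loop (earliest start
wins; on equal starts the longer match wins) can be sketched with plain
POSIX regex calls as below.  best_match() is a hypothetical helper that
takes an array of compiled patterns instead of git's grep_pat list, so
it only illustrates the rule, not the patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Among `n` compiled patterns, find the match in `buf` that starts
 * earliest; on equal starts, prefer the longer one.  Returns 1 and
 * fills `best` if any pattern matched, 0 otherwise.
 */
static int best_match(const regex_t *pats, size_t n, const char *buf,
		      int eflags, regmatch_t *best)
{
	size_t i;
	int found = 0;

	assert(pats && buf && best);
	for (i = 0; i < n; i++) {
		regmatch_t m;

		if (regexec(&pats[i], buf, 1, &m, eflags))
			continue;	/* this pattern did not match */
		if (found &&
		    (m.rm_so > best->rm_so ||
		     (m.rm_so == best->rm_so && m.rm_eo <= best->rm_eo)))
			continue;	/* current best is earlier or as long */
		*best = m;
		found = 1;
	}
	return found;
}
```

The result depends entirely on how each regex_t was compiled: a pattern
compiled here without REG_ICASE or REG_EXTENDED, but with them in the
filtering code, would highlight something other than what selected the
commit, which is the kind of drift the paragraph above warns about.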

Also, this function is misnamed.  It is not a function that appends
a matched line.  From the caller's point of view, it is used to append
each and every line it wants to add to sb [*], and the caller does not
even need or want to know how the callee decides which parts of the line
to paint in different colors.  Perhaps append_line_with_color() or
something?

	Side note.  By the way, why is the sb the last parameter to
	this function?  Usually functions that operate on a strbuf
	take it as the first argument.

> +		if (match.rm_so == match.rm_eo)
> +			break;
> +
> +		strbuf_grow(sb, strlen(color) + strlen(GIT_COLOR_RESET));
> +		strbuf_add(sb, buf, match.rm_so);
> +		strbuf_add(sb, color, strlen(color));
> +		strbuf_add(sb, buf + match.rm_so,
> +			   match.rm_eo - match.rm_so);
> +		strbuf_add(sb, GIT_COLOR_RESET,
> +			   strlen(GIT_COLOR_RESET));
> +		buf += match.rm_eo;
> +		cflags = REG_NOTBOL;
> +	}
> +
> +skip:
> +	strbuf_add(sb, buf, eol - buf);
> +
> +	strbuf_release(&tmp_sb);
> +}
> +
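
The shape of the painting loop quoted above, reduced to a single
pattern and a fixed output buffer, might look like the following
sketch.  highlight(), put(), COLOR_ON and COLOR_OFF are hypothetical
names; the points it tries to show are the guard against empty matches
and the switch to REG_NOTBOL once the scan has moved past the start of
the line.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define COLOR_ON  "\033[1;31m"	/* assumed match color: bold red */
#define COLOR_OFF "\033[m"

/* Append `n` bytes of `s` at offset `used`; returns the new length. */
static size_t put(char *out, size_t outsz, size_t used,
		  const char *s, size_t n)
{
	assert(used + n < outsz);
	memcpy(out + used, s, n);
	out[used + n] = '\0';
	return used + n;
}

/*
 * Copy NUL-terminated `line` into `out`, wrapping every non-empty
 * match of `re` in COLOR_ON ... COLOR_OFF.  `out` must be large enough
 * for the line plus the escape sequences; this sketch only asserts it.
 */
static void highlight(const regex_t *re, const char *line,
		      char *out, size_t outsz)
{
	regmatch_t m;
	int eflags = 0;
	size_t used = 0;

	assert(re && line && out && outsz > 0);
	out[0] = '\0';
	while (!regexec(re, line, 1, &m, eflags)) {
		if (m.rm_so == m.rm_eo)
			break;	/* empty match: stop instead of spinning */
		used = put(out, outsz, used, line, m.rm_so);
		used = put(out, outsz, used, COLOR_ON, strlen(COLOR_ON));
		used = put(out, outsz, used, line + m.rm_so,
			   m.rm_eo - m.rm_so);
		used = put(out, outsz, used, COLOR_OFF, strlen(COLOR_OFF));
		line += m.rm_eo;
		eflags = REG_NOTBOL;	/* rest is not at line start */
	}
	put(out, outsz, used, line, strlen(line));
}
```

With the pattern "^a" and the line "aaa", only the first "a" is
painted, because later iterations run with REG_NOTBOL and "^" can no
longer match.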

I like what the new feature wants to do, but I am not sure if this
is the best execution of the idea (yet).

Thanks.

Patch

diff --git a/pretty.c b/pretty.c
index 9631529c10..2886916ae6 100644
--- a/pretty.c
+++ b/pretty.c
@@ -431,15 +431,80 @@  const char *show_ident_date(const struct ident_split *ident,
 	return show_date(date, tz, mode);
 }
 
+static void append_matched_line(struct grep_opt *opt, const char *line,
+				size_t linelen, enum grep_pat_token token,
+				int field, struct strbuf *sb)
+{
+	struct grep_pat *pat;
+	struct strbuf tmp_sb;
+	regmatch_t tmp_match, match;
+	char *buf, *eol, *color;
+	int cflags = 0;
+
+	strbuf_init(&tmp_sb, linelen + 1);
+	strbuf_add(&tmp_sb, line, linelen);
+	buf = tmp_sb.buf;
+	eol = buf + linelen;
+
+	if (!opt || !want_color(opt->color))
+		goto skip;
+
+	color = opt->colors[GREP_COLOR_MATCH_CONTEXT];
+
+	for (;;) {
+		match.rm_so = match.rm_eo = -1;
+
+		for (pat = (token == GREP_PATTERN_HEAD ?
+			    opt->header_list : opt->pattern_list);
+		     pat; pat = pat->next) {
+			if (pat->token == token &&
+			    (field == -1 || pat->field == field) &&
+			    !regexec(&pat->regexp, buf, 1, &tmp_match,
+				     cflags)) {
+
+				if ((match.rm_so >= 0 && match.rm_eo >= 0) &&
+				    (tmp_match.rm_so > match.rm_so ||
+				     (tmp_match.rm_so == match.rm_so &&
+				      tmp_match.rm_eo < match.rm_eo)))
+					continue;
+
+				match.rm_so = tmp_match.rm_so;
+				match.rm_eo = tmp_match.rm_eo;
+			}
+		}
+
+		if (match.rm_so == match.rm_eo)
+			break;
+
+		strbuf_grow(sb, strlen(color) + strlen(GIT_COLOR_RESET));
+		strbuf_add(sb, buf, match.rm_so);
+		strbuf_add(sb, color, strlen(color));
+		strbuf_add(sb, buf + match.rm_so,
+			   match.rm_eo - match.rm_so);
+		strbuf_add(sb, GIT_COLOR_RESET,
+			   strlen(GIT_COLOR_RESET));
+		buf += match.rm_eo;
+		cflags = REG_NOTBOL;
+	}
+
+skip:
+	strbuf_add(sb, buf, eol - buf);
+
+	strbuf_release(&tmp_sb);
+}
+
 void pp_user_info(struct pretty_print_context *pp,
 		  const char *what, struct strbuf *sb,
 		  const char *line, const char *encoding)
 {
+	struct strbuf id;
 	struct ident_split ident;
 	char *line_end;
 	const char *mailbuf, *namebuf;
 	size_t namelen, maillen;
 	int max_length = 78; /* per rfc2822 */
+	int field = -1;
+	struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
 
 	if (pp->fmt == CMIT_FMT_ONELINE)
 		return;
@@ -496,9 +561,22 @@  void pp_user_info(struct pretty_print_context *pp,
 			strbuf_addch(sb, '\n');
 		strbuf_addf(sb, " <%.*s>\n", (int)maillen, mailbuf);
 	} else {
-		strbuf_addf(sb, "%s: %.*s%.*s <%.*s>\n", what,
-			    (pp->fmt == CMIT_FMT_FULLER) ? 4 : 0, "    ",
-			    (int)namelen, namebuf, (int)maillen, mailbuf);
+		strbuf_init(&id, namelen + maillen + 4);
+
+		if (!strcmp(what, "Author"))
+			field = GREP_HEADER_AUTHOR;
+		else if (!strcmp(what, "Commit"))
+			field = GREP_HEADER_COMMITTER;
+
+		strbuf_addf(sb, "%s: %.*s", what,
+			    (pp->fmt == CMIT_FMT_FULLER) ? 4 : 0, "    ");
+		strbuf_addf(&id, "%.*s <%.*s>", (int)namelen, namebuf,
+			    (int)maillen, mailbuf);
+
+		append_matched_line(opt, id.buf, id.len,
+				    GREP_PATTERN_HEAD, field, sb);
+		strbuf_addch(sb, '\n');
+		strbuf_release(&id);
 	}
 
 	switch (pp->fmt) {
@@ -1855,6 +1933,7 @@  static void pp_header(struct pretty_print_context *pp,
 	}
 }
 
+
 void pp_title_line(struct pretty_print_context *pp,
 		   const char **msg_p,
 		   struct strbuf *sb,
@@ -1935,8 +2014,8 @@  static int pp_utf8_width(const char *start, const char *end)
 	return width;
 }
 
-static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
-				 const char *line, int linelen)
+static void strbuf_add_tabexpand(struct grep_opt *opt, struct strbuf *sb,
+				 int tabwidth, const char *line, int linelen)
 {
 	const char *tab;
 
@@ -1953,7 +2032,8 @@  static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
 			break;
 
 		/* Output the data .. */
-		strbuf_add(sb, line, tab - line);
+		append_matched_line(opt, line, tab - line, GREP_PATTERN_BODY,
+				    -1, sb);
 
 		/* .. and the de-tabified tab */
 		strbuf_addchars(sb, ' ', tabwidth - (width % tabwidth));
@@ -1968,7 +2048,8 @@  static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
 	 * worrying about width - there's nothing more to
 	 * align.
 	 */
-	strbuf_add(sb, line, linelen);
+	append_matched_line(opt, line, linelen,
+			    GREP_PATTERN_BODY, -1, sb);
 }
 
 /*
@@ -1980,11 +2061,14 @@  static void pp_handle_indent(struct pretty_print_context *pp,
 			     struct strbuf *sb, int indent,
 			     const char *line, int linelen)
 {
+	struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
+
 	strbuf_addchars(sb, ' ', indent);
 	if (pp->expand_tabs_in_log)
-		strbuf_add_tabexpand(sb, pp->expand_tabs_in_log, line, linelen);
+		strbuf_add_tabexpand(opt, sb, pp->expand_tabs_in_log, line, linelen);
 	else
-		strbuf_add(sb, line, linelen);
+		append_matched_line(opt, line, linelen, GREP_PATTERN_BODY, -1,
+				    sb);
 }
 
 static int is_mboxrd_from(const char *line, int len)
@@ -2002,7 +2086,9 @@  void pp_remainder(struct pretty_print_context *pp,
 		  struct strbuf *sb,
 		  int indent)
 {
+	struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
 	int first = 1;
+
 	for (;;) {
 		const char *line = *msg_p;
 		int linelen = get_one_line(line);
@@ -2023,14 +2109,15 @@  void pp_remainder(struct pretty_print_context *pp,
 		if (indent)
 			pp_handle_indent(pp, sb, indent, line, linelen);
 		else if (pp->expand_tabs_in_log)
-			strbuf_add_tabexpand(sb, pp->expand_tabs_in_log,
+			strbuf_add_tabexpand(opt, sb, pp->expand_tabs_in_log,
 					     line, linelen);
 		else {
 			if (pp->fmt == CMIT_FMT_MBOXRD &&
 					is_mboxrd_from(line, linelen))
 				strbuf_addch(sb, '>');
 
-			strbuf_add(sb, line, linelen);
+			append_matched_line(opt, line, linelen,
+					    GREP_PATTERN_BODY, -1, sb);
 		}
 		strbuf_addch(sb, '\n');
 	}