
[v2] log: add option to search for header or body

Message ID pull.1710.v2.git.1712460247516.gitgitgadget@gmail.com (mailing list archive)
State New, archived

Commit Message

Max April 7, 2024, 3:24 a.m. UTC
From: Max 👨🏽‍💻 Coplan <mchcopl@gmail.com>

Summary:
This change adds a new option to `git log` that allows users to search
for commits that match either the author or the commit message. This is
useful for finding commits that were either authored or co-authored by a
specific person.

Currently, the best way to find a commit either authored or co-authored
by a specific person is to use

```sh
$ { git log --author=Torvalds --pretty="%cd,%H" --date=iso-strict; \
    git log --grep="Co-authored-by: .*Torvalds" --pretty="%cd,%H" --date=iso-strict; } \
| sort -u --reverse \
| awk -F, '{print $2}' \
| xargs git show --stat
```

This is a bit of a pain, so this change adds a new option to `git log`.
Now finding either authors or co-authors is as simple as

```sh
$ git log --author=Torvalds --grep=Torvalds --match-header-or-grep
```
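
As a rough model of the difference (a sketch only: `toy_commit` and `toy_match` are made-up names that do not exist in git, and plain substring matching stands in for git's regex-based grep machinery), the default behavior combines the header filter and the message filter with AND, while the new option combines them with OR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a commit; not git's struct commit. */
struct toy_commit {
	const char *author;	/* contents of the author header */
	const char *message;	/* commit message, including trailers */
};

/*
 * Decide whether a commit passes the filters.
 *
 * author_pat / grep_pat: NULL means the option was not given.
 * header_or_grep: models --match-header-or-grep (assumed semantics).
 *
 * Substring search via strstr() is an assumption made for brevity;
 * the real implementation compiles and evaluates grep expressions.
 */
static bool toy_match(const struct toy_commit *c,
		      const char *author_pat, const char *grep_pat,
		      bool header_or_grep)
{
	assert(c && c->author && c->message);

	bool header_hit = author_pat && strstr(c->author, author_pat) != NULL;
	bool body_hit = grep_pat && strstr(c->message, grep_pat) != NULL;

	if (!author_pat && !grep_pat)
		return true;		/* no filter: every commit is shown */
	if (!author_pat)
		return body_hit;	/* only --grep was given */
	if (!grep_pat)
		return header_hit;	/* only --author was given */

	/* Both given: intersection by default, union with the new option. */
	return header_or_grep ? (header_hit || body_hit)
			      : (header_hit && body_hit);
}
```

With a commit authored by A and co-authored by B, plus a commit authored by B, calling `toy_match(&c, "B", "Co-authored-by: B", false)` rejects both, while passing `true` as the last argument accepts both.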

Test plan:
1. create commit authored by A and co-authored-by B
2. create commit authored by B
3. run
```sh
$ git log --author=B --grep="Co-authored-by: B" --match-header-or-grep
```
4. expect to see both commits

Signed-off-by: Max 

Comments

Junio C Hamano April 7, 2024, 6:08 a.m. UTC | #1
"Max Coplan via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This change adds a new option to `git log` that allows users to search
> for commits that match either the author or the commit message. This is
> useful for finding commits that were either authored or co-authored by a
> specific person.

I have a feeling that the "solution" presented does not address the
use case as usefully and directly as it could.  When I designed how
the --author/--committer restriction and --grep in the body of the
message interact, I made a conscious decision that "among those
commits that were authored by person X, find the ones that mention
Y" is far more useful than "done by X, or done by anybody that
mentions Y", especially when Y is just a text search in free-form
text.  Nothing limits the mention of Y to those specifically
involved in the commit---the mention could just have been part of
the text, like "earlier Max Coplan sent a patch, but this commit is
not related to it".

But these days, we have a more established "convention" that lists
people at the end in the form of "trailers", and that changes the
picture quite a lot from how the world order was back then.

In other words, if the true objective is to find commits that
involved person X, Y or Z (which is very common and would be a lot
more useful than finding those that involve all of them), shouldn't
we be limiting the --grep side even further so that a random mention
of person Y is excluded and a hit is counted only when person Y is
mentioned on a trailer (while loosening the --author side so that it
is OR'ed instead of AND'ed)?

I am imagining a pair of new options to name people (all OR'ed) and
to name places the names of these people should appear (again, all
OR'ed).  I am not good at naming, so the option names in the example
are no more than an illustration of the idea, not my recommendation,
but a command:

    git log --by="Max Coplan" --by="Junio C Hamano" \
	    --by-where=author,Signed-off-by,Co-authored-by

would find a commit that has one (or more) of the given names
in one (or more) of the places that are specified, where the places
can be either "author", "committer" to specify these headers in the
commit object, or random other string to specify trailer lines with
given keys.
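
One possible reading of these semantics, as a sketch only: everything below is hypothetical (`toy_commit`, `toy_by_match` and the helpers are invented for illustration, matching is a case-sensitive substring search, and the trailer block is approximated as the last paragraph of the message, which is cruder than git's real trailer parsing). Names are OR'ed, places are OR'ed, and a name mentioned in free-form text does not count:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified commit; not git's struct commit. */
struct toy_commit {
	const char *author;
	const char *committer;
	const char *message;
};

/* Approximate the trailer block as the text after the last blank line. */
static const char *last_paragraph(const char *message)
{
	const char *start = message, *p = message;

	while ((p = strstr(p, "\n\n")) != NULL) {
		p += 2;
		if (*p)
			start = p;
	}
	return start;
}

/* True if a "<key>: <value>" line in the last paragraph has `name` in its value. */
static bool trailer_mentions(const char *message, const char *key,
			     const char *name)
{
	size_t keylen = strlen(key), nlen = strlen(name);
	const char *line = last_paragraph(message);

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len > keylen + 1 && !strncmp(line, key, keylen) &&
		    line[keylen] == ':') {
			const char *val = line + keylen + 1;
			size_t vlen = len - keylen - 1;

			/* search for the name within this line's value only */
			for (size_t i = 0; i + nlen <= vlen; i++)
				if (!strncmp(val + i, name, nlen))
					return true;
		}
		line += len;
		if (*line == '\n')
			line++;
	}
	return false;
}

/* "author" and "committer" name headers; anything else is a trailer key. */
static bool place_mentions(const struct toy_commit *c, const char *place,
			   const char *name)
{
	if (!strcmp(place, "author"))
		return strstr(c->author, name) != NULL;
	if (!strcmp(place, "committer"))
		return strstr(c->committer, name) != NULL;
	return trailer_mentions(c->message, place, name);
}

/* Any of the names in any of the places is enough. */
static bool toy_by_match(const struct toy_commit *c,
			 const char *const *names, size_t n_names,
			 const char *const *places, size_t n_places)
{
	for (size_t i = 0; i < n_names; i++)
		for (size_t j = 0; j < n_places; j++)
			if (place_mentions(c, places[j], names[i]))
				return true;
	return false;
}
```

Under these assumptions, a commit whose last paragraph contains `Co-authored-by: Max Coplan` is selected, while one that merely says "earlier Max Coplan sent a patch" in its body is not.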

Hmm?

Phillip Wood April 7, 2024, 2 p.m. UTC | #2
On 07/04/2024 07:08, Junio C Hamano wrote:
> "Max Coplan via GitGitGadget" <gitgitgadget@gmail.com> writes:
> I am imagining a pair of new options to name people (all OR'ed) and
> to name places the names of these people should appear (again, all
> OR'ed).  I am not good at naming, so the option names in the example
> are no more than an illustration of the idea, not my recommendation,
> but a command:
> 
>      git log --by="Max Coplan" --by="Junio C Hamano" \
> 	    --by-where=author,Signed-off-by,Co-authored-by
> 
> would find a commit that has one (or more) of the given names
> in one (or more) of the places that are specified, where the places
> can be either "author", "committer" to specify these headers in the
> commit object, or random other string to specify trailer lines with
> given keys.

I like this. Yesterday[1] I didn't have a clear idea of how such an
option should work, but I think passing the names and the fields to
match those names against as two separate options is a good idea.

Best Wishes

Phillip

[1] https://lore.kernel.org/git/c93817ba-5945-4ec0-9775-5621481b972c@gmail.com/

Patch

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 00ccf687441..db0979ac498 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -71,6 +71,14 @@  endif::git-rev-list[]
 	Limit the commits output to ones that match all given `--grep`,
 	instead of ones that match at least one.
 
+--match-header-or-grep::
+	Limit the commits output to ones that match either header patterns
+	(`--author`, `--committer`, or `--grep-reflog`) or `--grep`, instead
+	of ones that match both the header and grep patterns.
++
+For example, `--author=me --grep="Co-authored-by: me" --match-header-or-grep`
+limits to commits either authored or co-authored by me.
+
 --invert-grep::
 	Limit the commits output to ones with a log message that do not
 	match the pattern specified with `--grep=<pattern>`.
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 75193ded4bd..30fc6ed08bd 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2170,7 +2170,7 @@  __git_log_gitk_options="
 # Options that go well for log and shortlog (not gitk)
 __git_log_shortlog_options="
 	--author= --committer= --grep=
-	--all-match --invert-grep
+	--all-match --invert-grep --match-header-or-grep
 "
 # Options accepted by log and show
 __git_log_show_options="
diff --git a/grep.c b/grep.c
index ac34bfeafb3..72cf599660a 100644
--- a/grep.c
+++ b/grep.c
@@ -802,7 +802,7 @@  void compile_grep_patterns(struct grep_opt *opt)
 
 	if (!opt->pattern_expression)
 		opt->pattern_expression = header_expr;
-	else if (opt->all_match)
+	else if (opt->all_match || opt->match_header_or_grep)
 		opt->pattern_expression = grep_splice_or(header_expr,
 							 opt->pattern_expression);
 	else
@@ -1829,7 +1829,7 @@  int grep_source(struct grep_opt *opt, struct grep_source *gs)
 	opt->body_hit = 0;
 	grep_source_1(opt, gs, 1);
 
-	if (opt->all_match && !chk_hit_marker(opt->pattern_expression))
+	if (!opt->match_header_or_grep && opt->all_match && !chk_hit_marker(opt->pattern_expression))
 		return 0;
 	if (opt->no_body_match && opt->body_hit)
 		return 0;
diff --git a/grep.h b/grep.h
index 926c0875c42..861584dba98 100644
--- a/grep.h
+++ b/grep.h
@@ -147,6 +147,7 @@  struct grep_opt {
 	int count;
 	int word_regexp;
 	int all_match;
+	int match_header_or_grep;
 	int no_body_match;
 	int body_hit;
 #define GREP_BINARY_DEFAULT	0
diff --git a/revision.c b/revision.c
index 7e45f765d9f..786c229f56d 100644
--- a/revision.c
+++ b/revision.c
@@ -2646,6 +2646,8 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
+	} else if (!strcmp(arg, "--match-header-or-grep")) {
+		revs->grep_filter.match_header_or_grep = 1;
 	} else if (!strcmp(arg, "--invert-grep")) {
 		revs->grep_filter.no_body_match = 1;
 	} else if ((argcount = parse_long_opt("encoding", argv, &optarg))) {
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 875dcfd98f3..c78ce150f4d 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -961,6 +961,14 @@  test_expect_success 'log --grep --author uses intersection' '
 	test_cmp expect actual
 '
 
+test_expect_success 'log --grep --author --match-header-or-grep uses union' '
+	# grep matches only third and fourth
+	# author matches only initial and third
+	git log --author="A U Thor" --grep=r --match-header-or-grep --format=%s >actual &&
+	test_write_lines fourth third initial >expect &&
+	test_cmp expect actual
+'
+
 test_expect_success 'log --grep --grep --author takes union of greps and intersects with author' '
 	# grep matches initial and second but not third
 	# author matches only initial and third
@@ -971,7 +979,23 @@  test_expect_success 'log --grep --grep --author takes union of greps and interse
 	test_cmp expect actual
 '
 
-test_expect_success 'log ---all-match -grep --author --author still takes union of authors and intersects with grep' '
+test_expect_success 'log --author --grep --grep --match-header-or-grep takes union of greps and author' '
+	# grep matches initial and second but not third
+	# author matches only initial and third
+	git log --author="A U Thor" --grep=second --grep=initial --match-header-or-grep --format=%s >actual &&
+	test_write_lines third second initial >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log --author --grep --grep --all-match --match-header-or-grep still takes union of greps and author' '
+	# grep matches initial and second but not third
+	# author matches only initial and third
+	git log --author="A U Thor" --grep=second --grep=initial --all-match --match-header-or-grep --format=%s >actual &&
+	test_write_lines third second initial >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log --all-match --grep --author --author still takes union of authors and intersects with grep' '
 	# grep matches only initial and third
 	# author matches all but second
 	git log --all-match --author="Thor" --author="Night" --grep=i --format=%s >actual &&