diff mbox series

[v6,4/6] blame: add config options to handle output for ignored lines

Message ID 20190410162409.117264-5-brho@google.com (mailing list archive)
State New, archived
Headers show
Series blame: add the ability to ignore commits | expand

Commit Message

Barret Rhoden April 10, 2019, 4:24 p.m. UTC
When ignoring commits, the commit that is blamed might not be
responsible for the change.  Users might want to know when a particular
line has a potentially inaccurate blame.  Furthermore, they might never
want to see the object hash of an ignored commit.

This patch adds two config options to control the output behavior.

The first option can identify ignored lines by specifying
blame.markIgnoredFiles.  When this option is set, each blame line is
marked with an '*'.

For example:
	278b6158d6fdb (Barret Rhoden  2016-04-11 13:57:54 -0400 26)
appears as:
	*278b6158d6fd (Barret Rhoden  2016-04-11 13:57:54 -0400 26)

where the '*' is placed before the commit, and the hash has one fewer
characters.

Sometimes we are unable to even guess at what commit touched a line.
These lines are 'unblamable.'  The second option,
blame.maskIgnoredUnblamables, will zero the hash of any unblamable line.

For example, say we ignore e5e8d36d04cbe:
	e5e8d36d04cbe (Barret Rhoden  2016-04-11 13:57:54 -0400 26)
appears as:
	0000000000000 (Barret Rhoden  2016-04-11 13:57:54 -0400 26)

Signed-off-by: Barret Rhoden <brho@google.com>
---
 Documentation/blame-options.txt |  6 +++++-
 Documentation/config/blame.txt  |  9 +++++++++
 blame.c                         |  4 ++++
 blame.h                         |  1 +
 builtin/blame.c                 | 18 +++++++++++++++--
 t/t8013-blame-ignore-revs.sh    | 34 +++++++++++++++++++++++++++++++++
 6 files changed, 69 insertions(+), 3 deletions(-)

Comments

Junio C Hamano April 14, 2019, 3:45 a.m. UTC | #1
Barret Rhoden <brho@google.com> writes:

> Sometimes we are unable to even guess at what commit touched a line.
> These lines are 'unblamable.'  The second option,
> blame.maskIgnoredUnblamables, will zero the hash of any unblamable line.
>
> For example, say we ignore e5e8d36d04cbe:
> 	e5e8d36d04cbe (Barret Rhoden  2016-04-11 13:57:54 -0400 26)
> appears as:
> 	0000000000000 (Barret Rhoden  2016-04-11 13:57:54 -0400 26)

Wouldn't this make it impossible to tell between what's done by such
a commit that was marked to be ignored, and what's done locally only
in the working tree, which the users have long accustomed to see
with the ^0*$ object name?  I think it would make a lot more sense
to show the object name of the "ignored" commit, which would be
recognizable by the user who fed such an object name to the command
in the first place.  Alternatively, perhaps the same idea as replacing
one of the hexdigits with '*' used by the other configuration can be
applied to this as well?
Michael Platings April 14, 2019, 10:09 a.m. UTC | #2
On Sun, 14 Apr 2019 at 04:45, Junio C Hamano <gitster@pobox.com> wrote:
> Wouldn't this make it impossible to tell between what's done by such
> a commit that was marked to be ignored, and what's done locally only
> in the working tree, which the users have long accustomed to see
> with the ^0*$ object name?  I think it would make a lot more sense
> to show the object name of the "ignored" commit, which would be
> recognizable by the user who fed such an object name to the command
> in the first place.  Alternatively, perhaps the same idea as replacing
> one of the hexdigits with '*' used by the other configuration can be
> applied to this as well?

I had the same objection to zeroing out hashes, but this option is off
by default so I think it's OK.
If you enable both blame.markIgnoredLines and
blame.maskIgnoredUnblamables then the hash does appear as
"*0000000000" like you suggest. I think it's appropriate that the '*'
is only added if you opt in with the markIgnoredLines option.

If you only enable blame.markIgnoredLines then the hash for
"unblamable" lines appears as e.g. "*3252488f5" - this doesn't seem
right to me because the commit *wasn't* ignored, it is in fact the
commit in which that line was added. I think '*' should denote "this
information may be inaccurate" as that's what a typical user needs to
be aware of. However given that "unblamable" lines tend to be either
empty or a single character I'm not going to insist :)
Junio C Hamano April 14, 2019, 10:24 a.m. UTC | #3
Michael Platings <michael@platin.gs> writes:

> If you only enable blame.markIgnoredLines then the hash for
> "unblamable" lines appears as e.g. "*3252488f5" - this doesn't seem
> right to me because the commit *wasn't* ignored,

I think you misunderstood me.  I was merely suggesting to use the
approach to mark the line in a way other than using the NULLed out
object name that has been reserved for something totally different,
and hinting with "the same *idea*".

And that idea is not even original to this series; the "^" marker
that is used to say "the line is attributed to this commit, but that
may only be because you blamed with commit range A..B and we reached
the bottom of the range---if you dug further, you might find the
line originates from another commit" is the origin of the same idea,
and this topic borrows it and uses a different mark, i.e. '*', for
the "we are not certain---take this with grain of salt" mark.

If you ended up hitting the commit the user wanted to ignore,
perhaps you can find another character that is different from '^' or
'*' and use that, following the same idea.

That is what I meant.  So you shouldn't be worried about using the
same '*' making the result ambiguous.

By the way, a configuration only feature is something we usually do
not accept.  A feature must be guarded with --command-line-option
and then optionally can have a corresponding configuration once the
option proves to be useful enough that it becomes useful to be able
to say "in this repository (or to this user), the feature is on by
default".
Michael Platings April 14, 2019, 11:27 a.m. UTC | #4
On Sun, 14 Apr 2019 at 11:24, Junio C Hamano <gitster@pobox.com> wrote:
> > If you only enable blame.markIgnoredLines then the hash for
> > "unblamable" lines appears as e.g. "*3252488f5" - this doesn't seem
> > right to me because the commit *wasn't* ignored,
>
> I think you misunderstood me.  I was merely suggesting to use the
> approach to mark the line in a way other than using the NULLed out
> object name that has been reserved for something totally different,
> and hinting with "the same *idea*".

Hi Junio, that paragraph wasn't targetted at yourself, more a comment
on the functionality as it exists in the latest patch series. Sorry
for not making that clear.

> the "^" marker
> that is used to say "the line is attributed to this commit, but that
> may only be because you blamed with commit range A..B and we reached
> the bottom of the range---if you dug further, you might find the
> line originates from another commit" is the origin of the same idea,
> and this topic borrows it and uses a different mark, i.e. '*', for
> the "we are not certain---take this with grain of salt" mark.

So it sounds like we have many types of blame to consider:

1) This commit is truly the last one to touch this line, and you
didn't ask to ignore it.
2) This commit is truly the last one to touch this line, but you asked
to ignore it (AKA "unblamable").
3) This commit is at the bottom of the range of commits (^)
4) The "true" commit was ignored but we guess this is the one you're
actually interested in (*)
5) The "true" commit was ignored and we've reached the bottom of the
range of commits (^*)?
6) This commit is at the bottom of the range of commits, and you asked
to ignore it.

> If you ended up hitting the commit the user wanted to ignore,
> perhaps you can find another character that is different from '^' or
> '*' and use that, following the same idea.

I personally don't find the "unblamable" lines interesting enough to
justify giving them a symbol. But if Barret strongly feels that such
lines should get a '*' then I won't fight it - these lines tend to be
as simple as "}".

> By the way, a configuration only feature is something we usually do
> not accept.  A feature must be guarded with --command-line-option
> and then optionally can have a corresponding configuration once the
> option proves to be useful enough that it becomes useful to be able
> to say "in this repository (or to this user), the feature is on by
> default".

In that case we definitely need a --mark-ignored-lines option to git
blame, and I would strongly prefer that we also keep the
blame.markIgnoredLines option as I for one will be switching it on.
Barret Rhoden April 15, 2019, 1:51 p.m. UTC | #5
Hi -

On 4/14/19 7:27 AM, Michael Platings wrote:
> On Sun, 14 Apr 2019 at 11:24, Junio C Hamano <gitster@pobox.com> wrote:
>>> If you only enable blame.markIgnoredLines then the hash for
>>> "unblamable" lines appears as e.g. "*3252488f5" - this doesn't seem
>>> right to me because the commit *wasn't* ignored,
>>
>> I think you misunderstood me.  I was merely suggesting to use the
>> approach to mark the line in a way other than using the NULLed out
>> object name that has been reserved for something totally different,
>> and hinting with "the same *idea*".
> 
> Hi Junio, that paragraph wasn't targetted at yourself, more a comment
> on the functionality as it exists in the latest patch series. Sorry
> for not making that clear.
> 
>> the "^" marker
>> that is used to say "the line is attributed to this commit, but that
>> may only be because you blamed with commit range A..B and we reached
>> the bottom of the range---if you dug further, you might find the
>> line originates from another commit" is the origin of the same idea,
>> and this topic borrows it and uses a different mark, i.e. '*', for
>> the "we are not certain---take this with grain of salt" mark.
> 
> So it sounds like we have many types of blame to consider:
> 
> 1) This commit is truly the last one to touch this line, and you
> didn't ask to ignore it.
> 2) This commit is truly the last one to touch this line, but you asked
> to ignore it (AKA "unblamable").
> 3) This commit is at the bottom of the range of commits (^)
> 4) The "true" commit was ignored but we guess this is the one you're
> actually interested in (*)
> 5) The "true" commit was ignored and we've reached the bottom of the
> range of commits (^*)?
> 6) This commit is at the bottom of the range of commits, and you asked
> to ignore it.
> 
>> If you ended up hitting the commit the user wanted to ignore,
>> perhaps you can find another character that is different from '^' or
>> '*' and use that, following the same idea.
> 
> I personally don't find the "unblamable" lines interesting enough to
> justify giving them a symbol. But if Barret strongly feels that such
> lines should get a '*' then I won't fight it - these lines tend to be
> as simple as "}".

I'm fine with not zeroing the hash, so long as there's some way to mark it.

We could mark with another *, such that if we mark-ignored and 
mark-unblamable you get "**hash".  You can't have an unblamable that 
isn't from an ignored commit, so a single '*' has only one meaning, 
based on your config options.

If that works for you all, I can change this to markIgnoredUnblamables 
(instead of 'mask') in the next version.

>> By the way, a configuration only feature is something we usually do
>> not accept.  A feature must be guarded with --command-line-option
>> and then optionally can have a corresponding configuration once the
>> option proves to be useful enough that it becomes useful to be able
>> to say "in this repository (or to this user), the feature is on by
>> default".
> 
> In that case we definitely need a --mark-ignored-lines option to git
> blame, and I would strongly prefer that we also keep the
> blame.markIgnoredLines option as I for one will be switching it on.

I'd also keep this set.  I think the whole reason for these config 
options was that everyone has a different preference, but that 
preference rarely changes.  I don't want to have to type 
--mark-ignored-lines every time I run git blame.  If I had to, I'd have 
to alias git blame or something.

I think having config options for these sorts of things is fine, since 
we know already that for a given user+repo, we want the feature on (or 
off).  But if I have to remove it, then let me know.

Thanks,

Barret
diff mbox series

Patch

diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt
index 8f155196c6fe..e7b8b5e4b87b 100644
--- a/Documentation/blame-options.txt
+++ b/Documentation/blame-options.txt
@@ -115,7 +115,11 @@  take effect.
 	change never happened.  Lines that were changed or added by an ignored
 	commit will be blamed on the previous commit that changed that line or
 	nearby lines.  This option may be specified multiple times to ignore
-	more than one revision.
+	more than one revision.  If the `blame.markIgnoredLines` config option
+	is set, then lines that were changed by an ignored commit will be
+	marked with a `*` in the blame output.  If the
+	`blame.maskIgnoredUnblamables` config option is set, then those lines that
+	we could not attribute to another revision are outputted as all zeros.
 
 --ignore-revs-file <file>::
 	Ignore revisions listed in `file`, one unabbreviated object name per line.
diff --git a/Documentation/config/blame.txt b/Documentation/config/blame.txt
index 4da2788f306d..bb6674227da1 100644
--- a/Documentation/config/blame.txt
+++ b/Documentation/config/blame.txt
@@ -26,3 +26,12 @@  blame.ignoreRevsFile::
 	`#` are ignored.  This option may be repeated multiple times.  Empty
 	file names will reset the list of ignored revisions.  This option will
 	be handled before the command line option `--ignore-revs-file`.
+
+blame.maskIgnoredUnblamables::
+	Output an object hash of all zeros for lines that were changed by an ignored
+	revision and that we could not attribute to another revision in the output
+	of linkgit:git-blame[1].
+
+blame.markIgnoredLines::
+	Mark lines that were changed by an ignored revision with a '*' in the
+	output of linkgit:git-blame[1].
diff --git a/blame.c b/blame.c
index 0bbb86ad5985..a98ae00e2cfc 100644
--- a/blame.c
+++ b/blame.c
@@ -481,6 +481,7 @@  void blame_coalesce(struct blame_scoreboard *sb)
 	for (ent = sb->ent; ent && (next = ent->next); ent = next) {
 		if (ent->suspect == next->suspect &&
 		    ent->s_lno + ent->num_lines == next->s_lno &&
+		    ent->ignored == next->ignored &&
 		    ent->unblamable == next->unblamable) {
 			ent->num_lines += next->num_lines;
 			ent->next = next->next;
@@ -733,6 +734,7 @@  static void split_overlap(struct blame_entry *split,
 	int chunk_end_lno;
 	memset(split, 0, sizeof(struct blame_entry [3]));
 
+	split[0].ignored = split[1].ignored = split[2].ignored = e->ignored;
 	split[0].unblamable = e->unblamable;
 	split[1].unblamable = e->unblamable;
 	split[2].unblamable = e->unblamable;
@@ -857,6 +859,7 @@  static struct blame_entry *split_blame_at(struct blame_entry *e, int len,
 	struct blame_entry *n = xcalloc(1, sizeof(struct blame_entry));
 
 	n->suspect = new_suspect;
+	n->ignored = e->ignored;
 	n->unblamable = e->unblamable;
 	n->lno = e->lno + len;
 	n->s_lno = e->s_lno + len;
@@ -922,6 +925,7 @@  static void ignore_blame_entry(struct blame_entry *e,
 	struct blame_line_tracker *line_blames;
 	int entry_len, nr_lines, i;
 
+	e->ignored = 1;
 	line_blames = xcalloc(sizeof(struct blame_line_tracker),
 			      e->num_lines);
 	guess_line_blames(e, parent, target, offset, parent_slno, parent_len,
diff --git a/blame.h b/blame.h
index 91664913d7c4..53df8b4c5b3f 100644
--- a/blame.h
+++ b/blame.h
@@ -92,6 +92,7 @@  struct blame_entry {
 	 * scanning the lines over and over.
 	 */
 	unsigned score;
+	int ignored;
 	int unblamable;
 };
 
diff --git a/builtin/blame.c b/builtin/blame.c
index b48842d8459b..c10a6a802240 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -53,6 +53,8 @@  static int show_progress;
 static char repeated_meta_color[COLOR_MAXLEN];
 static int coloring_mode;
 static struct string_list ignore_revs_file_list = STRING_LIST_INIT_NODUP;
+static int mask_ignored_unblamables;
+static int mark_ignored_lines;
 
 static struct date_mode blame_date_mode = { DATE_ISO8601 };
 static size_t blame_date_width;
@@ -347,7 +349,7 @@  static void emit_porcelain(struct blame_scoreboard *sb, struct blame_entry *ent,
 	char hex[GIT_MAX_HEXSZ + 1];
 
 	oid_to_hex_r(hex, &suspect->commit->object.oid);
-	if (ent->unblamable)
+	if (mask_ignored_unblamables && ent->unblamable)
 		memset(hex, '0', strlen(hex));
 	printf("%s %d %d %d\n",
 	       hex,
@@ -482,7 +484,11 @@  static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
 			}
 		}
 
-		if (ent->unblamable)
+		if (mark_ignored_lines && ent->ignored) {
+			length--;
+			putchar('*');
+		}
+		if (mask_ignored_unblamables && ent->unblamable)
 			memset(hex, '0', length);
 		printf("%.*s", length, hex);
 		if (opt & OUTPUT_ANNOTATE_COMPAT) {
@@ -710,6 +716,14 @@  static int git_blame_config(const char *var, const char *value, void *cb)
 		string_list_insert(&ignore_revs_file_list, str);
 		return 0;
 	}
+	if (!strcmp(var, "blame.maskignoredunblamables")) {
+		mask_ignored_unblamables = git_config_bool(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "blame.markignoredlines")) {
+		mark_ignored_lines = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "color.blame.repeatedlines")) {
 		if (color_parse_mem(value, strlen(value), repeated_meta_color))
 			warning(_("invalid color '%s' in color.blame.repeatedLines"),
diff --git a/t/t8013-blame-ignore-revs.sh b/t/t8013-blame-ignore-revs.sh
index df4993f98682..cc049a390b0d 100755
--- a/t/t8013-blame-ignore-revs.sh
+++ b/t/t8013-blame-ignore-revs.sh
@@ -53,6 +53,7 @@  test_expect_success ignore_rev_changing_lines '
 # to have nothing in common with "line-one" or "line-two", to keep any
 # heuristics from matching them with any lines in the parent.
 test_expect_success ignore_rev_adding_unblamable_lines '
+	git config --add blame.maskIgnoredUnblamables true &&
 	test_write_lines line-one-change line-two-changed y3 y4 >file &&
 	git add file &&
 	test_tick &&
@@ -123,6 +124,39 @@  test_expect_success bad_files_and_revs '
 	test_i18ngrep "Invalid object name: NOREV" err
 	'
 
+# Commit Z will touch the first two lines.  Y touched all four.
+# 	A--B--X--Y--Z
+# The blame output when ignoring Z should be:
+# ^Y ... 1)
+# ^Y ... 2)
+# Y  ... 3)
+# Y  ... 4)
+# We're checking only the first character
+test_expect_success mark_ignored_lines '
+	git config --add blame.markIgnoredLines true &&
+
+	test_write_lines line-one-Z line-two-Z y3 y4 >file &&
+	git add file &&
+	test_tick &&
+	git commit -m Z &&
+	git tag Z &&
+
+	git blame --ignore-rev Z file >blame_raw &&
+	echo "*" >expect &&
+
+	sed -n "1p" blame_raw | cut -c1 >actual &&
+	test_cmp expect actual &&
+
+	sed -n "2p" blame_raw | cut -c1 >actual &&
+	test_cmp expect actual &&
+
+	sed -n "3p" blame_raw | cut -c1 >actual &&
+	! test_cmp expect actual &&
+
+	sed -n "4p" blame_raw | cut -c1 >actual &&
+	! test_cmp expect actual
+	'
+
 # Resetting the repo and creating:
 #
 # A--B--M