Message ID | 20190117202919.157326-3-brho@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blame: add the ability to ignore commits | expand |
Hi Barret, On Thu, 17 Jan 2019, Barret Rhoden wrote: > [...] > > Users can ignore a revision with --ignore-rev=rev, which may be > repeated. They can specify a file of full SHA-1 hashes of revs (one per > line) with the blame.ignoreRevFile config option. They can also specify > a file with --ignore-rev-file=file, which overrides the config option. This sounds like that is already the case in Git, but the truth is: this patch *introduces* that feature. Maybe start the paragraph with "With this patch, ..."? I cannot speak for the correctness of the changes to blame.c, others on the Cc: list are already much more familiar with that code, so I'll leave it to them to comment on that. However, I am missing a regression test for this behavior, the best idea would be likely to add t/t8013-blame-ignore-revs.sh (copy-edit it from t/t8009-blame-vs-topicbranches.sh, maybe). And there is another thing: > diff --git a/builtin/blame.c b/builtin/blame.c > index 6d798f99392e..2f9183fb5fbd 100644 > --- a/builtin/blame.c > +++ b/builtin/blame.c > @@ -52,6 +52,7 @@ static int no_whole_file_rename; > static int show_progress; > static char repeated_meta_color[COLOR_MAXLEN]; > static int coloring_mode; > +static const char *ignore_revs_file; ... this... > > static struct date_mode blame_date_mode = { DATE_ISO8601 }; > static size_t blame_date_width; > @@ -695,6 +696,8 @@ static int git_blame_config(const char *var, const char *value, void *cb) > parse_date_format(value, &blame_date_mode); > return 0; > } > + if (!strcmp(var, "blame.ignorerevsfile")) > + return git_config_pathname(&ignore_revs_file, var, value); ... this... > if (!strcmp(var, "color.blame.repeatedlines")) { > if (color_parse_mem(value, strlen(value), repeated_meta_color)) > warning(_("invalid color '%s' in color.blame.repeatedLines"), > @@ -806,6 +826,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix) > OPT_BIT('s', NULL, &output_option, N_("Suppress author name and timestamp (Default: off)"), OUTPUT_NO_AUTHOR), > OPT_BIT('e', "show-email", &output_option, N_("Show author email instead of name (Default: off)"), OUTPUT_SHOW_EMAIL), > OPT_BIT('w', NULL, &xdl_opts, N_("Ignore whitespace differences"), XDF_IGNORE_WHITESPACE), > + OPT_STRING_LIST(0, "ignore-rev", &ignore_rev_list, N_("rev"), N_("Ignore <rev> when blaming")), > + OPT_STRING(0, "ignore-revs-file", &ignore_revs_file, N_("file"), N_("Ignore revisions from <file>")), ... and this change limit the user to specifying a single file, for no good reason. Worse: specifying two different files via two `--ignore-revs-file` parameters will only heed the latter and skip the former without any warning. A better idea IMHO would be to use an OPT_STRING_LIST() for `--ignore-revs-file`, too, and to allow for multiple `blame.ignoreRevsFile` config entries (with our usual trick of handling an empty setting by resetting the list of paths that were accumulated so far, see e.g. how `credential.helper` is handled). Ciao, Johannes > OPT_BIT(0, "color-lines", &output_option, N_("color redundant metadata from previous line differently"), OUTPUT_COLOR_LINE), > OPT_BIT(0, "color-by-age", &output_option, N_("color lines by age"), OUTPUT_SHOW_AGE_WITH_COLOR), > > @@ -995,6 +1017,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix) > sb.contents_from = contents_from; > sb.reverse = reverse; > sb.repo = the_repository; > + build_ignorelist(&sb, &ignore_rev_list); > + string_list_clear(&ignore_rev_list, 0); > setup_scoreboard(&sb, path, &o); > lno = sb.num_lines; > > -- > 2.20.1.321.g9e740568ce-goog > >
On 2019-01-18 at 10:47 Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > ... and this change limit the user to specifying a single file, for no > good reason. Worse: specifying two different files via two > `--ignore-revs-file` parameters will only heed the latter and skip the > former without any warning. > > A better idea IMHO would be to use an OPT_STRING_LIST() for > `--ignore-revs-file`, too, and to allow for multiple > `blame.ignoreRevsFile` config entries (with our usual trick of handling an > empty setting by resetting the list of paths that were accumulated so > far, see e.g. how `credential.helper` is handled). I can do this if you all want, though I was modeling it off the behavior of -S, which also only takes a single file. Thanks, everyone, for all of the feedback. Barret
Am 17.01.2019 um 21:29 schrieb Barret Rhoden: > The blame_entry will get passed up the tree until we find a commit that > has a diff chunk that affects those lines. If an ignored commit added > more lines than it removed, the blame will fall on a commit that made a > change nearby. There is no general solution here, just a best-effort > approach. For a trivial example, consider ignoring this commit: > > Z: "Adding Lines" > foo > +No commit > +ever touched > +these lines > bar Wouldn't it make more sense to assign such lines to unknown, perhaps represented by an all-zero commit ID, instead of blaming a semi-random bystander? René
On 2019-01-20 at 19:19 René Scharfe <l.s.r@web.de> wrote: > Am 17.01.2019 um 21:29 schrieb Barret Rhoden: > > The blame_entry will get passed up the tree until we find a commit that > > has a diff chunk that affects those lines. If an ignored commit added > > more lines than it removed, the blame will fall on a commit that made a > > change nearby. There is no general solution here, just a best-effort > > approach. For a trivial example, consider ignoring this commit: > > > > Z: "Adding Lines" > > foo > > +No commit > > +ever touched > > +these lines > > bar > > Wouldn't it make more sense to assign such lines to unknown, perhaps > represented by an all-zero commit ID, instead of blaming a semi-random > bystander? I don't know if we can algorithmicly determine whether or not an assignment was the commit the user wanted. Maybe we could add a set of heuristics, like "any diff hunk that only added", as in my example. Maybe if count_a == 0 in blame_chunk_cb, that would work for that case; I'm not too familiar with that code. Either way, in my next commit I provide the option to mark the commit with a '*', which could serve the same purpose as the all-zero commit ID: flag the commit so the user knows it might not be correct. Barret
René Scharfe <l.s.r@web.de> writes: > Am 17.01.2019 um 21:29 schrieb Barret Rhoden: >> The blame_entry will get passed up the tree until we find a commit that >> has a diff chunk that affects those lines. If an ignored commit added >> more lines than it removed, the blame will fall on a commit that made a >> change nearby. There is no general solution here, just a best-effort >> approach. For a trivial example, consider ignoring this commit: >> >> Z: "Adding Lines" >> foo >> +No commit >> +ever touched >> +these lines >> bar > > Wouldn't it make more sense to assign such lines to unknown, perhaps > represented by an all-zero commit ID, instead of blaming a semi-random > bystander? I share the sentiment. Isn't it, however, showing a bigger problem in the "feature"? Your "a change that adds lines without removing any" is an obvious case where these added lines have no corresponding lines in the preimage, but most of the time it is unclear what line corresponds to what previous line. If a commit being "ignored" brought a change like this: 1 -four -three +3 +4 5 did "+3" come from "-three"? Or did "+4" (read: "added '4'") come from "-three" (read: "removed 'three'")? Did it come from "-four"? Or was it genuinely added by that ignored commit? Your suggestion deals with the case where we decide that "+4" had no corresponding lines in the preimage (and paint it as "no blame can be assigned"). But when we decide that "+4" came from "-four" or "-three", we continue drilling down from that ignored commit and assign the blame to either the commit that introduced "four" or the commit that introduced "three", which would obviously give us different result. Worse yet, if a reader expected us to consider "+4" came from "-four" at that ignored commit, but the algorithm decided that "+4" corresponded to "-three", when we show the commit that eventually gets blamed for that line that has "4" has no "four" (it has "three"), which I suspect would confuse the reader of the output. So... I dunno.
On 2019-01-22 at 10:14 Junio C Hamano <gitster@pobox.com> wrote: > René Scharfe <l.s.r@web.de> writes: > > > Am 17.01.2019 um 21:29 schrieb Barret Rhoden: > >> The blame_entry will get passed up the tree until we find a commit that > >> has a diff chunk that affects those lines. If an ignored commit added > >> more lines than it removed, the blame will fall on a commit that made a > >> change nearby. There is no general solution here, just a best-effort > >> approach. For a trivial example, consider ignoring this commit: > >> > >> Z: "Adding Lines" > >> foo > >> +No commit > >> +ever touched > >> +these lines > >> bar > > > > Wouldn't it make more sense to assign such lines to unknown, perhaps > > represented by an all-zero commit ID, instead of blaming a semi-random > > bystander? > > I share the sentiment. > > Isn't it, however, showing a bigger problem in the "feature"? > > Your "a change that adds lines without removing any" is an obvious > case where these added lines have no corresponding lines in the > preimage, but most of the time it is unclear what line corresponds > to what previous line. If a commit being "ignored" brought a change > like this: > > 1 > -four > -three > +3 > +4 > 5 > > did "+3" come from "-three"? > > Or did "+4" (read: "added '4'") come from "-three" (read: "removed > 'three'")? Did it come from "-four"? Or was it genuinely added by > that ignored commit? Your suggestion deals with the case where we > decide that "+4" had no corresponding lines in the preimage (and > paint it as "no blame can be assigned"). But when we decide that > "+4" came from "-four" or "-three", we continue drilling down from > that ignored commit and assign the blame to either the commit that > introduced "four" or the commit that introduced "three", which would > obviously give us different result. Worse yet, if a reader expected > us to consider "+4" came from "-four" at that ignored commit, but > the algorithm decided that "+4" corresponded to "-three", when we > show the commit that eventually gets blamed for that line that has > "4" has no "four" (it has "three"), which I suspect would confuse > the reader of the output. > > So... I dunno. I guess if you swap the lines as well as change them, then we're not going to be able to detect that. Just to be clear, if you did this: Commit A: --------- other_stuff +one other_stuff Commit B: --------- other_stuff one +two other_stuff Commit C: --------- other_stuff -one -two +1 +2 other_stuff And ignore commit C, my change will properly identify commit A and B, e.g.: OTHER_HASH Author Date 11) other_stuff *A_HASH Author Date 12) 1 *B_HASH Author Date 13) 2 OTHER_HASH Author Date 14) other_stuff But if you swapped the lines in addition to change names to numbers: Commit C-swap: -------------- other_stuff -one -two +2 +1 other_stuff Then it won't have the semantic knowledge that "one" == "1". If a user is ignoring a commit, we don't have an oracle that knows exactly what that commit did to determine what commit the user wants blamed. The current change attempts to find the last commit that touched a line. Barret
Barret Rhoden <brho@google.com> writes: >> So... I dunno. > > I guess if you swap the lines as well as change them,... > ... Then it won't have the semantic knowledge that "one" == "1". If a user > is ignoring a commit, we don't have an oracle that knows exactly what > that commit did to determine what commit the user wants blamed. Yeah, and if the original had two adjacent lines, and replacement has three adjacent lines, the algorithm would not even know if - the first line in the original was split into first two in the update and the second line was modified in place; or - the first line in the original was modified in place and the second line was split into the latter two lines in the update In short, there is no answer to "what is the corresponding line of this line before this commit changed it?" in general, and any algorithm, as long as it tries to see what was the "corresponding line" of the line that is blamed to a commit, would not produce results human readers would expect all the time. As you said, heuristics may get us far enough to be useful, though ;-).
On 2019-01-23 at 11:26 Junio C Hamano <gitster@pobox.com> wrote: > Yeah, and if the original had two adjacent lines, and replacement > has three adjacent lines, the algorithm would not even know if > > - the first line in the original was split into first two in the > update and the second line was modified in place; or > > - the first line in the original was modified in place and the > second line was split into the latter two lines in the update > > In short, there is no answer to "what is the corresponding line of > this line before this commit changed it?" in general, and any > algorithm, as long as it tries to see what was the "corresponding > line" of the line that is blamed to a commit, would not produce > results human readers would expect all the time. > > As you said, heuristics may get us far enough to be useful, though > ;-). Yeah. We can do one more thing: when we ignore a change that added more lines than it removed, we can at least not report totally unrelated commits. For example, the parent of an ignored commit has this, say at line 11: commit-a 11) void new_func_1(void *x, void *y); commit-b 12) void new_func_2(void *x, void *y); commit-c 13) some_line_c commit-d 14) some_line_d After a commit 'X', we have: commit-X 11) void new_func_1(void *x, commit-X 12) void *y); commit-X 13) void new_func_2(void *x, commit-X 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d In my existing code, if you ignore commit X, the blames look like this: commit-a 11) void new_func_1(void *x, commit-b 12) void *y); commit-c 13) void new_func_2(void *x, commit-d 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d Lines 13 and 14 are blamed on the nearby commits C and D. The reason is the blame entry for X is four lines long, rooted at line 11, but when we look back through the history, we'll look at the parent's image of the file where that diff hunk is only two lines long. The extra two lines just blame whatever follows the diff hunk in the parent's image. In this case, it is just the lines C and D repeated again. I can detect this situation when we ignore the diffs from commit X. If X added more lines than it removed, then I only pass the number of lines to the parent that the parent had. The rest get their own blame_entry, marked 'unblamable', which I'll catch when we create the output. The parent can't find blame for lines that don't exist in its image of the file. With that change, the above example blames like this: commit-a 11) void new_func_1(void *x, commit-b 12) void *y); 00000000 13) void new_func_2(void *x, 00000000 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d As we discussed, we still can never be certain about which commits *should* be blamed for which lines. (Note that line 12 was blamed on B, though B was the commit for new_func_2(), not new_func_1()). But I can avoid blaming commits that just happen to be in the area and would only be 'correct' due to dumb luck. This also handles cases where an ignored commit only adds lines, which is a specific case of "added more than removed." Handling this case is a surprisingly small change. I'll post it once I sort out the tests and other comments on this patch series. For now, I went with unconditionally marking the commit as all 0s for the case where we know it is 'wrong.' I can do that conditionally based on blame.markIgnoredLines if you want, though I think any commit attributed to those lines would be misleading. Thanks, Barret
diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt index dc41957afab2..424a63f0b45c 100644 --- a/Documentation/blame-options.txt +++ b/Documentation/blame-options.txt @@ -110,5 +110,18 @@ commit. And the default value is 40. If there are more than one `-C` options given, the <num> argument of the last `-C` will take effect. +--ignore-rev <rev>:: + Ignore changes made by the revision when assigning blame, as if the + change never happened. Lines that were changed or added by an ignored + commit will be blamed on the previous commit that changed that line or + nearby lines. This option may be specified multiple times to ignore + more than one revision. + +--ignore-revs-file <file>:: + Ignore revisions listed in `file`, one full SHA-1 hash per line. + Whitespace and comments beginning with `#` are ignored. This overrides + the `blame.ignoreRevsFile` config option, which specifies a default + file. + -h:: Show help message. diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt index 16323eb80e31..7e8154199635 100644 --- a/Documentation/git-blame.txt +++ b/Documentation/git-blame.txt @@ -10,6 +10,7 @@ SYNOPSIS [verse] 'git blame' [-c] [-b] [-l] [--root] [-t] [-f] [-n] [-s] [-e] [-p] [-w] [--incremental] [-L <range>] [-S <revs-file>] [-M] [-C] [-C] [-C] [--since=<date>] + [--ignore-rev <rev>] [--ignore-revs-file <file>] [--progress] [--abbrev=<n>] [<rev> | --contents <file> | --reverse <rev>..<rev>] [--] <file> diff --git a/blame.c b/blame.c index d84c93778080..0b91fba2d04c 100644 --- a/blame.c +++ b/blame.c @@ -845,10 +845,10 @@ static struct blame_entry *reverse_blame(struct blame_entry *head, */ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq, int tlno, int offset, int same, - struct blame_origin *parent) + struct blame_origin *parent, int ignore_diffs) { struct blame_entry *e = **srcq; - struct blame_entry *samep = NULL, *diffp = NULL; + struct blame_entry *samep = NULL, *diffp = NULL, *ignoredp = NULL; while (e && e->s_lno < tlno) { struct blame_entry *next = e->next; @@ -925,10 +925,23 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq, n->next = samep; samep = n; } - e->next = diffp; - diffp = e; + if (ignore_diffs) { + /* These go to the parent, like the ones before tlno. */ + blame_origin_decref(e->suspect); + e->suspect = blame_origin_incref(parent); + e->s_lno += offset; + e->next = ignoredp; + ignoredp = e; + } else { + e->next = diffp; + diffp = e; + } e = next; } + if (ignoredp) { + **dstq = reverse_blame(ignoredp, **dstq); + *dstq = &ignoredp->next; + } **srcq = reverse_blame(diffp, reverse_blame(samep, e)); /* Move across elements that are in the unblamable portion */ if (diffp) @@ -938,6 +951,7 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq, struct blame_chunk_cb_data { struct blame_origin *parent; long offset; + int ignore_diffs; struct blame_entry **dstq; struct blame_entry **srcq; }; @@ -950,7 +964,7 @@ static int blame_chunk_cb(long start_a, long count_a, if (start_a - start_b != d->offset) die("internal error in blame::blame_chunk_cb"); blame_chunk(&d->dstq, &d->srcq, start_b, start_a - start_b, - start_b + count_b, d->parent); + start_b + count_b, d->parent, d->ignore_diffs); d->offset = start_a + count_a - (start_b + count_b); return 0; } @@ -962,7 +976,7 @@ static int blame_chunk_cb(long start_a, long count_a, */ static void pass_blame_to_parent(struct blame_scoreboard *sb, struct blame_origin *target, - struct blame_origin *parent) + struct blame_origin *parent, int ignore_diffs) { mmfile_t file_p, file_o; struct blame_chunk_cb_data d; @@ -973,6 +987,7 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb, d.parent = parent; d.offset = 0; + d.ignore_diffs = ignore_diffs; d.dstq = &newdest; d.srcq = &target->suspects; fill_origin_blob(&sb->revs->diffopt, parent, &file_p, &sb->num_read_blob); @@ -984,7 +999,7 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb, oid_to_hex(&parent->commit->object.oid), oid_to_hex(&target->commit->object.oid)); /* The rest are the same as the parent */ - blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, parent); + blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, parent, 0); *d.dstq = NULL; queue_blames(sb, parent, newdest); @@ -1489,11 +1504,28 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, blame_origin_incref(porigin); origin->previous = porigin; } - pass_blame_to_parent(sb, origin, porigin); + pass_blame_to_parent(sb, origin, porigin, 0); if (!origin->suspects) goto finish; } + /* + * Pass remaining suspects for ignored commits to their parents. + */ + if (oidset_contains(&sb->ignore_list, &commit->object.oid)) { + for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse); + i < num_sg && sg; + sg = sg->next, i++) { + struct blame_origin *porigin = sg_origin[i]; + + if (!porigin) + continue; + pass_blame_to_parent(sb, origin, porigin, 1); + if (!origin->suspects) + goto finish; + } + } + /* * Optionally find moves in parents' files. */ diff --git a/blame.h b/blame.h index be3a895043e0..086b92915e4b 100644 --- a/blame.h +++ b/blame.h @@ -117,6 +117,8 @@ struct blame_scoreboard { /* linked list of blames */ struct blame_entry *ent; + struct oidset ignore_list; + /* look-up a line in the final buffer */ int num_lines; int *lineno; diff --git a/builtin/blame.c b/builtin/blame.c index 6d798f99392e..2f9183fb5fbd 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -52,6 +52,7 @@ static int no_whole_file_rename; static int show_progress; static char repeated_meta_color[COLOR_MAXLEN]; static int coloring_mode; +static const char *ignore_revs_file; static struct date_mode blame_date_mode = { DATE_ISO8601 }; static size_t blame_date_width; @@ -695,6 +696,8 @@ static int git_blame_config(const char *var, const char *value, void *cb) parse_date_format(value, &blame_date_mode); return 0; } + if (!strcmp(var, "blame.ignorerevsfile")) + return git_config_pathname(&ignore_revs_file, var, value); if (!strcmp(var, "color.blame.repeatedlines")) { if (color_parse_mem(value, strlen(value), repeated_meta_color)) warning(_("invalid color '%s' in color.blame.repeatedLines"), @@ -774,6 +777,22 @@ static int is_a_rev(const char *name) return OBJ_NONE < oid_object_info(the_repository, &oid, NULL); } +static void build_ignorelist(struct blame_scoreboard *sb, + struct string_list *ignore_rev_list) +{ + struct string_list_item *i; + struct object_id oid; + + oidset_init(&sb->ignore_list, 0); + if (ignore_revs_file) + oidset_parse_file(&sb->ignore_list, ignore_revs_file); + for_each_string_list_item(i, ignore_rev_list) { + if (get_oid_committish(i->string, &oid)) + die(_("Can't find revision '%s' to ignore"), i->string); + oidset_insert(&sb->ignore_list, &oid); + } +} + int cmd_blame(int argc, const char **argv, const char *prefix) { struct rev_info revs; @@ -785,6 +804,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) struct progress_info pi = { NULL, 0 }; struct string_list range_list = STRING_LIST_INIT_NODUP; + struct string_list ignore_rev_list = STRING_LIST_INIT_NODUP; int output_option = 0, opt = 0; int show_stats = 0; const char *revs_file = NULL; @@ -806,6 +826,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix) OPT_BIT('s', NULL, &output_option, N_("Suppress author name and timestamp (Default: off)"), OUTPUT_NO_AUTHOR), OPT_BIT('e', "show-email", &output_option, N_("Show author email instead of name (Default: off)"), OUTPUT_SHOW_EMAIL), OPT_BIT('w', NULL, &xdl_opts, N_("Ignore whitespace differences"), XDF_IGNORE_WHITESPACE), + OPT_STRING_LIST(0, "ignore-rev", &ignore_rev_list, N_("rev"), N_("Ignore <rev> when blaming")), + OPT_STRING(0, "ignore-revs-file", &ignore_revs_file, N_("file"), N_("Ignore revisions from <file>")), OPT_BIT(0, "color-lines", &output_option, N_("color redundant metadata from previous line differently"), OUTPUT_COLOR_LINE), OPT_BIT(0, "color-by-age", &output_option, N_("color lines by age"), OUTPUT_SHOW_AGE_WITH_COLOR), @@ -995,6 +1017,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix) sb.contents_from = contents_from; sb.reverse = reverse; sb.repo = the_repository; + build_ignorelist(&sb, &ignore_rev_list); + string_list_clear(&ignore_rev_list, 0); setup_scoreboard(&sb, path, &o); lno = sb.num_lines;
Commits that make formatting changes or function renames are often not interesting when blaming a file. A user may deem such a commit as 'not interesting' and want to ignore and its changes it when assigning blame. For example, say a file has the following git history / rev-list: ---O---A---X---B---C---D---Y---E---F Commits X and Y both touch a particular line, and the other commits do not: X: "Take a third parameter" -MyFunc(1, 2); +MyFunc(1, 2, 3); Y: "Remove camelcase" -MyFunc(1, 2, 3); +my_func(1, 2, 3); git-blame will blame Y for the change. I'd like to be able to ignore Y: both the existence of the commit as well as any changes it made. This differs from -S rev-list, which specifies the list of commits to process for the blame. We still process Y, we just don't let the blame 'stick.' Users can ignore a revision with --ignore-rev=rev, which may be repeated. They can specify a file of full SHA-1 hashes of revs (one per line) with the blame.ignoreRevFile config option. They can also specify a file with --ignore-rev-file=file, which overrides the config option. For a typical use case, projects will maintain the file containing revisions for commits that perform mass reformatting, and their users have the optional to ignore all of the commits in that file. Additionally, a user can use the --ignore-rev option for one-off investigation. To go back to the example above, X was a substantive change to the function, but not the change the user is interested in. The user inspected X, but wanted to find the previous change to that line - perhaps a commit that introduced that function call. To make this work, we can't simply remove all ignored commits from the rev-list. We need to diff the changes introduced by Y so that we can ignore them. We let the blames get passed to Y, just like when processing normally. When Y is the target, we make sure that Y does not *keep* any blames. Any changes that Y is responsible for get passed to its parent. Note we make one pass through all of the scapegoats (parents) to attempt to pass blame normally; we don't know if we *need* to ignore the commit until we've checked all of the parents. The blame_entry will get passed up the tree until we find a commit that has a diff chunk that affects those lines. If an ignored commit added more lines than it removed, the blame will fall on a commit that made a change nearby. There is no general solution here, just a best-effort approach. For a trivial example, consider ignoring this commit: Z: "Adding Lines" foo +No commit +ever touched +these lines bar Signed-off-by: Barret Rhoden <brho@google.com> --- Documentation/blame-options.txt | 13 +++++++++ Documentation/git-blame.txt | 1 + blame.c | 48 +++++++++++++++++++++++++++------ blame.h | 2 ++ builtin/blame.c | 24 +++++++++++++++++ 5 files changed, 80 insertions(+), 8 deletions(-)