Message ID | ed71913886e19ccc276b382de707b4bab7834d12.1630376800.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add a new --remerge-diff capability to show & log | expand |
On Tue, Aug 31, 2021 at 02:26:35AM +0000, Elijah Newren via GitGitGadget wrote: > There are several considerations here: > * We have to pick file(s) where we write these conflict messages too > * We want to make it as clear as possible that it's not a real file > but a set of messages about another file > * We want conflict messages about a file to appear near the file in > question in a diff, preferably immediately preceding the file in > question > * Related to the above, per-file conflict messages are preferred > over lumping all conflict messages into one big file > > To achive the above: > * We put the conflict messages for $filename in > $filename[0:-1] + " " + $filename[-1] + ".conflict_msg" > or, in words, we insert a space before the final character of > the filename and then also add ".conflict_msg" at the end. It took me a minute to understand the space thing. I thought at first it was about avoiding conflicts with existing names (and while it might help in practice, it's not a guarantee). But I think it's about the "appear preceding the file" goal. The space sorts before any other printable character in the final position. That's...simultaneously clever and gross. My biggest complaint is that the space looks like a bug in the output. Using another character like "." might not be too bad, as it's also fairly early in the ascii table. But it's really this "do it before the last character" thing that is key to getting the ordering right. Just brainstorming some alternatives: - we have diff.orderFile, etc. Could we stuff this data into a less confusing name (even just "$filename.conflict_msg"), and then provide a custom ordering to the diff code? I think it could be done by generating a static ordering ahead of time, but it might even just be possible to tell diffcore_order() to take the ".conflict_msg" extension into account in its comparison function. - there can be other non-diff data between the individual segments. For example, "patch" will skip over non-diff lines. And certainly in Git we have our own custom headers. I'm wondering if we could attach these annotations to the diff-pair somehow, and then show something like: diff --git a/foo.c b/foo.c index 1234abcd..5678cdef 100644 conflict modify/delete foo.c --- a/foo.c +++ b/foo.c @@ some actual diff starts here @@ Obviously such a thing can't really be applied. But then you wouldn't want to apply the addition of "my.file e.conflict_msg" either. I dunno. The latter especially is definitely more work, and requires a bit more cooperation between the merge and diff code. In particular, you can't just feed a straight tree to the diff anymore. We have to hold back the annotations, and then apply them to the resulting diff. But I think the output is much more pleasing to the eye. -Peff
On Tue, Sep 28, 2021 at 3:29 PM Jeff King <peff@peff.net> wrote: > > On Tue, Aug 31, 2021 at 02:26:35AM +0000, Elijah Newren via GitGitGadget wrote: > > > There are several considerations here: > > * We have to pick file(s) where we write these conflict messages too > > * We want to make it as clear as possible that it's not a real file > > but a set of messages about another file > > * We want conflict messages about a file to appear near the file in > > question in a diff, preferably immediately preceding the file in > > question > > * Related to the above, per-file conflict messages are preferred > > over lumping all conflict messages into one big file > > > > To achive the above: > > * We put the conflict messages for $filename in > > $filename[0:-1] + " " + $filename[-1] + ".conflict_msg" > > or, in words, we insert a space before the final character of > > the filename and then also add ".conflict_msg" at the end. > > It took me a minute to understand the space thing. I thought at first it > was about avoiding conflicts with existing names (and while it might > help in practice, it's not a guarantee). But I think it's about the > "appear preceding the file" goal. The space sorts before any other > printable character in the final position. Yeah, it's all about the ordering. I guess it helps slightly with conflict avoidance, but I cannot rely on it; I have to check for colliding files and potentially tweak the filename further. > That's...simultaneously clever and gross. My biggest complaint is that > the space looks like a bug in the output. Junio had basically the same reaction[*]. :-) [*] https://lore.kernel.org/git/xmqqk0k0qkmv.fsf@gitster.g/ > Using another character like "." might not be too bad, as it's also > fairly early in the ascii table. But it's really this "do it before the > last character" thing that is key to getting the ordering right. > > Just brainstorming some alternatives: > > - we have diff.orderFile, etc. Could we stuff this data into a less > confusing name (even just "$filename.conflict_msg"), and then provide > a custom ordering to the diff code? I think it could be done by > generating a static ordering ahead of time, but it might even just be > possible to tell diffcore_order() to take the ".conflict_msg" > extension into account in its comparison function. I can't just go on the ".conflict_msg" extension. As you noted above, this scheme is not sufficient for avoiding collisions. So I need to append extra "cruft" to the name in the case of collisions -- meaning we can't special case on just that extension. I also don't like how diff.orderFile provides a global ordering of the files listed, rather than providing some scheme for relative orderings. That'd either force me to precompute the diff to determine all the files that were different so I can list _all_ of them, or put up with the fact that the files with non-content conflicts will be listed very first in the output, even if their name is 'zee-last-file.c' -- surprising users at the output ordering. This also means that if the user had their own ordering defined, then I'm overriding it and messing up their ordering, which might be problematic. So, I'm not so sure about this solution; it feels like it introduces bigger holes than the ugly space character it is fixing. > - there can be other non-diff data between the individual segments. For > example, "patch" will skip over non-diff lines. And certainly in Git > we have our own custom headers. I'm wondering if we could attach > these annotations to the diff-pair somehow, and then show something > like: > > diff --git a/foo.c b/foo.c > index 1234abcd..5678cdef 100644 > conflict modify/delete foo.c A couple things here... First, I'm not so sure I like the abbreviation here. Just knowing "modify/delete" might be enough in some cases, but I'd rather have the full messages that would have been printed to the console, e.g.: CONFLICT (modify/delete): foo.c deleted in HASH1 (SHORT SUMMARY1) and modified in HASH2 (SHORT SUMMARY 2). Version HASH2 (SHORT SUMMARY2) of foo.c left in tree. because I think the commit references are useful context. That extra context might be of little use for many modify/delete conflicts, but is much more important for conflicts involving renames; e.g. "rename/rename" is much less useful than being able to know the original name of the file and with which parent commit each filename is associated. So, that raises the question: could we pack all that information from the full conflict notice into these conflict header(s)? (And do we have to special case the code to print it all on one line when doing the remerge-diff since the diff output needs them to be one-line headers?) Second, what about when there are multiple non-content conflict types for a single file, e.g. rename/delete + rename/add + modify/delete + mode conflict + unmergeable binary? (Yes, I think it's possible for one path to have all five of those: (1) source file deleted on one side, renamed on the other, (2) rename target on one side matches new file added on other side of history, (3) renamed file also had its contents modified, thus modify vs. delete, (4) added file on other side of history had a different mode, (5) added file on other side of history is a binary.) Do we just use multiple conflict headers? Third, what about the cases where there is no diff, just conflict headers? (I suspect many modify/delete or rename/delete or binary files may end up in such a situation.) I don't think any of those are deal breakers, but it means more work, and maybe also other forms of ugliness. > --- a/foo.c > +++ b/foo.c > @@ some actual diff starts here @@ > > Obviously such a thing can't really be applied. But then you wouldn't > want to apply the addition of "my.file e.conflict_msg" either. Nit: "my.fil e.conflict_msg", not "my.file e.conflict_msg" (the 'e' in 'file' is not repeated, otherwise the auxiliary file wouldn't sort before its companion file) > I dunno. The latter especially is definitely more work, and requires a > bit more cooperation between the merge and diff code. In particular, you > can't just feed a straight tree to the diff anymore. We have to hold > back the annotations, and then apply them to the resulting diff. But I > think the output is much more pleasing to the eye. It's certainly an interesting idea. It's a lot more work, it involves the inability to feed a straight tree to a diff would require piping things through several different layers (merge -> log -> diff, and possible multiple diff layers), it may mean we need special handling for when there are only conflict headers for a file with no file differences, the length of the conflict headers could be comically long, and it's all essentially for what is a rather uncommon case anyway. But, on the plus side, it does avoid the rather ugly space. I'll have to think about it.
Elijah Newren <newren@gmail.com> writes: > I also don't like how diff.orderFile provides a global ordering of the > files listed, rather than providing some scheme for relative > orderings. That'd either force me to precompute the diff to determine > all the files that were different so I can list _all_ of them,... Don't we determine all the files that would be listed in the diff anyway? The diffcore pipeline collects all the different filepairs, matches them up for rename/copy detection, and finally do the output formatting for each individual filepair. > So, I'm not so sure about this solution; it feels like it introduces > bigger holes than the ugly space character it is fixing. > >> - there can be other non-diff data between the individual segments. For >> example, "patch" will skip over non-diff lines. And certainly in Git >> we have our own custom headers. I'm wondering if we could attach >> these annotations to the diff-pair somehow, and then show something >> like: >> >> diff --git a/foo.c b/foo.c >> index 1234abcd..5678cdef 100644 >> conflict modify/delete foo.c > > A couple things here... > > First, I'm not so sure I like the abbreviation here. Just knowing > "modify/delete" might be enough in some cases, but I'd rather have the > full messages that would have been printed to the console... > > Second, what about when there are multiple ... > > Third, what about the cases where there is no diff, ... None of the above seems like a problem to me at least from the presentation and consumption sides. There is no rule that extended diff headers has to fit on a 72-char line, cannot use line folding by inserting LF-SP in the middle of a logical line, and there already is at least one case we give an extended diff header without a single line of content change (namely, "chmod +x README"). Thanks.
On Wed, Sep 29, 2021 at 9:14 AM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > I also don't like how diff.orderFile provides a global ordering of the > > files listed, rather than providing some scheme for relative > > orderings. That'd either force me to precompute the diff to determine > > all the files that were different so I can list _all_ of them,... > > Don't we determine all the files that would be listed in the diff > anyway? The diffcore pipeline collects all the different filepairs, > matches them up for rename/copy detection, and finally do the output > formatting for each individual filepair. merge-ort is missing one side of the diff. It only has the parents of the merge commit, and their merge base. So, merge-ort would now need to know about an additional commit (the original merge), and compute the diff between that and the merge-result it is creating, in advance of an external caller (log) calling diff to compute the result between those two trees. > > So, I'm not so sure about this solution; it feels like it introduces > > bigger holes than the ugly space character it is fixing. > > > >> - there can be other non-diff data between the individual segments. For > >> example, "patch" will skip over non-diff lines. And certainly in Git > >> we have our own custom headers. I'm wondering if we could attach > >> these annotations to the diff-pair somehow, and then show something > >> like: > >> > >> diff --git a/foo.c b/foo.c > >> index 1234abcd..5678cdef 100644 > >> conflict modify/delete foo.c > > > > A couple things here... > > > > First, I'm not so sure I like the abbreviation here. Just knowing > > "modify/delete" might be enough in some cases, but I'd rather have the > > full messages that would have been printed to the console... > > > > Second, what about when there are multiple ... > > > > Third, what about the cases where there is no diff, ... > > None of the above seems like a problem to me at least from the > presentation and consumption sides. There is no rule that extended > diff headers has to fit on a 72-char line, cannot use line folding > by inserting LF-SP in the middle of a logical line, and there > already is at least one case we give an extended diff header without > a single line of content change (namely, "chmod +x README"). Cool, that's good to hear. I'll look into it.
On Tue, Sep 28, 2021 at 11:25:20PM -0700, Elijah Newren wrote: > > Just brainstorming some alternatives: > > > > - we have diff.orderFile, etc. Could we stuff this data into a less > > confusing name (even just "$filename.conflict_msg"), and then provide > > a custom ordering to the diff code? I think it could be done by > > generating a static ordering ahead of time, but it might even just be > > possible to tell diffcore_order() to take the ".conflict_msg" > > extension into account in its comparison function. > > I can't just go on the ".conflict_msg" extension. As you noted above, > this scheme is not sufficient for avoiding collisions. So I need to > append extra "cruft" to the name in the case of collisions -- meaning > we can't special case on just that extension. Sure, but we can call it filename.conflict_msg.1, etc, and the sort code can pattern-match. It can never be fully robust (if you really did have a foo.conflict_msg, we'd sort it differently), but I think it's OK if the worst-case is that pathological trees get ordered slightly sub-optimally). That said, I think it could also make sense to annotate the conflict files by giving them an unusual set of mode bits. That would be easier to recognize (and while real trees _could_ have silly modes, we do complain about them in fsck, so it shouldn't happen in practice). > I also don't like how diff.orderFile provides a global ordering of the > files listed, rather than providing some scheme for relative > orderings. That'd either force me to precompute the diff to determine > all the files that were different so I can list _all_ of them, or put > up with the fact that the files with non-content conflicts will be > listed very first in the output, even if their name is > 'zee-last-file.c' -- surprising users at the output ordering. > > This also means that if the user had their own ordering defined, then > I'm overriding it and messing up their ordering, which might be > problematic. Agreed. I do think it may be too horrible unless you teach diffcore_order() to actually understand your annotations in code. But that wouldn't be too hard if it's done in the mode bits. But I do think anything that avoids these pseudo-files is going to be a lot cleaner overall. > > - there can be other non-diff data between the individual segments. For > > example, "patch" will skip over non-diff lines. And certainly in Git > > we have our own custom headers. I'm wondering if we could attach > > these annotations to the diff-pair somehow, and then show something > > like: > > > > diff --git a/foo.c b/foo.c > > index 1234abcd..5678cdef 100644 > > conflict modify/delete foo.c > > A couple things here... I think Junio already indicated that we can pretty much make this look like whatever we want. As soon as any reader sees "conflict", it should happily ignore the rest as something it doesn't know about. And my short example here was just meant to be illustrative. I agree it probably needs more details (and the whole CONFLICT line that usually goes to stderr is probably the best thing). -Peff
On Thu, Sep 30 2021, Jeff King wrote: > [...] > That said, I think it could also make sense to annotate the conflict > files by giving them an unusual set of mode bits. That would be easier > to recognize (and while real trees _could_ have silly modes, we do > complain about them in fsck, so it shouldn't happen in practice). We don't complain about them in fsck. I've been meaning to get to fixing it, but my fixes were contingent on the rather big "tree walk mode bits" series: https://lore.kernel.org/git/87wnu0r8tq.fsf@evledraar.gmail.com/
On Thu, Sep 30, 2021 at 12:58 AM Jeff King <peff@peff.net> wrote: > > On Tue, Sep 28, 2021 at 11:25:20PM -0700, Elijah Newren wrote: > > > > Just brainstorming some alternatives: > > > > > > - we have diff.orderFile, etc. Could we stuff this data into a less > > > confusing name (even just "$filename.conflict_msg"), and then provide > > > a custom ordering to the diff code? I think it could be done by > > > generating a static ordering ahead of time, but it might even just be > > > possible to tell diffcore_order() to take the ".conflict_msg" > > > extension into account in its comparison function. > > > > I can't just go on the ".conflict_msg" extension. As you noted above, > > this scheme is not sufficient for avoiding collisions. So I need to > > append extra "cruft" to the name in the case of collisions -- meaning > > we can't special case on just that extension. > > Sure, but we can call it filename.conflict_msg.1, etc, and the sort code > can pattern-match. It can never be fully robust (if you really did have > a foo.conflict_msg, we'd sort it differently), but I think it's OK if > the worst-case is that pathological trees get ordered slightly > sub-optimally). > > That said, I think it could also make sense to annotate the conflict > files by giving them an unusual set of mode bits. That would be easier > to recognize (and while real trees _could_ have silly modes, we do > complain about them in fsck, so it shouldn't happen in practice). I tried giving things unusual mode bits in early versions of merge-ort for other reasons. It doesn't work: canon_mode() will "fix" the unusualness so there's no way for the reader to know that they had unusual bits. But, as you said later, avoiding these pseudo-files is going to be cleaner anyway, so let's just do that. > > I also don't like how diff.orderFile provides a global ordering of the > > files listed, rather than providing some scheme for relative > > orderings. That'd either force me to precompute the diff to determine > > all the files that were different so I can list _all_ of them, or put > > up with the fact that the files with non-content conflicts will be > > listed very first in the output, even if their name is > > 'zee-last-file.c' -- surprising users at the output ordering. > > > > This also means that if the user had their own ordering defined, then > > I'm overriding it and messing up their ordering, which might be > > problematic. > > Agreed. I do think it may be too horrible unless you teach > diffcore_order() to actually understand your annotations in code. But > that wouldn't be too hard if it's done in the mode bits. > > But I do think anything that avoids these pseudo-files is going to be a > lot cleaner overall. > > > > - there can be other non-diff data between the individual segments. For > > > example, "patch" will skip over non-diff lines. And certainly in Git > > > we have our own custom headers. I'm wondering if we could attach > > > these annotations to the diff-pair somehow, and then show something > > > like: > > > > > > diff --git a/foo.c b/foo.c > > > index 1234abcd..5678cdef 100644 > > > conflict modify/delete foo.c > > > > A couple things here... > > I think Junio already indicated that we can pretty much make this look > like whatever we want. As soon as any reader sees "conflict", it should > happily ignore the rest as something it doesn't know about. And my short > example here was just meant to be illustrative. I agree it probably > needs more details (and the whole CONFLICT line that usually goes to > stderr is probably the best thing). > > -Peff
On Thu, Sep 30, 2021 at 07:07:21PM -0700, Elijah Newren wrote: > > That said, I think it could also make sense to annotate the conflict > > files by giving them an unusual set of mode bits. That would be easier > > to recognize (and while real trees _could_ have silly modes, we do > > complain about them in fsck, so it shouldn't happen in practice). > > I tried giving things unusual mode bits in early versions of merge-ort > for other reasons. It doesn't work: canon_mode() will "fix" the > unusualness so there's no way for the reader to know that they had > unusual bits. Ah, right, I totally forgot about that. > But, as you said later, avoiding these pseudo-files is going to be > cleaner anyway, so let's just do that. I'm quite happy if you're on board with that direction. :) -Peff
diff --git a/merge-ort.c b/merge-ort.c index 4dbf0a477af..a9e69d9cbb0 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -632,7 +632,11 @@ static void path_msg(struct merge_options *opt, const char *fmt, ...) { va_list ap; - struct strbuf *sb = strmap_get(&opt->priv->output, path); + struct strbuf *sb; + + if (opt->record_conflict_msgs_in_tree && omittable_hint) + return; /* Do not record mere hints in tree */ + sb = strmap_get(&opt->priv->output, path); if (!sb) { sb = xmalloc(sizeof(*sb)); strbuf_init(sb, 0); @@ -3531,6 +3535,74 @@ static void write_completed_directory(struct merge_options *opt, info->last_directory_len = strlen(info->last_directory); } +static void put_path_msgs_in_file(struct merge_options *opt, + const char *path, + struct merged_info *mi, + struct directory_versions *dir_metadata) +{ + struct strbuf tmp = STRBUF_INIT, new_path_contents = STRBUF_INIT; + char *new_path; + int new_path_basic_len, unique_counter; + struct merged_info *new_mi; + char final; + struct strbuf *sb = strmap_get(&opt->priv->output, path); + + assert(opt->record_conflict_msgs_in_tree); + if (!sb) + return; + + /* + * Determine a pathname for recording conflict messages. We'd like it + * to sort just before path, but have a name very similar to what path + * has. + */ + strbuf_addstr(&tmp, path); + final = tmp.buf[tmp.len-1]; + strbuf_setlen(&tmp, tmp.len-1); + strbuf_addf(&tmp, " %c.conflict_msg", final); + + /* + * In extremely unlikely event this filename is not unique, modify it + * with ".<integer>" suffixes until it is. + */ + new_path_basic_len = tmp.len; + unique_counter = 0; + while (strmap_contains(&opt->priv->paths, tmp.buf)) { + strbuf_setlen(&tmp, new_path_basic_len); + strbuf_addf(&tmp, ".%d", ++unique_counter); + } + + /* Now that we have a unique string, move it to our pool */ + new_path = mem_pool_strdup(&opt->priv->pool, tmp.buf); + strbuf_release(&tmp); + + /* Set up contents we want to place in the file. */ + strbuf_addf(&new_path_contents, "== Conflict notices for %s ==\n", + path); + strbuf_addbuf(&new_path_contents, sb); + + /* Set up new_mi */ + new_mi = mem_pool_calloc(&opt->priv->pool, 1, sizeof(*new_mi)); + new_mi->result.mode = 0100644; + new_mi->is_null = 0; + new_mi->clean = 1; + new_mi->basename_offset = mi->basename_offset; + new_mi->directory_name = mi->directory_name; + if (write_object_file(new_path_contents.buf, new_path_contents.len, + blob_type, &new_mi->result.oid)) + die(_("Unable to add %s to database"), new_path); + + /* + * Save new_mi in opt->priv->path (so that something will deallocate + * it later), and record the entry for it. + */ + strmap_put(&opt->priv->paths, new_path, new_mi); + record_entry_for_tree(dir_metadata, new_path, new_mi); + + /* Cleanup */ + strbuf_release(&new_path_contents); +} + /* Per entry merge function */ static void process_entry(struct merge_options *opt, const char *path, @@ -3991,6 +4063,8 @@ static void process_entries(struct merge_options *opt, struct conflict_info *ci = (struct conflict_info *)mi; process_entry(opt, path, ci, &dir_metadata); } + if (!opt->priv->call_depth && opt->record_conflict_msgs_in_tree) + put_path_msgs_in_file(opt, path, mi, &dir_metadata); } trace2_region_leave("merge", "processing", opt->repo); @@ -4226,6 +4300,9 @@ void merge_switch_to_result(struct merge_options *opt, struct string_list olist = STRING_LIST_INIT_NODUP; int i; + if (opt->record_conflict_msgs_in_tree) + BUG("Either display conflict messages or record them in tree, not both"); + trace2_region_enter("merge", "display messages", opt->repo); /* Hack to pre-allocate olist to the desired size */ diff --git a/merge-recursive.c b/merge-recursive.c index 3355d50e8ad..b14fa15be91 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -3697,6 +3697,9 @@ static int merge_start(struct merge_options *opt, struct tree *head) assert(opt->priv == NULL); + /* Not supported; option specific to merge-ort */ + assert(!opt->record_conflict_msgs_in_tree); + /* Sanity check on repo state; index must match head */ if (repo_index_has_changes(opt->repo, head, &sb)) { err(opt, _("Your local changes to the following files would be overwritten by merge:\n %s"), diff --git a/merge-recursive.h b/merge-recursive.h index 0795a1d3ec1..2e2fab37f46 100644 --- a/merge-recursive.h +++ b/merge-recursive.h @@ -46,6 +46,7 @@ struct merge_options { /* miscellaneous control options */ const char *subtree_shift; unsigned renormalize : 1; + unsigned record_conflict_msgs_in_tree : 1; /* internal fields used by the implementation */ struct merge_options_internal *priv;