Message ID | b9e73975eab1f349be678779ff57155feb4c3501.1603731448.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Beginning of new merge strategy: New API, empty implementation | expand |
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > + * git merge [-s recursive] > + * > + * with > + * > + * git merge -s ort > + * > + * Note: git's parser allows the space between '-s' and its argument to be > + * missing. (Should I have backronymed "ham", "alsa", "kip", "nap, "alvo", > + * "cale", "peedy", or "ins" instead of "ort"?) One thing that is quite unpleasant is "git grep ort" gives us too many hits already, and it will be hard to locate ort related changes with "git log --grep=ort", as the name is too short to serve as an effective way to limit the search. > diff --git a/merge-ort.h b/merge-ort.h > new file mode 100644 > index 0000000000..47d30cf538 > --- /dev/null > +++ b/merge-ort.h > @@ -0,0 +1,49 @@ > +#ifndef MERGE_ORT_H > +#define MERGE_ORT_H > + > +#include "merge-recursive.h" > + > +struct commit; > +struct tree; > + > +struct merge_result { > + /* whether the merge is clean */ > + int clean; > + > + /* Result of merge. If !clean, represents what would go in worktree */ > + struct tree *tree; Curious. Because there is no way for "struct tree" to hold an in-core pointer to a "struct blob" (iow, for a blob to be in a "struct tree", it has to have been assigned an object name), unless we are using the "pretend" mechanism, which has its own downsides, we are committed to create a throw-away blob objects with conflict markers in them, and write them to the object store. If we were writing a new merge machinery from scratch, I would have preferred a truly in-core implementation that does not have to write out to the object store but if this makes the implementation simpler, perhaps it is a small enough price to pay. > + /* > + * Additional metadata used by merge_switch_to_result() or future calls > + * to merge_inmemory_*(). Not for external use. > + */ > + void *priv; > + unsigned ate; I'd prefer to see this named not so cute. Will we hang random variations of things, or would this be better to be made into a pointer to union, with an enum that tells us which kind it is in use? > +}; > +/* rename-detecting three-way merge with recursive ancestor consolidation. */ > +void merge_inmemory_recursive(struct merge_options *opt, > + struct commit_list *merge_bases, > + struct commit *side1, > + struct commit *side2, > + struct merge_result *result); I've seen "incore" spelled as a squashed-into-a-single-word, but not "in_memory". > +/* rename-detecting three-way merge, no recursion. */ > +void merge_inmemory_nonrecursive(struct merge_options *opt, > + struct tree *merge_base, > + struct tree *side1, > + struct tree *side2, > + struct merge_result *result); > + > +/* Update the working tree and index from head to result after inmemory merge */ > +void merge_switch_to_result(struct merge_options *opt, > + struct tree *head, > + struct merge_result *result, > + int update_worktree_and_index, > + int display_update_msgs); To those who have known how our merge works, a natural expectation for an "in-core" merge is that when the "in-core" merge finishes, the index would hold the higher stages for the conflicted paths, and cleanly merged paths would have the result at stage 0, and there is an extra thing that we haven't had that represents what the working tree files for conflicted paths should look like (historically we wrote out the conflicted result to the working tree files---being in-core operation we cannot afford to), so that (1) cleanly merged paths can be externalized by writing from their stage 0 entries and (2) contents with conflicts can be externalized by that "extra thing". But this helper says "working tree and index" are both updated, so the "in-core" merge it expects must have not just the working tree result (in result->tree, as the comment in the structure says) but also how the higher stages of the index should look like somewhere in the result structure. How the latter is done is not at all clear at this point in the mock-up. Leaving it opaque is fine, but the function, and the result structure, deserve clarification to avoid confusing readers by highlighting how it is different from the traditional ways (e.g. "we don't touch the index at all---instead we store that in the priv/ate fields", if that is what is going on). Thanks.
On Mon, Oct 26, 2020 at 1:45 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > + * git merge [-s recursive] > > + * > > + * with > > + * > > + * git merge -s ort > > + * > > + * Note: git's parser allows the space between '-s' and its argument to be > > + * missing. (Should I have backronymed "ham", "alsa", "kip", "nap, "alvo", > > + * "cale", "peedy", or "ins" instead of "ort"?) > > One thing that is quite unpleasant is "git grep ort" gives us too > many hits already, and it will be hard to locate ort related changes > with "git log --grep=ort", as the name is too short to serve as an > effective way to limit the search. Suggestions for an alternative name? merge-pandemic.c since it was mostly written during the pandemic? I'm really not good at naming things... > > diff --git a/merge-ort.h b/merge-ort.h > > new file mode 100644 > > index 0000000000..47d30cf538 > > --- /dev/null > > +++ b/merge-ort.h > > @@ -0,0 +1,49 @@ > > +#ifndef MERGE_ORT_H > > +#define MERGE_ORT_H > > + > > +#include "merge-recursive.h" > > + > > +struct commit; > > +struct tree; > > + > > +struct merge_result { > > + /* whether the merge is clean */ > > + int clean; > > + > > + /* Result of merge. If !clean, represents what would go in worktree */ > > + struct tree *tree; > > Curious. Because there is no way for "struct tree" to hold an > in-core pointer to a "struct blob" (iow, for a blob to be in a > "struct tree", it has to have been assigned an object name), unless > we are using the "pretend" mechanism, which has its own downsides, > we are committed to create a throw-away blob objects with conflict > markers in them, and write them to the object store. This is something merge-recursive already does (and I've copied some of that code over, around merge_3way() and the call to write_object_file() with the results). I thought the reasoning behind this was memory -- we're okay assuming any given file fits in memory (and perhaps up to three copies of it so we can do a three-way merge), but we're not okay assuming all (changed) files from a commit simultaneously fit in memory. > If we were writing a new merge machinery from scratch, I would have > preferred a truly in-core implementation that does not have to write > out to the object store but if this makes the implementation simpler, > perhaps it is a small enough price to pay. I thought about that early on, but I was worried about out-of-memory situations if we attempt to do truly in-memory, at least for large changes in large repositories. And as you have seen above, I do rely on being able to create trees. > > + /* > > + * Additional metadata used by merge_switch_to_result() or future calls > > + * to merge_inmemory_*(). Not for external use. > > + */ > > + void *priv; > > + unsigned ate; > > I'd prefer to see this named not so cute. Will we hang random > variations of things, or would this be better to be made into a > pointer to union, with an enum that tells us which kind it is in > use? I don't understand the union suggestion. Both fields are used. Would you object if 'ate' was named '_'? That was my original name, but Taylor didn't like it. It is used on about 4 lines of code, I'm 99.9% sure it will never be used in additional locations, and callers shouldn't mess with it. I just don't have a good name for it. I guess maybe I should just call it "properly_initialized" or something. > > +}; > > > > +/* rename-detecting three-way merge with recursive ancestor consolidation. */ > > +void merge_inmemory_recursive(struct merge_options *opt, > > + struct commit_list *merge_bases, > > + struct commit *side1, > > + struct commit *side2, > > + struct merge_result *result); > > I've seen "incore" spelled as a squashed-into-a-single-word, but not > "in_memory". I can add an underscore. Or switch to incore. Preference? > > +/* rename-detecting three-way merge, no recursion. */ > > +void merge_inmemory_nonrecursive(struct merge_options *opt, > > + struct tree *merge_base, > > + struct tree *side1, > > + struct tree *side2, > > + struct merge_result *result); > > + > > +/* Update the working tree and index from head to result after inmemory merge */ > > +void merge_switch_to_result(struct merge_options *opt, > > + struct tree *head, > > + struct merge_result *result, > > + int update_worktree_and_index, > > + int display_update_msgs); > > To those who have known how our merge works, a natural expectation > for an "in-core" merge is that when the "in-core" merge finishes, > the index would hold the higher stages for the conflicted paths, and > cleanly merged paths would have the result at stage 0, and there is > an extra thing that we haven't had that represents what the working > tree files for conflicted paths should look like (historically we > wrote out the conflicted result to the working tree files---being > in-core operation we cannot afford to), so that (1) cleanly merged > paths can be externalized by writing from their stage 0 entries and > (2) contents with conflicts can be externalized by that "extra > thing". > > But this helper says "working tree and index" are both updated, so > the "in-core" merge it expects must have not just the working tree > result (in result->tree, as the comment in the structure says) but > also how the higher stages of the index should look like somewhere > in the result structure. How the latter is done is not at all clear > at this point in the mock-up. Leaving it opaque is fine, but the > function, and the result structure, deserve clarification to avoid > confusing readers by highlighting how it is different from the > traditional ways (e.g. "we don't touch the index at all---instead we > store that in the priv/ate fields", if that is what is going on). Yes, your reading is correct. We don't touch the index (or any index, or any cache_entry) at all. Among other things, data that can be used to update the index are in the "priv" field. I'll try to add some notes to the file.
Elijah Newren <newren@gmail.com> writes: >> > + /* >> > + * Additional metadata used by merge_switch_to_result() or future calls >> > + * to merge_inmemory_*(). Not for external use. >> > + */ >> > + void *priv; >> > + unsigned ate; >> >> I'd prefer to see this named not so cute. Will we hang random >> variations of things, or would this be better to be made into a >> pointer to union, with an enum that tells us which kind it is in >> use? > > I don't understand the union suggestion. Both fields are used. I thought "priv" shouldn't be "anything goes, so it is 'void *'" but is probably a "union { ... } priv;" with associated enum next to it that tells which one of the possibilities in the union is in effect. > Would you object if 'ate' was named '_'? Either is horrible name. >> > +/* rename-detecting three-way merge with recursive ancestor consolidation. */ >> > +void merge_inmemory_recursive(struct merge_options *opt, >> > + struct commit_list *merge_bases, >> > + struct commit *side1, >> > + struct commit *side2, >> > + struct merge_result *result); >> >> I've seen "incore" spelled as a squashed-into-a-single-word, but not >> "in_memory". > > I can add an underscore. Or switch to incore. Preference? Anything shorter would get my vote. > Yes, your reading is correct. We don't touch the index (or any index, > or any cache_entry) at all. Among other things, data that can be used > to update the index are in the "priv" field. > > I'll try to add some notes to the file. Sounds good.
On Mon, Oct 26, 2020 at 3:10 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > >> > + /* > >> > + * Additional metadata used by merge_switch_to_result() or future calls > >> > + * to merge_inmemory_*(). Not for external use. > >> > + */ > >> > + void *priv; > >> > + unsigned ate; > >> > >> I'd prefer to see this named not so cute. Will we hang random > >> variations of things, or would this be better to be made into a > >> pointer to union, with an enum that tells us which kind it is in > >> use? > > > > I don't understand the union suggestion. Both fields are used. > > I thought "priv" shouldn't be "anything goes, so it is 'void *'" but > is probably a "union { ... } priv;" with associated enum next to it > that tells which one of the possibilities in the union is in effect. I guess I'm still not following where these "possibilities" come from or what I could have said that would have suggested possibilities. priv is a pointer to a private data structure. The data structure is defined and used only in merge-ort.c and not exposed to callers. That data does need to be passed from merge_incore_*() to merge_switch_to_result() (or to merge_finalize()). Since those two are separate function calls invoked by the caller, and since callers shouldn't touch any of this data in priv, it's passed as an opaque field within merge_result. Thus void*. > > Would you object if 'ate' was named '_'? > > Either is horrible name. I'll just rip it out, for now. It isn't relevant to early versions of merge-ort anyway, and when I re-introduce it with yet another horrible name, there will at least be more context for others to suggest a better name for me. Besides, the variable isn't at all necessary for the algorithm; it exists solely as a way to catch a small gotcha in API usage of later versions of merge-ort, which I otherwise didn't have a clean way of detecting. > >> > +/* rename-detecting three-way merge with recursive ancestor consolidation. */ > >> > +void merge_inmemory_recursive(struct merge_options *opt, > >> > + struct commit_list *merge_bases, > >> > + struct commit *side1, > >> > + struct commit *side2, > >> > + struct merge_result *result); > >> > >> I've seen "incore" spelled as a squashed-into-a-single-word, but not > >> "in_memory". > > > > I can add an underscore. Or switch to incore. Preference? > > Anything shorter would get my vote. Sounds good, will do. > > Yes, your reading is correct. We don't touch the index (or any index, > > or any cache_entry) at all. Among other things, data that can be used > > to update the index are in the "priv" field. > > > > I'll try to add some notes to the file. > > Sounds good. :-)
diff --git a/Makefile b/Makefile index 95571ee3fc..088770c2ae 100644 --- a/Makefile +++ b/Makefile @@ -921,6 +921,7 @@ LIB_OBJS += mailmap.o LIB_OBJS += match-trees.o LIB_OBJS += mem-pool.o LIB_OBJS += merge-blobs.o +LIB_OBJS += merge-ort.o LIB_OBJS += merge-recursive.o LIB_OBJS += merge.o LIB_OBJS += mergesort.o diff --git a/merge-ort.c b/merge-ort.c new file mode 100644 index 0000000000..5230364a8d --- /dev/null +++ b/merge-ort.c @@ -0,0 +1,52 @@ +/* + * "Ostensibly Recursive's Twin" merge strategy, or "ort" for short. Meant + * as a drop-in replacement for the "recursive" merge strategy, allowing one + * to replace + * + * git merge [-s recursive] + * + * with + * + * git merge -s ort + * + * Note: git's parser allows the space between '-s' and its argument to be + * missing. (Should I have backronymed "ham", "alsa", "kip", "nap, "alvo", + * "cale", "peedy", or "ins" instead of "ort"?) + */ + +#include "cache.h" +#include "merge-ort.h" + +void merge_switch_to_result(struct merge_options *opt, + struct tree *head, + struct merge_result *result, + int update_worktree_and_index, + int display_update_msgs) +{ + die("Not yet implemented"); + merge_finalize(opt, result); +} + +void merge_finalize(struct merge_options *opt, + struct merge_result *result) +{ + die("Not yet implemented"); +} + +void merge_inmemory_nonrecursive(struct merge_options *opt, + struct tree *merge_base, + struct tree *side1, + struct tree *side2, + struct merge_result *result) +{ + die("Not yet implemented"); +} + +void merge_inmemory_recursive(struct merge_options *opt, + struct commit_list *merge_bases, + struct commit *side1, + struct commit *side2, + struct merge_result *result) +{ + die("Not yet implemented"); +} diff --git a/merge-ort.h b/merge-ort.h new file mode 100644 index 0000000000..47d30cf538 --- /dev/null +++ b/merge-ort.h @@ -0,0 +1,49 @@ +#ifndef MERGE_ORT_H +#define MERGE_ORT_H + +#include "merge-recursive.h" + +struct commit; +struct tree; + +struct merge_result { + /* whether the merge is clean */ + int clean; + + /* Result of merge. If !clean, represents what would go in worktree */ + struct tree *tree; + + /* + * Additional metadata used by merge_switch_to_result() or future calls + * to merge_inmemory_*(). Not for external use. + */ + void *priv; + unsigned ate; +}; + +/* rename-detecting three-way merge with recursive ancestor consolidation. */ +void merge_inmemory_recursive(struct merge_options *opt, + struct commit_list *merge_bases, + struct commit *side1, + struct commit *side2, + struct merge_result *result); + +/* rename-detecting three-way merge, no recursion. */ +void merge_inmemory_nonrecursive(struct merge_options *opt, + struct tree *merge_base, + struct tree *side1, + struct tree *side2, + struct merge_result *result); + +/* Update the working tree and index from head to result after inmemory merge */ +void merge_switch_to_result(struct merge_options *opt, + struct tree *head, + struct merge_result *result, + int update_worktree_and_index, + int display_update_msgs); + +/* Do needed cleanup when not calling merge_switch_to_result() */ +void merge_finalize(struct merge_options *opt, + struct merge_result *result); + +#endif