Message ID | X/3FwNPHqZqY+hv0@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | abcfbf8603b16bb874f09c0bf8a94bc0030a0409 |
Headers | show |
Series | patch-ids: handle duplicate hashmap entries | expand |
YAY! Thanks a lot :)
Jeff King <peff@peff.net> writes: > At the end, we'll have eliminated commits from both sides that have a > matching patch-id on the other side. But there's a subtle assumption > here: for any given patch-id, we must have exactly one struct > representing it. If two commits from A both have the same patch-id and > we allow duplicates in the hashmap, then we run into a problem: In practical terms, for one side of the history to have two identical patches, the most likely scenerio for that to happen is to have a patch, its reversion, and its reapplication, intermixed in its history with other commits, e.g. ---o---o---M---o---o---W---o---o---M---o--- ... \ o---o---o---M---o--- ... where "M" are commits that does an identical change, and "W" (upside-down "M") is its reversion. On the top history, "M" was introduced, the bottom history cherry-picked, the top history found problems in it and reverted with "W", and later (perhaps with the help of preparatory patches on top of "W"), the top history now considers that "M" is safe, so reapplies it. And a "--cherry-pick" output that excludes "identical patches" that appear on both sides on such a history would hide all "M"'s, leaving a history like ---o---o-------o---o---W---o---o-------o--- ... \ o---o---o-------o--- ... But is this result what the user really wanted to see, I have to wonder. I do not see any problem in the patch itself. We used to hide only one "M" from the history on the top half in the picture, leaving one "M" and "W", while hiding the sole "M" from the bottom half. Now if we want to no longer show any "M", the updated code would correctly hide all of them. It just feels to me that the resulting view of the history look weird, leaving only the reversion of a patch that has never been applied in the first place.
Without this patch, that's even worst, consistency is broken. Let me explain. With your history example: ---o---o---M---o---o---W---o---o---M---o--- branch \ o---o---o---M---o--- master # WITHOUT PATCH If we imagine that master is having more commits count than branch. The result of rev-list will be like you described: $ git rev-list --left-right --cherry-pick branch...master <M <W In other words, it's showing both W and M. BUT, if we imagine now that master is having less commits count than branch. $ git rev-list --left-right --cherry-pick branch...master <W It's only showing W! So the result is not consistent, depending on which branch is having more commits. # WITH PATCH With the patch, everything is consistent, and only W is kept in ouptut, no matter the size of history: $ git.p rev-list --left-right --cherry-pick branch...master <W Cheers,
On Wed, Jan 13, 2021 at 09:24:48AM +0000, Arnaud Morin wrote: > Without this patch, that's even worst, consistency is broken. > Let me explain. > > With your history example: > > ---o---o---M---o---o---W---o---o---M---o--- branch > \ > o---o---o---M---o--- master > > # WITHOUT PATCH > If we imagine that master is having more commits count than branch. > The result of rev-list will be like you described: > $ git rev-list --left-right --cherry-pick branch...master > <M > <W > > In other words, it's showing both W and M. > > BUT, if we imagine now that master is having less commits count than branch. > $ git rev-list --left-right --cherry-pick branch...master > <W > > It's only showing W! > > So the result is not consistent, depending on which branch is having > more commits. Right. There's definitely a question of whether --cherry-pick is the most useful thing in such a situation, but the current behavior was simply broken. It did not behave as advertised, and it treated one side of the history different from the other. So whether we want to do anything else to help that case, I think this at least makes --cherry-pick sane. :) Here are two related histories to think about. This is the same history, but the revert and re-do happens on both sides (and this is actually the setup in the new test): ---o---o---M---o---o---W---o---o---M---o--- branch \ o---o---M---o---o---W---o---o---M---o--- master There we really _do_ want to clear out both M's (really, all four), because we're doing so _from both sides_. And that is ultimately the point of the cherry-pick option: show commits that were picked from one side to the other. Another interesting case is when the first "M" is actually bundled into another commit, like: ---o---o---M+X---o---o---W---o---o---M---o--- branch \ o---o---o--M---o--- master where "M+X" has the changes from "M" plus some other changes. It will get a different patch-id, and --cherry-pick would show both M+X and W, but not M for either side. This is a limitation of the patch-id concept: we are looking at whole commits, not overall changes. I suspect for most operations we care less about "remove all cherry-picks from both sides", but rather "give me one side, omitting commits that are already on the other side" (because you want to rebase or cherry-pick some subset). -Peff
Arnaud Morin <arnaud.morin@gmail.com> writes: > Without this patch, that's even worst, consistency is broken. > Let me explain. > > With your history example: > > ---o---o---M---o---o---W---o---o---M---o--- branch > \ > o---o---o---M---o--- master > > # WITHOUT PATCH > If we imagine that master is having more commits count than branch. > The result of rev-list will be like you described: > $ git rev-list --left-right --cherry-pick branch...master > <M > <W > > In other words, it's showing both W and M. So, at least they cancel out and the reader can tell that the net effect was none --- that is "sort of understandable" result. > BUT, if we imagine now that master is having less commits count than branch. > $ git rev-list --left-right --cherry-pick branch...master > <W > > It's only showing W! Which is what I felt misleading. > # WITH PATCH > With the patch, everything is consistent, and only W is kept in ouptut, > no matter the size of history: > $ git.p rev-list --left-right --cherry-pick branch...master > <W So with the patch, the former case is made the same as the latter misleading result, which would mean that they are consistently misleading? I guess that it is better to be misleading all the time than being "sort of understandable" half the time and misleading the rest of time. At least that is consistent. Thanks.
Jeff King <peff@peff.net> writes: > Right. There's definitely a question of whether --cherry-pick is the > most useful thing in such a situation, but the current behavior was > simply broken. It did not behave as advertised, and it treated one side > of the history different from the other. So whether we want to do > anything else to help that case, I think this at least makes > --cherry-pick sane. :) Yes; I think "sane" does not always equal to "useful", though ;-) > Here are two related histories to think about. > ... > I suspect for most operations we care less about "remove all > cherry-picks from both sides", but rather "give me one side, omitting > commits that are already on the other side" (because you want to rebase > or cherry-pick some subset). Yes again. It means "--cherry-pick" inherently is not symmetric. One side is the reference side (i.e. mainline), and the goal is to remove from the other side what is in the reference side. What we are seeing in this discussion is that "--cherry-pick" and symmetric difference do not conceptually play well together. But the behaviour with the patch makes the most sense when cherry-pick is applied to a symmetric difference (provided that doing so makes sense in the first place, that is).
On Wed, Jan 13, 2021 at 12:21:30PM -0800, Junio C Hamano wrote: > > I suspect for most operations we care less about "remove all > > cherry-picks from both sides", but rather "give me one side, omitting > > commits that are already on the other side" (because you want to rebase > > or cherry-pick some subset). > > Yes again. It means "--cherry-pick" inherently is not symmetric. > One side is the reference side (i.e. mainline), and the goal is to > remove from the other side what is in the reference side. > > What we are seeing in this discussion is that "--cherry-pick" and > symmetric difference do not conceptually play well together. > > But the behaviour with the patch makes the most sense when > cherry-pick is applied to a symmetric difference (provided that > doing so makes sense in the first place, that is). I didn't realize --cherry-pick would work without a symmetric difference. The documentation says: Omit any commit that introduces the same change as another commit on the “other side” when the set of commits are limited with symmetric difference. And indeed, I think it silently does nothing with: git rev-list --cherry-pick A..B (because there is nothing on the "other" side to match). So you do need some symmetric difference in order to define the two sets, but you might only want to see one of the sides. And that is basically what --cherry does. But having looked at the implementation of cherry_pick_list(), it is quite happy to swap the sides internally. I guess if we were going to make the output unsymmetric, the first thing would be to stop doing that swap. :) For the specifics of reverts and replays, though, I think the weirdness has nothing to do with left/right, or showing one side. It's the mismatched count. So if we were to make any change, it would be to keep a count of how many times each commit appears on the other side, and cancel them one for one. I.e., this: o--M--W--M--o-- \ o--M--o--- might plausibly want to show "M" once on the upper branch. But: o--M--W--M--o-- \ o--M--W--M--o-- should never show M, I wouldn't think. I'm not sure if that would be violating what existing callers expect from "--cherry-pick", though (i.e., if it would need another name, or an extra option to trigger). -Peff
diff --git a/patch-ids.c b/patch-ids.c index 21973e4933..f51021a0cb 100644 --- a/patch-ids.c +++ b/patch-ids.c @@ -89,7 +89,7 @@ static int init_patch_id_entry(struct patch_id *patch, return 0; } -struct patch_id *has_commit_patch_id(struct commit *commit, +struct patch_id *patch_id_iter_first(struct commit *commit, struct patch_ids *ids) { struct patch_id patch; @@ -104,6 +104,18 @@ struct patch_id *has_commit_patch_id(struct commit *commit, return hashmap_get_entry(&ids->patches, &patch, ent, NULL); } +struct patch_id *patch_id_iter_next(struct patch_id *cur, + struct patch_ids *ids) +{ + return hashmap_get_next_entry(&ids->patches, cur, ent); +} + +int has_commit_patch_id(struct commit *commit, + struct patch_ids *ids) +{ + return !!patch_id_iter_first(commit, ids); +} + struct patch_id *add_commit_patch_id(struct commit *commit, struct patch_ids *ids) { diff --git a/patch-ids.h b/patch-ids.h index 03bb04e707..ab6c6a6804 100644 --- a/patch-ids.h +++ b/patch-ids.h @@ -23,7 +23,25 @@ int commit_patch_id(struct commit *commit, struct diff_options *options, struct object_id *oid, int, int); int init_patch_ids(struct repository *, struct patch_ids *); int free_patch_ids(struct patch_ids *); + +/* Add a patch_id for a single commit to the set. */ struct patch_id *add_commit_patch_id(struct commit *, struct patch_ids *); -struct patch_id *has_commit_patch_id(struct commit *, struct patch_ids *); + +/* Returns true if the patch-id of "commit" is present in the set. */ +int has_commit_patch_id(struct commit *commit, struct patch_ids *); + +/* + * Iterate over all commits in the set whose patch id matches that of + * "commit", like: + * + * struct patch_id *cur; + * for (cur = patch_id_iter_first(commit, ids); + * cur; + * cur = patch_id_iter_next(cur, ids) { + * ... look at cur->commit + * } + */ +struct patch_id *patch_id_iter_first(struct commit *commit, struct patch_ids *); +struct patch_id *patch_id_iter_next(struct patch_id *cur, struct patch_ids *); #endif /* PATCH_IDS_H */ diff --git a/revision.c b/revision.c index 9dff845bed..7ec9b634e3 100644 --- a/revision.c +++ b/revision.c @@ -1241,12 +1241,14 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs) /* * Have we seen the same patch id? */ - id = has_commit_patch_id(commit, &ids); + id = patch_id_iter_first(commit, &ids); if (!id) continue; commit->object.flags |= cherry_flag; - id->commit->object.flags |= cherry_flag; + do { + id->commit->object.flags |= cherry_flag; + } while ((id = patch_id_iter_next(id, &ids))); } free_patch_ids(&ids); diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh index f0268372d2..8bf5ae23c2 100755 --- a/t/t6007-rev-list-cherry-pick-file.sh +++ b/t/t6007-rev-list-cherry-pick-file.sh @@ -245,6 +245,18 @@ test_expect_success '--count --left-right' ' test_cmp expect actual ' +test_expect_success '--cherry-pick with duplicates on each side' ' + git checkout -b dup-orig && + test_commit dup-base && + git revert dup-base && + git cherry-pick dup-base && + git checkout -b dup-side HEAD~3 && + test_tick && + git cherry-pick -3 dup-orig && + git rev-list --cherry-pick dup-orig...dup-side >actual && + test_must_be_empty actual +' + # Corrupt the object store deliberately to make sure # the object is not even checked for its existence. remove_loose_object () {