Message ID | fb555658057f834d94f232f1d8b380a6304a3671.1718834285.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mktree: support more flexible usage | expand |
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Victoria Dye <vdye@github.com> > > If multiple tree entries with the same name are provided as input to > 'mktree', only write the last one to the tree. Entries are considered > duplicates if they have identical names (*not* considering mode); if a blob > and a tree with the same name are provided, only the last one will be > written to the tree. A tree with duplicate entries is invalid (per 'git > fsck'), so that condition should be avoided wherever possible. The "should be avoided" in the last sentence can be satisified either by the callers being extra careful, or the callee ignoring earlier entries with the same path. I do not have a strong objection against allowing looser callers, but if that is what is going on, perhaps By teaching "mktree" to ignore the earlier entries for the same path in the input, the callers can be more casual about sending duplicate entries in order to avoid creating an invalid tree objects. is a more honest justification for this setp? > diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt > index 5f3a6dfe38e..cf1fd82f754 100644 > --- a/Documentation/git-mktree.txt > +++ b/Documentation/git-mktree.txt > @@ -54,7 +54,8 @@ cannot be represented in a tree object. The command will fail without > writing the tree if a higher order stage is specified for any entry. > > The order of the tree entries is normalized by `mktree` so pre-sorting the > -input by path is not required. > +input by path is not required. Multiple entries provided with the same path > +are deduplicated, with only the last one specified added to the tree. OK. > struct tree_entry { > + /* Internal */ > + size_t order; > + > unsigned mode; > struct object_id oid; > int len; > @@ -74,15 +77,49 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat > ent->len = len; > oidcpy(&ent->oid, oid); > > + ent->order = arr->nr; > tree_entry_array_push(arr, ent); > } > > -static int ent_compare(const void *a_, const void *b_) > +static int ent_compare(const void *a_, const void *b_, void *ctx) > { > + int cmp; > struct tree_entry *a = *(struct tree_entry **)a_; > struct tree_entry *b = *(struct tree_entry **)b_; > - return base_name_compare(a->name, a->len, a->mode, > - b->name, b->len, b->mode); > + int ignore_mode = *((int *)ctx); > + > + if (ignore_mode) > + cmp = name_compare(a->name, a->len, b->name, b->len); > + else > + cmp = base_name_compare(a->name, a->len, a->mode, > + b->name, b->len, b->mode); > + return cmp ? cmp : b->order - a->order; > +} Having two similar functions that could go out of sync has bothered me somewhat. We could instead do int a_mode = ignore_mode ? 0 : a->mode; int b_mode = ignore_mode ? 0 : b->mode; cmp = base_name_compare(a->name, a->len, a_mode, b->name, b->len, b_mode); but that should be done by rewriting name_compare() in terms of base_name_compare(), which will help more callers, not just this one. > +static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr) > +{ > + size_t count = arr->nr; > + struct tree_entry *prev = NULL; > + > + int ignore_mode = 1; > + QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode); Swap the decl for ignore_mode and the blank line above it? If the callback context only needs a single bit, ent_compare() could just use the NULL-ness of ctx as "do we want to ignore mode?" bit. > + arr->nr = 0; > + for (size_t i = 0; i < count; i++) { > + struct tree_entry *curr = arr->entries[i]; > + if (prev && > + !name_compare(prev->name, prev->len, > + curr->name, curr->len)) { > + FREE_AND_NULL(curr); > + } else { > + arr->entries[arr->nr++] = curr; > + prev = curr; > + } > + } As long as this is done for a single tree (i.e. the paths do not have any slashes in them), this "sort them all and keep the last one" is a good strategy. > + /* Sort again to order the entries for tree insertion */ > + ignore_mode = 0; > + QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode); OK. We from time to time find need to do this, and I always regret that we didn't design the sort order of paths in a tree (and in the index) like so [*]. But that is almost 20 years too late ;-). Looking good. [Footnote] * A directory entry $T should have sorted after a non-directory entry $T but before any non-directory entry whose path has $T as its prefix (e.g. even a blob whose path is $T + "\001" should sort after a tree $T). That way we didn't have to worry about a blob at ($T + '-') sorting before a tree at $T but a blob at ($T + '0') sorting after that tree.
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt index 5f3a6dfe38e..cf1fd82f754 100644 --- a/Documentation/git-mktree.txt +++ b/Documentation/git-mktree.txt @@ -54,7 +54,8 @@ cannot be represented in a tree object. The command will fail without writing the tree if a higher order stage is specified for any entry. The order of the tree entries is normalized by `mktree` so pre-sorting the -input by path is not required. +input by path is not required. Multiple entries provided with the same path +are deduplicated, with only the last one specified added to the tree. GIT --- diff --git a/builtin/mktree.c b/builtin/mktree.c index 8f0af24b6b1..a91d3a7b028 100644 --- a/builtin/mktree.c +++ b/builtin/mktree.c @@ -15,6 +15,9 @@ #include "object-store-ll.h" struct tree_entry { + /* Internal */ + size_t order; + unsigned mode; struct object_id oid; int len; @@ -74,15 +77,49 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat ent->len = len; oidcpy(&ent->oid, oid); + ent->order = arr->nr; tree_entry_array_push(arr, ent); } -static int ent_compare(const void *a_, const void *b_) +static int ent_compare(const void *a_, const void *b_, void *ctx) { + int cmp; struct tree_entry *a = *(struct tree_entry **)a_; struct tree_entry *b = *(struct tree_entry **)b_; - return base_name_compare(a->name, a->len, a->mode, - b->name, b->len, b->mode); + int ignore_mode = *((int *)ctx); + + if (ignore_mode) + cmp = name_compare(a->name, a->len, b->name, b->len); + else + cmp = base_name_compare(a->name, a->len, a->mode, + b->name, b->len, b->mode); + return cmp ? cmp : b->order - a->order; +} + +static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr) +{ + size_t count = arr->nr; + struct tree_entry *prev = NULL; + + int ignore_mode = 1; + QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode); + + arr->nr = 0; + for (size_t i = 0; i < count; i++) { + struct tree_entry *curr = arr->entries[i]; + if (prev && + !name_compare(prev->name, prev->len, + curr->name, curr->len)) { + FREE_AND_NULL(curr); + } else { + arr->entries[arr->nr++] = curr; + prev = curr; + } + } + + /* Sort again to order the entries for tree insertion */ + ignore_mode = 0; + QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode); } static void write_tree(struct tree_entry_array *arr, struct object_id *oid) @@ -90,7 +127,7 @@ static void write_tree(struct tree_entry_array *arr, struct object_id *oid) struct strbuf buf; size_t size = 0; - QSORT(arr->entries, arr->nr, ent_compare); + sort_and_dedup_tree_entry_array(arr); for (size_t i = 0; i < arr->nr; i++) size += 32 + arr->entries[i]->len; diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh index 7e750530455..08760141d6f 100755 --- a/t/t1010-mktree.sh +++ b/t/t1010-mktree.sh @@ -6,11 +6,16 @@ TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh test_expect_success setup ' - for d in a a- a0 + for d in folder folder- folder0 do mkdir "$d" && echo "$d/one" >"$d/one" && git add "$d" || return 1 done && + for f in before folder.txt later + do + echo "$f" >"$f" && + git add "$f" || return 1 + done && echo zero >one && git update-index --add --info-only one && git write-tree --missing-ok >tree.missing && @@ -171,7 +176,7 @@ test_expect_success '--literally can create invalid trees' ' test_expect_success 'mktree validates path' ' tree_oid="$(cat tree)" && - blob_oid="$(git rev-parse $tree_oid:a/one)" && + blob_oid="$(git rev-parse $tree_oid:folder.txt)" && head_oid="$(git rev-parse HEAD)" && # Valid: tree with or without trailing slash, blob without trailing slash @@ -202,4 +207,31 @@ test_expect_success 'mktree validates path' ' test_grep "invalid path ${SQ}.git/${SQ}" err ' +test_expect_success 'mktree with duplicate entries' ' + tree_oid=$(cat tree) && + folder_oid=$(git rev-parse ${tree_oid}:folder) && + before_oid=$(git rev-parse ${tree_oid}:before) && + head_oid=$(git rev-parse HEAD) && + + { + printf "100755 blob $before_oid\ttest\n" && + printf "040000 tree $folder_oid\ttest-\n" && + printf "160000 commit $head_oid\ttest.txt\n" && + printf "040000 tree $folder_oid\ttest\n" && + printf "100644 blob $before_oid\ttest0\n" && + printf "160000 commit $head_oid\ttest-\n" + } >top.dup && + git mktree <top.dup >tree.actual && + + { + printf "160000 commit $head_oid\ttest-\n" && + printf "160000 commit $head_oid\ttest.txt\n" && + printf "040000 tree $folder_oid\ttest\n" && + printf "100644 blob $before_oid\ttest0\n" + } >expect && + git ls-tree $(cat tree.actual) >actual && + + test_cmp expect actual +' + test_done