Message ID | 20210823094049.44136-1-johannes@sipsolutions.net (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] multi-pack-index: fix *.rev cleanups with --object-dir | expand |
On Mon, Aug 23, 2021 at 11:40:49AM +0200, Johannes Berg wrote: > If using --object-dir to point into a repo while the current > working dir is outside, such as > > git init /repo > git -C /repo ... # add some objects > cd /non-repo > git multi-pack-index --object-dir /repo/.git/objects/ write > > the binary will segfault trying to access the object-dir via > the repo it found, but that's not fully initialized. Fix it > to use the object_dir properly to clean up the *.rev files, > this avoids the crash and cleans up the *.rev files for the > now rewritten multi-pack-index properly. I'm not entirely convinced that writing a midx when not "inside" a repo is something that we want to support. But if we do, then... > Due to running inside git's tree, even with TEST_NO_CREATE_REPO=t > I cannot reproduce the segfault in a test without the "cd /", so > I've kept that. Yes, the test caught in that case that the *.rev > file wasn't cleaned up (due to being initialized to the wrong git > repo [git's] and cleaning up there!), but I wanted to test the > segfault too. ...there's a helper in the test suite for doing this kind of "not in a repo" test: diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 3b6331f641..3981bf96d0 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -211,11 +211,8 @@ test_expect_success 'multi-pack-index *.rev cleanup with --object-dir' ' ) && rev="objdir-test-repo/$objdir/pack/multi-pack-index-abcdef123456.rev" && touch $rev && - ( - base="$(pwd)" && - cd / && # run outside any git repo, including git itself - git multi-pack-index --object-dir="$base/objdir-test-repo/$objdir" write - ) && + nongit git multi-pack-index \ + --object-dir="$PWD/objdir-test-repo/$objdir" write && test_path_is_file objdir-test-repo/$objdir/pack/multi-pack-index && test_path_is_missing $rev ' -Peff
On Mon, 2021-08-23 at 12:06 -0400, Jeff King wrote: > On Mon, Aug 23, 2021 at 11:40:49AM +0200, Johannes Berg wrote: > > > If using --object-dir to point into a repo while the current > > working dir is outside, such as > > > > git init /repo > > git -C /repo ... # add some objects > > cd /non-repo > > git multi-pack-index --object-dir /repo/.git/objects/ write > > > > the binary will segfault trying to access the object-dir via > > the repo it found, but that's not fully initialized. Fix it > > to use the object_dir properly to clean up the *.rev files, > > this avoids the crash and cleans up the *.rev files for the > > now rewritten multi-pack-index properly. > > I'm not entirely convinced that writing a midx when not "inside" a repo > is something that we want to support. But if we do, then... Seemed like that was the point of --object-dir? johannes
On Mon, Aug 23, 2021 at 07:05:31PM +0200, Johannes Berg wrote: > On Mon, 2021-08-23 at 12:06 -0400, Jeff King wrote: > > I'm not entirely convinced that writing a midx when not "inside" a repo > > is something that we want to support. But if we do, then... > > Seemed like that was the point of --object-dir? Stolee (cc'd) would know more as the original author, but as I recall the point of `--object-dir` was to be able to write a midx in directories which were acting as Git repositories, but didn't contain a `.git` directory. It's kind of a strange use-case, but I recall that it was important at the time. Maybe he could shed more light on why. (Either way, we're stuck with it ;)). Thanks, Taylor
On Mon, Aug 23, 2021 at 01:09:22PM -0400, Taylor Blau wrote: > On Mon, Aug 23, 2021 at 07:05:31PM +0200, Johannes Berg wrote: > > On Mon, 2021-08-23 at 12:06 -0400, Jeff King wrote: > > > I'm not entirely convinced that writing a midx when not "inside" a repo > > > is something that we want to support. But if we do, then... > > > > Seemed like that was the point of --object-dir? > > Stolee (cc'd) would know more as the original author, but as I recall > the point of `--object-dir` was to be able to write a midx in > directories which were acting as Git repositories, but didn't contain a > `.git` directory. > > It's kind of a strange use-case, but I recall that it was important at > the time. Maybe he could shed more light on why. (Either way, we're > stuck with it ;)). And the point was that those directories would also be alternates of the current repo (and that there is a current repo). That's one of the things that your midx-bitmap series tightens. -Peff
Taylor Blau <me@ttaylorr.com> writes: > On Mon, Aug 23, 2021 at 07:05:31PM +0200, Johannes Berg wrote: >> On Mon, 2021-08-23 at 12:06 -0400, Jeff King wrote: >> > I'm not entirely convinced that writing a midx when not "inside" a repo >> > is something that we want to support. But if we do, then... >> >> Seemed like that was the point of --object-dir? > > Stolee (cc'd) would know more as the original author, but as I recall > the point of `--object-dir` was to be able to write a midx in > directories which were acting as Git repositories, but didn't contain a > `.git` directory. > > It's kind of a strange use-case, but I recall that it was important at > the time. Maybe he could shed more light on why. (Either way, we're > stuck with it ;)). It does sound strange. "git -C $there multi-pack-index write" would have felt more natural.
On 8/23/2021 1:09 PM, Taylor Blau wrote: > On Mon, Aug 23, 2021 at 07:05:31PM +0200, Johannes Berg wrote: >> On Mon, 2021-08-23 at 12:06 -0400, Jeff King wrote: >>> I'm not entirely convinced that writing a midx when not "inside" a repo >>> is something that we want to support. But if we do, then... >> >> Seemed like that was the point of --object-dir? > > Stolee (cc'd) would know more as the original author, but as I recall > the point of `--object-dir` was to be able to write a midx in > directories which were acting as Git repositories, but didn't contain a > `.git` directory. > > It's kind of a strange use-case, but I recall that it was important at > the time. Maybe he could shed more light on why. (Either way, we're > stuck with it ;)). Yes, the point was for how VFS for Git (and now Scalar) built the "shared object cache" directory. This is a directory that acts as an alternate for VFS for Git and Scalar clones so objects downloaded by one enlistment are immediately available in another. When the multi-pack-index was created, it was designed to handle this shared object cache through the --object-dir parameter. There are custom patches in microsoft/git that shoehorn the option into background maintenance via a config value. If I were to redesign the feature for use by Git, then I would make the clones create a bare copy of the repository (if it doesn't already exist) and then create a worktree of that bare repo. The key is to allow users to delete individual worktrees without losing the object data. In summary, there is a reason for its design like this, although its due to an external tool. But we are using it a lot, so I'd prefer that it stay. Thanks, -Stolee
diff --git a/midx.c b/midx.c index 321c6fdd2f18..902e1a7a7d9d 100644 --- a/midx.c +++ b/midx.c @@ -882,7 +882,7 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash, strbuf_release(&buf); } -static void clear_midx_files_ext(struct repository *r, const char *ext, +static void clear_midx_files_ext(const char *object_dir, const char *ext, unsigned char *keep_hash); static int midx_checksum_valid(struct multi_pack_index *m) @@ -1086,7 +1086,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * if (flags & MIDX_WRITE_REV_INDEX) write_midx_reverse_index(midx_name, midx_hash, &ctx); - clear_midx_files_ext(the_repository, ".rev", midx_hash); + clear_midx_files_ext(object_dir, ".rev", midx_hash); commit_lock_file(&lk); @@ -1135,7 +1135,7 @@ static void clear_midx_file_ext(const char *full_path, size_t full_path_len, die_errno(_("failed to remove %s"), full_path); } -static void clear_midx_files_ext(struct repository *r, const char *ext, +static void clear_midx_files_ext(const char *object_dir, const char *ext, unsigned char *keep_hash) { struct clear_midx_data data; @@ -1146,7 +1146,7 @@ static void clear_midx_files_ext(struct repository *r, const char *ext, hash_to_hex(keep_hash), ext); data.ext = ext; - for_each_file_in_pack_dir(r->objects->odb->path, + for_each_file_in_pack_dir(object_dir, clear_midx_file_ext, &data); @@ -1165,7 +1165,7 @@ void clear_midx_file(struct repository *r) if (remove_path(midx)) die(_("failed to clear multi-pack-index at %s"), midx); - clear_midx_files_ext(r, ".rev", NULL); + clear_midx_files_ext(r->objects->odb->path, ".rev", NULL); free(midx); } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 3d4d9f10c31b..3b6331f64113 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -201,6 +201,25 @@ test_expect_success 'write midx with twelve packs' ' compare_results_with_midx "twelve packs" +test_expect_success 'multi-pack-index *.rev cleanup with --object-dir' ' + git init objdir-test-repo && + test_when_finished "rm -rf objdir-test-repo" && + ( + cd objdir-test-repo && + test_commit base && + git repack -d + ) && + rev="objdir-test-repo/$objdir/pack/multi-pack-index-abcdef123456.rev" && + touch $rev && + ( + base="$(pwd)" && + cd / && # run outside any git repo, including git itself + git multi-pack-index --object-dir="$base/objdir-test-repo/$objdir" write + ) && + test_path_is_file objdir-test-repo/$objdir/pack/multi-pack-index && + test_path_is_missing $rev +' + test_expect_success 'warn on improper hash version' ' git init --object-format=sha1 sha1 && (
If using --object-dir to point into a repo while the current working dir is outside, such as git init /repo git -C /repo ... # add some objects cd /non-repo git multi-pack-index --object-dir /repo/.git/objects/ write the binary will segfault trying to access the object-dir via the repo it found, but that's not fully initialized. Fix it to use the object_dir properly to clean up the *.rev files, this avoids the crash and cleans up the *.rev files for the now rewritten multi-pack-index properly. Fixes: 38ff7cabb6b8 ("pack-revindex: write multi-pack reverse indexes") Cc: Taylor Blau <me@ttaylorr.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> --- Due to running inside git's tree, even with TEST_NO_CREATE_REPO=t I cannot reproduce the segfault in a test without the "cd /", so I've kept that. Yes, the test caught in that case that the *.rev file wasn't cleaned up (due to being initialized to the wrong git repo [git's] and cleaning up there!), but I wanted to test the segfault too. --- midx.c | 10 +++++----- t/t5319-multi-pack-index.sh | 19 +++++++++++++++++++ 2 files changed, 24 insertions(+), 5 deletions(-)