Message ID | 59556e554565120929549239f1aee5a76d80ac8d.1631730270.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | repack: introduce `--write-midx` | expand |
Thanks for the series! I have a couple of questions: On 2021.09.15 14:24, Taylor Blau wrote: > To power a new `--write-midx` mode, `git repack` will want to write a > multi-pack index containing a certain set of packs in the repository. > > This new option will be used by `git repack` to write a MIDX which > contains only the packs which will survive after the repack (that is, it > will exclude any packs which are about to be deleted). > > This patch effectively exposes the function implemented in the previous > commit via the `git multi-pack-index` builtin. An alternative approach > would have been to call that function from the `git repack` builtin > directly, but this introduces awkward problems around closing and > reopening the object store, so the MIDX will be written out-of-process. Could you elaborate a bit on the "awkward problems" here? I'm afraid I'm missing the context here. > diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c > index 73c0113b48..047647b5f2 100644 > --- a/builtin/multi-pack-index.c > +++ b/builtin/multi-pack-index.c > @@ -47,6 +47,7 @@ static struct opts_multi_pack_index { > const char *preferred_pack; > unsigned long batch_size; > unsigned flags; > + int stdin_packs; > } opts; > > static struct option common_opts[] = { > @@ -61,6 +62,16 @@ static struct option *add_common_options(struct option *prev) > return parse_options_concat(common_opts, prev); > } > > +static void read_packs_from_stdin(struct string_list *to) > +{ > + struct strbuf buf = STRBUF_INIT; > + while (strbuf_getline(&buf, stdin) != EOF) > + string_list_append(to, buf.buf); > + string_list_sort(to); > + > + strbuf_release(&buf); > +} > + I'm presuming that the packfile list is going to be generated automatically, but what happens if that becomes corrupt somehow, and we skip a packfile that should have been included? Will that cause incorrect behavior, or will we just miss out on some of the bitmap performance benefits?
> An alternative approach > would have been to call that function from the `git repack` builtin > directly, but this introduces awkward problems around closing and > reopening the object store, so the MIDX will be written out-of-process. I'm not sure if the implementation direction started by this patch (eventually, running "git multi-pack-index write --stdin-packs" to replace the midx of a repository while "git repack" is running) would work on platforms in which mmap-ing a file means that other processes can't delete it, but if it works on a Windows CI, this should be fine. (I don't have a Windows CI handy to test it on, though.) Assuming it works on platforms like Windows, this patch looks fine.
On Wed, Sep 22, 2021 at 03:29:53PM -0700, Josh Steadmon wrote: > Thanks for the series! I have a couple of questions: > > > On 2021.09.15 14:24, Taylor Blau wrote: > > To power a new `--write-midx` mode, `git repack` will want to write a > > multi-pack index containing a certain set of packs in the repository. > > > > This new option will be used by `git repack` to write a MIDX which > > contains only the packs which will survive after the repack (that is, it > > will exclude any packs which are about to be deleted). > > > > This patch effectively exposes the function implemented in the previous > > commit via the `git multi-pack-index` builtin. An alternative approach > > would have been to call that function from the `git repack` builtin > > directly, but this introduces awkward problems around closing and > > reopening the object store, so the MIDX will be written out-of-process. > > Could you elaborate a bit on the "awkward problems" here? I'm afraid I'm > missing the context here. A variety of things can go wrong when the object store is closed and re-opened in the same process. Many of the symptoms are described beginning at this message: https://lore.kernel.org/git/YPf1m01mcdJ3HNBt@coredump.intra.peff.net/ and further down in the sub-thread. Many of those problems have been resolved, but I'm not convinced that there aren't others lurking. > > +static void read_packs_from_stdin(struct string_list *to) > > +{ > > + struct strbuf buf = STRBUF_INIT; > > + while (strbuf_getline(&buf, stdin) != EOF) > > + string_list_append(to, buf.buf); > > + string_list_sort(to); > > + > > + strbuf_release(&buf); > > +} > > + > > I'm presuming that the packfile list is going to be generated > automatically, but what happens if that becomes corrupt somehow, and we > skip a packfile that should have been included? Will that cause > incorrect behavior, or will we just miss out on some of the bitmap > performance benefits? A multi-pack bitmap can only refer to objects that are in a pack which the repository's MIDX includes. So if we left off a pack from this list, we'd be unable to cover that pack's objects in the resulting bitmap. We'd also be unable to cover any objects which are reachable from the missing pack's objects, since the set of objects in a bitmap must be closed under reachability. If, on the other hand, we read a line which does not correspond to any pack, we'll simply ignore it. That's because we loop over the results of get_all_packs() and try to find a match in this list instead of the other way around. We could mark the packs we found by abusing the string_list_item's util pointer, but it's probably not worth it since this is mostly an internal interface. Thanks, Taylor
On Wed, Sep 22, 2021 at 04:11:37PM -0700, Jonathan Tan wrote: > > An alternative approach > > would have been to call that function from the `git repack` builtin > > directly, but this introduces awkward problems around closing and > > reopening the object store, so the MIDX will be written out-of-process. > > I'm not sure if the implementation direction started by this patch > (eventually, running "git multi-pack-index write --stdin-packs" to > replace the midx of a repository while "git repack" is running) would > work on platforms in which mmap-ing a file means that other processes > can't delete it, but if it works on a Windows CI, this should be fine. > (I don't have a Windows CI handy to test it on, though.) > > Assuming it works on platforms like Windows, this patch looks fine. It definitely passes CI :-). But special care is taken to handle this case, and it works for a couple of reasons. Notably that we only call `write_midx_included_packs()` (which in turn invokes the multi-pack-index builtin as a sub-process) *after* repack has called close_object_store(). That means that `repack` won't be holding the MIDX open while the sub-process is trying to write a new one in its place. The other reason is that the multi-pack-index process also makes sure to close the old MIDX before writing the new one, so we can be certain that neither of these spots are mapping the file or have it opened when trying to write over it. Thanks, Taylor
diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index a9df3dbd32..009c989ef8 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -45,6 +45,10 @@ write:: --[no-]bitmap:: Control whether or not a multi-pack bitmap is written. + + --stdin-packs:: + Write a multi-pack index containing only the set of + line-delimited pack index basenames provided over stdin. -- verify:: diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 73c0113b48..047647b5f2 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -47,6 +47,7 @@ static struct opts_multi_pack_index { const char *preferred_pack; unsigned long batch_size; unsigned flags; + int stdin_packs; } opts; static struct option common_opts[] = { @@ -61,6 +62,16 @@ static struct option *add_common_options(struct option *prev) return parse_options_concat(common_opts, prev); } +static void read_packs_from_stdin(struct string_list *to) +{ + struct strbuf buf = STRBUF_INIT; + while (strbuf_getline(&buf, stdin) != EOF) + string_list_append(to, buf.buf); + string_list_sort(to); + + strbuf_release(&buf); +} + static int cmd_multi_pack_index_write(int argc, const char **argv) { struct option *options; @@ -70,6 +81,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) N_("pack for reuse when computing a multi-pack bitmap")), OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"), MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX), + OPT_BOOL(0, "stdin-packs", &opts.stdin_packs, + N_("write multi-pack index containing only given indexes")), OPT_END(), }; @@ -86,6 +99,20 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) FREE_AND_NULL(options); + if (opts.stdin_packs) { + struct string_list packs = STRING_LIST_INIT_DUP; + int ret; + + read_packs_from_stdin(&packs); + + ret = write_midx_file_only(opts.object_dir, &packs, + opts.preferred_pack, opts.flags); + + string_list_clear(&packs, 0); + + return ret; + + } return write_midx_file(opts.object_dir, opts.preferred_pack, opts.flags); } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index bb04f0f23b..385f0a3efd 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -168,6 +168,21 @@ test_expect_success 'write midx with two packs' ' compare_results_with_midx "two packs" +test_expect_success 'write midx with --stdin-packs' ' + rm -fr $objdir/pack/multi-pack-index && + + idx="$(find $objdir/pack -name "test-2-*.idx")" && + basename "$idx" >in && + + git multi-pack-index write --stdin-packs <in && + + test-tool read-midx $objdir | grep "\.idx$" >packs && + + test_cmp packs in +' + +compare_results_with_midx "mixed mode (one pack + extra)" + test_expect_success 'write progress off for redirected stderr' ' git multi-pack-index --object-dir=$objdir write 2>err && test_line_count = 0 err
To power a new `--write-midx` mode, `git repack` will want to write a multi-pack index containing a certain set of packs in the repository. This new option will be used by `git repack` to write a MIDX which contains only the packs which will survive after the repack (that is, it will exclude any packs which are about to be deleted). This patch effectively exposes the function implemented in the previous commit via the `git multi-pack-index` builtin. An alternative approach would have been to call that function from the `git repack` builtin directly, but this introduces awkward problems around closing and reopening the object store, so the MIDX will be written out-of-process. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/git-multi-pack-index.txt | 4 ++++ builtin/multi-pack-index.c | 27 ++++++++++++++++++++++++++ t/t5319-multi-pack-index.sh | 15 ++++++++++++++ 3 files changed, 46 insertions(+)