[v4,0/3] Implement filtering repacks

Message ID: 20221221040446.2860985-1-christian.couder@gmail.com

Christian Couder Dec. 21, 2022, 4:04 a.m. UTC
Earlier this year, John Cai sent 2 versions of a patch series to
implement `git repack --filter=<filter-spec>`:

https://lore.kernel.org/git/pull.1206.git.git.1643248180.gitgitgadget@gmail.com/

We tried to "sell" it as a way to use partial clone on a Git server to
offload large blobs to, for example, an http server, while using
multiple promisor remotes on the client side.

Even though this is still our end goal, it seems a bit far-fetched for
now, and selling the feature that way is unnecessary, as
`git repack --filter=<filter-spec>` could be useful on the client side
too.

For example, one might want to clone with a filter to avoid having too
much space taken up by some large blobs, and then realize after some
time that a number of those large blobs have still been downloaded,
because some old branches referencing them were checked out. In this
case, a filtering repack could remove some of those large blobs.

Some of the comments on the patch series that John sent were related
to the possible data loss and repo corruption that a filtering repack
could cause. It's indeed true that it could be very dangerous, so the
first version of this patch series asked the user to confirm the
command, either by answering 'Y' on the command line or by passing
`--force`.

In the discussion with Junio following that first version though, it
appeared that asking for such confirmation might not be necessary, so
the v2 removed those checks.

Taylor though asked what would happen to the 'remote.<name>.promisor'
and 'remote.<name>.partialclonefilter' config variables when a
filtering repack is run. It seemed to me that we should just check
that a promisor remote has been configured, and fail if that's not the
case, so this was implemented in the third version of this patch series.
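A minimal sketch of that check, written in C to match builtin/repack.c. It assumes a simplified, hypothetical `struct remote_cfg` holding only the value of 'remote.<name>.promisor'; the `check_filter_allowed()` helper and its error message are also hypothetical, and Git's real code reads this from its config machinery and may consult more settings than 'remote.<name>.promisor'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified view of one configured remote. */
struct remote_cfg {
	const char *name;
	int promisor; /* value of remote.<name>.promisor */
};

/*
 * Return 0 if the repack may proceed, -1 if a filter was requested but
 * no promisor remote is configured. A NULL or empty filter_spec means
 * that no filtering was requested.
 */
static int check_filter_allowed(const char *filter_spec,
				const struct remote_cfg *remotes, size_t nr)
{
	size_t i;

	assert(nr == 0 || remotes != NULL);

	if (!filter_spec || !*filter_spec)
		return 0; /* ordinary repack: nothing to check */

	for (i = 0; i < nr; i++)
		if (remotes[i].promisor)
			return 0; /* filtered-out objects can be fetched again */

	/* Hypothetical message, not Git's actual wording. */
	fprintf(stderr, "error: a filtering repack needs a promisor remote\n");
	return -1;
}
```

Failing early like this means a filtering repack can only drop objects that some configured promisor remote is expected to be able to provide again.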

In the discussions following the first, second and third versions,
Junio commented that `git gc` was a better way than `git repack` for
users to launch filtering repacks, so this v4 implements a new
'gc.repackFilter' config option that allows `git gc` to perform
filtering repacks. When this config option is set to a non-empty
string, `git gc` will just add a `--filter=<filter-spec>` argument to
the repack processes it launches, with '<filter-spec>' set to the value
of 'gc.repackFilter'.
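The rule described above can be sketched in C as follows. This is not the code from patch 3/3: `struct repack_args` is a hypothetical fixed-size stand-in for the argument vector that `git gc` builds, `build_repack_args()` is a hypothetical helper, and the base arguments ("repack", "-d", "-l", "-a") are only illustrative, since the real `git gc` chooses them from its own options.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_REPACK_ARGS 16

/* Hypothetical fixed-size stand-in for the argument vector gc builds. */
struct repack_args {
	const char *argv[MAX_REPACK_ARGS + 1]; /* NULL-terminated */
	size_t argc;
	char filter_arg[256]; /* backing storage for "--filter=<filter-spec>" */
};

static void push_arg(struct repack_args *a, const char *arg)
{
	assert(a->argc < MAX_REPACK_ARGS);
	a->argv[a->argc++] = arg;
	a->argv[a->argc] = NULL;
}

/*
 * Fill 'a' with the arguments for the repack child process.
 * 'repack_filter' is the value of gc.repackFilter, or NULL when unset.
 * Returns 0 on success, -1 if the spec does not fit in filter_arg.
 */
static int build_repack_args(struct repack_args *a, const char *repack_filter)
{
	static const char prefix[] = "--filter=";

	a->argc = 0;
	a->argv[0] = NULL;

	/* Illustrative base arguments only. */
	push_arg(a, "repack");
	push_arg(a, "-d");
	push_arg(a, "-l");
	push_arg(a, "-a");

	/* Only a non-empty gc.repackFilter turns this into a filtering repack. */
	if (repack_filter && *repack_filter) {
		if (strlen(prefix) + strlen(repack_filter) + 1 > sizeof(a->filter_arg))
			return -1;
		snprintf(a->filter_arg, sizeof(a->filter_arg), "%s%s",
			 prefix, repack_filter);
		push_arg(a, a->filter_arg);
	}
	return 0;
}
```

With 'gc.repackFilter' set to `blob:none`, the last argument becomes `--filter=blob:none`; when the option is unset or empty, the argument list is unchanged.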

So the changes in this v4 compared to v3 are the following:

  - rebased on top of 57e2c6ebbe (Start the 2.40 cycle, 2022-12-14) to
    avoid a simple conflict,

  - simplified the test in patch 2/3 by using `grep -c ...` instead of
    `grep ... | wc -l`,

  - added patch 3/3 which implements a new 'gc.repackFilter' config
    option so that `git gc` can perform filtering repacks.
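Both forms of the test count the objects that `git rev-list --objects --all --missing=print` reports as missing, which it prints one per line with a leading `?`. The C sketch below shows that count on an in-memory buffer; `count_missing()` is a hypothetical helper for illustration, not part of Git or of this series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the lines of 'buf' that start with '?', the marker that
 * `git rev-list --missing=print` puts in front of missing objects.
 * This is what both `grep -c "^?"` and `grep "^?" | wc -l` compute.
 */
static size_t count_missing(const char *buf)
{
	size_t count = 0;
	int at_line_start = 1;

	assert(buf != NULL);

	for (; *buf; buf++) {
		if (at_line_start && *buf == '?')
			count++;
		at_line_start = (*buf == '\n');
	}
	return count;
}
```

In the test, this count is expected to be 0 before the `--filter=blob:none` repack and 1 after it, once one object is reported missing.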

Thanks to Junio and Taylor for discussing the v1, v2 and v3, to John
Cai, who worked on the previous versions, to Jonathan Nieder, Jonathan
Tan and Taylor, who discussed this with me at the Git Merge and
Contributor Summit, and to Stolee, Taylor, Robert Coup and Junio who
discussed the versions John sent.

Range diff with v3:

1:  1e64cac782 < -:  ---------- pack-objects: allow --filter without --stdout
-:  ---------- > 1:  c2dca82dee pack-objects: allow --filter without --stdout
2:  7216a7bc05 ! 2:  1dcdba4b1d repack: add --filter=<filter-spec> option
    @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix
     +                  write_promisor_file_1(line.buf);
                item->util = populate_pack_exts(item->string);
        }
    -   fclose(out);
    +   strbuf_release(&line);
     
      ## t/t7700-repack.sh ##
     @@ t/t7700-repack.sh: test_expect_success 'auto-bitmaps do not complain if unavailable' '
    @@ t/t7700-repack.sh: test_expect_success 'auto-bitmaps do not complain if unavaila
     +  git clone --bare --no-local server client &&
     +  git -C client config remote.origin.promisor true &&
     +  git -C client rev-list --objects --all --missing=print >objects &&
    -+  test $(grep "^?" objects | wc -l) = 0 &&
    ++  test $(grep -c "^?" objects) = 0 &&
     +  git -C client -c repack.writebitmaps=false repack -a -d --filter=blob:none &&
     +  git -C client rev-list --objects --all --missing=print >objects &&
    -+  test $(grep "^?" objects | wc -l) = 1
    ++  test $(grep -c "^?" objects) = 1
     +'
     +
      objdir=.git/objects
-:  ---------- > 3:  6bb98b4b00 gc: add gc.repackFilter config option


Christian Couder (3):
  pack-objects: allow --filter without --stdout
  repack: add --filter=<filter-spec> option
  gc: add gc.repackFilter config option

 Documentation/config/gc.txt  |  9 +++++++++
 Documentation/git-repack.txt |  8 ++++++++
 builtin/gc.c                 |  6 ++++++
 builtin/pack-objects.c       |  8 ++------
 builtin/repack.c             | 28 +++++++++++++++++++++-------
 t/t6500-gc.sh                | 19 +++++++++++++++++++
 t/t7700-repack.sh            | 15 +++++++++++++++
 7 files changed, 80 insertions(+), 13 deletions(-)