[v3,0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks

Message ID pull.696.v3.git.1598380599.gitgitgadget@gmail.com (mailing list archive)

Message

Derrick Stolee via GitGitGadget Aug. 25, 2020, 6:36 p.m. UTC
This series is based on v3 of part I (ds/maintenance-part-1) [2].

This patch series contains 9 patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help, I'm
splitting out the portions that create and test the 'maintenance' builtin
from the additional tasks (prefetch, loose-objects, incremental-repack) that
can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/
[2] 
https://lore.kernel.org/git/pull.695.v3.git.1598380426.gitgitgadget@gmail.com/

As detailed in [2], the 'git maintenance run' subcommand will run certain
tasks based on config options or the --task=<task> arguments. The --auto
option tells each task to run only if an internal check finds that there
has been "enough" change in that domain to merit the work. In the case of
the 'gc' task, this also reduces the amount of work done.
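
To make the --auto behavior concrete, here is a minimal sketch in C. It is
not the code in builtin/gc.c: the struct layouts, the 'loose_count' field
and the LOOSE_AUTO_LIMIT threshold are assumptions made up for this
illustration. It shows only the gating idea, where a task runs under --auto
only when its own condition reports enough change.

```c
#include <stddef.h>

/* Hypothetical types for illustration; not the definitions in builtin/gc.c. */
struct run_opts {
	int auto_flag;   /* nonzero when --auto was given */
	int loose_count; /* stand-in for a measured repository statistic */
	int ran;         /* number of task bodies executed, for inspection */
};

typedef int task_fn(struct run_opts *opts);
typedef int auto_condition_fn(const struct run_opts *opts);

struct task {
	const char *name;
	task_fn *fn;
	auto_condition_fn *auto_condition; /* NULL: skipped under --auto */
};

/* Assumed threshold, chosen only for this example. */
#define LOOSE_AUTO_LIMIT 100

static int loose_objects_auto_condition(const struct run_opts *opts)
{
	return opts->loose_count >= LOOSE_AUTO_LIMIT;
}

static int loose_objects_task(struct run_opts *opts)
{
	opts->ran++; /* a real task would do its work here */
	return 0;
}

static const struct task loose_objects = {
	"loose-objects", loose_objects_task, loose_objects_auto_condition
};

/* Run one task, honoring --auto. Returns the task result, or 0 if skipped. */
static int run_task(const struct task *t, struct run_opts *opts)
{
	if (opts->auto_flag &&
	    (!t->auto_condition || !t->auto_condition(opts)))
		return 0;
	return t->fn(opts);
}
```

Without --auto (auto_flag == 0) the task body always runs; with --auto it
runs only once loose_count reaches the assumed limit.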

The new maintenance tasks in this series are:

 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'incremental-repack' : expire redundant packs from the multi-pack-index,
   then repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in
   'refs/prefetch/<remote>/'.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=<task>".
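
The "batch of loose objects" idea from the 'loose-objects' item in the list
above can be sketched in C as follows. This is a simplified stand-in, not the
patch itself: for_each_name() and the array of names are hypothetical, and a
real callback would hand each object name to a packing process instead of
only counting. The callback contract assumed here is "return nonzero to stop
the iteration".

```c
#include <stddef.h>

/* Hypothetical batch state for this sketch. */
struct batch_data {
	int count;      /* objects visited so far */
	int batch_size; /* limit after which iteration stops */
};

/* Assumed contract: a nonzero return value stops the iteration. */
typedef int each_object_fn(const char *name, void *data);

/* Count the object, then ask to stop once the limit has been exceeded. */
static int add_to_batch(const char *name, void *data)
{
	struct batch_data *d = data;

	(void)name; /* a real callback would pass the name to the packer */
	return ++(d->count) > d->batch_size;
}

/* Hypothetical iterator over object names, standing in for a directory walk. */
static int for_each_name(const char **names, size_t nr,
			 each_object_fn *fn, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(names[i], data);
		if (ret)
			return ret;
	}
	return 0;
}
```

Note that the callback counts an object before testing the limit, so in this
sketch the iteration visits up to batch_size + 1 objects before stopping.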

Since [2] replaced the 'git gc --auto' calls with 'git maintenance run
--auto' at the end of some Git commands, users could replace the 'gc' task
with these lighter-weight tasks for foreground maintenance.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves. I have an RFC series for
this available at [3].

[3] 
https://lore.kernel.org/git/pull.680.git.1597857408.gitgitgadget@gmail.com/

Updates since v2
================

 * Dropped "fetch: optionally allow disabling FETCH_HEAD update"

 * A lot of fallout from the change in the option parsing in v3 of
   Maintenance II.

 * Dropped the "verify, and delete and rewrite on failure" logic from the
   incremental-repack task. This might be added again later after it can be
   tested more thoroughly.

Updates since v1 (of this series)
=================================

 * PATCH 1 ("fetch: optionally allow disabling FETCH_HEAD update") was
   rewritten on-list. Getting a version out with this patch is the main
   reason for rolling a v2. (That, and Part I is re-rolled with a v2 and I
   want to make sure this series applies cleanly.)

 * The 'prefetch' and 'loose-objects' tasks had some review, but my proposed
   changes were not acked, so they may need another review.

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no-[quiet|progress]
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.

 * The 0x7FFF/0x7FFFFFFF constant problem is fixed with an EXPENSIVE test
   that verifies it.

 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.

Thanks, -Stolee

Derrick Stolee (8):
  maintenance: add prefetch task
  maintenance: add loose-objects task
  maintenance: create auto condition for loose-objects
  midx: enable core.multiPackIndex by default
  midx: use start_delayed_progress()
  maintenance: add incremental-repack task
  maintenance: auto-size incremental-repack batch
  maintenance: add incremental-repack auto condition

 Documentation/config/core.txt        |   4 +-
 Documentation/config/maintenance.txt |  18 ++
 Documentation/git-maintenance.txt    |  45 ++++
 builtin/gc.c                         | 327 +++++++++++++++++++++++++++
 midx.c                               |  23 +-
 midx.h                               |   1 +
 repo-settings.c                      |   6 +
 repository.h                         |   2 +
 t/t5319-multi-pack-index.sh          |  15 +-
 t/t7900-maintenance.sh               | 191 ++++++++++++++++
 10 files changed, 609 insertions(+), 23 deletions(-)


base-commit: 652a8eac57d04a51820c7a5b45031b50c5188e7b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-696%2Fderrickstolee%2Fmaintenance%2Fgc-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-696/derrickstolee/maintenance/gc-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/696

Range-diff vs v2:

  1:  f3bc0b2d92 <  -:  ---------- fetch: optionally allow disabling FETCH_HEAD update
  2:  8779c6c20d !  1:  da64c51a81 maintenance: add prefetch task
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     -@@ Documentation/git-maintenance.txt: since it will not expire `.graph` files that were in the previous
     - `commit-graph-chain` file. They will be deleted by a later run based on
     - the expiration delay.
     +@@ Documentation/git-maintenance.txt: commit-graph::
     + 	`commit-graph-chain` file. They will be deleted by a later run based
     + 	on the expiration delay.
       
      +prefetch::
      +	The `prefetch` task updates the object directory with the latest
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *opts)
     - 	return 1;
     +@@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
     + 	return 0;
       }
       
     -+static int fetch_remote(const char *remote, struct maintenance_opts *opts)
     ++static int fetch_remote(const char *remote, struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *
      +	return 0;
      +}
      +
     -+static int maintenance_task_prefetch(struct maintenance_opts *opts)
     ++static int maintenance_task_prefetch(struct maintenance_run_opts *opts)
      +{
      +	int result = 0;
      +	struct string_list_item *item;
     @@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *
      +	return result;
      +}
      +
     - static int maintenance_task_gc(struct maintenance_opts *opts)
     + static int maintenance_task_gc(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: struct maintenance_task {
  3:  4fa9d298b9 !  2:  75e846456b maintenance: add loose-objects task
     @@ Documentation/git-maintenance.txt: gc::
       --auto::
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
       	return run_command(&child);
       }
       
     -+static int prune_packed(struct maintenance_opts *opts)
     ++static int prune_packed(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	return ++(d->count) > d->batch_size;
      +}
      +
     -+static int pack_loose(struct maintenance_opts *opts)
     ++static int pack_loose(struct maintenance_run_opts *opts)
      +{
      +	struct repository *r = the_repository;
      +	int result = 0;
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	return result;
      +}
      +
     -+static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     ++static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
      +{
      +	return prune_packed(opts) || pack_loose(opts);
      +}
      +
     - typedef int maintenance_task_fn(struct maintenance_opts *opts);
     + typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
       
       /*
      @@ builtin/gc.c: struct maintenance_task {
  4:  3432bc3167 =  3:  d6e382c43e maintenance: create auto condition for loose-objects
  5:  0ee2434bdb =  4:  d0f2ec70d9 midx: enable core.multiPackIndex by default
  6:  e157ea8dd7 =  5:  2cd3c803d9 midx: use start_delayed_progress()
  7:  a8d956dad6 !  6:  0dd26bb584 maintenance: add incremental-repack task
     @@ Commit message
             it requires doing some calculations that are better isolated to
             a separate change.
      
     -    Each of the above steps update the multi-pack-index file. After
     -    each step, we verify the new multi-pack-index. If the new
     -    multi-pack-index is corrupt, then delete the multi-pack-index,
     -    rewrite it from scratch, and stop doing the later steps of the
     -    job. This is intended to be an extra-safe check without leaving
     -    a repo with many pack-files without a multi-pack-index.
     -
          These steps are based on a similar background maintenance step in
          Scalar (and VFS for Git) [1]. This was incredibly effective for
          users of the Windows OS repository. After using the same VFS for Git
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
       	return prune_packed(opts) || pack_loose(opts);
       }
       
     -+static int multi_pack_index_write(struct maintenance_opts *opts)
     ++static int multi_pack_index_write(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return 0;
      +}
      +
     -+static int rewrite_multi_pack_index(struct maintenance_opts *opts)
     -+{
     -+	struct repository *r = the_repository;
     -+	char *midx_name = get_midx_filename(r->objects->odb->path);
     -+
     -+	unlink(midx_name);
     -+	free(midx_name);
     -+
     -+	return multi_pack_index_write(opts);
     -+}
     -+
     -+static int multi_pack_index_verify(struct maintenance_opts *opts,
     -+				   const char *message)
     -+{
     -+	struct child_process child = CHILD_PROCESS_INIT;
     -+
     -+	child.git_cmd = 1;
     -+	strvec_pushl(&child.args, "multi-pack-index", "verify", NULL);
     -+
     -+	if (opts->quiet)
     -+		strvec_push(&child.args, "--no-progress");
     -+
     -+	if (run_command(&child)) {
     -+		warning(_("'git multi-pack-index verify' failed %s"), message);
     -+		return 1;
     -+	}
     -+
     -+	return 0;
     -+}
     -+
     -+static int multi_pack_index_expire(struct maintenance_opts *opts)
     ++static int multi_pack_index_expire(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return 0;
      +}
      +
     -+static int multi_pack_index_repack(struct maintenance_opts *opts)
     ++static int multi_pack_index_repack(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return 0;
      +}
      +
     -+static int maintenance_task_incremental_repack(struct maintenance_opts *opts)
     ++static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts)
      +{
      +	prepare_repo_settings(the_repository);
      +	if (!the_repository->settings.core_multi_pack_index) {
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +
      +	if (multi_pack_index_write(opts))
      +		return 1;
     -+	if (multi_pack_index_verify(opts, "after initial write"))
     -+		return rewrite_multi_pack_index(opts);
      +	if (multi_pack_index_expire(opts))
      +		return 1;
     -+	if (multi_pack_index_verify(opts, "after expire step"))
     -+		return !!rewrite_multi_pack_index(opts);
      +	if (multi_pack_index_repack(opts))
      +		return 1;
     -+	if (multi_pack_index_verify(opts, "after repack step"))
     -+		return !!rewrite_multi_pack_index(opts);
      +	return 0;
      +}
      +
     - typedef int maintenance_task_fn(struct maintenance_opts *opts);
     + typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
       
       /*
      @@ builtin/gc.c: struct maintenance_task {
  8:  f0e7276755 !  7:  f3b25a9927 maintenance: auto-size incremental-repack batch
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_run_opts *opts)
       	return 0;
       }
       
     @@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_opts *opts)
      +	return result_size;
      +}
      +
     - static int multi_pack_index_repack(struct maintenance_opts *opts)
     + static int multi_pack_index_repack(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
     -@@ builtin/gc.c: static int multi_pack_index_repack(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int multi_pack_index_repack(struct maintenance_run_opts *opts)
       	if (opts->quiet)
       		strvec_push(&child.args, "--no-progress");
       
  9:  5659a23ad5 !  8:  e9bb32f53a maintenance: add incremental-repack auto condition
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
       	return prune_packed(opts) || pack_loose(opts);
       }
       
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return count >= incremental_repack_auto_limit;
      +}
      +
     - static int multi_pack_index_write(struct maintenance_opts *opts)
     + static int multi_pack_index_write(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: static struct maintenance_task tasks[] = {

Comments

Junio C Hamano Aug. 25, 2020, 8:59 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series is based on v3 of part I (ds/maintenance-part-1) [2].
>
> This patch series contains 9 patches that were going to be part of v4 of
> ds/maintenance [1], but the discussion has gotten really long. To help, I'm
> splitting out the portions that create and test the 'maintenance' builtin
> from the additional tasks (prefetch, loose-objects, incremental-repack) that
> can be brought in later.

I gave it a quick look but the changes mostly are fallout from
renaming the options structure to maintenance_RUN_opts and loss of
midx verify while a task rewrites midx; iow, no significant changes
that are likely to become controversial.

Son Luong Ngoc Aug. 26, 2020, 3:15 p.m. UTC | #2
Hi Derrick,

> On Aug 25, 2020, at 20:36, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> 

...

> 
> Updates since v2
> ================
> 
> * Dropped "fetch: optionally allow disabling FETCH_HEAD update"
> 
> 
> * A lot of fallout from the change in the option parsing in v3 of
>   Maintenance II.
> 
> 
> * Dropped the "verify, and delete and rewrite on failure" logic from the
>   incremental-repack task. This might be added again later after it can be
>   tested more thoroughly.

Perhaps I missed some conversations related to this change, but
why was this verify-rewrite strategy dropped?

Is the problem this strategy was created to solve no longer a concern?

I feel it would be much better to add it and then remove it in a separate commit.
That way we can follow the reasoning behind these decisions via the commit message.

> 

...

> 
> -- 
> gitgitgadget

Thanks,
Son Luong.

Derrick Stolee Aug. 26, 2020, 4:21 p.m. UTC | #3
On 8/26/2020 11:15 AM, Son Luong Ngoc wrote:
> Hi Derrick,
> 
>> On Aug 25, 2020, at 20:36, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> * Dropped the "verify, and delete and rewrite on failure" logic from the
>>   incremental-repack task. This might be added again later after it can be
>>   tested more thoroughly.
> 
> Perhaps I missed some conversations related to this change but
> why was this verify-rewrite strategy dropped?
> 
> Was the problem such strategy were created to solve is now no longer a concern?
> 
> I feel like it would be much better to add it in and then remove it using a separated commit?
> That way we can follow the reasoning behind these decisions via commit message.

The most-recent message was [1]

[1] https://lore.kernel.org/git/20200819174322.3087791-1-jonathantanmy@google.com/

For now, I'd rather move forward with this simpler task
and I will revisit the "verify and fix" situation when
it can be done in a focused way instead of being surrounded
by builtin boilerplate and other basics of the maintenance
feature. Specifically, it would help to have a way to test
the logic. In Scalar, I was able to mock the Git commands
and return failures in specific places. A similar approach
could be taken here, or perhaps there is another way to be
confident that the "verify and fix" logic is actually
helpful.
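
As a rough illustration of what "mock the commands and return failures in
specific places" could look like, here is a small C sketch. It is only an
assumption about one possible shape, not how Scalar or this series does it:
the midx_steps table, step_ok(), step_fail() and the 'calls' counter are all
hypothetical. The steps are reached through function pointers so a test can
swap in a failing step and check that later steps are skipped.

```c
/* Hypothetical step table; the step names mirror the series, the types do not. */
typedef int step_fn(void);

struct midx_steps {
	step_fn *write_midx;
	step_fn *expire;
	step_fn *repack;
};

/* Counts how many steps were invoked, so a test can observe early exit. */
static int calls;

/* Mock step that succeeds. */
static int step_ok(void)
{
	calls++;
	return 0;
}

/* Mock step that reports failure. */
static int step_fail(void)
{
	calls++;
	return 1;
}

/* Run write, expire, repack in order; return 1 at the first failure. */
static int incremental_repack(const struct midx_steps *steps)
{
	if (steps->write_midx())
		return 1;
	if (steps->expire())
		return 1;
	if (steps->repack())
		return 1;
	return 0;
}
```

With { step_ok, step_fail, step_ok } the run reports failure after two
calls and the repack step is never reached. Any "verify and fix" logic
would need similar hooks to exercise its failure paths.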

Thanks,
-Stolee