diff mbox series

[v4,2/8] maintenance: add --schedule option and config

Message ID dae8c04bb5523c9b63c770862a1104a0ff4aa6c4.1602782524.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Maintenance III: Background maintenance | expand

Commit Message

Derrick Stolee Oct. 15, 2020, 5:21 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.

The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:

	'hourly' > 'daily' > 'weekly'

Use new 'enum schedule_priority' to track these values numerically.

The following cron table would run the scheduled tasks with the correct
frequencies:

  0 1-23 * * *    git -C <repo> maintenance run --schedule=hourly
  0 0    * * 1-6  git -C <repo> maintenance run --schedule=daily
  0 0    * * 0    git -C <repo> maintenance run --schedule=weekly

This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  5 +++
 Documentation/git-maintenance.txt    | 13 +++++-
 builtin/gc.c                         | 64 ++++++++++++++++++++++++++--
 t/t7900-maintenance.sh               | 40 +++++++++++++++++
 4 files changed, 118 insertions(+), 4 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Feb. 9, 2021, 2:06 p.m. UTC | #1
On Thu, Oct 15 2020, Derrick Stolee via GitGitGadget wrote:

> +--schedule::
> +	When combined with the `run` subcommand, run maintenance tasks
> +	only if certain time conditions are met, as specified by the
> +	`maintenance.<task>.schedule` config value for each `<task>`.
> +	This config value specifies a number of seconds since the last
> +	time that task ran, according to the `maintenance.<task>.lastRun`
> +	config value. The tasks that are tested are those provided by
> +	the `--task=<task>` option(s) or those with
> +	`maintenance.<task>.enabled` set to true.

I see from searching on list and from spying on your repo that patches
for this maintenance.<task>.lastRun feature exist, but there's no code
for it in git.git.

So we've got a 2.30.0 release with a mention of that, and it can't work,
because it's only in the doc due to b08ff1fee00 (maintenance: add
--schedule option and config, 2020-09-11).
Derrick Stolee Feb. 9, 2021, 4:54 p.m. UTC | #2
On 2/9/2021 9:06 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Oct 15 2020, Derrick Stolee via GitGitGadget wrote:
> 
>> +--schedule::
>> +	When combined with the `run` subcommand, run maintenance tasks
>> +	only if certain time conditions are met, as specified by the
>> +	`maintenance.<task>.schedule` config value for each `<task>`.
>> +	This config value specifies a number of seconds since the last
>> +	time that task ran, according to the `maintenance.<task>.lastRun`
>> +	config value. The tasks that are tested are those provided by
>> +	the `--task=<task>` option(s) or those with
>> +	`maintenance.<task>.enabled` set to true.
> 
> I see from searching on list and from spying on your repo that patches
> for this maintenance.<task>.lastRun feature exist, but there's no code
> for it in git.git.
> 
> So we've got a 2.30.0 release with a mention of that, and it can't work,
> because it's only in the doc due to b08ff1fee00 (maintenance: add
> --schedule option and config, 2020-09-11).

Thank you for pointing out this docbug. This is based on an early
version of the patch series and should have been changed.

Please see this patch which attempts to do a better job. I can
create a new thread with this submission if we need more edits.

Thanks,
-Stolee

--- >8 ---

From 46436b06caf65ee824e781603a8108413bb87705 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Tue, 9 Feb 2021 11:51:32 -0500
Subject: [PATCH] maintenance: properly document --schedule
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The documentation for the '--schedule' option is incorrect and based on
an early version of the background maintenance feature. Update the
documentation to describe the actual use of the option.

The most important thing is that Git takes this option as a hint for
which tasks it should run. Users should not run this command arbitrarily
and expect that Git will enforce some timing restrictions.

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 6fec1eb8dc2..d4b5aea6760 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -155,15 +155,15 @@ OPTIONS
 	exceeds the `gc.autoPackLimit` config setting. Not compatible with
 	the `--schedule` option.
 
---schedule::
+--schedule=<frequency>::
 	When combined with the `run` subcommand, run maintenance tasks
-	only if certain time conditions are met, as specified by the
-	`maintenance.<task>.schedule` config value for each `<task>`.
-	This config value specifies a number of seconds since the last
-	time that task ran, according to the `maintenance.<task>.lastRun`
-	config value. The tasks that are tested are those provided by
-	the `--task=<task>` option(s) or those with
-	`maintenance.<task>.enabled` set to true.
+	whose `maintenance.<task>.schedule` config value is equal to
+	`<frequency>`. There is no timing restriction imposed by this
+	option, but instead is used to inform the Git process which
+	frequency to use. The command scheduler created by
+	`git maintenance start` runs this command with `<frequency>`
+	equal to `hourly`, `daily`, and `weekly` at the appropriate
+	intervals.
 
 --quiet::
 	Do not report progress or other information over `stderr`.
Ævar Arnfjörð Bjarmason May 10, 2021, 12:16 p.m. UTC | #3
On Tue, Feb 09 2021, Derrick Stolee wrote:

> On 2/9/2021 9:06 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Thu, Oct 15 2020, Derrick Stolee via GitGitGadget wrote:
>> 
>>> +--schedule::
>>> +	When combined with the `run` subcommand, run maintenance tasks
>>> +	only if certain time conditions are met, as specified by the
>>> +	`maintenance.<task>.schedule` config value for each `<task>`.
>>> +	This config value specifies a number of seconds since the last
>>> +	time that task ran, according to the `maintenance.<task>.lastRun`
>>> +	config value. The tasks that are tested are those provided by
>>> +	the `--task=<task>` option(s) or those with
>>> +	`maintenance.<task>.enabled` set to true.
>> 
>> I see from searching on list and from spying on your repo that patches
>> for this maintenance.<task>.lastRun feature exist, but there's no code
>> for it in git.git.
>> 
>> So we've got a 2.30.0 release with a mention of that, and it can't work,
>> because it's only in the doc due to b08ff1fee00 (maintenance: add
>> --schedule option and config, 2020-09-11).
>
> Thank you for pointing out this docbug. This is based on an early
> version of the patch series and should have been changed.
>
> Please see this patch which attempts to do a better job. I can
> create a new thread with this submission if we need more edits.
>
> Thanks,
> -Stolee
>
> --- >8 ---
>
> From 46436b06caf65ee824e781603a8108413bb87705 Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Tue, 9 Feb 2021 11:51:32 -0500
> Subject: [PATCH] maintenance: properly document --schedule
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> The documentation for the '--schedule' option is incorrect and based on
> an early version of the background maintenance feature. Update the
> documentation to describe the actual use of the option.
>
> The most important thing is that Git takes this option as a hint for
> which tasks it should run. Users should not run this command arbitrarily
> and expect that Git will enforce some timing restrictions.
>
> Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-maintenance.txt | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 6fec1eb8dc2..d4b5aea6760 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -155,15 +155,15 @@ OPTIONS
>  	exceeds the `gc.autoPackLimit` config setting. Not compatible with
>  	the `--schedule` option.
>  
> ---schedule::
> +--schedule=<frequency>::
>  	When combined with the `run` subcommand, run maintenance tasks
> -	only if certain time conditions are met, as specified by the
> -	`maintenance.<task>.schedule` config value for each `<task>`.
> -	This config value specifies a number of seconds since the last
> -	time that task ran, according to the `maintenance.<task>.lastRun`
> -	config value. The tasks that are tested are those provided by
> -	the `--task=<task>` option(s) or those with
> -	`maintenance.<task>.enabled` set to true.
> +	whose `maintenance.<task>.schedule` config value is equal to
> +	`<frequency>`. There is no timing restriction imposed by this
> +	option, but instead is used to inform the Git process which
> +	frequency to use. The command scheduler created by
> +	`git maintenance start` runs this command with `<frequency>`
> +	equal to `hourly`, `daily`, and `weekly` at the appropriate
> +	intervals.
>  
>  --quiet::
>  	Do not report progress or other information over `stderr`.

+ CC Junio

Late reply, I was reminded of this again. This patch looks good to me,
but I see it never got picked up, and our docs still mention an
unsupported "lastRun". Junio: I think it makes sense to just pick this
up...
Junio C Hamano May 10, 2021, 6:42 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Tue, Feb 09 2021, Derrick Stolee wrote:
>
>> On 2/9/2021 9:06 AM, Ævar Arnfjörð Bjarmason wrote:
>>> 
>>> On Thu, Oct 15 2020, Derrick Stolee via GitGitGadget wrote:
>>> 
>>>> +--schedule::
>>>> +	When combined with the `run` subcommand, run maintenance tasks
>>>> +	only if certain time conditions are met, as specified by the
>>>> +	`maintenance.<task>.schedule` config value for each `<task>`.
>>>> +	This config value specifies a number of seconds since the last
>>>> +	time that task ran, according to the `maintenance.<task>.lastRun`
>>>> +	config value. The tasks that are tested are those provided by
>>>> +	the `--task=<task>` option(s) or those with
>>>> +	`maintenance.<task>.enabled` set to true.
>>> 
>>> I see from searching on list and from spying on your repo that patches
>>> for this maintenance.<task>.lastRun feature exist, but there's no code
>>> for it in git.git.
>>> 
>>> So we've got a 2.30.0 release with a mention of that, and it can't work,
>>> because it's only in the doc due to b08ff1fee00 (maintenance: add
>>> --schedule option and config, 2020-09-11).
>>
>> Thank you for pointing out this docbug. This is based on an early
>> version of the patch series and should have been changed.
>>
>> Please see this patch which attempts to do a better job. I can
>> create a new thread with this submission if we need more edits.

Or resend it in a new thread for better visibility, with or without
change, with a mention of the original under the three-dash line.
Nobody can tell, with the above comment, if you abandoned the patch
or if you thought it is good enough.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 06db758172..70585564fa 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -10,6 +10,11 @@  maintenance.<task>.enabled::
 	`--task` option exists. By default, only `maintenance.gc.enabled`
 	is true.
 
+maintenance.<task>.schedule::
+	This config option controls whether or not the given `<task>` runs
+	during a `git maintenance run --schedule=<frequency>` command. The
+	value must be one of "hourly", "daily", or "weekly".
+
 maintenance.commit-graph.auto::
 	This integer config option controls how often the `commit-graph` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 3f5d8946b4..ed94f66e36 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -110,7 +110,18 @@  OPTIONS
 	only if certain thresholds are met. For example, the `gc` task
 	runs when the number of loose objects exceeds the number stored
 	in the `gc.auto` config setting, or when the number of pack-files
-	exceeds the `gc.autoPackLimit` config setting.
+	exceeds the `gc.autoPackLimit` config setting. Not compatible with
+	the `--schedule` option.
+
+--schedule::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain time conditions are met, as specified by the
+	`maintenance.<task>.schedule` config value for each `<task>`.
+	This config value specifies a number of seconds since the last
+	time that task ran, according to the `maintenance.<task>.lastRun`
+	config value. The tasks that are tested are those provided by
+	the `--task=<task>` option(s) or those with
+	`maintenance.<task>.enabled` set to true.
 
 --quiet::
 	Do not report progress or other information over `stderr`.
diff --git a/builtin/gc.c b/builtin/gc.c
index 2b99596ec8..03b24ea0db 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -703,14 +703,51 @@  int cmd_gc(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
-static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
+static const char *const builtin_maintenance_run_usage[] = {
+	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>] [--schedule]"),
 	NULL
 };
 
+enum schedule_priority {
+	SCHEDULE_NONE = 0,
+	SCHEDULE_WEEKLY = 1,
+	SCHEDULE_DAILY = 2,
+	SCHEDULE_HOURLY = 3,
+};
+
+static enum schedule_priority parse_schedule(const char *value)
+{
+	if (!value)
+		return SCHEDULE_NONE;
+	if (!strcasecmp(value, "hourly"))
+		return SCHEDULE_HOURLY;
+	if (!strcasecmp(value, "daily"))
+		return SCHEDULE_DAILY;
+	if (!strcasecmp(value, "weekly"))
+		return SCHEDULE_WEEKLY;
+	return SCHEDULE_NONE;
+}
+
+static int maintenance_opt_schedule(const struct option *opt, const char *arg,
+				    int unset)
+{
+	enum schedule_priority *priority = opt->value;
+
+	if (unset)
+		die(_("--no-schedule is not allowed"));
+
+	*priority = parse_schedule(arg);
+
+	if (!*priority)
+		die(_("unrecognized --schedule argument '%s'"), arg);
+
+	return 0;
+}
+
 struct maintenance_run_opts {
 	int auto_flag;
 	int quiet;
+	enum schedule_priority schedule;
 };
 
 /* Remember to update object flag allocation in object.h */
@@ -1158,6 +1195,8 @@  struct maintenance_task {
 	maintenance_auto_fn *auto_condition;
 	unsigned enabled:1;
 
+	enum schedule_priority schedule;
+
 	/* -1 if not selected. */
 	int selected_order;
 };
@@ -1253,6 +1292,9 @@  static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		if (opts->schedule && tasks[i].schedule < opts->schedule)
+			continue;
+
 		trace2_region_enter("maintenance", tasks[i].name, r);
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
@@ -1273,13 +1315,23 @@  static void initialize_task_config(void)
 
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
+		char *config_str;
 
-		strbuf_setlen(&config_name, 0);
+		strbuf_reset(&config_name);
 		strbuf_addf(&config_name, "maintenance.%s.enabled",
 			    tasks[i].name);
 
 		if (!git_config_get_bool(config_name.buf, &config_value))
 			tasks[i].enabled = config_value;
+
+		strbuf_reset(&config_name);
+		strbuf_addf(&config_name, "maintenance.%s.schedule",
+			    tasks[i].name);
+
+		if (!git_config_get_string(config_name.buf, &config_str)) {
+			tasks[i].schedule = parse_schedule(config_str);
+			free(config_str);
+		}
 	}
 
 	strbuf_release(&config_name);
@@ -1323,6 +1375,9 @@  static int maintenance_run(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_CALLBACK(0, "schedule", &opts.schedule, N_("frequency"),
+			     N_("run tasks based on frequency"),
+			     maintenance_opt_schedule),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
 		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
@@ -1343,6 +1398,9 @@  static int maintenance_run(int argc, const char **argv, const char *prefix)
 			     builtin_maintenance_run_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (opts.auto_flag && opts.schedule)
+		die(_("use at most one of --auto and --schedule=<frequency>"));
+
 	if (argc != 0)
 		usage_with_options(builtin_maintenance_run_usage,
 				   builtin_maintenance_run_options);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index c7caaa7a55..33d73cd01c 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -260,4 +260,44 @@  test_expect_success 'maintenance.incremental-repack.auto' '
 	test_subcommand git multi-pack-index write --no-progress <trace-B
 '
 
+test_expect_success '--auto and --schedule incompatible' '
+	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
+	test_i18ngrep "at most one" err
+'
+
+test_expect_success 'invalid --schedule value' '
+	test_must_fail git maintenance run --schedule=annually 2>err &&
+	test_i18ngrep "unrecognized --schedule" err
+'
+
+test_expect_success '--schedule inheritance weekly -> daily -> hourly' '
+	git config maintenance.loose-objects.enabled true &&
+	git config maintenance.loose-objects.schedule hourly &&
+	git config maintenance.commit-graph.enabled true &&
+	git config maintenance.commit-graph.schedule daily &&
+	git config maintenance.incremental-repack.enabled true &&
+	git config maintenance.incremental-repack.schedule weekly &&
+
+	GIT_TRACE2_EVENT="$(pwd)/hourly.txt" \
+		git maintenance run --schedule=hourly 2>/dev/null &&
+	test_subcommand git prune-packed --quiet <hourly.txt &&
+	test_subcommand ! git commit-graph write --split --reachable \
+		--no-progress <hourly.txt &&
+	test_subcommand ! git multi-pack-index write --no-progress <hourly.txt &&
+
+	GIT_TRACE2_EVENT="$(pwd)/daily.txt" \
+		git maintenance run --schedule=daily 2>/dev/null &&
+	test_subcommand git prune-packed --quiet <daily.txt &&
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <daily.txt &&
+	test_subcommand ! git multi-pack-index write --no-progress <daily.txt &&
+
+	GIT_TRACE2_EVENT="$(pwd)/weekly.txt" \
+		git maintenance run --schedule=weekly 2>/dev/null &&
+	test_subcommand git prune-packed --quiet <weekly.txt &&
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <weekly.txt &&
+	test_subcommand git multi-pack-index write --no-progress <weekly.txt
+'
+
 test_done