Message ID | e3ef0b9bea4a31c72ce88841639e88355408f0c1.1598380805.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Maintenance III: background maintenance | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > I selected the timestamp before the task, as opposed to after the task, > for a couple reasons: > > 1. The time the task takes to execute should not contribute to the > interval between running the tasks. ... as long as the run time is sufficiently shorter than the interval, that is. If a task takes 10-30 minutes depending on how dirty the repository is, it does not make sense to even try to run it every 15 minutes. > If a daily task takes 10 minutes > to run, then every day the execution will drift by at least 10 > minutes. That is not incorrect per-se, but it does not tell us why drifting by 10 minutes is a bad thing. > 2. If the task fails for some unforseen reason, it would be good to > indicate that we _attempted_ the task at a certain timestamp. This > will avoid spamming a repository that is in a bad state. Absolutely. > +static void update_last_run(struct maintenance_task *task) > +{ > + timestamp_t now = approxidate("now"); > + struct strbuf config = STRBUF_INIT; > + struct strbuf value = STRBUF_INIT; > + strbuf_addf(&config, "maintenance.%s.lastrun", task->name); > + strbuf_addf(&value, "%"PRItime"", now); So is this essentially meant as a human-unreadable opaque value, like we have in the commit object header lines? I do not have a strong opinion, but it would be nice to allow curious to casually read it. Perhaps "git config --type=timestamp maintenance.lastrun" can be taught to pretty print its value?
On 8/25/2020 5:52 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> I selected the timestamp before the task, as opposed to after the task, >> for a couple reasons: >> >> 1. The time the task takes to execute should not contribute to the >> interval between running the tasks. > > ... as long as the run time is sufficiently shorter than the > interval, that is. If a task takes 10-30 minutes depending on how > dirty the repository is, it does not make sense to even try to run > it every 15 minutes. Definitely. The lock on the object database from earlier prevents these longer-than-anticipated tasks from stacking. >> If a daily task takes 10 minutes >> to run, then every day the execution will drift by at least 10 >> minutes. > > That is not incorrect per-se, but it does not tell us why drifting > by 10 minutes is a bad thing. True. >> 2. If the task fails for some unforseen reason, it would be good to >> indicate that we _attempted_ the task at a certain timestamp. This >> will avoid spamming a repository that is in a bad state. > > Absolutely. > >> +static void update_last_run(struct maintenance_task *task) >> +{ >> + timestamp_t now = approxidate("now"); >> + struct strbuf config = STRBUF_INIT; >> + struct strbuf value = STRBUF_INIT; >> + strbuf_addf(&config, "maintenance.%s.lastrun", task->name); >> + strbuf_addf(&value, "%"PRItime"", now); > > So is this essentially meant as a human-unreadable opaque value, > like we have in the commit object header lines? I do not have a > strong opinion, but it would be nice to allow curious to casually > read it. Perhaps "git config --type=timestamp maintenance.lastrun" > can be taught to pretty print its value? Good idea. I will think on this. Of course, we already have --type=expiry-date, which does the opposite. Perhaps this config value should be a human-readable date and then be parsed into a timestamp in-process using git_config_expiry_date()? I have mixed feelings on using that format, because it can store both a fixed or relative datetime. The *.lastRun config really wants a _fixed_ datetime, but the *.schedule config (in the next patch) would want a _relative_ datetime. This also allows things like "now" or "never", so it presents a lot of flexibility for users. A nightmare to test, but perhaps that flexibility is useful. (Of course, in another thread you mentioned multiple `crontab` lines, which might make this entire discussion irrelevant. I'll follow up there.) Thanks, -Stolee
Derrick Stolee <stolee@gmail.com> writes: >>> 1. The time the task takes to execute should not contribute to the >>> interval between running the tasks. >> >> ... as long as the run time is sufficiently shorter than the >> interval, that is. If a task takes 10-30 minutes depending on how >> dirty the repository is, it does not make sense to even try to run >> it every 15 minutes. > > Definitely. The lock on the object database from earlier prevents these > longer-than-anticipated tasks from stacking. Hmph, I actually was (anticipating|hoping) that you would give a good argument for having maintenance subsystem in change of scheduling rather than cron, as it can monitor how the already running job is goind and skip one cycle if needed. The above is instead a good argument that independent cron jobs can still coordinate and there is no central and custom scheduler in the form of 'maintenance run'. >>> 2. If the task fails for some unforseen reason, it would be good to >>> indicate that we _attempted_ the task at a certain timestamp. This >>> will avoid spamming a repository that is in a bad state. >> >> Absolutely. Somebody already mentioned that using the configuration file for recordkeeping may not be a good idea, and I tend to agree, by the way. I may want to periodically take a snapshot of my configuration to notice and remember changes I made myself intentionally (e.g. switched access method of a hosting site from ssh:// to https://, added a new branch that builds on something else, etc.) by comparing the snapshot with previous ones (and might even put it under version-control) and mechanical noise would interfere with it.
On 8/26/2020 1:03 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >>>> 1. The time the task takes to execute should not contribute to the >>>> interval between running the tasks. >>> >>> ... as long as the run time is sufficiently shorter than the >>> interval, that is. If a task takes 10-30 minutes depending on how >>> dirty the repository is, it does not make sense to even try to run >>> it every 15 minutes. >> >> Definitely. The lock on the object database from earlier prevents these >> longer-than-anticipated tasks from stacking. > > Hmph, I actually was (anticipating|hoping) that you would give a > good argument for having maintenance subsystem in change of > scheduling rather than cron, as it can monitor how the already > running job is goind and skip one cycle if needed. The above is > instead a good argument that independent cron jobs can still > coordinate and there is no central and custom scheduler in the form > of 'maintenance run'. While the lock does prevent concurrent 'maintenance run' commands from colliding and causing unpredictable behavior as they both modify the object database, this does not help ensure that maintenance tasks actually happen if certain tasks are fired independently by cron and consistently collide. This is the main motivation for me using a single crontab entry. More discussion of all of the tradeoffs is in [1]. [1] https://lore.kernel.org/git/bd4e18b7-6265-73e7-bc1a-a7d647eafd0a@gmail.com/ >>>> 2. If the task fails for some unforseen reason, it would be good to >>>> indicate that we _attempted_ the task at a certain timestamp. This >>>> will avoid spamming a repository that is in a bad state. >>> >>> Absolutely. > > Somebody already mentioned that using the configuration file for > recordkeeping may not be a good idea, and I tend to agree, by the > way. I may want to periodically take a snapshot of my configuration > to notice and remember changes I made myself intentionally > (e.g. switched access method of a hosting site from ssh:// to > https://, added a new branch that builds on something else, etc.) by > comparing the snapshot with previous ones (and might even put it > under version-control) and mechanical noise would interfere with it. I will think of another way to handle this, then. If we cannot infer that "this task was launched, therefore it is due to run" from an optimal cron schedule, then I'll probably create a new file in the .git repository that stores these values. That file would be in the config format to make parsing easy. Thanks, -Stolee
diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt index 06db758172..8dd34169da 100644 --- a/Documentation/config/maintenance.txt +++ b/Documentation/config/maintenance.txt @@ -10,6 +10,11 @@ maintenance.<task>.enabled:: `--task` option exists. By default, only `maintenance.gc.enabled` is true. +maintenance.<task>.lastRun:: + This config value is automatically updated by Git when the task + `<task>` is run. It stores a timestamp representing the most-recent + run of the `<task>`. + maintenance.commit-graph.auto:: This integer config option controls how often the `commit-graph` task should be run as part of `git maintenance run --auto`. If zero, then diff --git a/builtin/gc.c b/builtin/gc.c index f8459df04c..fb6f231a5c 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -1212,6 +1212,20 @@ static int compare_tasks_by_selection(const void *a_, const void *b_) return b->selected_order - a->selected_order; } +static void update_last_run(struct maintenance_task *task) +{ + timestamp_t now = approxidate("now"); + struct strbuf config = STRBUF_INIT; + struct strbuf value = STRBUF_INIT; + strbuf_addf(&config, "maintenance.%s.lastrun", task->name); + strbuf_addf(&value, "%"PRItime"", now); + + git_config_set(config.buf, value.buf); + + strbuf_release(&config); + strbuf_release(&value); +} + static int maintenance_run_tasks(struct maintenance_run_opts *opts) { int i, found_selected = 0; @@ -1254,6 +1268,8 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts) !tasks[i].auto_condition())) continue; + update_last_run(&tasks[i]); + trace2_region_enter("maintenance", tasks[i].name, r); if (tasks[i].fn(opts)) { error(_("task '%s' failed"), tasks[i].name); diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index e0ba19e1ff..a985ce3674 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -264,4 +264,14 @@ test_expect_success 'maintenance.incremental-repack.auto' ' done ' +test_expect_success 'tasks update maintenance.<task>.lastRun' ' + git config --unset maintenance.commit-graph.lastrun && + GIT_TRACE2_EVENT="$(pwd)/run.txt" \ + GIT_TEST_DATE_NOW=1595000000 \ + git maintenance run --task=commit-graph 2>/dev/null && + test_subcommand git commit-graph write --split --reachable \ + --no-progress <run.txt && + test_cmp_config 1595000000 maintenance.commit-graph.lastrun +' + test_done