
[v2,0/6] maintenance: use packaged systemd units

Message ID 20240322221327.12204-1-mg@max.gautier.name

Message

Max Gautier March 22, 2024, 10:11 p.m. UTC
* Distribute the systemd timers used by `git maintenance start` with
  the systemd scheduler as part of git, rather than writing them in
  $XDG_CONFIG_HOME.

This allows users to override the units if they wish, and is more in
line with the usual practice for distributing systemd units.

We also move away from using the random minute, and instead rely on
systemd features to achieve the same goal (see patch 2). This allows us
to go back to using unit templating for the timers, and is also a
prerequisite for having static unit files.
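
For illustration, the shipped template (systemd/user/git-maintenance@.timer
in the diffstat below) could look roughly like this; the exact directives,
in particular the RandomizedDelaySec=/FixedRandomDelay= values, are an
assumption here, not necessarily what the patches ship:

[Unit]
Description=Optimize Git repositories data (%i)

[Timer]
OnCalendar=%i
RandomizedDelaySec=1h
FixedRandomDelay=true
Persistent=true

[Install]
WantedBy=timers.target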

Note that even if we end up needing more specific OnCalendar= settings
for each timer, we should still do it that way, and instead distribute
overrides alongside the template, for instance for daily:

/usr/lib/systemd/user/git-maintenance@daily.timer.d/override.conf:
[Timer]
OnCalendar=<daily specific calendar spec>

The cleanup code for the units written in $XDG_CONFIG_HOME is adapted,
and takes care not to remove legitimate user overrides, by checking the
start of the file.
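
As a rough illustration of the idea (not the actual patch; the marker
text and the helper name are assumptions), the check boils down to
comparing the first line of the file against the comment git originally
wrote:

#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: only treat a unit file as ours (and thus safe to
 * delete) if its first line still matches the marker comment that
 * "git maintenance start" used to write.  User-edited files fail the
 * check and are left alone.
 */
static int unit_file_is_unmodified(const char *path)
{
        static const char marker[] =
                "# This file was created and is maintained by Git.";
        char first_line[256];
        FILE *fp = fopen(path, "r");
        int ours = 0;

        if (!fp)
                return 0;
        if (fgets(first_line, sizeof(first_line), fp))
                ours = !strncmp(first_line, marker, strlen(marker));
        fclose(fp);
        return ours;
}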

Testing:
The simplest way to simulate having the units in /usr/lib is probably
to copy them into /etc/systemd/user.
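
Something along these lines should work (the location of the generated
git-maintenance@.service is an assumption, adjust to your build):

  # as root: make the units visible to user managers without touching /usr/lib
  cp systemd/user/git-maintenance@.timer /etc/systemd/user/
  cp git-maintenance@.service /etc/systemd/user/
  # then, as the test user
  systemctl --user daemon-reload
  git maintenance start --scheduler=systemd-timer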

Changes since v1:
- Reorganized the commits and their messages to try to address review
  comments
- Dropped the DON'T APPLY patch, added a TODO to the cleanup code
  instead
- Updated the git-maintenance tests to work with the new logic
- Made installation of the unit files conditional
- Fixed some style/consistency issues
- Templated the systemd service file to use $(bindir)

Max Gautier (6):
  maintenance: use systemd timers builtin randomization
  maintenance: use packaged systemd units
  maintenance: simplify systemctl calls
  maintenance: cleanup $XDG_CONFIG_HOME/systemd/user
  maintenance: update systemd scheduler docs
  maintenance: update tests for systemd scheduler

 Documentation/git-maintenance.txt        |  33 ++-
 Makefile                                 |   5 +
 builtin/gc.c                             | 298 ++++-------------------
 config.mak.uname                         |  10 +
 systemd/user/git-maintenance@.service.in |  17 ++
 systemd/user/git-maintenance@.timer      |  12 +
 t/t7900-maintenance.sh                   |  50 ++--
 7 files changed, 126 insertions(+), 299 deletions(-)
 create mode 100644 systemd/user/git-maintenance@.service.in
 create mode 100644 systemd/user/git-maintenance@.timer

Range-diff against v1:
1:  ea54a6e50e < -:  ---------- maintenance: package systemd units
2:  b29dbb9fdd < -:  ---------- maintenance: use packaged systemd units
3:  47bd6712b8 < -:  ---------- maintenance: add fixed random delay to systemd timers
-:  ---------- > 1:  42d88c7f81 maintenance: use systemd timers builtin randomization
-:  ---------- > 2:  18d51b1dd1 maintenance: use packaged systemd units
-:  ---------- > 3:  3aa7446e95 maintenance: simplify systemctl calls
-:  ---------- > 4:  daff7b4d60 maintenance: cleanup $XDG_CONFIG_HOME/systemd/user
4:  fac57db55e ! 5:  5f6a8e141f maintenance: update systemd scheduler docs
    @@ Metadata
      ## Commit message ##
         maintenance: update systemd scheduler docs
     
    +    The `git maintenance` systemd scheduler no longer writes units in
    +    $XDG_CONFIG_HOME.
    +
    +    Describe the new behavior.
    +    Instead of explaining manual ways to modify the timer, suggest the
    +    systemd standard tool: `systemctl edit`.
    +
         Signed-off-by: Max Gautier <mg@max.gautier.name>
     
      ## Documentation/git-maintenance.txt ##
5:  d888fbd0c3 < -:  ---------- DON'T APPLY YET: maintenance: remove cleanup code
-:  ---------- > 6:  4d4bcd6233 maintenance: update tests for systemd scheduler

Comments

Phillip Wood March 24, 2024, 2:54 p.m. UTC | #1
Hi Max

On 22/03/2024 22:11, Max Gautier wrote:
> * Distribute the systemd timers used by the `git maintenance start` with
>    the systemd scheduler as part of git, rather than writing them in
>    $XDG_CONFIG_HOME.
> 
> This allows users to override the units if they wish, and is more
> in-line with the usual practices of distribution for systemd units.

Thanks for suggesting this, I think this is a useful change, but the 
implementation could be improved.

> We also move away from using the random minute, and instead rely on
> systemd features to achieve the same goal (see patch 2). This allows us
> to go back to using unit templating for the timers. This is also a
> prerequisite to have static unit files.
> 
> Note that even if we really need more specific OnCalendar= settings for
> each timer, we should still do it that way, but instead distribute
> override alongside the template, for instance for weekly:
> 
> /usr/lib/systemd-user/git-maintenance@daily.timer.d/override.conf:
> [Timer]
> OnCalendar=<daily specific calendar spec>

We should definitely do that. Using systemd's random delay does not
prevent the different maintenance jobs from running concurrently, as
one job may be started before a previous job has finished. It is
important to only have one job running at a time because the first
thing "git maintenance run" does is try to acquire a lock file, so if
the hourly job is running when the daily job tries to start, the daily
job will not be run. As the daily job is a superset of the hourly job
and the weekly job is a superset of the daily job, it does not make
sense to run more than one job per hour.
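
For illustration, drop-ins along these lines would keep the three
schedules from firing in the same hour (the calendar specs are only an
example, roughly mirroring the staggering of the units git currently
generates; the empty OnCalendar= line resets the value inherited from
the template):

/usr/lib/systemd/user/git-maintenance@hourly.timer.d/schedule.conf:
[Timer]
OnCalendar=
OnCalendar=*-*-* 1..23:00:00

/usr/lib/systemd/user/git-maintenance@daily.timer.d/schedule.conf:
[Timer]
OnCalendar=
OnCalendar=Tue..Sun *-*-* 0:00:00

/usr/lib/systemd/user/git-maintenance@weekly.timer.d/schedule.conf:
[Timer]
OnCalendar=
OnCalendar=Mon *-*-* 0:00:00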

> The cleanup code for the units written in $XDG_CONFIG_HOME is adapted,
> and takes care of not removing legitimate user overrides, by checking
> the file start.

This points to an alternate strategy for supporting user overrides - 
don't overwrite the unit files if the user has edited them. I think that 
there is still a benefit to moving to system wide unit files though as 
it means that if we improve the unit files in the future systemd will 
pick up these improvements automatically. That is an improvement over 
the status quo where the users' unit files are written once and never 
updated.

I think it would help to reorder the changes in this series as follows:

1 - simplify the invocation of "systemctl --user"
   This would be the current patch 3 without adding "--force" or
   moving "--now" combined with the relevant test changes from patch 6.
   This would make it clear that those changes are a simple clean up that
   is independent of the other changes made in this series.

2 - don't delete user edited unit files
   This would be based on the current patch 4 and would show that we can
   avoid deleting unit files that the user has edited without the other
   changes in this series. This change should have an associated test.

3 - start using systemd's random delay function
   This would be the current patch 1 without the template changes and the
   commit message should explain that it is in preparation for
   distributing system-wide unit files.

4 - install system-wide systemd unit files
   This would be based on the current patch 2 with the addition of
   overrides to prevent more than one job running per hour. The unit
   files should be installed under $XDG_DATA_HOME when $(prefix) starts
   with $(HOME), not just when they are equal. The associated test
   changes from patch 6 should be moved here as well as the "--force"
   change from patch 3.

5 - documentation updates
   I'm on the fence about having these in a separate commit like the
   current patch 5 or updating the documentation when the code is
   changed.

Best Wishes

Phillip

> Testing:
> The simplest way to simulate having the units in /usr/lib is probably to
> copy them in /etc/systemd/user.
> 
> Changes since v1:
> - Reorganization of the commits and their messages to try to address
>    review comments
> - Dropped the DON'T APPLY PATCH, added a TODO to the cleanup code
>    instead
> - Updated the git-maintenance tests to work with the new logic.
> - Conditional installation of the units files
> - Fixing some style/consistency issues
> - template the systemd service file to use $(bindir)
> 
> Max Gautier (6):
>    maintenance: use systemd timers builtin randomization
>    maintenance: use packaged systemd units
>    maintenance: simplify systemctl calls
>    maintenance: cleanup $XDG_CONFIG_HOME/systemd/user
>    maintenance: update systemd scheduler docs
>    maintenance: update tests for systemd scheduler
> 
>   Documentation/git-maintenance.txt        |  33 ++-
>   Makefile                                 |   5 +
>   builtin/gc.c                             | 298 ++++-------------------
>   config.mak.uname                         |  10 +
>   systemd/user/git-maintenance@.service.in |  17 ++
>   systemd/user/git-maintenance@.timer      |  12 +
>   t/t7900-maintenance.sh                   |  50 ++--
>   7 files changed, 126 insertions(+), 299 deletions(-)
>   create mode 100644 systemd/user/git-maintenance@.service.in
>   create mode 100644 systemd/user/git-maintenance@.timer
> 
> Range-diff against v1:
> 1:  ea54a6e50e < -:  ---------- maintenance: package systemd units
> 2:  b29dbb9fdd < -:  ---------- maintenance: use packaged systemd units
> 3:  47bd6712b8 < -:  ---------- maintenance: add fixed random delay to systemd timers
> -:  ---------- > 1:  42d88c7f81 maintenance: use systemd timers builtin randomization
> -:  ---------- > 2:  18d51b1dd1 maintenance: use packaged systemd units
> -:  ---------- > 3:  3aa7446e95 maintenance: simplify systemctl calls
> -:  ---------- > 4:  daff7b4d60 maintenance: cleanup $XDG_CONFIG_HOME/systemd/user
> 4:  fac57db55e ! 5:  5f6a8e141f maintenance: update systemd scheduler docs
>      @@ Metadata
>        ## Commit message ##
>           maintenance: update systemd scheduler docs
>       
>      +    The `git maintenance` systemd scheduler no longer writes units in
>      +    $XDG_CONFIG_HOME.
>      +
>      +    Describe the new behavior.
>      +    Instead of explaining manual ways to modify the timer, suggest the
>      +    systemd standard tool: `systemctl edit`.
>      +
>           Signed-off-by: Max Gautier <mg@max.gautier.name>
>       
>        ## Documentation/git-maintenance.txt ##
> 5:  d888fbd0c3 < -:  ---------- DON'T APPLY YET: maintenance: remove cleanup code
> -:  ---------- > 6:  4d4bcd6233 maintenance: update tests for systemd scheduler
Eric Sunshine March 24, 2024, 5:03 p.m. UTC | #2
On Sun, Mar 24, 2024 at 10:55 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
> 5 - documentation updates
>    I'm on the fence about having these in a separate commit like the
>    current patch 5 or updating the documentation when the code is
>    changed.

It's generally more reviewer-friendly to bundle documentation changes
into the patch which changes the observable behavior. This way, a
reviewer has the behavior change fresh in mind and can verify that the
revised documentation matches the new implementation. Same goes for
revising tests in the same patch which changes behavior (though, of
course, revising tests at the same time as the change of behavior is
also mandatory for maintaining bisectability).
Max Gautier March 25, 2024, 8:32 a.m. UTC | #3
On Sun, Mar 24, 2024 at 02:54:58PM +0000, Phillip Wood wrote:
> Hi Max
> 
> On 22/03/2024 22:11, Max Gautier wrote:
> > * Distribute the systemd timers used by the `git maintenance start` with
> >    the systemd scheduler as part of git, rather than writing them in
> >    $XDG_CONFIG_HOME.
> > 
> > This allows users to override the units if they wish, and is more
> > in-line with the usual practices of distribution for systemd units.
> 
> Thanks for suggesting this, I think this is a useful change, but the
> implementation could be improved.
> 
> > We also move away from using the random minute, and instead rely on
> > systemd features to achieve the same goal (see patch 2). This allows us
> > to go back to using unit templating for the timers. This is also a
> > prerequisite to have static unit files.
> > 
> > Note that even if we really need more specific OnCalendar= settings for
> > each timer, we should still do it that way, but instead distribute
> > override alongside the template, for instance for weekly:
> > 
> > /usr/lib/systemd-user/git-maintenance@daily.timer.d/override.conf:
> > [Timer]
> > OnCalendar=<daily specific calendar spec>
> 
> We should definitely do that. Using systemd's random delay does not prevent
> the different maintenance jobs from running concurrently as one job may be
> started before a previous job has finished. It is important to only have one
> job running at a time because the first thing "git maintenance run" does is
> to try and acquire a lock file so if the hourly job is running when the
> daily jobs tries to start the daily job will not be run.

Thinking about that, it occurs to me that the current scheme does not
prevent concurrent execution either: the timers all use Persistent=true,
which means they can fire concurrently on machine boot, if two or more
would have been triggered during the time the machine was powered off
(or just the user logged out, since it's a user unit).

So maybe there should be a more robust mechanism to avoid concurrent
execution? I assume from what you say above that the lock is acquired
in a non-blocking way. Could going to a blocking one be a solution?

> As the daily job is
> a superset of the hourly job and the weekly job is a superset of the daily
> job so it does not make sense to run more than one job per hour.

Is that set in stone, or could they perform disjoint sets of tasks
instead?

> 
> > The cleanup code for the units written in $XDG_CONFIG_HOME is adapted,
> > and takes care of not removing legitimate user overrides, by checking
> > the file start.
> 
> This points to an alternate strategy for supporting user overrides - don't
> overwrite the unit files if the user has edited them. I think that there is
> still a benefit to moving to system wide unit files though as it means that
> if we improve the unit files in the future systemd will pick up these
> improvements automatically. That is an improvement over the status quo where
> the users' unit files are written once and never updated.
> 
> I think it would help to reorder the changes in this series as follows:
> 
> 1 - simplify the invocation of "systemctl --user"
>   This would be the current patch 3 without adding "--force" or
>   moving "--now" combined with the relevant test changes from patch 6.
>   This would make it clear that those changes are a simple clean up that
>   is independent of the other changes made in this series.
> 
> 2 - don't delete user edited unit files
>   This would be based on the current patch 4 and would show that we can
>   avoid deleting unit files that the user has edited without the other
>   changes in this series. This change should have an associated test.
> 
> 3 - start using systemd's random delay function
>   This would be the current patch 1 without the template changes and the
>   commit message should explain that it is in preparation for disturbing
>   system-wide unit files.
> 
> 4 - install system-wide systemd unit files
>   This would be based on the current patch 2 with the addition of
>   overrides to prevent more than one job running per hour. The unit
>   files should be installed under $XDG_DATA_HOME when $(prefix) starts
>   with $(HOME), not just when they are equal. The associated test
>   changes from patch 6 should be moved here as well as the "--force"
>   change from patch 3.
> 
> 5 - documentation updates
>   I'm on the fence about having these in a separate commit like the
>   current patch 5 or updating the documentation when the code is
>   changed.

I had started cooking v3, I'll take this into account, thanks!
Phillip Wood March 25, 2024, 10:06 a.m. UTC | #4
Hi Max

On 25/03/2024 08:32, Max Gautier wrote:
> On Sun, Mar 24, 2024 at 02:54:58PM +0000, Phillip Wood wrote:
>> Hi Max
>>
>> On 22/03/2024 22:11, Max Gautier wrote:
>>> * Distribute the systemd timers used by the `git maintenance start` with
>>>     the systemd scheduler as part of git, rather than writing them in
>>>     $XDG_CONFIG_HOME.
>>>
>>> This allows users to override the units if they wish, and is more
>>> in-line with the usual practices of distribution for systemd units.
>>
>> Thanks for suggesting this, I think this is a useful change, but the
>> implementation could be improved.
>>
>>> We also move away from using the random minute, and instead rely on
>>> systemd features to achieve the same goal (see patch 2). This allows us
>>> to go back to using unit templating for the timers. This is also a
>>> prerequisite to have static unit files.
>>>
>>> Note that even if we really need more specific OnCalendar= settings for
>>> each timer, we should still do it that way, but instead distribute
>>> override alongside the template, for instance for weekly:
>>>
>>> /usr/lib/systemd-user/git-maintenance@daily.timer.d/override.conf:
>>> [Timer]
>>> OnCalendar=<daily specific calendar spec>
>>
>> We should definitely do that. Using systemd's random delay does not prevent
>> the different maintenance jobs from running concurrently as one job may be
>> started before a previous job has finished. It is important to only have one
>> job running at a time because the first thing "git maintenance run" does is
>> to try and acquire a lock file so if the hourly job is running when the
>> daily jobs tries to start the daily job will not be run.
> 
> Thinking about that, it occurs to me that the current scheme does not
> prevent concurrent execution either: the timers all use Persistent=true,
> which means they can fire concurrently on machine boot, if two or more
> would have been triggered during the time the machine was powered off
> (or just the user logged out, since it's a user unit).

Interesting, I wonder if the other schedulers suffer from the same problem.

> So maybe there should be a more robust mechanism to avoid concurrent
> execution ? I assume from what you say above the lock is acquired in a
> non-blocking way. Could going to a blocking one be a solution ?

It is possible to wait on a lock file but I'd be worried about building
up an endless queue of processes if the process holding the lock file
crashed, leaving it in place without any way to automatically remove it.

I don't think we need to solve that problem as part of this patch
series but we should take care not to make it worse. Long term we may
be better off scheduling a single job and having "git maintenance run"
decide which jobs to run based on the last time it ran, rather than
trying to schedule different jobs with the OS scheduler.
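
As a sketch of that long-term idea (entirely hypothetical; none of
these names or this storage exist in git today), the decision could be
as simple as picking the heaviest schedule that is overdue:

#include <time.h>

enum schedule { SCHEDULE_HOURLY, SCHEDULE_DAILY, SCHEDULE_WEEKLY };

/*
 * Pick which maintenance schedule to run, given timestamps of the last
 * daily and weekly runs (which would have to be recorded somewhere,
 * e.g. in the config or a state file).
 */
static enum schedule pick_schedule(time_t last_daily, time_t last_weekly,
                                   time_t now)
{
        if (now - last_weekly >= 7 * 24 * 3600)
                return SCHEDULE_WEEKLY;
        if (now - last_daily >= 24 * 3600)
                return SCHEDULE_DAILY;
        return SCHEDULE_HOURLY;
}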

>> As the daily job is
>> a superset of the hourly job and the weekly job is a superset of the daily
>> job so it does not make sense to run more than one job per hour.
> 
> Is that set in stone, or could they perform disjoint set of tasks
> instead ?

All of the schedulers are set up to run a single job each hour; I don't
see why we'd start running disjoint sets of tasks in the different jobs.

>>> The cleanup code for the units written in $XDG_CONFIG_HOME is adapted,
>>> and takes care of not removing legitimate user overrides, by checking
>>> the file start.
>>
>> This points to an alternate strategy for supporting user overrides - don't
>> overwrite the unit files if the user has edited them. I think that there is
>> still a benefit to moving to system wide unit files though as it means that
>> if we improve the unit files in the future systemd will pick up these
>> improvements automatically. That is an improvement over the status quo where
>> the users' unit files are written once and never updated.
>>
>> I think it would help to reorder the changes in this series as follows:
>>
>> 1 - simplify the invocation of "systemctl --user"
>>    This would be the current patch 3 without adding "--force" or
>>    moving "--now" combined with the relevant test changes from patch 6.
>>    This would make it clear that those changes are a simple clean up that
>>    is independent of the other changes made in this series.
>>
>> 2 - don't delete user edited unit files
>>    This would be based on the current patch 4 and would show that we can
>>    avoid deleting unit files that the user has edited without the other
>>    changes in this series. This change should have an associated test.
>>
>> 3 - start using systemd's random delay function
>>    This would be the current patch 1 without the template changes and the
>>    commit message should explain that it is in preparation for disturbing
>>    system-wide unit files.
>>
>> 4 - install system-wide systemd unit files
>>    This would be based on the current patch 2 with the addition of
>>    overrides to prevent more than one job running per hour. The unit
>>    files should be installed under $XDG_DATA_HOME when $(prefix) starts
>>    with $(HOME), not just when they are equal. The associated test
>>    changes from patch 6 should be moved here as well as the "--force"
>>    change from patch 3.
>>
>> 5 - documentation updates
>>    I'm on the fence about having these in a separate commit like the
>>    current patch 5 or updating the documentation when the code is
>>    changed.
> 
> I had started cooking v3, I'll take into account, thanks !

Thanks

Phillip
Phillip Wood March 25, 2024, 10:08 a.m. UTC | #5
On 24/03/2024 17:03, Eric Sunshine wrote:
> On Sun, Mar 24, 2024 at 10:55 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>> 5 - documentation updates
>>     I'm on the fence about having these in a separate commit like the
>>     current patch 5 or updating the documentation when the code is
>>     changed.
> 
> It's generally more reviewer-friendly to bundle documentation change
> into the patch which changes the observable behavior. This way, a
> reviewer has the behavior change fresh in mind and can verify that the
> revised documentation matches the new implementation. Same goes for
> revising tests in the same patch which changes behavior (though, of
> course, revising tests at the same time as the change of behavior is
> also mandatory for maintaining bisectability).

Good point. I think a couple of the documentation changes, like
recommending "systemctl --user edit", improve the existing docs and so
should be in a separate commit at the start of the series. The other
patches should update the documentation as the code changes.

Thanks

Phillip
Max Gautier March 25, 2024, 12:27 p.m. UTC | #6
On Mon, Mar 25, 2024 at 10:06:29AM +0000, phillip.wood123@gmail.com wrote:
> Hi Max
> 
> On 25/03/2024 08:32, Max Gautier wrote:
> > On Sun, Mar 24, 2024 at 02:54:58PM +0000, Phillip Wood wrote:
> > > Hi Max
> > > 
> > > On 22/03/2024 22:11, Max Gautier wrote:
> > > > * Distribute the systemd timers used by the `git maintenance start` with
> > > >     the systemd scheduler as part of git, rather than writing them in
> > > >     $XDG_CONFIG_HOME.
> > > > 
> > > > This allows users to override the units if they wish, and is more
> > > > in-line with the usual practices of distribution for systemd units.
> > > 
> > > Thanks for suggesting this, I think this is a useful change, but the
> > > implementation could be improved.
> > > 
> > > > We also move away from using the random minute, and instead rely on
> > > > systemd features to achieve the same goal (see patch 2). This allows us
> > > > to go back to using unit templating for the timers. This is also a
> > > > prerequisite to have static unit files.
> > > > 
> > > > Note that even if we really need more specific OnCalendar= settings for
> > > > each timer, we should still do it that way, but instead distribute
> > > > override alongside the template, for instance for weekly:
> > > > 
> > > > /usr/lib/systemd-user/git-maintenance@daily.timer.d/override.conf:
> > > > [Timer]
> > > > OnCalendar=<daily specific calendar spec>
> > > 
> > > We should definitely do that. Using systemd's random delay does not prevent
> > > the different maintenance jobs from running concurrently as one job may be
> > > started before a previous job has finished. It is important to only have one
> > > job running at a time because the first thing "git maintenance run" does is
> > > to try and acquire a lock file so if the hourly job is running when the
> > > daily jobs tries to start the daily job will not be run.
> > 
> > Thinking about that, it occurs to me that the current scheme does not
> > prevent concurrent execution either: the timers all use Persistent=true,
> > which means they can fire concurrently on machine boot, if two or more
> > would have been triggered during the time the machine was powered off
> > (or just the user logged out, since it's a user unit).
> 
> Interesting, I wonder if the other schedulers suffer from the same problem.

From what I can find (didn't dig much):
- cron does not have the problem, because it will just miss the timers
  if the machine was powered off. Not really better ^
  - anacron though is another implementation of cron which apparently
    supports that semantic and is the default on Ubuntu [1]
    I can't find whether there is something to avoid the same problem
    that Persistent=true implies
- same goes for launchctl (Effect of Sleeping and Powering Off at the
  bottom of the page [2])
- for schtasks it's apparently possible to have a mechanism similar to
  Persistent [3]. There is apparently a policy to handle multiple
  instances [4] but I'm not completely sure whether or not these
  instances can have different parameters.
  It's currently defined that way for the schtasks scheduler:
  "<MultipleInstancesPolicy>IgnoreNew</MultipleInstancesPolicy>\n". I
  don't think it would prevent parallel execution between the different
  schedules though; it seems to me they are different tasks.

[1]: https://serverfault.com/a/52338
[2]: https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/ScheduledJobs.html
[3]: https://learn.microsoft.com/en-us/troubleshoot/windows-server/system-management-components/scheduled-task-not-run-upon-reboot-machine-off
[4]: https://learn.microsoft.com/en-us/windows/win32/taskschd/tasksettings-multipleinstances

> 
> > So maybe there should be a more robust mechanism to avoid concurrent
> > execution ? I assume from what you say above the lock is acquired in a
> > non-blocking way. Could going to a blocking one be a solution ?
> 
> It is possible to wait on a lock file but I'd be worried about building up
> an endless queue of processes if the process holding the lock file crashed
> leaving it in place without anyway to automatically remove it.
> 

At least with systemd we have some mechanisms to deal with that.
- systemd timers don't touch an already running unit, so that won't
  trigger a new hourly or daily if the previous one is still running.
- for automatically removing it, we could:
  - use XDG_RUNTIME_DIR ("%t" in systemd units) which is removed on
    logout
  - optionally add a tmpfiles fragment to delete locks which are really
    too old (tmpfiles won't delete files on which a lock is taken); a
    rough sketch follows below
  - I thought about using a common RuntimeDirectory (see systemd.exec),
    but this is not possible due to [5]


[5]: https://github.com/systemd/systemd/issues/5394
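
As a sketch of the tmpfiles idea (everything here is an assumption: the
directory name, the two-hour age, and the premise that the lock would
live under $XDG_RUNTIME_DIR in the first place):

```
# Hypothetical user-level tmpfiles.d fragment: keep a runtime dir for
# the locks and age out entries older than two hours.  As noted above,
# tmpfiles does not delete files on which a lock is still taken, so an
# active job is safe.
d %t/git-maintenance 0700 - - 2h
```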

> I don't think we need to solve that problem as part of this patch series but
> we should take care not to make it worse. Long term we may be better
> scheduling a single job and have "git maintenance run" decide which jobs to
> run based on the last time it run, rather than trying to schedule different
> jobs with the os scheduler.
> 
> > > As the daily job is
> > > a superset of the hourly job and the weekly job is a superset of the daily
> > > job so it does not make sense to run more than one job per hour.
> > 
> > Is that set in stone, or could they perform disjoint set of tasks
> > instead ?
> 
> All of the schedulers are set up to run a single job each hour, I don't see
> why we'd start running disjoint sets of tasks in the different jobs.

I was wondering if running distinct tasks would allow overlapping
execution, or if the different tasks are not safe to run concurrently
anyway. I'm not familiar enough with them and the git internals to tell.

Another option, if the task sets were distinct for each service
instance, would be to use dependencies and ordering directives like
this:
weekly.service
```
[Unit]
Requires=daily.service
After=daily.service

[Service]
ExecStart=<run only weekly stuff>
```

daily.service
```
[Unit]
Requires=hourly.service
After=hourly.service

[Service]
ExecStart=<run only daily stuff>
```

hourly.service
```
[Service]
ExecStart=<run only hourly stuff>
```

That would avoid concurrent execution I think.
Max Gautier March 25, 2024, 1:45 p.m. UTC | #7
On Mon, Mar 25, 2024 at 10:06:29AM +0000, phillip.wood123@gmail.com wrote:
> 
> It is possible to wait on a lock file but I'd be worried about building up
> an endless queue of processes if the process holding the lock file crashed
> leaving it in place without anyway to automatically remove it.

Side question: we can't use flock(2) in git, or lockf(3)? The latter
is in POSIX.
Phillip Wood March 25, 2024, 4:39 p.m. UTC | #8
On 25/03/2024 12:27, Max Gautier wrote:
> On Mon, Mar 25, 2024 at 10:06:29AM +0000, phillip.wood123@gmail.com wrote:
>>>>> Note that even if we really need more specific OnCalendar= settings for
>>>>> each timer, we should still do it that way, but instead distribute
>>>>> override alongside the template, for instance for weekly:
>>>>>
>>>>> /usr/lib/systemd-user/git-maintenance@daily.timer.d/override.conf:
>>>>> [Timer]
>>>>> OnCalendar=<daily specific calendar spec>
>>>>
>>>> We should definitely do that. Using systemd's random delay does not prevent
>>>> the different maintenance jobs from running concurrently as one job may be
>>>> started before a previous job has finished. It is important to only have one
>>>> job running at a time because the first thing "git maintenance run" does is
>>>> to try and acquire a lock file so if the hourly job is running when the
>>>> daily jobs tries to start the daily job will not be run.
>>>
>>> Thinking about that, it occurs to me that the current scheme does not
>>> prevent concurrent execution either: the timers all use Persistent=true,
>>> which means they can fire concurrently on machine boot, if two or more
>>> would have been triggered during the time the machine was powered off
>>> (or just the user logged out, since it's a user unit).
>>
>> Interesting, I wonder if the other schedulers suffer from the same problem.
> 
>  From what I can find (didn't dig much):

Thanks for looking at this

> - cron does not have the problem, because it will just miss the timers
>    if the machine was powered off. Not really better ^

Yes, skipping the jobs is not great. On Debian at least the job will be
run if it is less than three hours since it should have been run. See
https://manpages.debian.org/bookworm/cron/cron.8.en.html

>    - anacron though is another implementation of cron which apparently
>      support that semantic and is the default on ubuntu [1]
>      I can't find if there is something to avoid the same problem that
>      Persitent=true imply

> - same goes for launchctl (Effect of Sleeping and Powering Off at the
>    bottom of the page [2])

As I read it the job is rescheduled if the computer was asleep when it 
should have run, but not if it was powered off.

> - for schtasks it's apparently possible to have a similar mechanism than
>    Persistent [3]. There is a policy apparently to handle multiples
>    instances [4] but I'm not completely sure whether or not theses
>    instances can have different parameters.
>    It's currently defined that way for the schtasks scheduler:
>    "<MultipleInstancesPolicy>IgnoreNew</MultipleInstancesPolicy>\n". I
>    don't think it would prevent parallel execution between the different
>    schedule though, it seems to me they are different tasks.
> 
> [1]: https://serverfault.com/a/52338
> [2]: https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/ScheduledJobs.html
> [3]: https://learn.microsoft.com/en-us/troubleshoot/windows-server/system-management-components/scheduled-task-not-run-upon-reboot-machine-off
> [4]: https://learn.microsoft.com/en-us/windows/win32/taskschd/tasksettings-multipleinstances

We should have a think about what to do about this once your patches to
move to system-wide unit files are merged. We'll need to come up with
something that works for all the schedulers, so that we don't miss the
daily and weekly jobs when the computer is powered off and we don't run
concurrent jobs.

Best Wishes

Phillip

>>> So maybe there should be a more robust mechanism to avoid concurrent
>>> execution ? I assume from what you say above the lock is acquired in a
>>> non-blocking way. Could going to a blocking one be a solution ?
>>
>> It is possible to wait on a lock file but I'd be worried about building up
>> an endless queue of processes if the process holding the lock file crashed
>> leaving it in place without anyway to automatically remove it.
>>
> 
> At least with systemd we have some mechanisms to deal with that.
> - systemd timers don't touch an already running unit, so that won't
>    trigger a new hourly or daily if the previous one is still running.
> - for the automatically removing it, we could:
>    - use XDG_RUNTIME_DIR ("%t" in systemd units) which is removed on
>      logout
>    - optionally add a tmpfiles fragments to delete locks which are really
>      too old (tmpfiles won't delete files on which a lock is taken)
>    - I thought about using a common RuntimeDirectory (see systemd.exec),
>      but this is not possible due to [5]
> 
> 
> [5]: https://github.com/systemd/systemd/issues/5394
> 
>> I don't think we need to solve that problem as part of this patch series but
>> we should take care not to make it worse. Long term we may be better
>> scheduling a single job and have "git maintenance run" decide which jobs to
>> run based on the last time it run, rather than trying to schedule different
>> jobs with the os scheduler.
>>
>>>> As the daily job is
>>>> a superset of the hourly job and the weekly job is a superset of the daily
>>>> job so it does not make sense to run more than one job per hour.
>>>
>>> Is that set in stone, or could they perform disjoint set of tasks
>>> instead ?
>>
>> All of the schedulers are set up to run a single job each hour, I don't see
>> why we'd start running disjoint sets of tasks in the different jobs.
> 
> I was wondering if running distinct tasks would allow overlapping
> execution, or if the different tasks are not safe to run concurrently
> anyway. I'm not familiar enough with them and the git internals to tell.
> 
> Another option if the tasks set was distinct for each service instance
> would be to use dependencies and ordering directives like this:
> weekly.service
> ```
> [Unit]
> Requires=daily.service
> After=daily.service
> 
> [Service]
> ExecStart=<run only weekly stuff>
> ```
> 
> daily.service
> ```
> [Unit]
> Requires=hourly.service
> After=hourly.service
> 
> [Service]
> ExecStart=<run only daily stuff>
> ```
> 
> hourly.service
> ```
> [Service]
> ExecStart=<run only hourly stuff>
> ```
> 
> That would avoid concurrent execution I think.
>
Phillip Wood March 25, 2024, 4:39 p.m. UTC | #9
On 25/03/2024 13:45, Max Gautier wrote:
> On Mon, Mar 25, 2024 at 10:06:29AM +0000, phillip.wood123@gmail.com wrote:
>>
>> It is possible to wait on a lock file but I'd be worried about building up
>> an endless queue of processes if the process holding the lock file crashed
>> leaving it in place without anyway to automatically remove it.
> 
> Side question: we can't use flock(2) in git, or lockf(3) ? The latter is
> in POSIX.

We need something cross-platform, and I don't think there is any
appetite to move away from our current lock file implementation. We can
think about how to handle concurrent maintenance jobs once these
patches are merged.

Best Wishes

Phillip
Max Gautier March 27, 2024, 4:21 p.m. UTC | #10
On 25 March 2024 17:39:47 GMT+01:00, Phillip Wood <phillip.wood123@gmail.com> wrote:
>On 25/03/2024 13:45, Max Gautier wrote:
>> On Mon, Mar 25, 2024 at 10:06:29AM +0000, phillip.wood123@gmail.com wrote:
>>> 
>>> It is possible to wait on a lock file but I'd be worried about building up
>>> an endless queue of processes if the process holding the lock file crashed
>>> leaving it in place without anyway to automatically remove it.
>> 
>> Side question: we can't use flock(2) in git, or lockf(3) ? The latter is
>> in POSIX.
>
>We need something cross-platform, I don't think there is any appetite to move away from our current lock file implementation. We can think about how to handle concurrent maintenance jobs once these patches are merged.
>
>Best Wishes
>
>Phillip
>
>

Ack, I'll leave that concern aside for now.