diff mbox series

[1/6] maintenance: add get_random_minute()

Message ID fefdaa9457948ee5302e7cbfaae250e0b589d752.1691434300.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series maintenance: schedule maintenance on a random minute | expand

Commit Message

Derrick Stolee Aug. 7, 2023, 6:51 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

When we initially created background maintenance -- with its hourly,
daily, and weekly schedules -- we considered the effects of all clients
launching fetches to the server every hour on the hour. The worry of
DDoSing server hosts was noted, but left as something we would consider
for a future update.

As background maintenance has gained more adoption over the past three
years, our worries about DDoSing the big Git hosts has been unfounded.
Those systems, especially those serving public repositories, are already
resilient to thundering herds of much smaller scale.

However, sometimes organizations spin up specific custom server
infrastructure either in addition to or on top of their Git host. Some
of these technologies are built for a different range of scale, and can
hit concurrency limits sooner. Organizations with such custom
infrastructures are more likely to recommend tools like `scalar` which
furthers their adoption of background maintenance.

To help solve for this, create get_random_minute() as a method to help
Git select a random minute when creating schedules in the future. The
integrations with this method do not yet exist, but will follow in
future changes.

One thing that is important for testability is that we notice when we
are under a test scenario and return a predictable result. The schedules
themselves are not checked for this value, but at least one launchctl
test checks that we do not unnecessarily reboot the schedule if it has
not changed from a previous version.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/gc.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Taylor Blau Aug. 7, 2023, 9:20 p.m. UTC | #1
On Mon, Aug 07, 2023 at 06:51:35PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/builtin/gc.c b/builtin/gc.c
> index f3942188a61..66a972bc292 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1708,6 +1708,23 @@ static int get_schedule_cmd(const char **cmd, int *is_available)
>  	return 1;
>  }
>
> +MAYBE_UNUSED
> +static int get_random_minute(void)
> +{
> +	static int random_initialized = 0;
> +
> +	/* Use a static value when under tests. */
> +	if (!getenv("GIT_TEST_MAINTENANCE_SCHEDULER"))
> +		return 13;
> +
> +	if (!random_initialized) {
> +		srand((unsigned int)getpid());
> +		random_initialized = 1;
> +	}

I was wondering where else we call srand() within Git, and it looks like
the only other spot is in `lock_file_timeout()`.

I doubt it, but is there a chance that that code depends on only calling
srand() once? I think the answer is "no", since we only use rand()
within that function to generate a random-ish backoff period, so I think
the repeatability of it doesn't matter all that much.

So I think this is kind of outside the scope of your series, but I
wonder if we should have a git_rand() that automatically initializes the
PRNG with the value of getpid()? Then multiple callers can grab random
values at will without reinitializing the PRNG.

Thanks,
Taylor
Junio C Hamano Aug. 7, 2023, 11:53 p.m. UTC | #2
Taylor Blau <me@ttaylorr.com> writes:

> I was wondering where else we call srand() within Git, and it looks like
> the only other spot is in `lock_file_timeout()`.

lock_file_timeout() should be updated to match git_mkstemps_mode(),
which was taught to use the csprng_bytes() function with 47efda96
(wrapper: use a CSPRNG to generate random file names, 2022-01-17),
and this new caller may want to do so as well, perhaps?  I dunno,
but the caller then does not have to worry about "initializing it
just once".
Junio C Hamano Aug. 8, 2023, 12:22 a.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Taylor Blau <me@ttaylorr.com> writes:
>
>> I was wondering where else we call srand() within Git, and it looks like
>> the only other spot is in `lock_file_timeout()`.
>
> lock_file_timeout() should be updated to match git_mkstemps_mode(),
> which was taught to use the csprng_bytes() function with 47efda96
> (wrapper: use a CSPRNG to generate random file names, 2022-01-17),
> and this new caller may want to do so as well, perhaps?  I dunno,
> but the caller then does not have to worry about "initializing it
> just once".

Of course, the obvious downside is that crypto-secure one may be,
unlike for its use in mkstemps(), way overkill for lockfiles and
cron dispersion purposes, as these codepaths are not on the target
surface.
Taylor Blau Aug. 8, 2023, 2:48 p.m. UTC | #4
On Mon, Aug 07, 2023 at 05:22:49PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> >> I was wondering where else we call srand() within Git, and it looks like
> >> the only other spot is in `lock_file_timeout()`.
> >
> > lock_file_timeout() should be updated to match git_mkstemps_mode(),
> > which was taught to use the csprng_bytes() function with 47efda96
> > (wrapper: use a CSPRNG to generate random file names, 2022-01-17),
> > and this new caller may want to do so as well, perhaps?  I dunno,
> > but the caller then does not have to worry about "initializing it
> > just once".
>
> Of course, the obvious downside is that crypto-secure one may be,
> unlike for its use in mkstemps(), way overkill for lockfiles and
> cron dispersion purposes, as these codepaths are not on the target
> surface.

I think that's an acceptable price to pay here, since we can drop the
code to remember whether or not srand() has been called or not. Here's a
patch that we could take in that direction:

--- 8< ---
Subject: [PATCH] lockfile.c: use a CSPRNG to generate backoff milliseconds

Upon failing to acquire the lockfile, `lock_file_timeout()` will try
again with an exponential backoff. This backoff includes some noise as a
multiplier over the default backoff behavior ranging from [0.75, 1.25].

It generates this noise via rand(3). Using a non-cryptographic source of
randomness here is OK, since a more trivial attack vector (holding the
file open via an external process for longer than the value of
`timeout_ms`) is easier to exploit.

That all said, `lock_file_timeout()` initializes the PRNG with
`srand()`. This has a couple of downsides:

  - lock_file_timeout() needs to remember whether or not the PRNG has
    or hasn't been seeded.

  - If a different function also calls `srand()`, the PRNG may start
    generating repeated values (if that caller also initialized the PRNG
    with `getpid()`).

Let's avoid both of these by using `csprng_bytes()`, in a similar spirit
as 47efda967c (wrapper: use a CSPRNG to generate random file names,
2022-01-17).

Using a CSPRNG is definitely overkill for noising a backoff window, but
it avoids the concerns about calling `srand()`, so let's use it here,
too.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 lockfile.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/lockfile.c b/lockfile.c
index 1d5ed01682..6587d407f4 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -107,22 +107,17 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,
 	int n = 1;
 	int multiplier = 1;
 	long remaining_ms = 0;
-	static int random_initialized = 0;

 	if (timeout_ms == 0)
 		return lock_file(lk, path, flags, mode);

-	if (!random_initialized) {
-		srand((unsigned int)getpid());
-		random_initialized = 1;
-	}
-
 	if (timeout_ms > 0)
 		remaining_ms = timeout_ms;

 	while (1) {
 		long backoff_ms, wait_ms;
 		int fd;
+		uint64_t rand;

 		fd = lock_file(lk, path, flags, mode);

@@ -135,7 +130,10 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,

 		backoff_ms = multiplier * INITIAL_BACKOFF_MS;
 		/* back off for between 0.75*backoff_ms and 1.25*backoff_ms */
-		wait_ms = (750 + rand() % 500) * backoff_ms / 1000;
+		if (csprng_bytes(&rand, sizeof(uint64_t)) < 0)
+			return error_errno(_("unable to get random bytes for"
+					     "lockfile backoff"));
+		wait_ms = (750 + rand % 500) * backoff_ms / 1000;
 		sleep_millisec(wait_ms);
 		remaining_ms -= wait_ms;

--
2.42.0.rc0.26.g802d811bac.dirty

--- >8 ---

Thanks,
Taylor
Junio C Hamano Aug. 8, 2023, 4:34 p.m. UTC | #5
Taylor Blau <me@ttaylorr.com> writes:

> I think that's an acceptable price to pay here, since we can drop the
> code to remember whether or not srand() has been called or not. Here's a
> patch that we could take in that direction:
> ...
>   - If a different function also calls `srand()`, the PRNG may start
>     generating repeated values (if that caller also initialized the PRNG
>     with `getpid()`).

Hmph, I didn't think about that one.  I do not think it can be a
viable attack vector to attack _this_ code, but the other function
might be.  But if the other function is worth attacking and
attackable, it ought to be using crypto-secure one and not srand(),
so this argument may not give us a good justification X-<.

Provided if code simplification is a good enough rationale, the
patch looks sensible, but I find its use of u64 a bit questionable
(though not wrong).  I would have expected that the type of "rand"
would be the same as backoff_ms and wait_ms, two variables involved
in the same expression.

Thanks.

> diff --git a/lockfile.c b/lockfile.c
> index 1d5ed01682..6587d407f4 100644
> --- a/lockfile.c
> +++ b/lockfile.c
> @@ -107,22 +107,17 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,
>  	int n = 1;
>  	int multiplier = 1;
>  	long remaining_ms = 0;
> -	static int random_initialized = 0;
>
>  	if (timeout_ms == 0)
>  		return lock_file(lk, path, flags, mode);
>
> -	if (!random_initialized) {
> -		srand((unsigned int)getpid());
> -		random_initialized = 1;
> -	}
> -
>  	if (timeout_ms > 0)
>  		remaining_ms = timeout_ms;
>
>  	while (1) {
>  		long backoff_ms, wait_ms;
>  		int fd;
> +		uint64_t rand;
>
>  		fd = lock_file(lk, path, flags, mode);
>
> @@ -135,7 +130,10 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,
>
>  		backoff_ms = multiplier * INITIAL_BACKOFF_MS;
>  		/* back off for between 0.75*backoff_ms and 1.25*backoff_ms */
> -		wait_ms = (750 + rand() % 500) * backoff_ms / 1000;
> +		if (csprng_bytes(&rand, sizeof(uint64_t)) < 0)
> +			return error_errno(_("unable to get random bytes for"
> +					     "lockfile backoff"));
> +		wait_ms = (750 + rand % 500) * backoff_ms / 1000;
>  		sleep_millisec(wait_ms);
>  		remaining_ms -= wait_ms;
>
> --
> 2.42.0.rc0.26.g802d811bac.dirty
>
> --- >8 ---
>
> Thanks,
> Taylor
Junio C Hamano Aug. 8, 2023, 4:49 p.m. UTC | #6
Junio C Hamano <gitster@pobox.com> writes:

> Provided if code simplification is a good enough rationale, the
> patch looks sensible, but I find its use of u64 a bit questionable
> (though not wrong).  I would have expected that the type of "rand"
> would be the same as backoff_ms and wait_ms, two variables involved
> in the same expression.

Ah, not so fast.  The use of modulo means we need to be careful
about about the fuzzing factor going negative, and use of unsigned
type allows us to forget about it.

(fuzz % 250), when fuzz is of a signed random integral type, ranges
between -250 and +250 and because we want the center of our
distribution at 1000, so I think the following is equivalent.

 lockfile.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git c/lockfile.c w/lockfile.c
index 6587d407f4..7de53526ac 100644
--- c/lockfile.c
+++ w/lockfile.c
@@ -115,9 +115,8 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,
 		remaining_ms = timeout_ms;
 
 	while (1) {
-		long backoff_ms, wait_ms;
+		long backoff_ms, wait_ms, fuzz;
 		int fd;
-		uint64_t rand;
 
 		fd = lock_file(lk, path, flags, mode);
 
@@ -130,10 +129,10 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,
 
 		backoff_ms = multiplier * INITIAL_BACKOFF_MS;
 		/* back off for between 0.75*backoff_ms and 1.25*backoff_ms */
-		if (csprng_bytes(&rand, sizeof(uint64_t)) < 0)
+		if (csprng_bytes(&fuzz, sizeof(fuzz)) < 0)
 			return error_errno(_("unable to get random bytes for"
 					     "lockfile backoff"));
-		wait_ms = (750 + rand % 500) * backoff_ms / 1000;
+		wait_ms = (1000 + fuzz % 250) * backoff_ms / 1000;
 		sleep_millisec(wait_ms);
 		remaining_ms -= wait_ms;
Derrick Stolee Aug. 8, 2023, 5:28 p.m. UTC | #7
On 8/7/2023 5:20 PM, Taylor Blau wrote:
> On Mon, Aug 07, 2023 at 06:51:35PM +0000, Derrick Stolee via GitGitGadget wrote:

>> +	if (!random_initialized) {
>> +		srand((unsigned int)getpid());
>> +		random_initialized = 1;
>> +	}
> 
> I was wondering where else we call srand() within Git, and it looks like
> the only other spot is in `lock_file_timeout()`.
> 
> I doubt it, but is there a chance that that code depends on only calling
> srand() once? I think the answer is "no", since we only use rand()
> within that function to generate a random-ish backoff period, so I think
> the repeatability of it doesn't matter all that much.

The main point would be to avoid the cost of spinning up the number
generator, but there is potential for a bug if we run both initializers:
the lockfile would re-use the same filenames after the generator is
reset.

> So I think this is kind of outside the scope of your series, but I
> wonder if we should have a git_rand() that automatically initializes the
> PRNG with the value of getpid()? Then multiple callers can grab random
> values at will without reinitializing the PRNG.

I see you're moving ahead with removing the srand() from the lockfile code,
so I'll focus on creating a `git_rand()` that centralizes the use of
srand(), but won't touch the code in the lockfile so your patch applies
independently.

Thanks,
-Stolee
Taylor Blau Aug. 8, 2023, 8:01 p.m. UTC | #8
On Tue, Aug 08, 2023 at 09:49:30AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Provided if code simplification is a good enough rationale, the
> > patch looks sensible, but I find its use of u64 a bit questionable
> > (though not wrong).  I would have expected that the type of "rand"
> > would be the same as backoff_ms and wait_ms, two variables involved
> > in the same expression.
>
> Ah, not so fast.  The use of modulo means we need to be careful
> about about the fuzzing factor going negative, and use of unsigned
> type allows us to forget about it.
>
> (fuzz % 250), when fuzz is of a signed random integral type, ranges
> between -250 and +250 and because we want the center of our
> distribution at 1000, so I think the following is equivalent.

[...]

> -		wait_ms = (750 + rand % 500) * backoff_ms / 1000;
> +		wait_ms = (1000 + fuzz % 250) * backoff_ms / 1000;

I was going to say that having rand be an unsigned type was the right
behavior, but with this change it can (and should) be signed. Thanks for
the simplification :-).

Thanks,
Taylor
Taylor Blau Aug. 8, 2023, 8:04 p.m. UTC | #9
On Tue, Aug 08, 2023 at 01:28:50PM -0400, Derrick Stolee wrote:
> > So I think this is kind of outside the scope of your series, but I
> > wonder if we should have a git_rand() that automatically initializes the
> > PRNG with the value of getpid()? Then multiple callers can grab random
> > values at will without reinitializing the PRNG.
>
> I see you're moving ahead with removing the srand() from the lockfile code,
> so I'll focus on creating a `git_rand()` that centralizes the use of
> srand(), but won't touch the code in the lockfile so your patch applies
> independently.

That thread may have progressed a little since you last looked at it.

Instead of using srand() and rand() (which would make sense to wrap with
git_rand() as you propose), we can simplify our lives by using a CSPRNG,
which only gets initialized once, as is already the case with
csprng_bytes().

I think Junio is picking up a lightly modified version of my patch
there, see [1].

Thanks,
Taylor

[1]: https://lore.kernel.org/git/ZNKfKs1mLQhnybvF@nand.local/
Derrick Stolee Aug. 9, 2023, 12:17 p.m. UTC | #10
On 8/8/2023 4:04 PM, Taylor Blau wrote:
> On Tue, Aug 08, 2023 at 01:28:50PM -0400, Derrick Stolee wrote:

>> I see you're moving ahead with removing the srand() from the lockfile code,
 
> That thread may have progressed a little since you last looked at it.

I think this part of my summary is still correct.

>> so I'll focus on creating a `git_rand()` that centralizes the use of
>> srand(), but won't touch the code in the lockfile so your patch applies
>> independently.
 
> Instead of using srand() and rand() (which would make sense to wrap with
> git_rand() as you propose), we can simplify our lives by using a CSPRNG,
> which only gets initialized once, as is already the case with
> csprng_bytes().

So the idea is to use csprng_bytes() everywhere instead of srand()/rand().

I can adjust my local patch to still create git_rand(), but base it on
csprng_bytes() and not collide with your patch. Mimicking rand()'s behavior
is a simpler interface to consume.

Thanks,
-Stolee
Junio C Hamano Aug. 9, 2023, 6:50 p.m. UTC | #11
Derrick Stolee <derrickstolee@github.com> writes:

>> Instead of using srand() and rand() (which would make sense to wrap with
>> git_rand() as you propose), we can simplify our lives by using a CSPRNG,
>> which only gets initialized once, as is already the case with
>> csprng_bytes().
>
> So the idea is to use csprng_bytes() everywhere instead of srand()/rand().
>
> I can adjust my local patch to still create git_rand(), but base it on
> csprng_bytes() and not collide with your patch. Mimicking rand()'s behavior
> is a simpler interface to consume.

I am still ambivalent about wasting entropy for something that
srand() would suffice, so git_rand() interface may be an welcome
addition anyway, that serves an extra layer of indirection to
insulate the callers from the implementation.

Thanks.
Taylor Blau Aug. 9, 2023, 8:34 p.m. UTC | #12
On Wed, Aug 09, 2023 at 11:50:38AM -0700, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
>
> >> Instead of using srand() and rand() (which would make sense to wrap with
> >> git_rand() as you propose), we can simplify our lives by using a CSPRNG,
> >> which only gets initialized once, as is already the case with
> >> csprng_bytes().
> >
> > So the idea is to use csprng_bytes() everywhere instead of srand()/rand().
> >
> > I can adjust my local patch to still create git_rand(), but base it on
> > csprng_bytes() and not collide with your patch. Mimicking rand()'s behavior
> > is a simpler interface to consume.
>
> I am still ambivalent about wasting entropy for something that
> srand() would suffice, so git_rand() interface may be an welcome
> addition anyway, that serves an extra layer of indirection to
> insulate the callers from the implementation.

Sounds good to me, I'm not particularly attached to one implementation
over another. Thanks, both.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/builtin/gc.c b/builtin/gc.c
index f3942188a61..66a972bc292 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1708,6 +1708,23 @@  static int get_schedule_cmd(const char **cmd, int *is_available)
 	return 1;
 }
 
+MAYBE_UNUSED
+static int get_random_minute(void)
+{
+	static int random_initialized = 0;
+
+	/* Use a static value when under tests. */
+	if (!getenv("GIT_TEST_MAINTENANCE_SCHEDULER"))
+		return 13;
+
+	if (!random_initialized) {
+		srand((unsigned int)getpid());
+		random_initialized = 1;
+	}
+
+	return rand() % 60;
+}
+
 static int is_launchctl_available(void)
 {
 	const char *cmd = "launchctl";