diff mbox series

[v2,06/10] refs: skip collision checks in initial transactions

Message ID 20241120-pks-refs-optimize-migrations-v2-6-a233374b7452@pks.im (mailing list archive)
State Superseded
Commit e4929cdf797467b5d55a976ee938f1d4f358900e
Headers show
Series refs: optimize ref format migrations | expand

Commit Message

Patrick Steinhardt Nov. 20, 2024, 7:51 a.m. UTC
Reference transactions use `refs_verify_refname_available()` to check
for colliding references. This check consists of two parts:

  - Checks for whether multiple ref updates in the same transaction
    conflict with each other.

  - Checks for whether existing refs conflict with any refs part of the
    transaction.

While we generally cannot avoid the first check, the second check is
superfluous in cases where the transaction is an initial one in an
otherwise empty ref store. The check results in multiple ref reads as
well as the creation of a ref iterator for every ref we're checking,
which adds up quite fast when performing the check for many refs.

Introduce a new flag that allows us to skip this check and wire it up in
such that the backends pass it when running an initial transaction. This
leads to significant speedups when migrating ref storage backends. From
"files" to "reftable":

    Benchmark 1: migrate files:reftable (refcount = 100000, revision = HEAD~)
      Time (mean ± σ):     472.4 ms ±   6.7 ms    [User: 175.9 ms, System: 285.2 ms]
      Range (min … max):   463.5 ms … 483.2 ms    10 runs

    Benchmark 2: migrate files:reftable (refcount = 100000, revision = HEAD)
      Time (mean ± σ):      86.1 ms ±   1.9 ms    [User: 67.9 ms, System: 16.0 ms]
      Range (min … max):    82.9 ms …  90.9 ms    29 runs

    Summary
      migrate files:reftable (refcount = 100000, revision = HEAD) ran
        5.48 ± 0.15 times faster than migrate files:reftable (refcount = 100000, revision = HEAD~)

And from "reftable" to "files":

    Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~)
      Time (mean ± σ):     452.7 ms ±   3.4 ms    [User: 209.9 ms, System: 235.4 ms]
      Range (min … max):   445.9 ms … 457.5 ms    10 runs

    Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD)
      Time (mean ± σ):      95.2 ms ±   2.2 ms    [User: 73.6 ms, System: 20.6 ms]
      Range (min … max):    91.7 ms … 100.8 ms    28 runs

    Summary
      migrate reftable:files (refcount = 100000, revision = HEAD) ran
        4.76 ± 0.11 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c                    | 37 +++++++++++++++++++++----------------
 refs.h                    |  5 ++++-
 refs/files-backend.c      | 13 ++++++-------
 refs/reftable-backend.c   |  6 ++++--
 t/helper/test-ref-store.c |  2 +-
 5 files changed, 36 insertions(+), 27 deletions(-)

Comments

Christian Couder Nov. 20, 2024, 10:21 a.m. UTC | #1
On Wed, Nov 20, 2024 at 8:54 AM Patrick Steinhardt <ps@pks.im> wrote:

> diff --git a/refs.c b/refs.c
> index 0f10c565bbb4e0d91210c52a3221a224e4264d28..d690eb19b3fd7083a5309deb98738547e4f48040 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2324,6 +2324,7 @@ int refs_verify_refname_available(struct ref_store *refs,
>                                   const char *refname,
>                                   const struct string_list *extras,
>                                   const struct string_list *skip,
> +                                 int initial_transaction,

Nit: Using 'unsigned int' instead of 'int' might be slightly better as
the type would be the same as "transaction->flags &
REF_TRANSACTION_FLAG_INITIAL" we pass as an argument below. It might
also make it slightly simpler to convert "int initial_transaction" to
"int flags" if we add more flags and need to do that in the future.
(Other functions add an 'int initial_transaction' argument which could
be an 'unsigned int' instead.)

>                                   struct strbuf *err)
Kristoffer Haugsbakk Nov. 20, 2024, 10:42 a.m. UTC | #2
Hi

On Wed, Nov 20, 2024, at 08:51, Patrick Steinhardt wrote:
> While we generally cannot avoid the first check, the second check is
> superfluous in cases where the transaction is an initial one in an
> otherwise empty ref store. The check results in multiple ref reads as
> well as the creation of a ref iterator for every ref we're checking,
> which adds up quite fast when performing the check for many refs.
>
> Introduce a new flag that allows us to skip this check and wire it up in
> such that the backends pass it when running an initial transaction. This

Missing word? “wire it up in such that”.

Maybe: “wire it up so that”.

> leads to significant speedups when migrating ref storage backends. From
> "files" to "reftable":
Patrick Steinhardt Nov. 25, 2024, 5:52 a.m. UTC | #3
On Wed, Nov 20, 2024 at 11:42:29AM +0100, Kristoffer Haugsbakk wrote:
> Hi
> 
> On Wed, Nov 20, 2024, at 08:51, Patrick Steinhardt wrote:
> > While we generally cannot avoid the first check, the second check is
> > superfluous in cases where the transaction is an initial one in an
> > otherwise empty ref store. The check results in multiple ref reads as
> > well as the creation of a ref iterator for every ref we're checking,
> > which adds up quite fast when performing the check for many refs.
> >
> > Introduce a new flag that allows us to skip this check and wire it up in
> > such that the backends pass it when running an initial transaction. This
> 
> Missing word? “wire it up in such that”.
> 
> Maybe: “wire it up so that”.

Ah, indeed. Will fix. Thanks!

Patrick
Patrick Steinhardt Nov. 25, 2024, 5:52 a.m. UTC | #4
On Wed, Nov 20, 2024 at 11:21:44AM +0100, Christian Couder wrote:
> On Wed, Nov 20, 2024 at 8:54 AM Patrick Steinhardt <ps@pks.im> wrote:
> 
> > diff --git a/refs.c b/refs.c
> > index 0f10c565bbb4e0d91210c52a3221a224e4264d28..d690eb19b3fd7083a5309deb98738547e4f48040 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -2324,6 +2324,7 @@ int refs_verify_refname_available(struct ref_store *refs,
> >                                   const char *refname,
> >                                   const struct string_list *extras,
> >                                   const struct string_list *skip,
> > +                                 int initial_transaction,
> 
> Nit: Using 'unsigned int' instead of 'int' might be slightly better as
> the type would be the same as "transaction->flags &
> REF_TRANSACTION_FLAG_INITIAL" we pass as an argument below. It might
> also make it slightly simpler to convert "int initial_transaction" to
> "int flags" if we add more flags and need to do that in the future.
> (Other functions add an 'int initial_transaction' argument which could
> be an 'unsigned int' instead.)

Makes sense, will adapt.

Patrick
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index 0f10c565bbb4e0d91210c52a3221a224e4264d28..d690eb19b3fd7083a5309deb98738547e4f48040 100644
--- a/refs.c
+++ b/refs.c
@@ -2324,6 +2324,7 @@  int refs_verify_refname_available(struct ref_store *refs,
 				  const char *refname,
 				  const struct string_list *extras,
 				  const struct string_list *skip,
+				  int initial_transaction,
 				  struct strbuf *err)
 {
 	const char *slash;
@@ -2332,8 +2333,6 @@  int refs_verify_refname_available(struct ref_store *refs,
 	struct strbuf referent = STRBUF_INIT;
 	struct object_id oid;
 	unsigned int type;
-	struct ref_iterator *iter;
-	int ok;
 	int ret = -1;
 
 	/*
@@ -2363,7 +2362,8 @@  int refs_verify_refname_available(struct ref_store *refs,
 		if (skip && string_list_has_string(skip, dirname.buf))
 			continue;
 
-		if (!refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
+		if (!initial_transaction &&
+		    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
 				       &type, &ignore_errno)) {
 			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
 				    dirname.buf, refname);
@@ -2388,21 +2388,26 @@  int refs_verify_refname_available(struct ref_store *refs,
 	strbuf_addstr(&dirname, refname + dirname.len);
 	strbuf_addch(&dirname, '/');
 
-	iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-				       DO_FOR_EACH_INCLUDE_BROKEN);
-	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-		if (skip &&
-		    string_list_has_string(skip, iter->refname))
-			continue;
+	if (!initial_transaction) {
+		struct ref_iterator *iter;
+		int ok;
 
-		strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-			    iter->refname, refname);
-		ref_iterator_abort(iter);
-		goto cleanup;
-	}
+		iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+					       DO_FOR_EACH_INCLUDE_BROKEN);
+		while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+			if (skip &&
+			    string_list_has_string(skip, iter->refname))
+				continue;
 
-	if (ok != ITER_DONE)
-		BUG("error while iterating over references");
+			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+				    iter->refname, refname);
+			ref_iterator_abort(iter);
+			goto cleanup;
+		}
+
+		if (ok != ITER_DONE)
+			BUG("error while iterating over references");
+	}
 
 	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
 	if (extra_refname)
diff --git a/refs.h b/refs.h
index 024a370554e013d66febee635e4c0415ba061fe6..980bd20cf24e15144aeff991eeba8b27a936d386 100644
--- a/refs.h
+++ b/refs.h
@@ -101,13 +101,16 @@  int refs_read_symbolic_ref(struct ref_store *ref_store, const char *refname,
  * both "foo" and with "foo/bar/baz" but not with "foo/bar" or
  * "foo/barbados".
  *
+ * If `initial_transaction` is truish, then all collision checks with
+ * preexisting refs are skipped.
+ *
  * extras and skip must be sorted.
  */
-
 int refs_verify_refname_available(struct ref_store *refs,
 				  const char *refname,
 				  const struct string_list *extras,
 				  const struct string_list *skip,
+				  int initial_transaction,
 				  struct strbuf *err);
 
 int refs_ref_exists(struct ref_store *refs, const char *refname);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 116d4259697b20583cb2db34ed47025e8781cd42..d27806c02c272f8bddc789b509e3c3c7af4f75aa 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -706,7 +706,7 @@  static int lock_raw_ref(struct files_ref_store *refs,
 		 * reason to expect this error to be transitory.
 		 */
 		if (refs_verify_refname_available(&refs->base, refname,
-						  extras, NULL, err)) {
+						  extras, NULL, 0, err)) {
 			if (mustexist) {
 				/*
 				 * To the user the relevant error is
@@ -813,7 +813,7 @@  static int lock_raw_ref(struct files_ref_store *refs,
 							  REMOVE_DIR_EMPTY_ONLY)) {
 				if (refs_verify_refname_available(
 						    &refs->base, refname,
-						    extras, NULL, err)) {
+						    extras, NULL, 0, err)) {
 					/*
 					 * The error message set by
 					 * verify_refname_available() is OK.
@@ -850,7 +850,7 @@  static int lock_raw_ref(struct files_ref_store *refs,
 		 */
 		if (refs_verify_refname_available(
 				    refs->packed_ref_store, refname,
-				    extras, NULL, err)) {
+				    extras, NULL, 0, err)) {
 			ret = TRANSACTION_NAME_CONFLICT;
 			goto error_return;
 		}
@@ -1159,7 +1159,7 @@  static struct ref_lock *lock_ref_oid_basic(struct files_ref_store *refs,
 	 */
 	if (is_null_oid(&lock->old_oid) &&
 	    refs_verify_refname_available(refs->packed_ref_store, refname,
-					  NULL, NULL, err))
+					  NULL, NULL, 0, err))
 		goto error_return;
 
 	lock->ref_name = xstrdup(refname);
@@ -1538,7 +1538,7 @@  static int refs_rename_ref_available(struct ref_store *refs,
 
 	string_list_insert(&skip, old_refname);
 	ok = !refs_verify_refname_available(refs, new_refname,
-					    NULL, &skip, &err);
+					    NULL, &skip, 0, &err);
 	if (!ok)
 		error("%s", err.buf);
 
@@ -3043,8 +3043,7 @@  static int files_transaction_finish_initial(struct files_ref_store *refs,
 			BUG("initial ref transaction with old_sha1 set");
 
 		if (refs_verify_refname_available(&refs->base, update->refname,
-						  &affected_refnames, NULL,
-						  err)) {
+						  &affected_refnames, NULL, 1, err)) {
 			ret = TRANSACTION_NAME_CONFLICT;
 			goto cleanup;
 		}
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 8e914afc817f198bed3199630b430e179cabc740..bbc779ab410b41f00759a3a41a76dd708f115c90 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1097,7 +1097,9 @@  static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			 * at a later point.
 			 */
 			ret = refs_verify_refname_available(ref_store, u->refname,
-							    &affected_refnames, NULL, err);
+							    &affected_refnames, NULL,
+							    transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
+							    err);
 			if (ret < 0)
 				goto done;
 
@@ -1584,7 +1586,7 @@  static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 	if (arg->delete_old)
 		string_list_insert(&skip, arg->oldname);
 	ret = refs_verify_refname_available(&arg->refs->base, arg->newname,
-					    NULL, &skip, &errbuf);
+					    NULL, &skip, 0, &errbuf);
 	if (ret < 0) {
 		error("%s", errbuf.buf);
 		goto done;
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 65346dee551ccd781a88786f0c8465f60b286cde..240f6fc29d7f1bb20658deee467bcb46ac3407b2 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -199,7 +199,7 @@  static int cmd_verify_ref(struct ref_store *refs, const char **argv)
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
-	ret = refs_verify_refname_available(refs, refname, NULL, NULL, &err);
+	ret = refs_verify_refname_available(refs, refname, NULL, NULL, 0, &err);
 	if (err.len)
 		puts(err.buf);
 	return ret;