diff mbox series

[GSoC,1/2] refs: setup ref consistency check infrastructure

Message ID 20240530122753.1114818-2-shejialuo@gmail.com (mailing list archive)
State Superseded
Headers show
Series ref consistency check infra setup | expand

Commit Message

shejialuo May 30, 2024, 12:27 p.m. UTC
Add a new interface `fsck_refs_fn` for the `ref_storage_be`. Implement
dummy method for files and reftable backends.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/fsck.c          | 5 +++++
 refs.c                  | 5 +++++
 refs.h                  | 5 +++++
 refs/files-backend.c    | 9 ++++++++-
 refs/packed-backend.c   | 7 +++++++
 refs/refs-internal.h    | 4 ++++
 refs/reftable-backend.c | 7 +++++++
 7 files changed, 41 insertions(+), 1 deletion(-)

Comments

Junio C Hamano May 31, 2024, 2:23 p.m. UTC | #1
shejialuo <shejialuo@gmail.com> writes:

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 5f3089d947..b6147c588b 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3299,6 +3299,11 @@ static int files_init_db(struct ref_store *ref_store,
>  	return 0;
>  }
>  
> +static int files_fsck(struct ref_store *ref_store)
> +{
> +	return 0;
> +}
> +
>  struct ref_storage_be refs_be_files = {
>  	.name = "files",
>  	.init = files_ref_store_create,
> @@ -3322,5 +3327,7 @@ struct ref_storage_be refs_be_files = {
>  	.reflog_exists = files_reflog_exists,
>  	.create_reflog = files_create_reflog,
>  	.delete_reflog = files_delete_reflog,
> -	.reflog_expire = files_reflog_expire
> +	.reflog_expire = files_reflog_expire,
> +
> +	.fsck = files_fsck,

What is the extra blank line doing there?  It makes reader wonder
why the .fsck member is somehow very special and different from
others.  Is there a valid reason to single it out (and no, "yes this
is special because I invented it" does not count as a valid reason)?

The same comment applies to a few other places in this patch.
shejialuo May 31, 2024, 3:20 p.m. UTC | #2
On Fri, May 31, 2024 at 07:23:27AM -0700, Junio C Hamano wrote:
> 
> What is the extra blank line doing there?  It makes reader wonder
> why the .fsck member is somehow very special and different from
> others.  Is there a valid reason to single it out (and no, "yes this
> is special because I invented it" does not count as a valid reason)?
> 
> The same comment applies to a few other places in this patch.

The interfaces defined in the `ref_storage_be` are carefully structured
in semantic. It is organized as the five parts at now:

  1. The name and the initialization interfaces.
  2. The ref transaction interfaces.
  3. The ref internal interfaces (pack, rename and copy).
  4. The ref filesystem interfaces.
  5. The reflog related interfaces.

I firstly thought that we could just add a new function into the ref
internal interfaces part with the name `check_refs_fn`. However, the
ultimate goal is not only to achieve consistency check of refs but also
consistency check of reflogs. So it's not suitable to name this
interface to `check_refs_fn`.

In order to keep consistent with the git-fsck, we decide to name this
interface to `fsck_fn`. This semantic cannot be grouped into any above
five categories in my view. So I deliberately add a blank line to
indicate its independence.
Patrick Steinhardt June 3, 2024, 7:22 a.m. UTC | #3
On Thu, May 30, 2024 at 08:27:52PM +0800, shejialuo wrote:
> Add a new interface `fsck_refs_fn` for the `ref_storage_be`. Implement
> dummy method for files and reftable backends.
> 
> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  builtin/fsck.c          | 5 +++++
>  refs.c                  | 5 +++++
>  refs.h                  | 5 +++++
>  refs/files-backend.c    | 9 ++++++++-
>  refs/packed-backend.c   | 7 +++++++
>  refs/refs-internal.h    | 4 ++++
>  refs/reftable-backend.c | 7 +++++++
>  7 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index d13a226c2e..65a26e2d1b 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -1065,6 +1065,11 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>  
>  	check_connectivity();
>  
> +	if (refs_fsck(get_main_ref_store(the_repository))) {
> +		error("ref database is corrupt");
> +		errors_found |= ERROR_REFS;
> +	}

One thing I was wondering is whether we want to make the infrastructure
part of the new git-refs(1) command, which is about to land via my
series that introduces ref backend migrations. We already make it a
habit that we spawn new processes here, as you can see in two lines
below for example. It would also make it trivial to execute ref checks
standalone, by saying e.g. `git refs verify`.

>  	if (the_repository->settings.core_commit_graph) {
>  		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
>  
[snip]
> diff --git a/refs.h b/refs.h
> index 34568ee1fb..2799820c40 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -544,6 +544,11 @@ int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_dat
>   */
>  int check_refname_format(const char *refname, int flags);
>  
> +/*
> +  * Return 0 iff all refs in filesystem are consistent.
> +*/
> +int refs_fsck(struct ref_store *refs);

We should probably also mention that errors will be written to `stderr`?

Also, I noticed that you are missing the implementation for the debug
backend in "refs/debug.c".

>  /*
>   * Apply the rules from check_refname_format, but mutate the result until it
>   * is acceptable, and place the result in "out".
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 5f3089d947..b6147c588b 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3299,6 +3299,11 @@ static int files_init_db(struct ref_store *ref_store,
>  	return 0;
>  }
>  
> +static int files_fsck(struct ref_store *ref_store)
> +{
> +	return 0;
> +}

We can already wire up the call to the embedded packed-refs ref store
here. We always want to call that from the files backend.

Patrick
Karthik Nayak June 3, 2024, 8:56 a.m. UTC | #4
Hello,

shejialuo <shejialuo@gmail.com> writes:

[snip]

> diff --git a/refs.h b/refs.h
> index 34568ee1fb..2799820c40 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -544,6 +544,11 @@ int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_dat
>   */
>  int check_refname_format(const char *refname, int flags);
>
> +/*
> +  * Return 0 iff all refs in filesystem are consistent.
> +*/

s/iff/if

The indentation seems to be wrong here. Also since we want to check refs
and reflogs, it might be better to mention that the function checks the
reference database for consistency.

> +int refs_fsck(struct ref_store *refs);
> +
>  /*
>   * Apply the rules from check_refname_format, but mutate the result until it
>   * is acceptable, and place the result in "out".
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 5f3089d947..b6147c588b 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3299,6 +3299,11 @@ static int files_init_db(struct ref_store *ref_store,
>  	return 0;
>  }
>
> +static int files_fsck(struct ref_store *ref_store)
> +{
> +	return 0;
> +}
> +
>  struct ref_storage_be refs_be_files = {
>  	.name = "files",
>  	.init = files_ref_store_create,
> @@ -3322,5 +3327,7 @@ struct ref_storage_be refs_be_files = {
>  	.reflog_exists = files_reflog_exists,
>  	.create_reflog = files_create_reflog,
>  	.delete_reflog = files_delete_reflog,
> -	.reflog_expire = files_reflog_expire
> +	.reflog_expire = files_reflog_expire,
> +
> +	.fsck = files_fsck,
>  };
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index a937e7dbfc..0617321634 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1704,6 +1704,11 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
>  	return empty_ref_iterator_begin();
>  }
>
> +static int packed_fsck(struct ref_store *ref_store)
> +{
> +	return 0;
> +}
> +
>  struct ref_storage_be refs_be_packed = {
>  	.name = "packed",
>  	.init = packed_ref_store_create,
> @@ -1728,4 +1733,6 @@ struct ref_storage_be refs_be_packed = {
>  	.create_reflog = NULL,
>  	.delete_reflog = NULL,
>  	.reflog_expire = NULL,
> +
> +	.fsck = packed_fsck,
>  };
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index 53a6c5d842..ef697bf3bf 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -675,6 +675,8 @@ typedef int read_raw_ref_fn(struct ref_store *ref_store, const char *refname,
>  typedef int read_symbolic_ref_fn(struct ref_store *ref_store, const char *refname,
>  				 struct strbuf *referent);
>
> +typedef int fsck_fn(struct ref_store *ref_store);
> +
>  struct ref_storage_be {
>  	const char *name;
>  	ref_store_init_fn *init;
> @@ -700,6 +702,8 @@ struct ref_storage_be {
>  	create_reflog_fn *create_reflog;
>  	delete_reflog_fn *delete_reflog;
>  	reflog_expire_fn *reflog_expire;
> +
> +	fsck_fn *fsck;
>  };
>
>  extern struct ref_storage_be refs_be_files;
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 1af86bbdec..f3f85cd2f0 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -2167,6 +2167,11 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
>  	return ret;
>  }
>
> +static int reftable_be_fsck(struct ref_store *ref_store)
> +{
> +	return 0;
> +}
> +
>  struct ref_storage_be refs_be_reftable = {
>  	.name = "reftable",
>  	.init = reftable_be_init,
> @@ -2191,4 +2196,6 @@ struct ref_storage_be refs_be_reftable = {
>  	.create_reflog = reftable_be_create_reflog,
>  	.delete_reflog = reftable_be_delete_reflog,
>  	.reflog_expire = reftable_be_reflog_expire,
> +
> +	.fsck = reftable_be_fsck,
>  };
> --
> 2.45.1
Eric Sunshine June 3, 2024, 10:03 p.m. UTC | #5
On Mon, Jun 3, 2024 at 4:56 AM Karthik Nayak <karthik.188@gmail.com> wrote:
> shejialuo <shejialuo@gmail.com> writes:
> > +/*
> > +  * Return 0 iff all refs in filesystem are consistent.
> > +*/
>
> s/iff/if

I had spotted this, as well, but figured that the use of "iff" was
intentional to mean "if and only if".
diff mbox series

Patch

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d13a226c2e..65a26e2d1b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -1065,6 +1065,11 @@  int cmd_fsck(int argc, const char **argv, const char *prefix)
 
 	check_connectivity();
 
+	if (refs_fsck(get_main_ref_store(the_repository))) {
+		error("ref database is corrupt");
+		errors_found |= ERROR_REFS;
+	}
+
 	if (the_repository->settings.core_commit_graph) {
 		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
 
diff --git a/refs.c b/refs.c
index 8260c27cde..5ac61d551c 100644
--- a/refs.c
+++ b/refs.c
@@ -316,6 +316,11 @@  int check_refname_format(const char *refname, int flags)
 	return check_or_sanitize_refname(refname, flags, NULL);
 }
 
+int refs_fsck(struct ref_store *refs)
+{
+	return refs->be->fsck(refs);
+}
+
 void sanitize_refname_component(const char *refname, struct strbuf *out)
 {
 	if (check_or_sanitize_refname(refname, REFNAME_ALLOW_ONELEVEL, out))
diff --git a/refs.h b/refs.h
index 34568ee1fb..2799820c40 100644
--- a/refs.h
+++ b/refs.h
@@ -544,6 +544,11 @@  int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_dat
  */
 int check_refname_format(const char *refname, int flags);
 
+/*
+  * Return 0 iff all refs in filesystem are consistent.
+*/
+int refs_fsck(struct ref_store *refs);
+
 /*
  * Apply the rules from check_refname_format, but mutate the result until it
  * is acceptable, and place the result in "out".
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 5f3089d947..b6147c588b 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3299,6 +3299,11 @@  static int files_init_db(struct ref_store *ref_store,
 	return 0;
 }
 
+static int files_fsck(struct ref_store *ref_store)
+{
+	return 0;
+}
+
 struct ref_storage_be refs_be_files = {
 	.name = "files",
 	.init = files_ref_store_create,
@@ -3322,5 +3327,7 @@  struct ref_storage_be refs_be_files = {
 	.reflog_exists = files_reflog_exists,
 	.create_reflog = files_create_reflog,
 	.delete_reflog = files_delete_reflog,
-	.reflog_expire = files_reflog_expire
+	.reflog_expire = files_reflog_expire,
+
+	.fsck = files_fsck,
 };
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a937e7dbfc..0617321634 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1704,6 +1704,11 @@  static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 	return empty_ref_iterator_begin();
 }
 
+static int packed_fsck(struct ref_store *ref_store)
+{
+	return 0;
+}
+
 struct ref_storage_be refs_be_packed = {
 	.name = "packed",
 	.init = packed_ref_store_create,
@@ -1728,4 +1733,6 @@  struct ref_storage_be refs_be_packed = {
 	.create_reflog = NULL,
 	.delete_reflog = NULL,
 	.reflog_expire = NULL,
+
+	.fsck = packed_fsck,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 53a6c5d842..ef697bf3bf 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -675,6 +675,8 @@  typedef int read_raw_ref_fn(struct ref_store *ref_store, const char *refname,
 typedef int read_symbolic_ref_fn(struct ref_store *ref_store, const char *refname,
 				 struct strbuf *referent);
 
+typedef int fsck_fn(struct ref_store *ref_store);
+
 struct ref_storage_be {
 	const char *name;
 	ref_store_init_fn *init;
@@ -700,6 +702,8 @@  struct ref_storage_be {
 	create_reflog_fn *create_reflog;
 	delete_reflog_fn *delete_reflog;
 	reflog_expire_fn *reflog_expire;
+
+	fsck_fn *fsck;
 };
 
 extern struct ref_storage_be refs_be_files;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 1af86bbdec..f3f85cd2f0 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2167,6 +2167,11 @@  static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	return ret;
 }
 
+static int reftable_be_fsck(struct ref_store *ref_store)
+{
+	return 0;
+}
+
 struct ref_storage_be refs_be_reftable = {
 	.name = "reftable",
 	.init = reftable_be_init,
@@ -2191,4 +2196,6 @@  struct ref_storage_be refs_be_reftable = {
 	.create_reflog = reftable_be_create_reflog,
 	.delete_reflog = reftable_be_delete_reflog,
 	.reflog_expire = reftable_be_reflog_expire,
+
+	.fsck = reftable_be_fsck,
 };