diff mbox series

[GSoC,v13,02/10] fsck: add a unified interface for reporting fsck messages

Message ID ZqeYhkaArVmMdrnK@ArchLinux (mailing list archive)
State Superseded
Headers show
Series ref consistency check infra setup | expand

Commit Message

shejialuo July 29, 2024, 1:26 p.m. UTC
The static function "report" provided by "fsck.c" aims at checking fsck
error type and calling the callback "error_func" to report the message.
However, "report" function is only related to object database which
cannot be reused for refs. In order to provide a unified interface which
can report either objects or refs, create a new function "fsck_vreport"
following the "report" prototype. Instead of using "...", provide
"va_list" to allow more flexibility.

In order to provide an extensible error report for refs, add a new
"fsck_refs_info" structure and add parameter "const struct
*fsck_refs_info" into "fsck_vreport" function.

Like "report", the "fsck_vreport" function will use "error_func"
registered in "fsck_options" to report customized messages. Change
"error_func" prototype to align with the new "fsck_vreport".

Then, change "report" function to use "fsck_vreport" to report objects
related messages. Add a new function called "fsck_refs_report" to use
"fsck_vreport" to report refs related messages.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/fsck.c  |  1 +
 builtin/mktag.c |  1 +
 fsck.c          | 56 +++++++++++++++++++++++++++++++++++++++++--------
 fsck.h          | 25 +++++++++++++++++++++-
 object-file.c   | 11 +++++-----
 5 files changed, 79 insertions(+), 15 deletions(-)

Comments

Patrick Steinhardt July 30, 2024, 8:31 a.m. UTC | #1
On Mon, Jul 29, 2024 at 09:26:30PM +0800, shejialuo wrote:
> The static function "report" provided by "fsck.c" aims at checking fsck
> error type and calling the callback "error_func" to report the message.
> However, "report" function is only related to object database which
> cannot be reused for refs.

Nit: it would be nice to mention _why_ it cannot be reused for refs.

> diff --git a/fsck.c b/fsck.c
> index 3f32441492..1185e9a8ad 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -226,12 +226,18 @@ static int object_on_skiplist(struct fsck_options *opts,
>  	return opts && oid && oidset_contains(&opts->skip_oids, oid);
>  }
>  
> -__attribute__((format (printf, 5, 6)))
> -static int report(struct fsck_options *options,
> -		  const struct object_id *oid, enum object_type object_type,
> -		  enum fsck_msg_id msg_id, const char *fmt, ...)
> +/*
> + * Provide a unified interface for either fscking refs or objects.
> + * It will get the current msg error type and call the error_func callback
> + * which is registered in the "fsck_options" struct.
> + */
> +static int fsck_vreport(struct fsck_options *options,
> +			const struct object_id *oid,
> +			enum object_type object_type,
> +			const struct fsck_refs_info *refs_info,
> +			enum fsck_msg_id msg_id, const char *fmt, va_list ap)
>  {
> -	va_list ap;
> +	va_list ap_copy;
>  	struct strbuf sb = STRBUF_INIT;
>  	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
>  	int result;

It is a bit weird that this new generic function receives non-generic
inputs which are specific to the respective subsystems (objects or refs)
that we are checking.

A better design would likely be to make `error_func()` receive a void
pointer such that `error_func()` and then have the respective subsystems
provide a function that knows to format the message while receiving
either a `struct fsck_object_report *` or a `struct fsck_ref_report *`.

I don't think this is particularly worriesome though as it is still
manageable right now. So I'm fine if we want to leave this as-is, and
then we can iterate on this in a future patch series as required.

> @@ -250,9 +256,9 @@ static int report(struct fsck_options *options,
>  	prepare_msg_ids();
>  	strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased);
>  
> -	va_start(ap, fmt);
> -	strbuf_vaddf(&sb, fmt, ap);
> -	result = options->error_func(options, oid, object_type,
> +	va_copy(ap_copy, ap);
> +	strbuf_vaddf(&sb, fmt, ap_copy);
> +	result = options->error_func(options, oid, object_type, refs_info,
>  				     msg_type, msg_id, sb.buf);
>  	strbuf_release(&sb);
>  	va_end(ap);
> @@ -260,6 +266,35 @@ static int report(struct fsck_options *options,
>  	return result;
>  }
>  
> +__attribute__((format (printf, 5, 6)))
> +static int report(struct fsck_options *options,
> +		  const struct object_id *oid, enum object_type object_type,
> +		  enum fsck_msg_id msg_id, const char *fmt, ...)
> +{
> +	va_list ap;
> +	int result;
> +
> +	va_start(ap, fmt);
> +	result = fsck_vreport(options, oid, object_type, NULL, msg_id, fmt, ap);
> +	va_end(ap);
> +
> +	return result;
> +}

As far as I can see, `report()` is now specific to reporting errors with
objects while `fsck_vreport()` is the generic part. Do we want to rename
the function to `fsck_report_object()` to clarify, or would that cause
too much churn?

Hm. Seeing that we have 62 callsites of that function it may be too much
churn indeed.

> +int fsck_refs_report(struct fsck_options *options,
> +		     const struct object_id *oid,
> +		     const struct fsck_refs_info *refs_info,
> +		     enum fsck_msg_id msg_id, const char *fmt, ...)

Would `fsck_report_ref()` be a better name?

What is the intent of the `oid` field? Would it be set to the object ID
that a reference points to? What if the reference is a non-resolving
symbolic reference? I wonder whether we can just remove it.

Patrick
shejialuo July 30, 2024, 2:56 p.m. UTC | #2
On Tue, Jul 30, 2024 at 10:31:16AM +0200, Patrick Steinhardt wrote:
> On Mon, Jul 29, 2024 at 09:26:30PM +0800, shejialuo wrote:
> > The static function "report" provided by "fsck.c" aims at checking fsck
> > error type and calling the callback "error_func" to report the message.
> > However, "report" function is only related to object database which
> > cannot be reused for refs.
> 
> Nit: it would be nice to mention _why_ it cannot be reused for refs.
> 
> > diff --git a/fsck.c b/fsck.c
> > index 3f32441492..1185e9a8ad 100644
> > --- a/fsck.c
> > +++ b/fsck.c
> > @@ -226,12 +226,18 @@ static int object_on_skiplist(struct fsck_options *opts,
> >  	return opts && oid && oidset_contains(&opts->skip_oids, oid);
> >  }
> >  
> > -__attribute__((format (printf, 5, 6)))
> > -static int report(struct fsck_options *options,
> > -		  const struct object_id *oid, enum object_type object_type,
> > -		  enum fsck_msg_id msg_id, const char *fmt, ...)
> > +/*
> > + * Provide a unified interface for either fscking refs or objects.
> > + * It will get the current msg error type and call the error_func callback
> > + * which is registered in the "fsck_options" struct.
> > + */
> > +static int fsck_vreport(struct fsck_options *options,
> > +			const struct object_id *oid,
> > +			enum object_type object_type,
> > +			const struct fsck_refs_info *refs_info,
> > +			enum fsck_msg_id msg_id, const char *fmt, va_list ap)
> >  {
> > -	va_list ap;
> > +	va_list ap_copy;
> >  	struct strbuf sb = STRBUF_INIT;
> >  	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
> >  	int result;
> 
> It is a bit weird that this new generic function receives non-generic
> inputs which are specific to the respective subsystems (objects or refs)
> that we are checking.
> 

Actually, this is one of the biggest problem when implementing the
infrastructure. The original function "report" only cares about
reporting the problem of objects. So the callback "error_func" uses the
similar prototype.

Problem comes when we want to add ref-related report. In my very former
implementation, I just created a new function "fsck_refs_report" to just
copy some codes from "report" and defines refs-related callback.

However, this is a bad way because we make duplication. If we want to
reuse the "report" function, we should add new parameters into "report"
and "error_func". This is the idea of this patch. However, as you can
see, there are so many "report" function calls in the codebase, it's bad
to change them. So I define a more common function called "fsck_vreport"
function and wrap "report" to eventually call this function.

> A better design would likely be to make `error_func()` receive a void
> pointer such that `error_func()` and then have the respective subsystems
> provide a function that knows to format the message while receiving
> either a `struct fsck_object_report *` or a `struct fsck_ref_report *`.
> 

Yes, I agree with this idea. And I think we should use only one function
called "fsck_reportf" to report any fsck-related messages. We could
design the following callback "prototype".

  typedef int (*fsck_error)(struct fsck_options *o,
                            void *info,
                            enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
                            const char *message);

Thus, we could make "fsck_reportf" generic. It will handle the common
"fsck_options" and "enum fsck_msg_id" and then it will call "fsck_error"
callback. The user could pass either refs information or objects
information.

> I don't think this is particularly worriesome though as it is still
> manageable right now. So I'm fine if we want to leave this as-is, and
> then we can iterate on this in a future patch series as required.
> 

I strongly suggest that we should use the above design for the following
reasons:

1. We only expose one interface called "fsck_reportf" which will make
the code clear. Actually, there is no different between reporting refs
and reporting objects.
2. We provide more extensibility here, because we will never change
"fsck_reportf" and "fsck_error" prototype when we want to add more info
for either refs or objects.

But do we really need this? Junio, could you please give some advice
here. How do you think about this design. In my perspective, the only
overhead here is that there are too many "report" function we should
refactor.

> > @@ -250,9 +256,9 @@ static int report(struct fsck_options *options,
> >  	prepare_msg_ids();
> >  	strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased);
> >  
> > -	va_start(ap, fmt);
> > -	strbuf_vaddf(&sb, fmt, ap);
> > -	result = options->error_func(options, oid, object_type,
> > +	va_copy(ap_copy, ap);
> > +	strbuf_vaddf(&sb, fmt, ap_copy);
> > +	result = options->error_func(options, oid, object_type, refs_info,
> >  				     msg_type, msg_id, sb.buf);
> >  	strbuf_release(&sb);
> >  	va_end(ap);
> > @@ -260,6 +266,35 @@ static int report(struct fsck_options *options,
> >  	return result;
> >  }
> >  
> > +__attribute__((format (printf, 5, 6)))
> > +static int report(struct fsck_options *options,
> > +		  const struct object_id *oid, enum object_type object_type,
> > +		  enum fsck_msg_id msg_id, const char *fmt, ...)
> > +{
> > +	va_list ap;
> > +	int result;
> > +
> > +	va_start(ap, fmt);
> > +	result = fsck_vreport(options, oid, object_type, NULL, msg_id, fmt, ap);
> > +	va_end(ap);
> > +
> > +	return result;
> > +}
> 
> As far as I can see, `report()` is now specific to reporting errors with
> objects while `fsck_vreport()` is the generic part. Do we want to rename
> the function to `fsck_report_object()` to clarify, or would that cause
> too much churn?
> 
> Hm. Seeing that we have 62 callsites of that function it may be too much
> churn indeed.
> 

Yes, there are too many references for "report" function. That's why I
wrap the "report" using "fsck_vreport".

> > +int fsck_refs_report(struct fsck_options *options,
> > +		     const struct object_id *oid,
> > +		     const struct fsck_refs_info *refs_info,
> > +		     enum fsck_msg_id msg_id, const char *fmt, ...)
> 
> Would `fsck_report_ref()` be a better name?
> 

I agree. However, if we use the above design, we will just use
"fsck_reportf" here both for refs and objects.

> What is the intent of the `oid` field? Would it be set to the object ID
> that a reference points to? What if the reference is a non-resolving
> symbolic reference? I wonder whether we can just remove it.
> 

`oid` is used to be the object ID that a reference points to. If the
reference is a symbolic link or symref, we do not care about it. The
caller should just pass `NULL`. Actually, we may not use this field.
I just suppose that we may provide the user more information. Because
when using "file-backend.c::parse_loose_ref_contents()" we will
automatically get the `oid` if the ref is a regular reference. So I just
provide `oid` here.

> Patrick
diff mbox series

Patch

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d13a226c2e..6abad60e7e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -92,6 +92,7 @@  static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o UNUSED,
 			   const struct object_id *oid,
 			   enum object_type object_type,
+			   const struct fsck_refs_info *refs_info UNUSED,
 			   enum fsck_msg_type msg_type,
 			   enum fsck_msg_id msg_id UNUSED,
 			   const char *message)
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 4767f1a97e..6496deca0a 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -20,6 +20,7 @@  static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 static int mktag_fsck_error_func(struct fsck_options *o UNUSED,
 				 const struct object_id *oid UNUSED,
 				 enum object_type object_type UNUSED,
+				 const struct fsck_refs_info *refs_info UNUSED,
 				 enum fsck_msg_type msg_type,
 				 enum fsck_msg_id msg_id UNUSED,
 				 const char *message)
diff --git a/fsck.c b/fsck.c
index 3f32441492..1185e9a8ad 100644
--- a/fsck.c
+++ b/fsck.c
@@ -226,12 +226,18 @@  static int object_on_skiplist(struct fsck_options *opts,
 	return opts && oid && oidset_contains(&opts->skip_oids, oid);
 }
 
-__attribute__((format (printf, 5, 6)))
-static int report(struct fsck_options *options,
-		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id msg_id, const char *fmt, ...)
+/*
+ * Provide a unified interface for either fscking refs or objects.
+ * It will get the current msg error type and call the error_func callback
+ * which is registered in the "fsck_options" struct.
+ */
+static int fsck_vreport(struct fsck_options *options,
+			const struct object_id *oid,
+			enum object_type object_type,
+			const struct fsck_refs_info *refs_info,
+			enum fsck_msg_id msg_id, const char *fmt, va_list ap)
 {
-	va_list ap;
+	va_list ap_copy;
 	struct strbuf sb = STRBUF_INIT;
 	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
 	int result;
@@ -250,9 +256,9 @@  static int report(struct fsck_options *options,
 	prepare_msg_ids();
 	strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased);
 
-	va_start(ap, fmt);
-	strbuf_vaddf(&sb, fmt, ap);
-	result = options->error_func(options, oid, object_type,
+	va_copy(ap_copy, ap);
+	strbuf_vaddf(&sb, fmt, ap_copy);
+	result = options->error_func(options, oid, object_type, refs_info,
 				     msg_type, msg_id, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
@@ -260,6 +266,35 @@  static int report(struct fsck_options *options,
 	return result;
 }
 
+__attribute__((format (printf, 5, 6)))
+static int report(struct fsck_options *options,
+		  const struct object_id *oid, enum object_type object_type,
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
+{
+	va_list ap;
+	int result;
+
+	va_start(ap, fmt);
+	result = fsck_vreport(options, oid, object_type, NULL, msg_id, fmt, ap);
+	va_end(ap);
+
+	return result;
+}
+
+int fsck_refs_report(struct fsck_options *options,
+		     const struct object_id *oid,
+		     const struct fsck_refs_info *refs_info,
+		     enum fsck_msg_id msg_id, const char *fmt, ...)
+{
+	va_list ap;
+	int result;
+	va_start(ap, fmt);
+	result = fsck_vreport(options, oid, OBJ_NONE, refs_info,
+			      msg_id, fmt, ap);
+	va_end(ap);
+	return result;
+}
+
 void fsck_enable_object_names(struct fsck_options *options)
 {
 	if (!options->object_names)
@@ -1203,6 +1238,7 @@  int fsck_buffer(const struct object_id *oid, enum object_type type,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type UNUSED,
+			const struct fsck_refs_info *refs_info UNUSED,
 			enum fsck_msg_type msg_type,
 			enum fsck_msg_id msg_id UNUSED,
 			const char *message)
@@ -1306,6 +1342,7 @@  int git_fsck_config(const char *var, const char *value,
 int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
 					   const struct object_id *oid,
 					   enum object_type object_type,
+					   const struct fsck_refs_info *refs_info,
 					   enum fsck_msg_type msg_type,
 					   enum fsck_msg_id msg_id,
 					   const char *message)
@@ -1314,5 +1351,6 @@  int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
 		puts(oid_to_hex(oid));
 		return 0;
 	}
-	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+	return fsck_error_function(o, oid, object_type, refs_info,
+				   msg_type, msg_id, message);
 }
diff --git a/fsck.h b/fsck.h
index bcfb2e34cd..4f01a46cc7 100644
--- a/fsck.h
+++ b/fsck.h
@@ -92,6 +92,7 @@  enum fsck_msg_id {
 };
 #undef MSG_ID
 
+struct fsck_refs_info;
 struct fsck_options;
 struct object;
 
@@ -114,23 +115,35 @@  int is_valid_msg_type(const char *msg_id, const char *msg_type);
 typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 			      void *data, struct fsck_options *options);
 
-/* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
+/*
+ * callback function for reporting errors when checking either objects or refs
+ */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
+			  const struct fsck_refs_info *refs_info,
 			  enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
 			  const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
+			const struct fsck_refs_info *refs_info,
 			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
 			const char *message);
 int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
 					   const struct object_id *oid,
 					   enum object_type object_type,
+					   const struct fsck_refs_info *refs_info,
 					   enum fsck_msg_type msg_type,
 					   enum fsck_msg_id msg_id,
 					   const char *message);
 
+/*
+ * The information for reporting refs-related error message
+ */
+struct fsck_refs_info {
+	const char *path;
+};
+
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
@@ -209,6 +222,16 @@  int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
  */
 int fsck_finish(struct fsck_options *options);
 
+/*
+ * Report an error or warning for refs.
+ */
+__attribute__((format (printf, 5, 6)))
+int fsck_refs_report(struct fsck_options *options,
+		     const struct object_id *oid,
+		     const struct fsck_refs_info *refs_info,
+		     enum fsck_msg_id msg_id,
+		     const char *fmt, ...);
+
 /*
  * Subsystem for storing human-readable names for each object.
  *
diff --git a/object-file.c b/object-file.c
index 065103be3e..91ddab2696 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2470,11 +2470,12 @@  int repo_has_object_file(struct repository *r,
  * give more context.
  */
 static int hash_format_check_report(struct fsck_options *opts UNUSED,
-				     const struct object_id *oid UNUSED,
-				     enum object_type object_type UNUSED,
-				     enum fsck_msg_type msg_type UNUSED,
-				     enum fsck_msg_id msg_id UNUSED,
-				     const char *message)
+				    const struct object_id *oid UNUSED,
+				    enum object_type object_type UNUSED,
+				    const struct fsck_refs_info *refs_info UNUSED,
+				    enum fsck_msg_type msg_type UNUSED,
+				    enum fsck_msg_id msg_id UNUSED,
+				    const char *message)
 {
 	error(_("object fails fsck: %s"), message);
 	return 1;