Message ID | ZqeYhkaArVmMdrnK@ArchLinux (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | ref consistency check infra setup | expand |
On Mon, Jul 29, 2024 at 09:26:30PM +0800, shejialuo wrote: > The static function "report" provided by "fsck.c" aims at checking fsck > error type and calling the callback "error_func" to report the message. > However, "report" function is only related to object database which > cannot be reused for refs. Nit: it would be nice to mention _why_ it cannot be reused for refs. > diff --git a/fsck.c b/fsck.c > index 3f32441492..1185e9a8ad 100644 > --- a/fsck.c > +++ b/fsck.c > @@ -226,12 +226,18 @@ static int object_on_skiplist(struct fsck_options *opts, > return opts && oid && oidset_contains(&opts->skip_oids, oid); > } > > -__attribute__((format (printf, 5, 6))) > -static int report(struct fsck_options *options, > - const struct object_id *oid, enum object_type object_type, > - enum fsck_msg_id msg_id, const char *fmt, ...) > +/* > + * Provide a unified interface for either fscking refs or objects. > + * It will get the current msg error type and call the error_func callback > + * which is registered in the "fsck_options" struct. > + */ > +static int fsck_vreport(struct fsck_options *options, > + const struct object_id *oid, > + enum object_type object_type, > + const struct fsck_refs_info *refs_info, > + enum fsck_msg_id msg_id, const char *fmt, va_list ap) > { > - va_list ap; > + va_list ap_copy; > struct strbuf sb = STRBUF_INIT; > enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options); > int result; It is a bit weird that this new generic function receives non-generic inputs which are specific to the respective subsystems (objects or refs) that we are checking. A better design would likely be to make `error_func()` receive a void pointer such that `error_func()` and then have the respective subsystems provide a function that knows to format the message while receiving either a `struct fsck_object_report *` or a `struct fsck_ref_report *`. I don't think this is particularly worriesome though as it is still manageable right now. So I'm fine if we want to leave this as-is, and then we can iterate on this in a future patch series as required. > @@ -250,9 +256,9 @@ static int report(struct fsck_options *options, > prepare_msg_ids(); > strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased); > > - va_start(ap, fmt); > - strbuf_vaddf(&sb, fmt, ap); > - result = options->error_func(options, oid, object_type, > + va_copy(ap_copy, ap); > + strbuf_vaddf(&sb, fmt, ap_copy); > + result = options->error_func(options, oid, object_type, refs_info, > msg_type, msg_id, sb.buf); > strbuf_release(&sb); > va_end(ap); > @@ -260,6 +266,35 @@ static int report(struct fsck_options *options, > return result; > } > > +__attribute__((format (printf, 5, 6))) > +static int report(struct fsck_options *options, > + const struct object_id *oid, enum object_type object_type, > + enum fsck_msg_id msg_id, const char *fmt, ...) > +{ > + va_list ap; > + int result; > + > + va_start(ap, fmt); > + result = fsck_vreport(options, oid, object_type, NULL, msg_id, fmt, ap); > + va_end(ap); > + > + return result; > +} As far as I can see, `report()` is now specific to reporting errors with objects while `fsck_vreport()` is the generic part. Do we want to rename the function to `fsck_report_object()` to clarify, or would that cause too much churn? Hm. Seeing that we have 62 callsites of that function it may be too much churn indeed. > +int fsck_refs_report(struct fsck_options *options, > + const struct object_id *oid, > + const struct fsck_refs_info *refs_info, > + enum fsck_msg_id msg_id, const char *fmt, ...) Would `fsck_report_ref()` be a better name? What is the intent of the `oid` field? Would it be set to the object ID that a reference points to? What if the reference is a non-resolving symbolic reference? I wonder whether we can just remove it. Patrick
On Tue, Jul 30, 2024 at 10:31:16AM +0200, Patrick Steinhardt wrote: > On Mon, Jul 29, 2024 at 09:26:30PM +0800, shejialuo wrote: > > The static function "report" provided by "fsck.c" aims at checking fsck > > error type and calling the callback "error_func" to report the message. > > However, "report" function is only related to object database which > > cannot be reused for refs. > > Nit: it would be nice to mention _why_ it cannot be reused for refs. > > > diff --git a/fsck.c b/fsck.c > > index 3f32441492..1185e9a8ad 100644 > > --- a/fsck.c > > +++ b/fsck.c > > @@ -226,12 +226,18 @@ static int object_on_skiplist(struct fsck_options *opts, > > return opts && oid && oidset_contains(&opts->skip_oids, oid); > > } > > > > -__attribute__((format (printf, 5, 6))) > > -static int report(struct fsck_options *options, > > - const struct object_id *oid, enum object_type object_type, > > - enum fsck_msg_id msg_id, const char *fmt, ...) > > +/* > > + * Provide a unified interface for either fscking refs or objects. > > + * It will get the current msg error type and call the error_func callback > > + * which is registered in the "fsck_options" struct. > > + */ > > +static int fsck_vreport(struct fsck_options *options, > > + const struct object_id *oid, > > + enum object_type object_type, > > + const struct fsck_refs_info *refs_info, > > + enum fsck_msg_id msg_id, const char *fmt, va_list ap) > > { > > - va_list ap; > > + va_list ap_copy; > > struct strbuf sb = STRBUF_INIT; > > enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options); > > int result; > > It is a bit weird that this new generic function receives non-generic > inputs which are specific to the respective subsystems (objects or refs) > that we are checking. > Actually, this is one of the biggest problem when implementing the infrastructure. The original function "report" only cares about reporting the problem of objects. So the callback "error_func" uses the similar prototype. Problem comes when we want to add ref-related report. In my very former implementation, I just created a new function "fsck_refs_report" to just copy some codes from "report" and defines refs-related callback. However, this is a bad way because we make duplication. If we want to reuse the "report" function, we should add new parameters into "report" and "error_func". This is the idea of this patch. However, as you can see, there are so many "report" function calls in the codebase, it's bad to change them. So I define a more common function called "fsck_vreport" function and wrap "report" to eventually call this function. > A better design would likely be to make `error_func()` receive a void > pointer such that `error_func()` and then have the respective subsystems > provide a function that knows to format the message while receiving > either a `struct fsck_object_report *` or a `struct fsck_ref_report *`. > Yes, I agree with this idea. And I think we should use only one function called "fsck_reportf" to report any fsck-related messages. We could design the following callback "prototype". typedef int (*fsck_error)(struct fsck_options *o, void *info, enum fsck_msg_type msg_type, enum fsck_msg_id msg_id, const char *message); Thus, we could make "fsck_reportf" generic. It will handle the common "fsck_options" and "enum fsck_msg_id" and then it will call "fsck_error" callback. The user could pass either refs information or objects information. > I don't think this is particularly worriesome though as it is still > manageable right now. So I'm fine if we want to leave this as-is, and > then we can iterate on this in a future patch series as required. > I strongly suggest that we should use the above design for the following reasons: 1. We only expose one interface called "fsck_reportf" which will make the code clear. Actually, there is no different between reporting refs and reporting objects. 2. We provide more extensibility here, because we will never change "fsck_reportf" and "fsck_error" prototype when we want to add more info for either refs or objects. But do we really need this? Junio, could you please give some advice here. How do you think about this design. In my perspective, the only overhead here is that there are too many "report" function we should refactor. > > @@ -250,9 +256,9 @@ static int report(struct fsck_options *options, > > prepare_msg_ids(); > > strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased); > > > > - va_start(ap, fmt); > > - strbuf_vaddf(&sb, fmt, ap); > > - result = options->error_func(options, oid, object_type, > > + va_copy(ap_copy, ap); > > + strbuf_vaddf(&sb, fmt, ap_copy); > > + result = options->error_func(options, oid, object_type, refs_info, > > msg_type, msg_id, sb.buf); > > strbuf_release(&sb); > > va_end(ap); > > @@ -260,6 +266,35 @@ static int report(struct fsck_options *options, > > return result; > > } > > > > +__attribute__((format (printf, 5, 6))) > > +static int report(struct fsck_options *options, > > + const struct object_id *oid, enum object_type object_type, > > + enum fsck_msg_id msg_id, const char *fmt, ...) > > +{ > > + va_list ap; > > + int result; > > + > > + va_start(ap, fmt); > > + result = fsck_vreport(options, oid, object_type, NULL, msg_id, fmt, ap); > > + va_end(ap); > > + > > + return result; > > +} > > As far as I can see, `report()` is now specific to reporting errors with > objects while `fsck_vreport()` is the generic part. Do we want to rename > the function to `fsck_report_object()` to clarify, or would that cause > too much churn? > > Hm. Seeing that we have 62 callsites of that function it may be too much > churn indeed. > Yes, there are too many references for "report" function. That's why I wrap the "report" using "fsck_vreport". > > +int fsck_refs_report(struct fsck_options *options, > > + const struct object_id *oid, > > + const struct fsck_refs_info *refs_info, > > + enum fsck_msg_id msg_id, const char *fmt, ...) > > Would `fsck_report_ref()` be a better name? > I agree. However, if we use the above design, we will just use "fsck_reportf" here both for refs and objects. > What is the intent of the `oid` field? Would it be set to the object ID > that a reference points to? What if the reference is a non-resolving > symbolic reference? I wonder whether we can just remove it. > `oid` is used to be the object ID that a reference points to. If the reference is a symbolic link or symref, we do not care about it. The caller should just pass `NULL`. Actually, we may not use this field. I just suppose that we may provide the user more information. Because when using "file-backend.c::parse_loose_ref_contents()" we will automatically get the `oid` if the ref is a regular reference. So I just provide `oid` here. > Patrick
diff --git a/builtin/fsck.c b/builtin/fsck.c index d13a226c2e..6abad60e7e 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -92,6 +92,7 @@ static int objerror(struct object *obj, const char *err) static int fsck_error_func(struct fsck_options *o UNUSED, const struct object_id *oid, enum object_type object_type, + const struct fsck_refs_info *refs_info UNUSED, enum fsck_msg_type msg_type, enum fsck_msg_id msg_id UNUSED, const char *message) diff --git a/builtin/mktag.c b/builtin/mktag.c index 4767f1a97e..6496deca0a 100644 --- a/builtin/mktag.c +++ b/builtin/mktag.c @@ -20,6 +20,7 @@ static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT; static int mktag_fsck_error_func(struct fsck_options *o UNUSED, const struct object_id *oid UNUSED, enum object_type object_type UNUSED, + const struct fsck_refs_info *refs_info UNUSED, enum fsck_msg_type msg_type, enum fsck_msg_id msg_id UNUSED, const char *message) diff --git a/fsck.c b/fsck.c index 3f32441492..1185e9a8ad 100644 --- a/fsck.c +++ b/fsck.c @@ -226,12 +226,18 @@ static int object_on_skiplist(struct fsck_options *opts, return opts && oid && oidset_contains(&opts->skip_oids, oid); } -__attribute__((format (printf, 5, 6))) -static int report(struct fsck_options *options, - const struct object_id *oid, enum object_type object_type, - enum fsck_msg_id msg_id, const char *fmt, ...) +/* + * Provide a unified interface for either fscking refs or objects. + * It will get the current msg error type and call the error_func callback + * which is registered in the "fsck_options" struct. + */ +static int fsck_vreport(struct fsck_options *options, + const struct object_id *oid, + enum object_type object_type, + const struct fsck_refs_info *refs_info, + enum fsck_msg_id msg_id, const char *fmt, va_list ap) { - va_list ap; + va_list ap_copy; struct strbuf sb = STRBUF_INIT; enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options); int result; @@ -250,9 +256,9 @@ static int report(struct fsck_options *options, prepare_msg_ids(); strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased); - va_start(ap, fmt); - strbuf_vaddf(&sb, fmt, ap); - result = options->error_func(options, oid, object_type, + va_copy(ap_copy, ap); + strbuf_vaddf(&sb, fmt, ap_copy); + result = options->error_func(options, oid, object_type, refs_info, msg_type, msg_id, sb.buf); strbuf_release(&sb); va_end(ap); @@ -260,6 +266,35 @@ static int report(struct fsck_options *options, return result; } +__attribute__((format (printf, 5, 6))) +static int report(struct fsck_options *options, + const struct object_id *oid, enum object_type object_type, + enum fsck_msg_id msg_id, const char *fmt, ...) +{ + va_list ap; + int result; + + va_start(ap, fmt); + result = fsck_vreport(options, oid, object_type, NULL, msg_id, fmt, ap); + va_end(ap); + + return result; +} + +int fsck_refs_report(struct fsck_options *options, + const struct object_id *oid, + const struct fsck_refs_info *refs_info, + enum fsck_msg_id msg_id, const char *fmt, ...) +{ + va_list ap; + int result; + va_start(ap, fmt); + result = fsck_vreport(options, oid, OBJ_NONE, refs_info, + msg_id, fmt, ap); + va_end(ap); + return result; +} + void fsck_enable_object_names(struct fsck_options *options) { if (!options->object_names) @@ -1203,6 +1238,7 @@ int fsck_buffer(const struct object_id *oid, enum object_type type, int fsck_error_function(struct fsck_options *o, const struct object_id *oid, enum object_type object_type UNUSED, + const struct fsck_refs_info *refs_info UNUSED, enum fsck_msg_type msg_type, enum fsck_msg_id msg_id UNUSED, const char *message) @@ -1306,6 +1342,7 @@ int git_fsck_config(const char *var, const char *value, int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o, const struct object_id *oid, enum object_type object_type, + const struct fsck_refs_info *refs_info, enum fsck_msg_type msg_type, enum fsck_msg_id msg_id, const char *message) @@ -1314,5 +1351,6 @@ int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o, puts(oid_to_hex(oid)); return 0; } - return fsck_error_function(o, oid, object_type, msg_type, msg_id, message); + return fsck_error_function(o, oid, object_type, refs_info, + msg_type, msg_id, message); } diff --git a/fsck.h b/fsck.h index bcfb2e34cd..4f01a46cc7 100644 --- a/fsck.h +++ b/fsck.h @@ -92,6 +92,7 @@ enum fsck_msg_id { }; #undef MSG_ID +struct fsck_refs_info; struct fsck_options; struct object; @@ -114,23 +115,35 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type); typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type, void *data, struct fsck_options *options); -/* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */ +/* + * callback function for reporting errors when checking either objects or refs + */ typedef int (*fsck_error)(struct fsck_options *o, const struct object_id *oid, enum object_type object_type, + const struct fsck_refs_info *refs_info, enum fsck_msg_type msg_type, enum fsck_msg_id msg_id, const char *message); int fsck_error_function(struct fsck_options *o, const struct object_id *oid, enum object_type object_type, + const struct fsck_refs_info *refs_info, enum fsck_msg_type msg_type, enum fsck_msg_id msg_id, const char *message); int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o, const struct object_id *oid, enum object_type object_type, + const struct fsck_refs_info *refs_info, enum fsck_msg_type msg_type, enum fsck_msg_id msg_id, const char *message); +/* + * The information for reporting refs-related error message + */ +struct fsck_refs_info { + const char *path; +}; + struct fsck_options { fsck_walk_func walk; fsck_error error_func; @@ -209,6 +222,16 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer, */ int fsck_finish(struct fsck_options *options); +/* + * Report an error or warning for refs. + */ +__attribute__((format (printf, 5, 6))) +int fsck_refs_report(struct fsck_options *options, + const struct object_id *oid, + const struct fsck_refs_info *refs_info, + enum fsck_msg_id msg_id, + const char *fmt, ...); + /* * Subsystem for storing human-readable names for each object. * diff --git a/object-file.c b/object-file.c index 065103be3e..91ddab2696 100644 --- a/object-file.c +++ b/object-file.c @@ -2470,11 +2470,12 @@ int repo_has_object_file(struct repository *r, * give more context. */ static int hash_format_check_report(struct fsck_options *opts UNUSED, - const struct object_id *oid UNUSED, - enum object_type object_type UNUSED, - enum fsck_msg_type msg_type UNUSED, - enum fsck_msg_id msg_id UNUSED, - const char *message) + const struct object_id *oid UNUSED, + enum object_type object_type UNUSED, + const struct fsck_refs_info *refs_info UNUSED, + enum fsck_msg_type msg_type UNUSED, + enum fsck_msg_id msg_id UNUSED, + const char *message) { error(_("object fails fsck: %s"), message); return 1;
The static function "report" provided by "fsck.c" aims at checking fsck error type and calling the callback "error_func" to report the message. However, "report" function is only related to object database which cannot be reused for refs. In order to provide a unified interface which can report either objects or refs, create a new function "fsck_vreport" following the "report" prototype. Instead of using "...", provide "va_list" to allow more flexibility. In order to provide an extensible error report for refs, add a new "fsck_refs_info" structure and add parameter "const struct *fsck_refs_info" into "fsck_vreport" function. Like "report", the "fsck_vreport" function will use "error_func" registered in "fsck_options" to report customized messages. Change "error_func" prototype to align with the new "fsck_vreport". Then, change "report" function to use "fsck_vreport" to report objects related messages. Add a new function called "fsck_refs_report" to use "fsck_vreport" to report refs related messages. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- builtin/fsck.c | 1 + builtin/mktag.c | 1 + fsck.c | 56 +++++++++++++++++++++++++++++++++++++++++-------- fsck.h | 25 +++++++++++++++++++++- object-file.c | 11 +++++----- 5 files changed, 79 insertions(+), 15 deletions(-)