Message ID | 20240930220352.2461975-2-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Typed errors | expand |
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > .... It is designed to be passed and returned by value, not > pointer, and it is possible to do so in two registers on 64-bit systems. > Similar functionality works well for error types in Rust and for the > standard library's lldiv_t, so this should not pose a problem. It should not, in the sense that "any reasonable platform should be able to pass two 64-bit word in a structure by value", but isn't it optimizing for a wrong (i.e. have error) case? In the case where there is no error, a "negative return is error, zero is success", with a pointer to "more detailed error info, in case the call resulted in an error", would let us take branch based on a zero-ness check on an integer in a machine-natural word, without even looking at these two words in the struct.
On 2024-09-30 at 22:44:37, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > .... It is designed to be passed and returned by value, not > > pointer, and it is possible to do so in two registers on 64-bit systems. > > Similar functionality works well for error types in Rust and for the > > standard library's lldiv_t, so this should not pose a problem. > > It should not, in the sense that "any reasonable platform should be > able to pass two 64-bit word in a structure by value", but isn't it > optimizing for a wrong (i.e. have error) case? In the case where > there is no error, a "negative return is error, zero is success", > with a pointer to "more detailed error info, in case the call > resulted in an error", would let us take branch based on a zero-ness > check on an integer in a machine-natural word, without even looking > at these two words in the struct. We can adjust the first word so that it's always zero on success, in which case, because it's returned in two registers, the processor will be able to branch on a zero-ness check on one of those registers. (We can even optimize the check by looking at the low 32 bits, which will do the same for 32-bit machines as well.) The performance benefit will be the same, and I should note that Rust does this kind of thing without a problem. We did discuss passing an error accumulating argument to all functions at the contributor summit, but it didn't seem that was the way folks wanted to go: people seemed to prefer a return value. I'm totally open to other proposals here and not particularly tied to this one, but I told Emily that I'd send a proposal to the list, and so here it is. If we like this with changes or prefer something different, that's fine with me.
On Mon, Sep 30, 2024 at 10:03:52PM +0000, brian m. carlson wrote: >+ case GIT_ERR_MULTIPLE: >+ strbuf_addf(buf, _("multiple errors:\n")); >+ for (size_t i = 0; i < me->count; i++) { >+ git_error_strbuf(buf, me->errs[i]); >+ strbuf_addstr(buf, "\n"); >+ } >+ } ah. i was wondering how you'd address this ("this" being non-fatal errors piling up), as i'm facing the same problem in a project of mine. the problem is that one can either impose a very rigid formatting (as you do here), or provide formatting options, which can get arbitrarily complex. so i wonder whether one shouldn't pursue a hybrid approach instead: singular return values for low-level functions that either work or fail fatally, and error callbacks for higher-level functionality? btw, the switch misses breaks.
On Mon, Sep 30, 2024 at 3:04 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > There is work underway to move some of the Git code out into a reusable > library. In such a case, it's not very desirable to have the library > code write random errors to standard error, since this is an antipattern > which annoys terminal users. > > Instead, we will expect callers of our library function to return > errors. The reusability of our library will be substantially improved > if we return typed errors so that users can easily determine what kind > of error might have occurred and react differently based on different > contexts. For example, if we are looking up an object in a partial > clone and it's not found, we might decide to download it, but we might > return an error to the user if the problem is instead that the revision > specified is not syntactically valid. > > To help the libification process and make our code more generally > maintainable, add an error type. This consists of on 64-bit integer, > which contains bit flags and a 32-bit code, and a pointer, which depends > on the code. It is designed to be passed and returned by value, not > pointer, and it is possible to do so in two registers on 64-bit systems. > Similar functionality works well for error types in Rust and for the > standard library's lldiv_t, so this should not pose a problem. > > Provide the ability to specify either an errno value or a git error code > as the code. This allows us to use this type generically when handling > errno values such as processing files, as well as express a rich set of > possible error codes specific to Git. We pick an unsigned 32-bit code > because Windows can use the full set of 32 bits in its error values, > even though most Unix systems use only a small set of codes which > traditionally start at 1. 32 bits for Git errors also allows us plenty > of space to expand as we see fit. > > Allow multiple errors to be provided and wrapped in a single object, > which is useful in many situations, and add helpers to determine if any > error in the set matches a particular code. > > Additionally, provide error formatting functions that produce a suitable > localized string for ease of use. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Left a few comments below about portability and readability, but general thoughts: of course this is quite limited, but I think it will actually be enough in most cases. It also seems like if we need more later, we can use this same code-OR-with-pointer thing to point to something more nuanced than `void* meta`. I'd be in favor of applying this model to somewhere with clear error codes and relatively few invocations (maybe to the internals of something like argparse or run-command, without initially modifying the public interface, for example?) and seeing where we run into friction. I have a little bit of concern about the idea of centralizing all the `meta` parsing and error string generation, but it seems like we might be able to adopt a model more similar to the one we use for `git_config()` callbacks and do more context-aware parsing instead to keep `git_error_strbuf` from getting too huge. Thanks for sending this. I'm looking forward to hearing others' opinions. - Emily > --- > Makefile | 1 + > error.c | 43 ++++++++++++++ > error.h | 168 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 212 insertions(+) > create mode 100644 error.c > create mode 100644 error.h > > diff --git a/Makefile b/Makefile > index 7344a7f725..5d9bf992e9 100644 > --- a/Makefile > +++ b/Makefile > @@ -1013,6 +1013,7 @@ LIB_OBJS += dir.o > LIB_OBJS += editor.o > LIB_OBJS += entry.o > LIB_OBJS += environment.o > +LIB_OBJS += error.o > LIB_OBJS += ewah/bitmap.o > LIB_OBJS += ewah/ewah_bitmap.o > LIB_OBJS += ewah/ewah_io.o > diff --git a/error.c b/error.c > new file mode 100644 > index 0000000000..713bc42187 > --- /dev/null > +++ b/error.c > @@ -0,0 +1,43 @@ > +#include "git-compat-util.h" > +#include "gettext.h" > +#include "error.h" > +#include "hex.h" > +#include "strbuf.h" > + > +const char *git_error_string(struct git_error err) > +{ > + struct strbuf buf = STRBUF_INIT; > + if (!git_error_strbuf(&buf, err)) > + return NULL; > + return strbuf_detach(&buf, NULL); > +} > + > +const char *git_error_strbuf(struct strbuf *buf, struct git_error err) Is the idea that we continue to extend `git_error_strbuf` with more `git_error_code` values as we add this method of error handling across the codebase? I'm worried it could get quite unwieldy. Or are you suggesting that we could write `git_error_strbuf_object_store`, `git_error_strbuf_hook`, etc to keep the code->string conversion local? That would let us do custom processing of err.meta depending on context, too, wouldn't it? > +{ > + if (GIT_ERROR_SUCCESS(err)) { > + return NULL; > + } else if (GIT_ERROR_ERRNO(err) != -1) { > + return xstrdup(strerror(GIT_ERROR_ERRNO(err))); > + } else { > + struct git_error_multiple *me = err.meta; > + switch (GIT_ERROR_GITERR(err)) { > + case GIT_ERR_OBJECT_NOT_FOUND: > + if (err.meta) > + strbuf_addf(buf, _("object not found: %s"), oid_to_hex(err.meta)); > + else > + strbuf_addf(buf, _("object not found")); > + case GIT_ERR_NULL_OID: > + if (err.meta) > + strbuf_addf(buf, _("null object ID not allowed in this context: %s"), (char *)err.meta); > + else > + strbuf_addf(buf, _("null object ID not allowed")); > + case GIT_ERR_MULTIPLE: > + strbuf_addf(buf, _("multiple errors:\n")); > + for (size_t i = 0; i < me->count; i++) { > + git_error_strbuf(buf, me->errs[i]); > + strbuf_addstr(buf, "\n"); > + } > + } > + return buf->buf; > + } > +} > diff --git a/error.h b/error.h > new file mode 100644 > index 0000000000..485cca99e0 > --- /dev/null > +++ b/error.h > @@ -0,0 +1,168 @@ > +#ifndef ERROR_H > +#define ERROR_H > + > +#include "git-compat-util.h" > + > +/* Set if this value is initialized. */ > +#define GIT_ERROR_BIT_INIT (1ULL << 63) > +/* Set if the code is an errno code, clear if it's a git error code. */ > +#define GIT_ERROR_BIT_ERRNO (1ULL << 62) > +/* > + * Set if the memory in meta should be freed; otherwise, it's statically > + * allocated. > + */ > +#define GIT_ERROR_BIT_ALLOC (1ULL << 61) > +/* > + * Set if the memory in meta is a C string; otherwise, it's a metadata struct. > + */ > +#define GIT_ERROR_BIT_MSG (1ULL << 60) > + > +#define GIT_ERROR_BIT_MASK (0xf << 60) > + > +#define GIT_ERROR_OK (git_error_ok()) > + > +#define GIT_ERROR_SUCCESS(e) (((e).code == GIT_ERROR_BIT_INIT)) > +#define GIT_ERROR_SUCCESS_CONSUME(e) ((git_error_free(&(e)).code == GIT_ERROR_BIT_INIT) > + > +#define GIT_ERROR_ERRNO(e) (((e).code & GIT_ERROR_BIT_ERRNO) ? ((e).code & 0xffffffff) : -1) > +#define GIT_ERROR_GITERR(e) (((e).code & GIT_ERROR_BIT_ERRNO) ? -1 : ((e).code & 0xffffffff)) Aurgh, too bad we don't get bitfields before C11. :) (Although I am not sure if that helps with your register-level optimization at that point... but it probably helps with readability.) But, I do wonder if this gluing-together-two-types-into-one-value thing may break based on endianness? (And if we care? I don't think we have any tests running on POWER systems, so maybe this falls under the umbrella of "you should give us tests if you want us to not break you"?) > + > +/* > + * A value representing an error in Git. > + */ > +struct git_error { > + uint64_t code; > + void *meta; > +}; > + > +struct git_error_multiple { > + struct git_error *errs; > + size_t count; > +}; > + > +enum git_error_code { > + /* The operation was a success. */ > + GIT_ERR_SUCCESS = 0, > + /* An object ID was provided, but it was not found. > + * > + * meta will be NULL or a pointer to struct object ID. > + */ > + GIT_ERR_OBJECT_NOT_FOUND = 1, > + /* > + * An object ID was provided, but it is all zeros and that is not > + * allowed. > + * > + * meta will be NULL or a message explaining the context. > + */ > + GIT_ERR_NULL_OID = 2, > + /* > + * Multiple errors occurred. > + * > + * meta must be a pointer to struct git_error_multiple. > + */ > + GIT_ERR_MULTIPLE = 3, > +}; > + > +const char *git_error_string(struct git_error err); > +const char *git_error_strbuf(struct strbuf *buf, struct git_error err); > + > +/* > + * A successful error status. > + */ > +static inline struct git_error git_error_ok(void) { > + struct git_error e = { > + .code = 0 | GIT_ERROR_BIT_INIT, > + .meta = NULL, > + }; > + return e; > +} > + > +static inline struct git_error git_error_new_errno(uint32_t errnoval, const char *msg, int to_free) { > + struct git_error e = { > + .code = errnoval | GIT_ERROR_BIT_INIT | GIT_ERROR_BIT_ERRNO | > + GIT_ERROR_BIT_MSG | (to_free ? GIT_ERROR_BIT_ALLOC : 0), > + .meta = (void *)msg, > + }; > + return e; > +} > + > +static inline struct git_error git_error_new_git(uint32_t gitcode, const char *msg, int to_free) { > + struct git_error e = { > + .code = gitcode | GIT_ERROR_BIT_INIT | > + GIT_ERROR_BIT_MSG | (to_free ? GIT_ERROR_BIT_ALLOC : 0), > + .meta = (void *)msg, > + }; > + return e; > +} > + > +static inline struct git_error git_error_new_simple(uint32_t gitcode) { > + struct git_error e = { > + .code = gitcode | GIT_ERROR_BIT_INIT, > + .meta = NULL, > + }; > + return e; > +} > + > +static inline struct git_error git_error_new_multiple(struct git_error *errs, size_t count) > +{ > + struct git_error_multiple *me = xmalloc(sizeof(*me)); > + struct git_error e = { > + .code = GIT_ERR_MULTIPLE | GIT_ERROR_BIT_INIT | GIT_ERROR_BIT_ALLOC, > + .meta = me, > + }; > + me->errs = errs; > + me->count = count; > + return e; > +} > + > +/* > + * If this is a git error and the code matches the given code, or if this is a > + * multiple error and any of the contained errors are a git error whose code > + * matches, returns a pointer to that error. If there is no match, returns > + * NULL. > + */ > +static inline struct git_error *git_error_is_git(struct git_error *e, int code) { > + int64_t giterr = GIT_ERROR_GITERR(*e); > + if (giterr == code) { > + return e; > + } else if (giterr == GIT_ERR_MULTIPLE) { > + struct git_error_multiple *me = e->meta; > + for (size_t i = 0; i < me->count; i++) > + return git_error_is_git(me->errs + i, code); > + return NULL; > + } else { > + return NULL; > + } > +} > + > +/* > + * If this is an errno error and the code matches the given code, or if this is > + * a multiple error and any of the contained errors are an errno error whose > + * code matches, returns a pointer to that error. Otherwise, returns NULL. > + */ > +static inline struct git_error *git_error_is_errno(struct git_error *e, int code) { > + int64_t giterr = GIT_ERROR_GITERR(*e); > + if (GIT_ERROR_ERRNO(*e) == code) { > + return e; > + } else if (giterr == GIT_ERR_MULTIPLE) { > + struct git_error_multiple *me = e->meta; > + for (size_t i = 0; i < me->count; i++) > + return git_error_is_errno(me->errs + i, code); > + return NULL; > + } else { > + return NULL; > + } > +} > + > +/* Frees and deinitializes this error structure. */ > +static inline uint64_t git_error_free(struct git_error *e) > +{ > + uint64_t code = e->code; > + if (e->code & GIT_ERROR_BIT_ALLOC) > + free(e->meta); > + e->code = 0; > + e->meta = NULL; > + return code; > +} > + > +#endif
Hi brian Thanks for working on this. On 30/09/2024 23:03, brian m. carlson wrote: > There is work underway to move some of the Git code out into a reusable > library. In such a case, it's not very desirable to have the library > code write random errors to standard error, since this is an antipattern > which annoys terminal users. > > Instead, we will expect callers of our library function to return > errors. Isn't it the callers that will expect the function to return an error? > The reusability of our library will be substantially improved > if we return typed errors so that users can easily determine what kind > of error might have occurred and react differently based on different > contexts. For example, if we are looking up an object in a partial > clone and it's not found, we might decide to download it, but we might > return an error to the user if the problem is instead that the revision > specified is not syntactically valid. > > To help the libification process and make our code more generally > maintainable, add an error type. This consists of on 64-bit integer, > which contains bit flags and a 32-bit code, and a pointer, which depends > on the code. It is designed to be passed and returned by value, not > pointer, and it is possible to do so in two registers on 64-bit systems. > Similar functionality works well for error types in Rust and for the > standard library's lldiv_t, so this should not pose a problem. Part of the reason it works well in rust is that it supports discriminated unions with pattern matching and has the "?" macro for early returns. In C the code ends up being quite verbose compared to taking a pointer to error struct as a function parameter and returning a boolean success/fail flag. struct git_error e; struct object_id oid; e = repo_get_oid(r, "HEAD", &oid); if (!GIT_ERROR_SUCCESS(e)) return e; With a boolean return we can have struct object_id oid; if (repo_get_oid(r, "HEAD", &oid, e)) return -1; where "e" is a "struct git_error*" passed into the function. > Provide the ability to specify either an errno value or a git error code > as the code. This allows us to use this type generically when handling > errno values such as processing files, as well as express a rich set of > possible error codes specific to Git. We pick an unsigned 32-bit code > because Windows can use the full set of 32 bits in its error values, > even though most Unix systems use only a small set of codes which > traditionally start at 1. 32 bits for Git errors also allows us plenty > of space to expand as we see fit. I think the design of the struct is fine. It does mean we need a global list of error values. GError [1] avoids this by having a "domain" field that is an interned string so that error codes only need to be unique within a given domain. I think that for a self-contained project like git a global list is probably fine - svn does this for example [2]. [1] https://docs.gtk.org/glib/error-reporting.html [2] https://github.com/apache/subversion/blob/be229fd54f5860b3140831671efbfd3f7f6fbb0b/subversion/include/svn_error_codes.h > Allow multiple errors to be provided and wrapped in a single object, > which is useful in many situations, and add helpers to determine if any > error in the set matches a particular code. The api appears to require the caller know up front how many errors there will be which seems unlikely to be true in practice. I think a more realistic design would allow callers to push errors as they occur and grow the array accordingly. For example ref_transaction_prepare() would want to return a list of errors, one for each ref that it was unable to lock or which did not match the expected value but it would not know up-front how many errors there were going to be. It would be useful to be able to add context to an error as the stack is unwound. For example if unpack_trees() detects that it would overwrite untracked files it prints a error listing those files. The exact formatting of that message depends on the command being run. That is currently handled by calling setup_unpack_trees_porcelain() with the name of the command before calling unpack_trees(). In a world where unpack_trees() returns a error containing the list of files we would want to customize the message by adding some context before converting it to a string. > Additionally, provide error formatting functions that produce a suitable > localized string for ease of use. I share Emily's concern that this function has to know the details of how to format every error. We could mitigate that somewhat using a switch that calls external helper functions that do the actual formatting switch (e.code) { case GIT_ERR_OBJECT_NOT_FOUND: format_object_not_found(buf, e); /* lives in another file */ break; ... I know this is an RFC but I couldn't resist one code comment > +#define GIT_ERROR_SUCCESS(e) (((e).code == GIT_ERROR_BIT_INIT)) > +#define GIT_ERROR_SUCCESS_CONSUME(e) ((git_error_free(&(e)).code == GIT_ERROR_BIT_INIT) git_error_free() returns the code as in integer so we don't need ".code" here. Also our coding guidelines would suggest git_error_clear() for the name of that function. In conclusion I like the general idea but have concerns about the verbosity of returning an error struct. It would be helpful to see some examples of this being used to see how it works in practice. Best Wishes Phillip
Hi Oswald On 01/10/2024 16:29, Oswald Buddenhagen wrote: > On Mon, Sep 30, 2024 at 10:03:52PM +0000, brian m. carlson wrote: >> + case GIT_ERR_MULTIPLE: >> + strbuf_addf(buf, _("multiple errors:\n")); >> + for (size_t i = 0; i < me->count; i++) { >> + git_error_strbuf(buf, me->errs[i]); >> + strbuf_addstr(buf, "\n"); >> + } >> + } > > ah. i was wondering how you'd address this ("this" being non-fatal > errors piling up), as i'm facing the same problem in a project of mine. > > the problem is that one can either impose a very rigid formatting (as > you do here), or provide formatting options, which can get arbitrarily > complex. so i wonder whether one shouldn't pursue a hybrid approach > instead: singular return values for low-level functions that either work > or fail fatally, and error callbacks for higher-level functionality? That's a good point - the rigid formatting is easy to implement but is not ideal from the user's point of view. Ideally we'd compose a coherent message with the data from the various errors but as you say that's not trivial. I wonder if there are any good examples out there that we could use as inspiration. The ones I'm aware of (rust's anyhow and svn's svn_error_compose) go for a rigid approach. Glib's GError punts the problem entirely by forbidding multiple errors. Best Wishes Phillip > btw, the switch misses breaks. > >
On 2024-10-01 at 20:31:15, Emily Shaffer wrote: > On Mon, Sep 30, 2024 at 3:04 PM brian m. carlson > <sandals@crustytoothpaste.net> wrote: > > +const char *git_error_strbuf(struct strbuf *buf, struct git_error err) > > Is the idea that we continue to extend `git_error_strbuf` with more > `git_error_code` values as we add this method of error handling across > the codebase? I'm worried it could get quite unwieldy. Or are you > suggesting that we could write `git_error_strbuf_object_store`, > `git_error_strbuf_hook`, etc to keep the code->string conversion > local? That would let us do custom processing of err.meta depending on > context, too, wouldn't it? My intention was to centralize this processing in one place. Part of the problem we have is that we have many different messages for one error code, so `GIT_ERR_OBJECT_NOT_FOUND` might have "object not found - (HEX)" or a variety of other messages, which makes parsing errors difficult. (As I said, we do this at work, and it's quite annoying.) If folks want to do thing differently, we can, but we should avoid losing the consistent error message behaviour. > > +#define GIT_ERROR_SUCCESS(e) (((e).code == GIT_ERROR_BIT_INIT)) > > +#define GIT_ERROR_SUCCESS_CONSUME(e) ((git_error_free(&(e)).code == GIT_ERROR_BIT_INIT) > > + > > +#define GIT_ERROR_ERRNO(e) (((e).code & GIT_ERROR_BIT_ERRNO) ? ((e).code & 0xffffffff) : -1) > > +#define GIT_ERROR_GITERR(e) (((e).code & GIT_ERROR_BIT_ERRNO) ? -1 : ((e).code & 0xffffffff)) > > Aurgh, too bad we don't get bitfields before C11. :) (Although I am > not sure if that helps with your register-level optimization at that > point... but it probably helps with readability.) > > But, I do wonder if this gluing-together-two-types-into-one-value > thing may break based on endianness? (And if we care? I don't think we > have any tests running on POWER systems, so maybe this falls under the > umbrella of "you should give us tests if you want us to not break > you"?) I don't think this breaks on big-endian systems because we're always treating it as a `uint64_t` and never converting it into bytes. My first laptop was a PowerPC Mac running Linux and I've owned UltraSPARC hardware in the past, so I'm pretty confident in avoiding writing endianness (and alignment) bugs. I wouldn't have written it in a way that broke on less common hardware even if we don't have tests for it.
On 2024-10-02 at 09:54:52, Phillip Wood wrote: > On 30/09/2024 23:03, brian m. carlson wrote: > > There is work underway to move some of the Git code out into a reusable > > library. In such a case, it's not very desirable to have the library > > code write random errors to standard error, since this is an antipattern > > which annoys terminal users. > > > > Instead, we will expect callers of our library function to return > > errors. > > Isn't it the callers that will expect the function to return an error? Yes, that's correct. I'll fix that. > > To help the libification process and make our code more generally > > maintainable, add an error type. This consists of on 64-bit integer, > > which contains bit flags and a 32-bit code, and a pointer, which depends > > on the code. It is designed to be passed and returned by value, not > > pointer, and it is possible to do so in two registers on 64-bit systems. > > Similar functionality works well for error types in Rust and for the > > standard library's lldiv_t, so this should not pose a problem. > > Part of the reason it works well in rust is that it supports discriminated > unions with pattern matching and has the "?" macro for early returns. In C > the code ends up being quite verbose compared to taking a pointer to error > struct as a function parameter and returning a boolean success/fail flag. > > struct git_error e; > struct object_id oid; > > e = repo_get_oid(r, "HEAD", &oid); > if (!GIT_ERROR_SUCCESS(e)) > return e; > > With a boolean return we can have > > struct object_id oid; > > if (repo_get_oid(r, "HEAD", &oid, e)) > return -1; > > where "e" is a "struct git_error*" passed into the function. Yes, I agree that this is more verbose than in Rust. If we were using Rust (which is a thing I'm working on), then we'd have nicer error handling, but for now we don't. However, Go still uses this kind of error handling, and many people use it every day with this limitation, so I don't think it's too awful for what we're getting. I won't say that Go is my favourite language and I do prefer the less verbose error handling in Rust, but the fact that this design is widely used means that it's at least a defensible decision. > I think the design of the struct is fine. It does mean we need a global list > of error values. GError [1] avoids this by having a "domain" field that is > an interned string so that error codes only need to be unique within a given > domain. I think that for a self-contained project like git a global list is > probably fine - svn does this for example [2]. I think so, too. libgit2 does the same thing, which seems to have worked out okay. > > Allow multiple errors to be provided and wrapped in a single object, > > which is useful in many situations, and add helpers to determine if any > > error in the set matches a particular code. > > The api appears to require the caller know up front how many errors there > will be which seems unlikely to be true in practice. I think a more > realistic design would allow callers to push errors as they occur and grow > the array accordingly. For example ref_transaction_prepare() would want to > return a list of errors, one for each ref that it was unable to lock or > which did not match the expected value but it would not know up-front how > many errors there were going to be. That seems reasonable. We could add helpers to extend the number of errors. > It would be useful to be able to add context to an error as the stack is > unwound. For example if unpack_trees() detects that it would overwrite > untracked files it prints a error listing those files. The exact formatting > of that message depends on the command being run. That is currently handled > by calling setup_unpack_trees_porcelain() with the name of the command > before calling unpack_trees(). In a world where unpack_trees() returns a > error containing the list of files we would want to customize the message by > adding some context before converting it to a string. I think we could do that with either a particular error type (e.g., `GIT_ERR_WOULD_OVERWRITE`) that contains nested errors and text or a generic `GIT_ERR_ANNOTATED`. > > Additionally, provide error formatting functions that produce a suitable > > localized string for ease of use. > > I share Emily's concern that this function has to know the details of how to > format every error. We could mitigate that somewhat using a switch that > calls external helper functions that do the actual formatting > > switch (e.code) { > case GIT_ERR_OBJECT_NOT_FOUND: > format_object_not_found(buf, e); /* lives in another file */ > break; > ... > > I know this is an RFC but I couldn't resist one code comment I'm fine with that approach. This is mostly designed to solicit comments about the overall design, and I don't have a strong opinion about how we format. > > +#define GIT_ERROR_SUCCESS(e) (((e).code == GIT_ERROR_BIT_INIT)) > > +#define GIT_ERROR_SUCCESS_CONSUME(e) ((git_error_free(&(e)).code == GIT_ERROR_BIT_INIT) > > git_error_free() returns the code as in integer so we don't need ".code" > here. Also our coding guidelines would suggest git_error_clear() for the > name of that function. Got it. > In conclusion I like the general idea but have concerns about the verbosity > of returning an error struct. It would be helpful to see some examples of > this being used to see how it works in practice. If I send a v2, I'll try to wire up some code so folks can see some examples.
On Wed, Oct 2, 2024 at 6:04 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > On 2024-10-02 at 09:54:52, Phillip Wood wrote: > > Part of the reason it works well in rust is that it supports discriminated > > unions with pattern matching and has the "?" macro for early returns. In C > > the code ends up being quite verbose compared to taking a pointer to error > > struct as a function parameter and returning a boolean success/fail flag. > > > > struct git_error e; > > struct object_id oid; > > > > e = repo_get_oid(r, "HEAD", &oid); > > if (!GIT_ERROR_SUCCESS(e)) > > return e; > > > > With a boolean return we can have > > > > struct object_id oid; > > > > if (repo_get_oid(r, "HEAD", &oid, e)) > > return -1; > > > > where "e" is a "struct git_error*" passed into the function. > > However, Go still uses this kind of error handling, and many people use > it every day with this limitation, so I don't think it's too awful for > what we're getting. I won't say that Go is my favourite language and I > do prefer the less verbose error handling in Rust, but the fact that > this design is widely used means that it's at least a defensible > decision. I'm not sure I understand your response to Phillip's observation. Idiomatic error handling in Go: if oid, err := repo_get_oid(r, "HEAD"); err != nil { return err; } seems much closer to Phillip's more succinct example than to the more verbose example using GIT_ERROR_SUCCESS().
On 2024-10-02 at 22:16:00, Eric Sunshine wrote: > I'm not sure I understand your response to Phillip's observation. > Idiomatic error handling in Go: > > if oid, err := repo_get_oid(r, "HEAD"); err != nil { > return err; > } > > seems much closer to Phillip's more succinct example than to the more > verbose example using GIT_ERROR_SUCCESS(). I don't think that approach works if you want to use `oid`, since it scopes just to the `if` statement (and any relevant `else`). If `oid` were already defined, then you would need to omit the colon in `:=`, or the compiler would complain about "no new variables". That's why I usually see this: oid, err := repo_get_oid(r, "HEAD") if err != nil { return err } which is much more like what Phillip's more verbose example shows.
On Wed, Oct 2, 2024 at 6:24 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > On 2024-10-02 at 22:16:00, Eric Sunshine wrote: > > I'm not sure I understand your response to Phillip's observation. > > Idiomatic error handling in Go: > > > > if oid, err := repo_get_oid(r, "HEAD"); err != nil { > > return err; > > } > > > > seems much closer to Phillip's more succinct example than to the more > > verbose example using GIT_ERROR_SUCCESS(). > > I don't think that approach works if you want to use `oid`, since it > scopes just to the `if` statement (and any relevant `else`). If `oid` > were already defined, then you would need to omit the colon in `:=`, or > the compiler would complain about "no new variables". > > That's why I usually see this: > > oid, err := repo_get_oid(r, "HEAD") > if err != nil { > return err > } > > which is much more like what Phillip's more verbose example shows. Yes, you often see that in Go, as well as: var oid oid_type var err error if oid, err = repo_get_oid(r, "HEAD"); err != nil { return err; } which is very much in line with Phillip's succinct C example. But isn't this all beside the point? Your proposal uses Rust as a model to justify the API choice in this RFC, but Phillip's point was that -- despite being perfectly suitable in Rust -- it is _not_ ergonomic in C. Rather than explaining how the proposed non-ergonomic API is superior to the ergonomic API in Phillip's example, you instead responded by saying that people in some other programming language (Go, in this case) have to deal with non-ergonomic error handling on a daily basis, therefore, a non-ergonomic API is good enough for Git's C programmers. But that is not a very convincing argument for choosing a non-ergonomic API for *this* project which is written in C, especially considering that a more ergonomic API has been presented (and is already in use in parts of the codebase). That's why I said in my original response that I didn't understand your response to Phillip. You seem to be using a non-justification ("other programmers suffer, so Git programmers can suffer too") as a justification for a non-ergonomic design.
Eric Sunshine <sunshine@sunshineco.com> writes: > Your proposal uses Rust as a model to justify the API choice in this > RFC, but Phillip's point was that -- despite being perfectly suitable > in Rust -- it is _not_ ergonomic in C. > ... > That's why I said in my original response that I didn't understand > your response to Phillip. You seem to be using a non-justification > ("other programmers suffer, so Git programmers can suffer too") as a > justification for a non-ergonomic design. The statement may be a bit too harsh, as some may not even realize that they are suffering anymore, after prolonged exposure to these idioms, just like C folks consider it is a fact of life that they have to carefully manage their pointers and the memory they point at. I do agree that "return value with more details in the out parameter whose address is supplied by the caller" is a convention that is easier to grok when written in C. Thanks.
On Wed, Oct 2, 2024 at 2:54 AM Phillip Wood <phillip.wood123@gmail.com> wrote: > > Hi brian > > Thanks for working on this. > > On 30/09/2024 23:03, brian m. carlson wrote: > > There is work underway to move some of the Git code out into a reusable > > library. In such a case, it's not very desirable to have the library > > code write random errors to standard error, since this is an antipattern > > which annoys terminal users. > > > > Instead, we will expect callers of our library function to return > > errors. > > Isn't it the callers that will expect the function to return an error? > > > The reusability of our library will be substantially improved > > if we return typed errors so that users can easily determine what kind > > of error might have occurred and react differently based on different > > contexts. For example, if we are looking up an object in a partial > > clone and it's not found, we might decide to download it, but we might > > return an error to the user if the problem is instead that the revision > > specified is not syntactically valid. > > > > To help the libification process and make our code more generally > > maintainable, add an error type. This consists of on 64-bit integer, > > which contains bit flags and a 32-bit code, and a pointer, which depends > > on the code. It is designed to be passed and returned by value, not > > pointer, and it is possible to do so in two registers on 64-bit systems. > > Similar functionality works well for error types in Rust and for the > > standard library's lldiv_t, so this should not pose a problem. > > Part of the reason it works well in rust is that it supports > discriminated unions with pattern matching and has the "?" macro for > early returns. In C the code ends up being quite verbose compared to > taking a pointer to error struct as a function parameter and returning a > boolean success/fail flag. > > struct git_error e; > struct object_id oid; > > e = repo_get_oid(r, "HEAD", &oid); > if (!GIT_ERROR_SUCCESS(e)) > return e; > > With a boolean return we can have > > struct object_id oid; > > if (repo_get_oid(r, "HEAD", &oid, e)) > return -1; > > where "e" is a "struct git_error*" passed into the function. I actually don't find this complaint all that compelling; it's not hard to write a shorter macro that can be used inline, so we can do things like: ERR_VAR(e); if(ERR(e, repo_get_oid(...)) return e; or even a macro to do the return if desired: ERR_VAR(e); // or, i guess we can be not SO lazy and just write struct git_error e here, whatever :) ) RETURN_IF_ERR(e, repo_get_oid(...)); For better or worse, you can do a lot of things in a macro, so I don't see verboseness as much of an issue because I think we can hide a lot of it this way. > > > Provide the ability to specify either an errno value or a git error code > > as the code. This allows us to use this type generically when handling > > errno values such as processing files, as well as express a rich set of > > possible error codes specific to Git. We pick an unsigned 32-bit code > > because Windows can use the full set of 32 bits in its error values, > > even though most Unix systems use only a small set of codes which > > traditionally start at 1. 32 bits for Git errors also allows us plenty > > of space to expand as we see fit. > > I think the design of the struct is fine. It does mean we need a global > list of error values. GError [1] avoids this by having a "domain" field > that is an interned string so that error codes only need to be unique > within a given domain. I think that for a self-contained project like > git a global list is probably fine - svn does this for example [2]. > > [1] https://docs.gtk.org/glib/error-reporting.html > [2] > https://github.com/apache/subversion/blob/be229fd54f5860b3140831671efbfd3f7f6fbb0b/subversion/include/svn_error_codes.h > > > Allow multiple errors to be provided and wrapped in a single object, > > which is useful in many situations, and add helpers to determine if any > > error in the set matches a particular code. > > The api appears to require the caller know up front how many errors > there will be which seems unlikely to be true in practice. I think a > more realistic design would allow callers to push errors as they occur > and grow the array accordingly. For example ref_transaction_prepare() > would want to return a list of errors, one for each ref that it was > unable to lock or which did not match the expected value but it would > not know up-front how many errors there were going to be. > > It would be useful to be able to add context to an error as the stack is > unwound. For example if unpack_trees() detects that it would overwrite > untracked files it prints a error listing those files. The exact > formatting of that message depends on the command being run. That is > currently handled by calling setup_unpack_trees_porcelain() with the > name of the command before calling unpack_trees(). In a world where > unpack_trees() returns a error containing the list of files we would > want to customize the message by adding some context before converting > it to a string. > > > Additionally, provide error formatting functions that produce a suitable > > localized string for ease of use. > > I share Emily's concern that this function has to know the details of > how to format every error. We could mitigate that somewhat using a > switch that calls external helper functions that do the actual formatting > > switch (e.code) { > case GIT_ERR_OBJECT_NOT_FOUND: > format_object_not_found(buf, e); /* lives in another file */ > break; > ... > > I know this is an RFC but I couldn't resist one code comment > > > +#define GIT_ERROR_SUCCESS(e) (((e).code == GIT_ERROR_BIT_INIT)) > > +#define GIT_ERROR_SUCCESS_CONSUME(e) ((git_error_free(&(e)).code == GIT_ERROR_BIT_INIT) > > git_error_free() returns the code as in integer so we don't need ".code" > here. Also our coding guidelines would suggest git_error_clear() for the > name of that function. > > > In conclusion I like the general idea but have concerns about the > verbosity of returning an error struct. It would be helpful to see some > examples of this being used to see how it works in practice. > > Best Wishes > > Phillip >
On Thu, Oct 3, 2024 at 12:05 PM Junio C Hamano <gitster@pobox.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > > That's why I said in my original response that I didn't understand > > your response to Phillip. You seem to be using a non-justification > > ("other programmers suffer, so Git programmers can suffer too") as a > > justification for a non-ergonomic design. > > The statement may be a bit too harsh [...] Yes, sorry if that came across as harsh, brian. The "Git programmers can suffer too" was my (clearly) poor tongue-in-cheek attempt to soften the earlier part of the email in case it sounded harsh.
On 2024-10-03 at 22:27:29, Eric Sunshine wrote: > On Thu, Oct 3, 2024 at 12:05 PM Junio C Hamano <gitster@pobox.com> wrote: > > Eric Sunshine <sunshine@sunshineco.com> writes: > > > That's why I said in my original response that I didn't understand > > > your response to Phillip. You seem to be using a non-justification > > > ("other programmers suffer, so Git programmers can suffer too") as a > > > justification for a non-ergonomic design. > > > > The statement may be a bit too harsh [...] > > Yes, sorry if that came across as harsh, brian. The "Git programmers > can suffer too" was my (clearly) poor tongue-in-cheek attempt to > soften the earlier part of the email in case it sounded harsh. I'm definitely not offended. We can certainly disagree on the merits of the proposal; hopefully we'll find some solution that the project generally approves of.
Hi Emily On 03/10/2024 17:17, Emily Shaffer wrote: > On Wed, Oct 2, 2024 at 2:54 AM Phillip Wood <phillip.wood123@gmail.com> wrote: >> >> Part of the reason it works well in rust is that it supports >> discriminated unions with pattern matching and has the "?" macro for >> early returns. In C the code ends up being quite verbose compared to >> taking a pointer to error struct as a function parameter and returning a >> boolean success/fail flag. >> >> struct git_error e; >> struct object_id oid; >> >> e = repo_get_oid(r, "HEAD", &oid); >> if (!GIT_ERROR_SUCCESS(e)) >> return e; >> >> With a boolean return we can have >> >> struct object_id oid; >> >> if (repo_get_oid(r, "HEAD", &oid, e)) >> return -1; >> >> where "e" is a "struct git_error*" passed into the function. > > I actually don't find this complaint all that compelling; it's not > hard to write a shorter macro that can be used inline, so we can do > things like: > > ERR_VAR(e); > if(ERR(e, repo_get_oid(...)) > return e; Right, but what's the advantage over passing the error struct as a function parameter? It feels like we're adding extra complexity and hiding the assignment to "e" to work around the inconvenience of returning a struct rather than a flag. Is there some other advantage to returning a struct that makes this worthwhile? Best Wishes Phillip > or even a macro to do the return if desired: > > ERR_VAR(e); // or, i guess we can be not SO lazy and just write > struct git_error e here, whatever :) ) > RETURN_IF_ERR(e, repo_get_oid(...)); > > For better or worse, you can do a lot of things in a macro, so I don't > see verboseness as much of an issue because I think we can hide a lot > of it this way. > >> >>> Provide the ability to specify either an errno value or a git error code >>> as the code. This allows us to use this type generically when handling >>> errno values such as processing files, as well as express a rich set of >>> possible error codes specific to Git. We pick an unsigned 32-bit code >>> because Windows can use the full set of 32 bits in its error values, >>> even though most Unix systems use only a small set of codes which >>> traditionally start at 1. 32 bits for Git errors also allows us plenty >>> of space to expand as we see fit. >> >> I think the design of the struct is fine. It does mean we need a global >> list of error values. GError [1] avoids this by having a "domain" field >> that is an interned string so that error codes only need to be unique >> within a given domain. I think that for a self-contained project like >> git a global list is probably fine - svn does this for example [2]. >> >> [1] https://docs.gtk.org/glib/error-reporting.html >> [2] >> https://github.com/apache/subversion/blob/be229fd54f5860b3140831671efbfd3f7f6fbb0b/subversion/include/svn_error_codes.h >> >>> Allow multiple errors to be provided and wrapped in a single object, >>> which is useful in many situations, and add helpers to determine if any >>> error in the set matches a particular code. >> >> The api appears to require the caller know up front how many errors >> there will be which seems unlikely to be true in practice. I think a >> more realistic design would allow callers to push errors as they occur >> and grow the array accordingly. For example ref_transaction_prepare() >> would want to return a list of errors, one for each ref that it was >> unable to lock or which did not match the expected value but it would >> not know up-front how many errors there were going to be. >> >> It would be useful to be able to add context to an error as the stack is >> unwound. For example if unpack_trees() detects that it would overwrite >> untracked files it prints a error listing those files. The exact >> formatting of that message depends on the command being run. That is >> currently handled by calling setup_unpack_trees_porcelain() with the >> name of the command before calling unpack_trees(). In a world where >> unpack_trees() returns a error containing the list of files we would >> want to customize the message by adding some context before converting >> it to a string. >> >>> Additionally, provide error formatting functions that produce a suitable >>> localized string for ease of use. >> >> I share Emily's concern that this function has to know the details of >> how to format every error. We could mitigate that somewhat using a >> switch that calls external helper functions that do the actual formatting >> >> switch (e.code) { >> case GIT_ERR_OBJECT_NOT_FOUND: >> format_object_not_found(buf, e); /* lives in another file */ >> break; >> ... >> >> I know this is an RFC but I couldn't resist one code comment >> >>> +#define GIT_ERROR_SUCCESS(e) (((e).code == GIT_ERROR_BIT_INIT)) >>> +#define GIT_ERROR_SUCCESS_CONSUME(e) ((git_error_free(&(e)).code == GIT_ERROR_BIT_INIT) >> >> git_error_free() returns the code as in integer so we don't need ".code" >> here. Also our coding guidelines would suggest git_error_clear() for the >> name of that function. >> >> >> In conclusion I like the general idea but have concerns about the >> verbosity of returning an error struct. It would be helpful to see some >> examples of this being used to see how it works in practice. >> >> Best Wishes >> >> Phillip >>
Hi brian On 02/10/2024 23:04, brian m. carlson wrote: > On 2024-10-02 at 09:54:52, Phillip Wood wrote: >> On 30/09/2024 23:03, brian m. carlson wrote: >> Part of the reason it works well in rust is that it supports discriminated >> unions with pattern matching and has the "?" macro for early returns. In C >> the code ends up being quite verbose compared to taking a pointer to error >> struct as a function parameter and returning a boolean success/fail flag. >> >> struct git_error e; >> struct object_id oid; >> >> e = repo_get_oid(r, "HEAD", &oid); >> if (!GIT_ERROR_SUCCESS(e)) >> return e; >> >> With a boolean return we can have >> >> struct object_id oid; >> >> if (repo_get_oid(r, "HEAD", &oid, e)) >> return -1; >> >> where "e" is a "struct git_error*" passed into the function. > > Yes, I agree that this is more verbose than in Rust. If we were using > Rust (which is a thing I'm working on), then we'd have nicer error > handling, but for now we don't. > > However, Go still uses this kind of error handling, and many people use > it every day with this limitation, so I don't think it's too awful for > what we're getting. I feel like I'm missing something - what is the advantage of returning an error struct rather than passing it as a parameter that makes the inconvenience of handling the return value worth while? >> In conclusion I like the general idea but have concerns about the verbosity >> of returning an error struct. It would be helpful to see some examples of >> this being used to see how it works in practice. > > If I send a v2, I'll try to wire up some code so folks can see some > examples. I think that would be really helpful. A couple of examples in the cover letter showing how you'd convert some function from our codebase would give us an idea of what using this api would look like in practice. Best Wishes Phillip
This issue reminded me of a system I used for quite a long time, quite a long time ago. Details of errors were handled using an "Auxiliary Stack". The caller receives a simple result value indicating Ok or error. If there is an error then the part of the system that instigated the error may create a frame on the aux stack and add relevant data to it. Then the initiating caller can just unstack any frames and display, or otherwise handle, (or completely ignore) the data therein. If a function detected a problem somewhere down the call stack it could stack whatever it liked, and then abort. Higher levels could add more frames if they wanted. Although I don't now remember the details it could have been that the convention was to stack a formatting sting together with a series of values. Then at the top level use a formatter function to turn all that into a single actual string to output (This was Pascal so didn't have C-style formatted string handling). This means that the only common data type required is a standard OK/Fail value (original was in Pascal so had strongly-typed Boolean for this). (Implementation of aux stack was in assembler, callers were Pascal). There is no need for a special return type. Integer or some sort of Boolean would suffice. There is no need for a special variable referencing an error status block to be passed into every call. This implicitly only allows one set of error information to be returned per call - aux tack could have several. There is no need for a common table of error code numbers agreed by all parts of the system. There is no need for a system of registered messages, or an Event Log (eg Windows mc, registry, RegisterEventSource etc). Regards, Richard.
On Mon, Sep 30, 2024 at 11:35:50PM +0000, brian m. carlson wrote: > On 2024-09-30 at 22:44:37, Junio C Hamano wrote: > > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > > > .... It is designed to be passed and returned by value, not > > > pointer, and it is possible to do so in two registers on 64-bit systems. > > > Similar functionality works well for error types in Rust and for the > > > standard library's lldiv_t, so this should not pose a problem. > > > > It should not, in the sense that "any reasonable platform should be > > able to pass two 64-bit word in a structure by value", but isn't it > > optimizing for a wrong (i.e. have error) case? In the case where > > there is no error, a "negative return is error, zero is success", > > with a pointer to "more detailed error info, in case the call > > resulted in an error", would let us take branch based on a zero-ness > > check on an integer in a machine-natural word, without even looking > > at these two words in the struct. > > We can adjust the first word so that it's always zero on success, in > which case, because it's returned in two registers, the processor will > be able to branch on a zero-ness check on one of those registers. (We > can even optimize the check by looking at the low 32 bits, which will do > the same for 32-bit machines as well.) The performance benefit will be > the same, and I should note that Rust does this kind of thing without a > problem. I was wondering the same here, also because having to write `git_error_ok()` is a bit unwieldy. One of my thoughts in this context was to not be shy of allocating the error structures such that we don't have to pass by value, but instead pass by pointer. It also gives us a bit more flexibility with the error structure itself, as we don't have to optimize for size anymore. Or at least not to the extent as we'd have to do with the current proposal. The obvious problem of course is if we're running out of memory. But I think we can easily special-case this and return a statically-allocated error specific to that situation, where `git_error_free()` would know not to free it. Patrick
diff --git a/Makefile b/Makefile index 7344a7f725..5d9bf992e9 100644 --- a/Makefile +++ b/Makefile @@ -1013,6 +1013,7 @@ LIB_OBJS += dir.o LIB_OBJS += editor.o LIB_OBJS += entry.o LIB_OBJS += environment.o +LIB_OBJS += error.o LIB_OBJS += ewah/bitmap.o LIB_OBJS += ewah/ewah_bitmap.o LIB_OBJS += ewah/ewah_io.o diff --git a/error.c b/error.c new file mode 100644 index 0000000000..713bc42187 --- /dev/null +++ b/error.c @@ -0,0 +1,43 @@ +#include "git-compat-util.h" +#include "gettext.h" +#include "error.h" +#include "hex.h" +#include "strbuf.h" + +const char *git_error_string(struct git_error err) +{ + struct strbuf buf = STRBUF_INIT; + if (!git_error_strbuf(&buf, err)) + return NULL; + return strbuf_detach(&buf, NULL); +} + +const char *git_error_strbuf(struct strbuf *buf, struct git_error err) +{ + if (GIT_ERROR_SUCCESS(err)) { + return NULL; + } else if (GIT_ERROR_ERRNO(err) != -1) { + return xstrdup(strerror(GIT_ERROR_ERRNO(err))); + } else { + struct git_error_multiple *me = err.meta; + switch (GIT_ERROR_GITERR(err)) { + case GIT_ERR_OBJECT_NOT_FOUND: + if (err.meta) + strbuf_addf(buf, _("object not found: %s"), oid_to_hex(err.meta)); + else + strbuf_addf(buf, _("object not found")); + case GIT_ERR_NULL_OID: + if (err.meta) + strbuf_addf(buf, _("null object ID not allowed in this context: %s"), (char *)err.meta); + else + strbuf_addf(buf, _("null object ID not allowed")); + case GIT_ERR_MULTIPLE: + strbuf_addf(buf, _("multiple errors:\n")); + for (size_t i = 0; i < me->count; i++) { + git_error_strbuf(buf, me->errs[i]); + strbuf_addstr(buf, "\n"); + } + } + return buf->buf; + } +} diff --git a/error.h b/error.h new file mode 100644 index 0000000000..485cca99e0 --- /dev/null +++ b/error.h @@ -0,0 +1,168 @@ +#ifndef ERROR_H +#define ERROR_H + +#include "git-compat-util.h" + +/* Set if this value is initialized. */ +#define GIT_ERROR_BIT_INIT (1ULL << 63) +/* Set if the code is an errno code, clear if it's a git error code. */ +#define GIT_ERROR_BIT_ERRNO (1ULL << 62) +/* + * Set if the memory in meta should be freed; otherwise, it's statically + * allocated. + */ +#define GIT_ERROR_BIT_ALLOC (1ULL << 61) +/* + * Set if the memory in meta is a C string; otherwise, it's a metadata struct. + */ +#define GIT_ERROR_BIT_MSG (1ULL << 60) + +#define GIT_ERROR_BIT_MASK (0xf << 60) + +#define GIT_ERROR_OK (git_error_ok()) + +#define GIT_ERROR_SUCCESS(e) (((e).code == GIT_ERROR_BIT_INIT)) +#define GIT_ERROR_SUCCESS_CONSUME(e) ((git_error_free(&(e)).code == GIT_ERROR_BIT_INIT) + +#define GIT_ERROR_ERRNO(e) (((e).code & GIT_ERROR_BIT_ERRNO) ? ((e).code & 0xffffffff) : -1) +#define GIT_ERROR_GITERR(e) (((e).code & GIT_ERROR_BIT_ERRNO) ? -1 : ((e).code & 0xffffffff)) + +/* + * A value representing an error in Git. + */ +struct git_error { + uint64_t code; + void *meta; +}; + +struct git_error_multiple { + struct git_error *errs; + size_t count; +}; + +enum git_error_code { + /* The operation was a success. */ + GIT_ERR_SUCCESS = 0, + /* An object ID was provided, but it was not found. + * + * meta will be NULL or a pointer to struct object ID. + */ + GIT_ERR_OBJECT_NOT_FOUND = 1, + /* + * An object ID was provided, but it is all zeros and that is not + * allowed. + * + * meta will be NULL or a message explaining the context. + */ + GIT_ERR_NULL_OID = 2, + /* + * Multiple errors occurred. + * + * meta must be a pointer to struct git_error_multiple. + */ + GIT_ERR_MULTIPLE = 3, +}; + +const char *git_error_string(struct git_error err); +const char *git_error_strbuf(struct strbuf *buf, struct git_error err); + +/* + * A successful error status. + */ +static inline struct git_error git_error_ok(void) { + struct git_error e = { + .code = 0 | GIT_ERROR_BIT_INIT, + .meta = NULL, + }; + return e; +} + +static inline struct git_error git_error_new_errno(uint32_t errnoval, const char *msg, int to_free) { + struct git_error e = { + .code = errnoval | GIT_ERROR_BIT_INIT | GIT_ERROR_BIT_ERRNO | + GIT_ERROR_BIT_MSG | (to_free ? GIT_ERROR_BIT_ALLOC : 0), + .meta = (void *)msg, + }; + return e; +} + +static inline struct git_error git_error_new_git(uint32_t gitcode, const char *msg, int to_free) { + struct git_error e = { + .code = gitcode | GIT_ERROR_BIT_INIT | + GIT_ERROR_BIT_MSG | (to_free ? GIT_ERROR_BIT_ALLOC : 0), + .meta = (void *)msg, + }; + return e; +} + +static inline struct git_error git_error_new_simple(uint32_t gitcode) { + struct git_error e = { + .code = gitcode | GIT_ERROR_BIT_INIT, + .meta = NULL, + }; + return e; +} + +static inline struct git_error git_error_new_multiple(struct git_error *errs, size_t count) +{ + struct git_error_multiple *me = xmalloc(sizeof(*me)); + struct git_error e = { + .code = GIT_ERR_MULTIPLE | GIT_ERROR_BIT_INIT | GIT_ERROR_BIT_ALLOC, + .meta = me, + }; + me->errs = errs; + me->count = count; + return e; +} + +/* + * If this is a git error and the code matches the given code, or if this is a + * multiple error and any of the contained errors are a git error whose code + * matches, returns a pointer to that error. If there is no match, returns + * NULL. + */ +static inline struct git_error *git_error_is_git(struct git_error *e, int code) { + int64_t giterr = GIT_ERROR_GITERR(*e); + if (giterr == code) { + return e; + } else if (giterr == GIT_ERR_MULTIPLE) { + struct git_error_multiple *me = e->meta; + for (size_t i = 0; i < me->count; i++) + return git_error_is_git(me->errs + i, code); + return NULL; + } else { + return NULL; + } +} + +/* + * If this is an errno error and the code matches the given code, or if this is + * a multiple error and any of the contained errors are an errno error whose + * code matches, returns a pointer to that error. Otherwise, returns NULL. + */ +static inline struct git_error *git_error_is_errno(struct git_error *e, int code) { + int64_t giterr = GIT_ERROR_GITERR(*e); + if (GIT_ERROR_ERRNO(*e) == code) { + return e; + } else if (giterr == GIT_ERR_MULTIPLE) { + struct git_error_multiple *me = e->meta; + for (size_t i = 0; i < me->count; i++) + return git_error_is_errno(me->errs + i, code); + return NULL; + } else { + return NULL; + } +} + +/* Frees and deinitializes this error structure. */ +static inline uint64_t git_error_free(struct git_error *e) +{ + uint64_t code = e->code; + if (e->code & GIT_ERROR_BIT_ALLOC) + free(e->meta); + e->code = 0; + e->meta = NULL; + return code; +} + +#endif
There is work underway to move some of the Git code out into a reusable library. In such a case, it's not very desirable to have the library code write random errors to standard error, since this is an antipattern which annoys terminal users. Instead, we will expect callers of our library function to return errors. The reusability of our library will be substantially improved if we return typed errors so that users can easily determine what kind of error might have occurred and react differently based on different contexts. For example, if we are looking up an object in a partial clone and it's not found, we might decide to download it, but we might return an error to the user if the problem is instead that the revision specified is not syntactically valid. To help the libification process and make our code more generally maintainable, add an error type. This consists of on 64-bit integer, which contains bit flags and a 32-bit code, and a pointer, which depends on the code. It is designed to be passed and returned by value, not pointer, and it is possible to do so in two registers on 64-bit systems. Similar functionality works well for error types in Rust and for the standard library's lldiv_t, so this should not pose a problem. Provide the ability to specify either an errno value or a git error code as the code. This allows us to use this type generically when handling errno values such as processing files, as well as express a rich set of possible error codes specific to Git. We pick an unsigned 32-bit code because Windows can use the full set of 32 bits in its error values, even though most Unix systems use only a small set of codes which traditionally start at 1. 32 bits for Git errors also allows us plenty of space to expand as we see fit. Allow multiple errors to be provided and wrapped in a single object, which is useful in many situations, and add helpers to determine if any error in the set matches a particular code. Additionally, provide error formatting functions that produce a suitable localized string for ease of use. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- Makefile | 1 + error.c | 43 ++++++++++++++ error.h | 168 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 212 insertions(+) create mode 100644 error.c create mode 100644 error.h