Message ID | patch-1.1-c317e6e125e-20210921T124416Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | const-correctness in grep.c | expand |
On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote: > I think that generally git's codebase could use going beyond just > "const char *" when a "const char * const" would suffice, for some > reason we seem to mostly use it for the static usage variables. I didn't dig up the references in the list archive, but I feel like we've had this discussion long ago. One of the reasons not to do so is that it pollutes the function's interface with internal details. The caller does not care whether the function is going to modify the pointer itself, because it is passed by value. You could apply the same logic that we should be passing "const int", and so on. -Peff
On Tue, Sep 21 2021, Jeff King wrote: > On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> I think that generally git's codebase could use going beyond just >> "const char *" when a "const char * const" would suffice, for some >> reason we seem to mostly use it for the static usage variables. > > I didn't dig up the references in the list archive, but I feel like > we've had this discussion long ago. One of the reasons not to do so is > that it pollutes the function's interface with internal details.[...] Are there cases in my conversion where the caller has to do anything special that they didn't before? These are also all static functions, so it's all internal details exported to nobody. > The caller does not care whether the function is going to modify the > pointer itself, because it is passed by value. Sure, it's for increased clarity of reading te function in question, not its although in one case we pass a ** so you can see what exactly we modify both from the callers and function perspective. > You could apply the same logic that we should be passing "const int", > and so on. Sure, in general I think churn like that isn't worth it, and "const char *" is usually good enough, even if it could be "const char * const" or whatever. I think it makes senes in this specific case where you need to read one function after another with "bol" and "eol" variables, with some treating their copy as immutable, others not etc. Particularly if we'd convert it to some other style (e.g. str/len) we can see if an entire chain of functions can all be safely changed over.
On Tue, Sep 21, 2021 at 05:17:57PM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Sep 21 2021, Jeff King wrote: > > > On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote: > > > >> I think that generally git's codebase could use going beyond just > >> "const char *" when a "const char * const" would suffice, for some > >> reason we seem to mostly use it for the static usage variables. > > > > I didn't dig up the references in the list archive, but I feel like > > we've had this discussion long ago. One of the reasons not to do so is > > that it pollutes the function's interface with internal details.[...] > > Are there cases in my conversion where the caller has to do anything > special that they didn't before? These are also all static functions, so > it's all internal details exported to nobody. No, they don't have to do anything differently. I just meant that it clutters the interface when a human is reading it. -Peff
Jeff King <peff@peff.net> writes: > On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> I think that generally git's codebase could use going beyond just >> "const char *" when a "const char * const" would suffice, for some >> reason we seem to mostly use it for the static usage variables. > > I didn't dig up the references in the list archive, but I feel like > we've had this discussion long ago. One of the reasons not to do so is > that it pollutes the function's interface with internal details. The > caller does not care whether the function is going to modify the pointer > itself, because it is passed by value. You could apply the same logic > that we should be passing "const int", and so on. Yes. "This pointer is not modified" is a good thing to have inside an implementation, but the callers should not have to care. Thanks.
On Tue, Sep 21 2021, Jeff King wrote: > On Tue, Sep 21, 2021 at 05:17:57PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> >> On Tue, Sep 21 2021, Jeff King wrote: >> >> > On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote: >> > >> >> I think that generally git's codebase could use going beyond just >> >> "const char *" when a "const char * const" would suffice, for some >> >> reason we seem to mostly use it for the static usage variables. >> > >> > I didn't dig up the references in the list archive, but I feel like >> > we've had this discussion long ago. One of the reasons not to do so is >> > that it pollutes the function's interface with internal details.[...] >> >> Are there cases in my conversion where the caller has to do anything >> special that they didn't before? These are also all static functions, so >> it's all internal details exported to nobody. > > No, they don't have to do anything differently. I just meant that it > clutters the interface when a human is reading it. Fair enough, to this human reading it (and I'm not new to the grep.c code) I find myself trying and failing to mentally track what gets modified where, but anyway, I think that point is understood and I won't argue for it ad nauseum. Just replied to say to Junio's <xmqqh7ec77s3.fsf@gitster.g> in the side-thread: > Yes. "This pointer is not modified" is a good thing to have inside > an implementation, but the callers should not have to care. That I just don't understand, i.e. how the variable is defined in this 6/5 pertains to how it gets used /inside/ the function in this case, the caller code doesn't have to care, but perhaps I'm missing something...
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > That I just don't understand, i.e. how the variable is defined in this > 6/5 pertains to how it gets used /inside/ the function in this case, the > caller code doesn't have to care, but perhaps I'm missing something... -static struct grep_pat *create_grep_pat(const char *pat, size_t patlen, +static struct grep_pat *create_grep_pat(const char *const pat, size_t patlen, const char *origin, int no, enum grep_pat_token t, enum grep_header_field field) Because the function signature is visible to the caller of this function, it sees that the callee, as part of its implementation detail, keeps "pat" pointer pointing at the same location. That is not something the caller needs to care, even though the caller may deeply care that the memory locations "pat" points at will not be changed via the pointer by calling this function. On the other hand @@ -1438,7 +1439,7 @@ static int look_ahead(struct grep_opt *opt, const char **bol_p) { unsigned lno = *lno_p; - const char *bol = *bol_p; + const char * const bol = *bol_p; struct grep_pat *p; const char *sp, *last_bol; regoff_t earliest = -1; This one is totally invisible to the caller of the function, and is a good documentation for those who read the implementation of it. Thanks.
diff --git a/grep.c b/grep.c index 14fe8a0fd23..4e266769931 100644 --- a/grep.c +++ b/grep.c @@ -206,7 +206,7 @@ void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_o grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt); } -static struct grep_pat *create_grep_pat(const char *pat, size_t patlen, +static struct grep_pat *create_grep_pat(const char *const pat, size_t patlen, const char *origin, int no, enum grep_pat_token t, enum grep_header_field field) @@ -436,8 +436,8 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt } } -static int pcre2match(struct grep_pat *p, const char *line, const char *eol, - regmatch_t *match, int eflags) +static int pcre2match(struct grep_pat *p, const char * const line, + const char * const eol, regmatch_t *match, int eflags) { int ret, flags = 0; PCRE2_SIZE *ovector; @@ -489,8 +489,9 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE"); } -static int pcre2match(struct grep_pat *p, const char *line, const char *eol, - regmatch_t *match, int eflags) +static int pcre2match(struct grep_pat *p, + const char * const line, const char * const eol, + regmatch_t *match, int eflags) { return 1; } @@ -909,7 +910,7 @@ static void show_name(struct grep_opt *opt, const char *name) } static int patmatch(struct grep_pat *p, - const char *line, const char *eol, + const char * const line, const char * const eol, regmatch_t *match, int eflags) { int hit; @@ -923,7 +924,7 @@ static int patmatch(struct grep_pat *p, return hit; } -static void strip_timestamp(const char *bol, const char **eol_p) +static void strip_timestamp(const char * const bol, const char ** const eol_p) { const char *eol = *eol_p; @@ -1026,7 +1027,7 @@ static int match_one_pattern(struct grep_pat *p, } static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, - const char *bol, const char *eol, + const char *bol, const char * const eol, enum grep_context ctx, ssize_t *col, ssize_t *icol, int collect_hits) { @@ -1095,7 +1096,7 @@ static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, } static int match_expr(struct grep_opt *opt, - const char *bol, const char *eol, + const char *bol, const char * const eol, enum grep_context ctx, ssize_t *col, ssize_t *icol, int collect_hits) { @@ -1104,7 +1105,7 @@ static int match_expr(struct grep_opt *opt, } static int match_line(struct grep_opt *opt, - const char *bol, const char *eol, + const char *bol, const char * const eol, ssize_t *col, ssize_t *icol, enum grep_context ctx, int collect_hits) { @@ -1137,7 +1138,7 @@ static int match_line(struct grep_opt *opt, } static int match_next_pattern(struct grep_pat *p, - const char *bol, const char *eol, + const char * const bol, const char * const eol, enum grep_context ctx, regmatch_t *pmatch, int eflags) { @@ -1159,7 +1160,7 @@ static int match_next_pattern(struct grep_pat *p, } static int next_match(struct grep_opt *opt, - const char *bol, const char *eol, + const char * const bol, const char * const eol, enum grep_context ctx, regmatch_t *pmatch, int eflags) { struct grep_pat *p; @@ -1216,7 +1217,7 @@ static void show_line_header(struct grep_opt *opt, const char *name, } static void show_line(struct grep_opt *opt, - const char *bol, const char *eol, + const char *bol, const char * const eol, const char *name, unsigned lno, ssize_t cno, char sign) { int rest = eol - bol; @@ -1306,7 +1307,7 @@ static inline void grep_attr_unlock(void) } static int match_funcname(struct grep_opt *opt, struct grep_source *gs, - const char *bol, const char *eol) + const char * const bol, const char * const eol) { xdemitconf_t *xecfg = opt->priv; if (xecfg && !xecfg->find_func) { @@ -1336,7 +1337,7 @@ static void show_funcname_line(struct grep_opt *opt, struct grep_source *gs, const char *bol, unsigned lno) { while (bol > gs->buf) { - const char *eol = --bol; + const char * const eol = --bol; while (bol > gs->buf && bol[-1] != '\n') bol--; @@ -1352,7 +1353,7 @@ static void show_funcname_line(struct grep_opt *opt, struct grep_source *gs, } } -static int is_empty_line(const char *bol, const char *eol); +static int is_empty_line(const char * const bol, const char * const eol); static void show_pre_context(struct grep_opt *opt, struct grep_source *gs, const char *bol, const char *end, unsigned lno) @@ -1375,8 +1376,8 @@ static void show_pre_context(struct grep_opt *opt, struct grep_source *gs, /* Rewind. */ while (bol > gs->buf && cur > from) { - const char *next_bol = bol; - const char *eol = --bol; + const char * const next_bol = bol; + const char * const eol = --bol; while (bol > gs->buf && bol[-1] != '\n') bol--; @@ -1438,7 +1439,7 @@ static int look_ahead(struct grep_opt *opt, const char **bol_p) { unsigned lno = *lno_p; - const char *bol = *bol_p; + const char * const bol = *bol_p; struct grep_pat *p; const char *sp, *last_bol; regoff_t earliest = -1; @@ -1533,7 +1534,7 @@ static int fill_textconv_grep(struct repository *r, return 0; } -static int is_empty_line(const char *bol, const char *eol) +static int is_empty_line(const char *bol, const char * const eol) { while (bol < eol && isspace(*bol)) bol++;
The control flow of how "bol" and "eol" are passed through the various functions in grep.c can be hard to follow, because in some functions we want e.g. an "eol pointer to const char", but in others we can do with a "const pointer to const char", likewise for the "const char **" case. I think that it would be good to eventually change this code to mostly take a "const char *const"/"const size_t" pair, as that's what both regexec_buf() and the equivalent PCRE function consume[1], now we convert to length with "eol - bol" in several places. For any such future conversion, and for reading the code, it'll be much easier if we're at a starting point of knowing what pointers we modify where, so let's have the compiler help us with that. This change was made by converting these to the strictest possible "const" forms and seeing where we had errors, note that the lone "const char ** const" can only be that strict, it can't be a "const char * const * const". This change doesn't matter for the compiler's optimization, both gcc and clang generate practically (only address differences) the same code under both -O0 and -O3. 1. https://lore.kernel.org/git/87czp29l2c.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- A suggested patch either on top of Jeff King's <YUlVZk1xXulAqdef@coredump.intra.peff.net> or even for squashing into his series. I think that generally git's codebase could use going beyond just "const char *" when a "const char * const" would suffice, for some reason we seem to mostly use it for the static usage variables. In this particular case I think it makes the flow of grep.c much easier to reason about. You can immediately see which functions are twiddling the bol/eof pointers and which aren't. grep.c | 41 +++++++++++++++++++++-------------------- 1 file changed, 21 insertions(+), 20 deletions(-)