diff mbox series

[6/5] grep.c: mark eol/bol and derived as "const char * const"

Message ID patch-1.1-c317e6e125e-20210921T124416Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series const-correctness in grep.c | expand

Commit Message

Ævar Arnfjörð Bjarmason Sept. 21, 2021, 12:45 p.m. UTC
The control flow of how "bol" and "eol" are passed through the various
functions in grep.c can be hard to follow, because in some functions
we want e.g. an "eol pointer to const char", but in others we can do
with a "const pointer to const char", likewise for the "const char **"
case.

I think that it would be good to eventually change this code to mostly
take a "const char *const"/"const size_t" pair, as that's what both
regexec_buf() and the equivalent PCRE function consume[1], now we
convert to length with "eol - bol" in several places.

For any such future conversion, and for reading the code, it'll be
much easier if we're at a starting point of knowing what pointers we
modify where, so let's have the compiler help us with that.

This change was made by converting these to the strictest possible
"const" forms and seeing where we had errors, note that the lone
"const char ** const" can only be that strict, it can't be a "const
char * const * const".

This change doesn't matter for the compiler's optimization, both gcc
and clang generate practically (only address differences) the same
code under both -O0 and -O3.

1. https://lore.kernel.org/git/87czp29l2c.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

A suggested patch either on top of Jeff King's
<YUlVZk1xXulAqdef@coredump.intra.peff.net> or even for squashing into
his series.

I think that generally git's codebase could use going beyond just
"const char *" when a "const char * const" would suffice, for some
reason we seem to mostly use it for the static usage variables.

In this particular case I think it makes the flow of grep.c much
easier to reason about. You can immediately see which functions are
twiddling the bol/eof pointers and which aren't.

 grep.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

Comments

Jeff King Sept. 21, 2021, 2:53 p.m. UTC | #1
On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I think that generally git's codebase could use going beyond just
> "const char *" when a "const char * const" would suffice, for some
> reason we seem to mostly use it for the static usage variables.

I didn't dig up the references in the list archive, but I feel like
we've had this discussion long ago. One of the reasons not to do so is
that it pollutes the function's interface with internal details. The
caller does not care whether the function is going to modify the pointer
itself, because it is passed by value. You could apply the same logic
that we should be passing "const int", and so on.

-Peff
Ævar Arnfjörð Bjarmason Sept. 21, 2021, 3:17 p.m. UTC | #2
On Tue, Sep 21 2021, Jeff King wrote:

> On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I think that generally git's codebase could use going beyond just
>> "const char *" when a "const char * const" would suffice, for some
>> reason we seem to mostly use it for the static usage variables.
>
> I didn't dig up the references in the list archive, but I feel like
> we've had this discussion long ago. One of the reasons not to do so is
> that it pollutes the function's interface with internal details.[...]

Are there cases in my conversion where the caller has to do anything
special that they didn't before? These are also all static functions, so
it's all internal details exported to nobody.

> The caller does not care whether the function is going to modify the
> pointer itself, because it is passed by value.

Sure, it's for increased clarity of reading te function in question, not
its although in one case we pass a ** so you can see what exactly we
modify both from the callers and function perspective.

> You could apply the same logic that we should be passing "const int",
> and so on.

Sure, in general I think churn like that isn't worth it, and "const char
*" is usually good enough, even if it could be "const char * const" or
whatever.

I think it makes senes in this specific case where you need to read one
function after another with "bol" and "eol" variables, with some
treating their copy as immutable, others not etc. Particularly if we'd
convert it to some other style (e.g. str/len) we can see if an entire
chain of functions can all be safely changed over.
Jeff King Sept. 21, 2021, 7:18 p.m. UTC | #3
On Tue, Sep 21, 2021 at 05:17:57PM +0200, Ævar Arnfjörð Bjarmason wrote:

> 
> On Tue, Sep 21 2021, Jeff King wrote:
> 
> > On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >
> >> I think that generally git's codebase could use going beyond just
> >> "const char *" when a "const char * const" would suffice, for some
> >> reason we seem to mostly use it for the static usage variables.
> >
> > I didn't dig up the references in the list archive, but I feel like
> > we've had this discussion long ago. One of the reasons not to do so is
> > that it pollutes the function's interface with internal details.[...]
> 
> Are there cases in my conversion where the caller has to do anything
> special that they didn't before? These are also all static functions, so
> it's all internal details exported to nobody.

No, they don't have to do anything differently. I just meant that it
clutters the interface when a human is reading it.

-Peff
Junio C Hamano Sept. 22, 2021, 7:02 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I think that generally git's codebase could use going beyond just
>> "const char *" when a "const char * const" would suffice, for some
>> reason we seem to mostly use it for the static usage variables.
>
> I didn't dig up the references in the list archive, but I feel like
> we've had this discussion long ago. One of the reasons not to do so is
> that it pollutes the function's interface with internal details. The
> caller does not care whether the function is going to modify the pointer
> itself, because it is passed by value. You could apply the same logic
> that we should be passing "const int", and so on.

Yes.  "This pointer is not modified" is a good thing to have inside
an implementation, but the callers should not have to care.

Thanks.
Ævar Arnfjörð Bjarmason Sept. 23, 2021, 1:56 p.m. UTC | #5
On Tue, Sep 21 2021, Jeff King wrote:

> On Tue, Sep 21, 2021 at 05:17:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> 
>> On Tue, Sep 21 2021, Jeff King wrote:
>> 
>> > On Tue, Sep 21, 2021 at 02:45:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >
>> >> I think that generally git's codebase could use going beyond just
>> >> "const char *" when a "const char * const" would suffice, for some
>> >> reason we seem to mostly use it for the static usage variables.
>> >
>> > I didn't dig up the references in the list archive, but I feel like
>> > we've had this discussion long ago. One of the reasons not to do so is
>> > that it pollutes the function's interface with internal details.[...]
>> 
>> Are there cases in my conversion where the caller has to do anything
>> special that they didn't before? These are also all static functions, so
>> it's all internal details exported to nobody.
>
> No, they don't have to do anything differently. I just meant that it
> clutters the interface when a human is reading it.

Fair enough, to this human reading it (and I'm not new to the grep.c
code) I find myself trying and failing to mentally track what gets
modified where, but anyway, I think that point is understood and I won't
argue for it ad nauseum.

Just replied to say to Junio's <xmqqh7ec77s3.fsf@gitster.g> in the
side-thread:

> Yes.  "This pointer is not modified" is a good thing to have inside
> an implementation, but the callers should not have to care.

That I just don't understand, i.e. how the variable is defined in this
6/5 pertains to how it gets used /inside/ the function in this case, the
caller code doesn't have to care, but perhaps I'm missing something...
Junio C Hamano Sept. 24, 2021, 4:22 a.m. UTC | #6
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> That I just don't understand, i.e. how the variable is defined in this
> 6/5 pertains to how it gets used /inside/ the function in this case, the
> caller code doesn't have to care, but perhaps I'm missing something...


-static struct grep_pat *create_grep_pat(const char *pat, size_t patlen,
+static struct grep_pat *create_grep_pat(const char *const pat, size_t patlen,
 					const char *origin, int no,
 					enum grep_pat_token t,
 					enum grep_header_field field)

Because the function signature is visible to the caller of this
function, it sees that the callee, as part of its implementation
detail, keeps "pat" pointer pointing at the same location.  That is
not something the caller needs to care, even though the caller may
deeply care that the memory locations "pat" points at will not be
changed via the pointer by calling this function.

On the other hand

@@ -1438,7 +1439,7 @@ static int look_ahead(struct grep_opt *opt,
 		      const char **bol_p)
 {
 	unsigned lno = *lno_p;
-	const char *bol = *bol_p;
+	const char * const bol = *bol_p;
 	struct grep_pat *p;
 	const char *sp, *last_bol;
 	regoff_t earliest = -1;

This one is totally invisible to the caller of the function, and is
a good documentation for those who read the implementation of it.

Thanks.
diff mbox series

Patch

diff --git a/grep.c b/grep.c
index 14fe8a0fd23..4e266769931 100644
--- a/grep.c
+++ b/grep.c
@@ -206,7 +206,7 @@  void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_o
 		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);
 }
 
-static struct grep_pat *create_grep_pat(const char *pat, size_t patlen,
+static struct grep_pat *create_grep_pat(const char *const pat, size_t patlen,
 					const char *origin, int no,
 					enum grep_pat_token t,
 					enum grep_header_field field)
@@ -436,8 +436,8 @@  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	}
 }
 
-static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
-		regmatch_t *match, int eflags)
+static int pcre2match(struct grep_pat *p, const char * const line,
+		      const char * const eol, regmatch_t *match, int eflags)
 {
 	int ret, flags = 0;
 	PCRE2_SIZE *ovector;
@@ -489,8 +489,9 @@  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
 }
 
-static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
-		regmatch_t *match, int eflags)
+static int pcre2match(struct grep_pat *p,
+		      const char * const line, const char * const eol,
+		      regmatch_t *match, int eflags)
 {
 	return 1;
 }
@@ -909,7 +910,7 @@  static void show_name(struct grep_opt *opt, const char *name)
 }
 
 static int patmatch(struct grep_pat *p,
-		    const char *line, const char *eol,
+		    const char * const line, const char * const eol,
 		    regmatch_t *match, int eflags)
 {
 	int hit;
@@ -923,7 +924,7 @@  static int patmatch(struct grep_pat *p,
 	return hit;
 }
 
-static void strip_timestamp(const char *bol, const char **eol_p)
+static void strip_timestamp(const char * const bol, const char ** const eol_p)
 {
 	const char *eol = *eol_p;
 
@@ -1026,7 +1027,7 @@  static int match_one_pattern(struct grep_pat *p,
 }
 
 static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x,
-			   const char *bol, const char *eol,
+			   const char *bol, const char * const eol,
 			   enum grep_context ctx, ssize_t *col,
 			   ssize_t *icol, int collect_hits)
 {
@@ -1095,7 +1096,7 @@  static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x,
 }
 
 static int match_expr(struct grep_opt *opt,
-		      const char *bol, const char *eol,
+		      const char *bol, const char * const eol,
 		      enum grep_context ctx, ssize_t *col,
 		      ssize_t *icol, int collect_hits)
 {
@@ -1104,7 +1105,7 @@  static int match_expr(struct grep_opt *opt,
 }
 
 static int match_line(struct grep_opt *opt,
-		      const char *bol, const char *eol,
+		      const char *bol, const char * const eol,
 		      ssize_t *col, ssize_t *icol,
 		      enum grep_context ctx, int collect_hits)
 {
@@ -1137,7 +1138,7 @@  static int match_line(struct grep_opt *opt,
 }
 
 static int match_next_pattern(struct grep_pat *p,
-			      const char *bol, const char *eol,
+			      const char * const bol, const char * const eol,
 			      enum grep_context ctx,
 			      regmatch_t *pmatch, int eflags)
 {
@@ -1159,7 +1160,7 @@  static int match_next_pattern(struct grep_pat *p,
 }
 
 static int next_match(struct grep_opt *opt,
-		      const char *bol, const char *eol,
+		      const char * const bol, const char * const eol,
 		      enum grep_context ctx, regmatch_t *pmatch, int eflags)
 {
 	struct grep_pat *p;
@@ -1216,7 +1217,7 @@  static void show_line_header(struct grep_opt *opt, const char *name,
 }
 
 static void show_line(struct grep_opt *opt,
-		      const char *bol, const char *eol,
+		      const char *bol, const char * const eol,
 		      const char *name, unsigned lno, ssize_t cno, char sign)
 {
 	int rest = eol - bol;
@@ -1306,7 +1307,7 @@  static inline void grep_attr_unlock(void)
 }
 
 static int match_funcname(struct grep_opt *opt, struct grep_source *gs,
-			  const char *bol, const char *eol)
+			  const char * const bol, const char * const eol)
 {
 	xdemitconf_t *xecfg = opt->priv;
 	if (xecfg && !xecfg->find_func) {
@@ -1336,7 +1337,7 @@  static void show_funcname_line(struct grep_opt *opt, struct grep_source *gs,
 			       const char *bol, unsigned lno)
 {
 	while (bol > gs->buf) {
-		const char *eol = --bol;
+		const char * const eol = --bol;
 
 		while (bol > gs->buf && bol[-1] != '\n')
 			bol--;
@@ -1352,7 +1353,7 @@  static void show_funcname_line(struct grep_opt *opt, struct grep_source *gs,
 	}
 }
 
-static int is_empty_line(const char *bol, const char *eol);
+static int is_empty_line(const char * const bol, const char * const eol);
 
 static void show_pre_context(struct grep_opt *opt, struct grep_source *gs,
 			     const char *bol, const char *end, unsigned lno)
@@ -1375,8 +1376,8 @@  static void show_pre_context(struct grep_opt *opt, struct grep_source *gs,
 
 	/* Rewind. */
 	while (bol > gs->buf && cur > from) {
-		const char *next_bol = bol;
-		const char *eol = --bol;
+		const char * const next_bol = bol;
+		const char * const eol = --bol;
 
 		while (bol > gs->buf && bol[-1] != '\n')
 			bol--;
@@ -1438,7 +1439,7 @@  static int look_ahead(struct grep_opt *opt,
 		      const char **bol_p)
 {
 	unsigned lno = *lno_p;
-	const char *bol = *bol_p;
+	const char * const bol = *bol_p;
 	struct grep_pat *p;
 	const char *sp, *last_bol;
 	regoff_t earliest = -1;
@@ -1533,7 +1534,7 @@  static int fill_textconv_grep(struct repository *r,
 	return 0;
 }
 
-static int is_empty_line(const char *bol, const char *eol)
+static int is_empty_line(const char *bol, const char * const eol)
 {
 	while (bol < eol && isspace(*bol))
 		bol++;