Message ID | 6694c52b38674859eb0390c7f62da1209a8d8ec3.1670266373.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | git-compat-util.h: introduce CALLOC(x) | expand |
Am 05.12.22 um 19:54 schrieb Taylor Blau: > When zero-initializing a struct without the use of static initializers > like "{ 0 }", it is common to write one of: > > T *ptr = xcalloc(1, sizeof(T)); > T *ptr = xcalloc(1, sizeof(*ptr)); > > These correctly initialize "*ptr" to the all-zeros value, but have a > couple of drawbacks. Notably, both initializations are verbose, but the > former is a foot-gun. If "*ptr"'s type changes to something other than > "T", the programmer has to remember to update not only the type of the > variable, but the right-hand side of its initialization, too. > > In git.git, it is sometimes common to write something like: > > T *ptr; > CALLOC_ARRAY(ptr, 1); > > ...but that is confusing, since we're not initializing an array. Rather, > we're using the CALLOC_ARRAY() macro to pretend like we are. But since > the array length is 1, the effect is zero initializing a single element. An object and a single-element array of objects allocated on the heap are indistinguishable. In that sense there is no confusion -- we are indeed allocating a single-element array. But if the intent is to only get one thing then having to fill in 1 in the bulk order form is a bit tedious, especially since this is the most common kind of request. A shortcut for the frequent case makes sense. > Introduce a new variant, CALLOC(x), which initializes "x" to the > all-zeros value, without exposing the confusing use of CALLOC_ARRAY(), > or the foot-guns available when using xcalloc() directly. AFAIK the first "c" in "calloc" is for "continuous", not "zeroed". A single object is always contiguous, so the "C" in "CALLOC" is tautologic if read in that way. It fits our naming scheme for ALLOC_ARRAY and CALLOC_ARRAY, though, so that might not be much of a problem. And there are lots of occurrences of xmalloc(sizeof(T)) and xmalloc(sizeof(*ptr)) that would benefit from the automatic size inference of ALLOC_ARRAY -- an ALLOC macro would complement CALLOC nicely. > Write a Coccinelle patch which codifies these rules, but mark it as > "pending" since the resulting diff is too large to include in a single > patch: > > $ git apply .build/contrib/coccinelle/xcalloc.pending.cocci.patch > $ git diff --shortstat > 89 files changed, 221 insertions(+), 178 deletions(-) > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > This is a follow-up on [1], where introducing CALLOC(x) as the preferred > alternative to CALLOC_ARRAY(x, 1) was first suggested. > > The patch is straightforward, and I am pretty sure that the Coccinelle > rules are right, too ;-). But as a first-time Coccinelle user, any extra > scrutiny on those bits would be appreciated. > > The main point of discussion I think is whether we should consider > adopting this rule. I am biased, of course, but I think that we should. > > In any case, we should focus our efforts on polishing v2.39.0, but I > wanted to send this out to the list before I forgot about it. > > [1]: https://lore.kernel.org/git/Y1MrXoobkVKngYL1@coredump.intra.peff.net/ > > contrib/coccinelle/xcalloc.pending.cocci | 21 +++++++++++++++++++++ > git-compat-util.h | 8 ++++++++ > 2 files changed, 29 insertions(+) > create mode 100644 contrib/coccinelle/xcalloc.pending.cocci > > diff --git a/contrib/coccinelle/xcalloc.pending.cocci b/contrib/coccinelle/xcalloc.pending.cocci > new file mode 100644 > index 0000000000..83e4ca1a68 > --- /dev/null > +++ b/contrib/coccinelle/xcalloc.pending.cocci > @@ -0,0 +1,21 @@ > +@@ > +type T; > +T *ptr; > +@@ > +- ptr = xcalloc(1, \( sizeof(T) \| sizeof(*ptr) \) ) > ++ CALLOC(ptr) > + > +@@ > +type T; > +identifier ptr; > +@@ > +- T ptr = xcalloc(1, \( sizeof(T) \| sizeof(*ptr) \) ); > ++ T ptr; > ++ CALLOC(ptr); This rule would turn this code: struct foo *bar = xcalloc(1, sizeof(*bar)); int i; ... into: struct foo *bar; CALLOC(bar); int i; ... which violates the coding guideline to not mix declarations and statements (-Wdeclaration-after-statement). > + > +@@ > +type T; > +T *ptr; > +@@ > +- CALLOC_ARRAY(ptr, 1) > ++ CALLOC(ptr) > diff --git a/git-compat-util.h b/git-compat-util.h > index a76d0526f7..827e5be89c 100644 > --- a/git-compat-util.h > +++ b/git-compat-util.h > @@ -1107,6 +1107,14 @@ static inline void move_array(void *dst, const void *src, size_t n, size_t size) > memmove(dst, src, st_mult(size, n)); > } > > +/* > + * Like CALLOC_ARRAY, but the argument is treated as a pointer to a > + * single struct. > + * > + * Preferable over xcalloc(1, sizeof(...)), or CALLOC_ARRAY(..., 1). > + */ > +#define CALLOC(x) (x) = CALLOC_ARRAY((x), 1) > + > /* > * These functions help you allocate structs with flex arrays, and copy > * the data directly into the array. For example, if you had: > -- > 2.38.0.16.g393fd4c6db
On Mon, Dec 05, 2022 at 10:01:11PM +0100, René Scharfe wrote: > This rule would turn this code: > > struct foo *bar = xcalloc(1, sizeof(*bar)); > int i; > > ... into: > > struct foo *bar; > CALLOC(bar); > int i; > > ... which violates the coding guideline to not mix declarations and > statements (-Wdeclaration-after-statement). Yeah, I was wondering about this myself when I wrote this part of the Coccinelle patch. Is there an intelligent way to tell it to put the first statement after all declarations? I couldn't find anything after a quick scan of the documentation nor our own patches. Thanks, Taylor
On Mon, Dec 05 2022, Taylor Blau wrote: > On Mon, Dec 05, 2022 at 10:01:11PM +0100, René Scharfe wrote: >> This rule would turn this code: >> >> struct foo *bar = xcalloc(1, sizeof(*bar)); >> int i; >> >> ... into: >> >> struct foo *bar; >> CALLOC(bar); >> int i; >> >> ... which violates the coding guideline to not mix declarations and >> statements (-Wdeclaration-after-statement). > > Yeah, I was wondering about this myself when I wrote this part of the > Coccinelle patch. > > Is there an intelligent way to tell it to put the first statement after > all declarations? I couldn't find anything after a quick scan of the > documentation nor our own patches. We can get that working, but I think it's good to establish some ground rules first: First, when you add *.pending.cocci rules they shouldn't be pseudocode, but things that are too large to apply right now. I think my recent 041df69edd3 (Merge branch 'ab/fewer-the-index-macros', 2022-11-28) is a good example. In this case if I apply the proposed patch and do: make contrib/coccinelle/xcalloc.pending.cocci.patch && patch -p1 <contrib/coccinelle/xcalloc.pending.cocci.patch && make We have a deluge of syntax errors, not just the warnings René notes. This needs to be fixed, the *.pending.cocci must actually work on our code. (We should probably modify the "static-analysis" to actually apply the result of these *.pending patches and compile with those, but until then we can manually test them). Second, we have test support for rules now, see contrib/coccinelle/tests/. You just need to create a contrib/coccinelle/tests/xcalloc.pending.c, and have the expected post-image in contrib/coccinelle/tests/xcalloc.pending.res. Please add one of those. We don't have them for existing rules, but it really helps to assert & review new rules. The various edge cases that your current *.cocci doesn't compile on etc. should go into that test file. Third, the resulting patch is currently ~2k lines. Can we really not make it non-pending with some whitelisting/gblacklisting. E.g. see this out-of-tree patch for an example of opting out certain functions: https://github.com/avar/git/commit/bedd0323e9b I think that's preferable to having some *.pending.cocci we may or may not get to. I think you can also probably write a working rule for this to incrementally target certain subsets of these, and we could just mass-convert some stuff already (if it doesn't conflict with in-flight etc.). Fourth, I must say I'm kind of negative on the proposed change. I.e. the foot-gun is definitely worth solving, but why not just create a coccinelle rule to narrowly address that? In that case we can presumably start with it non-pending, as we don't have that many of them. On the notion that we need to special-case: - CALLOC_ARRAY(ptr, 1) + CALLOC(ptr) Because an array of one is "not an array" I don't really buy it. The calloc() interface itself exposes that, so I don't see why we'd consider those separate on the API level for these wrapper macros. I.e. I think the point of the macro should be to provide the appropiate sizeof() for you, not to be opinionated that nmemb=1 should be special-cased.I think it's probably not worth the churn to make that transformation. But if we *are* doing that then surely we should provide the full set of functions. I.e. ALLOC() and ALLOC_ARRAY(), CALLOC() and CALLOC_ARRAY(), and REALLOC() and REALLOC_ARRAY()?
Taylor Blau <me@ttaylorr.com> writes: > In git.git, it is sometimes common to write something like: > > T *ptr; > CALLOC_ARRAY(ptr, 1); > > ...but that is confusing, since we're not initializing an array. Given that "man calloc" tells us that calloc takes two parameters, void *calloc(size_t nmemb, size_t size); I personally find CALLOC() that takes only a single parameter and is capable only to allocate a single element array very much confusing. It _might_ be arguable that the order of the parameters CALLOC_ARRAY takes should have been reversed in that the number of elements in the array should come first just like in calloc(), while the pointer that is used to infer the size of an array element should come next, but that is water under the bridge.
On Tue, Dec 06, 2022 at 08:57:21AM +0900, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > In git.git, it is sometimes common to write something like: > > > > T *ptr; > > CALLOC_ARRAY(ptr, 1); > > > > ...but that is confusing, since we're not initializing an array. > > Given that "man calloc" tells us that calloc takes two parameters, > > void *calloc(size_t nmemb, size_t size); > > I personally find CALLOC() that takes only a single parameter and is > capable only to allocate a single element array very much confusing. Hmm. I have always considered "calloc" a mental shorthand for "zero initialize some bytes on the heap". It seemed like you were in favor of such a change in: https://lore.kernel.org/git/xmqq8rl8ivlb.fsf@gitster.g/ ...but it's entirely possible that I misread your message, in which case I would not be sad if you dropped this patch on the floor since I don't feel that strongly about it. I'd be fine to call it CALLOC_ONE() or something, but I'm not sure at that point if it's significantly better to write "CALLOC_ONE(x)" versus "CALLOC_ARRAY(foo, 1)" > It _might_ be arguable that the order of the parameters CALLOC_ARRAY > takes should have been reversed in that the number of elements in > the array should come first just like in calloc(), while the pointer > that is used to infer the size of an array element should come next, > but that is water under the bridge. Yes, I agree that that would be better. But it would be frustrating to make a tree-wide change of that magnitude at this point. So I agree it's water under the bridge ;-). Thanks, Taylor
On Mon, Dec 05, 2022 at 07:29:22PM -0500, Taylor Blau wrote: > On Tue, Dec 06, 2022 at 08:57:21AM +0900, Junio C Hamano wrote: > > Taylor Blau <me@ttaylorr.com> writes: > > > > > In git.git, it is sometimes common to write something like: > > > > > > T *ptr; > > > CALLOC_ARRAY(ptr, 1); > > > > > > ...but that is confusing, since we're not initializing an array. > > > > Given that "man calloc" tells us that calloc takes two parameters, > > > > void *calloc(size_t nmemb, size_t size); > > > > I personally find CALLOC() that takes only a single parameter and is > > capable only to allocate a single element array very much confusing. > > Hmm. I have always considered "calloc" a mental shorthand for "zero > initialize some bytes on the heap". It seemed like you were in favor of > such a change in: > > https://lore.kernel.org/git/xmqq8rl8ivlb.fsf@gitster.g/ > > ...but it's entirely possible that I misread your message, in which case > I would not be sad if you dropped this patch on the floor since I don't > feel that strongly about it. I think the weird thing about calloc is that it does _two_ things compared to malloc: zero-ing, and doing an implicit nmemb*size multiplication. So you can think of "allocate this one element and zero it" as "calloc, but don't multiply" or as "malloc, but zero". Naming it CALLOC() is thinking of it as the former. If we think of it as the latter it could perhaps be MALLOCZ() or something. I don't know if that name is too subtle or not. We have xmemdupz(), which is basically the same thing; it's only a zero-terminator, but that is because we are writing non-zero bytes in the rest of it. Mostly I'd worry that it is easy to glance past the "Z". -Peff
Jeff King <peff@peff.net> writes: > So you can think of "allocate this one element and zero it" as "calloc, > but don't multiply" or as "malloc, but zero". Naming it CALLOC() is > thinking of it as the former. If we think of it as the latter it could > perhaps be MALLOCZ() or something. I don't know if that name is too > subtle or not. We have xmemdupz(), which is basically the same thing; > it's only a zero-terminator, but that is because we are writing non-zero > bytes in the rest of it. Mostly I'd worry that it is easy to glance past > the "Z". I think the name for the former would be CALLOC_ONE(), as I would rephrase it as "calloc, but just one element". I agree MALLOCZ() would be fine for the other interpretation, and I do not have much problem as much problem with the name as calling it CALLOC(). Thanks.
On Mon, Dec 05, 2022 at 05:36:25PM -0500, Taylor Blau wrote: > On Mon, Dec 05, 2022 at 10:01:11PM +0100, René Scharfe wrote: > > This rule would turn this code: > > > > struct foo *bar = xcalloc(1, sizeof(*bar)); > > int i; > > > > ... into: > > > > struct foo *bar; > > CALLOC(bar); > > int i; > > > > ... which violates the coding guideline to not mix declarations and > > statements (-Wdeclaration-after-statement). > > Yeah, I was wondering about this myself when I wrote this part of the > Coccinelle patch. > > Is there an intelligent way to tell it to put the first statement after > all declarations? I couldn't find anything after a quick scan of the > documentation nor our own patches. It feels like generating the code as above is not the end of the world. The most valuable thing that coccinelle is doing here is _finding_ the location, and telling you "it's supposed to be like this". It is great when the "this" post-image is perfect and doesn't need further tweaking. But if the compiler then reminds you "hey, you need to go a bit further manually", that doesn't seem so bad. In other words, I would be happy to follow that work flow if I introduced a bare xcalloc(). My only worry is that somebody less experienced with the project (or with C) would get confused. -Peff
On Tue, Dec 06, 2022 at 12:12:32AM +0100, Ævar Arnfjörð Bjarmason wrote: > But if we *are* doing that then surely we should provide the full set of > functions. I.e. ALLOC() and ALLOC_ARRAY(), CALLOC() and CALLOC_ARRAY(), > and REALLOC() and REALLOC_ARRAY()? FWIW, I would be happy to see all of those (minus REALLOC(), as there is not really any point in growing or shrinking something with a fixed size). The biggest argument against them that I can see is that: struct foo *x = malloc(sizeof(*x)); is idiomatic C that newcomers to the project will easily understand, and: struct foo *x; ALLOC(x); is not. But it feels like we already crossed that bridge with ALLOC_ARRAY(), etc. -Peff
On Mon, Dec 05 2022, Jeff King wrote: > On Tue, Dec 06, 2022 at 12:12:32AM +0100, Ævar Arnfjörð Bjarmason wrote: > >> But if we *are* doing that then surely we should provide the full set of >> functions. I.e. ALLOC() and ALLOC_ARRAY(), CALLOC() and CALLOC_ARRAY(), >> and REALLOC() and REALLOC_ARRAY()? > > FWIW, I would be happy to see all of those (minus REALLOC(), as there is > not really any point in growing or shrinking something with a fixed > size). > > The biggest argument against them that I can see is that: > > struct foo *x = malloc(sizeof(*x)); > > is idiomatic C that newcomers to the project will easily understand, > and: > > struct foo *x; > ALLOC(x); > > is not. But it feels like we already crossed that bridge with > ALLOC_ARRAY(), etc. This is probably too ugly to exist, but if we are going to have more variants maybe one that would allow use within declarations would be better, e.g.: diff --git a/attr.c b/attr.c index 42ad6de8c7c..43ade426e57 100644 --- a/attr.c +++ b/attr.c @@ -666,11 +666,10 @@ static void handle_attr_line(struct attr_stack *res, static struct attr_stack *read_attr_from_array(const char **list) { - struct attr_stack *res; + struct attr_stack *INIT_CALLOC_ARRAY(res, 1); const char *line; int lineno = 0; - CALLOC_ARRAY(res, 1); while ((line = *(list++)) != NULL) handle_attr_line(res, line, "[builtin]", ++lineno, READ_ATTR_MACRO_OK); diff --git a/git-compat-util.h b/git-compat-util.h index a76d0526f79..932d0907f3f 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1089,6 +1089,7 @@ int xstrncmpz(const char *s, const char *t, size_t len); #define ALLOC_ARRAY(x, alloc) (x) = xmalloc(st_mult(sizeof(*(x)), (alloc))) #define CALLOC_ARRAY(x, alloc) (x) = xcalloc((alloc), sizeof(*(x))) +#define INIT_CALLOC_ARRAY(x, alloc) x = xcalloc((alloc), sizeof(*(x))) #define REALLOC_ARRAY(x, alloc) (x) = xrealloc((x), st_mult(sizeof(*(x)), (alloc))) #define COPY_ARRAY(dst, src, n) copy_array((dst), (src), (n), sizeof(*(dst)) + \ I suspect (but haven't checked) that it might get us into the same parsing trouble as your UNUSED(name) macro. Or maybe: diff --git a/attr.c b/attr.c index 42ad6de8c7c..c3cb5c98bbf 100644 --- a/attr.c +++ b/attr.c @@ -669 +669 @@ static struct attr_stack *read_attr_from_array(const char **list) - struct attr_stack *res; + INIT_CALLOC_ARRAY(struct attr_stack *, res, 1); @@ -673 +672,0 @@ static struct attr_stack *read_attr_from_array(const char **list) - CALLOC_ARRAY(res, 1); diff --git a/git-compat-util.h b/git-compat-util.h index a76d0526f79..fd9af571dc9 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1091,0 +1092 @@ int xstrncmpz(const char *s, const char *t, size_t len); +#define INIT_CALLOC_ARRAY(d, x, alloc) d x = xcalloc((alloc), sizeof(*(x))) I think it's also worth considering just having the "normal C' versions, but we could transform anything that doesn't look lik ea narrow whitelist of patterns into an error with coccinelle. I.e. if we're paranoid about "v" v.s. "*var" in the "sizeof" we could also check & tranform that with coccinelle...
On Mon, Dec 05 2022, Taylor Blau wrote: > On Tue, Dec 06, 2022 at 08:57:21AM +0900, Junio C Hamano wrote: >> Taylor Blau <me@ttaylorr.com> writes: >> >> > In git.git, it is sometimes common to write something like: >> > >> > T *ptr; >> > CALLOC_ARRAY(ptr, 1); >> > >> > ...but that is confusing, since we're not initializing an array. >> >> Given that "man calloc" tells us that calloc takes two parameters, >> >> void *calloc(size_t nmemb, size_t size); >> >> I personally find CALLOC() that takes only a single parameter and is >> capable only to allocate a single element array very much confusing. > > Hmm. I have always considered "calloc" a mental shorthand for "zero > initialize some bytes on the heap". It seemed like you were in favor of > such a change in: > > https://lore.kernel.org/git/xmqq8rl8ivlb.fsf@gitster.g/ > > ...but it's entirely possible that I misread your message, in which case > I would not be sad if you dropped this patch on the floor since I don't > feel that strongly about it. > > I'd be fine to call it CALLOC_ONE() or something, but I'm not sure at > that point if it's significantly better to write "CALLOC_ONE(x)" versus > "CALLOC_ARRAY(foo, 1)" > >> It _might_ be arguable that the order of the parameters CALLOC_ARRAY >> takes should have been reversed in that the number of elements in >> the array should come first just like in calloc(), while the pointer >> that is used to infer the size of an array element should come next, >> but that is water under the bridge. > > Yes, I agree that that would be better. But it would be frustrating to > make a tree-wide change of that magnitude at this point. So I agree it's > water under the bridge ;-). I'm not saying you *should*, but now that we use C99 macros we *could* also make the "1" an optional argument. I.e. these would be the same thing: CALLOC(x) CALLOC(x, 1) And you could also do: CALLOC(x, 123) Whether that makes the interface even nastier is another matter. That can be done by dispatching to an underlying function, and defining the macro as: #define CALLOC(...) mycallocfn_1(__VA_ARGS__, NULL) I.e. for the above you'd get either "x, NULL, NULL", "x, 1, NULL", or "x, 123, NULL". The function could then manually check the arity. Maybe I've just been corrupted by reading the P99 library :)
On Mon, Dec 05, 2022 at 08:43:50PM -0500, Jeff King wrote: > On Mon, Dec 05, 2022 at 05:36:25PM -0500, Taylor Blau wrote: > > > On Mon, Dec 05, 2022 at 10:01:11PM +0100, René Scharfe wrote: > > > This rule would turn this code: > > > > > > struct foo *bar = xcalloc(1, sizeof(*bar)); > > > int i; > > > > > > ... into: > > > > > > struct foo *bar; > > > CALLOC(bar); > > > int i; > > > > > > ... which violates the coding guideline to not mix declarations and > > > statements (-Wdeclaration-after-statement). > > > > Yeah, I was wondering about this myself when I wrote this part of the > > Coccinelle patch. > > > > Is there an intelligent way to tell it to put the first statement after > > all declarations? I couldn't find anything after a quick scan of the > > documentation nor our own patches. > > It feels like generating the code as above is not the end of the world. > The most valuable thing that coccinelle is doing here is _finding_ the > location, and telling you "it's supposed to be like this". It is great > when the "this" post-image is perfect and doesn't need further tweaking. I have to agree. If Coccinelle can generate the right output; great. But if it can't, the amount of additional work to reorganize an already generated and mostly correct *.patch from the tool seems minimal by comparison. > But if the compiler then reminds you "hey, you need to go a bit further > manually", that doesn't seem so bad. In other words, I would be happy to > follow that work flow if I introduced a bare xcalloc(). My only worry is > that somebody less experienced with the project (or with C) would get > confused. Agreed. Thanks, Taylor
On Tue, Dec 06, 2022 at 12:12:32AM +0100, Ævar Arnfjörð Bjarmason wrote: > First, when you add *.pending.cocci rules they shouldn't be pseudocode, > but things that are too large to apply right now. I think my recent > 041df69edd3 (Merge branch 'ab/fewer-the-index-macros', 2022-11-28) is a > good example. Agreed, but I do tend to consider this patch too-large to apply right now. Whether or not 2k lines of diff is review-able or not (and FWIW, I think that your recent Coccinelle-generated patches were quite easy to review pretty quickly), I think there's a bigger question of whether or not we want to make such a sweeping, tree-wide change. And assuming the answer to that is "yes", there's is also a question of timing. Proposing it towards the middle or end of a release cycle seems in bad taste. But I think splitting this discussion up into, "should we introduce something like CALLOC() into our style conventions?" and "do we want to apply this everywhere?" is worth doing. The former doesn't seem to take a ton of time away from polishing the release candidates, and the latter can be done only after the former has settled. So the decision to make this a *.pending.cocci rule was definitely intentional in this case. > Second, we have test support for rules now, see > contrib/coccinelle/tests/. You just need to create a > contrib/coccinelle/tests/xcalloc.pending.c, and have the expected > post-image in contrib/coccinelle/tests/xcalloc.pending.res. Please add > one of those. We don't have them for existing rules, but it really helps > to assert & review new rules. > > The various edge cases that your current *.cocci doesn't compile on > etc. should go into that test file. Thanks for mentioning. > Third, the resulting patch is currently ~2k lines. Can we really not > make it non-pending with some whitelisting/gblacklisting. E.g. see this > out-of-tree patch for an example of opting out certain functions: > https://github.com/avar/git/commit/bedd0323e9b See above. > Fourth, I must say I'm kind of negative on the proposed change. I.e. the > foot-gun is definitely worth solving, but why not just create a > coccinelle rule to narrowly address that? In that case we can presumably > start with it non-pending, as we don't have that many of them. > > On the notion that we need to special-case: > > - CALLOC_ARRAY(ptr, 1) > + CALLOC(ptr) > > Because an array of one is "not an array" I don't really buy it. The > calloc() interface itself exposes that, so I don't see why we'd consider > those separate on the API level for these wrapper macros. I think the point is that it's just weird. Yes, an array of just a single element on the heap is indistinguishable from asking malloc() to give me the same bytes and then memset() them myself, just as it's indistinguishable from calloc()'ing the right number of bytes for a single structure to begin with. But whether or not the two are indistinguishable doesn't mean that it makes intuitive sense, and that is the goal of CALLOC(). Thanks, Taylor
On Mon, Dec 05, 2022 at 08:47:10PM -0500, Jeff King wrote: > On Tue, Dec 06, 2022 at 12:12:32AM +0100, Ævar Arnfjörð Bjarmason wrote: > > > But if we *are* doing that then surely we should provide the full set of > > functions. I.e. ALLOC() and ALLOC_ARRAY(), CALLOC() and CALLOC_ARRAY(), > > and REALLOC() and REALLOC_ARRAY()? > > FWIW, I would be happy to see all of those (minus REALLOC(), as there is > not really any point in growing or shrinking something with a fixed > size). Same. > The biggest argument against them that I can see is that: > > struct foo *x = malloc(sizeof(*x)); > > is idiomatic C that newcomers to the project will easily understand, > and: > > struct foo *x; > ALLOC(x); > > is not. But it feels like we already crossed that bridge with > ALLOC_ARRAY(), etc. Yeah, I agree that it might be off-putting to have that convention pervade throughout the tree. On the other hand, though, we can clearly document that in CodingGuidelines, and make Coccinelle rules that indicate our preferences. And indeed, ALLOC_ARRAY() and friends is a useful test-case for gauging how something like this might go. I don't think we necessarily need to do all of that in the same set of patches (though I'm not opposed to doing so, either), but I wouldn't mind getting there in the end, one way or another. Thanks, Taylor
On Tue, Dec 06, 2022 at 10:35:54AM +0900, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > So you can think of "allocate this one element and zero it" as "calloc, > > but don't multiply" or as "malloc, but zero". Naming it CALLOC() is > > thinking of it as the former. If we think of it as the latter it could > > perhaps be MALLOCZ() or something. I don't know if that name is too > > subtle or not. We have xmemdupz(), which is basically the same thing; > > it's only a zero-terminator, but that is because we are writing non-zero > > bytes in the rest of it. Mostly I'd worry that it is easy to glance past > > the "Z". > > I think the name for the former would be CALLOC_ONE(), as I would > rephrase it as "calloc, but just one element". I agree MALLOCZ() > would be fine for the other interpretation, and I do not have much > problem as much problem with the name as calling it CALLOC(). Between CALLOC_ONE() and MALLOCZ(), I prefer the latter for brevity. But between that and CALLOC(), I prefer the latter, since "CALLOC()" reminds me of the zero-initialization of calloc()-proper, and the "Z" in "MALLOCZ()" feels easy-ish to miss. Thanks, Taylor
On Tue, Dec 06 2022, Taylor Blau wrote: > On Tue, Dec 06, 2022 at 12:12:32AM +0100, Ævar Arnfjörð Bjarmason wrote: >> First, when you add *.pending.cocci rules they shouldn't be pseudocode, >> but things that are too large to apply right now. I think my recent >> 041df69edd3 (Merge branch 'ab/fewer-the-index-macros', 2022-11-28) is a >> good example. > > Agreed, but I do tend to consider this patch too-large to apply right > now. I think it's OK if a *pending* patch is 10k lines or whatever, as long as the result of applying it compiles. I.e. if it doesn't compile the semantic patch is broken, and doesn't really help to clarify what the proposed change is. Whether the change itself is too large to be worth it or whatever is another matter, but we can always consider it advisory. E.g. consider a change to remove braces from if/else per our coding style, we don't have such a *.cocci now, but if we did it would be a very large patch. It might still be useful in *.pending.cocci, e.g. to run new code through it. > Whether or not 2k lines of diff is review-able or not (and FWIW, I think > that your recent Coccinelle-generated patches were quite easy to review > pretty quickly), I think there's a bigger question of whether or not we > want to make such a sweeping, tree-wide change. And assuming the answer > to that is "yes", there's is also a question of timing. Proposing it > towards the middle or end of a release cycle seems in bad taste. > > But I think splitting this discussion up into, "should we introduce > something like CALLOC() into our style conventions?" and "do we want to > apply this everywhere?" is worth doing. The former doesn't seem to take > a ton of time away from polishing the release candidates, and the latter > can be done only after the former has settled. > > So the decision to make this a *.pending.cocci rule was definitely > intentional in this case. Yes. A sweeping change like this should definitely start there (although see below). > [...] >> Third, the resulting patch is currently ~2k lines. Can we really not >> make it non-pending with some whitelisting/gblacklisting. E.g. see this >> out-of-tree patch for an example of opting out certain functions: >> https://github.com/avar/git/commit/bedd0323e9b > > See above. What I'm demonstrating with that commit is that you can make a non-pending patch by blacklisting the functions, filenames etc. that are known to be outstanding. The advantage of doing that being that you'd apply it for any new code outside those scopes. So re the "see above" it's an explicit end-run around the "do we want to apply this everywhere?". >> Fourth, I must say I'm kind of negative on the proposed change. I.e. the >> foot-gun is definitely worth solving, but why not just create a >> coccinelle rule to narrowly address that? In that case we can presumably >> start with it non-pending, as we don't have that many of them. >> >> On the notion that we need to special-case: >> >> - CALLOC_ARRAY(ptr, 1) >> + CALLOC(ptr) >> >> Because an array of one is "not an array" I don't really buy it. The >> calloc() interface itself exposes that, so I don't see why we'd consider >> those separate on the API level for these wrapper macros. > > I think the point is that it's just weird. Yes, an array of just a > single element on the heap is indistinguishable from asking malloc() to > give me the same bytes and then memset() them myself, just as it's > indistinguishable from calloc()'ing the right number of bytes for a > single structure to begin with. > > But whether or not the two are indistinguishable doesn't mean that it > makes intuitive sense, and that is the goal of CALLOC(). I understand that's the goal, I just don't get how it's intuitive. First we had REALLOC_ARRAY(), which is the same as a reallocarray(3) where the "size" is computed for you based on the "ptr", which we know to be a variable name in our case. Later we had ALLOC_ARRAY() and CALLOC_ARRAY(), which aim to provide the same sort of interface for malloc() and calloc(). The reallocarray() function comes from OpenBSD, where it's always supported nmemb = 1: https://man.openbsd.org/OpenBSD-5.9/malloc.3#reallocarray~2 It has always been redundant to use it for nmemb=1, but as arrays in C don't have containers (unlike some higher-level lanugages) it would be odd to disallow it, and force the user to do: mem = n == 1 ? realloc(mem, sz) : reallocarray(mem, n, sz); Instead of: mem = reallocarray(mem, n, sz); I.e. you don't need the overlfow check for n * sz if n == 1, but treating it as a special-case is a hassle.
On Tue, Dec 06 2022, Taylor Blau wrote: > On Mon, Dec 05, 2022 at 08:43:50PM -0500, Jeff King wrote: >> On Mon, Dec 05, 2022 at 05:36:25PM -0500, Taylor Blau wrote: >> >> > On Mon, Dec 05, 2022 at 10:01:11PM +0100, René Scharfe wrote: >> > > This rule would turn this code: >> > > >> > > struct foo *bar = xcalloc(1, sizeof(*bar)); >> > > int i; >> > > >> > > ... into: >> > > >> > > struct foo *bar; >> > > CALLOC(bar); >> > > int i; >> > > >> > > ... which violates the coding guideline to not mix declarations and >> > > statements (-Wdeclaration-after-statement). >> > >> > Yeah, I was wondering about this myself when I wrote this part of the >> > Coccinelle patch. >> > >> > Is there an intelligent way to tell it to put the first statement after >> > all declarations? I couldn't find anything after a quick scan of the >> > documentation nor our own patches. >> >> It feels like generating the code as above is not the end of the world. >> The most valuable thing that coccinelle is doing here is _finding_ the >> location, and telling you "it's supposed to be like this". It is great >> when the "this" post-image is perfect and doesn't need further tweaking. > > I have to agree. If Coccinelle can generate the right output; great. But > if it can't, the amount of additional work to reorganize an already > generated and mostly correct *.patch from the tool seems minimal by > comparison. It can, but you need to write your semantic patch to match your intent. If you write e.g.: - int x; + int y; + foo(); That means "add the int y and foo() line right after that "int x" line you removed. Whereas what you want in this case is closer to: - match the "int x" line - remove or amend it - skip past all subsequent declarations - skip past all code that isn't referring to the "x variable? - insert the "CALLOC_ARRAY" (or whatever) before that first "x" use. I don't know offhand how to match this, but presumably it's some mixture of the wildcard syntax ("...", "<... ...>" etc.) and matching a "statement", or maybe marking the "int x" with the "@pos" syntax, and referring to that position again. I usually just browse through the coccinelle.git for *.cocci examples and/or read the PDF (*not* the manual page, which discusses almost none of the syntax) documentation. See: git grep -F '...' -- contrib/coccinelle For some in-tree use of this, the unused.cocci I added recently is probably the closest equivalent to what you'll want.
On Tue, Dec 06, 2022 at 02:58:53AM +0100, Ævar Arnfjörð Bjarmason wrote: > This is probably too ugly to exist, but if we are going to have more > variants maybe one that would allow use within declarations would be > better, e.g.: > > > diff --git a/attr.c b/attr.c > index 42ad6de8c7c..43ade426e57 100644 > --- a/attr.c > +++ b/attr.c > @@ -666,11 +666,10 @@ static void handle_attr_line(struct attr_stack *res, > > static struct attr_stack *read_attr_from_array(const char **list) > { > - struct attr_stack *res; > + struct attr_stack *INIT_CALLOC_ARRAY(res, 1); Yeah. While clever, I would agree that is probably too ugly to exist. -Peff
On Tue, Dec 06, 2022 at 03:12:08AM +0100, Ævar Arnfjörð Bjarmason wrote: > I'm not saying you *should*, but now that we use C99 macros we *could* > also make the "1" an optional argument. I.e. these would be the same > thing: > > CALLOC(x) > CALLOC(x, 1) > > And you could also do: > > CALLOC(x, 123) > > Whether that makes the interface even nastier is another matter. Like the INIT proposal you mentioned, I tend to think this is clever but not really a good idea in the long run. I don't mind magic if it's giving us a big benefit, but I don't think that's the case here. :) -Peff
On Tue, Dec 06, 2022 at 09:38:02PM -0500, Taylor Blau wrote: > On Tue, Dec 06, 2022 at 10:35:54AM +0900, Junio C Hamano wrote: > > Jeff King <peff@peff.net> writes: > > > > > So you can think of "allocate this one element and zero it" as "calloc, > > > but don't multiply" or as "malloc, but zero". Naming it CALLOC() is > > > thinking of it as the former. If we think of it as the latter it could > > > perhaps be MALLOCZ() or something. I don't know if that name is too > > > subtle or not. We have xmemdupz(), which is basically the same thing; > > > it's only a zero-terminator, but that is because we are writing non-zero > > > bytes in the rest of it. Mostly I'd worry that it is easy to glance past > > > the "Z". > > > > I think the name for the former would be CALLOC_ONE(), as I would > > rephrase it as "calloc, but just one element". I agree MALLOCZ() > > would be fine for the other interpretation, and I do not have much > > problem as much problem with the name as calling it CALLOC(). > > Between CALLOC_ONE() and MALLOCZ(), I prefer the latter for brevity. But > between that and CALLOC(), I prefer the latter, since "CALLOC()" reminds > me of the zero-initialization of calloc()-proper, and the "Z" in > "MALLOCZ()" feels easy-ish to miss. FWIW, I agree with your preferences here. But if people think CALLOC() is misleading or badly named, I'm OK with MALLOCZ(). I guess maybe it is ALLOCZ() to go with ALLOC_ARRAY(), and a potential ALLOC() if we choose to have one. -Peff
diff --git a/contrib/coccinelle/xcalloc.pending.cocci b/contrib/coccinelle/xcalloc.pending.cocci new file mode 100644 index 0000000000..83e4ca1a68 --- /dev/null +++ b/contrib/coccinelle/xcalloc.pending.cocci @@ -0,0 +1,21 @@ +@@ +type T; +T *ptr; +@@ +- ptr = xcalloc(1, \( sizeof(T) \| sizeof(*ptr) \) ) ++ CALLOC(ptr) + +@@ +type T; +identifier ptr; +@@ +- T ptr = xcalloc(1, \( sizeof(T) \| sizeof(*ptr) \) ); ++ T ptr; ++ CALLOC(ptr); + +@@ +type T; +T *ptr; +@@ +- CALLOC_ARRAY(ptr, 1) ++ CALLOC(ptr) diff --git a/git-compat-util.h b/git-compat-util.h index a76d0526f7..827e5be89c 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1107,6 +1107,14 @@ static inline void move_array(void *dst, const void *src, size_t n, size_t size) memmove(dst, src, st_mult(size, n)); } +/* + * Like CALLOC_ARRAY, but the argument is treated as a pointer to a + * single struct. + * + * Preferable over xcalloc(1, sizeof(...)), or CALLOC_ARRAY(..., 1). + */ +#define CALLOC(x) (x) = CALLOC_ARRAY((x), 1) + /* * These functions help you allocate structs with flex arrays, and copy * the data directly into the array. For example, if you had:
When zero-initializing a struct without the use of static initializers like "{ 0 }", it is common to write one of: T *ptr = xcalloc(1, sizeof(T)); T *ptr = xcalloc(1, sizeof(*ptr)); These correctly initialize "*ptr" to the all-zeros value, but have a couple of drawbacks. Notably, both initializations are verbose, but the former is a foot-gun. If "*ptr"'s type changes to something other than "T", the programmer has to remember to update not only the type of the variable, but the right-hand side of its initialization, too. In git.git, it is sometimes common to write something like: T *ptr; CALLOC_ARRAY(ptr, 1); ...but that is confusing, since we're not initializing an array. Rather, we're using the CALLOC_ARRAY() macro to pretend like we are. But since the array length is 1, the effect is zero initializing a single element. Introduce a new variant, CALLOC(x), which initializes "x" to the all-zeros value, without exposing the confusing use of CALLOC_ARRAY(), or the foot-guns available when using xcalloc() directly. Write a Coccinelle patch which codifies these rules, but mark it as "pending" since the resulting diff is too large to include in a single patch: $ git apply .build/contrib/coccinelle/xcalloc.pending.cocci.patch $ git diff --shortstat 89 files changed, 221 insertions(+), 178 deletions(-) Signed-off-by: Taylor Blau <me@ttaylorr.com> --- This is a follow-up on [1], where introducing CALLOC(x) as the preferred alternative to CALLOC_ARRAY(x, 1) was first suggested. The patch is straightforward, and I am pretty sure that the Coccinelle rules are right, too ;-). But as a first-time Coccinelle user, any extra scrutiny on those bits would be appreciated. The main point of discussion I think is whether we should consider adopting this rule. I am biased, of course, but I think that we should. In any case, we should focus our efforts on polishing v2.39.0, but I wanted to send this out to the list before I forgot about it. [1]: https://lore.kernel.org/git/Y1MrXoobkVKngYL1@coredump.intra.peff.net/ contrib/coccinelle/xcalloc.pending.cocci | 21 +++++++++++++++++++++ git-compat-util.h | 8 ++++++++ 2 files changed, 29 insertions(+) create mode 100644 contrib/coccinelle/xcalloc.pending.cocci -- 2.38.0.16.g393fd4c6db