
[3/3] Documentation: document difference between release and free

Message ID 5e1de3c3159968e897a83c05dae5e8504d37a16c.1721818488.git.ps@pks.im (mailing list archive)
State Superseded
Series Documentation: some coding guideline updates

Commit Message

Patrick Steinhardt July 24, 2024, 11:05 a.m. UTC
We semi-regularly have discussions around whether a function shall be
named `release()` or `free()`. For the most part we use these two
terms quite consistently, though:

  - `release()` only frees internal state of a structure, whereas the
    structure itself is not free'd.

  - `free()` frees both internal state and the structure itself.

Carve out a space where we can add idiomatic names for common functions
in our coding guidelines. This space can get extended in the future when
we feel the need to document more idiomatic names.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/CodingGuidelines | 12 ++++++++++++
 1 file changed, 12 insertions(+)
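
To illustrate the naming convention described in the commit message, here is a minimal sketch. It assumes a hypothetical `struct name_list` with hypothetical helper functions; none of these names come from Git's codebase, and real Git code would typically use its own allocation wrappers instead of plain `malloc()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure for illustration only; not part of Git. */
struct name_list {
	char **items;
	size_t nr, alloc;
};

/* Initializes a caller-provided structure; allocates nothing. */
void name_list_init(struct name_list *list)
{
	list->items = NULL;
	list->nr = 0;
	list->alloc = 0;
}

/* Appends a copy of `name`; returns 0 on success, -1 if allocation fails. */
int name_list_push(struct name_list *list, const char *name)
{
	char *copy;

	assert(list->nr <= list->alloc);
	if (list->nr == list->alloc) {
		size_t alloc = list->alloc ? 2 * list->alloc : 4;
		char **items = realloc(list->items, alloc * sizeof(*items));
		if (!items)
			return -1;
		list->items = items;
		list->alloc = alloc;
	}
	copy = malloc(strlen(name) + 1);
	if (!copy)
		return -1;
	strcpy(copy, name);
	list->items[list->nr++] = copy;
	return 0;
}

/* Frees the contents, but not the structure itself. */
void name_list_release(struct name_list *list)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		free(list->items[i]);
	free(list->items);
}

/* Frees the contents and the structure itself. */
void name_list_free(struct name_list *list)
{
	if (!list)
		return;
	name_list_release(list);
	free(list);
}
```

A structure that lives on the stack or is embedded in another structure would be paired with `name_list_release()`, whereas a heap-allocated one would be handed to `name_list_free()`.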

Comments

Karthik Nayak July 24, 2024, 11:46 a.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> We semi-regularly have discussions around whether a function shall be
> named `release()` or `free()`. For most of the part we use these two
> terminologies quite consistently though:
>

I noticed there is also `clear()` used in some places. Should we also
mention that we don't recommend using `clear()` WRT freeing memory?

>   - `release()` only frees internal state of a structure, whereas the
>     structure itself is not free'd.
>
>   - `free()` frees both internal state and the structure itself.
>
> Carve out a space where we can add idiomatic names for common functions
> in our coding guidelines. This space can get extended in the future when
> we feel the need to document more idiomatic names.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/CodingGuidelines | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> index 34fcbcb5a4..ace4c4ad0c 100644
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -560,6 +560,18 @@ For C programs:
>
>  	void reset_strbuf(struct strbuf *buf);
>
> + - There are several common idiomatic names for functions performing
> +   specific tasks on structures:
> +
> +    - `<struct>_init()` initializes a structure without allocating the
> +      structure itself.
> +
> +    - `<struct>_release()` releases a structure's contents without
> +      freeing the structure.
> +
> +    - `<struct>_free()` releases a structure's contents and frees the
> +      structure.
> +
>  For Perl programs:
>
>   - Most of the C guidelines above apply.
> --
> 2.46.0.rc1.dirty

The patch itself looks good.
Patrick Steinhardt July 24, 2024, 1:11 p.m. UTC | #2
On Wed, Jul 24, 2024 at 04:46:20AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > We semi-regularly have discussions around whether a function shall be
> > named `release()` or `free()`. For most of the part we use these two
> > terminologies quite consistently though:
> >
> 
> I noticed there is also `clear()` used in some places. Should we also
> mention that we don't recommend using `clear()` WRT freeing memory?

In any case I think we should decide on either using `clear()` or using
`release()` for consistency's sake. Which of the two we use I don't really
care, but the following very shoddy analysis clearly favors `release()`:

    $ git grep '_clear(' | wc -l
    844
    $ git grep '_release(' | wc -l
    2126

So yeah, I'm happy to explicitly mention that `clear()` should be
avoided in favor of `release()`.

Patrick
Phillip Wood July 24, 2024, 2:30 p.m. UTC | #3
Hi Patrick

On 24/07/2024 14:11, Patrick Steinhardt wrote:
> On Wed, Jul 24, 2024 at 04:46:20AM -0700, Karthik Nayak wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>>
>>> We semi-regularly have discussions around whether a function shall be
>>> named `release()` or `free()`. For most of the part we use these two
>>> terminologies quite consistently though:
>>>
>>
>> I noticed there is also `clear()` used in some places. Should we also
>> mention that we don't recommend using `clear()` WRT freeing memory?
> 
> In any case I think we should decide on eithe using `clear()` or using
> `release()` for consistency's sake. Which of both  we use I don't quite
> care, but the following very shoddy analysis clearly favors `release()`:
> 
>      $ git grep '_clear(' | wc -l
>      844
>      $ git grep '_release(' | wc -l
>      2126

I think a fairer comparison would be to look at function declarations, 
not all the call sites.

$ { git grep 'void [a-z_]*_release(' '*.h'
     git grep 'static void [a-z_]*_release(' '*.c'
   } | wc -l
47
$ { git grep 'void [a-z_]*_clear(' '*.h'
     git grep 'static void [a-z_]*_clear(' '*.c'
   } | wc -l
58

So we have more _clear() functions than _release() functions. I think 
there may sometimes be a semantic difference between _clear() and 
_release() as well where some _clear() functions zero out the struct 
after freeing the members.

Thanks for working on this; it will be a useful addition to our coding
guidelines.

Best Wishes

Phillip

> So yeah, I'm happy to explicitly mention that `clear()` shouldn't be
> used in favor of `release()`.
> 
> Patrick
Junio C Hamano July 24, 2024, 4:52 p.m. UTC | #4
Patrick Steinhardt <ps@pks.im> writes:

> We semi-regularly have discussions around whether a function shall be
> named `release()` or `free()`. For most of the part we use these two
> terminologies quite consistently though:
>
>   - `release()` only frees internal state of a structure, whereas the
>     structure itself is not free'd.
>
>   - `free()` frees both internal state and the structure itself.
>
> Carve out a space where we can add idiomatic names for common functions
> in our coding guidelines. This space can get extended in the future when
> we feel the need to document more idiomatic names.

We have _clear() in some subsystem/API.  Are we sure the listed two
are sufficient and _clear() can be replaced with one of them
(perhaps _release())?

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/CodingGuidelines | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> index 34fcbcb5a4..ace4c4ad0c 100644
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -560,6 +560,18 @@ For C programs:
>  
>  	void reset_strbuf(struct strbuf *buf);
>  
> + - There are several common idiomatic names for functions performing
> +   specific tasks on structures:
> +
> +    - `<struct>_init()` initializes a structure without allocating the
> +      structure itself.
> +
> +    - `<struct>_release()` releases a structure's contents without
> +      freeing the structure.
> +
> +    - `<struct>_free()` releases a structure's contents and frees the
> +      structure.
> +
>  For Perl programs:
>  
>   - Most of the C guidelines above apply.
Junio C Hamano July 24, 2024, 6:02 p.m. UTC | #5
Phillip Wood <phillip.wood123@gmail.com> writes:

>>> I noticed there is also `clear()` used in some places. Should we also
>>> mention that we don't recommend using `clear()` WRT freeing memory?
>> In any case I think we should decide on eithe using `clear()` or
>> using
>> `release()` for consistency's sake. Which of both  we use I don't quite
>> care, but the following very shoddy analysis clearly favors `release()`:
>>      $ git grep '_clear(' | wc -l
>>      844
>>      $ git grep '_release(' | wc -l
>>      2126
>
> I think a fairer comparison would be to look at function declarations,
> not all the call sites.
>
> $ { git grep 'void [a-z_]*_release(' '*.h'
>     git grep 'static void [a-z_]*_release(' '*.c'
>   } | wc -l
> 47
> $ { git grep 'void [a-z_]*_clear(' '*.h'
>     git grep 'static void [a-z_]*_clear(' '*.c'
>   } | wc -l
> 58
>
> So we have more _clear() functions than _release() functions. I think
> there may sometimes be a semantic difference between _clear() and
> _release() as well where some _clear() functions zero out the struct
> after freeing the members.
>
> Thanks for working on this it will be a useful addition to our coding
> guidelines

Thanks for doing a more thorough study of the current codebase.  I
tend to agree that the number of actual _clear() functions matters a
lot more than how many callsites call _clear(), and it would make
sense to standardise on it.  If everything else were equal, it would
not matter which one we pick, but it rarely happens that everything
else is equal.

 - "release" is a bit more cumbersome to type and read than "clear".

 - "clear" at least to me says more about the state of the thing
   after it got cleared (e.g., I would expect it would be filled
   with NUL bytes)

 - "release" places a lot more stress on what happens to the things
   that were contained before the release takes place.

For example, upon either "clear" or "release", I would expect
everything pointed to by elements in an array member of the struct, and
the array pointed at by the member, to be free'd when we are
"clearing/releasing" a strvec.  But I may not care what is left in
it after "release".  It can be left to hold all the bytes the struct
had before "release" got called, as anybody who called the function
is not supposed to look at the struct again anyway.  But we may
choose not to have such a variant and always clear the struct after
releasing resources it held, just for good hygiene.

So in short, I would consider that "clear = release + init".  If we
want to have both "clear" and "release" and give them distinct
meanings, that is fine.  If we want to simplify and do without the "just
release and leave them dirty" variant, then we need only one name
for it, and I do not mind if we called it "release", even though
I would think "clear" is a better name for the action that behaves
as if "init" was done at the end to make it reusable.
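
The "clear = release + init" reading could be sketched roughly as follows. This is only an illustration under assumptions: `struct int_vec` and its functions are hypothetical names, not existing Git APIs, and the "release" shown here is the variant that leaves the struct dirty.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical structure for illustration only; not part of Git. */
struct int_vec {
	int *items;
	size_t nr, alloc;
};

/* Puts the struct into a well-defined empty state; allocates nothing. */
void int_vec_init(struct int_vec *vec)
{
	vec->items = NULL;
	vec->nr = 0;
	vec->alloc = 0;
}

/* Appends `value`; returns 0 on success, -1 if allocation fails. */
int int_vec_push(struct int_vec *vec, int value)
{
	assert(vec->nr <= vec->alloc);
	if (vec->nr == vec->alloc) {
		size_t alloc = vec->alloc ? 2 * vec->alloc : 8;
		int *items = realloc(vec->items, alloc * sizeof(*items));
		if (!items)
			return -1;
		vec->items = items;
		vec->alloc = alloc;
	}
	vec->items[vec->nr++] = value;
	return 0;
}

/*
 * "release": frees the resources but leaves the struct dirty. The
 * stale pointer and counts remain, so the struct must not be used
 * again without a fresh int_vec_init().
 */
void int_vec_release(struct int_vec *vec)
{
	free(vec->items);
}

/* "clear" = release + init: frees the resources and leaves the struct reusable. */
void int_vec_clear(struct int_vec *vec)
{
	int_vec_release(vec);
	int_vec_init(vec);
}
```

With this split, a caller that is done with the struct for good may call either function, while a caller that wants to reuse it has to call `int_vec_clear()`.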

Thanks.
Patrick Steinhardt July 30, 2024, 6:43 a.m. UTC | #6
On Wed, Jul 24, 2024 at 09:52:20AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > We semi-regularly have discussions around whether a function shall be
> > named `release()` or `free()`. For most of the part we use these two
> > terminologies quite consistently though:
> >
> >   - `release()` only frees internal state of a structure, whereas the
> >     structure itself is not free'd.
> >
> >   - `free()` frees both internal state and the structure itself.
> >
> > Carve out a space where we can add idiomatic names for common functions
> > in our coding guidelines. This space can get extended in the future when
> > we feel the need to document more idiomatic names.
> 
> We have _clear() in some subsystem/API.  Are we sure the listed two
> are sufficient and _clear() can be replaced with one of them
> (perhaps _release())?

I'd think that `clear()` can be replaced by `release()`, yes. But in
another branch I heard the argument that `clear()` is equivalent to
`release()` followed by `init()`, which I do like. The only downside is
that `init()` must not allocate memory in this case, as otherwise the
`clear()` function would lose the ability to release all resources
associated with its structure.
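
To make that downside concrete, here is a small sketch of what happens when `init()` does allocate. Everything in it is hypothetical (`struct scratch`, its functions, and the `live_buffers` counter exist only for this illustration); the counter merely makes the leftover allocation observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: number of buffers currently allocated. */
static int live_buffers;

/* Hypothetical structure for illustration only; not part of Git. */
struct scratch {
	char *buf;
};

/* An init() that allocates, which is the problematic case. */
void scratch_init(struct scratch *s)
{
	s->buf = malloc(64);
	if (s->buf)
		live_buffers++;
}

/* Frees the buffer and resets the pointer so a second release is harmless. */
void scratch_release(struct scratch *s)
{
	if (s->buf)
		live_buffers--;
	assert(live_buffers >= 0);
	free(s->buf);
	s->buf = NULL;
}

/*
 * clear = release + init. Because scratch_init() allocates, the struct
 * still owns a buffer after this call, so clear() alone does not free
 * all resources associated with the structure.
 */
void scratch_clear(struct scratch *s)
{
	scratch_release(s);
	scratch_init(s);
}
```

A caller that only ever calls `scratch_clear()` would thus still hold one allocation at the end, which is why a non-allocating `init()` fits this definition of `clear()` better.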

Patrick
Patrick Steinhardt July 30, 2024, 6:49 a.m. UTC | #7
On Wed, Jul 24, 2024 at 11:02:34AM -0700, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
> >>> I noticed there is also `clear()` used in some places. Should we also
> >>> mention that we don't recommend using `clear()` WRT freeing memory?
> >> In any case I think we should decide on eithe using `clear()` or
> >> using
> >> `release()` for consistency's sake. Which of both  we use I don't quite
> >> care, but the following very shoddy analysis clearly favors `release()`:
> >>      $ git grep '_clear(' | wc -l
> >>      844
> >>      $ git grep '_release(' | wc -l
> >>      2126
> >
> > I think a fairer comparison would be to look at function declarations,
> > not all the call sites.
> >
> > $ { git grep 'void [a-z_]*_release(' '*.h'
> >     git grep 'static void [a-z_]*_release(' '*.c'
> >   } | wc -l
> > 47
> > $ { git grep 'void [a-z_]*_clear(' '*.h'
> >     git grep 'static void [a-z_]*_clear(' '*.c'
> >   } | wc -l
> > 58
> >
> > So we have more _clear() functions than _release() functions. I think
> > there may sometimes be a semantic difference between _clear() and
> > _release() as well where some _clear() functions zero out the struct
> > after freeing the members.
> >
> > Thanks for working on this it will be a useful addition to our coding
> > guidelines
> 
> Thanks for doing a more thorough study of the current codebase.  I
> tend to agree that the number of actual _clear() functions matter a
> lot more than how many callsites call _clear(), and it would make
> sense to standardise on it.  If everything else being equal, it does
> not matter which one we pick, but it rarely happens that everything
> else is equal.

I'm not quite sure that I agree with this. I think coding style is most
heavily influenced by what you see most in a codebase. So I'd argue that
it is both declarations/definitions and callsites that influence the
general shape.

This of course means that interfaces like `struct strbuf` have way more
impact on our coding style than others, simply because it is being used
all over the place. But in my opinion that follows naturally, because
the coding style that we use should work best for what is being used
most often.

But anyway, this is splitting hairs :)

>  - "release" is a bit more cumbersome to type and read than "clear".
> 
>  - "clear" at least to me says more about the state of the thing
>    after it got cleared (e.g., I would expect it would be filled
>    with NUL bytes)
> 
>  - "release" places a lot more stress on what happens to the things
>    that were contained before the release takes place.
> 
> For example, upon either "clear" or "release", I would expect
> everything pointed by elements in an array member of the struct, and
> the array pointed at by the member, are free'd when we are
> "clearing/releasing" a strvec.  But I may not care what is left in
> it after "release".  It can be left to hold all the bytes the struct
> had before "release" got called, as anybody who called the function
> are not supposed to look at the struct again anyway.  But we may
> choose not to have such a variant and always clear the struct after
> releasing resources it held, just for good hygiene.
> 
> So in short, I would consider that "clear = release + init".  If we
> want to have both "clear" and "release" and have them distinct
> meaning, that is fine.  If we want to simplify and do without "just
> release and leave them dirty" variant, then we need only one name
> for it, and I do not mind if we called it "release", even though
> I would think "clear" is a better name for the action that behaves
> as if "init" was done at the end to make it reusable.

I actually like this definition. The only downside I see of defining
`clear = release + init` is that `init()` probably shouldn't be allowed
to allocate any memory in this case. Otherwise, calling `clear()` on a
structure would not cause us to free all resources associated with it,
which would be unexpected to me.

Patrick

Patch

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 34fcbcb5a4..ace4c4ad0c 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -560,6 +560,18 @@  For C programs:
 
 	void reset_strbuf(struct strbuf *buf);
 
+ - There are several common idiomatic names for functions performing
+   specific tasks on structures:
+
+    - `<struct>_init()` initializes a structure without allocating the
+      structure itself.
+
+    - `<struct>_release()` releases a structure's contents without
+      freeing the structure.
+
+    - `<struct>_free()` releases a structure's contents and frees the
+      structure.
+
 For Perl programs:
 
  - Most of the C guidelines above apply.