[v3,2/3] CodingGuidelines: hint why we value clearly written log messages

Message ID 20220127190259.2470753-3-gitster@pobox.com (mailing list archive)
State Accepted
Commit 607817a3c8d2992f69e53c42dfa604b59d1570ba

Commit Message

Junio C Hamano Jan. 27, 2022, 7:02 p.m. UTC
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/CodingGuidelines | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Emily Shaffer March 4, 2022, 12:07 a.m. UTC | #1
On Thu, Jan 27, 2022 at 11:02:58AM -0800, Junio C Hamano wrote:
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/CodingGuidelines | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> index 0e27b5395d..c37c43186e 100644
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -26,6 +26,13 @@ code.  For Git in general, a few rough rules are:
>     go and fix it up."
>     Cf. http://lkml.iu.edu/hypermail/linux/kernel/1001.3/01069.html
>  
> + - Log messages to explain your changes are as important as the
> +   changes themselves.  Clearly written code and in-code comments
> +   explain how the code works and what is assumed from the surrounding
> +   context.  The log messages explain what the changes wanted to
> +   achieve and why the changes were necessary (more on this in the
> +   accompanying SubmittingPatches document).
> +

One thing not listed here, that I often hope to find from the commit
message (and don't), is "why we did it this way instead of <other way>".
I am not sure how to phrase it in this document, though. Maybe:

  The log messages explain what the changes wanted to achieve, any
  decisions that were made between alternative approaches, and why the
  changes were necessary (more on this in blah blah)

Or maybe "...whether any alternative approaches were considered..." fits
the form of the surrounding sentence better.

 - Emily

>  Make your code readable and sensible, and don't try to be clever.
>  
>  As for more concrete guidelines, just imitate the existing code
> -- 
> 2.35.0-177-g7d269f5170
>
Junio C Hamano March 4, 2022, 12:27 a.m. UTC | #2
Emily Shaffer <emilyshaffer@google.com> writes:

>> + - Log messages to explain your changes are as important as the
>> +   changes themselves.  Clearly written code and in-code comments
>> +   explain how the code works and what is assumed from the surrounding
>> +   context.  The log messages explain what the changes wanted to
>> +   achieve and why the changes were necessary (more on this in the
>> +   accompanying SubmittingPatches document).
>> +
>
> One thing not listed here, that I often hope to find from the commit
> message (and don't), is "why we did it this way instead of <other way>".
> I am not sure how to phrase it in this document, though. Maybe:
>
>   The log messages explain what the changes wanted to achieve, any
>   decisions that were made between alternative approaches, and why the
>   changes were necessary (more on this in blah blah)
>
> Or maybe "...whether any alternative approaches were considered..." fits
> the form of the surrounding sentence better.

Quite valid observation.

Documentation/SubmittingPatches::meaningful-message makes a note on
these points, and the above may want to be more aligned to them.

Patches welcome, as these have long been merged to 'master/main'.

Thanks.
Junio C Hamano April 14, 2022, 6:51 a.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Emily Shaffer <emilyshaffer@google.com> writes:
>
>>> + - Log messages to explain your changes are as important as the
>>> +   changes themselves.  Clearly written code and in-code comments
>>> +   explain how the code works and what is assumed from the surrounding
>>> +   context.  The log messages explain what the changes wanted to
>>> +   achieve and why the changes were necessary (more on this in the
>>> +   accompanying SubmittingPatches document).
>>> +
>>
>> One thing not listed here, that I often hope to find from the commit
>> message (and don't), is "why we did it this way instead of <other way>".
>> I am not sure how to phrase it in this document, though. Maybe:
>>
>>   The log messages explain what the changes wanted to achieve, any
>>   decisions that were made between alternative approaches, and why the
>>   changes were necessary (more on this in blah blah)
>>
>> Or maybe "...whether any alternative approaches were considered..." fits
>> the form of the surrounding sentence better.
>
> Quite valid observation.
>
> Documentation/SubmittingPatches::meaningful-message makes a note on
> these points, and the above may want to be more aligned to them.
>
> Patches welcome, as these have long been merged to 'master/main'.

Another thing.  If you (not Emily, but figuratively) haven't watched
Victoria's talk https://www.youtube.com/watch?v=4qLtKx9S9a8 on the
topic of clearly written commits, you should drop everything you are
doing and go watch it.

And with what we learn from it, we may be able to rewrite this part
of the documentation much more clearly.
Ævar Arnfjörð Bjarmason April 14, 2022, 2:04 p.m. UTC | #4
On Wed, Apr 13 2022, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Emily Shaffer <emilyshaffer@google.com> writes:
>>
>>>> + - Log messages to explain your changes are as important as the
>>>> +   changes themselves.  Clearly written code and in-code comments
>>>> +   explain how the code works and what is assumed from the surrounding
>>>> +   context.  The log messages explain what the changes wanted to
>>>> +   achieve and why the changes were necessary (more on this in the
>>>> +   accompanying SubmittingPatches document).
>>>> +
>>>
>>> One thing not listed here, that I often hope to find from the commit
>>> message (and don't), is "why we did it this way instead of <other way>".
>>> I am not sure how to phrase it in this document, though. Maybe:
>>>
>>>   The log messages explain what the changes wanted to achieve, any
>>>   decisions that were made between alternative approaches, and why the
>>>   changes were necessary (more on this in blah blah)
>>>
>>> Or maybe "...whether any alternative approaches were considered..." fits
>>> the form of the surrounding sentence better.
>>
>> Quite valid observation.
>>
>> Documentation/SubmittingPatches::meaningful-message makes a note on
>> these points, and the above may want to be more aligned to them.
>>
>> Patches welcome, as these have long been merged to 'master/main'.
>
> Another thing.  If you (not Emily, but figuratively) haven't watched
> Victoria's talk https://www.youtube.com/watch?v=4qLtKx9S9a8 on the
> topic of clearly written commits, you should drop everything you are
> doing and go watch it.
>
> And with what we learn from it, we may be able to rewrite this part
> of the documentation much more clearly.

The slides for it are at
https://vdye.github.io/2022/OS101-Writing-Commits.pdf (not in the video
description, but at the very end of the video).

It's easy to nitpick/improve existing examples, so here goes :)

The main commit message example in that talk starts as just "Make error
text more helpful", and ends with a better version as:

	git-portable.sh: make error text more helpful
	
	The message “Not a valid command: <invalid command>” is
	intended to notify the user that their subcommand is invalid.
	However, when no subcommand is given, the "empty" subcommand
	results in the same message: "Not a valid command:". This does
	not clearly guide the user to the correct behavior, so print
	"Please specify a command" when no subcommand is specified.

For our CodingGuidelines I think it would be useful to have some version
of "if you can explain something with prose or tests, prefer
tests".

I.e. other things being equal I'd much prefer this version
(pseudo-patch):

	git-portable.sh: don't conflate invalid and non-existing command

	 git-portable-test.sh | 2 +-
	 1 file changed, 1 insertion(+), 1 deletion(-)
	
	diff --git a/git-portable-test.sh b/git-portable-test.sh
	index c8bd464..e03f4a8 100644
	--- a/git-portable-test.sh
	+++ b/git-portable-test.sh
	@@ -5,7 +5,7 @@ test_expect_failure 'usage: invalid command' '
	 '
	 
	 test_expect_failure 'usage: no command' '
	-	test_expect_code_output 129 "Not a valid command: " ./gitportable.sh
	+	test_expect_code_output 129 "Please specify a command" ./gitportable.sh
	 '
	 
	 test_done

It ends up basically saying the same thing, but now we're saying it with
a regression test (test_expect_code_output doesn't exist, but let's
pretend it's test_expect_code + a test_cmp-alike).
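
As a minimal sketch of what such a helper could look like:
test_expect_code_output below is hypothetical (it is not part of Git's
test library), and its argument order (exit code, expected output, then
the command) is only assumed from the pseudo-patch. It runs the command
and checks both the exit code and the combined stdout/stderr:

```shell
#!/bin/sh
# Hypothetical helper, NOT part of Git's test-lib.sh.
# Usage: test_expect_code_output <exit-code> <expected-output> <command> [<args>...]
test_expect_code_output () {
	want_code=$1 want_output=$2
	shift 2

	# Capture stdout and stderr together; $? after the assignment is
	# the exit status of the command substitution.
	got_output=$("$@" 2>&1)
	got_code=$?

	if test "$got_code" -ne "$want_code"
	then
		echo >&2 "exit code: expected $want_code, got $got_code"
		return 1
	fi
	if test "$got_output" != "$want_output"
	then
		echo >&2 "output: expected '$want_output', got '$got_output'"
		return 1
	fi
}
```

With this, `test_expect_code_output 129 "Please specify a command"
./gitportable.sh` succeeds only if the script exits with 129 and prints
exactly that message. In Git's own test suite the same check would
presumably be composed from the existing test_expect_code and test_cmp
helpers instead.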

What it does entirely omit is the "why".

Now I realize I'm nitpicking a slide shown at a conference, which by its
nature needs to show a small pseudo-example, but I think this applies in
general:

While "why" is a good rule of thumb I think it's just as important to
know when not to include explanations and when to include one.

For cases where something is straightforward enough (as in this case,
the RHS of ": " is clearly missing) I'd think omitting the explanation
would be better, as we should also be concerned about the overall signal
ratio.

(Now, if anyone glances at my own commit messages they'll see I'm
thoroughly in "throwing rocks from a glass house" territory here :) I'm
not saying I'm consistently practicing what I'm preaching).

But just like with comments there's no right answer: one person's idea
of when an explanation is needed differs from another's.

But it is unambiguously the case that we can often replace prose with
tests, and in those cases we should almost always prefer that.

It's also the case that even if everyone agrees that a "why" is needed
there's multiple ways to store that information. One is via commit
messages, another would e.g. be that same commit updating some shared
guidelines about goals/examples of CLI usage.

So in this case, if a Documentation/CodingGuidelines had clear examples
of preferred usage, we could just briefly point to that as
rationale.

While git's commit messages are excellent, I think that's one area where
we really need improvement. It's rare to dig into some old code where no
rationale can be found for it, either in the commit itself, or in the
preceding ML discussion.

But it's unfortunately (at least in my experience) more often than not
the case that you really do need to consult those commit messages or ML
archives, even for things that have come up a *lot* of times, they were
just never documented in-tree.

There's all sorts of reasons for that which are not the result of any
person doing anything wrong, but I do think it's something we could and
should focus more on as a project.

The barriers of entry for adding documentation or adjusting existing
documentation are much higher than adding a one-off explanation in a
commit message.

Partially (and probably mostly) that's a really good thing, but I can't
help but wonder if we're getting that balance right given the (in my
subjective experience) end result of us often lacking good docs, while
we're not lacking if one searches for replacements for those docs in
commit messages or the ML archive.

One more thing that I think is not explicitly covered (I skimmed the
slides, but haven't gone through the full talk yet): Minimizing diffs.

E.g. the talk shows 287fd17e3a1 (sparse-index: prevent repo root from
becoming sparse, 2022-03-01) as an example, which has this hunk:
	
	diff --git a/dir.c b/dir.c
	index d91295f2bcd..a136377eb49 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -1463,10 +1463,11 @@ static int path_in_sparse_checkout_1(const char *path,
	 	const char *end, *slash;
	 
	 	/*
	-	 * We default to accepting a path if there are no patterns or
	-	 * they are of the wrong type.
	+	 * We default to accepting a path if the path is empty, there are no
	+	 * patterns, or the patterns are of the wrong type.
	 	 */
	-	if (init_sparse_checkout_patterns(istate) ||
	+	if (!*path ||
	+	    init_sparse_checkout_patterns(istate) ||
	 	    (require_cone_mode &&
	 	     !istate->sparse_checkout_patterns->use_cone_patterns))
	 		return 1;

I think this is a worthwhile thing to consider as a replacement:
	
	diff --git a/dir.c b/dir.c
	index d91295f2bcd..93a2320ae57 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -1466,7 +1466,8 @@ static int path_in_sparse_checkout_1(const char *path,
	 	 * We default to accepting a path if there are no patterns or
	 	 * they are of the wrong type.
	 	 */
	-	if (init_sparse_checkout_patterns(istate) ||
	+	if (!*path || /* we consider an empty pattern to be no pattern */
	+	    init_sparse_checkout_patterns(istate) ||
	 	    (require_cone_mode &&
	 	     !istate->sparse_checkout_patterns->use_cone_patterns))
	 		return 1;

I.e. trying to optimize for smaller diffs whenever possible. In this
case the word-diff for the original is:

        /*
         * We default to accepting a path if {+the path is empty,+} there are no
         {+*+} patterns{+,+} or [-* they-]{+the patterns+} are of the wrong type.
         */
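
For reference, a word diff like this can be produced with git's
--word-diff option. The sketch below is only an illustration, not how
the output above was generated: it assumes git is installed, recreates
the old and new comment in two temporary files (the names "old" and
"new" are arbitrary), and compares them outside any repository:

```shell
#!/bin/sh
# Recreate the old and new comment in scratch files.
tmp=$(mktemp -d)

cat >"$tmp/old" <<\EOF
	/*
	 * We default to accepting a path if there are no patterns or
	 * they are of the wrong type.
	 */
EOF

cat >"$tmp/new" <<\EOF
	/*
	 * We default to accepting a path if the path is empty, there are no
	 * patterns, or the patterns are of the wrong type.
	 */
EOF

# "git diff --no-index" exits non-zero when the files differ, so that
# exit status is deliberately ignored here.
word_diff=$(git diff --no-index --word-diff "$tmp/old" "$tmp/new" || true)
printf '%s\n' "$word_diff"

rm -rf "$tmp"
```

Inside a clone of git.git, "git show --word-diff 287fd17e3a1 -- dir.c"
should give the equivalent view of the real commit. The exact grouping
of the {+...+} and [-...-] markers depends on how words are split, so
the output may not match the excerpt above character for character.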

Now, obviously another small isolated example that's not worth
nitpicking in itself, but just serves to make a larger point. It's clear
why the rephrasing was done in that case, because the patch adds the
"!*path" check, so it makes sense a priori to have the comment reflect
that.

But one thing where advice about "narrative structure" and good prose
tends to break down when it comes to software development is that we're
much more focused on reviews of incremental additions than many other
fields, where it tends to be more about the final product.
Emily Shaffer April 19, 2022, 10:53 p.m. UTC | #5
On Thu, Apr 14, 2022 at 04:04:59PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Wed, Apr 13 2022, Junio C Hamano wrote:
> 
> > Junio C Hamano <gitster@pobox.com> writes:
> >
> >> Emily Shaffer <emilyshaffer@google.com> writes:
> >>
> >>>> + - Log messages to explain your changes are as important as the
> >>>> +   changes themselves.  Clearly written code and in-code comments
> >>>> +   explain how the code works and what is assumed from the surrounding
> >>>> +   context.  The log messages explain what the changes wanted to
> >>>> +   achieve and why the changes were necessary (more on this in the
> >>>> +   accompanying SubmittingPatches document).
> >>>> +
> >>>
> >>> One thing not listed here, that I often hope to find from the commit
> >>> message (and don't), is "why we did it this way instead of <other way>".
> >>> I am not sure how to phrase it in this document, though. Maybe:
> >>>
> >>>   The log messages explain what the changes wanted to achieve, any
> >>>   decisions that were made between alternative approaches, and why the
> >>>   changes were necessary (more on this in blah blah)
> >>>
> >>> Or maybe "...whether any alternative approaches were considered..." fits
> >>> the form of the surrounding sentence better.
> >>
> >> Quite valid observation.
> >>
> >> Documentation/SubmittingPatches::meaningful-message makes a note on
> >> these points, and the above may want to be more aligned to them.
> >>
> >> Patches welcome, as these have long been merged to 'master/main'.
> >
> > Another thing.  If you (not Emily, but figuratively) haven't watched
> > Victoria's talk https://www.youtube.com/watch?v=4qLtKx9S9a8 on the
> > topic of clearly written commits, you should drop everything you are
> > doing and go watch it.
> >
> > And with what we learn from it, we may be able to rewrite this part
> > of the documentation much more clearly.
> 
> The slides for it are at
> https://vdye.github.io/2022/OS101-Writing-Commits.pdf (not in the video
> description, but at the very end of the video).
> 
> It's easy to nitpick/improve existing examples, so here goes :)
> 
> The main commit message example in that talk starts as just "Make error
> text more helpful", and ends with a better version as:
> 
> 	git-portable.sh: make error text more helpful
> 	
> 	The message “Not a valid command: <invalid command>” is
> 	intended to notify the user that their subcommand is invalid.
> 	However, when no subcommand is given, the "empty" subcommand
> 	results in the same message: "Not a valid command:". This does
> 	not clearly guide the user to the correct behavior, so print
> 	"Please specify a command" when no subcommand is specified.
> 
> For our CodingGuidelines I think it would be useful to have some version
> of "if you can explain something with prose or tests, prefer
> tests".
> 
> I.e. other things being equal I'd much prefer this version
> (pseudo-patch):
> 
> 	git-portable.sh: don't conflate invalid and non-existing command
> 
> 	 git-portable-test.sh | 2 +-
> 	 1 file changed, 1 insertion(+), 1 deletion(-)
> 	
> 	diff --git a/git-portable-test.sh b/git-portable-test.sh
> 	index c8bd464..e03f4a8 100644
> 	--- a/git-portable-test.sh
> 	+++ b/git-portable-test.sh
> 	@@ -5,7 +5,7 @@ test_expect_failure 'usage: invalid command' '
> 	 '
> 	 
> 	 test_expect_failure 'usage: no command' '
> 	-	test_expect_code_output 129 "Not a valid command: " ./gitportable.sh
> 	+	test_expect_code_output 129 "Please specify a command" ./gitportable.sh
> 	 '
> 	 
> 	 test_done
> 
> It ends up basically saying the same thing, but now we're saying it with
> a regression test (test_expect_code_output doesn't exist, but let's
> pretend it's test_expect_code + a test_cmp-alike).
> 
> What it does entirely omit is the "why".
> 
> Now I realize I'm nitpicking a slide shown at a conference, which by its
> nature needs to show a small pseudo-example, but I think this applies in
> general:
> 
> While "why" is a good rule of thumb I think it's just as important to
> know when not to include explanations and when to include one.
> 
> For cases where something is straightforward enough (as in this case,
> the RHS of ": " is clearly missing) I'd think omitting the explanation
> would be better, as we should also be concerned about the overall signal
> ratio.

Preface: I don't want to start a fight ;)

I think if you are in a position where you already will read every
single patch that comes across the mailing list, including its diff,
then you make a really valid point. I can read the negative line of your
diff, infer the problem ("oh, there's nothing after :"), and examine the
solution. Fine.

But I also don't think that most of us working on Git have the time to
read every patch and its diff. I certainly don't. I'd agree that your
patch's subject line is a little more informative than Victoria's, but
past that, if the commit message is empty, I have no idea what problem
you were trying to solve until I have scrolled through lines of context,
diff lines, and finally arrive at the regression test (which ends up at
the very end of the patch in Git, because of the way the codebase is
organized). Whereas, with Victoria's proposed commit message, I can read
the paragraph and decide whether I need to review, and from there decide
whether the diff does what she says it should.

So as I'm deciding what to review, I definitely would prefer Victoria's
commit message. Plus, like I mentioned, it gives the extra safeguard of
allowing reviewers to check: does the patch actually do what the author
meant for it to do? If we're never told what the author meant for it to
do, then we are missing information needed for that part of the review.

Anyway, I haven't watched Victoria's talk yet, but I will do so soon :)

 - Emily
Junio C Hamano April 20, 2022, 8:23 a.m. UTC | #6
Emily Shaffer <emilyshaffer@google.com> writes:

> On Thu, Apr 14, 2022 at 04:04:59PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>  ...
>> For our CodingGuidelines I think it would be useful to have some version
>> of "if you can explain something with prose or tests, prefer
>> tests".

I was going to ignore this part as it is merely showing personal
preference, but I guess I need to weigh in here.

Demonstrating what you meant to say in the log message with tests is
fine, but that should be in addition to prose, explaining how the
scenario is set up and what the user wanted to do, before showing
that a command is giving an outcome that does not help what the user
wanted to do.

IOW, in our CodingGuidelines, we should have "tests can be a good
way to augment what you want to say, but explain it well to those
who are not so familiar with the area."  You do not necessarily have
to explain it to a 5-year-old, but the audience should not have to be
whoever wrote the patch themselves to understand it.

> So as I'm deciding what to review, I definitely would prefer Victoria's
> commit message. Plus, like I mentioned, it gives the extra safeguard of
> allowing reviewers to check: does the patch actually do what the author
> meant for it to do? If we're never told what the author meant for it to
> do, then we are missing information needed for that part of the review.

I have nothing to add here.

> Anyway, I haven't watched Victoria's talk yet, but I will do so soon :)

I do not necessarily agree with the presentation order in a proposed
log message she suggests, but overall, it's a good investment of your
time.  Highly recommended.

Patch

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 0e27b5395d..c37c43186e 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -26,6 +26,13 @@  code.  For Git in general, a few rough rules are:
    go and fix it up."
    Cf. http://lkml.iu.edu/hypermail/linux/kernel/1001.3/01069.html
 
+ - Log messages to explain your changes are as important as the
+   changes themselves.  Clearly written code and in-code comments
+   explain how the code works and what is assumed from the surrounding
+   context.  The log messages explain what the changes wanted to
+   achieve and why the changes were necessary (more on this in the
+   accompanying SubmittingPatches document).
+
 Make your code readable and sensible, and don't try to be clever.
 
 As for more concrete guidelines, just imitate the existing code