Message ID | 74bbd2f9db1ddfd5210be8fde2db84f67acff27d.1728697428.git.karthik.188@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 5e9fa0f9fa90450688a3bd1cf2630dc0c574e516 |
Headers | show |
Series | clang-format: fix rules to make the CI job cleaner | expand |
Karthik Nayak <karthik.188@gmail.com> writes: [snip] > This avoids weird formatting like: > > static const struct strbuf * > a_really_really_large_function_name(struct strbuf resolved, > const char *path, int flags) > > or > > static const struct strbuf *a_really_really_large_function_name( > struct strbuf resolved, const char *path, int flags) > > to instead have something more readable like: > > static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, > const char *path, int flags) > > This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' > and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a > few characters above the 80 column limit to make code more readable. I'm really liking the idea of penalties, but I feel we're relying too much on guestimation of these values. What do you think about adding example files to our codebase? Having concrete examples at hand will allow us to tweak the values in the future, while preserving behavior for existing cases. Or when we decide to change them, we understand what and when. Now, I'm not sure where to put such files. I think I would suggest something like t/style-lint or t/clang-format. Anyway, for our tooling it doesn't seem to matter, because both `make style` and `ci/run-style-check.sh` pick up all .c and .h files anywhere in the source tree. Adding a README to that directory will help people understand why the files are there. -- Toon
On Fri, Oct 11, 2024 at 6:50 PM Karthik Nayak <karthik.188@gmail.com> wrote: > > In 42efde4c29 (clang-format: adjust line break penalties, 2017-09-29) we > adjusted the line break penalties to really fine tune what we care about > while doing line breaks. Modify some of those to be more inline with > what we care about in the Git project now. > > We need to understand that the values set to penalties in > '.clang-format' are relative to each other and do not hold any absolute > value. The penalty arguments take an 'Unsigned' value, so we have some > liberty over the values we can set. > > First, in that commit, we decided, that under no circumstances do we > want to exceed 80 characters. This seems a bit too strict. We do > overshoot this limit from time to time to prioritize readability. I think that attempting to get the weights right so as to avoid cases where there was an intentional affordance for readability is going to be essentially impossible. Areas where there's an intentional disregard for the clang-format-generated formatting should disable the formatter for that line/region, instead of trying to find a way to adjust the rules to produce something that's going to end up being context dependent. Example: In ref-filter.c, there's 13 lines when initializing the `valid_atom` array that are >80 characters, and 20 lines that are >80 columns (when using 8-space tabs). Line breaking in that block of code may be undesirable, so just disable clang-format there. I don't think there's a consistent set of penalties you could establish that would handle that well without mishandling some other section of code. It's also not clear what the reason for the overshoot is in many cases. - difference between "80 characters" and "80 columns"? - (1394 >80char lines in *.{h,c}, 4849 >80col lines in the same files) - intentional for readability? - refactorings pushed originally compliant lines out of compliance? - no one caught it and it was just added without any intentional decision? > So > let's reduce the value for 'PenaltyExcessCharacter' to 10. This means we > that we add a penalty of 10 for each character that exceeds the column > limit. By itself this is enough to restrict to column limit. Tuning > other penalties in relation to this is what is important. > > The penalty `PenaltyBreakAssignment` talks about the penalty for > breaking an assignment operator on to the next line. In our project, we > are okay with this, so giving a value of 5, which is below the value for > 'PenaltyExcessCharacter' ensures that in the end, even 1 character over > the column limit is not worth keeping an assignment on the same line. > > Similarly set the penalty for breaking before the first call parameter > 'PenaltyBreakBeforeFirstCallParameter' and the penalty for breaking > comments 'PenaltyBreakComment' and the penalty for breaking string > literals 'PenaltyBreakString' also to 5. > > Finally, we really care about not breaking the return type into its own > line and we really care about not breaking before an open parenthesis. > This avoids weird formatting like: > > static const struct strbuf * > a_really_really_large_function_name(struct strbuf resolved, > const char *path, int flags) Is this how it'd be indented without the penalties, or would it do this, with the function name indented the same amount as the return type (which is, in C, probably going to be the 0th column most times): static const struct strbuf * a_really_really_large_function_name(struct strbuf resolved, const char *path, int flags) > > or > > static const struct strbuf *a_really_really_large_function_name( > struct strbuf resolved, const char *path, int flags) Personal opinion: I prefer this over the version that has a single argument on the first line. My preference for reading functions is: return_type func_name(arg1, arg2, arg3, arg4, arg5, arg6, ...); Or return_type func_name( arg1, arg2, arg3, arg4, arg5, arg6, ...); or, in some cases, putting every argument on their own line (typically when the majority of the arguments are already on their own line, not having one "hiding" somewhere is preferable, but at this point if that's not what my formatter does, I don't fight it). For functions that accept an obvious first parameter, such as `strbuf_add`, maybe having the first parameter on the first line is acceptable/desirable, since it's "obvious" what it is/does. But for many functions that's not the case, and needing to read the end of the first line, potentially beyond the 80th column, feels weird. > > to instead have something more readable like: > > static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, > const char *path, int flags) > > This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' > and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a > few characters above the 80 column limit to make code more readable. A few examples, such as by formatting the code using the current rules (since much of the codebase does not currently comply), and then changing the penalties and seeing what changes, might be nice? > > Signed-off-by: Karthik Nayak <karthik.188@gmail.com> > --- > .clang-format | 13 +++++++------ > 1 file changed, 7 insertions(+), 6 deletions(-) > > diff --git a/.clang-format b/.clang-format > index 41969eca4b..66a2360ae5 100644 > --- a/.clang-format > +++ b/.clang-format > @@ -209,13 +209,14 @@ KeepEmptyLinesAtTheStartOfBlocks: false > > # Penalties > # This decides what order things should be done if a line is too long > -PenaltyBreakAssignment: 10 > -PenaltyBreakBeforeFirstCallParameter: 30 > -PenaltyBreakComment: 10 > +PenaltyBreakAssignment: 5 > +PenaltyBreakBeforeFirstCallParameter: 5 > +PenaltyBreakComment: 5 > PenaltyBreakFirstLessLess: 0 > -PenaltyBreakString: 10 > -PenaltyExcessCharacter: 100 > -PenaltyReturnTypeOnItsOwnLine: 60 > +PenaltyBreakOpenParenthesis: 300 How does this interact with PenaltyBreakBeforeFirstCallParameter? Does one override the other? > +PenaltyBreakString: 5 > +PenaltyExcessCharacter: 10 > +PenaltyReturnTypeOnItsOwnLine: 300 > > # Don't sort #include's > SortIncludes: false > -- > 2.47.0 >
On Mon, Oct 14, 2024 at 11:08:29AM +0200, Toon Claes wrote: > Karthik Nayak <karthik.188@gmail.com> writes: > > [snip] > > > This avoids weird formatting like: > > > > static const struct strbuf * > > a_really_really_large_function_name(struct strbuf resolved, > > const char *path, int flags) > > > > or > > > > static const struct strbuf *a_really_really_large_function_name( > > struct strbuf resolved, const char *path, int flags) > > > > to instead have something more readable like: > > > > static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, > > const char *path, int flags) > > > > This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' > > and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a > > few characters above the 80 column limit to make code more readable. > > I'm really liking the idea of penalties, but I feel we're relying too > much on guestimation of these values. What do you think about adding > example files to our codebase? Having concrete examples at hand will > allow us to tweak the values in the future, while preserving behavior > for existing cases. Or when we decide to change them, we understand > what and when. I am not sure I see it the same way. I might just be ill-informed or not experienced with these clang-format rules, but having these penalties be defined as such makes it difficult to reason about what lines will and won't be re-wrapped as a result of running the formatter. What is the purpose of these penalties? Thanks, Taylor
Toon Claes <toon@iotcl.com> writes: > Karthik Nayak <karthik.188@gmail.com> writes: > > [snip] > >> This avoids weird formatting like: >> >> static const struct strbuf * >> a_really_really_large_function_name(struct strbuf resolved, >> const char *path, int flags) >> >> or >> >> static const struct strbuf *a_really_really_large_function_name( >> struct strbuf resolved, const char *path, int flags) >> >> to instead have something more readable like: >> >> static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, >> const char *path, int flags) >> >> This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' >> and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a >> few characters above the 80 column limit to make code more readable. > > I'm really liking the idea of penalties, but I feel we're relying too > much on guestimation of these values. What do you think about adding That is true indeed. There is a bit of guestimation done here, I had to try various values to find the ones which worked well. I wish there was a more formal way to do this... > example files to our codebase? Having concrete examples at hand will > allow us to tweak the values in the future, while preserving behavior > for existing cases. Or when we decide to change them, we understand > what and when. > > Now, I'm not sure where to put such files. I think I would suggest > something like t/style-lint or t/clang-format. Anyway, for our tooling > it doesn't seem to matter, because both `make style` and > `ci/run-style-check.sh` pick up all .c and .h files anywhere in the > source tree. Adding a README to that directory will help people > understand why the files are there. > I'm not too keen on adding examples, for the mere facts that: 1. They will be outdated each time we change rules. 2. The commit message already has the information around each rule. Karthik
Taylor Blau <me@ttaylorr.com> writes: > On Mon, Oct 14, 2024 at 11:08:29AM +0200, Toon Claes wrote: >> Karthik Nayak <karthik.188@gmail.com> writes: >> >> [snip] >> >> > This avoids weird formatting like: >> > >> > static const struct strbuf * >> > a_really_really_large_function_name(struct strbuf resolved, >> > const char *path, int flags) >> > >> > or >> > >> > static const struct strbuf *a_really_really_large_function_name( >> > struct strbuf resolved, const char *path, int flags) >> > >> > to instead have something more readable like: >> > >> > static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, >> > const char *path, int flags) >> > >> > This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' >> > and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a >> > few characters above the 80 column limit to make code more readable. >> >> I'm really liking the idea of penalties, but I feel we're relying too >> much on guestimation of these values. What do you think about adding >> example files to our codebase? Having concrete examples at hand will >> allow us to tweak the values in the future, while preserving behavior >> for existing cases. Or when we decide to change them, we understand >> what and when. > > I am not sure I see it the same way. > > I might just be ill-informed or not experienced with these clang-format > rules, but having these penalties be defined as such makes it difficult > to reason about what lines will and won't be re-wrapped as a result of > running the formatter. > > What is the purpose of these penalties? > We have a column limit set in our clang-format 'ColumnLimit: 80', this dictates when line breaks should occur in our code. The penalties are options through which we can influence this decision [1]. > Thanks, > Taylor [1]: https://stackoverflow.com/a/27608250
Kyle Lippincott <spectral@google.com> writes: > On Fri, Oct 11, 2024 at 6:50 PM Karthik Nayak <karthik.188@gmail.com> wrote: >> >> In 42efde4c29 (clang-format: adjust line break penalties, 2017-09-29) we >> adjusted the line break penalties to really fine tune what we care about >> while doing line breaks. Modify some of those to be more inline with >> what we care about in the Git project now. >> >> We need to understand that the values set to penalties in >> '.clang-format' are relative to each other and do not hold any absolute >> value. The penalty arguments take an 'Unsigned' value, so we have some >> liberty over the values we can set. >> >> First, in that commit, we decided, that under no circumstances do we >> want to exceed 80 characters. This seems a bit too strict. We do >> overshoot this limit from time to time to prioritize readability. > > I think that attempting to get the weights right so as to avoid cases > where there was an intentional affordance for readability is going to > be essentially impossible. Areas where there's an intentional > disregard for the clang-format-generated formatting should disable the > formatter for that line/region, instead of trying to find a way to > adjust the rules to produce something that's going to end up being > context dependent. > To some extent I agree. But the issue is that clang-format is still not enforced within the code base. So expecting users to add: // clang-format off will not hold, at least for _now_. So the next best thing we can do is to get the format rules as close as we can to the current styling, so the actual errors thrown by the CI job is something we can look at without containing too many false positives. > Example: In ref-filter.c, there's 13 lines when initializing the > `valid_atom` array that are >80 characters, and 20 lines that are >80 > columns (when using 8-space tabs). Line breaking in that block of code > may be undesirable, so just disable clang-format there. I don't think > there's a consistent set of penalties you could establish that would > handle that well without mishandling some other section of code. While true, we can quantify if it is better or not: ❯ ci/run-style-check.sh @~50 | wc -l 4718 (master) ❯ ci/run-style-check.sh @~53 | wc -l 4475 (with these patches) And looking through the other changes, those look like violations which should have been fixed. > It's also not clear what the reason for the overshoot is in many cases. > - difference between "80 characters" and "80 columns"? > - (1394 >80char lines in *.{h,c}, 4849 >80col lines in the same files) > - intentional for readability? > - refactorings pushed originally compliant lines out of compliance? > - no one caught it and it was just added without any intentional decision? > I agree with your inference here, but I'm not sure there is a smooth way to have this information. Either we go full in and say we enable the formatting and every patch must conform to it, or we simply keep the clang-format as a warning system. Currently we do neither. I'd say we should be in a state to reach the latter and then we can gradually think of how to move to the former. >> So >> let's reduce the value for 'PenaltyExcessCharacter' to 10. This means we >> that we add a penalty of 10 for each character that exceeds the column >> limit. By itself this is enough to restrict to column limit. Tuning >> other penalties in relation to this is what is important. >> >> The penalty `PenaltyBreakAssignment` talks about the penalty for >> breaking an assignment operator on to the next line. In our project, we >> are okay with this, so giving a value of 5, which is below the value for >> 'PenaltyExcessCharacter' ensures that in the end, even 1 character over >> the column limit is not worth keeping an assignment on the same line. >> >> Similarly set the penalty for breaking before the first call parameter >> 'PenaltyBreakBeforeFirstCallParameter' and the penalty for breaking >> comments 'PenaltyBreakComment' and the penalty for breaking string >> literals 'PenaltyBreakString' also to 5. >> >> Finally, we really care about not breaking the return type into its own >> line and we really care about not breaking before an open parenthesis. >> This avoids weird formatting like: >> >> static const struct strbuf * >> a_really_really_large_function_name(struct strbuf resolved, >> const char *path, int flags) > > Is this how it'd be indented without the penalties, or would it do > this, with the function name indented the same amount as the return > type (which is, in C, probably going to be the 0th column most times): > > static const struct strbuf * > a_really_really_large_function_name(struct strbuf resolved, > const char *path, int flags) > It will be indented, so not the 0th column. >> >> or >> >> static const struct strbuf *a_really_really_large_function_name( >> struct strbuf resolved, const char *path, int flags) > > Personal opinion: I prefer this over the version that has a single > argument on the first line. My preference for reading functions is: > > return_type func_name(arg1, arg2, > arg3, arg4, > arg5, arg6, ...); > > Or > > return_type func_name( > arg1, arg2, arg3, arg4, > arg5, arg6, ...); > I'm mostly basing my changes on the current state of the 'clang-format' and our code base. I'm happy to change it if everyone agrees on this :) > or, in some cases, putting every argument on their own line (typically > when the majority of the arguments are already on their own line, not > having one "hiding" somewhere is preferable, but at this point if > that's not what my formatter does, I don't fight it). > > For functions that accept an obvious first parameter, such as > `strbuf_add`, maybe having the first parameter on the first line is > acceptable/desirable, since it's "obvious" what it is/does. But for > many functions that's not the case, and needing to read the end of the > first line, potentially beyond the 80th column, feels weird. > >> >> to instead have something more readable like: >> >> static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, >> const char *path, int flags) >> >> This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' >> and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a >> few characters above the 80 column limit to make code more readable. > > A few examples, such as by formatting the code using the current rules > (since much of the codebase does not currently comply), and then > changing the penalties and seeing what changes, might be nice? > You mean apart from the example above? > >> >> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> >> --- >> .clang-format | 13 +++++++------ >> 1 file changed, 7 insertions(+), 6 deletions(-) >> >> diff --git a/.clang-format b/.clang-format >> index 41969eca4b..66a2360ae5 100644 >> --- a/.clang-format >> +++ b/.clang-format >> @@ -209,13 +209,14 @@ KeepEmptyLinesAtTheStartOfBlocks: false >> >> # Penalties >> # This decides what order things should be done if a line is too long >> -PenaltyBreakAssignment: 10 >> -PenaltyBreakBeforeFirstCallParameter: 30 >> -PenaltyBreakComment: 10 >> +PenaltyBreakAssignment: 5 >> +PenaltyBreakBeforeFirstCallParameter: 5 >> +PenaltyBreakComment: 5 >> PenaltyBreakFirstLessLess: 0 >> -PenaltyBreakString: 10 >> -PenaltyExcessCharacter: 100 >> -PenaltyReturnTypeOnItsOwnLine: 60 >> +PenaltyBreakOpenParenthesis: 300 > > How does this interact with PenaltyBreakBeforeFirstCallParameter? Does > one override the other? > From my understanding PenaltyBreakOpenParenthesis seems to apply more generally and is the more preferred value. [1]: https://github.com/llvm/llvm-project/blob/a4367d2d136420f562f64e7731b9393fb609f3fc/clang/lib/Format/TokenAnnotator.cpp#L4322 >> +PenaltyBreakString: 5 >> +PenaltyExcessCharacter: 10 >> +PenaltyReturnTypeOnItsOwnLine: 300 >> >> # Don't sort #include's >> SortIncludes: false >> -- >> 2.47.0 >>
On Tue, Oct 15, 2024 at 5:37 AM karthik nayak <karthik.188@gmail.com> wrote: > > Kyle Lippincott <spectral@google.com> writes: > > > On Fri, Oct 11, 2024 at 6:50 PM Karthik Nayak <karthik.188@gmail.com> wrote: > >> > >> In 42efde4c29 (clang-format: adjust line break penalties, 2017-09-29) we > >> adjusted the line break penalties to really fine tune what we care about > >> while doing line breaks. Modify some of those to be more inline with > >> what we care about in the Git project now. > >> > >> We need to understand that the values set to penalties in > >> '.clang-format' are relative to each other and do not hold any absolute > >> value. The penalty arguments take an 'Unsigned' value, so we have some > >> liberty over the values we can set. > >> > >> First, in that commit, we decided, that under no circumstances do we > >> want to exceed 80 characters. This seems a bit too strict. We do > >> overshoot this limit from time to time to prioritize readability. > > > > I think that attempting to get the weights right so as to avoid cases > > where there was an intentional affordance for readability is going to > > be essentially impossible. Areas where there's an intentional > > disregard for the clang-format-generated formatting should disable the > > formatter for that line/region, instead of trying to find a way to > > adjust the rules to produce something that's going to end up being > > context dependent. > > > > To some extent I agree. But the issue is that clang-format is still not > enforced within the code base. So expecting users to add: > // clang-format off > will not hold, at least for _now_. > > So the next best thing we can do is to get the format rules as close as > we can to the current styling, so the actual errors thrown by the CI job > is something we can look at without containing too many false positives. > > > Example: In ref-filter.c, there's 13 lines when initializing the > > `valid_atom` array that are >80 characters, and 20 lines that are >80 > > columns (when using 8-space tabs). Line breaking in that block of code > > may be undesirable, so just disable clang-format there. I don't think > > there's a consistent set of penalties you could establish that would > > handle that well without mishandling some other section of code. > > While true, we can quantify if it is better or not: > > ❯ ci/run-style-check.sh @~50 | wc -l > 4718 (master) > > ❯ ci/run-style-check.sh @~53 | wc -l > 4475 (with these patches) > > And looking through the other changes, those look like violations which > should have been fixed. > > > It's also not clear what the reason for the overshoot is in many cases. > > - difference between "80 characters" and "80 columns"? > > - (1394 >80char lines in *.{h,c}, 4849 >80col lines in the same files) > > - intentional for readability? > > - refactorings pushed originally compliant lines out of compliance? > > - no one caught it and it was just added without any intentional decision? > > > > I agree with your inference here, but I'm not sure there is a smooth way > to have this information. Either we go full in and say we enable the > formatting and every patch must conform to it, or we simply keep the > clang-format as a warning system. Currently we do neither. I'd say we > should be in a state to reach the latter and then we can gradually think > of how to move to the former. > > >> So > >> let's reduce the value for 'PenaltyExcessCharacter' to 10. This means we > >> that we add a penalty of 10 for each character that exceeds the column > >> limit. By itself this is enough to restrict to column limit. Tuning > >> other penalties in relation to this is what is important. > >> > >> The penalty `PenaltyBreakAssignment` talks about the penalty for > >> breaking an assignment operator on to the next line. In our project, we > >> are okay with this, so giving a value of 5, which is below the value for > >> 'PenaltyExcessCharacter' ensures that in the end, even 1 character over > >> the column limit is not worth keeping an assignment on the same line. > >> > >> Similarly set the penalty for breaking before the first call parameter > >> 'PenaltyBreakBeforeFirstCallParameter' and the penalty for breaking > >> comments 'PenaltyBreakComment' and the penalty for breaking string > >> literals 'PenaltyBreakString' also to 5. > >> > >> Finally, we really care about not breaking the return type into its own > >> line and we really care about not breaking before an open parenthesis. > >> This avoids weird formatting like: > >> > >> static const struct strbuf * > >> a_really_really_large_function_name(struct strbuf resolved, > >> const char *path, int flags) > > > > Is this how it'd be indented without the penalties, or would it do > > this, with the function name indented the same amount as the return > > type (which is, in C, probably going to be the 0th column most times): > > > > static const struct strbuf * > > a_really_really_large_function_name(struct strbuf resolved, > > const char *path, int flags) > > > > It will be indented, so not the 0th column. I'm not getting that behavior when I try it. Is this only indented with your updated penalties? Currently on ef8ce8f3d4344fd3af049c17eeba5cd20d98b69f, git status is clean, added this line (no line breaks, just in case my email client makes a mess of this) to top of path.h (chosen arbitrarily): static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, const char *path, int flags); and ran `clang-format path.h | head -n20` and got this (where `int flags` is indented to align with the opening `(`, but tabs cause problems yet again): static const struct strbuf * a_really_really_large_function_name(struct strbuf resolved, const char *path, int flags); `clang-format --version` shows it's a google-internal build, but it still respects the .clang-format file, so this shouldn't matter? I'm assuming it's a relatively recent (within the past 1 month) commit that it's based off of. > > >> > >> or > >> > >> static const struct strbuf *a_really_really_large_function_name( > >> struct strbuf resolved, const char *path, int flags) > > > > Personal opinion: I prefer this over the version that has a single > > argument on the first line. My preference for reading functions is: > > > > return_type func_name(arg1, arg2, > > arg3, arg4, > > arg5, arg6, ...); > > > > Or > > > > return_type func_name( > > arg1, arg2, arg3, arg4, > > arg5, arg6, ...); > > > > I'm mostly basing my changes on the current state of the 'clang-format' > and our code base. I'm happy to change it if everyone agrees on this :) I'm wondering if tabs have caused some confusion here... [thought continued below] > > > or, in some cases, putting every argument on their own line (typically > > when the majority of the arguments are already on their own line, not > > having one "hiding" somewhere is preferable, but at this point if > > that's not what my formatter does, I don't fight it). > > > > For functions that accept an obvious first parameter, such as > > `strbuf_add`, maybe having the first parameter on the first line is > > acceptable/desirable, since it's "obvious" what it is/does. But for > > many functions that's not the case, and needing to read the end of the > > first line, potentially beyond the 80th column, feels weird. > > > >> > >> to instead have something more readable like: > >> > >> static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, > >> const char *path, int flags) Does this have `const char *path` aligned with the opening `(`? if so, then that matches my first example and I'm fine with it. If this is truly "first argument immediately after (, other arguments indented one indentation level", then my original comment stands: I don't find this readable at all, and I don't see evidence of this being an acceptable format according to our CodingGuidelines document. I also don't understand how the penalties would produce t > >> > >> This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' > >> and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a > >> few characters above the 80 column limit to make code more readable. > > > > A few examples, such as by formatting the code using the current rules > > (since much of the codebase does not currently comply), and then > > changing the penalties and seeing what changes, might be nice? > > > > You mean apart from the example above? The example above is hypothetical, but I think I was thinking about this incorrectly. Our existing codebase isn't formatted with clang-format, and formatting it first and then adjusting the penalties doesn't really provide much useful information. Setting the penalties as you have them in this patch, and running it on a copy of the line you have there, produces this for me: static const struct strbuf * a_really_really_large_function_name(struct strbuf resolved, const char *path, int flags); The 80 column limit is still _strictly_ adhered to, it won't even go over by 1 character: static const struct strbuf * a_really_really_large_function_named_Ed(struct strbuf resolved, const char *path, int flags); (note: switched tabs to spaces because tabs are difficult to use during discussions like this) Specifically: - 80 column hard limit applied - return type on its own line - continuation arguments are aligned on the next line (which is expected since we have AlignAfterOpenBracket set). > > > > >> > >> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> > >> --- > >> .clang-format | 13 +++++++------ > >> 1 file changed, 7 insertions(+), 6 deletions(-) > >> > >> diff --git a/.clang-format b/.clang-format > >> index 41969eca4b..66a2360ae5 100644 > >> --- a/.clang-format > >> +++ b/.clang-format > >> @@ -209,13 +209,14 @@ KeepEmptyLinesAtTheStartOfBlocks: false > >> > >> # Penalties > >> # This decides what order things should be done if a line is too long > >> -PenaltyBreakAssignment: 10 > >> -PenaltyBreakBeforeFirstCallParameter: 30 > >> -PenaltyBreakComment: 10 > >> +PenaltyBreakAssignment: 5 > >> +PenaltyBreakBeforeFirstCallParameter: 5 > >> +PenaltyBreakComment: 5 > >> PenaltyBreakFirstLessLess: 0 > >> -PenaltyBreakString: 10 > >> -PenaltyExcessCharacter: 100 > >> -PenaltyReturnTypeOnItsOwnLine: 60 > >> +PenaltyBreakOpenParenthesis: 300 > > > > How does this interact with PenaltyBreakBeforeFirstCallParameter? Does > > one override the other? > > > > From my understanding PenaltyBreakOpenParenthesis seems to apply more > generally and is the more preferred value. > > [1]: https://github.com/llvm/llvm-project/blob/a4367d2d136420f562f64e7731b9393fb609f3fc/clang/lib/Format/TokenAnnotator.cpp#L4322 > > >> +PenaltyBreakString: 5 > >> +PenaltyExcessCharacter: 10 > >> +PenaltyReturnTypeOnItsOwnLine: 300 > >> > >> # Don't sort #include's > >> SortIncludes: false > >> -- > >> 2.47.0 > >>
Kyle Lippincott <spectral@google.com> writes: > On Tue, Oct 15, 2024 at 5:37 AM karthik nayak <karthik.188@gmail.com> wrote: >> >> Kyle Lippincott <spectral@google.com> writes: >> >> > On Fri, Oct 11, 2024 at 6:50 PM Karthik Nayak <karthik.188@gmail.com> wrote: >> >> >> >> In 42efde4c29 (clang-format: adjust line break penalties, 2017-09-29) we >> >> adjusted the line break penalties to really fine tune what we care about >> >> while doing line breaks. Modify some of those to be more inline with >> >> what we care about in the Git project now. >> >> >> >> We need to understand that the values set to penalties in >> >> '.clang-format' are relative to each other and do not hold any absolute >> >> value. The penalty arguments take an 'Unsigned' value, so we have some >> >> liberty over the values we can set. >> >> >> >> First, in that commit, we decided, that under no circumstances do we >> >> want to exceed 80 characters. This seems a bit too strict. We do >> >> overshoot this limit from time to time to prioritize readability. >> > >> > I think that attempting to get the weights right so as to avoid cases >> > where there was an intentional affordance for readability is going to >> > be essentially impossible. Areas where there's an intentional >> > disregard for the clang-format-generated formatting should disable the >> > formatter for that line/region, instead of trying to find a way to >> > adjust the rules to produce something that's going to end up being >> > context dependent. >> > >> >> To some extent I agree. But the issue is that clang-format is still not >> enforced within the code base. So expecting users to add: >> // clang-format off >> will not hold, at least for _now_. >> >> So the next best thing we can do is to get the format rules as close as >> we can to the current styling, so the actual errors thrown by the CI job >> is something we can look at without containing too many false positives. >> >> > Example: In ref-filter.c, there's 13 lines when initializing the >> > `valid_atom` array that are >80 characters, and 20 lines that are >80 >> > columns (when using 8-space tabs). Line breaking in that block of code >> > may be undesirable, so just disable clang-format there. I don't think >> > there's a consistent set of penalties you could establish that would >> > handle that well without mishandling some other section of code. >> >> While true, we can quantify if it is better or not: >> >> ❯ ci/run-style-check.sh @~50 | wc -l >> 4718 (master) >> >> ❯ ci/run-style-check.sh @~53 | wc -l >> 4475 (with these patches) >> >> And looking through the other changes, those look like violations which >> should have been fixed. >> >> > It's also not clear what the reason for the overshoot is in many cases. >> > - difference between "80 characters" and "80 columns"? >> > - (1394 >80char lines in *.{h,c}, 4849 >80col lines in the same files) >> > - intentional for readability? >> > - refactorings pushed originally compliant lines out of compliance? >> > - no one caught it and it was just added without any intentional decision? >> > >> >> I agree with your inference here, but I'm not sure there is a smooth way >> to have this information. Either we go full in and say we enable the >> formatting and every patch must conform to it, or we simply keep the >> clang-format as a warning system. Currently we do neither. I'd say we >> should be in a state to reach the latter and then we can gradually think >> of how to move to the former. >> >> >> So >> >> let's reduce the value for 'PenaltyExcessCharacter' to 10. This means we >> >> that we add a penalty of 10 for each character that exceeds the column >> >> limit. By itself this is enough to restrict to column limit. Tuning >> >> other penalties in relation to this is what is important. >> >> >> >> The penalty `PenaltyBreakAssignment` talks about the penalty for >> >> breaking an assignment operator on to the next line. In our project, we >> >> are okay with this, so giving a value of 5, which is below the value for >> >> 'PenaltyExcessCharacter' ensures that in the end, even 1 character over >> >> the column limit is not worth keeping an assignment on the same line. >> >> >> >> Similarly set the penalty for breaking before the first call parameter >> >> 'PenaltyBreakBeforeFirstCallParameter' and the penalty for breaking >> >> comments 'PenaltyBreakComment' and the penalty for breaking string >> >> literals 'PenaltyBreakString' also to 5. >> >> >> >> Finally, we really care about not breaking the return type into its own >> >> line and we really care about not breaking before an open parenthesis. >> >> This avoids weird formatting like: >> >> >> >> static const struct strbuf * >> >> a_really_really_large_function_name(struct strbuf resolved, >> >> const char *path, int flags) >> > >> > Is this how it'd be indented without the penalties, or would it do >> > this, with the function name indented the same amount as the return >> > type (which is, in C, probably going to be the 0th column most times): >> > >> > static const struct strbuf * >> > a_really_really_large_function_name(struct strbuf resolved, >> > const char *path, int flags) >> > >> >> It will be indented, so not the 0th column. > > I'm not getting that behavior when I try it. Is this only indented > with your updated penalties? > > Currently on ef8ce8f3d4344fd3af049c17eeba5cd20d98b69f, git status is > clean, added this line (no line breaks, just in case my email client > makes a mess of this) to top of path.h (chosen arbitrarily): > > static const struct strbuf *a_really_really_large_function_name(struct > strbuf resolved, const char *path, int flags); > > and ran `clang-format path.h | head -n20` and got this (where `int > flags` is indented to align with the opening `(`, but tabs cause > problems yet again): > > static const struct strbuf * > a_really_really_large_function_name(struct strbuf resolved, const char *path, > int flags); > > `clang-format --version` shows it's a google-internal build, but it > still respects the .clang-format file, so this shouldn't matter? I'm > assuming it's a relatively recent (within the past 1 month) commit > that it's based off of. > You're absolutely right, this is also what I get. Sorry for the confusion, but I assumed you were talking about the third line, i.e. `const char *path, int flags)`. I also ran it on the CI to make it easier on the eyes (specifically with the tabs): [1] >> >> >> >> >> or >> >> >> >> static const struct strbuf *a_really_really_large_function_name( >> >> struct strbuf resolved, const char *path, int flags) >> > >> > Personal opinion: I prefer this over the version that has a single >> > argument on the first line. My preference for reading functions is: >> > >> > return_type func_name(arg1, arg2, >> > arg3, arg4, >> > arg5, arg6, ...); >> > >> > Or >> > >> > return_type func_name( >> > arg1, arg2, arg3, arg4, >> > arg5, arg6, ...); >> > >> >> I'm mostly basing my changes on the current state of the 'clang-format' >> and our code base. I'm happy to change it if everyone agrees on this :) > > I'm wondering if tabs have caused some confusion here... [thought > continued below] > >> >> > or, in some cases, putting every argument on their own line (typically >> > when the majority of the arguments are already on their own line, not >> > having one "hiding" somewhere is preferable, but at this point if >> > that's not what my formatter does, I don't fight it). >> > >> > For functions that accept an obvious first parameter, such as >> > `strbuf_add`, maybe having the first parameter on the first line is >> > acceptable/desirable, since it's "obvious" what it is/does. But for >> > many functions that's not the case, and needing to read the end of the >> > first line, potentially beyond the 80th column, feels weird. >> > >> >> >> >> to instead have something more readable like: >> >> >> >> static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, >> >> const char *path, int flags) > > Does this have `const char *path` aligned with the opening `(`? if so, > then that matches my first example and I'm fine with it. Yes, it is. Here is a CI job to demonstrate. I hope this makes it clearer: [2] > If this is > truly "first argument immediately after (, other arguments indented > one indentation level", then my original comment stands: I don't find > this readable at all, and I don't see evidence of this being an > acceptable format according to our CodingGuidelines document. > > I also don't understand how the penalties would produce t > The penalties define when the linebreak should happen. The alignment is handled by the `AlignAfterOpenBracket: Align` rule we have in '.clang-format'. >> >> >> >> This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' >> >> and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a >> >> few characters above the 80 column limit to make code more readable. >> > >> > A few examples, such as by formatting the code using the current rules >> > (since much of the codebase does not currently comply), and then >> > changing the penalties and seeing what changes, might be nice? >> > >> >> You mean apart from the example above? > > The example above is hypothetical, but I think I was thinking about > this incorrectly. Our existing codebase isn't formatted with > clang-format, and formatting it first and then adjusting the penalties > doesn't really provide much useful information. > > Setting the penalties as you have them in this patch, and running it > on a copy of the line you have there, produces this for me: > > static const struct strbuf * > a_really_really_large_function_name(struct strbuf resolved, const char *path, > int flags); > > The 80 column limit is still _strictly_ adhered to, it won't even go > over by 1 character: > > static const struct strbuf * > a_really_really_large_function_named_Ed(struct strbuf resolved, > const char *path, int flags); > > (note: switched tabs to spaces because tabs are difficult to use > during discussions like this) > > Specifically: > - 80 column hard limit applied > - return type on its own line > - continuation arguments are aligned on the next line (which is > expected since we have AlignAfterOpenBracket set). > But this is not what I'm seeing though, even the CI [2] confirms that. I'm seeing static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, const char *path, int flags); where the consecutive lines are aligned on '('. (note: switched tabs to spaces too) Thanks for the back and forth though, I guess I have some feedback for the next version: - Add some examples in a file like Toon suggested, showing how the clang-format would work. - Clarify the commit message to make it clearer about how the penalties work with other rules. [snip] [1]: https://gitlab.com/gitlab-org/git/-/jobs/8105793089 [2]: https://gitlab.com/gitlab-org/git/-/jobs/8105737945
diff --git a/.clang-format b/.clang-format index 41969eca4b..66a2360ae5 100644 --- a/.clang-format +++ b/.clang-format @@ -209,13 +209,14 @@ KeepEmptyLinesAtTheStartOfBlocks: false # Penalties # This decides what order things should be done if a line is too long -PenaltyBreakAssignment: 10 -PenaltyBreakBeforeFirstCallParameter: 30 -PenaltyBreakComment: 10 +PenaltyBreakAssignment: 5 +PenaltyBreakBeforeFirstCallParameter: 5 +PenaltyBreakComment: 5 PenaltyBreakFirstLessLess: 0 -PenaltyBreakString: 10 -PenaltyExcessCharacter: 100 -PenaltyReturnTypeOnItsOwnLine: 60 +PenaltyBreakOpenParenthesis: 300 +PenaltyBreakString: 5 +PenaltyExcessCharacter: 10 +PenaltyReturnTypeOnItsOwnLine: 300 # Don't sort #include's SortIncludes: false
In 42efde4c29 (clang-format: adjust line break penalties, 2017-09-29) we adjusted the line break penalties to really fine tune what we care about while doing line breaks. Modify some of those to be more inline with what we care about in the Git project now. We need to understand that the values set to penalties in '.clang-format' are relative to each other and do not hold any absolute value. The penalty arguments take an 'Unsigned' value, so we have some liberty over the values we can set. First, in that commit, we decided, that under no circumstances do we want to exceed 80 characters. This seems a bit too strict. We do overshoot this limit from time to time to prioritize readability. So let's reduce the value for 'PenaltyExcessCharacter' to 10. This means we that we add a penalty of 10 for each character that exceeds the column limit. By itself this is enough to restrict to column limit. Tuning other penalties in relation to this is what is important. The penalty `PenaltyBreakAssignment` talks about the penalty for breaking an assignment operator on to the next line. In our project, we are okay with this, so giving a value of 5, which is below the value for 'PenaltyExcessCharacter' ensures that in the end, even 1 character over the column limit is not worth keeping an assignment on the same line. Similarly set the penalty for breaking before the first call parameter 'PenaltyBreakBeforeFirstCallParameter' and the penalty for breaking comments 'PenaltyBreakComment' and the penalty for breaking string literals 'PenaltyBreakString' also to 5. Finally, we really care about not breaking the return type into its own line and we really care about not breaking before an open parenthesis. This avoids weird formatting like: static const struct strbuf * a_really_really_large_function_name(struct strbuf resolved, const char *path, int flags) or static const struct strbuf *a_really_really_large_function_name( struct strbuf resolved, const char *path, int flags) to instead have something more readable like: static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, const char *path, int flags) This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a few characters above the 80 column limit to make code more readable. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- .clang-format | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)