Message ID | f84c5e8e4a90be3f9fe3cc853e0d40aed4e58826.1710994548.git.dsimic@manjaro.org (mailing list archive) |
---|---|
State | Superseded |
Series | Fix a bug in configuration parsing, and improve tests and documentation |

On Thu, Mar 21, 2024 at 12:17 AM Dragan Simic <dsimic@manjaro.org> wrote:
> Make it more clear what the whitespace characters are in the context of git
> configuration files, and significantly improve the description of the leading
> and trailing whitespace handling, especially how it works out together with
> the presence of inline comments.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> @@ -63,13 +64,15 @@ the variable is the boolean "true").
>  A line that defines a value can be continued to the next line by
> +ending it with a `\`; the backslash and the end-of-line are stripped.
> +Leading whitespace characters before 'name =' are discarded.
> +The portion of the line after the first comment character, including
> +the comment character itself, is discarded. Unless enclosed in double
> +quotation marks (`"`), any leading or trailing whitespace characters
> +surrounding 'value' are discarded. Internal whitespace characters
> +within 'value' are retained verbatim.

I find this statement confusing and ambiguous:

    Unless enclosed in double quotation marks (`"`), any leading or
    trailing whitespace characters surrounding 'value' are discarded.

since it might imply that the shown <SP> and <TAB> whitespace is
retained outside the quotes, as well:

    key =<SP><TAB>" string "<SP>

It should be possible to rephrase it to be more definite, while
dropping the final sentence altogether. Perhaps:

    Whitespace surrounding `name`, `=` and `value` is ignored. If
    `value` is surrounding by double quotation marks (`"`), all
    characters within the quoted string are retained verbatim,
    including whitespace. Comments starting with either `#` or `;` and
    extending to the end of line are discarded. A line that defines a
    value can be continued to the next line by ending it with a `\`;
    the backslash and the end-of-line are stripped.

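
The proposed wording can be read as three steps applied to the text that follows `=`: cut the comment, trim the surrounding SP/HT, then drop the quotes. The following is a minimal Python sketch of that reading only, not git's actual parser. The names `find_comment_start`, `unquote` and `parse_value` are hypothetical, only the `\"` and `\\` escapes are handled, and line continuation is left out.

```python
WHITESPACE = " \t"   # SP and HT, the two whitespace characters named in the patch
COMMENT_CHARS = "#;"


def find_comment_start(text: str) -> int:
    """Index of the first comment character outside double quotes, or len(text)."""
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch in COMMENT_CHARS and not in_quotes:
            return i
        i += 1
    return len(text)


def unquote(text: str) -> str:
    """Remove double quotes and resolve the \\" and \\\\ escapes (assumption: no others)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in '"\\':
            out.append(text[i + 1])
            i += 2
            continue
        if ch != '"':
            out.append(ch)
        i += 1
    return "".join(out)


def parse_value(raw: str) -> str:
    """Model of the proposed wording: cut comment, trim SP/HT, then unquote."""
    without_comment = raw[:find_comment_start(raw)]
    trimmed = without_comment.strip(WHITESPACE)
    return unquote(trimmed)


# The ambiguous case from the review: key =<SP><TAB>" string "<SP>
print(repr(parse_value(' \t" string " ')))      # ' string '
print(repr(parse_value(" value # comment")))    # 'value'
```

Under this reading the SP and HT outside the quotes are dropped, while the spaces inside the quotes survive, which is the distinction the review asks the documentation to make explicit.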

On 2024-03-21 06:11, Eric Sunshine wrote:
> On Thu, Mar 21, 2024 at 12:17 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> Make it more clear what the whitespace characters are in the context of git
>> configuration files, and significantly improve the description of the leading
>> and trailing whitespace handling, especially how it works out together with
>> the presence of inline comments.
>>
>> Helped-by: Junio C Hamano <gitster@pobox.com>
>> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
>> ---
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> @@ -63,13 +64,15 @@ the variable is the boolean "true").
>>  A line that defines a value can be continued to the next line by
>> +ending it with a `\`; the backslash and the end-of-line are stripped.
>> +Leading whitespace characters before 'name =' are discarded.
>> +The portion of the line after the first comment character, including
>> +the comment character itself, is discarded. Unless enclosed in double
>> +quotation marks (`"`), any leading or trailing whitespace characters
>> +surrounding 'value' are discarded. Internal whitespace characters
>> +within 'value' are retained verbatim.
>
> I find this statement confusing and ambiguous:
>
>     Unless enclosed in double quotation marks (`"`), any leading or
>     trailing whitespace characters surrounding 'value' are discarded.
>
> since it might imply that the shown <SP> and <TAB> whitespace is
> retained outside the quotes, as well:
>
>     key =<SP><TAB>" string "<SP>
>
> It should be possible to rephrase it to be more definite, while
> dropping the final sentence altogether. Perhaps:
>
>     Whitespace surrounding `name`, `=` and `value` is ignored. If
>     `value` is surrounding by double quotation marks (`"`), all
>     characters within the quoted string are retained verbatim,
>     including whitespace. Comments starting with either `#` or `;` and
>     extending to the end of line are discarded. A line that defines a
>     value can be continued to the next line by ending it with a `\`;
>     the backslash and the end-of-line are stripped.

Looking good to me, thanks. I'll include it into the v5, with
a small grammar issue fixed.


On Thu, Mar 21, 2024 at 1:16 AM Dragan Simic <dsimic@manjaro.org> wrote:
> On 2024-03-21 06:11, Eric Sunshine wrote:
> > It should be possible to rephrase it to be more definite, while
> > dropping the final sentence altogether. Perhaps:
> >
> >     Whitespace surrounding `name`, `=` and `value` is ignored. If
> >     `value` is surrounding by double quotation marks (`"`), all
> >     characters within the quoted string are retained verbatim,
> >     including whitespace. Comments starting with either `#` or `;` and
> >     extending to the end of line are discarded. A line that defines a
> >     value can be continued to the next line by ending it with a `\`;
> >     the backslash and the end-of-line are stripped.
>
> Looking good to me, thanks. I'll include it into the v5, with
> a small grammar issue fixed.

For completeness, I should mention that I intentionally reordered the
topics so that the most common/important ones are mentioned earlier
rather than later; i.e. (1) surrounding whitespace ignored, (2)
double-quoted value, (3) comments, (4) `\` line-splicing with.


On 2024-03-21 06:21, Eric Sunshine wrote:
> On Thu, Mar 21, 2024 at 1:16 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> On 2024-03-21 06:11, Eric Sunshine wrote:
>> > It should be possible to rephrase it to be more definite, while
>> > dropping the final sentence altogether. Perhaps:
>> >
>> >     Whitespace surrounding `name`, `=` and `value` is ignored. If
>> >     `value` is surrounding by double quotation marks (`"`), all
>> >     characters within the quoted string are retained verbatim,
>> >     including whitespace. Comments starting with either `#` or `;` and
>> >     extending to the end of line are discarded. A line that defines a
>> >     value can be continued to the next line by ending it with a `\`;
>> >     the backslash and the end-of-line are stripped.
>>
>> Looking good to me, thanks. I'll include it into the v5, with
>> a small grammar issue fixed.
>
> For completeness, I should mention that I intentionally reordered the
> topics so that the most common/important ones are mentioned earlier
> rather than later; i.e. (1) surrounding whitespace ignored, (2)
> double-quoted value, (3) comments, (4) `\` line-splicing with.

Hmm, I just noticed that your proposed description actually contains
some issues, e.g. it implies that the value-internal whitespace is
retained verbatim only if the entire value is enclosed in double
quotation marks. I'll try to reword it, so this is fixed.


Eric Sunshine <sunshine@sunshineco.com> writes:

>     Whitespace surrounding `name`, `=` and `value` is ignored. If
>     `value` is surrounding by double quotation marks (`"`), all
>     characters within the quoted string are retained verbatim,
>     including whitespace. Comments starting with either `#` or `;` and
>     extending to the end of line are discarded. A line that defines a
>     value can be continued to the next line by ending it with a `\`;
>     the backslash and the end-of-line are stripped.

Nice, but I am not sure how this captures how whitespaces between
value and comment are handled, e.g., in this line

    | name = value # comment$

humans know the space before '#' is removed because it is "whitespace
surrounding value". But there is a bit of chicken and egg problem;
before you realize '# comment' is a comment and strip it from the
line, you do not know where value ends, so your reading of the above
need to backtrack.

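
One way to avoid the backtracking described above is to scan the value once and hold unquoted whitespace in a buffer until the next character shows whether it was internal or trailing. The following is a minimal Python sketch under that assumption. `scan_value` is a hypothetical helper, only the `\"` and `\\` escapes are handled, and nothing here is a claim about how git itself implements the rule.

```python
WHITESPACE = " \t"   # SP and HT
COMMENT_CHARS = "#;"


def scan_value(raw: str) -> str:
    """Single left-to-right pass over the text after 'name ='."""
    out = []        # characters known to belong to the value
    pending = []    # unquoted whitespace whose fate is not yet known
    started = False  # becomes True at the first significant character
    in_quotes = False
    i = 0
    while i < len(raw):
        ch = raw[i]
        if not in_quotes:
            if ch in COMMENT_CHARS:
                break  # pending whitespace turned out to be trailing: drop it
            if ch in WHITESPACE:
                if started:  # leading whitespace is never buffered
                    pending.append(ch)
                i += 1
                continue
        # A significant character: buffered whitespace was internal, keep it.
        started = True
        out.extend(pending)
        pending.clear()
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "\\" and i + 1 < len(raw) and raw[i + 1] in '"\\':
            out.append(raw[i + 1])
            i += 1  # consume the escaped character as well
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(repr(scan_value(" value # comment")))   # 'value'
print(repr(scan_value(" a  b ; note")))       # 'a  b'
```

In the `name = value # comment` line, the space before `#` sits in `pending` when the comment character is reached and is simply never flushed, so the scanner does not need to know in advance where the value ends.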

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 782c2bab906c..9d4e99393530 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -22,9 +22,10 @@ multivalued.
 Syntax
 ~~~~~~
 
-The syntax is fairly flexible and permissive; whitespaces are mostly
-ignored. The '#' and ';' characters begin comments to the end of line,
-blank lines are ignored.
+The syntax is fairly flexible and permissive. Whitespace characters,
+which in this context are the space character (SP) and the horizontal
+tabulation (HT), are mostly ignored. The '#' and ';' characters begin
+comments to the end of line. Blank lines are ignored.
 
 The file consists of sections and variables. A section begins with
 the name of the section in square brackets and continues until the next
@@ -63,13 +64,15 @@ the variable is the boolean "true").
 The variable names are case-insensitive, allow only alphanumeric characters
 and `-`, and must start with an alphabetic character.
 
+
 A line that defines a value can be continued to the next line by
-ending it with a `\`; the backslash and the end-of-line are
-stripped. Leading whitespaces after 'name =', the remainder of the
-line after the first comment character '#' or ';', and trailing
-whitespaces of the line are discarded unless they are enclosed in
-double quotes. Internal whitespaces within the value are retained
-verbatim.
+ending it with a `\`; the backslash and the end-of-line are stripped.
+Leading whitespace characters before 'name =' are discarded.
+The portion of the line after the first comment character, including
+the comment character itself, is discarded. Unless enclosed in double
+quotation marks (`"`), any leading or trailing whitespace characters
+surrounding 'value' are discarded. Internal whitespace characters
+within 'value' are retained verbatim.
 
 Inside double quotes, double quote `"` and backslash `\` characters
 must be escaped: use `\"` for `"` and `\\` for `\`.

Make it more clear what the whitespace characters are in the context of git
configuration files, and significantly improve the description of the leading
and trailing whitespace handling, especially how it works out together with
the presence of inline comments.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---

Notes:
    Changes in v4:
        - Improved the wording and accuracy of the description of
          whitespace character handling, as discussed with Junio, [1][2]
          by taking a more radical approach and rewriting an entire
          paragraph, because it has reached the point where "patching
          the patchwork" no longer worked; I'm quite happy with the way
          it turned out this time
        - Expanded the patch description a tiny bit
        - Added a Helped-by tag

    Changes in v3:
        - Patch description was expanded a bit, to make it more on point
        - No changes to the documentation were introduced

    Changes in v2:
        - No changes were introduced

    [1] https://lore.kernel.org/git/xmqqttl1js1o.fsf@gitster.g/
    [2] https://lore.kernel.org/git/ce041191a245ff888b1710cdcaad9e61@manjaro.org/

 Documentation/config.txt | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)