mbox series

[v2,0/2] Improvements to tests and docs for .gitattributes eol

Message ID 20220214020827.1508706-1-sandals@crustytoothpaste.net (mailing list archive)
Headers show
Series Improvements to tests and docs for .gitattributes eol | expand

Message

brian m. carlson Feb. 14, 2022, 2:08 a.m. UTC
I was answering a question on StackOverflow recently about the
interaction between text=auto and eol, and someone pointed out to me
that what I had written, which was based on the documentation, was not
correct as of Git 2.10 (and more specifically 6523728499 ("convert:
unify the "auto" handling of CRLF", 2016-06-28)).

When I set out to document the behavior correctly, I ran into the fact
that the tests, where I looked for examples of how this behaves, didn't
have any tests for some of these cases, and so I had some trouble
documenting this clearly and accurately.  So this series basically just
adds some tests for existing behavior so we don't change it (and so I
could figure out how it works) and then updates the documentation
accordingly.

Changes from v1:
* Correct documentation inaccuracy with respect to text=auto.

brian m. carlson (2):
  t0027: add tests for eol without text in .gitattributes
  docs: correct documentation about eol attribute

 Documentation/gitattributes.txt | 12 +++++++-----
 t/t0027-auto-crlf.sh            |  6 ++++++
 2 files changed, 13 insertions(+), 5 deletions(-)

Comments

Derrick Stolee Feb. 14, 2022, 2:52 p.m. UTC | #1
On 2/13/2022 9:08 PM, brian m. carlson wrote:
> I was answering a question on StackOverflow recently about the
> interaction between text=auto and eol, and someone pointed out to me
> that what I had written, which was based on the documentation, was not
> correct as of Git 2.10 (and more specifically 6523728499 ("convert:
> unify the "auto" handling of CRLF", 2016-06-28)).
> 
> When I set out to document the behavior correctly, I ran into the fact
> that the tests, where I looked for examples of how this behaves, didn't
> have any tests for some of these cases, and so I had some trouble
> documenting this clearly and accurately.  So this series basically just
> adds some tests for existing behavior so we don't change it (and so I
> could figure out how it works) and then updates the documentation
> accordingly.

Thanks for these updates. They look good as-is.

-Stolee
Junio C Hamano Feb. 14, 2022, 6:15 p.m. UTC | #2
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I was answering a question on StackOverflow recently about the
> interaction between text=auto and eol, and someone pointed out to me
> that what I had written, which was based on the documentation, was not
> correct as of Git 2.10 (and more specifically 6523728499 ("convert:
> unify the "auto" handling of CRLF", 2016-06-28)).
>
> When I set out to document the behavior correctly, I ran into the fact
> that the tests, where I looked for examples of how this behaves, didn't
> have any tests for some of these cases, and so I had some trouble
> documenting this clearly and accurately.  So this series basically just
> adds some tests for existing behavior so we don't change it (and so I
> could figure out how it works) and then updates the documentation
> accordingly.

This seems to be a replacement of the two-patch series that was
merged to 'master' at 8db2f665 (Merge branch 'bc/clarify-eol-attr',
2022-02-11) and was merged to 'next' at dc1db4bd (Merge branch
'bc/clarify-eol-attr' into next, 2022-02-04).  The changes seem to
be in the second step to update Documentation/gitattributes.txt and
it needs to be made an incremental update.

Thanks.

---- >8 -----
From: brian m. carlson <sandals@crustytoothpaste.net>
Subject: doc: clarify interaction between 'eol' and text=auto

The `eol` takes effect on text files only when the index has the
contents in LF line endings.  Paths with contents in CRLF line
endings in the index may become dirty unless text=auto.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/gitattributes.txt | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git c/Documentation/gitattributes.txt w/Documentation/gitattributes.txt
index 60984a4682..a71dad2674 100644
--- c/Documentation/gitattributes.txt
+++ w/Documentation/gitattributes.txt
@@ -161,11 +161,12 @@ unspecified.
 
 This attribute sets a specific line-ending style to be used in the
 working directory.  This attribute has effect only if the `text`
-attribute is set or unspecified, or if it is set to `auto` and the file
-is detected as text.  Note that setting this attribute on paths which
-are in the index with CRLF line endings may make the paths to be
-considered dirty. Adding the path to the index again will normalize the
-line endings in the index.
+attribute is set or unspecified, or if it is set to `auto`, the file is
+detected as text, and it is stored with LF endings in the index.  Note
+that setting this attribute on paths which are in the index with CRLF
+line endings may make the paths to be considered dirty unless
+`text=auto` is set. Adding the path to the index again will normalize
+the line endings in the index.
 
 Set to string value "crlf"::
Torsten Bögershausen Feb. 14, 2022, 8:46 p.m. UTC | #3
[]
> This seems to be a replacement of the two-patch series that was
> merged to 'master' at 8db2f665 (Merge branch 'bc/clarify-eol-attr',
> 2022-02-11) and was merged to 'next' at dc1db4bd (Merge branch
> 'bc/clarify-eol-attr' into next, 2022-02-04).  The changes seem to
> be in the second step to update Documentation/gitattributes.txt and
> it needs to be made an incremental update.
>
> Thanks.
>
> ---- >8 -----
> From: brian m. carlson <sandals@crustytoothpaste.net>
> Subject: doc: clarify interaction between 'eol' and text=auto
>
> The `eol` takes effect on text files only when the index has the
> contents in LF line endings.  Paths with contents in CRLF line
> endings in the index may become dirty unless text=auto.

That is a nice, precise and short summary here in the commit message
as well as the patch further down.

>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/gitattributes.txt | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git c/Documentation/gitattributes.txt w/Documentation/gitattributes.txt
> index 60984a4682..a71dad2674 100644
> --- c/Documentation/gitattributes.txt
> +++ w/Documentation/gitattributes.txt
> @@ -161,11 +161,12 @@ unspecified.
>
>  This attribute sets a specific line-ending style to be used in the
>  working directory.  This attribute has effect only if the `text`
> -attribute is set or unspecified, or if it is set to `auto` and the file
> -is detected as text.  Note that setting this attribute on paths which
> -are in the index with CRLF line endings may make the paths to be
> -considered dirty. Adding the path to the index again will normalize the
> -line endings in the index.
> +attribute is set or unspecified, or if it is set to `auto`, the file is
> +detected as text, and it is stored with LF endings in the index.  Note
> +that setting this attribute on paths which are in the index with CRLF
> +line endings may make the paths to be considered dirty unless
> +`text=auto` is set. Adding the path to the index again will normalize
> +the line endings in the index.
>
>  Set to string value "crlf"::
>
Junio C Hamano Feb. 15, 2022, 12:15 a.m. UTC | #4
Torsten Bögershausen <tboegi@web.de> writes:

>> ---- >8 -----
>> From: brian m. carlson <sandals@crustytoothpaste.net>
>> Subject: doc: clarify interaction between 'eol' and text=auto
>>
>> The `eol` takes effect on text files only when the index has the
>> contents in LF line endings.  Paths with contents in CRLF line
>> endings in the index may become dirty unless text=auto.
>
> That is a nice, precise and short summary here in the commit message
> as well as the patch further down.

Thanks.  Then let's queue it for 'next' and merge it down.

>>
>> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>>  Documentation/gitattributes.txt | 11 ++++++-----
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git c/Documentation/gitattributes.txt w/Documentation/gitattributes.txt
>> index 60984a4682..a71dad2674 100644
>> --- c/Documentation/gitattributes.txt
>> +++ w/Documentation/gitattributes.txt
>> @@ -161,11 +161,12 @@ unspecified.
>>
>>  This attribute sets a specific line-ending style to be used in the
>>  working directory.  This attribute has effect only if the `text`
>> -attribute is set or unspecified, or if it is set to `auto` and the file
>> -is detected as text.  Note that setting this attribute on paths which
>> -are in the index with CRLF line endings may make the paths to be
>> -considered dirty. Adding the path to the index again will normalize the
>> -line endings in the index.
>> +attribute is set or unspecified, or if it is set to `auto`, the file is
>> +detected as text, and it is stored with LF endings in the index.  Note
>> +that setting this attribute on paths which are in the index with CRLF
>> +line endings may make the paths to be considered dirty unless
>> +`text=auto` is set. Adding the path to the index again will normalize
>> +the line endings in the index.
>>
>>  Set to string value "crlf"::
>>
Johannes Sixt Feb. 15, 2022, 7:05 a.m. UTC | #5
Am 15.02.22 um 01:15 schrieb Junio C Hamano:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
>>> ---- >8 -----
>>> From: brian m. carlson <sandals@crustytoothpaste.net>
>>> Subject: doc: clarify interaction between 'eol' and text=auto
>>>
>>> The `eol` takes effect on text files only when the index has the
>>> contents in LF line endings.  Paths with contents in CRLF line
>>> endings in the index may become dirty unless text=auto.
>>
>> That is a nice, precise and short summary here in the commit message
>> as well as the patch further down.
> 
> Thanks.  Then let's queue it for 'next' and merge it down.
> 
>>>
>>> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
>>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>>> ---
>>>  Documentation/gitattributes.txt | 11 ++++++-----
>>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git c/Documentation/gitattributes.txt w/Documentation/gitattributes.txt
>>> index 60984a4682..a71dad2674 100644
>>> --- c/Documentation/gitattributes.txt
>>> +++ w/Documentation/gitattributes.txt
>>> @@ -161,11 +161,12 @@ unspecified.
>>>
>>>  This attribute sets a specific line-ending style to be used in the
>>>  working directory.  This attribute has effect only if the `text`
>>> -attribute is set or unspecified, or if it is set to `auto` and the file
>>> -is detected as text.  Note that setting this attribute on paths which
>>> -are in the index with CRLF line endings may make the paths to be
>>> -considered dirty. Adding the path to the index again will normalize the
>>> -line endings in the index.
>>> +attribute is set or unspecified, or if it is set to `auto`, the file is
>>> +detected as text, and it is stored with LF endings in the index.  Note
>>> +that setting this attribute on paths which are in the index with CRLF
>>> +line endings may make the paths to be considered dirty unless
>>> +`text=auto` is set. Adding the path to the index again will normalize
>>> +the line endings in the index.

Sorry, I don't find this description clear at all due to the many 'or's
and 'and's and no indication which parts belong together. The original
text was clear (but, of course, not helpful if it was wrong).

I suggest to rewrite the paragraph into format with bullet points:

   ... only if one of the following is true:

  - is set and foo or bar
  - is unspecified and either
      - this
      - or that
  - is set to auto but not...

or something along the lines. I can't propose actual text because I have
no clue what the truth is.

-- Hannes
brian m. carlson Feb. 15, 2022, 10:46 p.m. UTC | #6
On 2022-02-15 at 07:05:44, Johannes Sixt wrote:
> Sorry, I don't find this description clear at all due to the many 'or's
> and 'and's and no indication which parts belong together. The original
> text was clear (but, of course, not helpful if it was wrong).
> 
> I suggest to rewrite the paragraph into format with bullet points:
> 
>    ... only if one of the following is true:
> 
>   - is set and foo or bar
>   - is unspecified and either
>       - this
>       - or that
>   - is set to auto but not...
> 
> or something along the lines. I can't propose actual text because I have
> no clue what the truth is.

Unfortunately, the fact is that this behaviour is complicated.  I can
try a reroll with a bulleted list, though.
Johannes Sixt Feb. 16, 2022, 7 a.m. UTC | #7
Am 15.02.22 um 23:46 schrieb brian m. carlson:
> On 2022-02-15 at 07:05:44, Johannes Sixt wrote:
>> Sorry, I don't find this description clear at all due to the many 'or's
>> and 'and's and no indication which parts belong together. The original
>> text was clear (but, of course, not helpful if it was wrong).
>>
>> I suggest to rewrite the paragraph into format with bullet points:
>>
>>    ... only if one of the following is true:
>>
>>   - is set and foo or bar
>>   - is unspecified and either
>>       - this
>>       - or that
>>   - is set to auto but not...
>>
>> or something along the lines. I can't propose actual text because I have
>> no clue what the truth is.
> 
> Unfortunately, the fact is that this behaviour is complicated.  I can
> try a reroll with a bulleted list, though.

Just so you know where my confusion arises from: Your updated text has
the structure (as I read it)

   if ... set or unspecified or if auto then ... detected ... and LF

It is unclear whether the 'then' conditions apply only to 'if auto'.
Even if the additional 'if' in the middle makes me think that the
'then's apply only to the 'auto' case, it is sufficently vage because in
my mental model there is not much difference between an 'unset' and a
set-to-'auto' attribute, and I wonder why the 'then's should not apply
to the 'unset' case as well.

Moreover, after re-reading the text, I notice that text may be read as
"this attribute has an effect only if <conditions>" where <conditions>
basically means "always except for when the 'if auto' case is not met",
right? Would it perhaps be better to write "has no effect if <very
specific condition>"?

-- Hannes
brian m. carlson Feb. 16, 2022, 10:28 a.m. UTC | #8
On 2022-02-16 at 07:00:24, Johannes Sixt wrote:
> Just so you know where my confusion arises from: Your updated text has
> the structure (as I read it)
> 
>    if ... set or unspecified or if auto then ... detected ... and LF
> 
> It is unclear whether the 'then' conditions apply only to 'if auto'.
> Even if the additional 'if' in the middle makes me think that the
> 'then's apply only to the 'auto' case, it is sufficently vage because in
> my mental model there is not much difference between an 'unset' and a
> set-to-'auto' attribute, and I wonder why the 'then's should not apply
> to the 'unset' case as well.
> 
> Moreover, after re-reading the text, I notice that text may be read as
> "this attribute has an effect only if <conditions>" where <conditions>
> basically means "always except for when the 'if auto' case is not met",
> right? Would it perhaps be better to write "has no effect if <very
> specific condition>"?

The situation is that eol is in effect if and only if:

* text is set;
* text is unspecified; or
* text is auto, the file is detected as text, and the file has LF line
  endings in the index.

Alternately, it has no effect if and only if:

* text is unset;
* text is auto and the file is detected as binary; or
* text is auto and the file is detected as text and has CRLF line
  endings.

I'm not sure one reads significantly easier than the other.  I slightly
prefer the former because it has fewer conditions with multiple nested
entries, though.
Torsten Bögershausen Feb. 16, 2022, 11:52 a.m. UTC | #9
On Wed, Feb 16, 2022 at 10:28:24AM +0000, brian m. carlson wrote:
> On 2022-02-16 at 07:00:24, Johannes Sixt wrote:
> > Just so you know where my confusion arises from: Your updated text has
> > the structure (as I read it)
> >
> >    if ... set or unspecified or if auto then ... detected ... and LF
> >
> > It is unclear whether the 'then' conditions apply only to 'if auto'.
> > Even if the additional 'if' in the middle makes me think that the
> > 'then's apply only to the 'auto' case, it is sufficently vage because in
> > my mental model there is not much difference between an 'unset' and a
> > set-to-'auto' attribute, and I wonder why the 'then's should not apply
> > to the 'unset' case as well.
> >
> > Moreover, after re-reading the text, I notice that text may be read as
> > "this attribute has an effect only if <conditions>" where <conditions>
> > basically means "always except for when the 'if auto' case is not met",
> > right? Would it perhaps be better to write "has no effect if <very
> > specific condition>"?
>
> The situation is that eol is in effect if and only if:

Well written

>
> * text is set;
> * text is unspecified; or
> * text is auto, the file is detected as text, and the file has LF line
>   endings in the index.
>
> Alternately, it has no effect if and only if:
>
> * text is unset;
> * text is auto and the file is detected as binary; or
> * text is auto and the file is detected as text and has CRLF line
>   endings.
... CRLF line endings in the index.
                      ^^^^^^^^^^^^

One of the reasons that the eol attribute is not 100% well-specified
is that people should use the eol attribute together with text.

Either
text=auto eol=crlf
or
text=auto eol=lf
or
text eol=crlf
or
text eol=lf

Older git versions did treat
* text=auto
* eol=crlf

The same as
* text eol=crlf
Which did corrupt binary files.

Never git versions treat
* text=auto
* eol=crlf
as
* text=auto eol=crlf
in the sense that only auto-detected text files are converted,
if they had not been commited with crlf before.

In that sense I feel that the short form
eol=crlf
should be avoided

>
> I'm not sure one reads significantly easier than the other.  I slightly
> prefer the former because it has fewer conditions with multiple nested
> entries, though.
> --
> brian m. carlson (he/him or they/them)
> Toronto, Ontario, CA
Johannes Sixt Feb. 16, 2022, 7:02 p.m. UTC | #10
Am 16.02.22 um 11:28 schrieb brian m. carlson:
> The situation is that eol is in effect if and only if:
> 
> * text is set;
> * text is unspecified; or
> * text is auto, the file is detected as text, and the file has LF line
>   endings in the index.
> 
> Alternately, it has no effect if and only if:
> 
> * text is unset;
> * text is auto and the file is detected as binary; or
> * text is auto and the file is detected as text and has CRLF line
>   endings.
> 
> I'm not sure one reads significantly easier than the other.  I slightly
> prefer the former because it has fewer conditions with multiple nested
> entries, though.

I agree that the first version is easier to understand.

-- Hannes