[0/3] built-in add -p: support diff-so-fancy better

Message ID	pull.1336.git.1661277870.gitgitgadget@gmail.com (mailing list archive)
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <pull.1336.git.1661277870.gitgitgadget@gmail.com> From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> Date: Tue, 23 Aug 2022 18:04:27 +0000 Subject: [PATCH 0/3] built-in add -p: support diff-so-fancy better Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Philippe Blain <levraiphilippeblain@gmail.com>, Johannes Schindelin <johannes.schindelin@gmx.de> Precedence: bulk
Series	built-in add -p: support diff-so-fancy better \| expand [0/3] built-in add -p: support diff-so-fancy better [1/3] t3701: redefine what is "bogus" output of a diff filter [2/3] add -p: gracefully ignore unparseable hunk headers in colored diffs [3/3] add -p: handle `diff-so-fancy`'s hunk headers better

Kazuhiro Kato via GitGitGadget Aug. 23, 2022, 6:04 p.m. UTC

Philippe Blain reported in
https://lore.kernel.org/git/ecf6f5be-22ca-299f-a8f1-bda38e5ca246@gmail.com
that there is a problem when running the built-in version of git add -p with
diff-so-fancy [https://github.com/so-fancy/diff-so-fancy] as diff colorizer.
The symptom is this:

    error: could not parse colored hunk header '?[36m?[1m?[38;5;13m@ file:1 @?[1m?[0m'


This patch series addresses that and should fix
https://github.com/so-fancy/diff-so-fancy/issues/437

Johannes Schindelin (3):
  t3701: redefine what is "bogus" output of a diff filter
  add -p: gracefully ignore unparseable hunk headers in colored diffs
  add -p: handle `diff-so-fancy`'s hunk headers better

 add-patch.c                | 21 ++++++++++++---------
 t/t3701-add-interactive.sh | 12 +++++++++++-
 2 files changed, 23 insertions(+), 10 deletions(-)


base-commit: 795ea8776befc95ea2becd8020c7a284677b4161
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1336%2Fdscho%2Fdiff-so-fancy-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1336/dscho/diff-so-fancy-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1336

Philippe Blain Aug. 24, 2022, 3:49 a.m. UTC | #1

Hi Dscho,

Le 2022-08-23 à 14:04, Johannes Schindelin via GitGitGadget a écrit :
> Philippe Blain reported in
> https://lore.kernel.org/git/ecf6f5be-22ca-299f-a8f1-bda38e5ca246@gmail.com
> that there is a problem when running the built-in version of git add -p with
> diff-so-fancy [https://github.com/so-fancy/diff-so-fancy] as diff colorizer.
> The symptom is this:
> 
>     error: could not parse colored hunk header '?[36m?[1m?[38;5;13m@ file:1 @?[1m?[0m'
> 
> 
> This patch series addresses that and should fix
> https://github.com/so-fancy/diff-so-fancy/issues/437

Thanks for the quick reaction!
I've verified that the patches fix the example in my reproducer, and they also fix it
if using "delta --color-only" as interactive.diffFilter. Delta is another diff colorizer
(and more) that's facing the same issue as diff-so-fancy [1].

However, I've tried it on a more "real-life" example, and then I get:

    error: mismatched output from interactive.diffFilter
    hint: Your filter must maintain a one-to-one correspondence
    hint: between its input and output lines.

This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should 
keep the number of lines the same. This ('--patch') was added in [2], about a month after Peff wrote the message
you mention in https://lore.kernel.org/git/s40ss309-3311-o08s-38r2-9144r33pq549@tzk.qr/.

Again, when using the Perl version with this new example, it works correctly. I'll try to come up with a new
reproducer for this... But this new example does work with delta with the builtin version,
so it might be diff-so-fancy that's the culprit...

Cheers,

Philippe.

[1] https://github.com/dandavison/delta/issues/1114
[2] https://github.com/so-fancy/diff-so-fancy/commit/13d3f8949e15dd62f6b49bc652fe94af6a9bfc79

Johannes Schindelin Aug. 24, 2022, 6:27 a.m. UTC | #2

Hi Philippe,

On Tue, 23 Aug 2022, Philippe Blain wrote:

> Le 2022-08-23 à 14:04, Johannes Schindelin via GitGitGadget a écrit :
> > Philippe Blain reported in
> > https://lore.kernel.org/git/ecf6f5be-22ca-299f-a8f1-bda38e5ca246@gmail.com
> > that there is a problem when running the built-in version of git add -p with
> > diff-so-fancy [https://github.com/so-fancy/diff-so-fancy] as diff colorizer.
> > The symptom is this:
> >
> >     error: could not parse colored hunk header '?[36m?[1m?[38;5;13m@ file:1 @?[1m?[0m'
> >
> >
> > This patch series addresses that and should fix
> > https://github.com/so-fancy/diff-so-fancy/issues/437
>
> Thanks for the quick reaction!
> I've verified that the patches fix the example in my reproducer, and they also fix it
> if using "delta --color-only" as interactive.diffFilter. Delta is another diff colorizer
> (and more) that's facing the same issue as diff-so-fancy [1].

Great!

> However, I've tried it on a more "real-life" example, and then I get:
>
>     error: mismatched output from interactive.diffFilter
>     hint: Your filter must maintain a one-to-one correspondence
>     hint: between its input and output lines.
>
> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should
> keep the number of lines the same.

Would you mind sharing the example with me?

Thanks,
Dscho

> This ('--patch') was added in [2], about a month after Peff wrote the message
> you mention in https://lore.kernel.org/git/s40ss309-3311-o08s-38r2-9144r33pq549@tzk.qr/.
>
> Again, when using the Perl version with this new example, it works correctly. I'll try to come up with a new
> reproducer for this... But this new example does work with delta with the builtin version,
> so it might be diff-so-fancy that's the culprit...
>
> Cheers,
>
> Philippe.
>
> [1] https://github.com/dandavison/delta/issues/1114
> [2] https://github.com/so-fancy/diff-so-fancy/commit/13d3f8949e15dd62f6b49bc652fe94af6a9bfc79
>

Philippe Blain Aug. 24, 2022, 1:21 p.m. UTC | #3

Hi Dscho,

Le 2022-08-24 à 02:27, Johannes Schindelin a écrit :
> Hi Philippe,
> 
> On Tue, 23 Aug 2022, Philippe Blain wrote:
> 
>> However, I've tried it on a more "real-life" example, and then I get:
>>
>>     error: mismatched output from interactive.diffFilter
>>     hint: Your filter must maintain a one-to-one correspondence
>>     hint: between its input and output lines.
>>
>> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should
>> keep the number of lines the same.
> 
> Would you mind sharing the example with me?
> 
> Thanks,
> Dscho

In trying to write a reproducer, I realize that this is in fact a separate "bug".
I happened to have a submodule in the repository I was testing, and it had modified 
content. This is what was causing the "mismatched output" error, although I'm not
sure why. If I use a pathspec that does not include the submodule when invoking 
'git add -p', it works correctly. 

I'm a bit out of time now but I'll try to write a separate reproducer for this later today.

As for my real-life example, I noticed a little change in the output. 
Here is what it looks like with the Perl version:

$ git -c interactive.diffFilter="diff-so-fancy --patch" -c add.interactive.useBuiltin=false -add p ice_dyn_vp.F90

───────────────────────────────────────────────────────────────────────
modified: ice_dyn_vp.F90
───────────────────────────────────────────────────────────────────────
@ ice_dyn_vp.F90:274 @ subroutine implicit_solver (dt)
         do i = 1, nx_block
            rdg_conv (i,j,iblk) = c0
            rdg_shear(i,j,iblk) = c0
            divu (i,j,iblk) = c0
            divv (i,j,iblk) = c0
            shear(i,j,iblk) = c0
         enddo
         enddo
(1/1) Stage this hunk [y,n,q,a,d,e,?]? 

The first part of the hunk header "ice_dyn_vp.F90:274" is in pink and the subroutine
name is in light mauve (default diff-so-fancy colors). Here is what it looks like
with your patches with the builtin version:

$ git -c interactive.diffFilter="diff-so-fancy --patch" add -p ice_dyn_vp.F90

───────────────────────────────────────────────────────────────────────
modified: ice_dyn_vp.F90
───────────────────────────────────────────────────────────────────────
@@ -271,7 +271,7 @@ @ ice_dyn_vp.F90:274 @ subroutine implicit_solver (dt)
         do i = 1, nx_block
            rdg_conv (i,j,iblk) = c0
            rdg_shear(i,j,iblk) = c0
            divu (i,j,iblk) = c0
            divv (i,j,iblk) = c0
            shear(i,j,iblk) = c0
         enddo
         enddo
(1/1) Stage this hunk [y,n,q,a,d,e,?]? 

Notice we have the line numbers (-271,7 +271,7) in cyan, then the first part
of the hunk header modified by diff-so-fancy (ice_dyn_vp.F90:274) in pink, and
then the subroutine name (in light mauve). 

We can see the same thing with my earlier reproducer, only there is no function
matched. I really don't mind having this first part with the line numbers, I
just wanted to share my observation.

Thanks,

Philippe.

Philippe Blain Aug. 24, 2022, 5:49 p.m. UTC | #4

Hi Dscho,

Le 2022-08-24 à 09:21, Philippe Blain a écrit :
> Hi Dscho,
> 
> Le 2022-08-24 à 02:27, Johannes Schindelin a écrit :
>> Hi Philippe,
>>
>> On Tue, 23 Aug 2022, Philippe Blain wrote:
>>
>>> However, I've tried it on a more "real-life" example, and then I get:
>>>
>>>     error: mismatched output from interactive.diffFilter
>>>     hint: Your filter must maintain a one-to-one correspondence
>>>     hint: between its input and output lines.
>>>
>>> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should
>>> keep the number of lines the same.
>>
>> Would you mind sharing the example with me?
>>
>> Thanks,
>> Dscho
> 
> In trying to write a reproducer, I realize that this is in fact a separate "bug".
> I happened to have a submodule in the repository I was testing, and it had modified 
> content. This is what was causing the "mismatched output" error, although I'm not
> sure why. If I use a pathspec that does not include the submodule when invoking 
> 'git add -p', it works correctly. 
> 
> I'm a bit out of time now but I'll try to write a separate reproducer for this later today.

So in trying to write a new reproducer, I found the cause of the bug. A submodule in "modified 
content" or "untracked content" state (i.e. no new commit) has no "index" line in its diff
header. diff-so-fancy, with --patch, always *adds* a blank line before the diff-header 
because its modified diff header is 3 lines, whereas Git's is 4. So the count is off 
specifically for submodules in  "modified/untracked content" state. 

It seems the C version is less lenient than the Perl version, because the Perl version does
not complain. But I'd say that it's more of a bug on the diff-so-fancy side, even if it 
is a regression of the builtin version...
Delta does not alter the header (in its default configuration) and so it is not affected here.

For what it's worth, here is the updated reproducer script. It also downloads the 'delta' executable (for Linux)
and runs it. With delta, both Perl and C versions work. With diff-so-fancy, the C version 
complains bout mismatched output.

```bash
#!/bin/bash

set -euEo pipefail

repo=repro
rm -rf $repo

TEST_AUTHOR_LOCALNAME=author
TEST_AUTHOR_DOMAIN=example.com
GIT_AUTHOR_EMAIL=${TEST_AUTHOR_LOCALNAME}@${TEST_AUTHOR_DOMAIN}
GIT_AUTHOR_NAME='A U Thor'
GIT_AUTHOR_DATE='1112354055 +0200'
TEST_COMMITTER_LOCALNAME=committer
TEST_COMMITTER_DOMAIN=example.com
GIT_COMMITTER_EMAIL=${TEST_COMMITTER_LOCALNAME}@${TEST_COMMITTER_DOMAIN}
GIT_COMMITTER_NAME='C O Mitter'
GIT_COMMITTER_DATE='1112354055 +0200'
export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
export HOME=

git -c init.defaultBranch=unimportant init $repo
cd $repo
date>file
git add file
git commit -m file
git clone https://github.com/so-fancy/diff-so-fancy.git
date>>file
export PATH="diff-so-fancy:$PATH"
git submodule add ./diff-so-fancy
git commit -m "add sub"
date>diff-so-fancy/README.md
curl -sSLO https://github.com/dandavison/delta/releases/download/0.12.1/delta-0.12.1-x86_64-unknown-linux-gnu.tar.gz
tar xzf delta-0.12.1-x86_64-unknown-linux-gnu.tar.gz
export PATH="delta-0.12.1-x86_64-unknown-linux-gnu:$PATH"
echo -----
git -c interactive.diffFilter="delta --color-only" -c add.interactive.useBuiltin=false add -p < <(echo n)
echo -----
git -c interactive.diffFilter="delta --color-only" add -p < <(echo n)
echo -----
git -c interactive.diffFilter="diff-so-fancy --patch" -c add.interactive.useBuiltin=false add -p < <(echo n)
echo -----
git -c interactive.diffFilter="diff-so-fancy --patch" add -p < <(echo n) 
```

Cheers,

Philippe.

Junio C Hamano Aug. 24, 2022, 6:24 p.m. UTC | #5

Philippe Blain <levraiphilippeblain@gmail.com> writes:

>>>> However, I've tried it on a more "real-life" example, and then I get:
>>>>
>>>>     error: mismatched output from interactive.diffFilter
>>>>     hint: Your filter must maintain a one-to-one correspondence
>>>>     hint: between its input and output lines.
>>>>
>>>> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should
>>>> keep the number of lines the same.
>>>
>>> Would you mind sharing the example with me?
>>>
>>> Thanks,
>>> Dscho
>> 
>> In trying to write a reproducer, I realize that this is in fact a separate "bug".
>> I happened to have a submodule in the repository I was testing, and it had modified 
>> content. This is what was causing the "mismatched output" error, although I'm not
>> sure why. If I use a pathspec that does not include the submodule when invoking 
>> 'git add -p', it works correctly. 
>> 
>> I'm a bit out of time now but I'll try to write a separate
>> reproducer for this later today.
>
> So in trying to write a new reproducer, I found the cause of the
> bug. ...

I somehow had an impression that that the C reimplementation was
designed to be a bug-for-bug compatible faithful rewrite of the
original scripted version, but looking at the way how this bug is
handled, I am not sure (it looks as if the initial fix was to patch
the code to match the symptom, not to make the code to behave more
faithfully to the scripted version).  As with any big rewrite, a
complete re-implementation always has risks to introduce unintended
regressions and that is why starting with faithful rewrite with the
staged transition plan is valuable.

In this case, the staged transition plan, the step to flip the
default without before remove the old one, is working beautifully.
And it was only made possible with the actual users reporting issues
before they say "the new one is broken, so let's use the knob to
retain the old implementation".  Please thank those in diff-so-fancy
land for reporting the issue.

Thanks, you two, for working on this.

Johannes Schindelin Aug. 24, 2022, 9:05 p.m. UTC | #6

Hi Junio & Philippe,

On Wed, 24 Aug 2022, Junio C Hamano wrote:

> Philippe Blain <levraiphilippeblain@gmail.com> writes:
>
> >>>> However, I've tried it on a more "real-life" example, and then I get:
> >>>>
> >>>>     error: mismatched output from interactive.diffFilter
> >>>>     hint: Your filter must maintain a one-to-one correspondence
> >>>>     hint: between its input and output lines.
> >>>>
> >>>> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should
> >>>> keep the number of lines the same.
> >>>
> >>> Would you mind sharing the example with me?
> >>
> >> In trying to write a reproducer, I realize that this is in fact a separate "bug".
> >> I happened to have a submodule in the repository I was testing, and it had modified
> >> content. This is what was causing the "mismatched output" error, although I'm not
> >> sure why. If I use a pathspec that does not include the submodule when invoking
> >> 'git add -p', it works correctly.
> >>
> >> I'm a bit out of time now but I'll try to write a separate
> >> reproducer for this later today.
> >
> > So in trying to write a new reproducer, I found the cause of the
> > bug. ...

Ah. Submodules. The gift that keeps on giving.

I'm not _really_ complaining, though. It seems that with beautiful
regularity, submodules come up as crucial parts of attack vectors in
exploits that are reported to the git-security mailing list, requiring
somebody with my abilities to implement fixes. Therefore, submodules award
me with a sort of job security. :-)

> I somehow had an impression that that the C reimplementation was
> designed to be a bug-for-bug compatible faithful rewrite of the
> original scripted version,

Yes, it was designed as that, with a few things done differently, though,
such as avoiding unnecessary `git diff` invocations, and using C-native
data structures and memory management.

One unfortunate consequence of avoiding the `git diff` invocations of
`git-add--interactive.perl` that are not actually necessary is that I
missed that one of those invocations specifically ignores dirty
submodules, and that the Perl script then only processes files whose
numstat shows up in _that_ diff output.

Fortunately, the fix is easy: simply ignore dirty submodules in _all_ `git
diff` invocations of `add -p`.

I will submit the next iteration as soon as the PR builds pass.

> but looking at the way how this bug is handled, I am not sure (it looks
> as if the initial fix was to patch the code to match the symptom, not to
> make the code to behave more faithfully to the scripted version).

The built-in `add -p` does something slightly different from the Perl
version when it comes to the line ranges in hunk headers: instead of
storing stringified versions of them and parsing & re-rendering them when
the hunks are modified, the C version stores the actual numbers, adjusts
them as needed (e.g. when hunks are split), and, crucially, renders them
on the fly when displaying the hunks.

That means that also the colorized hunk headers are generated on the fly
whenever the hunks are displayed, and they are never stored in rendered
form, neither while parsing diffs nor when hunks are split or otherwise
edited. That's why the built-in `add -p` looked for a `@@ ... @@` part in
the colorized diff, so that it can display the remainder of that line
after showing the rendered line range.

And yes, this is a deviation from the bug-for-bug principle I follow for
conversions from scripted commands to built-in C code, but it was for a
good reason: memory management in C is less trivial than in Perl (where
one essentially YOLOs memory management), and I was not prepared to invent
a rope data structure just to be able to replace parts of a long string
buffer with rendered text of a different length, nor was I prepared to
call `memmove()` tons of times.

Note that the Perl version does something slightly iffy, too: if a diff
colorizer is configured, it initially shows the original hunk headers
(e.g. "@ file:16 @" in the `diff-so-fancy` case) but when the hunk is
modified in any way, it generates "@@ -1,6 +1,7 @@"-style hunk headers and
does not show the ones generated by the diff colorizer _at all_ anymore.

In this patch series, I teach the built-in `add -p` to be more lenient and
simply show the entire original hunk header after the rendered line range
if no `@@ ... @@` part was found. If that part was found, we still replace
it with the rendered line range.

Yes, it is a deviation from what the Perl version does. I consider it an
improvement, though.

> As with any big rewrite, a complete re-implementation always has risks
> to introduce unintended regressions and that is why starting with
> faithful rewrite with the staged transition plan is valuable.
>
> In this case, the staged transition plan, the step to flip the
> default without before remove the old one, is working beautifully.
> And it was only made possible with the actual users reporting issues
> before they say "the new one is broken, so let's use the knob to
> retain the old implementation".

Indeed, and thank you, Philippe, for bringing this to the Git mailing list
so that I could do something about the bug.

Ciao,
Dscho

Junio C Hamano Aug. 24, 2022, 9:37 p.m. UTC | #7

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> One unfortunate consequence of avoiding the `git diff` invocations of
> `git-add--interactive.perl` that are not actually necessary is that I
> missed that one of those invocations specifically ignores dirty
> submodules, and that the Perl script then only processes files whose
> numstat shows up in _that_ diff output.
>
> Fortunately, the fix is easy: simply ignore dirty submodules in _all_ `git
> diff` invocations of `add -p`.

OK, so it wasn't removing what is useless, but removing something
whose result was used.  And our next move is not to bring us closer
to how we used to correctly handle third-party diff filters by
mimicking the original better but doing the handling of this
particular piece differently.

> I will submit the next iteration as soon as the PR builds pass.

OK.

>> As with any big rewrite, a complete re-implementation always has risks
>> to introduce unintended regressions and that is why starting with
>> faithful rewrite with the staged transition plan is valuable.
>>
>> In this case, the staged transition plan, the step to flip the
>> default without before remove the old one, is working beautifully.
>> And it was only made possible with the actual users reporting issues
>> before they say "the new one is broken, so let's use the knob to
>> retain the old implementation".
>
> Indeed, and thank you, Philippe, for bringing this to the Git mailing list
> so that I could do something about the bug.

Yup.

To some people, "refatoring to match my subjective code aesthetics"
may be personally the most important, and to some other people,
"getting rid of scripted Porcelains" may be, but they are both wrong
and these are their second most important goals ;-)  Not regressing
end-user experience is, and we learned a valuable lesson here.  Be
conservative and make sure we can revert to buy time when making a
big rewrite like this one.

Thanks, all.

[0/3] built-in add -p: support diff-so-fancy better

Message

Comments