Message ID | pull.1336.git.1661277870.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | built-in add -p: support diff-so-fancy better | expand |
Hi Dscho, Le 2022-08-23 à 14:04, Johannes Schindelin via GitGitGadget a écrit : > Philippe Blain reported in > https://lore.kernel.org/git/ecf6f5be-22ca-299f-a8f1-bda38e5ca246@gmail.com > that there is a problem when running the built-in version of git add -p with > diff-so-fancy [https://github.com/so-fancy/diff-so-fancy] as diff colorizer. > The symptom is this: > > error: could not parse colored hunk header '?[36m?[1m?[38;5;13m@ file:1 @?[1m?[0m' > > > This patch series addresses that and should fix > https://github.com/so-fancy/diff-so-fancy/issues/437 Thanks for the quick reaction! I've verified that the patches fix the example in my reproducer, and they also fix it if using "delta --color-only" as interactive.diffFilter. Delta is another diff colorizer (and more) that's facing the same issue as diff-so-fancy [1]. However, I've tried it on a more "real-life" example, and then I get: error: mismatched output from interactive.diffFilter hint: Your filter must maintain a one-to-one correspondence hint: between its input and output lines. This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should keep the number of lines the same. This ('--patch') was added in [2], about a month after Peff wrote the message you mention in https://lore.kernel.org/git/s40ss309-3311-o08s-38r2-9144r33pq549@tzk.qr/. Again, when using the Perl version with this new example, it works correctly. I'll try to come up with a new reproducer for this... But this new example does work with delta with the builtin version, so it might be diff-so-fancy that's the culprit... Cheers, Philippe. [1] https://github.com/dandavison/delta/issues/1114 [2] https://github.com/so-fancy/diff-so-fancy/commit/13d3f8949e15dd62f6b49bc652fe94af6a9bfc79
Hi Philippe, On Tue, 23 Aug 2022, Philippe Blain wrote: > Le 2022-08-23 à 14:04, Johannes Schindelin via GitGitGadget a écrit : > > Philippe Blain reported in > > https://lore.kernel.org/git/ecf6f5be-22ca-299f-a8f1-bda38e5ca246@gmail.com > > that there is a problem when running the built-in version of git add -p with > > diff-so-fancy [https://github.com/so-fancy/diff-so-fancy] as diff colorizer. > > The symptom is this: > > > > error: could not parse colored hunk header '?[36m?[1m?[38;5;13m@ file:1 @?[1m?[0m' > > > > > > This patch series addresses that and should fix > > https://github.com/so-fancy/diff-so-fancy/issues/437 > > Thanks for the quick reaction! > I've verified that the patches fix the example in my reproducer, and they also fix it > if using "delta --color-only" as interactive.diffFilter. Delta is another diff colorizer > (and more) that's facing the same issue as diff-so-fancy [1]. Great! > However, I've tried it on a more "real-life" example, and then I get: > > error: mismatched output from interactive.diffFilter > hint: Your filter must maintain a one-to-one correspondence > hint: between its input and output lines. > > This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should > keep the number of lines the same. Would you mind sharing the example with me? Thanks, Dscho > This ('--patch') was added in [2], about a month after Peff wrote the message > you mention in https://lore.kernel.org/git/s40ss309-3311-o08s-38r2-9144r33pq549@tzk.qr/. > > Again, when using the Perl version with this new example, it works correctly. I'll try to come up with a new > reproducer for this... But this new example does work with delta with the builtin version, > so it might be diff-so-fancy that's the culprit... > > Cheers, > > Philippe. > > [1] https://github.com/dandavison/delta/issues/1114 > [2] https://github.com/so-fancy/diff-so-fancy/commit/13d3f8949e15dd62f6b49bc652fe94af6a9bfc79 >
Hi Dscho, Le 2022-08-24 à 02:27, Johannes Schindelin a écrit : > Hi Philippe, > > On Tue, 23 Aug 2022, Philippe Blain wrote: > >> However, I've tried it on a more "real-life" example, and then I get: >> >> error: mismatched output from interactive.diffFilter >> hint: Your filter must maintain a one-to-one correspondence >> hint: between its input and output lines. >> >> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should >> keep the number of lines the same. > > Would you mind sharing the example with me? > > Thanks, > Dscho In trying to write a reproducer, I realize that this is in fact a separate "bug". I happened to have a submodule in the repository I was testing, and it had modified content. This is what was causing the "mismatched output" error, although I'm not sure why. If I use a pathspec that does not include the submodule when invoking 'git add -p', it works correctly. I'm a bit out of time now but I'll try to write a separate reproducer for this later today. As for my real-life example, I noticed a little change in the output. Here is what it looks like with the Perl version: $ git -c interactive.diffFilter="diff-so-fancy --patch" -c add.interactive.useBuiltin=false -add p ice_dyn_vp.F90 ─────────────────────────────────────────────────────────────────────── modified: ice_dyn_vp.F90 ─────────────────────────────────────────────────────────────────────── @ ice_dyn_vp.F90:274 @ subroutine implicit_solver (dt) do i = 1, nx_block rdg_conv (i,j,iblk) = c0 rdg_shear(i,j,iblk) = c0 divu (i,j,iblk) = c0 divv (i,j,iblk) = c0 shear(i,j,iblk) = c0 enddo enddo (1/1) Stage this hunk [y,n,q,a,d,e,?]? The first part of the hunk header "ice_dyn_vp.F90:274" is in pink and the subroutine name is in light mauve (default diff-so-fancy colors). Here is what it looks like with your patches with the builtin version: $ git -c interactive.diffFilter="diff-so-fancy --patch" add -p ice_dyn_vp.F90 ─────────────────────────────────────────────────────────────────────── modified: ice_dyn_vp.F90 ─────────────────────────────────────────────────────────────────────── @@ -271,7 +271,7 @@ @ ice_dyn_vp.F90:274 @ subroutine implicit_solver (dt) do i = 1, nx_block rdg_conv (i,j,iblk) = c0 rdg_shear(i,j,iblk) = c0 divu (i,j,iblk) = c0 divv (i,j,iblk) = c0 shear(i,j,iblk) = c0 enddo enddo (1/1) Stage this hunk [y,n,q,a,d,e,?]? Notice we have the line numbers (-271,7 +271,7) in cyan, then the first part of the hunk header modified by diff-so-fancy (ice_dyn_vp.F90:274) in pink, and then the subroutine name (in light mauve). We can see the same thing with my earlier reproducer, only there is no function matched. I really don't mind having this first part with the line numbers, I just wanted to share my observation. Thanks, Philippe.
Hi Dscho, Le 2022-08-24 à 09:21, Philippe Blain a écrit : > Hi Dscho, > > Le 2022-08-24 à 02:27, Johannes Schindelin a écrit : >> Hi Philippe, >> >> On Tue, 23 Aug 2022, Philippe Blain wrote: >> >>> However, I've tried it on a more "real-life" example, and then I get: >>> >>> error: mismatched output from interactive.diffFilter >>> hint: Your filter must maintain a one-to-one correspondence >>> hint: between its input and output lines. >>> >>> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should >>> keep the number of lines the same. >> >> Would you mind sharing the example with me? >> >> Thanks, >> Dscho > > In trying to write a reproducer, I realize that this is in fact a separate "bug". > I happened to have a submodule in the repository I was testing, and it had modified > content. This is what was causing the "mismatched output" error, although I'm not > sure why. If I use a pathspec that does not include the submodule when invoking > 'git add -p', it works correctly. > > I'm a bit out of time now but I'll try to write a separate reproducer for this later today. So in trying to write a new reproducer, I found the cause of the bug. A submodule in "modified content" or "untracked content" state (i.e. no new commit) has no "index" line in its diff header. diff-so-fancy, with --patch, always *adds* a blank line before the diff-header because its modified diff header is 3 lines, whereas Git's is 4. So the count is off specifically for submodules in "modified/untracked content" state. It seems the C version is less lenient than the Perl version, because the Perl version does not complain. But I'd say that it's more of a bug on the diff-so-fancy side, even if it is a regression of the builtin version... Delta does not alter the header (in its default configuration) and so it is not affected here. For what it's worth, here is the updated reproducer script. It also downloads the 'delta' executable (for Linux) and runs it. With delta, both Perl and C versions work. With diff-so-fancy, the C version complains bout mismatched output. ```bash #!/bin/bash set -euEo pipefail repo=repro rm -rf $repo TEST_AUTHOR_LOCALNAME=author TEST_AUTHOR_DOMAIN=example.com GIT_AUTHOR_EMAIL=${TEST_AUTHOR_LOCALNAME}@${TEST_AUTHOR_DOMAIN} GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_DATE='1112354055 +0200' TEST_COMMITTER_LOCALNAME=committer TEST_COMMITTER_DOMAIN=example.com GIT_COMMITTER_EMAIL=${TEST_COMMITTER_LOCALNAME}@${TEST_COMMITTER_DOMAIN} GIT_COMMITTER_NAME='C O Mitter' GIT_COMMITTER_DATE='1112354055 +0200' export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME export GIT_COMMITTER_DATE GIT_AUTHOR_DATE export HOME= git -c init.defaultBranch=unimportant init $repo cd $repo date>file git add file git commit -m file git clone https://github.com/so-fancy/diff-so-fancy.git date>>file export PATH="diff-so-fancy:$PATH" git submodule add ./diff-so-fancy git commit -m "add sub" date>diff-so-fancy/README.md curl -sSLO https://github.com/dandavison/delta/releases/download/0.12.1/delta-0.12.1-x86_64-unknown-linux-gnu.tar.gz tar xzf delta-0.12.1-x86_64-unknown-linux-gnu.tar.gz export PATH="delta-0.12.1-x86_64-unknown-linux-gnu:$PATH" echo ----- git -c interactive.diffFilter="delta --color-only" -c add.interactive.useBuiltin=false add -p < <(echo n) echo ----- git -c interactive.diffFilter="delta --color-only" add -p < <(echo n) echo ----- git -c interactive.diffFilter="diff-so-fancy --patch" -c add.interactive.useBuiltin=false add -p < <(echo n) echo ----- git -c interactive.diffFilter="diff-so-fancy --patch" add -p < <(echo n) ``` Cheers, Philippe.
Philippe Blain <levraiphilippeblain@gmail.com> writes: >>>> However, I've tried it on a more "real-life" example, and then I get: >>>> >>>> error: mismatched output from interactive.diffFilter >>>> hint: Your filter must maintain a one-to-one correspondence >>>> hint: between its input and output lines. >>>> >>>> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should >>>> keep the number of lines the same. >>> >>> Would you mind sharing the example with me? >>> >>> Thanks, >>> Dscho >> >> In trying to write a reproducer, I realize that this is in fact a separate "bug". >> I happened to have a submodule in the repository I was testing, and it had modified >> content. This is what was causing the "mismatched output" error, although I'm not >> sure why. If I use a pathspec that does not include the submodule when invoking >> 'git add -p', it works correctly. >> >> I'm a bit out of time now but I'll try to write a separate >> reproducer for this later today. > > So in trying to write a new reproducer, I found the cause of the > bug. ... I somehow had an impression that that the C reimplementation was designed to be a bug-for-bug compatible faithful rewrite of the original scripted version, but looking at the way how this bug is handled, I am not sure (it looks as if the initial fix was to patch the code to match the symptom, not to make the code to behave more faithfully to the scripted version). As with any big rewrite, a complete re-implementation always has risks to introduce unintended regressions and that is why starting with faithful rewrite with the staged transition plan is valuable. In this case, the staged transition plan, the step to flip the default without before remove the old one, is working beautifully. And it was only made possible with the actual users reporting issues before they say "the new one is broken, so let's use the knob to retain the old implementation". Please thank those in diff-so-fancy land for reporting the issue. Thanks, you two, for working on this.
Hi Junio & Philippe, On Wed, 24 Aug 2022, Junio C Hamano wrote: > Philippe Blain <levraiphilippeblain@gmail.com> writes: > > >>>> However, I've tried it on a more "real-life" example, and then I get: > >>>> > >>>> error: mismatched output from interactive.diffFilter > >>>> hint: Your filter must maintain a one-to-one correspondence > >>>> hint: between its input and output lines. > >>>> > >>>> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should > >>>> keep the number of lines the same. > >>> > >>> Would you mind sharing the example with me? > >> > >> In trying to write a reproducer, I realize that this is in fact a separate "bug". > >> I happened to have a submodule in the repository I was testing, and it had modified > >> content. This is what was causing the "mismatched output" error, although I'm not > >> sure why. If I use a pathspec that does not include the submodule when invoking > >> 'git add -p', it works correctly. > >> > >> I'm a bit out of time now but I'll try to write a separate > >> reproducer for this later today. > > > > So in trying to write a new reproducer, I found the cause of the > > bug. ... Ah. Submodules. The gift that keeps on giving. I'm not _really_ complaining, though. It seems that with beautiful regularity, submodules come up as crucial parts of attack vectors in exploits that are reported to the git-security mailing list, requiring somebody with my abilities to implement fixes. Therefore, submodules award me with a sort of job security. :-) > I somehow had an impression that that the C reimplementation was > designed to be a bug-for-bug compatible faithful rewrite of the > original scripted version, Yes, it was designed as that, with a few things done differently, though, such as avoiding unnecessary `git diff` invocations, and using C-native data structures and memory management. One unfortunate consequence of avoiding the `git diff` invocations of `git-add--interactive.perl` that are not actually necessary is that I missed that one of those invocations specifically ignores dirty submodules, and that the Perl script then only processes files whose numstat shows up in _that_ diff output. Fortunately, the fix is easy: simply ignore dirty submodules in _all_ `git diff` invocations of `add -p`. I will submit the next iteration as soon as the PR builds pass. > but looking at the way how this bug is handled, I am not sure (it looks > as if the initial fix was to patch the code to match the symptom, not to > make the code to behave more faithfully to the scripted version). The built-in `add -p` does something slightly different from the Perl version when it comes to the line ranges in hunk headers: instead of storing stringified versions of them and parsing & re-rendering them when the hunks are modified, the C version stores the actual numbers, adjusts them as needed (e.g. when hunks are split), and, crucially, renders them on the fly when displaying the hunks. That means that also the colorized hunk headers are generated on the fly whenever the hunks are displayed, and they are never stored in rendered form, neither while parsing diffs nor when hunks are split or otherwise edited. That's why the built-in `add -p` looked for a `@@ ... @@` part in the colorized diff, so that it can display the remainder of that line after showing the rendered line range. And yes, this is a deviation from the bug-for-bug principle I follow for conversions from scripted commands to built-in C code, but it was for a good reason: memory management in C is less trivial than in Perl (where one essentially YOLOs memory management), and I was not prepared to invent a rope data structure just to be able to replace parts of a long string buffer with rendered text of a different length, nor was I prepared to call `memmove()` tons of times. Note that the Perl version does something slightly iffy, too: if a diff colorizer is configured, it initially shows the original hunk headers (e.g. "@ file:16 @" in the `diff-so-fancy` case) but when the hunk is modified in any way, it generates "@@ -1,6 +1,7 @@"-style hunk headers and does not show the ones generated by the diff colorizer _at all_ anymore. In this patch series, I teach the built-in `add -p` to be more lenient and simply show the entire original hunk header after the rendered line range if no `@@ ... @@` part was found. If that part was found, we still replace it with the rendered line range. Yes, it is a deviation from what the Perl version does. I consider it an improvement, though. > As with any big rewrite, a complete re-implementation always has risks > to introduce unintended regressions and that is why starting with > faithful rewrite with the staged transition plan is valuable. > > In this case, the staged transition plan, the step to flip the > default without before remove the old one, is working beautifully. > And it was only made possible with the actual users reporting issues > before they say "the new one is broken, so let's use the knob to > retain the old implementation". Indeed, and thank you, Philippe, for bringing this to the Git mailing list so that I could do something about the bug. Ciao, Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > One unfortunate consequence of avoiding the `git diff` invocations of > `git-add--interactive.perl` that are not actually necessary is that I > missed that one of those invocations specifically ignores dirty > submodules, and that the Perl script then only processes files whose > numstat shows up in _that_ diff output. > > Fortunately, the fix is easy: simply ignore dirty submodules in _all_ `git > diff` invocations of `add -p`. OK, so it wasn't removing what is useless, but removing something whose result was used. And our next move is not to bring us closer to how we used to correctly handle third-party diff filters by mimicking the original better but doing the handling of this particular piece differently. > I will submit the next iteration as soon as the PR builds pass. OK. >> As with any big rewrite, a complete re-implementation always has risks >> to introduce unintended regressions and that is why starting with >> faithful rewrite with the staged transition plan is valuable. >> >> In this case, the staged transition plan, the step to flip the >> default without before remove the old one, is working beautifully. >> And it was only made possible with the actual users reporting issues >> before they say "the new one is broken, so let's use the knob to >> retain the old implementation". > > Indeed, and thank you, Philippe, for bringing this to the Git mailing list > so that I could do something about the bug. Yup. To some people, "refatoring to match my subjective code aesthetics" may be personally the most important, and to some other people, "getting rid of scripted Porcelains" may be, but they are both wrong and these are their second most important goals ;-) Not regressing end-user experience is, and we learned a valuable lesson here. Be conservative and make sure we can revert to buy time when making a big rewrite like this one. Thanks, all.