Message ID | CAHd499BT35jvPtsuD9gfJB0HJ=NxtzyQOaiD7-=sHJbFYhphpg@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | diff.renames not working? | expand |
On Fri, Sep 13, 2019 at 1:24 PM Robert Dailey <rcdailey.lists@gmail.com> wrote: > > 2. `diff -M` doesn't actually work either. It should, though. In fact, > I expected it to work as `--follow` does. But it doesn't. Just a small point of clarification: Is -M really what you mean here? Given you indicated you have "diff.renames=copies", wouldn't you need -C? -M only detects renames, not copies. (I haven't looked too deeply into the rest, but this detail caught my eye.) Best regards, Bryan Turner > 3. The `diff.renames` config doesn't seem to be working here, when it should. > > Can someone explain the behavior I'm seeing? I really am confused > about all this... > > [1]: https://git-scm.com/docs/git-diff
On Fri, Sep 13, 2019 at 03:24:06PM -0500, Robert Dailey wrote: > Now my goal is to diff `ZPayClient.hpp` and see the changes to the > moved-out portion of code as it relates to the original state of that > code in `JniPaymentManager.hpp`. To do this, I tried this command: > > ``` > $ git diff master...topic -- ZPay/ZPayClient.hpp > ``` > > The unified diff header I got back is: > > ``` > diff --git ZPay/ZPayClient.hpp ZPay/ZPayClient.hpp > new file mode 100644 > index 00000000..6ebc2a9a > --- /dev/null > +++ ZPay/ZPayClient.hpp > ``` > > Hmm, it's treating it as a new file. Even though I have `diff.renames` > set to `copies`? Even though `diff --name-status` acknowledges the > relationship with the original file for the code on `master`? This is > confusing... This is due to the way that rename detection works. When looking for renames, the tree diff gives us a series of candidate "sources" that were deleted and candidates "dests" that were added. And then we try to match them up. Copy detection differs in that it uses any file touched in the diff as a source, not just deletions. And then "--find-copies-harder" uses even unmodified files as sources (but see below). So here's a simplified setup: git init repo cd repo seq 100 >a git add a git commit -m a cp a b seq 99 >a git add a b git commit -m b and then we can try these commands: # this won't find a rename, because "a" was not deleted git diff-tree --name-status -M HEAD^ HEAD # this will find the copy, because now we consider "a" a source git diff-tree --name-status -C HEAD^ HEAD # this _won't_ find the copy, because we limited our tree diff to # just look at "b"; hence we don't even consider "a" a source git diff-tree --name-status -C HEAD^ HEAD -- b And that last one is the one that confused you. Naively it seems like doing this would work (two "-C" are the same as "--find-copies-harder"): git diff-tree --name-status -C -C HEAD^ HEAD -- b but it doesn't. That's because we're still using the tree diff to find sources, and just adding unmodified entries to the source list. But our pathspec prevents the diff from even considering "a". While this might seem useless at first, you can imagine something like: git diff-tree -C -C HEAD^ HEAD -- subdir/ which would consider all files in subdir as sources, but not those outside (which may be especially important for performance). But there's no (clean) way to expand the set of paths that we consider as sources without also showing them in the output. There are two useful variants I could imagine, though: - a way to consider _all_ paths in the repository, not just those in the pathspec, as sources, but show only the entries from the pathspec. This could probably be a "harder" version of "--find-copies-harder", something like "-C -C -C <revs> -- b". Naturally this would be even more expensive in a big repo. - a way to independently specify the source pathspec and the output-limiting pathspec. This is a cheaper version of the one above, where you could look at a subset of the tree a sources, but limit the set of shown paths even further. It's not conceptually that difficult, but syntactically it gets weird since you have two lists of pathspecs on the command-line. I think the first one wouldn't be _too_ hard if somebody is interested in getting their feet wet with the diffcore-rename.c code. The second is probably not worth the effort. > Out of curiosity, I thought I'd try this command: > > ``` > git diff --follow master...topic -- ZPay/ZPayClient.hpp > ``` So yes, that does work, and is why I added the "(clean)" qualifier above. It behaves like the "-C -C -C" I proposed. But the fact that it does so is entirely accidental. What happens is this: - we're a little sloppy about what constitutes a traversal option and what is a diff option. Many diff commands rely on setup_revisions(), which parses both. So "diff --follow" probably _should_ be flagged as an error, but isn't. - the implementation of "--follow" works by doing a separate, from-scratch tree-level diff on each commit (it _has_ to ignore your pathspec, since by definition it allows only a single file to be in the pathspec). And then rather than throwing away that result, it feeds it to the rest of the diff pipeline, which then shows the output you expected. So it does do what you want, but only for the single-file case. And certainly it was never intended to, and that might change in the future. > Now this looks more like it. I can actually see a useful diff here, > instead of everything looking like a new file. But there is a lot of > confusion here: > > 1. `diff --follow` is not a documented[1] option. Why does it work? Accident. :) See above. > 2. `diff -M` doesn't actually work either. It should, though. In fact, > I expected it to work as `--follow` does. But it doesn't. It doesn't work because this is a copy, not a rename. > 3. The `diff.renames` config doesn't seem to be working here, when it should. It does, but the pathspec prevents it from finding a source candidate. -Peff
On Fri, Sep 13, 2019 at 10:30 PM Jeff King <peff@peff.net> wrote: > SNIP > > > Now this looks more like it. I can actually see a useful diff here, > > instead of everything looking like a new file. But there is a lot of > > confusion here: > > > > 1. `diff --follow` is not a documented[1] option. Why does it work? > > Accident. :) See above. > > > 2. `diff -M` doesn't actually work either. It should, though. In fact, > > I expected it to work as `--follow` does. But it doesn't. > > It doesn't work because this is a copy, not a rename. > > > 3. The `diff.renames` config doesn't seem to be working here, when it should. > > It does, but the pathspec prevents it from finding a source candidate. Jeff, thanks so much. All of your examples help to contrast the different behavior. I thought -M did copies & renames, that was my misunderstanding. But the fact that you explained that the file spec would prevent it from working as I'd like anyway means it doesn't matter too much. Ultimately my goal is to use the pathspec to filter what is visible, I don't necessarily intend it to obscure my results due to internal behavior. I realize there are performance obligations too (as you described) but I feel like something like this should be straightforward for end users. I consider myself a step above most people with my understanding of Git and how it functions, and even I was confused by this. Especially when it comes to folks I work with at my day job, I just don't see them making sense of this. To me, as transparently as possible, moves and copies should be handled intrinsically. I realize there are technical challenges (performance, ambiguities, etc), but anything that can be done to help with that would go a long way with most users. Maybe a 3rd option from your list to solve this issue might be adding some metadata to commits or blob object types to record the pathspec of the file it was a copy or move of. Even if this required you to first do a git cp/mv before modifying the content, or a strict similarity threshold, I think it would go a long way. That means that `diff` can utilize the metadata in the object itself to find the path specs to follow, instead of requiring some complex deduction logic that has a steep performance cost.
Jeff King <peff@peff.net> writes: > - a way to independently specify the source pathspec and the > output-limiting pathspec. This is a cheaper version of the one > above, where you could look at a subset of the tree a sources, but > limit the set of shown paths even further. It's not conceptually > that difficult, but syntactically it gets weird since you have two > lists of pathspecs on the command-line. Yes, the pathspec has always been input limiter and I think we have discussed adding separate output limiter from time to time, but nothing has happened. Back in the scripted porcelain days, it would conceptually have been simpler and cleaner, as you probably would run "diff-tree --raw" with no (or "input") pathspec, filter its output with "output" pathspec, and then convert the raw diff to a patch ( we used to have such a filter, before we gave the -p option to all three "diff" family backends). Introducing an option to say "Pretend as if the system supported output pathspec, and use the pathspec as such, without any input pathspec" feels like a dirty hack compared to conceptual purity of having two separate sets, and having to run without any input pathspec may not be usable in truly large project, but I suspect it may be good enough for most people (i.e. your "first one" that wouldn't be too hard). I am not sure if I like it, though X-<. >> ``` >> git diff --follow master...topic -- ZPay/ZPayClient.hpp >> ``` > > So yes, that does work, and is why I added the "(clean)" qualifier > above. It behaves like the "-C -C -C" I proposed. But the fact that it > does so is entirely accidental. What happens is this: I am kinda surprised that "diff" gets affected by "--follow" (I knew that the command line parser would take that option meant for "log" family without complaining) at all.
diff --git ZPay/ZPayClient.hpp ZPay/ZPayClient.hpp new file mode 100644 index 00000000..6ebc2a9a --- /dev/null diff --git Jni/JniPaymentManager.hpp ZPay/ZPayClient.hpp similarity index 70% copy from Jni/JniPaymentManager.hpp copy to ZPay/ZPayClient.hpp index fc18e2d2..6ebc2a9a 100644 --- Jni/JniPaymentManager.hpp