Message ID | 20200920130945.26399-1-tguyot@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] diff: Fix modified lines stats with --stat and --numstat | expand |
On Sun, Sep 20, 2020 at 09:09:46AM -0400, Thomas Guyot-Sionnest wrote: > In builtin_diffstat(), when both files are coming from "stdin" (which > could be better described as the file's content being written directly > into the file object), oideq() compares two null hashes and ignores the > actual differences for the statistics. > > This patch checks if is_stdin flag is set on both sides and compare > contents directly. > > Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com> > --- > Range-diff: > 1: 479c2835fc ! 1: 1f25713d44 diff: Fix modified lines stats with --stat and --numstat > @@ -20,8 +20,12 @@ > } > > - same_contents = oideq(&one->oid, &two->oid); > ++ /* What is_stdin really means is that the file's content is only > ++ * in the filespec's buffer and its oid is zero. We can't compare > ++ * oid's if both are null and we can just diff the buffers */ > + if (one->is_stdin && two->is_stdin) > -+ same_contents = !strcmp(one->data, two->data); > ++ same_contents = (one->size == two->size ? > ++ !memcmp(one->data, two->data, one->size) : 0); > + else > + same_contents = oideq(&one->oid, &two->oid); After reading your explanation in [1], this version makes more sense to me. Thanks. [1]: https://lore.kernel.org/git/f4c4cb48-f4b5-3d4d-066d-b94e961dcbb5@gmail.com/ Taylor
On 2020-09-20 11:39, Taylor Blau wrote: > On Sun, Sep 20, 2020 at 09:09:46AM -0400, Thomas Guyot-Sionnest wrote: >> In builtin_diffstat(), when both files are coming from "stdin" (which >> could be better described as the file's content being written directly >> into the file object), oideq() compares two null hashes and ignores the >> actual differences for the statistics. >> >> This patch checks if is_stdin flag is set on both sides and compare >> contents directly. >> >> Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com> >> --- >> Range-diff: >> 1: 479c2835fc ! 1: 1f25713d44 diff: Fix modified lines stats with --stat and --numstat >> @@ -20,8 +20,12 @@ >> } >> >> - same_contents = oideq(&one->oid, &two->oid); >> ++ /* What is_stdin really means is that the file's content is only >> ++ * in the filespec's buffer and its oid is zero. We can't compare >> ++ * oid's if both are null and we can just diff the buffers */ >> + if (one->is_stdin && two->is_stdin) >> -+ same_contents = !strcmp(one->data, two->data); >> ++ same_contents = (one->size == two->size ? >> ++ !memcmp(one->data, two->data, one->size) : 0); >> + else >> + same_contents = oideq(&one->oid, &two->oid); > > After reading your explanation in [1], this version makes more sense to > me. > > Thanks. > > [1]: https://lore.kernel.org/git/f4c4cb48-f4b5-3d4d-066d-b94e961dcbb5@gmail.com/ There's a little bit missing... Just before the new code example:, previous to last paragraph: > it's assumed [we can just call oidcmp()] and I won't make complex -- Thomas
Taylor Blau <me@ttaylorr.com> writes: > On Sun, Sep 20, 2020 at 09:09:46AM -0400, Thomas Guyot-Sionnest wrote: >> In builtin_diffstat(), when both files are coming from "stdin" (which >> could be better described as the file's content being written directly >> into the file object), oideq() compares two null hashes and ignores the >> actual differences for the statistics. >> >> This patch checks if is_stdin flag is set on both sides and compare >> contents directly. >> >> Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com> >> --- >> Range-diff: >> 1: 479c2835fc ! 1: 1f25713d44 diff: Fix modified lines stats with --stat and --numstat >> @@ -20,8 +20,12 @@ >> } >> >> - same_contents = oideq(&one->oid, &two->oid); >> ++ /* What is_stdin really means is that the file's content is only >> ++ * in the filespec's buffer and its oid is zero. We can't compare >> ++ * oid's if both are null and we can just diff the buffers */ >> + if (one->is_stdin && two->is_stdin) >> -+ same_contents = !strcmp(one->data, two->data); >> ++ same_contents = (one->size == two->size ? >> ++ !memcmp(one->data, two->data, one->size) : 0); >> + else >> + same_contents = oideq(&one->oid, &two->oid); > > After reading your explanation in [1], this version makes more sense to > me. These oid fields are prepared by calling diff_fill_oid_info(), and even for paths that are dirty (hence no "free" oid available from index or tree entry), an appropriate oid is computed. But there are paths for which oid cannot be computed without destroying their contents. Such paths are marked by the function with null_oid. It happens to be that stdin is the only class of paths that are treated that way _right_ _now_, but future code may support different kind of paths that share the same trait. When we want to know "is comparing the oid sufficient?", we shouldn't inspect the is_stdin flag ourselves in a caller of diff_fill_oid_info(), because the helper _is_ responsible for knowing what kind of paths are special, and signals that "assume this would not be equal to anything else" by giving null_oid back. The caller should use the info left by diff_fill_oid_info(), namely, "even if the oid on both sides are the same, if it is null_oid, then we know diff_fill_oid_info() didn't actually compute the oid, and we need to compare the blob ourselves". And there is no point in doing memcmp() here, I think. The same_contents() check is done as an optimization to avoid xdl. Even if the two sides were thought to be different at the oid level, xdl comparison may find that there is no difference after all (e.g. think of whitespace ignoring comparison), so we should assume and rely on that the downstream code MUST BE prepared to handle false negatives (i.e. same_contents says they are different, but they actually produce no diffstat). Running memcmp() over contents in potentially a large buffer to find that they are different, and then have xdl process that large buffer again, would be a waste. Summarizing the above, I think the second best fix is this (which means that the posted patch is the third best): /* * diff_fill_oid_info() marked one/two->oid with null_oid * for a path whose oid is not available. Disable early * return optimization for them. */ if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid)) same_contents = 0; /* could be different */ else if (oideq(&one->oid, &two->oid)) same_contents = 1; /* definitely the same */ else same_contents = 0; /* definitely different */ But I suspect that the best fix is to teach diff_fill_oid_info() to hash the in-memory data to compute the oid, instead of punting and filling the oid field with null_oid. If function builtin_diffstat() is allowed to look at the contents and run memcmp() here, the 'data' field should have been filled and valid when diff_fill_oid_info() looked at it already. The "best" fix will have wider consequences, so we may not want to jump to it right away without careful consideration. For example, the "best" fix will fix another bug. The 'index' header shows a NULL object name in normal "diff --patch" output for these paths in the current code, which means they cannot be used with "apply --3way". That way, this codepath does not have to know anything about the "null means we don't know" convention. Try: $ (cat COPYING; echo) >RENAMING $ git diff --no-index COPYING - <RENAMING | grep '^index ' index 536e55524d..0000000000 100644 and notice that the stdin side has a null object name in the current code. I think we will show the right object name if we fix the diff_fill_oid_info(). Thanks.
Junio C Hamano <gitster@pobox.com> writes: > But I suspect that the best fix is to teach diff_fill_oid_info() to > hash the in-memory data to compute the oid, instead of punting and > filling the oid field with null_oid. If function builtin_diffstat() > is allowed to look at the contents and run memcmp() here, the 'data' > field should have been filled and valid when diff_fill_oid_info() > looked at it already. > > The "best" fix will have wider consequences, so we may not want to > jump to it right away without careful consideration. And then after giving a bit more thought, I don't recommend to go with this approach, because it breaks an established convention that objects with unknown name is perfectly OK and shown with the null oid. In other words, I'd suggest to use the "second best" one I gave in the message I am responding to. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Summarizing the above, I think the second best fix is this (which > means that the posted patch is the third best): > > /* > * diff_fill_oid_info() marked one/two->oid with null_oid > * for a path whose oid is not available. Disable early > * return optimization for them. > */ > if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid)) > same_contents = 0; /* could be different */ > else if (oideq(&one->oid, &two->oid)) > same_contents = 1; /* definitely the same */ > else > same_contents = 0; /* definitely different */ A tangent. There is this code in diff.c::fill_metainfo() that is used to populate the "index" header element of "diff --patch" output: if (one && two && !oideq(&one->oid, &two->oid)) { const unsigned hexsz = the_hash_algo->hexsz; int abbrev = o->abbrev ? o->abbrev : DEFAULT_ABBREV; if (o->flags.full_index) abbrev = hexsz; if (o->flags.binary) { mmfile_t mf; if ((!fill_mmfile(o->repo, &mf, one) && diff_filespec_is_binary(o->repo, one)) || (!fill_mmfile(o->repo, &mf, two) && diff_filespec_is_binary(o->repo, two))) abbrev = hexsz; } strbuf_addf(msg, "%s%sindex %s..%s", line_prefix, set, diff_abbrev_oid(&one->oid, abbrev), diff_abbrev_oid(&two->oid, abbrev)); if (one->mode == two->mode) strbuf_addf(msg, " %06o", one->mode); strbuf_addf(msg, "%s\n", reset); } Currently it is OK because there can only be one side that diff_fill_oid_info() would mark as "oid unavailable" (e.g. reading standard input stream). If a new feature is introducing a situation where both ends have null_oid, which was so far been impossible, we'd probably need to factor out the condition used in the above into a helper function, e.g. static int cannot_be_the_same(struct diff_filespec *one, struct diff_filespec *two) { if ((oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid)) return 1; else if (oideq(&one->oid, &two->oid)) return 0; else return 1; } and rewrite the conditional in fill_metainfo() to if (one && two && cannot_be_the_same(one, two)) { ... The "second best fix" could then become a single liner: same_contents = !cannot_be_the_same(one, two); using the helper.
Junio C Hamano <gitster@pobox.com> writes: Sorry, this will be the last message from me on this topic for now. > we'd probably need to factor out the condition used in the above > into a helper function, e.g. > > static int cannot_be_the_same(struct diff_filespec *one, struct diff_filespec *two) The naming of this helper is tricky. In both potential callers, what we want to see is "one and two may be different, we cannot say they are the same with certainty", so "cannot be the same" is a misnomer. Worse, the negated form is hard to grok. Perhaps "may_differ()" is a more correct name. If either side is NULL oid, we cannot say they are the same, so it is true. If two oid that are not NULL oid are the same, there is no possibility that they are different, so we return false. And two oid that are not NULL oid are different, we know they are different, so we return true. > { > if ((oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid)) > return 1; > else if (oideq(&one->oid, &two->oid)) > return 0; > else > return 1; > } > > and rewrite the conditional in fill_metainfo() to > > if (one && two && cannot_be_the_same(one, two)) { > ... And this becomes much easier to read and understand, i.e. if (one && two && may_differ(one, two)) { ... create the index one->oid..two->oid header ... > The "second best fix" could then become a single liner: > > same_contents = !cannot_be_the_same(one, two); > > using the helper. And this becomes same_contents = !may_differ(one, two); meaning that "there is no possibility that one and two are different". That allows us to optimize out the invocation of the xdl machinery.
On Sun, Sep 20, 2020 at 12:11:20PM -0700, Junio C Hamano wrote: > > After reading your explanation in [1], this version makes more sense to > > me. > > These oid fields are prepared by calling diff_fill_oid_info(), and > even for paths that are dirty (hence no "free" oid available from > index or tree entry), an appropriate oid is computed. This is the part that confused me earlier. I expected these "stdin" entries, just like other some other entries (e.g., stat dirty ones for diff-files, or anything for "diff --no-index") to have bogus oids. But that diff_fill_oid_info() is what actually computes the sha1 from scratch for them. I get why that is needed for generating a git diff, as we have an "index from...to" line there that we'd want to fill. For diffstat, though, it seems like a waste of time; we don't care what the object hash is. I.e., if we were to do this: diff --git a/diff.c b/diff.c index 16eeaf6645..1934af29a5 100644 --- a/diff.c +++ b/diff.c @@ -4564,9 +4564,6 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o, if (o->prefix_length) strip_prefix(o->prefix_length, &name, &other); - diff_fill_oid_info(p->one, o->repo->index); - diff_fill_oid_info(p->two, o->repo->index); - builtin_diffstat(name, other, p->one, p->two, diffstat, o, p); } then everything seems to work fine _except_ a "git diff --stat --no-index", exactly because it hits this "same_contents" check we've been discussing. And once that is fixed properly (to handle any case where we have no oid, not just when the stdin flag is set), then perhaps it is worth doing. Or perhaps not. Even if we have to memcmp sometimes in builtin_diffstat(), it would be faster than computing the individual hashes. But it may not be measurably so, and it would be no difference for the common case of filespecs for which we do know the oid for free. I also suspect we'd need to be a little smarter about combined formats (e.g., "--stat --patch" might as well compute the oid as early as possible, since we'll need it eventually for the patch; but we'd hit the call in builtin_diffstat() before the one in run_diff()). > But there are paths for which oid cannot be computed without > destroying their contents. Such paths are marked by the function > with null_oid. I'm not clear how computing the oid destroys the contents. We have them in an in-memory buffer at this point, don't we? So we _could_ generate an oid even for stdin, like this: diff --git a/cache.h b/cache.h index 55d7f61087..1ace143eac 100644 --- a/cache.h +++ b/cache.h @@ -858,6 +858,7 @@ int ie_modified(struct index_state *, const struct cache_entry *, struct stat *, #define HASH_RENORMALIZE 4 int index_fd(struct index_state *istate, struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags); int index_path(struct index_state *istate, struct object_id *oid, const char *path, struct stat *st, unsigned flags); +int index_mem(struct index_state *istate, struct object_id *oid, void *buf, size_t size, enum object_type type, const char *path, unsigned flags); /* * Record to sd the data from st that we use to check whether a file diff --git a/diff.c b/diff.c index 16eeaf6645..181b632114 100644 --- a/diff.c +++ b/diff.c @@ -4463,7 +4463,10 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is if (!one->oid_valid) { struct stat st; if (one->is_stdin) { - oidclr(&one->oid); + if (index_mem(istate, &one->oid, + one->data, one->size, + OBJ_BLOB, one->path, 0)) + die("cannot hash diff file from stdin"); return; } if (lstat(one->path, &st) < 0) diff --git a/sha1-file.c b/sha1-file.c index 770501d6d1..c7d017b3e0 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -2046,10 +2046,10 @@ static void check_tag(const void *buf, size_t size) die(_("corrupt tag")); } -static int index_mem(struct index_state *istate, - struct object_id *oid, void *buf, size_t size, - enum object_type type, - const char *path, unsigned flags) +int index_mem(struct index_state *istate, + struct object_id *oid, void *buf, size_t size, + enum object_type type, + const char *path, unsigned flags) { int ret, re_allocated = 0; int write_object = flags & HASH_WRITE_OBJECT; which is basically your "best fix" from below. It fixes the bug here, and it gives you a non-null index line. I'd consider coupling it with calling fill_oid less often, though (something like range-diff computes a bunch of fake-stdin diffs, and doesn't need to waste time computing the oids at all). > Summarizing the above, I think the second best fix is this (which > means that the posted patch is the third best): > > /* > * diff_fill_oid_info() marked one/two->oid with null_oid > * for a path whose oid is not available. Disable early > * return optimization for them. > */ > if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid)) > same_contents = 0; /* could be different */ > else if (oideq(&one->oid, &two->oid)) > same_contents = 1; /* definitely the same */ > else > same_contents = 0; /* definitely different */ This is the direction I was getting at in my earlier emails, except that I imagined that first conditional could be checking: if (!one->oid_valid || !two->oid_valid) but I was surprised to see that diff_fill_oid_info() does not set oid_valid. Is that a bug? I also imagined that we'd have to determine right then whether the contents are actually different or not with a memcmp(), to avoid emitting a "0 changes" line, but we do handle that case within the "!same_contents" conditional. See the comment starting with "Omit diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore uninteresting modifications, 2020-08-20). -Peff
Jeff King <peff@peff.net> writes: > For diffstat, though, it seems like a waste of time; we don't care what > the object hash is. I.e., if we were to do this: > > diff --git a/diff.c b/diff.c > index 16eeaf6645..1934af29a5 100644 > --- a/diff.c > +++ b/diff.c > @@ -4564,9 +4564,6 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o, > if (o->prefix_length) > strip_prefix(o->prefix_length, &name, &other); > > - diff_fill_oid_info(p->one, o->repo->index); > - diff_fill_oid_info(p->two, o->repo->index); > - > builtin_diffstat(name, other, p->one, p->two, > diffstat, o, p); > } > > then everything seems to work fine _except_ a "git diff --stat > --no-index", exactly because it hits this "same_contents" check we've > been discussing. And once that is fixed properly (to handle any case > where we have no oid, not just when the stdin flag is set), then perhaps > it is worth doing. > Or perhaps not. Even if we have to memcmp sometimes in > builtin_diffstat(), it would be faster than computing the individual > hashes. But it may not be measurably so, and it would be no difference > for the common case of filespecs for which we do know the oid for free. > I also suspect we'd need to be a little smarter about combined formats > (e.g., "--stat --patch" might as well compute the oid as early as > possible, since we'll need it eventually for the patch; but we'd hit the > call in builtin_diffstat() before the one in run_diff()). > >> But there are paths for which oid cannot be computed without >> destroying their contents. Such paths are marked by the function >> with null_oid. > > I'm not clear how computing the oid destroys the contents. We have them > in an in-memory buffer at this point, don't we? So we _could_ generate > an oid even for stdin, like this: Yes, yes yes. That is the "best" (which later retracted) approach I suggested in the same message, but it would end up filling a real-looking object name for working tree side of diff-files, which has a far larger consequence we need to think about and consumes more brain cycles than warranted here, I would think. >> Summarizing the above, I think the second best fix is this (which >> means that the posted patch is the third best): >> >> /* >> * diff_fill_oid_info() marked one/two->oid with null_oid >> * for a path whose oid is not available. Disable early >> * return optimization for them. >> */ >> if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid)) >> same_contents = 0; /* could be different */ >> else if (oideq(&one->oid, &two->oid)) >> same_contents = 1; /* definitely the same */ >> else >> same_contents = 0; /* definitely different */ > > This is the direction I was getting at in my earlier emails, except that > I imagined that first conditional could be checking: > > if (!one->oid_valid || !two->oid_valid) > > but I was surprised to see that diff_fill_oid_info() does not set > oid_valid. Is that a bug? I do not think so. oid_valid refers to the state during the collection phase (those who called diff_addremove() etc.) and updating it in diff_fill_oid_info() would lose information. Maybe nobody looks at the bit at this late in the processing chain these days, in which case we can start flipping the bit there, but I offhand do not know what consequences such a change would trigger. > I also imagined that we'd have to determine right then whether the > contents are actually different or not with a memcmp(), to avoid > emitting a "0 changes" line, but we do handle that case within the > "!same_contents" conditional. See the comment starting with "Omit > diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore > uninteresting modifications, 2020-08-20). Yes, we are essentially on the same page---same_contents bit is merely an optimization to decide cheaply when we do not have to do xdl, but the codepath that does the xdl must be prepared to deal with the "we thought they are different, but after all they turn out to be equivalent" case. Therefore false positive to declare two different things as same cannot be tolerated, but false negative to declare two things that are the same as !same_contents is fine.
On Mon, Sep 21, 2020 at 02:51:21PM -0700, Junio C Hamano wrote: > > This is the direction I was getting at in my earlier emails, except that > > I imagined that first conditional could be checking: > > > > if (!one->oid_valid || !two->oid_valid) > > > > but I was surprised to see that diff_fill_oid_info() does not set > > oid_valid. Is that a bug? > > I do not think so. oid_valid refers to the state during the > collection phase (those who called diff_addremove() etc.) and > updating it in diff_fill_oid_info() would lose information. Maybe > nobody looks at the bit at this late in the processing chain these > days, in which case we can start flipping the bit there, but I > offhand do not know what consequences such a change would trigger. We use the flag to determine whether we need to compute the oid from scratch. So I would think the current code causes us to compute the oid multiple times in many cases. For example, with this patch: diff --git a/diff.c b/diff.c index ee8e8189e9..8363abab5b 100644 --- a/diff.c +++ b/diff.c @@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is die_errno("stat '%s'", one->path); if (index_path(istate, &one->oid, one->path, &st, 0)) die("cannot hash %s", one->path); + warning("computed oid of %s as %s", + one->path, oid_to_hex(&one->oid)); } } else I get (because diff.c is dirty in my working tree due to the patch): $ ./git diff --stat -p warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88 diff.c | 2 ++ 1 file changed, 2 insertions(+) warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88 diff --git a/diff.c b/diff.c index ee8e8189e9..8363abab5b 100644 --- a/diff.c +++ b/diff.c @@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is die_errno("stat '%s'", one->path); if (index_path(istate, &one->oid, one->path, &st, 0)) die("cannot hash %s", one->path); + warning("computed oid of %s as %s", + one->path, oid_to_hex(&one->oid)); } } else even though we already know the oid in the second call, so it's wasted work. I agree that other code could be depending on oid_valid in a weird way, but IMHO that code is probably wrong to do so. But it may not be worth digging into, if nobody has complained about the waste. > > I also imagined that we'd have to determine right then whether the > > contents are actually different or not with a memcmp(), to avoid > > emitting a "0 changes" line, but we do handle that case within the > > "!same_contents" conditional. See the comment starting with "Omit > > diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore > > uninteresting modifications, 2020-08-20). > > Yes, we are essentially on the same page---same_contents bit is > merely an optimization to decide cheaply when we do not have to do > xdl, but the codepath that does the xdl must be prepared to deal > with the "we thought they are different, but after all they turn out > to be equivalent" case. Therefore false positive to declare two > different things as same cannot be tolerated, but false negative to > declare two things that are the same as !same_contents is fine. I thought it may matter on "maint", where we do not have 1cf3d5db9b. I.e., I expected: echo foo >a echo foo >b git diff --no-index --stat a b might switch from no output to having a line like: a => b | 0 But we don't even get to builtin_diffstat() there. We throw out the pair in diffcore_skip_stat_unmatch(). Likewise, if you get past that with something like a mode change: chmod +x b git diff --no-index --stat a b then that does generate the "0" stat line. But it does so both before and after the proposed change. The same thing happens in no-index mode: git init echo foo >file git add . git commit -am no-bit chmod +x file git commit -am exec-bit git show --stat will give you: file | 0 I'm not sure if that's the desired behavior or not, but at any rate fixing this builtin_diffstat() conditional won't change it either way. :) -Peff
Jeff King <peff@peff.net> writes: > .... I agree that other code could be depending on oid_valid in a weird > way, but IMHO that code is probably wrong to do so. But it may not be > worth digging into, if nobody has complained about the waste. Yup, that was my feeling when I wrote the message you are responding to.
diff --git a/diff.c b/diff.c index ee8e8189e9..2e47bf824e 100644 --- a/diff.c +++ b/diff.c @@ -3681,7 +3681,14 @@ static void builtin_diffstat(const char *name_a, const char *name_b, return; } - same_contents = oideq(&one->oid, &two->oid); + /* What is_stdin really means is that the file's content is only + * in the filespec's buffer and its oid is zero. We can't compare + * oid's if both are null and we can just diff the buffers */ + if (one->is_stdin && two->is_stdin) + same_contents = (one->size == two->size ? + !memcmp(one->data, two->data, one->size) : 0); + else + same_contents = oideq(&one->oid, &two->oid); if (diff_filespec_is_binary(o->repo, one) || diff_filespec_is_binary(o->repo, two)) { diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index e024cff65c..4715e75b68 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -258,11 +258,11 @@ test_expect_success 'changed commit with --stat diff option' ' a => b | 0 1 file changed, 0 insertions(+), 0 deletions(-) 3: $(test_oid t3) ! 3: $(test_oid c3) s/11/B/ - a => b | 0 - 1 file changed, 0 insertions(+), 0 deletions(-) + a => b | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) 4: $(test_oid t4) ! 4: $(test_oid c4) s/12/B/ - a => b | 0 - 1 file changed, 0 insertions(+), 0 deletions(-) + a => b | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) EOF test_cmp expect actual '
In builtin_diffstat(), when both files are coming from "stdin" (which could be better described as the file's content being written directly into the file object), oideq() compares two null hashes and ignores the actual differences for the statistics. This patch checks if is_stdin flag is set on both sides and compare contents directly. Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com> --- Range-diff: 1: 479c2835fc ! 1: 1f25713d44 diff: Fix modified lines stats with --stat and --numstat @@ -20,8 +20,12 @@ } - same_contents = oideq(&one->oid, &two->oid); ++ /* What is_stdin really means is that the file's content is only ++ * in the filespec's buffer and its oid is zero. We can't compare ++ * oid's if both are null and we can just diff the buffers */ + if (one->is_stdin && two->is_stdin) -+ same_contents = !strcmp(one->data, two->data); ++ same_contents = (one->size == two->size ? ++ !memcmp(one->data, two->data, one->size) : 0); + else + same_contents = oideq(&one->oid, &two->oid); diff.c | 9 ++++++++- t/t3206-range-diff.sh | 8 ++++---- 2 files changed, 12 insertions(+), 5 deletions(-)