Message ID | 20200918113256.8699-3-tguyot@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [1/2] diff: Fix modified lines stats with --stat and --numstat | expand |
Hi Thomas, On Fri, Sep 18, 2020 at 07:32:56AM -0400, Thomas Guyot-Sionnest wrote: > A very handy way to pass data to applications is to use the <() process > substitution syntax in bash variants. It allow comparing files streamed > from a remote server or doing on-the-fly stream processing to alter the > diff. These are usually implemented as a symlink that points to a bogus > name (ex "pipe:[209326419]") but opens as a pipe. This is true in bash, but sh does not support process substitution with <(). > Git normally tracks symlinks targets. This patch makes it detect such > pipes in --no-index mode and read the file normally like it would do for > stdin ("-"), so they can be compared directly. > > Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com> > --- > diff-no-index.c | 56 ++++++++++-- > t/t4053-diff-no-index.sh | 189 +++++++++++++++++++++++++++++++++++++++ > 2 files changed, 238 insertions(+), 7 deletions(-) > > diff --git a/diff-no-index.c b/diff-no-index.c > index 7814eabfe0..779c686d23 100644 > --- a/diff-no-index.c > +++ b/diff-no-index.c > @@ -41,6 +41,33 @@ static int read_directory_contents(const char *path, struct string_list *list) > */ > static const char file_from_standard_input[] = "-"; > > +/* Check that file is - (STDIN) or unnamed pipe - explicitly > + * avoid on-disk named pipes which could block > + */ > +static int ispipe(const char *name) > +{ > + struct stat st; > + > + if (name == file_from_standard_input) > + return 1; /* STDIN */ > + > + if (!lstat(name, &st)) { > + if (S_ISLNK(st.st_mode)) { I had to read this a few times to make sure that I got it; you want to stat the link itself, and then check that it links to a pipe. I'm not sure why, though. Do you want to avoid handling named FIFOs in the code below? Your comment that they "could block" makes me think you do, but I don't know why that would be a problem. > + /* symlink - read it and check it doesn't exists > + * as a file yet link to a pipe */ > + struct strbuf sb = STRBUF_INIT; > + strbuf_realpath(&sb, name, 0); > + /* We're abusing strbuf_realpath here, it may append > + * pipe:[NNNNNNNNN] to an abs path */ > + if (!stat(sb.buf, &st)) Statting sb.buf is confusing to me (especially when followed up by another stat right below. Could you explain? > +test_expect_success 'diff --no-index can diff piped subshells' ' > + echo 1 >non/git/c && > + test_expect_code 0 git diff --no-index non/git/b <(cat non/git/c) && > + test_expect_code 0 git diff --no-index <(cat non/git/b) non/git/c && > + test_expect_code 0 git diff --no-index <(cat non/git/b) <(cat non/git/c) && > + test_expect_code 0 cat non/git/b | git diff --no-index - non/git/c && > + test_expect_code 0 cat non/git/c | git diff --no-index non/git/b - && > + test_expect_code 0 cat non/git/b | git diff --no-index - <(cat non/git/c) && > + test_expect_code 0 cat non/git/c | git diff --no-index <(cat non/git/b) - > +' Indeed this test fails (Git thinks that the HERE-DOC is broken, but I suspect it's just getting confused by the '<()'). This test (like almost all other tests in Git) use /bin/sh as its shebang. Does your /bin/sh actually point to bash? If you did want to test something like this, you'd need to source t/lib-bash.sh instead of t/test-lib.sh. Unrelated to the above comment, but there are a few small style nits that I notice: - There is no need to run with 'test_expect_code 0' since the test is marked as 'test_expect_success' and the commands are all in an '&&' chain. (This does appear to be common style for others in t4053, so you may just be matching it--which is fine--but an additional clean-up on top to modernize would be appreciated, too). - The cat pipe is unnecessary, and is also violating a rule that we don't place 'git' on the right-hand side of a pipe (can you redirect the file at the end instead?). Documentation/CodingGuidelines is a great place to look if you are ever curious about whether something is in good style. > +test_expect_success 'diff --no-index finds diff in piped subshells' ' > + ( > + set -- <(cat /dev/null) <(cat /dev/null) Why is this necessary? > + cat <<-EOF >expect > + diff --git a$1 b$2 > + --- a$1 > + +++ b$2 > + @@ -1 +1 @@ > + -1 > + +2 > + EOF > + ) && > + test_expect_code 1 \ > + git diff --no-index <(cat non/git/b) <(sed s/1/2/ non/git/c) >actual && > + test_cmp expect actual > +' Thanks, Taylor
Hi Taylor, On Fri, 18 Sep 2020 at 10:36, Taylor Blau <me@ttaylorr.com> wrote: > On Fri, Sep 18, 2020 at 07:32:56AM -0400, Thomas Guyot-Sionnest wrote: > > A very handy way to pass data to applications is to use the <() process > > substitution syntax in bash variants. It allow comparing files streamed > > from a remote server or doing on-the-fly stream processing to alter the > > diff. These are usually implemented as a symlink that points to a bogus > > name (ex "pipe:[209326419]") but opens as a pipe. > > This is true in bash, but sh does not support process substitution with > <(). Bash, ksh, zsh and likely any more moden shell. Other programming languages also setup such pipes. It's much cleaner than creating temp files and cleaning them up and in some cases faster too (I've ran diff's like this over GB's of test data, it's very handy to remove known patterns that would cause needless diffs). > > +/* Check that file is - (STDIN) or unnamed pipe - explicitly > > + * avoid on-disk named pipes which could block > > + */ > > +static int ispipe(const char *name) > > +{ > > + struct stat st; > > + > > + if (name == file_from_standard_input) > > + return 1; /* STDIN */ > > + > > + if (!lstat(name, &st)) { > > + if (S_ISLNK(st.st_mode)) { > > I had to read this a few times to make sure that I got it; you want to > stat the link itself, and then check that it links to a pipe. > > I'm not sure why, though. Do you want to avoid handling named FIFOs in > the code below? Your comment that they "could block" makes me think you > do, but I don't know why that would be a problem. I'll admit the comment was written first and is a bit naive - i'll rephrase that. Yes you don't want to block on pipes like if you run a "grep -R" on a subtree that has fifos - but as I coded this I realized the obvious: git tracks symlinks name so the real bugger would be to detect one as a pipe and try reading it instead or calling readlink(). > > + /* symlink - read it and check it doesn't exists > > + * as a file yet link to a pipe */ > > + struct strbuf sb = STRBUF_INIT; > > + strbuf_realpath(&sb, name, 0); > > + /* We're abusing strbuf_realpath here, it may append > > + * pipe:[NNNNNNNNN] to an abs path */ > > + if (!stat(sb.buf, &st)) > > Statting sb.buf is confusing to me (especially when followed up by > another stat right below. Could you explain? The whole block is under lstat/S_ISLNK (see previous chunk), so the path provided to us was a symlink. Initially I looked at what differentiate these - mainly, stat() st_dev - but that struct is os-specific, you'd want to check major(st_dev) == 0 (at least on linux) and even if we knew how each os behaves, the code isn't portable and would be a pain to support. Gnu's difftools have very incomplete historical source code in git but there's indications they have gotten rid of it too. So what I'm doing instead is trying to resolve the link and see if the destination exists (a clear no). Luckily strbuf_realpath does the heavy lifting and leaves me with a real path to the file the symlink points to (especially useful for relative links), which is bogus for the special pipes we're interested in. Then the block right after (not shown) do a stat() on the initial name and return whenever it's a fifo or not (if it is, but the link is broken, we know it's a special device). Now you mention it, maybe I could do that stat first, rule this out from the beginning... less work for the general case. *untested*: if (!lstat(name, &st)) { if (!S_ISLNK(st.st_mode)) return(0); if (!stat(name, &st)) { if (!S_ISFIFO(st.st_mode)) return(0); /* We have a symlink that points to a pipe. If it's resolved * target doesn't really exist we can safely assume it's a * special file and use it */ struct strbuf sb = STRBUF_INIT; strbuf_realpath(&sb, name, 0); /* We're abusing strbuf_realpath here, it may append * pipe:[NNNNNNNNN] to an abs path */ if (stat(sb.buf, &st)) return(1); /* stat failed, special one */ } } return(0); TL;DR - the conditions we need: - lstat(name) == 0 // name exists - islink(lstat(name)) // name is a symlink - stat(name) == 0 // target of name is reachable - isfifo(stat(name)) // Target of name is a fifo - stat(realpath(readlink(name))) != 0 // Although we can reach it, name's destination doesn't actually exist. BTW is st/sb too confusing ? I took examples elsewhere in the code, I can rename them if it's easier to read. > > +test_expect_success 'diff --no-index can diff piped subshells' ' > > + echo 1 >non/git/c && > > + test_expect_code 0 git diff --no-index non/git/b <(cat non/git/c) && > > + test_expect_code 0 git diff --no-index <(cat non/git/b) non/git/c && > > + test_expect_code 0 git diff --no-index <(cat non/git/b) <(cat non/git/c) && > > + test_expect_code 0 cat non/git/b | git diff --no-index - non/git/c && > > + test_expect_code 0 cat non/git/c | git diff --no-index non/git/b - && > > + test_expect_code 0 cat non/git/b | git diff --no-index - <(cat non/git/c) && > > + test_expect_code 0 cat non/git/c | git diff --no-index <(cat non/git/b) - > > +' > > Indeed this test fails (Git thinks that the HERE-DOC is broken, but I > suspect it's just getting confused by the '<()'). This test (like almost > all other tests in Git) use /bin/sh as its shebang. Does your /bin/sh > actually point to bash? > > If you did want to test something like this, you'd need to source > t/lib-bash.sh instead of t/test-lib.sh. Thanks for the tip - indeed I think I ran the testsuite directly with back, but the make test failed. > Unrelated to the above comment, but there are a few small style nits > that I notice: > > - There is no need to run with 'test_expect_code 0' since the test is > marked as 'test_expect_success' and the commands are all in an '&&' > chain. (This does appear to be common style for others in t4053, so > you may just be matching it--which is fine--but an additional > clean-up on top to modernize would be appreciated, too). > > - The cat pipe is unnecessary, and is also violating a rule that we > don't place 'git' on the right-hand side of a pipe (can you redirect > the file at the end instead?). Cleanup, no pipelines (I read too fast / assumed last command was ok) - will do! > Documentation/CodingGuidelines is a great place to look if you are ever > curious about whether something is in good style. > > > +test_expect_success 'diff --no-index finds diff in piped subshells' ' > > + ( > > + set -- <(cat /dev/null) <(cat /dev/null) Precautions/portability. The file names are somewhat dynamic (at least the fd part...) this is to be sure I capture the names of the pipes that will be used (assuming the fd's will be reallocated in the same order which I think is fairly safe). An alternative is to sed "actual" to remove known variables, but then I hope it would be reliable (and can I use sed -r?). IIRC earlier versions of bash or on some systems a temp file could be used for these - although it defeats the purpose it's not a reason to fail.... I cannot develop this on other systems but I tested the pipe names on Windows and Sunos, and also using ksh and zsh on Linux (zsh uses /proc directly, kss uses lower fd's which means it can easily clash with scripts if you don't use named fd's, but not our problem....) Thanks, Thomas
On Fri, Sep 18, 2020 at 12:34:48PM -0400, Thomas Guyot-Sionnest wrote: > Hi Taylor, > > On Fri, 18 Sep 2020 at 10:36, Taylor Blau <me@ttaylorr.com> wrote: > > On Fri, Sep 18, 2020 at 07:32:56AM -0400, Thomas Guyot-Sionnest wrote: > > > A very handy way to pass data to applications is to use the <() process > > > substitution syntax in bash variants. It allow comparing files streamed > > > from a remote server or doing on-the-fly stream processing to alter the > > > diff. These are usually implemented as a symlink that points to a bogus > > > name (ex "pipe:[209326419]") but opens as a pipe. > > > > This is true in bash, but sh does not support process substitution with > > <(). > > Bash, ksh, zsh and likely any more moden shell. Other programming > languages also setup such pipes. It's much cleaner than creating temp > files and cleaning them up and in some cases faster too (I've ran > diff's like this over GB's of test data, it's very handy to remove > known patterns that would cause needless diffs). Yeah, it's definitely a reasonable thing to want (see below). And from a portability perspective, it is outside of Git's scope; users with those shells can use the feature, and people on other shells don't have to care. But we do have to account for this in the test suite, which must be able to run under a vanilla POSIX shell. So you'd probably want to set up a prerequisite that lets us skip these tests on other shells, like: test_lazy_prereq PROCESS_SUBSTITUTION ' echo foo >expect && cat >actual <(echo foo) && test_cmp expect actual ' test_expect_success PROCESS_SUBSTITUTION 'some test...' ' # safe because we skip this test on shells that do not support it git diff --no-index <(cat whatever) ' Though it is a little sad that people running the suite with a vanilla /bin/sh like dash wouldn't ever run the tests. I wonder if there's a more portable way to formulate it. Getting back to the overall feature, this is definitely something that has come up before. The last I know of is: https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/ which everybody seemed to like the direction of; I suspect the original author (cc'd) just never got around to it again. Compared to this approach, it uses a command-line option to avoid dereferencing symlinks. That puts an extra burden on the caller to pass the option, but it's way less magical; you could drop all of the "does this look like a symlink to a pipe" heuristics. It would also be much easier to test. ;) -Peff
On Fri, Sep 18, 2020 at 10:36:47AM -0400, Taylor Blau wrote: > - The cat pipe is unnecessary, and is also violating a rule that we > don't place 'git' on the right-hand side of a pipe (can you redirect > the file at the end instead?). What's wrong with git on the right-hand side of a pipe? On the left-hand side we lose its exit code, which is bad. But on the right hand side, we are only losing the exit code of "cat", which we don't care about. (Though I agree that "cat" is pointless here; we could just be redirecting from a file). -Peff
On Fri, Sep 18, 2020 at 01:19:50PM -0400, Jeff King wrote: > Getting back to the overall feature, this is definitely something that > has come up before. The last I know of is: > > https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/ > > which everybody seemed to like the direction of; I suspect the original > author (cc'd) just never got around to it again. Compared to this > approach, it uses a command-line option to avoid dereferencing symlinks. > That puts an extra burden on the caller to pass the option, but it's way > less magical; you could drop all of the "does this look like a symlink > to a pipe" heuristics. It would also be much easier to test. ;) Of course I forgot to add the cc. +cc brian.
Hi Jeff, On Fri, 18 Sep 2020 at 13:19, Jeff King <peff@peff.net> wrote: > On Fri, Sep 18, 2020 at 12:34:48PM -0400, Thomas Guyot-Sionnest wrote: > But we do have to account for this in the test suite, which must be able > to run under a vanilla POSIX shell. So you'd probably want to set up a > prerequisite that lets us skip these tests on other shells, like: Indeed, the bash test library that was suggested earlier may be better as it exec() bash rather than skipping tests. Testing as a prereq works, another approach which may be harder to swallow but allow testing when default shell is /bin/sh is to run each git command through bash - could be coupled with a dep if bash isn't installed at all. > Though it is a little sad that people running the suite with a vanilla > /bin/sh like dash wouldn't ever run the tests. I wonder if there's a > more portable way to formulate it. Debian defaults to dash, a minimalistic and afaik POSIX-compliant shell. > Getting back to the overall feature, this is definitely something that > has come up before. The last I know of is: > > https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/ > > which everybody seemed to like the direction of; I suspect the original > author (cc'd) just never got around to it again. Compared to this > approach, it uses a command-line option to avoid dereferencing symlinks. > That puts an extra burden on the caller to pass the option, but it's way > less magical; you could drop all of the "does this look like a symlink > to a pipe" heuristics. It would also be much easier to test. ;) Thanks for the info. Another consideration is how other commands - diff, vim, sed, curl, etc can take input the same way, so being able to swap-in git is a plus imho,and one less switch to learn (or even learn about, I never looked for one for this issue). Regards, Thomas
Jeff King <peff@peff.net> writes: > Getting back to the overall feature, this is definitely something that > has come up before. The last I know of is: > > https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/ > > which everybody seemed to like the direction of; I suspect the original > author (cc'd) just never got around to it again. Compared to this > approach, it uses a command-line option to avoid dereferencing symlinks. > That puts an extra burden on the caller to pass the option, but it's way > less magical; you could drop all of the "does this look like a symlink > to a pipe" heuristics. It would also be much easier to test. ;) Yes, I do remember liking the approach very much and wanted to take it once the "do not dereference symlinks everywhere---just limit it to what was given from the command line" was done. To be quite honest, I think "git diff --no-index A B" should unconditionally dereference A and/or B if they are symlinks, whether they are symlinks to pipes, regular files or directories, and otherwise treat symlinks in A and B the same way as "git diff" if A and B are directories. But that is a design guideline that becomes needed only after we start resurrecting Brian's effort, not with these patches that started this thread. Thanks.
Hi Thomas, On Fri, Sep 18, 2020 at 12:34:48PM -0400, Thomas Guyot-Sionnest wrote: > Hi Taylor, > > On Fri, 18 Sep 2020 at 10:36, Taylor Blau <me@ttaylorr.com> wrote: > > On Fri, Sep 18, 2020 at 07:32:56AM -0400, Thomas Guyot-Sionnest wrote: > > > A very handy way to pass data to applications is to use the <() process > > > substitution syntax in bash variants. It allow comparing files streamed > > > from a remote server or doing on-the-fly stream processing to alter the > > > diff. These are usually implemented as a symlink that points to a bogus > > > name (ex "pipe:[209326419]") but opens as a pipe. > > > > This is true in bash, but sh does not support process substitution with > > <(). > > Bash, ksh, zsh and likely any more moden shell. Other programming > languages also setup such pipes. It's much cleaner than creating temp > files and cleaning them up and in some cases faster too (I've ran > diff's like this over GB's of test data, it's very handy to remove > known patterns that would cause needless diffs). Oh, yes, I definitely agree that it will be useful for callers who use more modern shells. I was making sure that we would still be able to run this in the test suite (for us, that means making a new file that sources lib-bash and tests only in environments where bash is supported). > Now you mention it, maybe I could do that stat first, rule this out > from the beginning... less work for the general case. > > *untested*: > > if (!lstat(name, &st)) { > if (!S_ISLNK(st.st_mode)) > return(0); > if (!stat(name, &st)) { > if (!S_ISFIFO(st.st_mode)) > return(0); I still don't think I quite understand why FIFOs aren't allowed... > > /* We have a symlink that points to a pipe. If it's resolved > * target doesn't really exist we can safely assume it's a > * special file and use it */ > struct strbuf sb = STRBUF_INIT; > strbuf_realpath(&sb, name, 0); > /* We're abusing strbuf_realpath here, it may append > * pipe:[NNNNNNNNN] to an abs path */ > if (stat(sb.buf, &st)) > return(1); /* stat failed, special one */ By the time that I got to this point I think that what you wrote is much easier to follow. Thanks. > } > } > return(0); > > TL;DR - the conditions we need: > > - lstat(name) == 0 // name exists > - islink(lstat(name)) // name is a symlink > - stat(name) == 0 // target of name is reachable > - isfifo(stat(name)) // Target of name is a fifo > - stat(realpath(readlink(name))) != 0 // Although we can reach it, > name's destination doesn't actually exist. > > BTW is st/sb too confusing ? I took examples elsewhere in the code, I > can rename them if it's easier to read. No, it's fine. I think anecdotally I read 'struct strbuf buf' more than I read 'struct strbuf sb', but I guess that's just the areas of code that I happen to frequent, since some quick grepping around shows 462 uses of 'sb' followed by 425 uses of 'buf' (the next most common names are 'err' and 'out', with 191 and 121 uses, respectively). > > > +test_expect_success 'diff --no-index finds diff in piped subshells' ' > > > + ( > > > + set -- <(cat /dev/null) <(cat /dev/null) > > Precautions/portability. The file names are somewhat dynamic (at least > the fd part...) this is to be sure I capture the names of the pipes > that will be used (assuming the fd's will be reallocated in the same > order which I think is fairly safe). An alternative is to sed "actual" > to remove known variables, but then I hope it would be reliable (and > can I use sed -r?). IIRC earlier versions of bash or on some systems a > temp file could be used for these - although it defeats the purpose > it's not a reason to fail.... OK. > I cannot develop this on other systems but I tested the pipe names on > Windows and Sunos, and also using ksh and zsh on Linux (zsh uses /proc > directly, kss uses lower fd's which means it can easily clash with > scripts if you don't use named fd's, but not our problem....) > > Thanks, > > Thomas Thanks, Taylor
On Fri, Sep 18, 2020 at 01:20:44PM -0400, Jeff King wrote: > On Fri, Sep 18, 2020 at 10:36:47AM -0400, Taylor Blau wrote: > > > - The cat pipe is unnecessary, and is also violating a rule that we > > don't place 'git' on the right-hand side of a pipe (can you redirect > > the file at the end instead?). > > What's wrong with git on the right-hand side of a pipe? Ack, ignore me. The problem would be on the left-hand side only, without set -o pipefail, which we don't do. > On the left-hand side we lose its exit code, which is bad. But on the > right hand side, we are only losing the exit code of "cat", which we > don't care about. > > (Though I agree that "cat" is pointless here; we could just be > redirecting from a file). Yep, but an easy mistake to make nonetheless. > -Peff Thanks, Taylor
On Fri, Sep 18, 2020 at 10:48:41AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Getting back to the overall feature, this is definitely something that > > has come up before. The last I know of is: > > > > https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/ > > > > which everybody seemed to like the direction of; I suspect the original > > author (cc'd) just never got around to it again. Compared to this > > approach, it uses a command-line option to avoid dereferencing symlinks. > > That puts an extra burden on the caller to pass the option, but it's way > > less magical; you could drop all of the "does this look like a symlink > > to a pipe" heuristics. It would also be much easier to test. ;) > > Yes, I do remember liking the approach very much and wanted to take > it once the "do not dereference symlinks everywhere---just limit it > to what was given from the command line" was done. > > To be quite honest, I think "git diff --no-index A B" should > unconditionally dereference A and/or B if they are symlinks, whether > they are symlinks to pipes, regular files or directories, and > otherwise treat symlinks in A and B the same way as "git diff" if A > and B are directories. But that is a design guideline that becomes > needed only after we start resurrecting Brian's effort, not with > these patches that started this thread. Yeah, I think I'd be fine with that approach, too. It makes "git diff --no-index" more like other tools out of the box. And if we took brian's patch first, then we'd just be flipping its default, and the option it adds would give an easy escape hatch for somebody who really wants to diff two maybe-symlinks. -Peff
On Fri, Sep 18, 2020 at 01:58:36PM -0400, Taylor Blau wrote: > > *untested*: > > > > if (!lstat(name, &st)) { > > if (!S_ISLNK(st.st_mode)) > > return(0); > > if (!stat(name, &st)) { > > if (!S_ISFIFO(st.st_mode)) > > return(0); > > I still don't think I quite understand why FIFOs aren't allowed... It's the other way around. Non-fifos return "0" from this "is it a pipe" function. I think it is to prevent a false positive with: rm bar ln -s foo bar git diff --no-index foo something-else We'd still want to treat "foo" as a symlink there. I.e., the heuristic for guessing it's a process substitution pipe is: - it's a symlink that doesn't point anywhere - it's also a fifo -Peff
On 2020-09-18 at 11:32:56, Thomas Guyot-Sionnest wrote: > diff --git a/diff-no-index.c b/diff-no-index.c > index 7814eabfe0..779c686d23 100644 > --- a/diff-no-index.c > +++ b/diff-no-index.c > @@ -41,6 +41,33 @@ static int read_directory_contents(const char *path, struct string_list *list) > */ > static const char file_from_standard_input[] = "-"; > > +/* Check that file is - (STDIN) or unnamed pipe - explicitly > + * avoid on-disk named pipes which could block > + */ > +static int ispipe(const char *name) > +{ > + struct stat st; > + > + if (name == file_from_standard_input) > + return 1; /* STDIN */ > + > + if (!lstat(name, &st)) { > + if (S_ISLNK(st.st_mode)) { > + /* symlink - read it and check it doesn't exists > + * as a file yet link to a pipe */ > + struct strbuf sb = STRBUF_INIT; > + strbuf_realpath(&sb, name, 0); > + /* We're abusing strbuf_realpath here, it may append > + * pipe:[NNNNNNNNN] to an abs path */ > + if (!stat(sb.buf, &st)) > + return 0; /* link target exists , not pipe */ > + if (!stat(name, &st)) > + return S_ISFIFO(st.st_mode); This makes a lot of assumptions about the implementation which are specific to Linux, namely that an anonymous pipe will be a symlink to a FIFO. That's not the case on macOS, where the /dev/fd entries are actually named pipes themselves. Granted, in that case, Git just chokes and fails instead of diffing the symlink values, but I suspect you'll want this to work on macOS as well. I don't use macOS that often, but I would appreciate it if it worked when I did, and I'm sure others will as well. I think we can probably get away with just doing a regular stat and seeing if S_ISFIFO is true, which is both simpler and cheaper. > diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh > index 0168946b63..e49f773515 100755 > --- a/t/t4053-diff-no-index.sh > +++ b/t/t4053-diff-no-index.sh > @@ -144,4 +144,193 @@ test_expect_success 'diff --no-index allows external diff' ' > test_cmp expect actual > ' > > +test_expect_success 'diff --no-index can diff piped subshells' ' > + echo 1 >non/git/c && > + test_expect_code 0 git diff --no-index non/git/b <(cat non/git/c) && > + test_expect_code 0 git diff --no-index <(cat non/git/b) non/git/c && > + test_expect_code 0 git diff --no-index <(cat non/git/b) <(cat non/git/c) && > + test_expect_code 0 cat non/git/b | git diff --no-index - non/git/c && > + test_expect_code 0 cat non/git/c | git diff --no-index non/git/b - && > + test_expect_code 0 cat non/git/b | git diff --no-index - <(cat non/git/c) && > + test_expect_code 0 cat non/git/c | git diff --no-index <(cat non/git/b) - > +' As mentioned by others, this requires non-POSIX syntax. /bin/sh on my Debian system is dash, which doesn't support this. You can either use a prerequisite, or just test by piping from standard input and assume that Git can handle the rest. I would recommend at least adding some POSIX testcases that use only a pipe from standard input to avoid regressing this behavior on Debian and Ubuntu.
On 2020-09-18 14:02, Jeff King wrote: > On Fri, Sep 18, 2020 at 10:48:41AM -0700, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >>> Getting back to the overall feature, this is definitely something that >>> has come up before. The last I know of is: >>> >>> https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/ >>> >>> which everybody seemed to like the direction of; I suspect the original >>> author (cc'd) just never got around to it again. Compared to this >>> approach, it uses a command-line option to avoid dereferencing symlinks. >>> That puts an extra burden on the caller to pass the option, but it's way >>> less magical; you could drop all of the "does this look like a symlink >>> to a pipe" heuristics. It would also be much easier to test. ;) >> >> Yes, I do remember liking the approach very much and wanted to take >> it once the "do not dereference symlinks everywhere---just limit it >> to what was given from the command line" was done. >> >> To be quite honest, I think "git diff --no-index A B" should >> unconditionally dereference A and/or B if they are symlinks, whether >> they are symlinks to pipes, regular files or directories, and >> otherwise treat symlinks in A and B the same way as "git diff" if A >> and B are directories. But that is a design guideline that becomes >> needed only after we start resurrecting Brian's effort, not with >> these patches that started this thread. > > Yeah, I think I'd be fine with that approach, too. It makes "git diff > --no-index" more like other tools out of the box. And if we took brian's > patch first, then we'd just be flipping its default, and the option it > adds would give an easy escape hatch for somebody who really wants to > diff two maybe-symlinks. Considering the issue with MacOS I'm starting to think the best solution is to not use any heuristic and read passed-in files directly. That said, I don't think it makes much change either way (if I resurrect Brian's patch is will probably end up being a hybrid between the two as both read the pipe at the same place and my approach was simpler further down). I'm not sure which way I prefer to start first - will you accept a patch that reads passed in files as-is if I I start with this one? In the mean time I will submit the first patch fixed) as a standalone one. Regards, Thomas
On Sun, Sep 20, 2020 at 08:54:53AM -0400, Thomas Guyot wrote: > Considering the issue with MacOS I'm starting to think the best solution > is to not use any heuristic and read passed-in files directly. That > said, I don't think it makes much change either way (if I resurrect > Brian's patch is will probably end up being a hybrid between the two as > both read the pipe at the same place and my approach was simpler further > down). > > I'm not sure which way I prefer to start first - will you accept a patch > that reads passed in files as-is if I I start with this one? I think the ideal is: - implement a command-line option to read the content of paths on the command-line literally (i.e., reading from pipes, dereferencing symlinks, etc) - make sure we have the inverse option (which you should get for free in step 1 if you use parse_options) - flip the default to do literal reads We'd sometimes wait several versions before that last step to give people time to adjust scripts, etc. But in this case, I suspect it would be OK to just flip it immediately. We don't consider "git diff" itself part of the stable plumbing, and the --no-index part of it I would consider even less stable. And AFAICT most people consider the current behavior a bug because it doesn't behave like other diff tools. -Peff
Jeff King <peff@peff.net> writes: > We'd sometimes wait several versions before that last step to give > people time to adjust scripts, etc. But in this case, I suspect it would > be OK to just flip it immediately. We don't consider "git diff" itself > part of the stable plumbing, and the --no-index part of it I would > consider even less stable. And AFAICT most people consider the current > behavior a bug because it doesn't behave like other diff tools. The "git diff" proper gets no filenames from the command line and the above strictly applies only to the no-index mode, with or without the explicit "--no-index" option. It was a way to give Git niceties like colored diffs, renames, etc. to non-Git managed two sets of paths, and the primary reason why we have it as a mode of "git diff" is because we chose not to bother with interacting with upstream maintainers of "diff". In an ideal world, "GNU diff" and others would have learned things like renames, word diffs, etc., instead of "git diff" adding "--no-index" mode. And that makes me agree that users expect "git diff --no-index" to behave like other people's diff and that is more important than behaving like "git diff" for that mode.
diff --git a/diff-no-index.c b/diff-no-index.c index 7814eabfe0..779c686d23 100644 --- a/diff-no-index.c +++ b/diff-no-index.c @@ -41,6 +41,33 @@ static int read_directory_contents(const char *path, struct string_list *list) */ static const char file_from_standard_input[] = "-"; +/* Check that file is - (STDIN) or unnamed pipe - explicitly + * avoid on-disk named pipes which could block + */ +static int ispipe(const char *name) +{ + struct stat st; + + if (name == file_from_standard_input) + return 1; /* STDIN */ + + if (!lstat(name, &st)) { + if (S_ISLNK(st.st_mode)) { + /* symlink - read it and check it doesn't exists + * as a file yet link to a pipe */ + struct strbuf sb = STRBUF_INIT; + strbuf_realpath(&sb, name, 0); + /* We're abusing strbuf_realpath here, it may append + * pipe:[NNNNNNNNN] to an abs path */ + if (!stat(sb.buf, &st)) + return 0; /* link target exists , not pipe */ + if (!stat(name, &st)) + return S_ISFIFO(st.st_mode); + } + } + return 0; +} + static int get_mode(const char *path, int *mode) { struct stat st; @@ -51,7 +78,7 @@ static int get_mode(const char *path, int *mode) else if (!strcasecmp(path, "nul")) *mode = 0; #endif - else if (path == file_from_standard_input) + else if (ispipe(path)) *mode = create_ce_mode(0666); else if (lstat(path, &st)) return error("Could not access '%s'", path); @@ -60,13 +87,13 @@ static int get_mode(const char *path, int *mode) return 0; } -static int populate_from_stdin(struct diff_filespec *s) +static int populate_from_fd(struct diff_filespec *s, int fd) { struct strbuf buf = STRBUF_INIT; size_t size = 0; - if (strbuf_read(&buf, 0, 0) < 0) - return error_errno("error while reading from stdin"); + if (strbuf_read(&buf, fd, 0) < 0) + return error_errno(_("error while reading from fd %i"), fd); s->should_munmap = 0; s->data = strbuf_detach(&buf, &size); @@ -76,6 +103,20 @@ static int populate_from_stdin(struct diff_filespec *s) return 0; } +static int populate_from_pipe(struct diff_filespec *s, const char *name) +{ + int ret; + FILE *f; + + f = fopen(name, "r"); + if (!f) + return error_errno(_("cannot open %s"), name); + + ret = populate_from_fd(s, fileno(f)); + fclose(f); + return ret; +} + static struct diff_filespec *noindex_filespec(const char *name, int mode) { struct diff_filespec *s; @@ -85,7 +126,9 @@ static struct diff_filespec *noindex_filespec(const char *name, int mode) s = alloc_filespec(name); fill_filespec(s, &null_oid, 0, mode); if (name == file_from_standard_input) - populate_from_stdin(s); + populate_from_fd(s, 0); + else if (ispipe(name)) + populate_from_pipe(s, name); return s; } @@ -218,8 +261,7 @@ static void fixup_paths(const char **path, struct strbuf *replacement) { unsigned int isdir0, isdir1; - if (path[0] == file_from_standard_input || - path[1] == file_from_standard_input) + if (ispipe(path[0]) || ispipe(path[1])) return; isdir0 = is_directory(path[0]); isdir1 = is_directory(path[1]); diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh index 0168946b63..e49f773515 100755 --- a/t/t4053-diff-no-index.sh +++ b/t/t4053-diff-no-index.sh @@ -144,4 +144,193 @@ test_expect_success 'diff --no-index allows external diff' ' test_cmp expect actual ' +test_expect_success 'diff --no-index can diff piped subshells' ' + echo 1 >non/git/c && + test_expect_code 0 git diff --no-index non/git/b <(cat non/git/c) && + test_expect_code 0 git diff --no-index <(cat non/git/b) non/git/c && + test_expect_code 0 git diff --no-index <(cat non/git/b) <(cat non/git/c) && + test_expect_code 0 cat non/git/b | git diff --no-index - non/git/c && + test_expect_code 0 cat non/git/c | git diff --no-index non/git/b - && + test_expect_code 0 cat non/git/b | git diff --no-index - <(cat non/git/c) && + test_expect_code 0 cat non/git/c | git diff --no-index <(cat non/git/b) - +' + +test_expect_success 'diff --no-index finds diff in piped subshells' ' + ( + set -- <(cat /dev/null) <(cat /dev/null) + cat <<-EOF >expect + diff --git a$1 b$2 + --- a$1 + +++ b$2 + @@ -1 +1 @@ + -1 + +2 + EOF + ) && + test_expect_code 1 \ + git diff --no-index <(cat non/git/b) <(sed s/1/2/ non/git/c) >actual && + test_cmp expect actual +' + +test_expect_success 'diff --no-index with stat and numstat' ' + ( + set -- <(cat /dev/null) <(cat /dev/null) + min=$((${#1} < ${#2} ? ${#1} : ${#2})) + for ((i=0; i<min; i++)); do [ "${1:i:1}" = "${2:i:1}" ] || break; done + base=${1:0:i-1} + cat <<-EOF >expect1 + $base{${1#$base} => ${2#$base}} | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + EOF + cat <<-EOF >expect2 + 1 1 $base{${1#$base} => ${2#$base}} + EOF + ) && + test_expect_code 1 \ + git diff --no-index --stat <(cat non/git/a) <(sed s/1/2/ non/git/b) >actual && + test_cmp expect1 actual && + test_expect_code 1 \ + git diff --no-index --numstat <(cat non/git/a) <(sed s/1/2/ non/git/b) >actual && + test_cmp expect2 actual +' + +test_expect_success PIPE 'diff --no-index on filesystem pipes' ' + ( + cd non/git && + mkdir f g && + mkfifo f/1 g/1 && + test_expect_code 128 git diff --no-index f g && + test_expect_code 128 git diff --no-index ../../a f && + test_expect_code 128 git diff --no-index g ../../a && + test_expect_code 128 git diff --no-index f/1 g/1 && + test_expect_code 128 git diff --no-index f/1 ../../a/1 && + test_expect_code 128 git diff --no-index ../../a/1 g/1 + ) +' + +test_expect_success PIPE 'diff --no-index reads symlinks to named pipes as symlinks' ' + ( + cd non/git && + mkdir h i && + ln -s ../f/1 h/1 && + ln -s ../g/1 i/1 && + test_expect_code 1 git diff --no-index h i >actual && + cat <<-EOF >expect && + diff --git a/h/1 b/i/1 + index d0b5850..d8b9c34 120000 + --- a/h/1 + +++ b/i/1 + @@ -1 +1 @@ + -../f/1 + \ No newline at end of file + +../g/1 + \ No newline at end of file + EOF + test_cmp expect actual && + test_expect_code 1 git diff --no-index ../../a h >actual && + cat <<-EOF >expect && + diff --git a/../../a/1 b/../../a/1 + deleted file mode 100644 + index d00491f..0000000 + --- a/../../a/1 + +++ /dev/null + @@ -1 +0,0 @@ + -1 + diff --git a/h/1 b/h/1 + new file mode 120000 + index 0000000..d0b5850 + --- /dev/null + +++ b/h/1 + @@ -0,0 +1 @@ + +../f/1 + \ No newline at end of file + diff --git a/../../a/2 b/../../a/2 + deleted file mode 100644 + index 0cfbf08..0000000 + --- a/../../a/2 + +++ /dev/null + @@ -1 +0,0 @@ + -2 + EOF + test_cmp expect actual && + test_expect_code 1 git diff --no-index i ../../a >actual && + cat <<-EOF >expect && + diff --git a/i/1 b/i/1 + deleted file mode 120000 + index d8b9c34..0000000 + --- a/i/1 + +++ /dev/null + @@ -1 +0,0 @@ + -../g/1 + \ No newline at end of file + diff --git a/../../a/1 b/../../a/1 + new file mode 100644 + index 0000000..d00491f + --- /dev/null + +++ b/../../a/1 + @@ -0,0 +1 @@ + +1 + diff --git a/../../a/2 b/../../a/2 + new file mode 100644 + index 0000000..0cfbf08 + --- /dev/null + +++ b/../../a/2 + @@ -0,0 +1 @@ + +2 + EOF + test_cmp expect actual && + test_expect_code 1 git diff --no-index h/1 i/1 >actual && + cat <<-EOF >expect && + diff --git a/h/1 b/i/1 + index d0b5850..d8b9c34 120000 + --- a/h/1 + +++ b/i/1 + @@ -1 +1 @@ + -../f/1 + \ No newline at end of file + +../g/1 + \ No newline at end of file + EOF + test_cmp expect actual && + test_expect_code 1 git diff --no-index h/1 ../../a/1 >actual && + cat <<-EOF >expect && + diff --git a/h/1 b/h/1 + deleted file mode 120000 + index d0b5850..0000000 + --- a/h/1 + +++ /dev/null + @@ -1 +0,0 @@ + -../f/1 + \ No newline at end of file + diff --git a/../../a/1 b/../../a/1 + new file mode 100644 + index 0000000..d00491f + --- /dev/null + +++ b/../../a/1 + @@ -0,0 +1 @@ + +1 + EOF + test_cmp expect actual && + test_expect_code 1 git diff --no-index ../../a/1 i/1 >actual && + cat <<-EOF >expect && + diff --git a/../../a/1 b/../../a/1 + deleted file mode 100644 + index d00491f..0000000 + --- a/../../a/1 + +++ /dev/null + @@ -1 +0,0 @@ + -1 + diff --git a/i/1 b/i/1 + new file mode 120000 + index 0000000..d8b9c34 + --- /dev/null + +++ b/i/1 + @@ -0,0 +1 @@ + +../g/1 + \ No newline at end of file + EOF + test_cmp expect actual + ) +' + test_done
A very handy way to pass data to applications is to use the <() process substitution syntax in bash variants. It allow comparing files streamed from a remote server or doing on-the-fly stream processing to alter the diff. These are usually implemented as a symlink that points to a bogus name (ex "pipe:[209326419]") but opens as a pipe. Git normally tracks symlinks targets. This patch makes it detect such pipes in --no-index mode and read the file normally like it would do for stdin ("-"), so they can be compared directly. Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com> --- diff-no-index.c | 56 ++++++++++-- t/t4053-diff-no-index.sh | 189 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 238 insertions(+), 7 deletions(-)