Message ID | 990e71882bfdc697285c5b04b92c290679ca22ab.1687874975.git.phillip.wood@dunelm.org.uk (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | diff --no-index: support reading from named pipes | expand |
Phillip Wood <phillip.wood123@gmail.com> writes: > From: Phillip Wood <phillip.wood@dunelm.org.uk> > > In some shells, such as bash and zsh, it's possible to use a command > substitution to provide the output of a command as a file argument to > another process, like so: > > diff -u <(printf "a\nb\n") <(printf "a\nc\n") > > However, this syntax does not produce useful results with "git diff > --no-index". On macOS, the arguments to the command are named pipes > under /dev/fd, and git diff doesn't know how to handle a named pipe. On > Linux, the arguments are symlinks to pipes, so git diff "helpfully" > diffs these symlinks, comparing their targets like "pipe:[1234]" and > "pipe:[5678]". > > To address this "diff --no-index" is changed so that if a path given on > the commandline is a named pipe or a symbolic link that resolves to a > named pipe then we read the data to diff from that pipe. This is > implemented by generalizing the code that already exists to handle > reading from stdin when the user passes the path "-". > > As process substitution is not support by POSIX this change is tested by > using a pipe and a symbolic link to a pipe. > > Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> > --- > diff-no-index.c | 80 ++++++++++++++++++++++++---------------- > t/t4053-diff-no-index.sh | 25 +++++++++++++ > 2 files changed, 73 insertions(+), 32 deletions(-) This looks good, if a bit invasive, to a cursory read, at least to me. It is very focused to the real problem at hand, and shows that the way we split the "no-index" mode out to its own implementation of filespec population code does make sense. > -static void populate_from_stdin(struct diff_filespec *s) > +static void populate_from_pipe(struct diff_filespec *s, int is_stdin) > { > struct strbuf buf = STRBUF_INIT; > size_t size = 0; > + int fd = 0; > > - if (strbuf_read(&buf, 0, 0) < 0) > + if (!is_stdin) > + fd = xopen(s->path, O_RDONLY); > + if (strbuf_read(&buf, fd, 0) < 0) > die_errno("error while reading from stdin"); > + if (!is_stdin) > + close(fd); Given that the error message explicitly says "stdin", and there are many "if ([!]is_stdin)" sprinkled in the code, I actually suspect that there should be two separate helpers, one for stdin and one for non-stdin pipe. It is especially true since there is only one caller that does this: > + if (is_pipe) > + populate_from_pipe(s, name == file_from_standard_input); which can be if (is_pipe) { if (name == file_from_standard_input) populate_from_stdin(s); else populate_from_pipe(s); } without losing clarity. The code that you are sharing by forcing them to be a single helper to wrap up a handful of members in the s structure can become its own helper that is called from these two helper functions. > static int queue_diff(struct diff_options *o, > - const char *name1, const char *name2) > + const char *name1, int is_pipe1, > + const char *name2, int is_pipe2) > { > int mode1 = 0, mode2 = 0; > > - if (get_mode(name1, &mode1) || get_mode(name2, &mode2)) > + if (get_mode(name1, is_pipe1, &mode1) || > + get_mode(name2, is_pipe2, &mode2)) > return -1; Makes me wonder why the caller of queue_diff() even needs to know if these two names are pipes; we are calling get_mode() which would run stat(2) anyway, and the result from stat(2) is what you use (in the caller) to determine the values of is_pipeN. Wouldn't it be more appropriate to leave the caller oblivious of special casing of the pipes and let get_mode() handle this? After all, that is how the existing code special cases the standard input so there is a strong precedence. If we go that route, it may make sense to further isolate the "address comparison" trick used for the standard input mode. Perhaps we can and do something like static int get_mode(const char *path, int *mode, int *special) { struct stat st; + *special = 0; /* default - nothing special */ ... else if (path == file_from_standard_input) { *mode = create_ce_mode(0666); + *pipe_kind = 1; /* STDIN */ + } else if (stat(path, &st)) { + ... error ... + } else if (S_ISFIFO(st.st_mode)) { + *mode = create_ce_mode(0666); + *pipe_kind = 2; /* FIFO */ } else if (lstat(path, &st)) { ... error ... } else { *mode = st.st_mode; } and have the caller act on "special" to choose among calling populate_from_stdin(), populate_from_pipe(), or do nothing for the regular files? Side note: this has an added benefit of highlighting that we do stat() and lstat() because of dereferencing. What I suspect is that "git diff --no-index" mode was primarily to give Git niceties like rename detection and diff algorithms to those who wanted to use in contexts (i.e. contents not tracked by Git) they use "diff" by other people like GNU, without bothering to update "diff" by other people. I further suspect that "compare the readlink contents", which is very much necessary within the Git context, may not fall into the "Git niceties" when they invoke "--no-index" mode. Which leads me to imagine a future direction where we only use stat() and not lstat() in the "--no-index" codepath. Having everything including these lstat() and stat() calls inside get_mode() will allow such a future transition hopefully simpler. I do not quite see why you decided to move the "is_dir" processing up and made the caller responsible. Specifically, > - fixup_paths(paths, &replacement); > + if (!is_pipe[0] && !is_pipe[1]) > + fixup_paths(paths, is_dir, &replacement); this seems fishy when one side is pipe and the other one is not. When the user says $ git diff --no-index <(command) path fixup_paths() are bypassed because one of them is pipe. It makes me suspect that it should be an error if "path" is a directory. I do not know if fixup_paths() is the best place for doing such checking, but somebody should be doing that, no?
Hi Junio On 27/06/2023 20:44, Junio C Hamano wrote: > Phillip Wood <phillip.wood123@gmail.com> writes: > >> From: Phillip Wood <phillip.wood@dunelm.org.uk> >> >> In some shells, such as bash and zsh, it's possible to use a command >> substitution to provide the output of a command as a file argument to >> another process, like so: >> >> diff -u <(printf "a\nb\n") <(printf "a\nc\n") >> >> However, this syntax does not produce useful results with "git diff >> --no-index". On macOS, the arguments to the command are named pipes >> under /dev/fd, and git diff doesn't know how to handle a named pipe. On >> Linux, the arguments are symlinks to pipes, so git diff "helpfully" >> diffs these symlinks, comparing their targets like "pipe:[1234]" and >> "pipe:[5678]". >> >> To address this "diff --no-index" is changed so that if a path given on >> the commandline is a named pipe or a symbolic link that resolves to a >> named pipe then we read the data to diff from that pipe. This is >> implemented by generalizing the code that already exists to handle >> reading from stdin when the user passes the path "-". >> >> As process substitution is not support by POSIX this change is tested by >> using a pipe and a symbolic link to a pipe. >> >> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> >> --- >> diff-no-index.c | 80 ++++++++++++++++++++++++---------------- >> t/t4053-diff-no-index.sh | 25 +++++++++++++ >> 2 files changed, 73 insertions(+), 32 deletions(-) > > This looks good, if a bit invasive, to a cursory read, at least to > me. It is very focused to the real problem at hand, and shows that > the way we split the "no-index" mode out to its own implementation > of filespec population code does make sense. Yes, it is more invasive than I'd like but I think that stems from needing to treat paths given on the commandline differently to the paths we find when recursing directories. >> -static void populate_from_stdin(struct diff_filespec *s) >> +static void populate_from_pipe(struct diff_filespec *s, int is_stdin) >> { >> struct strbuf buf = STRBUF_INIT; >> size_t size = 0; >> + int fd = 0; >> >> - if (strbuf_read(&buf, 0, 0) < 0) >> + if (!is_stdin) >> + fd = xopen(s->path, O_RDONLY); >> + if (strbuf_read(&buf, fd, 0) < 0) >> die_errno("error while reading from stdin"); >> + if (!is_stdin) >> + close(fd); > > Given that the error message explicitly says "stdin", and there are > many "if ([!]is_stdin)" sprinkled in the code, I actually suspect > that there should be two separate helpers, one for stdin and one for > non-stdin pipe. It is especially true since there is only one > caller that does this: > >> + if (is_pipe) >> + populate_from_pipe(s, name == file_from_standard_input); > > which can be > > if (is_pipe) { > if (name == file_from_standard_input) > populate_from_stdin(s); > else > populate_from_pipe(s); > } > > without losing clarity. The code that you are sharing by forcing > them to be a single helper to wrap up a handful of members in the s > structure can become its own helper that is called from these two > helper functions. Sure, and thanks for pointing out the error message, I'd overlooked that. >> static int queue_diff(struct diff_options *o, >> - const char *name1, const char *name2) >> + const char *name1, int is_pipe1, >> + const char *name2, int is_pipe2) >> { >> int mode1 = 0, mode2 = 0; >> >> - if (get_mode(name1, &mode1) || get_mode(name2, &mode2)) >> + if (get_mode(name1, is_pipe1, &mode1) || >> + get_mode(name2, is_pipe2, &mode2)) >> return -1; > > Makes me wonder why the caller of queue_diff() even needs to know if > these two names are pipes; we are calling get_mode() which would run > stat(2) anyway, and the result from stat(2) is what you use (in the > caller) to determine the values of is_pipeN. Wouldn't it be more > appropriate to leave the caller oblivious of special casing of the > pipes and let get_mode() handle this? After all, that is how the > existing code special cases the standard input so there is a strong > precedence. Maybe is_pipeN should be called force_pipeN. We only want to de-reference symbolic links to pipes for paths that are given on the command line, not when the the user asked to compare two directories. That means we need to pass some kind of flag around to say whether we're recursing or not. An earlier draft of this series had a recursing flag rather than is_pipeN like this -static int get_mode(const char *path, int *mode) +static int get_mode(const char *path, int *mode, int recursing) { struct stat st; if (!path || !strcmp(path, "/dev/null")) *mode = 0; #ifdef GIT_WINDOWS_NATIVE else if (!strcasecmp(path, "nul")) *mode = 0; #endif else if (path == file_from_standard_input) *mode = create_ce_mode(0666); else if (lstat(path, &st)) return error("Could not access '%s'", path); else *mode = st.st_mode; + + /* + * For paths given on the command line de-reference symbolic + * links that resolve to a fifo. + */ + if (!recursing && + S_ISLNK(*mode) && !stat(path, &st) && S_ISFIFO(st.st_mode)) + *mode = st.st_mode; + return 0; } I dropped it in favor of is_pipeN after I realized we need to check if the paths on the command line are pipes before calling fixup_paths(). I think we could use the "special" parameter you suggest below as a recursion indicator by setting it to NULL when recursing. > If we go that route, it may make sense to further isolate the > "address comparison" trick used for the standard input mode. > Perhaps we can and do something like > > static int get_mode(const char *path, int *mode, int *special) > { > struct stat st; > > + *special = 0; /* default - nothing special */ > ... > else if (path == file_from_standard_input) { > *mode = create_ce_mode(0666); > + *pipe_kind = 1; /* STDIN */ > + } else if (stat(path, &st)) { > + ... error ... > + } else if (S_ISFIFO(st.st_mode)) { > + *mode = create_ce_mode(0666); > + *pipe_kind = 2; /* FIFO */ > } else if (lstat(path, &st)) { > ... error ... > } else { > *mode = st.st_mode; > } > > and have the caller act on "special" to choose among calling > populate_from_stdin(), populate_from_pipe(), or do nothing for > the regular files? > > Side note: this has an added benefit of highlighting that we do > stat() and lstat() because of dereferencing. What I suspect is > that "git diff --no-index" mode was primarily to give Git > niceties like rename detection and diff algorithms to those who > wanted to use in contexts (i.e. contents not tracked by Git) > they use "diff" by other people like GNU, without bothering to > update "diff" by other people. I further suspect that "compare > the readlink contents", which is very much necessary within the > Git context, may not fall into the "Git niceties" when they > invoke "--no-index" mode. Which leads me to imagine a future > direction where we only use stat() and not lstat() in the > "--no-index" codepath. Having everything including these > lstat() and stat() calls inside get_mode() will allow such a > future transition hopefully simpler. > > I do not quite see why you decided to move the "is_dir" processing > up and made the caller responsible. To avoid a second stat() call in is_directory() after we've already called stat() to see if the path is a pipe. > Specifically, > >> - fixup_paths(paths, &replacement); >> + if (!is_pipe[0] && !is_pipe[1]) >> + fixup_paths(paths, is_dir, &replacement); > > this seems fishy when one side is pipe and the other one is not. > When the user says > > $ git diff --no-index <(command) path > > fixup_paths() are bypassed because one of them is pipe. It makes me > suspect that it should be an error if "path" is a directory. I do > not know if fixup_paths() is the best place for doing such checking, > but somebody should be doing that, no? I wasn't sure what was best to do in that case. In the end I went with making the behavior the same as "git diff --no-index -- - directory". I'm happy to make both an error. Thanks for your comments, I'll leave it a couple of days to see if there are any more comments and then re-roll. Phillip
diff --git a/diff-no-index.c b/diff-no-index.c index 7b9327f8f3..45fbd7cdbe 100644 --- a/diff-no-index.c +++ b/diff-no-index.c @@ -41,7 +41,7 @@ static int read_directory_contents(const char *path, struct string_list *list) */ static const char file_from_standard_input[] = "-"; -static int get_mode(const char *path, int *mode) +static int get_mode(const char *path, int is_pipe, int *mode) { struct stat st; @@ -51,7 +51,7 @@ static int get_mode(const char *path, int *mode) else if (!strcasecmp(path, "nul")) *mode = 0; #endif - else if (path == file_from_standard_input) + else if (is_pipe) *mode = create_ce_mode(0666); else if (lstat(path, &st)) return error("Could not access '%s'", path); @@ -60,13 +60,18 @@ static int get_mode(const char *path, int *mode) return 0; } -static void populate_from_stdin(struct diff_filespec *s) +static void populate_from_pipe(struct diff_filespec *s, int is_stdin) { struct strbuf buf = STRBUF_INIT; size_t size = 0; + int fd = 0; - if (strbuf_read(&buf, 0, 0) < 0) + if (!is_stdin) + fd = xopen(s->path, O_RDONLY); + if (strbuf_read(&buf, fd, 0) < 0) die_errno("error while reading from stdin"); + if (!is_stdin) + close(fd); s->should_munmap = 0; s->data = strbuf_detach(&buf, &size); @@ -75,40 +80,43 @@ static void populate_from_stdin(struct diff_filespec *s) s->is_stdin = 1; } -static struct diff_filespec *noindex_filespec(const char *name, int mode) +static struct diff_filespec *noindex_filespec(const char *name, int is_pipe, + int mode) { struct diff_filespec *s; if (!name) name = "/dev/null"; s = alloc_filespec(name); fill_filespec(s, null_oid(), 0, mode); - if (name == file_from_standard_input) - populate_from_stdin(s); + if (is_pipe) + populate_from_pipe(s, name == file_from_standard_input); return s; } static int queue_diff(struct diff_options *o, - const char *name1, const char *name2) + const char *name1, int is_pipe1, + const char *name2, int is_pipe2) { int mode1 = 0, mode2 = 0; - if (get_mode(name1, &mode1) || get_mode(name2, &mode2)) + if (get_mode(name1, is_pipe1, &mode1) || + get_mode(name2, is_pipe2, &mode2)) return -1; if (mode1 && mode2 && S_ISDIR(mode1) != S_ISDIR(mode2)) { struct diff_filespec *d1, *d2; if (S_ISDIR(mode1)) { /* 2 is file that is created */ - d1 = noindex_filespec(NULL, 0); - d2 = noindex_filespec(name2, mode2); + d1 = noindex_filespec(NULL, 0, 0); + d2 = noindex_filespec(name2, is_pipe2, mode2); name2 = NULL; mode2 = 0; } else { /* 1 is file that is deleted */ - d1 = noindex_filespec(name1, mode1); - d2 = noindex_filespec(NULL, 0); + d1 = noindex_filespec(name1, is_pipe1, mode1); + d2 = noindex_filespec(NULL, 0, 0); name1 = NULL; mode1 = 0; } @@ -173,7 +181,7 @@ static int queue_diff(struct diff_options *o, n2 = buffer2.buf; } - ret = queue_diff(o, n1, n2); + ret = queue_diff(o, n1, 0, n2, 0); } string_list_clear(&p1, 0); string_list_clear(&p2, 0); @@ -189,8 +197,8 @@ static int queue_diff(struct diff_options *o, SWAP(name1, name2); } - d1 = noindex_filespec(name1, mode1); - d2 = noindex_filespec(name2, mode2); + d1 = noindex_filespec(name1, is_pipe1, mode1); + d2 = noindex_filespec(name2, is_pipe2, mode2); diff_queue(&diff_queued_diff, d1, d2); return 0; } @@ -213,18 +221,11 @@ static void append_basename(struct strbuf *path, const char *dir, const char *fi * Note that we append the basename of F to D/, so "diff a/b/file D" * becomes "diff a/b/file D/file", not "diff a/b/file D/a/b/file". */ -static void fixup_paths(const char **path, struct strbuf *replacement) +static void fixup_paths(const char **path, int *is_dir, struct strbuf *replacement) { - unsigned int isdir0, isdir1; - - if (path[0] == file_from_standard_input || - path[1] == file_from_standard_input) - return; - isdir0 = is_directory(path[0]); - isdir1 = is_directory(path[1]); - if (isdir0 == isdir1) - return; - if (isdir0) { + if (is_dir[0] == is_dir[1]) + return; + if (is_dir[0]) { append_basename(replacement, path[0], path[1]); path[0] = replacement->buf; } else { @@ -246,6 +247,8 @@ int diff_no_index(struct rev_info *revs, int ret = 1; const char *paths[2]; char *to_free[ARRAY_SIZE(paths)] = { 0 }; + int is_dir[ARRAY_SIZE(paths)] = { 0 }; + int is_pipe[ARRAY_SIZE(paths)] = { 0 }; struct strbuf replacement = STRBUF_INIT; const char *prefix = revs->prefix; struct option no_index_options[] = { @@ -267,18 +270,30 @@ int diff_no_index(struct rev_info *revs, FREE_AND_NULL(options); for (i = 0; i < 2; i++) { const char *p = argv[i]; - if (!strcmp(p, "-")) + if (!strcmp(p, "-")) { /* * stdin should be spelled as "-"; if you have * path that is "-", spell it as "./-". */ p = file_from_standard_input; - else if (prefix) - p = to_free[i] = prefix_filename(prefix, p); + is_pipe[i] = 1; + } else { + struct stat st; + + if (prefix) + p = to_free[i] = prefix_filename(prefix, p); + if (stat(p, &st)) + ; + else if (S_ISDIR(st.st_mode)) + is_dir[i] = 1; + else if (S_ISFIFO(st.st_mode)) + is_pipe[i] = 1; + } paths[i] = p; } - fixup_paths(paths, &replacement); + if (!is_pipe[0] && !is_pipe[1]) + fixup_paths(paths, is_dir, &replacement); revs->diffopt.skip_stat_unmatch = 1; if (!revs->diffopt.output_format) @@ -295,7 +310,8 @@ int diff_no_index(struct rev_info *revs, setup_diff_pager(&revs->diffopt); revs->diffopt.flags.exit_with_status = 1; - if (queue_diff(&revs->diffopt, paths[0], paths[1])) + if (queue_diff(&revs->diffopt, + paths[0], is_pipe[0], paths[1], is_pipe[1])) goto out; diff_set_mnemonic_prefix(&revs->diffopt, "1/", "2/"); diffcore_std(&revs->diffopt); diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh index d14b194ea2..30b9e2c2f0 100755 --- a/t/t4053-diff-no-index.sh +++ b/t/t4053-diff-no-index.sh @@ -223,4 +223,29 @@ test_expect_success "diff --no-index treats '-' as stdin" ' test_write_lines 1 | git diff --no-index -- a/1 - >actual && test_must_be_empty actual ' + +test_expect_success PIPE,SYMLINKS 'diff --no-index reads from pipes' ' + mkfifo old && + mkfifo new && + ln -s new new-link && + { + (test_write_lines a b c >old) & + (test_write_lines a x c >new) & + } && + + cat >expect <<-EOF && + diff --git a/old b/new-link + --- a/old + +++ b/new-link + @@ -1,3 +1,3 @@ + a + -b + +x + c + EOF + + test_expect_code 1 git diff --no-index old new-link >actual && + test_cmp expect actual +' + test_done