Message ID | ebd2a1356017905245744babf32c5b78a0af1639.1643915286.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Add cat-file --batch-command flag | expand |
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes: > Subject: Re: [PATCH 2/2] catfile.c: add --batch-command mode "cat-file: add --batch-command mode" perhaps. The patch touching the file "catfile.c" (which does not even exist) is an irrelevant implementation detail to spend 2 extra bytes in "git shortlog" output. > From: John Cai <johncai86@gmail.com> > > Add new flag --batch-command that will accept commands and arguments > from stdin, similar to git-update-ref --stdin. > > At GitLab, we use a pair of long running cat-file processes when > accessing object content. One for iterating over object metadata with > --batch-check, and the other to grab object contents with --batch. > > However, if we had --batch-command, we wouldnt need to keep both > processes around, and instead just have one process where we can flip > between getting object info, and getting object contents. This means we > can get rid of roughly half of long lived git cat-file processes. This > can lead to huge savings since on a given server there could be hundreds > of git cat-file processes running. Hmph, why hundreds, not two you listed? Do you mean "we have two per repository, but by combining, we can do with just one per repository, halving the number of processes"? > git cat-file --batch-command > > would enter an interactive command mode whereby the user can enter in > commands and their arguments: > > <command> [arg1] [arg2] NL > > This patch adds the basic structure for add command which can be > extended in the future to add more commands. > > This patch also adds the following commands: > > contents <object> NL > info <object> NL > > The contents command takes an <object> argument and prints out the object > contents. > > The info command takes a <object> argument and prints out the object > metadata. > > In addition, we need a set of commands that enable a "read session". > > When a separate process (A) is connected to a git cat-file process (B) This description misleads readers into thinking as if we have just created a daemon process that is running, and an unrelated process can connect to it, which obviously poses a question about securing the connection. It is my understanding that what this creates is just a consumer process (A) starts the cat-file process (B) locally on its behalf under process (A)'s privilege, and they talk over pipe without allowing any third-party to participate in the exchange, so we should avoid misleading users by saying "is connected to" here. > and is interactively writing to and reading from it in --buffer mode, > (A) needs to be able to know when the buffer is flushed to stdout. If A and B are talking over a pair pipes, in order to avoid deadlocking, both ends need to be able to control whose turn it is to speak (and it is turn for the other side to listen). A needs to be able to _control_ (not "know") when the buffer it uses to write to B gets flushed, in order to reliably say "I am done for now, it is your turn to speak" and be assured that it reaches B. The story is the same for the other side. When a request by A needs to be responded with multiple lines of output, B needs to be able to say "And that concludes my response, and I am ready to accept a new request from you" and make sure it reaches A. "know when..." is probably a wrong phrase here. > Currently, from (A)'s perspective, the only way is to either 1. exit > (B)'s process or 2. send an invalid object to stdin. 1. is not ideal > from a performance perspective as it will require spawning a new cat-file > process each time, and 2. is hacky and not a good long term solution. Writing enumeration as bulletted or enumerated list would make it much easier to read, I would think. From (A)'s perspective, the only way is to either 1. exit (B)'s process or 2. send an invalid object to stdin. 1. is not ideal from a performance perspective, as it will require spawning a new cat-file process each time, and 2. is hacky and not a good long term solution. I am not sure what you exactly mean by "exit" in the above. Do you mean "kill" instead? > With the following commands, process (A) can begin a "session" and > send a list of object names over stdin. When "get contents" or "get info" > is issued, this list of object names will be fed into batch_one_object() > to retrieve either info or contents. Finally an fflush() will be called > to end the session. > > begin NL > get contents NL > get info NL > > These can be used in the following way: > > begin > <sha1> > <sha1> > <sha1> > <sha1> > get info > > begin > <sha1> > <sha1> > <sha1> > <sha1> > get contents > > With this mechanism, process (A) can be guaranteed to receive all of the > output even in --buffer mode. OK, so do these "get blah" serve both as command and an implicit "flush"? With an implicit "flush", do we really need "begin"? Also, from the point of view of extensibility, not saying what kind of operation is started when given "begin" is probably not a good idea. "get info" and "get contents" may happen to be the only commands that are supported right now, and the parameters to them may happen to be just list of object names and nothing else, but what happens when a new "git frotz" command is added and its operation is extended with something other than object names and pathnames? The way to parse these parameter lines for the "get" would be different for different commands, and if "cat-file" knows upfront what is to be done to these parameters, it can even start prefetching and precomputing to reduce latency observed by the client before the final "get info" command is given. So, from that point of view, begin <cmd> <parameter> <parameter> ... end may be a better design, no?
Hi Junio, thanks for the review! On 3 Feb 2022, at 14:57, Junio C Hamano wrote: > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> Subject: Re: [PATCH 2/2] catfile.c: add --batch-command mode > > "cat-file: add --batch-command mode" perhaps. The patch touching > the file "catfile.c" (which does not even exist) is an irrelevant > implementation detail to spend 2 extra bytes in "git shortlog" > output. > >> From: John Cai <johncai86@gmail.com> >> >> Add new flag --batch-command that will accept commands and arguments >> from stdin, similar to git-update-ref --stdin. >> >> At GitLab, we use a pair of long running cat-file processes when >> accessing object content. One for iterating over object metadata with >> --batch-check, and the other to grab object contents with --batch. >> >> However, if we had --batch-command, we wouldnt need to keep both >> processes around, and instead just have one process where we can flip >> between getting object info, and getting object contents. This means we >> can get rid of roughly half of long lived git cat-file processes. This >> can lead to huge savings since on a given server there could be hundreds >> of git cat-file processes running. > > Hmph, why hundreds, not two you listed? > > Do you mean "we have two per repository, but by combining, we can do > with just one per repository, halving the number of processes"? Yes, exactly. I'll reword this in the next version to be more clear. > >> git cat-file --batch-command >> >> would enter an interactive command mode whereby the user can enter in >> commands and their arguments: >> >> <command> [arg1] [arg2] NL >> >> This patch adds the basic structure for add command which can be >> extended in the future to add more commands. >> >> This patch also adds the following commands: >> >> contents <object> NL >> info <object> NL >> >> The contents command takes an <object> argument and prints out the object >> contents. >> >> The info command takes a <object> argument and prints out the object >> metadata. >> >> In addition, we need a set of commands that enable a "read session". >> >> When a separate process (A) is connected to a git cat-file process (B) > > This description misleads readers into thinking as if we have just > created a daemon process that is running, and an unrelated process > can connect to it, which obviously poses a question about securing > the connection. It is my understanding that what this creates is > just a consumer process (A) starts the cat-file process (B) locally > on its behalf under process (A)'s privilege, and they talk over pipe > without allowing any third-party to participate in the exchange, so > we should avoid misleading users by saying "is connected to" here. Yes this understanding is correct. Will fix wording in next version > >> and is interactively writing to and reading from it in --buffer mode, >> (A) needs to be able to know when the buffer is flushed to stdout. > > If A and B are talking over a pair pipes, in order to avoid > deadlocking, both ends need to be able to control whose turn it is > to speak (and it is turn for the other side to listen). A needs to > be able to _control_ (not "know") when the buffer it uses to write > to B gets flushed, in order to reliably say "I am done for now, it > is your turn to speak" and be assured that it reaches B. The story > is the same for the other side. When a request by A needs to be > responded with multiple lines of output, B needs to be able to say > "And that concludes my response, and I am ready to accept a new > request from you" and make sure it reaches A. "know when..." is > probably a wrong phrase here. Correct, "know" is not exactly right. _control_ would be the more accurate description. > >> Currently, from (A)'s perspective, the only way is to either 1. exit >> (B)'s process or 2. send an invalid object to stdin. 1. is not ideal >> from a performance perspective as it will require spawning a new cat-file >> process each time, and 2. is hacky and not a good long term solution. > > Writing enumeration as bulletted or enumerated list would make it > much easier to read, I would think. > > From (A)'s perspective, the only way is to either > > 1. exit (B)'s process or > 2. send an invalid object to stdin. > > 1. is not ideal from a performance perspective, as it will > require spawning a new cat-file process each time, and 2. is > hacky and not a good long term solution. > > I am not sure what you exactly mean by "exit" in the above. Do you > mean "kill" instead? Yes > >> With the following commands, process (A) can begin a "session" and >> send a list of object names over stdin. When "get contents" or "get info" >> is issued, this list of object names will be fed into batch_one_object() >> to retrieve either info or contents. Finally an fflush() will be called >> to end the session. >> >> begin NL >> get contents NL >> get info NL >> >> These can be used in the following way: >> >> begin >> <sha1> >> <sha1> >> <sha1> >> <sha1> >> get info >> >> begin >> <sha1> >> <sha1> >> <sha1> >> <sha1> >> get contents >> >> With this mechanism, process (A) can be guaranteed to receive all of the >> output even in --buffer mode. > > OK, so do these "get blah" serve both as command and an implicit > "flush"? yes, that's the idea. > > With an implicit "flush", do we really need "begin"? > > Also, from the point of view of extensibility, not saying what kind > of operation is started when given "begin" is probably not a good > idea. "get info" and "get contents" may happen to be the only > commands that are supported right now, and the parameters to them > may happen to be just list of object names and nothing else, but > what happens when a new "git frotz" command is added and its > operation is extended with something other than object names and > pathnames? The way to parse these parameter lines for the "get" > would be different for different commands, and if "cat-file" knows > upfront what is to be done to these parameters, it can even start > prefetching and precomputing to reduce latency observed by the > client before the final "get info" command is given. > > So, from that point of view, > > begin <cmd> > <parameter> > <parameter> > ... > end > > may be a better design, no? Good point. Now I'm wondering if we can simplify where commands get queued up and a "get" will execute them along with an implicit flush. <cmd> <parameter> <cmd> <parameter> <cmd> <parameter> get
()()On Thu, Feb 3, 2022 at 7:20 PM John Cai via GitGitGadget <gitgitgadget@gmail.com> wrote: > Add new flag --batch-command that will accept commands and arguments > from stdin, similar to git-update-ref --stdin. > > contents <object> NL > info <object> NL > > With the following commands, process (A) can begin a "session" and > send a list of object names over stdin. When "get contents" or "get info" > is issued, this list of object names will be fed into batch_one_object() > to retrieve either info or contents. Finally an fflush() will be called > to end the session. > > begin > <sha1> > get info > > begin > <sha1> > get contents I had the same reaction to this new command set as Junio expressed upstream[1], and was prepared to suggest an alternative but Junio said everything I wanted to say, so I won't say anything more about it here. That aside, the implementation presented here is overly loose about malformed input and accepts all sorts of bogus invocations which it should be reporting as errors. For instance, a lone: get info without a preceding `begin` should be an error but such bogus input is not diagnosed. `get contents` is likewise affected. Another example: begin <oid> <EOF> which lacks a closing `get info` or `get contents` is silently ignored. Similarly, malformed: begin begin ... should be reported as an error but is not. There is also a bug in which the accumulated list of OID's is never cleared. Thus: begin <oid> get info get info emits info about <oid> twice, once for each `get info` invocation. Similarly: begin <oid1> get info <oid2> get info emits information about <oid1> twice and <oid2> once. Invoking `begin` between the `get info` commands doesn't help because `begin` is a no-op. Thus: begin <oid1> get info begin <oid2> get info likewise emits information about <oid1> twice and <oid2> once. It also incorrectly accepts non-"session" commands in the middle of a session. For instance: begin <oid1> info <oid2> <oid3> immediately emits information about <oid2> and then bombs out claiming that <oid3> is an "unknown command" because the `info` command -- which should not have been allowed within a session -- prematurely ended the "session". The `info` and `contents` commands neglect to do any sort of validation of their arguments, thus any and all bogus invocations are accepted. Thus: info info <arg1> <arg2> info <non-oid> are all accepted as valid invocations, misleadingly producing "<foo> missing" messages, rather than erroring out as they should. The "<oid> missing" message should be reserved for the case when the lone argument to `info` or `contents` is something which looks like a legitimate OID. The above is a long-winded way of saying that it is important not only to check the obvious "expect success" cases when implementing a new feature, but it's also important to add checks for the "expect failure" cases, such as all the above malformed inputs. It's subjective, but it feels like this implementation is trying to be too clever by handling all these cases via a single strbuf_getline() loop in batch_objects_command(). It would be easier to reason about, and would have avoided some of the above problems, for instance, if handling of `begin` was split out to its own function which looped over strbuf_getline() itself, thus could easily have detected premature EOF (lacking `get info` or `get contents`) and likewise would not have allowed `info` or `contents` commands to be executed within a "session". Similarly (again subjective), the generic command dispatcher seems over-engineered and perhaps contributed to the oversight of `info` and `contents` failing to perform validation of their arguments. A simpler hand-rolled command-response loop is more common on this project and often easier to reason about. Perhaps something like this (pseudo-code): while (strbuf_getline(&input, stdin)) { char *end_cmd = strchrnul(&input.buf, ' '); const char *argstart = *end_cmd ? end_cmd + 1 : end_cmd; *end_cmd = '\0'; if (strcmp(input.buf, "info")) show_info(argstart); else if (strcmp(input.buf, "contents")) show_contents(argstart); else if (strcmp(input.buf, "begin")) begin_session(argstart); else die(...); } and each of the command-handler functions would perform its own argument validation (including the case when no argument should be present). [1]: https://lore.kernel.org/git/xmqqo83nsoxs.fsf@gitster.g/ > +static void parse_cmd_begin(struct batch_options *opt, > + const char *line, > + struct strbuf *output, > + struct expand_data *data, > + struct string_list revs) > +{ > + /* nothing needs to be done here */ > +} Either this function should be clearing the list of collected OID's or... > +static void parse_cmd_get(struct batch_options *opt, > + const char *line, > + struct strbuf *output, > + struct expand_data *data, > + struct string_list revs) > +{ > + struct string_list_item *item; > + for_each_string_list_item(item, &revs) { > + batch_one_object(item->string, output, opt, data); > + } > + if (opt->buffer_output) > + fflush(stdout); ... this function should do so after it's done processing the OID list. > +} > +static void parse_cmd_get_info(struct batch_options *opt, > + const char *line, > + struct strbuf *output, > + struct expand_data *data, > + struct string_list revs) Missing blank line before the function definition. > +static void parse_cmd_get_objects(struct batch_options *opt, > + const char *line, > + struct strbuf *output, > + struct expand_data *data, > + struct string_list revs) > +{ > + opt->print_contents = 1; > + parse_cmd_get(opt, line, output, data, revs); > + if (opt->buffer_output) > + fflush(stdout); > +} Why does this function duplicate the flushing logic of parse_cmd_get() immediately after calling parse_cmd_get()? > +static void batch_objects_command(struct batch_options *opt, > + struct strbuf *output, > + struct expand_data *data) > +{ > + /* Read each line dispatch its command */ Missing "and". > + while (!strbuf_getline(&input, stdin)) { > + if (state == BATCH_STATE_COMMAND) { > + if (*input.buf == '\n') > + die("empty command in input"); This never triggers because strbuf_getline() removes the line terminator before returning. > + else if (isspace(*input.buf)) > + die("whitespace before command: %s", input.buf); > + } > + > + for (i = 0; i < ARRAY_SIZE(commands); i++) { > + if (!skip_prefix(input.buf, prefix, &cmd_end)) > + continue; > + /* > + * If the command has arguments, verify that it's > + * followed by a space. Otherwise, it shall be followed > + * by a line terminator. > + */ > + c = commands[i].takes_args ? ' ' : '\n'; > + if (*cmd_end && *cmd_end != c) > + die("arguments invalid for command: %s", commands[i].prefix); Ditto regarding strbuf_getline() removing line-terminator. > + cmd = &commands[i]; > + if (cmd->takes_args) > + p = cmd_end + 1; > + break; > + } > + > + if (input.buf[input.len - 1] == '\n') > + input.buf[--input.len] = '\0'; Ditto again. This is especially scary since it's accessing element `input.len-1` without even checking if `input.len` is greater than zero. Also, it's better to use published strbuf API rather than mucking with the internals: strbuf_setlen(&input, input.len - 1); > + if (state == BATCH_STATE_INPUT && !cmd){ > + string_list_append(&revs, input.buf); > + continue; > + } > + > + if (!cmd) > + die("unknown command: %s", input.buf); > + > + state = cmd->next_state; > + cmd->fn(opt, p, output, data, revs); > + } > + strbuf_release(&input); > + string_list_clear(&revs, 0); > +}
On Thu, Feb 03 2022, John Cai via GitGitGadget wrote: > From: John Cai <johncai86@gmail.com> [Trying not to pile on and mentioning some things others haven't, but maybe there'll be duplications] > requested batch operation on all objects in the repository and > diff --git a/builtin/cat-file.c b/builtin/cat-file.c > index 858bca208ff..29d5cd6857b 100644 > --- a/builtin/cat-file.c > +++ b/builtin/cat-file.c > @@ -26,6 +26,7 @@ struct batch_options { > int unordered; > int mode; /* may be 'w' or 'c' for --filters or --textconv */ > const char *format; > + int command; > }; Not sure why it's not added to "int mode", but in any case post-ab/cat-file that might be clearer... > + /* Read each line dispatch its command */ > + while (!strbuf_getline(&input, stdin)) { I think comments that are obvious from the code are probably best dropped. We're just doing a fairly obvious consume stdin pattern here. > + int i; > + const struct parse_cmd *cmd = NULL; > + const char *p, *cmd_end; > + > + if (state == BATCH_STATE_COMMAND) { > + if (*input.buf == '\n') > + die("empty command in input"); > + else if (isspace(*input.buf)) > + die("whitespace before command: %s", input.buf); These should use _() to mark strings for translation, and let's quote %s like '%s' > + } > + > + for (i = 0; i < ARRAY_SIZE(commands); i++) { > + const char *prefix = commands[i].prefix; > + char c; nit: add a \n between variable decls and code. > + if (!skip_prefix(input.buf, prefix, &cmd_end)) > + continue; > + /* > + * If the command has arguments, verify that it's > + * followed by a space. Otherwise, it shall be followed > + * by a line terminator. > + */ I'd say ditto on the "the code says that" comments... > + c = commands[i].takes_args ? ' ' : '\n'; > + if (*cmd_end && *cmd_end != c) > + die("arguments invalid for command: %s", commands[i].prefix); > + > + cmd = &commands[i]; > + if (cmd->takes_args) > + p = cmd_end + 1; > + break; > + } > + > + if (input.buf[input.len - 1] == '\n') > + input.buf[--input.len] = '\0'; Don't we mean strbuf_trim_trailing_newline() here, or do we not want Windows newlines to be accepted? But more generally doesn't one of the strbuf_getline_*() functions do the right thing here already? > + > + if (state == BATCH_STATE_INPUT && !cmd){ > + string_list_append(&revs, input.buf); Nit: You can save yourself some malloc() churn here with: string_list_append_nodup(..., strbuf_detach(&input, NULL)); I.e. we're looping over the input, here we're done, so we might as well steal the already alloc'd string.... > + continue; > + } > + > + if (!cmd) > + die("unknown command: %s", input.buf); > + > + state = cmd->next_state; > + cmd->fn(opt, p, output, data, revs); > + } > + strbuf_release(&input); > + string_list_clear(&revs, 0); ...and these will do the right thing, as strbuf will notice the string is stolen (it'll be the slopbuf again), and due to the combination of *_DUP and *_nodup() we'll properly free it here too. > +} > + > static int batch_objects(struct batch_options *opt) > { > struct strbuf input = STRBUF_INIT; > @@ -519,6 +665,7 @@ static int batch_objects(struct batch_options *opt) > struct expand_data data; > int save_warning; > int retval = 0; > + const int command = opt->command; > > if (!opt->format) > opt->format = "%(objectname) %(objecttype) %(objectsize)"; > @@ -594,22 +741,25 @@ static int batch_objects(struct batch_options *opt) > save_warning = warn_on_object_refname_ambiguity; > warn_on_object_refname_ambiguity = 0; > > - while (strbuf_getline(&input, stdin) != EOF) { > - if (data.split_on_whitespace) { > - /* > - * Split at first whitespace, tying off the beginning > - * of the string and saving the remainder (or NULL) in > - * data.rest. > - */ > - char *p = strpbrk(input.buf, " \t"); > - if (p) { > - while (*p && strchr(" \t", *p)) > - *p++ = '\0'; > + if (command) > + batch_objects_command(opt, &output, &data); > + else { Style: {} braces for all arms if one requires it. > + while (strbuf_getline(&input, stdin) != EOF) { > + if (data.split_on_whitespace) { diff nit: maybe we can find some way to not require re-indenting the existing code. E.g.: if (command) { batch_objects_command(...); goto cleanup; } ... > + /* > + * Split at first whitespace, tying off the beginning > + * of the string and saving the remainder (or NULL) in > + * data.rest. > + */ > + char *p = strpbrk(input.buf, " \t"); > + if (p) { > + while (*p && strchr(" \t", *p)) > + *p++ = '\0'; > + } > + data.rest = p; > } > - data.rest = p; > + batch_one_object(input.buf, &output, opt, &data); > } > - > - batch_one_object(input.buf, &output, opt, &data); > } > ...and then just add a "cleanup:" label here. > strbuf_release(&input); > @@ -646,6 +796,7 @@ static int batch_option_callback(const struct option *opt, > > bo->enabled = 1; > bo->print_contents = !strcmp(opt->long_name, "batch"); > + bo->command = !strcmp(opt->long_name, "batch-command"); > bo->format = arg; > > return 0; > @@ -682,6 +833,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) > N_("show info about objects fed from the standard input"), > PARSE_OPT_OPTARG | PARSE_OPT_NONEG, > batch_option_callback), > + OPT_CALLBACK_F(0, "batch-command", &batch, N_(""), You're either missing a string here in "", or we don't need N_() to mark it for translation. > + N_("enters batch mode that accepts commands"), > + PARSE_OPT_OPTARG | PARSE_OPT_NONEG, > + batch_option_callback), > + > OPT_BOOL(0, "follow-symlinks", &batch.follow_symlinks, > N_("follow in-tree symlinks (used with --batch or --batch-check)")), > OPT_BOOL(0, "batch-all-objects", &batch.all_objects, > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh > index 39382fa1958..7360d049113 100755 > --- a/t/t1006-cat-file.sh > +++ b/t/t1006-cat-file.sh > @@ -85,6 +85,34 @@ $content" > test_cmp expect actual > ' > > + test -z "$content" || > + test_expect_success "--batch-command output of $type content is correct" ' > + maybe_remove_timestamp "$batch_output" $no_ts >expect && > + maybe_remove_timestamp "$(echo contents $sha1 | git cat-file --batch-command)" $no_ts >actual && > + test_cmp expect actual > + ' > + > + test -z "$content" || > + test_expect_success "--batch-command session for $type content is correct" ' > + maybe_remove_timestamp "$batch_output" $no_ts >expect && > + maybe_remove_timestamp \ > + "$(test_write_lines "begin" "$sha1" "get contents" | git cat-file --batch-command)" \ > + $no_ts >actual && > + test_cmp expect actual > + ' > + > + test_expect_success "--batch-command output of $type info is correct" ' > + echo "$sha1 $type $size" >expect && > + echo "info $sha1" | git cat-file --batch-command >actual && > + test_cmp expect actual > + ' > + > + test_expect_success "--batch-command session for $type info is correct" ' > + echo "$sha1 $type $size" >expect && > + test_write_lines "begin" "$sha1" "get info" | git cat-file --batch-command >actual && > + test_cmp expect actual > + ' > + > test_expect_success "custom --batch-check format" ' > echo "$type $sha1" >expect && > echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual && > @@ -141,6 +169,7 @@ test_expect_success '--batch-check without %(rest) considers whole line' ' > ' > > tree_sha1=$(git write-tree) > + stray newline addition. > tree_size=$(($(test_oid rawsz) + 13)) > tree_pretty_content="100644 blob $hello_sha1 hello" > > @@ -175,7 +204,7 @@ test_expect_success \ > "Reach a blob from a tag pointing to it" \ > "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\"" > > -for batch in batch batch-check > +for batch in batch batch-check batch-command > do > for opt in t s e p > do > @@ -281,6 +310,15 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" ' > "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)" > ' > > +test_expect_success "--batch-command with multiple sha1s gives correct format" ' > + echo "$batch_check_output" >expect && > + echo begin >input && > + echo_without_newline "$batch_check_input" >>input && > + echo "get info" >>input && > + git cat-file --batch-command <input >actual && > + test_cmp expect actual > +' indentation with spaces, \t correctly used for the rest.
Hi John On 04/02/2022 04:11, John Cai wrote: > Hi Junio, thanks for the review! > [...] >> So, from that point of view, >> >> begin <cmd> >> <parameter> >> <parameter> >> ... >> end >> >> may be a better design, no? > > Good point. Now I'm wondering if we can simplify where commands get queued up > and a "get" will execute them along with an implicit flush. > > <cmd> <parameter> > <cmd> <parameter> > <cmd> <parameter> > get I think that would be an improvement (I'd suggest calling "get" "flush" instead), the begin ... get <cmd> sequence seems unnecessarily complex compared the the RFC of this series. If the user gives --buffer on the command line then we can buffer the input until we see a "flush" command and if the user does not give --buffer on the command line then we should flush the output after each command. Best Wishes Phillip
Hi John/Ævar On 04/02/2022 12:11, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Feb 03 2022, John Cai via GitGitGadget wrote: > >> From: John Cai <johncai86@gmail.com> >> + if (state == BATCH_STATE_INPUT && !cmd){ >> + string_list_append(&revs, input.buf); > > Nit: You can save yourself some malloc() churn here with: > > string_list_append_nodup(..., strbuf_detach(&input, NULL)); > > I.e. we're looping over the input, here we're done, so we might as well > steal the already alloc'd string.... If we do that then the strbuf will have to reallocate its buffer when it reads the next input line on the next iteration of the loop. As the strbuf will have likely over allocated the buffer using the string_list_append_nodup() + strbuf_detach() will will be less efficient overall than using string_list_append(). Best Wishes Phillip >> + continue; >> + } >> + >> + if (!cmd) >> + die("unknown command: %s", input.buf); >> + >> + state = cmd->next_state; >> + cmd->fn(opt, p, output, data, revs); >> + } >> + strbuf_release(&input); >> + string_list_clear(&revs, 0); > > ...and these will do the right thing, as strbuf will notice the string > is stolen (it'll be the slopbuf again), and due to the combination of > *_DUP and *_nodup() we'll properly free it here too. > >> +} >> + >> static int batch_objects(struct batch_options *opt) >> { >> struct strbuf input = STRBUF_INIT; >> @@ -519,6 +665,7 @@ static int batch_objects(struct batch_options *opt) >> struct expand_data data; >> int save_warning; >> int retval = 0; >> + const int command = opt->command; >> >> if (!opt->format) >> opt->format = "%(objectname) %(objecttype) %(objectsize)"; >> @@ -594,22 +741,25 @@ static int batch_objects(struct batch_options *opt) >> save_warning = warn_on_object_refname_ambiguity; >> warn_on_object_refname_ambiguity = 0; >> >> - while (strbuf_getline(&input, stdin) != EOF) { >> - if (data.split_on_whitespace) { >> - /* >> - * Split at first whitespace, tying off the beginning >> - * of the string and saving the remainder (or NULL) in >> - * data.rest. >> - */ >> - char *p = strpbrk(input.buf, " \t"); >> - if (p) { >> - while (*p && strchr(" \t", *p)) >> - *p++ = '\0'; >> + if (command) >> + batch_objects_command(opt, &output, &data); >> + else { > > Style: {} braces for all arms if one requires it. > >> + while (strbuf_getline(&input, stdin) != EOF) { >> + if (data.split_on_whitespace) { > > diff nit: maybe we can find some way to not require re-indenting the existing code. E.g.: > > if (command) { > batch_objects_command(...); > goto cleanup; > } > > ... > >> + /* >> + * Split at first whitespace, tying off the beginning >> + * of the string and saving the remainder (or NULL) in >> + * data.rest. >> + */ >> + char *p = strpbrk(input.buf, " \t"); >> + if (p) { >> + while (*p && strchr(" \t", *p)) >> + *p++ = '\0'; >> + } >> + data.rest = p; >> } >> - data.rest = p; >> + batch_one_object(input.buf, &output, opt, &data); >> } >> - >> - batch_one_object(input.buf, &output, opt, &data); >> } >> > > ...and then just add a "cleanup:" label here. > >> strbuf_release(&input); >> @@ -646,6 +796,7 @@ static int batch_option_callback(const struct option *opt, >> >> bo->enabled = 1; >> bo->print_contents = !strcmp(opt->long_name, "batch"); >> + bo->command = !strcmp(opt->long_name, "batch-command"); >> bo->format = arg; >> >> return 0; >> @@ -682,6 +833,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) >> N_("show info about objects fed from the standard input"), >> PARSE_OPT_OPTARG | PARSE_OPT_NONEG, >> batch_option_callback), >> + OPT_CALLBACK_F(0, "batch-command", &batch, N_(""), > > You're either missing a string here in "", or we don't need N_() to mark > it for translation. > >> + N_("enters batch mode that accepts commands"), >> + PARSE_OPT_OPTARG | PARSE_OPT_NONEG, >> + batch_option_callback), >> + >> OPT_BOOL(0, "follow-symlinks", &batch.follow_symlinks, >> N_("follow in-tree symlinks (used with --batch or --batch-check)")), >> OPT_BOOL(0, "batch-all-objects", &batch.all_objects, >> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh >> index 39382fa1958..7360d049113 100755 >> --- a/t/t1006-cat-file.sh >> +++ b/t/t1006-cat-file.sh >> @@ -85,6 +85,34 @@ $content" >> test_cmp expect actual >> ' >> >> + test -z "$content" || >> + test_expect_success "--batch-command output of $type content is correct" ' >> + maybe_remove_timestamp "$batch_output" $no_ts >expect && >> + maybe_remove_timestamp "$(echo contents $sha1 | git cat-file --batch-command)" $no_ts >actual && >> + test_cmp expect actual >> + ' >> + >> + test -z "$content" || >> + test_expect_success "--batch-command session for $type content is correct" ' >> + maybe_remove_timestamp "$batch_output" $no_ts >expect && >> + maybe_remove_timestamp \ >> + "$(test_write_lines "begin" "$sha1" "get contents" | git cat-file --batch-command)" \ >> + $no_ts >actual && >> + test_cmp expect actual >> + ' >> + >> + test_expect_success "--batch-command output of $type info is correct" ' >> + echo "$sha1 $type $size" >expect && >> + echo "info $sha1" | git cat-file --batch-command >actual && >> + test_cmp expect actual >> + ' >> + >> + test_expect_success "--batch-command session for $type info is correct" ' >> + echo "$sha1 $type $size" >expect && >> + test_write_lines "begin" "$sha1" "get info" | git cat-file --batch-command >actual && >> + test_cmp expect actual >> + ' >> + >> test_expect_success "custom --batch-check format" ' >> echo "$type $sha1" >expect && >> echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual && >> @@ -141,6 +169,7 @@ test_expect_success '--batch-check without %(rest) considers whole line' ' >> ' >> >> tree_sha1=$(git write-tree) >> + > > stray newline addition. > >> tree_size=$(($(test_oid rawsz) + 13)) >> tree_pretty_content="100644 blob $hello_sha1 hello" >> >> @@ -175,7 +204,7 @@ test_expect_success \ >> "Reach a blob from a tag pointing to it" \ >> "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\"" >> >> -for batch in batch batch-check >> +for batch in batch batch-check batch-command >> do >> for opt in t s e p >> do >> @@ -281,6 +310,15 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" ' >> "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)" >> ' >> >> +test_expect_success "--batch-command with multiple sha1s gives correct format" ' >> + echo "$batch_check_output" >expect && >> + echo begin >input && >> + echo_without_newline "$batch_check_input" >>input && >> + echo "get info" >>input && >> + git cat-file --batch-command <input >actual && >> + test_cmp expect actual >> +' > > indentation with spaces, \t correctly used for the rest.
Hi Eric, I appreciate the feedback and the thorough review! On 4 Feb 2022, at 1:45, Eric Sunshine wrote: > ()()On Thu, Feb 3, 2022 at 7:20 PM John Cai via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> Add new flag --batch-command that will accept commands and arguments >> from stdin, similar to git-update-ref --stdin. >> >> contents <object> NL >> info <object> NL >> >> With the following commands, process (A) can begin a "session" and >> send a list of object names over stdin. When "get contents" or "get info" >> is issued, this list of object names will be fed into batch_one_object() >> to retrieve either info or contents. Finally an fflush() will be called >> to end the session. >> >> begin >> <sha1> >> get info >> >> begin >> <sha1> >> get contents > > I had the same reaction to this new command set as Junio expressed > upstream[1], and was prepared to suggest an alternative but Junio said > everything I wanted to say, so I won't say anything more about it > here. > > That aside, the implementation presented here is overly loose about > malformed input and accepts all sorts of bogus invocations which it > should be reporting as errors. For instance, a lone: > > get info > > without a preceding `begin` should be an error but such bogus input is > not diagnosed. `get contents` is likewise affected. Another example: > > begin > <oid> > <EOF> > > which lacks a closing `get info` or `get contents` is silently > ignored. Similarly, malformed: > > begin > begin > ... > > should be reported as an error but is not. > > There is also a bug in which the accumulated list of OID's is never > cleared. Thus: > > begin > <oid> > get info > get info > > emits info about <oid> twice, once for each `get info` invocation. Similarly: > > begin > <oid1> > get info > <oid2> > get info > > emits information about <oid1> twice and <oid2> once. Invoking `begin` > between the `get info` commands doesn't help because `begin` is a > no-op. Thus: > > begin > <oid1> > get info > begin > <oid2> > get info > > likewise emits information about <oid1> twice and <oid2> once. > > It also incorrectly accepts non-"session" commands in the middle of a > session. For instance: > > begin > <oid1> > info <oid2> > <oid3> > > immediately emits information about <oid2> and then bombs out claiming > that <oid3> is an "unknown command" because the `info` command -- > which should not have been allowed within a session -- prematurely > ended the "session". > > The `info` and `contents` commands neglect to do any sort of > validation of their arguments, thus any and all bogus invocations are > accepted. Thus: > > info > info <arg1> <arg2> > info <non-oid> > > are all accepted as valid invocations, misleadingly producing "<foo> > missing" messages, rather than erroring out as they should. The "<oid> > missing" message should be reserved for the case when the lone > argument to `info` or `contents` is something which looks like a > legitimate OID. So actually the argument to info and contents doesn't have to be an OID but an object name that eventually gets passed to get_oid_with_context() to resolve to an actual oid. This is the same behavior as git cat-file --batch and --batch-check, neither of which throws an error. My goal was to maintain this behavior in --batch-command. > > The above is a long-winded way of saying that it is important not only > to check the obvious "expect success" cases when implementing a new > feature, but it's also important to add checks for the "expect > failure" cases, such as all the above malformed inputs. I overall agree and will add test coverage for invalid input. > It's subjective, but it feels like this implementation is trying to be > too clever by handling all these cases via a single strbuf_getline() > loop in batch_objects_command(). It would be easier to reason about, > and would have avoided some of the above problems, for instance, if > handling of `begin` was split out to its own function which looped > over strbuf_getline() itself, thus could easily have detected > premature EOF (lacking `get info` or `get contents`) and likewise > would not have allowed `info` or `contents` commands to be executed > within a "session". > > Similarly (again subjective), the generic command dispatcher seems > over-engineered and perhaps contributed to the oversight of `info` and > `contents` failing to perform validation of their arguments. A simpler > hand-rolled command-response loop is more common on this project and > often easier to reason about. Perhaps something like this > (pseudo-code): > > while (strbuf_getline(&input, stdin)) { > char *end_cmd = strchrnul(&input.buf, ' '); > const char *argstart = *end_cmd ? end_cmd + 1 : end_cmd; > *end_cmd = '\0'; > if (strcmp(input.buf, "info")) > show_info(argstart); > else if (strcmp(input.buf, "contents")) > show_contents(argstart); > else if (strcmp(input.buf, "begin")) > begin_session(argstart); > else > die(...); > } I think I agree that the code in this patch is a little hard to follow, but now I'm planning to simplify the design by getting rid of "begin"[2]. I'm hoping this change will make the code easier to reason about. 2. https://lore.kernel.org/git/767d5f5a-8395-78bc-865f-a39acc39e061@gmail.com/ > > and each of the command-handler functions would perform its own > argument validation (including the case when no argument should be > present). > > [1]: https://lore.kernel.org/git/xmqqo83nsoxs.fsf@gitster.g/ > >> +static void parse_cmd_begin(struct batch_options *opt, >> + const char *line, >> + struct strbuf *output, >> + struct expand_data *data, >> + struct string_list revs) >> +{ >> + /* nothing needs to be done here */ >> +} > > Either this function should be clearing the list of collected OID's or... > >> +static void parse_cmd_get(struct batch_options *opt, >> + const char *line, >> + struct strbuf *output, >> + struct expand_data *data, >> + struct string_list revs) >> +{ >> + struct string_list_item *item; >> + for_each_string_list_item(item, &revs) { >> + batch_one_object(item->string, output, opt, data); >> + } >> + if (opt->buffer_output) >> + fflush(stdout); > > ... this function should do so after it's done processing the OID list. > >> +} >> +static void parse_cmd_get_info(struct batch_options *opt, >> + const char *line, >> + struct strbuf *output, >> + struct expand_data *data, >> + struct string_list revs) > > Missing blank line before the function definition. > >> +static void parse_cmd_get_objects(struct batch_options *opt, >> + const char *line, >> + struct strbuf *output, >> + struct expand_data *data, >> + struct string_list revs) >> +{ >> + opt->print_contents = 1; >> + parse_cmd_get(opt, line, output, data, revs); >> + if (opt->buffer_output) >> + fflush(stdout); >> +} > > Why does this function duplicate the flushing logic of parse_cmd_get() > immediately after calling parse_cmd_get()? > >> +static void batch_objects_command(struct batch_options *opt, >> + struct strbuf *output, >> + struct expand_data *data) >> +{ >> + /* Read each line dispatch its command */ > > Missing "and". > >> + while (!strbuf_getline(&input, stdin)) { >> + if (state == BATCH_STATE_COMMAND) { >> + if (*input.buf == '\n') >> + die("empty command in input"); > > This never triggers because strbuf_getline() removes the line > terminator before returning. yes, this was an oversight. thanks for catching it! > >> + else if (isspace(*input.buf)) >> + die("whitespace before command: %s", input.buf); >> + } >> + >> + for (i = 0; i < ARRAY_SIZE(commands); i++) { >> + if (!skip_prefix(input.buf, prefix, &cmd_end)) >> + continue; >> + /* >> + * If the command has arguments, verify that it's >> + * followed by a space. Otherwise, it shall be followed >> + * by a line terminator. >> + */ >> + c = commands[i].takes_args ? ' ' : '\n'; >> + if (*cmd_end && *cmd_end != c) >> + die("arguments invalid for command: %s", commands[i].prefix); > > Ditto regarding strbuf_getline() removing line-terminator. > >> + cmd = &commands[i]; >> + if (cmd->takes_args) >> + p = cmd_end + 1; >> + break; >> + } >> + >> + if (input.buf[input.len - 1] == '\n') >> + input.buf[--input.len] = '\0'; > > Ditto again. This is especially scary since it's accessing element > `input.len-1` without even checking if `input.len` is greater than > zero. I realize we actually don't need this if block at all because strubf_getline() strips the line terminator for us already. > > Also, it's better to use published strbuf API rather than mucking with > the internals: > > strbuf_setlen(&input, input.len - 1); > >> + if (state == BATCH_STATE_INPUT && !cmd){ >> + string_list_append(&revs, input.buf); >> + continue; >> + } >> + >> + if (!cmd) >> + die("unknown command: %s", input.buf); >> + >> + state = cmd->next_state; >> + cmd->fn(opt, p, output, data, revs); >> + } >> + strbuf_release(&input); >> + string_list_clear(&revs, 0); >> +}
On Fri, Feb 4, 2022 at 4:42 PM John Cai <johncai86@gmail.com> wrote: > On 4 Feb 2022, at 1:45, Eric Sunshine wrote: > > The `info` and `contents` commands neglect to do any sort of > > validation of their arguments, thus any and all bogus invocations are > > accepted. Thus: > > > > info > > info <arg1> <arg2> > > info <non-oid> > > > > are all accepted as valid invocations, misleadingly producing "<foo> > > missing" messages, rather than erroring out as they should. The "<oid> > > missing" message should be reserved for the case when the lone > > argument to `info` or `contents` is something which looks like a > > legitimate OID. > > So actually the argument to info and contents doesn't have to be an OID but an object name that > eventually gets passed to get_oid_with_context() to resolve to an actual oid. This is the > same behavior as git cat-file --batch and --batch-check, neither of which throws an error. My > goal was to maintain this behavior in --batch-command. Okay, that makes sense. I was misled by the commit message talking about "<sha1>" in its examples, and didn't do my due diligence by digging more deeply into the existing documentation and code.
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index 27b27e2b300..d1ba0b12e54 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -90,6 +90,33 @@ OPTIONS need to specify the path, separated by whitespace. See the section `BATCH OUTPUT` below for details. +--batch-command:: + Enter a command mode that reads commands and arguments from stdin. + May not be combined with any other options or arguments except + `--textconv` or `--filters`, in which case the input lines also need to + specify the path, separated by whitespace. See the section + `BATCH OUTPUT` below for details. + +contents <object>:: + Print object contents for object reference <object> + +info <object>:: + Print object info for object reference <object> + +begin:: + Begins a session to read object names off of stdin. A session can be + terminated with `get contents` or `get info`. + +get contents:: + After a read session is begun with the `begin` command, and object + names have been fed into stdin, end the session and retrieve contents of + all the objects requested. + +get info:: + After a read session is begun with the `begin` command, and object + names have been fed into stdin, end the session and retrieve info of + all the objects requested. + --batch-all-objects:: Instead of reading a list of objects on stdin, perform the requested batch operation on all objects in the repository and diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 858bca208ff..29d5cd6857b 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -26,6 +26,7 @@ struct batch_options { int unordered; int mode; /* may be 'w' or 'c' for --filters or --textconv */ const char *format; + int command; }; static const char *force_path; @@ -512,6 +513,151 @@ static int batch_unordered_packed(const struct object_id *oid, data); } +static void parse_cmd_object(struct batch_options *opt, + const char *line, + struct strbuf *output, + struct expand_data *data, + struct string_list revs) +{ + opt->print_contents = 1; + batch_one_object(line, output, opt, data); +} + +static void parse_cmd_info(struct batch_options *opt, + const char *line, + struct strbuf *output, + struct expand_data *data, + struct string_list revs) +{ + opt->print_contents = 0; + batch_one_object(line, output, opt, data); +} + +static void parse_cmd_begin(struct batch_options *opt, + const char *line, + struct strbuf *output, + struct expand_data *data, + struct string_list revs) +{ + /* nothing needs to be done here */ +} + +static void parse_cmd_get(struct batch_options *opt, + const char *line, + struct strbuf *output, + struct expand_data *data, + struct string_list revs) +{ + struct string_list_item *item; + for_each_string_list_item(item, &revs) { + batch_one_object(item->string, output, opt, data); + } + if (opt->buffer_output) + fflush(stdout); +} +static void parse_cmd_get_info(struct batch_options *opt, + const char *line, + struct strbuf *output, + struct expand_data *data, + struct string_list revs) +{ + opt->print_contents = 0; + parse_cmd_get(opt, line, output, data, revs); +} + +static void parse_cmd_get_objects(struct batch_options *opt, + const char *line, + struct strbuf *output, + struct expand_data *data, + struct string_list revs) +{ + opt->print_contents = 1; + parse_cmd_get(opt, line, output, data, revs); + if (opt->buffer_output) + fflush(stdout); +} + +enum batch_state { + BATCH_STATE_COMMAND, + BATCH_STATE_INPUT, +}; + +typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *, + struct strbuf *, struct expand_data *, + struct string_list revs); + +static const struct parse_cmd { + const char *prefix; + parse_cmd_fn_t fn; + unsigned takes_args; + enum batch_state next_state; +} commands[] = { + { "contents", parse_cmd_object, 1, BATCH_STATE_COMMAND}, + { "info", parse_cmd_info, 1, BATCH_STATE_COMMAND}, + { "begin", parse_cmd_begin, 0, BATCH_STATE_INPUT}, + { "get info", parse_cmd_get_info, 0, BATCH_STATE_COMMAND}, + { "get contents", parse_cmd_get_objects, 0, BATCH_STATE_COMMAND}, +}; + +static void batch_objects_command(struct batch_options *opt, + struct strbuf *output, + struct expand_data *data) +{ + struct strbuf input = STRBUF_INIT; + enum batch_state state = BATCH_STATE_COMMAND; + struct string_list revs = STRING_LIST_INIT_DUP; + + /* Read each line dispatch its command */ + while (!strbuf_getline(&input, stdin)) { + int i; + const struct parse_cmd *cmd = NULL; + const char *p, *cmd_end; + + if (state == BATCH_STATE_COMMAND) { + if (*input.buf == '\n') + die("empty command in input"); + else if (isspace(*input.buf)) + die("whitespace before command: %s", input.buf); + } + + for (i = 0; i < ARRAY_SIZE(commands); i++) { + const char *prefix = commands[i].prefix; + char c; + if (!skip_prefix(input.buf, prefix, &cmd_end)) + continue; + /* + * If the command has arguments, verify that it's + * followed by a space. Otherwise, it shall be followed + * by a line terminator. + */ + c = commands[i].takes_args ? ' ' : '\n'; + if (*cmd_end && *cmd_end != c) + die("arguments invalid for command: %s", commands[i].prefix); + + cmd = &commands[i]; + if (cmd->takes_args) + p = cmd_end + 1; + break; + } + + if (input.buf[input.len - 1] == '\n') + input.buf[--input.len] = '\0'; + + if (state == BATCH_STATE_INPUT && !cmd){ + string_list_append(&revs, input.buf); + continue; + } + + if (!cmd) + die("unknown command: %s", input.buf); + + state = cmd->next_state; + cmd->fn(opt, p, output, data, revs); + } + strbuf_release(&input); + string_list_clear(&revs, 0); +} + static int batch_objects(struct batch_options *opt) { struct strbuf input = STRBUF_INIT; @@ -519,6 +665,7 @@ static int batch_objects(struct batch_options *opt) struct expand_data data; int save_warning; int retval = 0; + const int command = opt->command; if (!opt->format) opt->format = "%(objectname) %(objecttype) %(objectsize)"; @@ -594,22 +741,25 @@ static int batch_objects(struct batch_options *opt) save_warning = warn_on_object_refname_ambiguity; warn_on_object_refname_ambiguity = 0; - while (strbuf_getline(&input, stdin) != EOF) { - if (data.split_on_whitespace) { - /* - * Split at first whitespace, tying off the beginning - * of the string and saving the remainder (or NULL) in - * data.rest. - */ - char *p = strpbrk(input.buf, " \t"); - if (p) { - while (*p && strchr(" \t", *p)) - *p++ = '\0'; + if (command) + batch_objects_command(opt, &output, &data); + else { + while (strbuf_getline(&input, stdin) != EOF) { + if (data.split_on_whitespace) { + /* + * Split at first whitespace, tying off the beginning + * of the string and saving the remainder (or NULL) in + * data.rest. + */ + char *p = strpbrk(input.buf, " \t"); + if (p) { + while (*p && strchr(" \t", *p)) + *p++ = '\0'; + } + data.rest = p; } - data.rest = p; + batch_one_object(input.buf, &output, opt, &data); } - - batch_one_object(input.buf, &output, opt, &data); } strbuf_release(&input); @@ -646,6 +796,7 @@ static int batch_option_callback(const struct option *opt, bo->enabled = 1; bo->print_contents = !strcmp(opt->long_name, "batch"); + bo->command = !strcmp(opt->long_name, "batch-command"); bo->format = arg; return 0; @@ -682,6 +833,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) N_("show info about objects fed from the standard input"), PARSE_OPT_OPTARG | PARSE_OPT_NONEG, batch_option_callback), + OPT_CALLBACK_F(0, "batch-command", &batch, N_(""), + N_("enters batch mode that accepts commands"), + PARSE_OPT_OPTARG | PARSE_OPT_NONEG, + batch_option_callback), + OPT_BOOL(0, "follow-symlinks", &batch.follow_symlinks, N_("follow in-tree symlinks (used with --batch or --batch-check)")), OPT_BOOL(0, "batch-all-objects", &batch.all_objects, diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index 39382fa1958..7360d049113 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -85,6 +85,34 @@ $content" test_cmp expect actual ' + test -z "$content" || + test_expect_success "--batch-command output of $type content is correct" ' + maybe_remove_timestamp "$batch_output" $no_ts >expect && + maybe_remove_timestamp "$(echo contents $sha1 | git cat-file --batch-command)" $no_ts >actual && + test_cmp expect actual + ' + + test -z "$content" || + test_expect_success "--batch-command session for $type content is correct" ' + maybe_remove_timestamp "$batch_output" $no_ts >expect && + maybe_remove_timestamp \ + "$(test_write_lines "begin" "$sha1" "get contents" | git cat-file --batch-command)" \ + $no_ts >actual && + test_cmp expect actual + ' + + test_expect_success "--batch-command output of $type info is correct" ' + echo "$sha1 $type $size" >expect && + echo "info $sha1" | git cat-file --batch-command >actual && + test_cmp expect actual + ' + + test_expect_success "--batch-command session for $type info is correct" ' + echo "$sha1 $type $size" >expect && + test_write_lines "begin" "$sha1" "get info" | git cat-file --batch-command >actual && + test_cmp expect actual + ' + test_expect_success "custom --batch-check format" ' echo "$type $sha1" >expect && echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual && @@ -141,6 +169,7 @@ test_expect_success '--batch-check without %(rest) considers whole line' ' ' tree_sha1=$(git write-tree) + tree_size=$(($(test_oid rawsz) + 13)) tree_pretty_content="100644 blob $hello_sha1 hello" @@ -175,7 +204,7 @@ test_expect_success \ "Reach a blob from a tag pointing to it" \ "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\"" -for batch in batch batch-check +for batch in batch batch-check batch-command do for opt in t s e p do @@ -281,6 +310,15 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" ' "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)" ' +test_expect_success "--batch-command with multiple sha1s gives correct format" ' + echo "$batch_check_output" >expect && + echo begin >input && + echo_without_newline "$batch_check_input" >>input && + echo "get info" >>input && + git cat-file --batch-command <input >actual && + test_cmp expect actual +' + test_expect_success 'setup blobs which are likely to delta' ' test-tool genrandom foo 10240 >foo && { cat foo && echo plus; } >foo-plus && @@ -872,4 +910,10 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace' test_cmp expect actual ' +test_expect_success 'batch-command unknown command' ' + echo unknown_command >cmd && + test_expect_code 128 git cat-file --batch-command < cmd 2>err && + grep -E "^fatal:.*unknown command.*" err +' + test_done