Message ID | 2d687baeed82e7b90d383bad8e209f50e0ce8c87.1636149400.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cat-file: force flush of stdout on empty string | expand |
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes: > @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name, > int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0; > enum get_oid_result result; > > + if (opt->buffer_output && obj_name[0] == '\0') { > + fflush(stdout); > + return; > + } > + This might work in practice, but it a bad design taste to add this change here. The function is designed to take an object name string, and it even prepares a flag variable needed to make a call to turn that object name into object data. We do not need to contaminate the interface with "usually this takes an object name, but there are these other special cases ...". The higher in the callchain we place special cases, the better the lower level functions become, as that allows them to concentrate on doing one single thing well. > result = get_oid_with_context(the_repository, obj_name, > flags, &data->oid, &ctx); > if (result != FOUND) { > @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt) > data.rest = p; > } > > - batch_one_object(input.buf, &output, opt, &data); > + /* > + * When in buffer mode and input.buf is an empty string, > + * flush to stdout. > + */ Checking "do we have the flush instruction (in which case we'd do the flush here), or do we have textual name of an object (in which case we'd call batch_one_object())?" here would be far cleaner and results in an easier-to-explain code. With a cleanly written code to do so, it probably does not even need a new comment here. This brings up another issue. Is "flushing" the *ONLY* special thing we would ever do in this codepath in the future? I doubt so. Squatting on an "empty string" is a selfish design that hurts those who will come after you in the future, as they need to find other ways to ask for a "special thing". If we are inventing a special syntax that allows us to spell commands that are distinguishable from a validly-spelled object name to cause something special (like "flushing the output stream"), perhaps we want to use a bit more extensible and explicit syntax and use it from day one? For example, if no string that begins with three dots can ever be a valid way to spell an object name, perhaps "...flush" might be a better "please do this special thing" syntax than an empty string. It is easily extensible (the next special thing can follow suit to say "...$verb" to tell the machinery to $verb the input). When we compare between an empty string and "...flush", the latter clearly is more descriptive, too. Note that I offhand do not know if "a valid string that name an object would never begin with three-dot" is true. Please check if that is true if you choose to use it, or you can find and use another convention that allows us to clearly distinguish the "special" instruction and object names. Thanks. > + batch_one_object(input.buf, &output, opt, &data); > } > > strbuf_release(&input);
On Fri, Nov 05 2021, Junio C Hamano wrote: > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name, >> int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0; >> enum get_oid_result result; >> >> + if (opt->buffer_output && obj_name[0] == '\0') { >> + fflush(stdout); >> + return; >> + } >> + > > This might work in practice, but it a bad design taste to add this > change here. The function is designed to take an object name > string, and it even prepares a flag variable needed to make a call > to turn that object name into object data. We do not need to > contaminate the interface with "usually this takes an object name, > but there are these other special cases ...". The higher in the > callchain we place special cases, the better the lower level > functions become, as that allows them to concentrate on doing one > single thing well. > >> result = get_oid_with_context(the_repository, obj_name, >> flags, &data->oid, &ctx); >> if (result != FOUND) { >> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt) >> data.rest = p; >> } >> >> - batch_one_object(input.buf, &output, opt, &data); >> + /* >> + * When in buffer mode and input.buf is an empty string, >> + * flush to stdout. >> + */ > > Checking "do we have the flush instruction (in which case we'd do > the flush here), or do we have textual name of an object (in which > case we'd call batch_one_object())?" here would be far cleaner and > results in an easier-to-explain code. With a cleanly written code > to do so, it probably does not even need a new comment here. > > This brings up another issue. Is "flushing" the *ONLY* special > thing we would ever do in this codepath in the future? I doubt so. > Squatting on an "empty string" is a selfish design that hurts those > who will come after you in the future, as they need to find other > ways to ask for a "special thing". > > If we are inventing a special syntax that allows us to spell > commands that are distinguishable from a validly-spelled object name > to cause something special (like "flushing the output stream"), > perhaps we want to use a bit more extensible and explicit syntax and > use it from day one? > > For example, if no string that begins with three dots can ever be a > valid way to spell an object name, perhaps "...flush" might be a > better "please do this special thing" syntax than an empty string. > It is easily extensible (the next special thing can follow suit to > say "...$verb" to tell the machinery to $verb the input). When we > compare between an empty string and "...flush", the latter clearly > is more descriptive, too. > > Note that I offhand do not know if "a valid string that name an > object would never begin with three-dot" is true. Please check > if that is true if you choose to use it, or you can find and use > another convention that allows us to clearly distinguish the > "special" instruction and object names. I had much the same thought, this is a useful feature, but let's not squat on the one bit of open syntax we have. John: I think a better direction here is to add a mode to cat-file to emulate what "git update-ref --stdin" supports. Here's a demo of that (also quoted below): https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 That's on top of some general UI improvements to cat-file I've got locally: https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling That WIP patch on top follows below, of course it's a *lot* more initial scaffolding, but I think once we get past that initial step it's a much better path forward. As noted the code is also almost entirely copy/pasted from update-ref.c, and perhaps some of the shared parts could be moved to some library both could use. I couldn't think of a better name than --stdin-cmd, suggestions most welcome. From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001 Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@gmail.com> From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= <avarab@gmail.com> Date: Sat, 6 Nov 2021 04:54:04 +0100 Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This WIP patch is mostly stealing code from builtin/update-ref.c and implementing the same sort of prefixed command-mode that it supports. I.e. in addition to --batch now supporting: <object> LF It'll support with --stdin-cmd, with and without -z, respectively: object <object> NL object <object> NUL The plus being that we can now implement additional commands. Let's start that by scratching the itch John Cai wanted to address in [1] and implement a (with and without -z): fflush NL fflush NUL That command simply calls fflush(stdout), which could be done as an emergent effect before by feeding the input a "NL". I think this will be useful for other things, e.g. I've observed in the past that a not-trivial part of "cat-file --batch" time is spent on parsing its <object> argument and seeing if it's a revision, ref etc. So we could e.g. add a command that only accepts a full-length 40 character SHA-1, or switch the --format output mid-request etc. 1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 115 insertions(+), 1 deletion(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index b76f2a00046..afdb976c6e7 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -26,7 +26,10 @@ struct batch_options { int unordered; int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ const char *format; + int stdin_cmd; + int end_null; }; +static char line_termination = '\n'; static const char *force_path; @@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid, data); } +enum batch_state { + /* Non-transactional state open for commands. */ + BATCH_STATE_OPEN, +}; + +static void parse_cmd_object(struct batch_options *opt, + const char *next, const char *end, + struct strbuf *output, + struct expand_data *data) +{ + size_t len = end - next - 1; + char *p = (char *)next; + char old = p[len]; + + p[len] = '\0'; + batch_one_object(next, output, opt, data); + p[len] = old; +} + +static void parse_cmd_fflush(struct batch_options *opt, + const char *next, const char *end, + struct strbuf *output, + struct expand_data *data) +{ + if (*next != line_termination) + die("fflush: extra input: %s", next); + fflush(stdout); +} + +static const struct parse_cmd { + const char *prefix; + void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *); + unsigned args; + enum batch_state state; +} command[] = { + { "object", parse_cmd_object, 1, BATCH_STATE_OPEN }, + { "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN }, +}; + +static void batch_objects_stdin_cmd(struct batch_options *opt, + struct strbuf *output, + struct expand_data *data) +{ + struct strbuf input = STRBUF_INIT; + enum batch_state state = BATCH_STATE_OPEN; + + /* Read each line dispatch its command */ + while (!strbuf_getwholeline(&input, stdin, line_termination)) { + size_t i, j; + const struct parse_cmd *cmd = NULL; + + if (*input.buf == line_termination) + die("empty command in input"); + else if (isspace(*input.buf)) + die("whitespace before command: %s", input.buf); + + for (i = 0; i < ARRAY_SIZE(command); i++) { + const char *prefix = command[i].prefix; + char c; + + if (!starts_with(input.buf, prefix)) + continue; + + /* + * If the command has arguments, verify that it's + * followed by a space. Otherwise, it shall be followed + * by a line terminator. + */ + c = command[i].args ? ' ' : line_termination; + if (input.buf[strlen(prefix)] != c) + continue; + + cmd = &command[i]; + break; + } + if (!cmd) + die("unknown command: %s", input.buf); + + /* + * Read additional arguments if NUL-terminated. Do not raise an + * error in case there is an early EOF to let the command + * handle missing arguments with a proper error message. + */ + for (j = 1; line_termination == '\0' && j < cmd->args; j++) + if (strbuf_appendwholeline(&input, stdin, line_termination)) + break; + + switch (state) { + case BATCH_STATE_OPEN: + /* TODO: command state management */ + break; + } + + cmd->fn(opt, input.buf + strlen(cmd->prefix) + !!cmd->args, + input.buf + input.len, output, data); + } + + strbuf_release(&input); +} + static int batch_objects(struct batch_options *opt) { struct strbuf input = STRBUF_INIT; @@ -514,6 +617,7 @@ static int batch_objects(struct batch_options *opt) struct expand_data data; int save_warning; int retval = 0; + const int stdin_cmd = opt->stdin_cmd; if (!opt->format) opt->format = "%(objectname) %(objecttype) %(objectsize)"; @@ -589,7 +693,8 @@ static int batch_objects(struct batch_options *opt) save_warning = warn_on_object_refname_ambiguity; warn_on_object_refname_ambiguity = 0; - while (strbuf_getline(&input, stdin) != EOF) { + while (!stdin_cmd && + strbuf_getline(&input, stdin) != EOF) { if (data.split_on_whitespace) { /* * Split at first whitespace, tying off the beginning @@ -607,6 +712,9 @@ static int batch_objects(struct batch_options *opt) batch_one_object(input.buf, &output, opt, &data); } + if (stdin_cmd) + batch_objects_stdin_cmd(opt, &output, &data); + strbuf_release(&input); strbuf_release(&output); warn_on_object_refname_ambiguity = save_warning; @@ -684,6 +792,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) batch_option_callback), OPT_CMDMODE(0, "batch-all-objects", &opt, N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'), + OPT_BOOL(0, "stdin-cmd", &batch.stdin_cmd, + N_("with --batch[-check]: enters stdin 'command mode")), + OPT_BOOL('z', NULL, &batch.end_null, N_("with --stdin-cmd, use NUL termination")), + /* Batch-specific options */ OPT_GROUP(N_("Change or optimize batch output")), OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")), @@ -737,6 +849,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) /* Batch defaults */ if (batch.buffer_output < 0) batch.buffer_output = batch.all_objects; + if (batch.end_null) + line_termination = '\0'; /* Return early if we're in batch mode? */ if (batch.enabled) {
O Sat, Nov 06, 2021 at 05:01:10AM +0100, Ævar Arnfjörð Bjarmason wrote: > > On Fri, Nov 05 2021, Junio C Hamano wrote: > > > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > >> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name, > >> int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0; > >> enum get_oid_result result; > >> > >> + if (opt->buffer_output && obj_name[0] == '\0') { > >> + fflush(stdout); > >> + return; > >> + } > >> + > > > > This might work in practice, but it a bad design taste to add this > > change here. The function is designed to take an object name > > string, and it even prepares a flag variable needed to make a call > > to turn that object name into object data. We do not need to > > contaminate the interface with "usually this takes an object name, > > but there are these other special cases ...". The higher in the > > callchain we place special cases, the better the lower level > > functions become, as that allows them to concentrate on doing one > > single thing well. > > > >> result = get_oid_with_context(the_repository, obj_name, > >> flags, &data->oid, &ctx); > >> if (result != FOUND) { > >> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt) > >> data.rest = p; > >> } > >> > >> - batch_one_object(input.buf, &output, opt, &data); > >> + /* > >> + * When in buffer mode and input.buf is an empty string, > >> + * flush to stdout. > >> + */ > > > > Checking "do we have the flush instruction (in which case we'd do > > the flush here), or do we have textual name of an object (in which > > case we'd call batch_one_object())?" here would be far cleaner and > > results in an easier-to-explain code. With a cleanly written code > > to do so, it probably does not even need a new comment here. > > > > This brings up another issue. Is "flushing" the *ONLY* special > > thing we would ever do in this codepath in the future? I doubt so. > > Squatting on an "empty string" is a selfish design that hurts those > > who will come after you in the future, as they need to find other > > ways to ask for a "special thing". > > > > If we are inventing a special syntax that allows us to spell > > commands that are distinguishable from a validly-spelled object name > > to cause something special (like "flushing the output stream"), > > perhaps we want to use a bit more extensible and explicit syntax and > > use it from day one? > > > > For example, if no string that begins with three dots can ever be a > > valid way to spell an object name, perhaps "...flush" might be a > > better "please do this special thing" syntax than an empty string. > > It is easily extensible (the next special thing can follow suit to > > say "...$verb" to tell the machinery to $verb the input). When we > > compare between an empty string and "...flush", the latter clearly > > is more descriptive, too. > > > > Note that I offhand do not know if "a valid string that name an > > object would never begin with three-dot" is true. Please check > > if that is true if you choose to use it, or you can find and use > > another convention that allows us to clearly distinguish the > > "special" instruction and object names. > > I had much the same thought, this is a useful feature, but let's not > squat on the one bit of open syntax we have. > > John: I think a better direction here is to add a mode to cat-file to > emulate what "git update-ref --stdin" supports. Here's a demo of that > (also quoted below): > https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 > > That's on top of some general UI improvements to cat-file I've got > locally: > https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling > > That WIP patch on top follows below, of course it's a *lot* more initial > scaffolding, but I think once we get past that initial step it's a much > better path forward. As noted the code is also almost entirely > copy/pasted from update-ref.c, and perhaps some of the shared parts > could be moved to some library both could use. > > I couldn't think of a better name than --stdin-cmd, suggestions most > welcome. > > From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001 > Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@gmail.com> > From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= > <avarab@gmail.com> > Date: Sat, 6 Nov 2021 04:54:04 +0100 > Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > This WIP patch is mostly stealing code from builtin/update-ref.c and > implementing the same sort of prefixed command-mode that it > supports. I.e. in addition to --batch now supporting: > > <object> LF > > It'll support with --stdin-cmd, with and without -z, respectively: > > object <object> NL > object <object> NUL > > The plus being that we can now implement additional commands. Let's > start that by scratching the itch John Cai wanted to address in [1] > and implement a (with and without -z): > > fflush NL > fflush NUL > > That command simply calls fflush(stdout), which could be done as an > emergent effect before by feeding the input a "NL". > > I think this will be useful for other things, e.g. I've observed in > the past that a not-trivial part of "cat-file --batch" time is spent > on parsing its <object> argument and seeing if it's a revision, ref > etc. > > So we could e.g. add a command that only accepts a full-length 40 > character SHA-1, or switch the --format output mid-request etc. > > 1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/ > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 115 insertions(+), 1 deletion(-) > > diff --git a/builtin/cat-file.c b/builtin/cat-file.c > index b76f2a00046..afdb976c6e7 100644 > --- a/builtin/cat-file.c > +++ b/builtin/cat-file.c > @@ -26,7 +26,10 @@ struct batch_options { > int unordered; > int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ > const char *format; > + int stdin_cmd; > + int end_null; > }; > +static char line_termination = '\n'; > > static const char *force_path; > > @@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid, > data); > } > > +enum batch_state { > + /* Non-transactional state open for commands. */ > + BATCH_STATE_OPEN, > +}; > + > +static void parse_cmd_object(struct batch_options *opt, > + const char *next, const char *end, > + struct strbuf *output, > + struct expand_data *data) > +{ > + size_t len = end - next - 1; > + char *p = (char *)next; > + char old = p[len]; > + > + p[len] = '\0'; > + batch_one_object(next, output, opt, data); > + p[len] = old; > +} > + > +static void parse_cmd_fflush(struct batch_options *opt, > + const char *next, const char *end, > + struct strbuf *output, > + struct expand_data *data) > +{ > + if (*next != line_termination) > + die("fflush: extra input: %s", next); > + fflush(stdout); > +} > + > +static const struct parse_cmd { > + const char *prefix; > + void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *); > + unsigned args; > + enum batch_state state; > +} command[] = { > + { "object", parse_cmd_object, 1, BATCH_STATE_OPEN }, > + { "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN }, > +}; I think overall this approach is cleaner and makes sense. My only question is, are there more commands in the future that will need some special command syntax? Just wondering whether YAGNI applies here. > + > +static void batch_objects_stdin_cmd(struct batch_options *opt, > + struct strbuf *output, > + struct expand_data *data) > +{ > + struct strbuf input = STRBUF_INIT; > + enum batch_state state = BATCH_STATE_OPEN; > + > + /* Read each line dispatch its command */ > + while (!strbuf_getwholeline(&input, stdin, line_termination)) { > + size_t i, j; > + const struct parse_cmd *cmd = NULL; > + > + if (*input.buf == line_termination) > + die("empty command in input"); > + else if (isspace(*input.buf)) > + die("whitespace before command: %s", input.buf); > + > + for (i = 0; i < ARRAY_SIZE(command); i++) { > + const char *prefix = command[i].prefix; > + char c; > + > + if (!starts_with(input.buf, prefix)) > + continue; > + > + /* > + * If the command has arguments, verify that it's > + * followed by a space. Otherwise, it shall be followed > + * by a line terminator. > + */ > + c = command[i].args ? ' ' : line_termination; > + if (input.buf[strlen(prefix)] != c) > + continue; > + > + cmd = &command[i]; > + break; > + } > + if (!cmd) > + die("unknown command: %s", input.buf); > + > + /* > + * Read additional arguments if NUL-terminated. Do not raise an > + * error in case there is an early EOF to let the command > + * handle missing arguments with a proper error message. > + */ > + for (j = 1; line_termination == '\0' && j < cmd->args; j++) > + if (strbuf_appendwholeline(&input, stdin, line_termination)) > + break; > + > + switch (state) { > + case BATCH_STATE_OPEN: > + /* TODO: command state management */ > + break; > + } > + > + cmd->fn(opt, input.buf + strlen(cmd->prefix) + !!cmd->args, > + input.buf + input.len, output, data); > + } > + > + strbuf_release(&input); > +} > + > static int batch_objects(struct batch_options *opt) > { > struct strbuf input = STRBUF_INIT; > @@ -514,6 +617,7 @@ static int batch_objects(struct batch_options *opt) > struct expand_data data; > int save_warning; > int retval = 0; > + const int stdin_cmd = opt->stdin_cmd; > > if (!opt->format) > opt->format = "%(objectname) %(objecttype) %(objectsize)"; > @@ -589,7 +693,8 @@ static int batch_objects(struct batch_options *opt) > save_warning = warn_on_object_refname_ambiguity; > warn_on_object_refname_ambiguity = 0; > > - while (strbuf_getline(&input, stdin) != EOF) { > + while (!stdin_cmd && > + strbuf_getline(&input, stdin) != EOF) { > if (data.split_on_whitespace) { > /* > * Split at first whitespace, tying off the beginning > @@ -607,6 +712,9 @@ static int batch_objects(struct batch_options *opt) > batch_one_object(input.buf, &output, opt, &data); > } > > + if (stdin_cmd) > + batch_objects_stdin_cmd(opt, &output, &data); > + > strbuf_release(&input); > strbuf_release(&output); > warn_on_object_refname_ambiguity = save_warning; > @@ -684,6 +792,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) > batch_option_callback), > OPT_CMDMODE(0, "batch-all-objects", &opt, > N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'), > + OPT_BOOL(0, "stdin-cmd", &batch.stdin_cmd, > + N_("with --batch[-check]: enters stdin 'command mode")), > + OPT_BOOL('z', NULL, &batch.end_null, N_("with --stdin-cmd, use NUL termination")), > + > /* Batch-specific options */ > OPT_GROUP(N_("Change or optimize batch output")), > OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")), > @@ -737,6 +849,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) > /* Batch defaults */ > if (batch.buffer_output < 0) > batch.buffer_output = batch.all_objects; > + if (batch.end_null) > + line_termination = '\0'; > > /* Return early if we're in batch mode? */ > if (batch.enabled) { > -- > 2.34.0.rc1.741.gab7bfd97031 >
On Sun, Nov 07 2021, John Cai wrote: > O Sat, Nov 06, 2021 at 05:01:10AM +0100, Ævar Arnfjörð Bjarmason wrote: >> >> On Fri, Nov 05 2021, Junio C Hamano wrote: >> >> > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes: >> > >> >> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name, >> >> int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0; >> >> enum get_oid_result result; >> >> >> >> + if (opt->buffer_output && obj_name[0] == '\0') { >> >> + fflush(stdout); >> >> + return; >> >> + } >> >> + >> > >> > This might work in practice, but it a bad design taste to add this >> > change here. The function is designed to take an object name >> > string, and it even prepares a flag variable needed to make a call >> > to turn that object name into object data. We do not need to >> > contaminate the interface with "usually this takes an object name, >> > but there are these other special cases ...". The higher in the >> > callchain we place special cases, the better the lower level >> > functions become, as that allows them to concentrate on doing one >> > single thing well. >> > >> >> result = get_oid_with_context(the_repository, obj_name, >> >> flags, &data->oid, &ctx); >> >> if (result != FOUND) { >> >> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt) >> >> data.rest = p; >> >> } >> >> >> >> - batch_one_object(input.buf, &output, opt, &data); >> >> + /* >> >> + * When in buffer mode and input.buf is an empty string, >> >> + * flush to stdout. >> >> + */ >> > >> > Checking "do we have the flush instruction (in which case we'd do >> > the flush here), or do we have textual name of an object (in which >> > case we'd call batch_one_object())?" here would be far cleaner and >> > results in an easier-to-explain code. With a cleanly written code >> > to do so, it probably does not even need a new comment here. >> > >> > This brings up another issue. Is "flushing" the *ONLY* special >> > thing we would ever do in this codepath in the future? I doubt so. >> > Squatting on an "empty string" is a selfish design that hurts those >> > who will come after you in the future, as they need to find other >> > ways to ask for a "special thing". >> > >> > If we are inventing a special syntax that allows us to spell >> > commands that are distinguishable from a validly-spelled object name >> > to cause something special (like "flushing the output stream"), >> > perhaps we want to use a bit more extensible and explicit syntax and >> > use it from day one? >> > >> > For example, if no string that begins with three dots can ever be a >> > valid way to spell an object name, perhaps "...flush" might be a >> > better "please do this special thing" syntax than an empty string. >> > It is easily extensible (the next special thing can follow suit to >> > say "...$verb" to tell the machinery to $verb the input). When we >> > compare between an empty string and "...flush", the latter clearly >> > is more descriptive, too. >> > >> > Note that I offhand do not know if "a valid string that name an >> > object would never begin with three-dot" is true. Please check >> > if that is true if you choose to use it, or you can find and use >> > another convention that allows us to clearly distinguish the >> > "special" instruction and object names. >> >> I had much the same thought, this is a useful feature, but let's not >> squat on the one bit of open syntax we have. >> >> John: I think a better direction here is to add a mode to cat-file to >> emulate what "git update-ref --stdin" supports. Here's a demo of that >> (also quoted below): >> https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 >> >> That's on top of some general UI improvements to cat-file I've got >> locally: >> https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling >> >> That WIP patch on top follows below, of course it's a *lot* more initial >> scaffolding, but I think once we get past that initial step it's a much >> better path forward. As noted the code is also almost entirely >> copy/pasted from update-ref.c, and perhaps some of the shared parts >> could be moved to some library both could use. >> >> I couldn't think of a better name than --stdin-cmd, suggestions most >> welcome. >> >> From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001 >> Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@gmail.com> >> From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= >> <avarab@gmail.com> >> Date: Sat, 6 Nov 2021 04:54:04 +0100 >> Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode >> MIME-Version: 1.0 >> Content-Type: text/plain; charset=UTF-8 >> Content-Transfer-Encoding: 8bit >> >> This WIP patch is mostly stealing code from builtin/update-ref.c and >> implementing the same sort of prefixed command-mode that it >> supports. I.e. in addition to --batch now supporting: >> >> <object> LF >> >> It'll support with --stdin-cmd, with and without -z, respectively: >> >> object <object> NL >> object <object> NUL >> >> The plus being that we can now implement additional commands. Let's >> start that by scratching the itch John Cai wanted to address in [1] >> and implement a (with and without -z): >> >> fflush NL >> fflush NUL >> >> That command simply calls fflush(stdout), which could be done as an >> emergent effect before by feeding the input a "NL". >> >> I think this will be useful for other things, e.g. I've observed in >> the past that a not-trivial part of "cat-file --batch" time is spent >> on parsing its <object> argument and seeing if it's a revision, ref >> etc. >> >> So we could e.g. add a command that only accepts a full-length 40 >> character SHA-1, or switch the --format output mid-request etc. >> >> 1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/ >> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >> --- >> builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 115 insertions(+), 1 deletion(-) >> >> diff --git a/builtin/cat-file.c b/builtin/cat-file.c >> index b76f2a00046..afdb976c6e7 100644 >> --- a/builtin/cat-file.c >> +++ b/builtin/cat-file.c >> @@ -26,7 +26,10 @@ struct batch_options { >> int unordered; >> int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ >> const char *format; >> + int stdin_cmd; >> + int end_null; >> }; >> +static char line_termination = '\n'; >> >> static const char *force_path; >> >> @@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid, >> data); >> } >> >> +enum batch_state { >> + /* Non-transactional state open for commands. */ >> + BATCH_STATE_OPEN, >> +}; >> + >> +static void parse_cmd_object(struct batch_options *opt, >> + const char *next, const char *end, >> + struct strbuf *output, >> + struct expand_data *data) >> +{ >> + size_t len = end - next - 1; >> + char *p = (char *)next; >> + char old = p[len]; >> + >> + p[len] = '\0'; >> + batch_one_object(next, output, opt, data); >> + p[len] = old; >> +} >> + >> +static void parse_cmd_fflush(struct batch_options *opt, >> + const char *next, const char *end, >> + struct strbuf *output, >> + struct expand_data *data) >> +{ >> + if (*next != line_termination) >> + die("fflush: extra input: %s", next); >> + fflush(stdout); >> +} >> + >> +static const struct parse_cmd { >> + const char *prefix; >> + void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *); >> + unsigned args; >> + enum batch_state state; >> +} command[] = { >> + { "object", parse_cmd_object, 1, BATCH_STATE_OPEN }, >> + { "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN }, >> +}; > I think overall this approach is cleaner and makes sense. My only > question is, are there more commands in the future that will need some > special command syntax? Just wondering whether YAGNI applies here. An obvious addition is to at least add the ability to set the various options on the fly, i.e. now you need to use --batch-check, and then kill it and restart if you'd like the content with --batch, ditto for --textconv. E.g. the gitaly backend for gitlab.com keeps two cat-filfe processes around just to flip-flop between those two, sometimes you want the content, sometimes you're just checking if the object exists. I'd also like to add something to expose the likes of -e and -t directly, i.e. even with --batch-check you often want to just check existence, but get the size too, you could supply a format, but like the above you sometimes want the size or whatever, and killing/starting a new process just for that is a hassle...
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> I think overall this approach is cleaner and makes sense. My only >> question is, are there more commands in the future that will need some >> special command syntax? Just wondering whether YAGNI applies here. > > An obvious addition is to at least add the ability to set the various > options on the fly, i.e. now you need to use --batch-check, and then > kill it and restart if you'd like the content with --batch, ditto for > --textconv. > > E.g. the gitaly backend for gitlab.com keeps two cat-filfe processes > around just to flip-flop between those two, sometimes you want the > content, sometimes you're just checking if the object exists. > > I'd also like to add something to expose the likes of -e and -t > directly, i.e. even with --batch-check you often want to just check > existence, but get the size too, you could supply a format, but like the > above you sometimes want the size or whatever, and killing/starting a > new process just for that is a hassle... Yeah, with "plug" and "unplug" instruction you do not have to keep issuing "flush" when you want to go interactive, and other things become easy to do, so even though it would make it a bit more verbose to require "object " prefix for the kind of lines that were historically the only ones accepted by the command, I think it is a good direction to go in. Thanks.
diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 86fc03242b8..4d17f30f24e 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name, int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0; enum get_oid_result result; + if (opt->buffer_output && obj_name[0] == '\0') { + fflush(stdout); + return; + } + result = get_oid_with_context(the_repository, obj_name, flags, &data->oid, &ctx); if (result != FOUND) { @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt) data.rest = p; } - batch_one_object(input.buf, &output, opt, &data); + /* + * When in buffer mode and input.buf is an empty string, + * flush to stdout. + */ + batch_one_object(input.buf, &output, opt, &data); } strbuf_release(&input);