diff mbox series

[1/2] cat-file: force flush of stdout on empty string

Message ID 2d687baeed82e7b90d383bad8e209f50e0ce8c87.1636149400.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series cat-file: force flush of stdout on empty string | expand

Commit Message

John Cai Nov. 5, 2021, 9:56 p.m. UTC
From: John Cai <johncai86@gmail.com>

When in --buffer mode, it is very useful for the caller to have control
over when the buffer is flushed. Currently there is no convenient way to
signal for the buffer to be flushed. One workaround is to provide any
nonexisting commit to git-cat-file's stdin, in which case the buffer
will be flushed and a "$FOO missing" message will be displayed. However,
this is not an ideal workaround.

Instead, this commit teaches git-cat-file to look for an empty string in
stdin, which will trigger a flush of stdout.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Nov. 6, 2021, 12:58 a.m. UTC | #1
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name,
>  	int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
>  	enum get_oid_result result;
>  
> +	if (opt->buffer_output && obj_name[0] == '\0') {
> +		fflush(stdout);
> +		return;
> +	}
> +

This might work in practice, but it a bad design taste to add this
change here.  The function is designed to take an object name
string, and it even prepares a flag variable needed to make a call
to turn that object name into object data.  We do not need to
contaminate the interface with "usually this takes an object name,
but there are these other special cases ...".  The higher in the
callchain we place special cases, the better the lower level
functions become, as that allows them to concentrate on doing one
single thing well.

>  	result = get_oid_with_context(the_repository, obj_name,
>  				      flags, &data->oid, &ctx);
>  	if (result != FOUND) {
> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt)
>  			data.rest = p;
>  		}
>  
> -		batch_one_object(input.buf, &output, opt, &data);
> +		 /*
> +		  * When in buffer mode and input.buf is an empty string,
> +		  * flush to stdout.
> +		  */

Checking "do we have the flush instruction (in which case we'd do
the flush here), or do we have textual name of an object (in which
case we'd call batch_one_object())?" here would be far cleaner and
results in an easier-to-explain code.  With a cleanly written code
to do so, it probably does not even need a new comment here.

This brings up another issue.  Is "flushing" the *ONLY* special
thing we would ever do in this codepath in the future?  I doubt so.
Squatting on an "empty string" is a selfish design that hurts those
who will come after you in the future, as they need to find other
ways to ask for a "special thing".

If we are inventing a special syntax that allows us to spell
commands that are distinguishable from a validly-spelled object name
to cause something special (like "flushing the output stream"),
perhaps we want to use a bit more extensible and explicit syntax and
use it from day one?

For example, if no string that begins with three dots can ever be a
valid way to spell an object name, perhaps "...flush" might be a
better "please do this special thing" syntax than an empty string.
It is easily extensible (the next special thing can follow suit to
say "...$verb" to tell the machinery to $verb the input).  When we
compare between an empty string and "...flush", the latter clearly
is more descriptive, too.

Note that I offhand do not know if "a valid string that name an
object would never begin with three-dot" is true.  Please check
if that is true if you choose to use it, or you can find and use
another convention that allows us to clearly distinguish the
"special" instruction and object names.

Thanks.

> +		 batch_one_object(input.buf, &output, opt, &data);
>  	}
>  
>  	strbuf_release(&input);
Ævar Arnfjörð Bjarmason Nov. 6, 2021, 4:01 a.m. UTC | #2
On Fri, Nov 05 2021, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name,
>>  	int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
>>  	enum get_oid_result result;
>>  
>> +	if (opt->buffer_output && obj_name[0] == '\0') {
>> +		fflush(stdout);
>> +		return;
>> +	}
>> +
>
> This might work in practice, but it a bad design taste to add this
> change here.  The function is designed to take an object name
> string, and it even prepares a flag variable needed to make a call
> to turn that object name into object data.  We do not need to
> contaminate the interface with "usually this takes an object name,
> but there are these other special cases ...".  The higher in the
> callchain we place special cases, the better the lower level
> functions become, as that allows them to concentrate on doing one
> single thing well.
>
>>  	result = get_oid_with_context(the_repository, obj_name,
>>  				      flags, &data->oid, &ctx);
>>  	if (result != FOUND) {
>> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt)
>>  			data.rest = p;
>>  		}
>>  
>> -		batch_one_object(input.buf, &output, opt, &data);
>> +		 /*
>> +		  * When in buffer mode and input.buf is an empty string,
>> +		  * flush to stdout.
>> +		  */
>
> Checking "do we have the flush instruction (in which case we'd do
> the flush here), or do we have textual name of an object (in which
> case we'd call batch_one_object())?" here would be far cleaner and
> results in an easier-to-explain code.  With a cleanly written code
> to do so, it probably does not even need a new comment here.
>
> This brings up another issue.  Is "flushing" the *ONLY* special
> thing we would ever do in this codepath in the future?  I doubt so.
> Squatting on an "empty string" is a selfish design that hurts those
> who will come after you in the future, as they need to find other
> ways to ask for a "special thing".
>
> If we are inventing a special syntax that allows us to spell
> commands that are distinguishable from a validly-spelled object name
> to cause something special (like "flushing the output stream"),
> perhaps we want to use a bit more extensible and explicit syntax and
> use it from day one?
>
> For example, if no string that begins with three dots can ever be a
> valid way to spell an object name, perhaps "...flush" might be a
> better "please do this special thing" syntax than an empty string.
> It is easily extensible (the next special thing can follow suit to
> say "...$verb" to tell the machinery to $verb the input).  When we
> compare between an empty string and "...flush", the latter clearly
> is more descriptive, too.
>
> Note that I offhand do not know if "a valid string that name an
> object would never begin with three-dot" is true.  Please check
> if that is true if you choose to use it, or you can find and use
> another convention that allows us to clearly distinguish the
> "special" instruction and object names.

I had much the same thought, this is a useful feature, but let's not
squat on the one bit of open syntax we have.

John: I think a better direction here is to add a mode to cat-file to
emulate what "git update-ref --stdin" supports. Here's a demo of that
(also quoted below):
https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9

That's on top of some general UI improvements to cat-file I've got
locally:
https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling

That WIP patch on top follows below, of course it's a *lot* more initial
scaffolding, but I think once we get past that initial step it's a much
better path forward. As noted the code is also almost entirely
copy/pasted from update-ref.c, and perhaps some of the shared parts
could be moved to some library both could use.

I couldn't think of a better name than --stdin-cmd, suggestions most
welcome.

From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001
Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@gmail.com>
From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>
Date: Sat, 6 Nov 2021 04:54:04 +0100
Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This WIP patch is mostly stealing code from builtin/update-ref.c and
implementing the same sort of prefixed command-mode that it
supports. I.e. in addition to --batch now supporting:

    <object> LF

It'll support with --stdin-cmd, with and without -z, respectively:

    object <object> NL
    object <object> NUL

The plus being that we can now implement additional commands. Let's
start that by scratching the itch John Cai wanted to address in [1]
and implement a (with and without -z):

    fflush NL
    fflush NUL

That command simply calls fflush(stdout), which could be done as an
emergent effect before by feeding the input a "NL".

I think this will be useful for other things, e.g. I've observed in
the past that a not-trivial part of "cat-file --batch" time is spent
on parsing its <object> argument and seeing if it's a revision, ref
etc.

So we could e.g. add a command that only accepts a full-length 40
character SHA-1, or switch the --format output mid-request etc.

1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 115 insertions(+), 1 deletion(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b76f2a00046..afdb976c6e7 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -26,7 +26,10 @@ struct batch_options {
 	int unordered;
 	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
+	int stdin_cmd;
+	int end_null;
 };
+static char line_termination = '\n';
 
 static const char *force_path;
 
@@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+enum batch_state {
+	/* Non-transactional state open for commands. */
+	BATCH_STATE_OPEN,
+};
+
+static void parse_cmd_object(struct batch_options *opt,
+			     const char *next, const char *end,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	size_t len = end - next - 1;
+	char *p = (char *)next;
+	char old = p[len];
+
+	p[len] = '\0';
+	batch_one_object(next, output, opt, data);
+	p[len] = old;
+}
+
+static void parse_cmd_fflush(struct batch_options *opt,
+			     const char *next, const char *end,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	if (*next != line_termination)
+		die("fflush: extra input: %s", next);
+	fflush(stdout);
+}
+
+static const struct parse_cmd {
+	const char *prefix;
+	void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *);
+	unsigned args;
+	enum batch_state state;
+} command[] = {
+	{ "object", parse_cmd_object, 1, BATCH_STATE_OPEN },
+	{ "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN },
+};
+
+static void batch_objects_stdin_cmd(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	enum batch_state state = BATCH_STATE_OPEN;
+
+	/* Read each line dispatch its command */
+	while (!strbuf_getwholeline(&input, stdin, line_termination)) {
+		size_t i, j;
+		const struct parse_cmd *cmd = NULL;
+
+		if (*input.buf == line_termination)
+			die("empty command in input");
+		else if (isspace(*input.buf))
+			die("whitespace before command: %s", input.buf);
+
+		for (i = 0; i < ARRAY_SIZE(command); i++) {
+			const char *prefix = command[i].prefix;
+			char c;
+
+			if (!starts_with(input.buf, prefix))
+				continue;
+
+			/*
+			 * If the command has arguments, verify that it's
+			 * followed by a space. Otherwise, it shall be followed
+			 * by a line terminator.
+			 */
+			c = command[i].args ? ' ' : line_termination;
+			if (input.buf[strlen(prefix)] != c)
+				continue;
+
+			cmd = &command[i];
+			break;
+		}
+		if (!cmd)
+			die("unknown command: %s", input.buf);
+
+		/*
+		 * Read additional arguments if NUL-terminated. Do not raise an
+		 * error in case there is an early EOF to let the command
+		 * handle missing arguments with a proper error message.
+		 */
+		for (j = 1; line_termination == '\0' && j < cmd->args; j++)
+			if (strbuf_appendwholeline(&input, stdin, line_termination))
+				break;
+
+		switch (state) {
+		case BATCH_STATE_OPEN:
+			/* TODO: command state management */
+			break;
+		}
+
+		cmd->fn(opt, input.buf + strlen(cmd->prefix) + !!cmd->args,
+			input.buf + input.len, output, data);
+	}
+
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -514,6 +617,7 @@ static int batch_objects(struct batch_options *opt)
 	struct expand_data data;
 	int save_warning;
 	int retval = 0;
+	const int stdin_cmd = opt->stdin_cmd;
 
 	if (!opt->format)
 		opt->format = "%(objectname) %(objecttype) %(objectsize)";
@@ -589,7 +693,8 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
-	while (strbuf_getline(&input, stdin) != EOF) {
+	while (!stdin_cmd &&
+	       strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
 			 * Split at first whitespace, tying off the beginning
@@ -607,6 +712,9 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+	if (stdin_cmd)
+		batch_objects_stdin_cmd(opt, &output, &data);
+
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -684,6 +792,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
+		OPT_BOOL(0, "stdin-cmd", &batch.stdin_cmd,
+			 N_("with --batch[-check]: enters stdin 'command mode")),
+		OPT_BOOL('z', NULL, &batch.end_null, N_("with --stdin-cmd, use NUL termination")),
+
 		/* Batch-specific options */
 		OPT_GROUP(N_("Change or optimize batch output")),
 		OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")),
@@ -737,6 +849,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Batch defaults */
 	if (batch.buffer_output < 0)
 		batch.buffer_output = batch.all_objects;
+	if (batch.end_null)
+		line_termination = '\0';
 
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
John Cai Nov. 8, 2021, 3:42 a.m. UTC | #3
O Sat, Nov 06, 2021 at 05:01:10AM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Nov 05 2021, Junio C Hamano wrote:
> 
> > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name,
> >>  	int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
> >>  	enum get_oid_result result;
> >>  
> >> +	if (opt->buffer_output && obj_name[0] == '\0') {
> >> +		fflush(stdout);
> >> +		return;
> >> +	}
> >> +
> >
> > This might work in practice, but it a bad design taste to add this
> > change here.  The function is designed to take an object name
> > string, and it even prepares a flag variable needed to make a call
> > to turn that object name into object data.  We do not need to
> > contaminate the interface with "usually this takes an object name,
> > but there are these other special cases ...".  The higher in the
> > callchain we place special cases, the better the lower level
> > functions become, as that allows them to concentrate on doing one
> > single thing well.
> >
> >>  	result = get_oid_with_context(the_repository, obj_name,
> >>  				      flags, &data->oid, &ctx);
> >>  	if (result != FOUND) {
> >> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt)
> >>  			data.rest = p;
> >>  		}
> >>  
> >> -		batch_one_object(input.buf, &output, opt, &data);
> >> +		 /*
> >> +		  * When in buffer mode and input.buf is an empty string,
> >> +		  * flush to stdout.
> >> +		  */
> >
> > Checking "do we have the flush instruction (in which case we'd do
> > the flush here), or do we have textual name of an object (in which
> > case we'd call batch_one_object())?" here would be far cleaner and
> > results in an easier-to-explain code.  With a cleanly written code
> > to do so, it probably does not even need a new comment here.
> >
> > This brings up another issue.  Is "flushing" the *ONLY* special
> > thing we would ever do in this codepath in the future?  I doubt so.
> > Squatting on an "empty string" is a selfish design that hurts those
> > who will come after you in the future, as they need to find other
> > ways to ask for a "special thing".
> >
> > If we are inventing a special syntax that allows us to spell
> > commands that are distinguishable from a validly-spelled object name
> > to cause something special (like "flushing the output stream"),
> > perhaps we want to use a bit more extensible and explicit syntax and
> > use it from day one?
> >
> > For example, if no string that begins with three dots can ever be a
> > valid way to spell an object name, perhaps "...flush" might be a
> > better "please do this special thing" syntax than an empty string.
> > It is easily extensible (the next special thing can follow suit to
> > say "...$verb" to tell the machinery to $verb the input).  When we
> > compare between an empty string and "...flush", the latter clearly
> > is more descriptive, too.
> >
> > Note that I offhand do not know if "a valid string that name an
> > object would never begin with three-dot" is true.  Please check
> > if that is true if you choose to use it, or you can find and use
> > another convention that allows us to clearly distinguish the
> > "special" instruction and object names.
> 
> I had much the same thought, this is a useful feature, but let's not
> squat on the one bit of open syntax we have.
> 
> John: I think a better direction here is to add a mode to cat-file to
> emulate what "git update-ref --stdin" supports. Here's a demo of that
> (also quoted below):
> https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9
> 
> That's on top of some general UI improvements to cat-file I've got
> locally:
> https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling
> 
> That WIP patch on top follows below, of course it's a *lot* more initial
> scaffolding, but I think once we get past that initial step it's a much
> better path forward. As noted the code is also almost entirely
> copy/pasted from update-ref.c, and perhaps some of the shared parts
> could be moved to some library both could use.
> 
> I couldn't think of a better name than --stdin-cmd, suggestions most
> welcome.
> 
> From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001
> Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@gmail.com>
> From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
>  <avarab@gmail.com>
> Date: Sat, 6 Nov 2021 04:54:04 +0100
> Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> This WIP patch is mostly stealing code from builtin/update-ref.c and
> implementing the same sort of prefixed command-mode that it
> supports. I.e. in addition to --batch now supporting:
> 
>     <object> LF
> 
> It'll support with --stdin-cmd, with and without -z, respectively:
> 
>     object <object> NL
>     object <object> NUL
> 
> The plus being that we can now implement additional commands. Let's
> start that by scratching the itch John Cai wanted to address in [1]
> and implement a (with and without -z):
> 
>     fflush NL
>     fflush NUL
> 
> That command simply calls fflush(stdout), which could be done as an
> emergent effect before by feeding the input a "NL".
> 
> I think this will be useful for other things, e.g. I've observed in
> the past that a not-trivial part of "cat-file --batch" time is spent
> on parsing its <object> argument and seeing if it's a revision, ref
> etc.
> 
> So we could e.g. add a command that only accepts a full-length 40
> character SHA-1, or switch the --format output mid-request etc.
> 
> 1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 115 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index b76f2a00046..afdb976c6e7 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -26,7 +26,10 @@ struct batch_options {
>  	int unordered;
>  	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
>  	const char *format;
> +	int stdin_cmd;
> +	int end_null;
>  };
> +static char line_termination = '\n';
>  
>  static const char *force_path;
>  
> @@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid,
>  				      data);
>  }
>  
> +enum batch_state {
> +	/* Non-transactional state open for commands. */
> +	BATCH_STATE_OPEN,
> +};
> +
> +static void parse_cmd_object(struct batch_options *opt,
> +			     const char *next, const char *end,
> +			     struct strbuf *output,
> +			     struct expand_data *data)
> +{
> +	size_t len = end - next - 1;
> +	char *p = (char *)next;
> +	char old = p[len];
> +
> +	p[len] = '\0';
> +	batch_one_object(next, output, opt, data);
> +	p[len] = old;
> +}
> +
> +static void parse_cmd_fflush(struct batch_options *opt,
> +			     const char *next, const char *end,
> +			     struct strbuf *output,
> +			     struct expand_data *data)
> +{
> +	if (*next != line_termination)
> +		die("fflush: extra input: %s", next);
> +	fflush(stdout);
> +}
> +
> +static const struct parse_cmd {
> +	const char *prefix;
> +	void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *);
> +	unsigned args;
> +	enum batch_state state;
> +} command[] = {
> +	{ "object", parse_cmd_object, 1, BATCH_STATE_OPEN },
> +	{ "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN },
> +};
I think overall this approach is cleaner and makes sense. My only
question is, are there more commands in the future that will need some
special command syntax? Just wondering whether YAGNI applies here.
> +
> +static void batch_objects_stdin_cmd(struct batch_options *opt,
> +				    struct strbuf *output,
> +				    struct expand_data *data)
> +{
> +	struct strbuf input = STRBUF_INIT;
> +	enum batch_state state = BATCH_STATE_OPEN;
> +
> +	/* Read each line dispatch its command */
> +	while (!strbuf_getwholeline(&input, stdin, line_termination)) {
> +		size_t i, j;
> +		const struct parse_cmd *cmd = NULL;
> +
> +		if (*input.buf == line_termination)
> +			die("empty command in input");
> +		else if (isspace(*input.buf))
> +			die("whitespace before command: %s", input.buf);
> +
> +		for (i = 0; i < ARRAY_SIZE(command); i++) {
> +			const char *prefix = command[i].prefix;
> +			char c;
> +
> +			if (!starts_with(input.buf, prefix))
> +				continue;
> +
> +			/*
> +			 * If the command has arguments, verify that it's
> +			 * followed by a space. Otherwise, it shall be followed
> +			 * by a line terminator.
> +			 */
> +			c = command[i].args ? ' ' : line_termination;
> +			if (input.buf[strlen(prefix)] != c)
> +				continue;
> +
> +			cmd = &command[i];
> +			break;
> +		}
> +		if (!cmd)
> +			die("unknown command: %s", input.buf);
> +
> +		/*
> +		 * Read additional arguments if NUL-terminated. Do not raise an
> +		 * error in case there is an early EOF to let the command
> +		 * handle missing arguments with a proper error message.
> +		 */
> +		for (j = 1; line_termination == '\0' && j < cmd->args; j++)
> +			if (strbuf_appendwholeline(&input, stdin, line_termination))
> +				break;
> +
> +		switch (state) {
> +		case BATCH_STATE_OPEN:
> +			/* TODO: command state management */
> +			break;
> +		}
> +
> +		cmd->fn(opt, input.buf + strlen(cmd->prefix) + !!cmd->args,
> +			input.buf + input.len, output, data);
> +	}
> +
> +	strbuf_release(&input);
> +}
> +
>  static int batch_objects(struct batch_options *opt)
>  {
>  	struct strbuf input = STRBUF_INIT;
> @@ -514,6 +617,7 @@ static int batch_objects(struct batch_options *opt)
>  	struct expand_data data;
>  	int save_warning;
>  	int retval = 0;
> +	const int stdin_cmd = opt->stdin_cmd;
>  
>  	if (!opt->format)
>  		opt->format = "%(objectname) %(objecttype) %(objectsize)";
> @@ -589,7 +693,8 @@ static int batch_objects(struct batch_options *opt)
>  	save_warning = warn_on_object_refname_ambiguity;
>  	warn_on_object_refname_ambiguity = 0;
>  
> -	while (strbuf_getline(&input, stdin) != EOF) {
> +	while (!stdin_cmd &&
> +	       strbuf_getline(&input, stdin) != EOF) {
>  		if (data.split_on_whitespace) {
>  			/*
>  			 * Split at first whitespace, tying off the beginning
> @@ -607,6 +712,9 @@ static int batch_objects(struct batch_options *opt)
>  		batch_one_object(input.buf, &output, opt, &data);
>  	}
>  
> +	if (stdin_cmd)
> +		batch_objects_stdin_cmd(opt, &output, &data);
> +
>  	strbuf_release(&input);
>  	strbuf_release(&output);
>  	warn_on_object_refname_ambiguity = save_warning;
> @@ -684,6 +792,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>  			batch_option_callback),
>  		OPT_CMDMODE(0, "batch-all-objects", &opt,
>  			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
> +		OPT_BOOL(0, "stdin-cmd", &batch.stdin_cmd,
> +			 N_("with --batch[-check]: enters stdin 'command mode")),
> +		OPT_BOOL('z', NULL, &batch.end_null, N_("with --stdin-cmd, use NUL termination")),
> +
>  		/* Batch-specific options */
>  		OPT_GROUP(N_("Change or optimize batch output")),
>  		OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")),
> @@ -737,6 +849,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>  	/* Batch defaults */
>  	if (batch.buffer_output < 0)
>  		batch.buffer_output = batch.all_objects;
> +	if (batch.end_null)
> +		line_termination = '\0';
>  
>  	/* Return early if we're in batch mode? */
>  	if (batch.enabled) {
> -- 
> 2.34.0.rc1.741.gab7bfd97031
>
Ævar Arnfjörð Bjarmason Nov. 8, 2021, 3:15 p.m. UTC | #4
On Sun, Nov 07 2021, John Cai wrote:

> O Sat, Nov 06, 2021 at 05:01:10AM +0100, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Fri, Nov 05 2021, Junio C Hamano wrote:
>> 
>> > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> >
>> >> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name,
>> >>  	int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
>> >>  	enum get_oid_result result;
>> >>  
>> >> +	if (opt->buffer_output && obj_name[0] == '\0') {
>> >> +		fflush(stdout);
>> >> +		return;
>> >> +	}
>> >> +
>> >
>> > This might work in practice, but it a bad design taste to add this
>> > change here.  The function is designed to take an object name
>> > string, and it even prepares a flag variable needed to make a call
>> > to turn that object name into object data.  We do not need to
>> > contaminate the interface with "usually this takes an object name,
>> > but there are these other special cases ...".  The higher in the
>> > callchain we place special cases, the better the lower level
>> > functions become, as that allows them to concentrate on doing one
>> > single thing well.
>> >
>> >>  	result = get_oid_with_context(the_repository, obj_name,
>> >>  				      flags, &data->oid, &ctx);
>> >>  	if (result != FOUND) {
>> >> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt)
>> >>  			data.rest = p;
>> >>  		}
>> >>  
>> >> -		batch_one_object(input.buf, &output, opt, &data);
>> >> +		 /*
>> >> +		  * When in buffer mode and input.buf is an empty string,
>> >> +		  * flush to stdout.
>> >> +		  */
>> >
>> > Checking "do we have the flush instruction (in which case we'd do
>> > the flush here), or do we have textual name of an object (in which
>> > case we'd call batch_one_object())?" here would be far cleaner and
>> > results in an easier-to-explain code.  With a cleanly written code
>> > to do so, it probably does not even need a new comment here.
>> >
>> > This brings up another issue.  Is "flushing" the *ONLY* special
>> > thing we would ever do in this codepath in the future?  I doubt so.
>> > Squatting on an "empty string" is a selfish design that hurts those
>> > who will come after you in the future, as they need to find other
>> > ways to ask for a "special thing".
>> >
>> > If we are inventing a special syntax that allows us to spell
>> > commands that are distinguishable from a validly-spelled object name
>> > to cause something special (like "flushing the output stream"),
>> > perhaps we want to use a bit more extensible and explicit syntax and
>> > use it from day one?
>> >
>> > For example, if no string that begins with three dots can ever be a
>> > valid way to spell an object name, perhaps "...flush" might be a
>> > better "please do this special thing" syntax than an empty string.
>> > It is easily extensible (the next special thing can follow suit to
>> > say "...$verb" to tell the machinery to $verb the input).  When we
>> > compare between an empty string and "...flush", the latter clearly
>> > is more descriptive, too.
>> >
>> > Note that I offhand do not know if "a valid string that name an
>> > object would never begin with three-dot" is true.  Please check
>> > if that is true if you choose to use it, or you can find and use
>> > another convention that allows us to clearly distinguish the
>> > "special" instruction and object names.
>> 
>> I had much the same thought, this is a useful feature, but let's not
>> squat on the one bit of open syntax we have.
>> 
>> John: I think a better direction here is to add a mode to cat-file to
>> emulate what "git update-ref --stdin" supports. Here's a demo of that
>> (also quoted below):
>> https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9
>> 
>> That's on top of some general UI improvements to cat-file I've got
>> locally:
>> https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling
>> 
>> That WIP patch on top follows below, of course it's a *lot* more initial
>> scaffolding, but I think once we get past that initial step it's a much
>> better path forward. As noted the code is also almost entirely
>> copy/pasted from update-ref.c, and perhaps some of the shared parts
>> could be moved to some library both could use.
>> 
>> I couldn't think of a better name than --stdin-cmd, suggestions most
>> welcome.
>> 
>> From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001
>> Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@gmail.com>
>> From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
>>  <avarab@gmail.com>
>> Date: Sat, 6 Nov 2021 04:54:04 +0100
>> Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>> 
>> This WIP patch is mostly stealing code from builtin/update-ref.c and
>> implementing the same sort of prefixed command-mode that it
>> supports. I.e. in addition to --batch now supporting:
>> 
>>     <object> LF
>> 
>> It'll support with --stdin-cmd, with and without -z, respectively:
>> 
>>     object <object> NL
>>     object <object> NUL
>> 
>> The plus being that we can now implement additional commands. Let's
>> start that by scratching the itch John Cai wanted to address in [1]
>> and implement a (with and without -z):
>> 
>>     fflush NL
>>     fflush NUL
>> 
>> That command simply calls fflush(stdout), which could be done as an
>> emergent effect before by feeding the input a "NL".
>> 
>> I think this will be useful for other things, e.g. I've observed in
>> the past that a not-trivial part of "cat-file --batch" time is spent
>> on parsing its <object> argument and seeing if it's a revision, ref
>> etc.
>> 
>> So we could e.g. add a command that only accepts a full-length 40
>> character SHA-1, or switch the --format output mid-request etc.
>> 
>> 1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/
>> 
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 115 insertions(+), 1 deletion(-)
>> 
>> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
>> index b76f2a00046..afdb976c6e7 100644
>> --- a/builtin/cat-file.c
>> +++ b/builtin/cat-file.c
>> @@ -26,7 +26,10 @@ struct batch_options {
>>  	int unordered;
>>  	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
>>  	const char *format;
>> +	int stdin_cmd;
>> +	int end_null;
>>  };
>> +static char line_termination = '\n';
>>  
>>  static const char *force_path;
>>  
>> @@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid,
>>  				      data);
>>  }
>>  
>> +enum batch_state {
>> +	/* Non-transactional state open for commands. */
>> +	BATCH_STATE_OPEN,
>> +};
>> +
>> +static void parse_cmd_object(struct batch_options *opt,
>> +			     const char *next, const char *end,
>> +			     struct strbuf *output,
>> +			     struct expand_data *data)
>> +{
>> +	size_t len = end - next - 1;
>> +	char *p = (char *)next;
>> +	char old = p[len];
>> +
>> +	p[len] = '\0';
>> +	batch_one_object(next, output, opt, data);
>> +	p[len] = old;
>> +}
>> +
>> +static void parse_cmd_fflush(struct batch_options *opt,
>> +			     const char *next, const char *end,
>> +			     struct strbuf *output,
>> +			     struct expand_data *data)
>> +{
>> +	if (*next != line_termination)
>> +		die("fflush: extra input: %s", next);
>> +	fflush(stdout);
>> +}
>> +
>> +static const struct parse_cmd {
>> +	const char *prefix;
>> +	void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *);
>> +	unsigned args;
>> +	enum batch_state state;
>> +} command[] = {
>> +	{ "object", parse_cmd_object, 1, BATCH_STATE_OPEN },
>> +	{ "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN },
>> +};
> I think overall this approach is cleaner and makes sense. My only
> question is, are there more commands in the future that will need some
> special command syntax? Just wondering whether YAGNI applies here.

An obvious addition is to at least add the ability to set the various
options on the fly, i.e. now you need to use --batch-check, and then
kill it and restart if you'd like the content with --batch, ditto for
--textconv.

E.g. the gitaly backend for gitlab.com keeps two cat-filfe processes
around just to flip-flop between those two, sometimes you want the
content, sometimes you're just checking if the object exists.

I'd also like to add something to expose the likes of -e and -t
directly, i.e. even with --batch-check you often want to just check
existence, but get the size too, you could supply a format, but like the
above you sometimes want the size or whatever, and killing/starting a
new process just for that is a hassle...
Junio C Hamano Nov. 8, 2021, 8:11 p.m. UTC | #5
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> I think overall this approach is cleaner and makes sense. My only
>> question is, are there more commands in the future that will need some
>> special command syntax? Just wondering whether YAGNI applies here.
>
> An obvious addition is to at least add the ability to set the various
> options on the fly, i.e. now you need to use --batch-check, and then
> kill it and restart if you'd like the content with --batch, ditto for
> --textconv.
>
> E.g. the gitaly backend for gitlab.com keeps two cat-filfe processes
> around just to flip-flop between those two, sometimes you want the
> content, sometimes you're just checking if the object exists.
>
> I'd also like to add something to expose the likes of -e and -t
> directly, i.e. even with --batch-check you often want to just check
> existence, but get the size too, you could supply a format, but like the
> above you sometimes want the size or whatever, and killing/starting a
> new process just for that is a hassle...

Yeah, with "plug" and "unplug" instruction you do not have to keep
issuing "flush" when you want to go interactive, and other things
become easy to do, so even though it would make it a bit more
verbose to require "object " prefix for the kind of lines that were
historically the only ones accepted by the command, I think it is a
good direction to go in.

Thanks.
diff mbox series

Patch

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 86fc03242b8..4d17f30f24e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -405,6 +405,11 @@  static void batch_one_object(const char *obj_name,
 	int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
 	enum get_oid_result result;
 
+	if (opt->buffer_output && obj_name[0] == '\0') {
+		fflush(stdout);
+		return;
+	}
+
 	result = get_oid_with_context(the_repository, obj_name,
 				      flags, &data->oid, &ctx);
 	if (result != FOUND) {
@@ -609,7 +614,11 @@  static int batch_objects(struct batch_options *opt)
 			data.rest = p;
 		}
 
-		batch_one_object(input.buf, &output, opt, &data);
+		 /*
+		  * When in buffer mode and input.buf is an empty string,
+		  * flush to stdout.
+		  */
+		 batch_one_object(input.buf, &output, opt, &data);
 	}
 
 	strbuf_release(&input);