[v9,0/8] cat-file: add remote-object-info to batch-command

Message ID 20250108183740.67022-1-eric.peijian@gmail.com (mailing list archive)

Eric Ju Jan. 8, 2025, 6:37 p.m. UTC
Because I mistakenly sent the wrong range-diff with v8, please consider this v9
both an update addressing new comments from Calvin Wan at
https://lore.kernel.org/git/CAFySSZAqh6J14+r9JLM3LmRmV02ZvPRf5dB3rWVnUZS_5XaHcQ@mail.gmail.com/
and a resend of v8 with the corrected range-diff.

This patch series is a continuation of Calvin Wan’s (calvinwan@google.com)
patch series [PATCH v5 0/6] cat-file: add --batch-command remote-object-info
command at [1].

Sometimes it is beneficial to retrieve information about an object without
having to download it completely. The server logic for retrieving size has
already been implemented and merged in "a2ba162cda (object-info: support for
retrieving object info, 2021-04-20)"[2]. This patch series implements the
client-side option for it.

This patch series adds the `remote-object-info` command to
`cat-file --batch-command`. This command allows the client to make an
object-info command request to a server that supports protocol v2.
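
To make the shape of such a request concrete, here is a minimal sketch of how
an object-info request could be framed as pkt-lines, assuming the standard
pkt-line framing from Git's protocol documentation (a 4-digit hex length that
counts itself, "0001" as delimiter, "0000" as flush). The helper names
raw_append(), pkt_append() and build_object_info_request() are hypothetical and
are not part of this series; capability lines such as agent or server-option
are left out for brevity.

```c
#include <stdio.h>
#include <string.h>

/* Append raw text to buf, keeping it NUL-terminated; -1 if it does not fit. */
int raw_append(char *buf, size_t bufsz, size_t *len, const char *text)
{
	size_t n = strlen(text);

	if (*len + n + 1 > bufsz)
		return -1;
	memcpy(buf + *len, text, n + 1);
	*len += n;
	return 0;
}

/* Append one pkt-line: 4 hex digits of total length, then the payload. */
int pkt_append(char *buf, size_t bufsz, size_t *len, const char *payload)
{
	char header[5];
	size_t n = strlen(payload) + 4;

	/* 65520 is the documented maximum pkt-line length */
	if (n > 65520 || *len + n + 1 > bufsz)
		return -1;
	snprintf(header, sizeof(header), "%04zx", n);
	raw_append(buf, bufsz, len, header);
	return raw_append(buf, bufsz, len, payload);
}

/*
 * Hypothetical helper: build a request asking for the size of one object.
 * Returns the request length, or -1 on error.
 */
int build_object_info_request(char *buf, size_t bufsz, const char *oid_hex)
{
	size_t len = 0;
	char line[128];
	int n;

	if (bufsz == 0)
		return -1;
	buf[0] = '\0';
	n = snprintf(line, sizeof(line), "oid %s", oid_hex);
	if (n < 0 || (size_t)n >= sizeof(line))
		return -1;
	if (pkt_append(buf, bufsz, &len, "command=object-info") < 0 ||
	    raw_append(buf, bufsz, &len, "0001") < 0 ||	/* delimiter */
	    pkt_append(buf, bufsz, &len, "size") < 0 ||
	    pkt_append(buf, bufsz, &len, line) < 0 ||
	    raw_append(buf, bufsz, &len, "0000") < 0)	/* flush */
		return -1;
	return (int)len;
}
```

For a 40-character hex object ID, the buffer produced by this sketch starts with
"0017command=object-info00010008size" and ends with the flush packet "0000".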

If the server uses protocol v2 but does not support the object-info capability,
`cat-file --batch-command` will die.

If a user attempts to use `remote-object-info` with protocol v1,
`cat-file --batch-command` will die.

Currently, only the size (%(objectsize)) is supported in this implementation.
The type (%(objecttype)) is not included in this patch series, as it is not yet
supported on the server side either. The plan is to implement the necessary
logic for both the server and client in a subsequent series.

The default format for remote-object-info is set to %(objectname) %(objectsize).
Once %(objecttype) is supported, the default format will be unified accordingly.

If the batch command format includes unsupported fields such as %(objecttype),
%(objectsize:disk), or %(deltabase), the command will terminate with an error.
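
The check described above can be illustrated with a small sketch. It assumes
that placeholders have the form %(name) and that only the two atoms named in
this cover letter are accepted; the function remote_format_is_supported() and
the supported_atoms table are hypothetical and do not mirror the actual code in
builtin/cat-file.c (for example, "%%" escaping is not handled here).

```c
#include <string.h>

/* Atoms this cover letter lists as supported for remote-object-info. */
static const char *const supported_atoms[] = { "objectname", "objectsize" };

/*
 * Return 1 if every %(atom) in fmt is in supported_atoms, 0 otherwise.
 * An unterminated "%(" is treated as unsupported.
 */
int remote_format_is_supported(const char *fmt)
{
	const char *p = fmt;

	while ((p = strstr(p, "%(")) != NULL) {
		const char *name = p + 2;
		const char *end = strchr(name, ')');
		size_t n, i;
		int found = 0;

		if (!end)
			return 0;
		n = (size_t)(end - name);
		for (i = 0; i < sizeof(supported_atoms) / sizeof(supported_atoms[0]); i++)
			if (strlen(supported_atoms[i]) == n &&
			    !strncmp(name, supported_atoms[i], n))
				found = 1;
		if (!found)
			return 0;
		p = end + 1;
	}
	return 1;
}
```

With this sketch, the default format "%(objectname) %(objectsize)" is accepted,
while "%(objecttype)", "%(objectsize:disk)" and "%(deltabase)" are rejected; a
caller would then terminate with an error as described above.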

Changes since v7 (v8 had an incorrect range-diff)
=================================================
- Introduced strtoul_ul() in git-compat-util.h, a wrapper around the standard
  library's strtoul() that reports parse errors to the caller.
- Separated the test library into its own commit for better clarity
  and organization.
- Used string_list_has_string() instead of unsorted_string_list_has_string() to
  avoid quadratic runtime behaviour.
- Added a documentation link to the wire format in the commit message to
  provide additional context.
- Added a new test case "remote-object-info fails on not providing OID".
- Fixed typos and formatting issues for improved readability.
- Flattened the memory allocation logic of sizep in object_info_data for better
  intuitiveness and readability.
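
As an illustration of the first item, here is a minimal sketch of a strtoul()
wrapper with error handling. It is modeled on the calling convention visible in
the range-diff below (string, base, result pointer, non-zero return on
failure), but the body is an assumption and not the code from this series; the
name sketch_strtoul_ul() is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse s as an unsigned long in the given base.
 * Returns 0 and stores the value in *result on success, -1 on any error:
 * empty input, trailing garbage, a minus sign, or overflow.
 */
int sketch_strtoul_ul(const char *s, int base, unsigned long *result)
{
	unsigned long ul;
	char *end;

	/* strtoul() silently accepts a leading '-', so reject it up front */
	if (strchr(s, '-'))
		return -1;
	errno = 0;
	ul = strtoul(s, &end, base);
	/* errno catches overflow; end == s means no digits; *end means garbage */
	if (errno || end == s || *end)
		return -1;
	*result = ul;
	return 0;
}
```

A bare strtoul(str, NULL, 10) call returns 0 for non-numeric input and
ULONG_MAX on overflow, and neither can be told apart from a legitimate value
without checks like the ones above. That is why the caller in the range-diff
can now die() on an invalid size.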

Calvin Wan (4):
  fetch-pack: refactor packet writing
  fetch-pack: move fetch initialization
  serve: advertise object-info feature
  transport: add client support for object-info

Eric Ju (4):
  git-compat-util: add strtoul_ul() with error handling
  cat-file: add declaration of variable i inside its for loop
  cat-file: split test utility functions into a separate library file
  cat-file: add remote-object-info to batch-command

 Documentation/git-cat-file.txt         |  24 +-
 Makefile                               |   1 +
 builtin/cat-file.c                     | 110 +++-
 connect.c                              |  34 ++
 connect.h                              |   8 +
 fetch-object-info.c                    |  85 ++++
 fetch-object-info.h                    |  22 +
 fetch-pack.c                           |  51 +-
 fetch-pack.h                           |   2 +
 git-compat-util.h                      |  18 +
 object-file.c                          |  11 +
 object-store-ll.h                      |   3 +
 serve.c                                |   4 +-
 t/lib-cat-file.sh                      |  16 +
 t/t1006-cat-file.sh                    |  13 +-
 t/t1017-cat-file-remote-object-info.sh | 664 +++++++++++++++++++++++++
 transport-helper.c                     |  11 +-
 transport.c                            |  28 +-
 transport.h                            |  11 +
 19 files changed, 1048 insertions(+), 68 deletions(-)
 create mode 100644 fetch-object-info.c
 create mode 100644 fetch-object-info.h
 create mode 100644 t/lib-cat-file.sh
 create mode 100755 t/t1017-cat-file-remote-object-info.sh

Range-diff against v7:
-:  ---------- > 1:  63997081d1 git-compat-util: add strtoul_ul() with error handling
1:  5181e849eb ! 2:  f188962f05 cat-file: add declaration of variable i inside its for loop
    @@ fetch-pack.c: static void write_fetch_command_and_capabilities(struct strbuf *re
     -		int i;
      		ensure_server_supports_v2("server-option");
     -		for (i = 0; i < server_options->nr; i++)
    -+		for (int i = 0; i < server_options->nr; i++)
    ++		for (size_t i = 0; i < server_options->nr; i++)
      			packet_buf_write(req_buf, "server-option=%s",
      					 server_options->items[i].string);
      	}
-:  ---------- > 3:  71250a03d2 cat-file: split test utility functions into a separate library file
2:  0c6acf58c2 ! 4:  0ab26e6cd5 fetch-pack: refactor packet writing
    @@ connect.c: int server_supports(const char *feature)
     +		packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
     +	if (server_options && server_options->nr) {
     +		ensure_server_supports_v2("server-option");
    -+		for (int i = 0; i < server_options->nr; i++)
    ++		for (size_t i = 0; i < server_options->nr; i++)
     +			packet_buf_write(req_buf, "server-option=%s",
     +					 server_options->items[i].string);
     +	}
    @@ connect.c: int server_supports(const char *feature)
      	PROTO_FILE,
     
      ## connect.h ##
    -@@
    - #ifndef CONNECT_H
    - #define CONNECT_H
    - 
    -+#include "string-list.h"
    - #include "protocol.h"
    - 
    - #define CONNECT_VERBOSE       (1u << 0)
     @@ connect.h: void check_stateless_delimiter(int stateless_rpc,
      			       struct packet_reader *reader,
      			       const char *error);
      
    -+/**
    -+ * write_command_and_capabilities writes a command along with the requested
    ++/*
    ++ * Writes a command along with the requested
     + * server capabilities/features into a request buffer.
     + */
    ++struct string_list;
     +void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
     +				    const struct string_list *server_options);
     +
    @@ fetch-pack.c: static int add_haves(struct fetch_negotiator *negotiator,
     -		packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
     -	if (server_options && server_options->nr) {
     -		ensure_server_supports_v2("server-option");
    --		for (int i = 0; i < server_options->nr; i++)
    +-		for (size_t i = 0; i < server_options->nr; i++)
     -			packet_buf_write(req_buf, "server-option=%s",
     -					 server_options->items[i].string);
     -	}
3:  28ef74980c = 5:  8b381b4bdc fetch-pack: move fetch initialization
4:  cb5bf65b88 = 6:  a0a15e1e4f serve: advertise object-info feature
5:  79eab87dd2 ! 7:  e1aad1ec30 transport: add client support for object-info
    @@ Commit message
         Sometimes, it is beneficial to retrieve information about an object
         without downloading it entirely. The server-side logic for this
         functionality was implemented in commit "a2ba162cda (object-info:
    -    support for retrieving object info, 2021-04-20)."
    +    support for retrieving object info, 2021-04-20)." And the wire
    +    format is documented at
    +    https://git-scm.com/docs/protocol-v2#_object_info.
     
         This commit introduces client functions to interact with the server.
     
         Currently, the client supports requesting a list of object IDs with
    -    the ‘size’ feature from a v2 server. If the server does not advertise
    +    the 'size' feature from a v2 server. If the server does not advertise
         this feature (i.e., transfer.advertiseobjectinfo is set to false),
         the client will return an error and exit.
     
    +    Notice that the entire request is written into req_buf before being
    +    sent to the remote. This approach follows the pattern used in the
    +    `send_fetch_request()` logic within fetch-pack.c.
    +    Streaming the request is not addressed in this patch.
    +
         Helped-by: Jonathan Tan <jonathantanmy@google.com>
         Helped-by: Christian Couder <chriscool@tuxfamily.org>
         Signed-off-by: Calvin Wan <calvinwan@google.com>
    @@ fetch-object-info.c (new)
     +#include "fetch-object-info.h"
     +#include "string-list.h"
     +
    -+/**
    -+ * send_object_info_request sends git-cat-file object-info command and its
    -+ * arguments into the request buffer.
    -+ */
    ++/* Sends git-cat-file object-info command and its arguments into the request buffer. */
     +static void send_object_info_request(const int fd_out, struct object_info_args *args)
     +{
     +	struct strbuf req_buf = STRBUF_INIT;
    @@ fetch-object-info.c (new)
     +	if (unsorted_string_list_has_string(args->object_info_options, "size"))
     +		packet_buf_write(&req_buf, "size");
     +
    -+	if (args->oids) {
    ++	if (args->oids)
     +		for (size_t i = 0; i < args->oids->nr; i++)
     +			packet_buf_write(&req_buf, "oid %s", oid_to_hex(&args->oids->oid[i]));
    -+	}
     +
     +	packet_buf_flush(&req_buf);
     +	if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
    @@ fetch-object-info.c (new)
     +	strbuf_release(&req_buf);
     +}
     +
    -+/**
    -+ * fetch_object_info sends git-cat-file object-info command into the request buf
    -+ * and read the results from packets.
    -+ */
     +int fetch_object_info(const enum protocol_version version, struct object_info_args *args,
     +		      struct packet_reader *reader, struct object_info *object_info_data,
     +		      const int stateless_rpc, const int fd_out)
    @@ fetch-object-info.c (new)
     +		break;
     +	case protocol_v1:
     +	case protocol_v0:
    -+		die(_("wrong protocol version. expected v2"));
    ++		die(_("unsupported protocol version. expected v2"));
     +	case protocol_unknown_version:
     +		BUG("unknown protocol version");
     +	}
    @@ fetch-object-info.c (new)
     +			check_stateless_delimiter(stateless_rpc, reader, "stateless delimiter expected");
     +			return -1;
     +		}
    -+		if (unsorted_string_list_has_string(args->object_info_options, reader->line)) {
    -+			if (!strcmp(reader->line, "size")) {
    -+				size_index = i;
    -+				for (size_t j = 0; j < args->oids->nr; j++)
    -+					object_info_data[j].sizep = xcalloc(1, sizeof(long));
    -+			}
    -+			continue;
    ++		if (!string_list_has_string(args->object_info_options, reader->line))
    ++			return -1;
    ++		if (!strcmp(reader->line, "size")) {
    ++			size_index = i;
    ++			for (size_t j = 0; j < args->oids->nr; j++)
    ++				object_info_data[j].sizep = xcalloc(1, sizeof(*object_info_data[j].sizep));
     +		}
    -+		return -1;
     +	}
     +
     +	for (size_t i = 0; packet_reader_read(reader) == PACKET_READ_NORMAL && i < args->oids->nr; i++){
    @@ fetch-object-info.c (new)
     +				die("object-info: not our ref %s",
     +					object_info_values.items[0].string);
     +
    -+			*object_info_data[i].sizep = strtoul(object_info_values.items[1 + size_index].string, NULL, 10);
    ++			if (strtoul_ul(object_info_values.items[1 + size_index].string, 10, object_info_data[i].sizep))
    ++				die("object-info: ref %s has invalid size %s",
    ++					object_info_values.items[0].string,
    ++					object_info_values.items[1 + size_index].string);
     +		}
     +
     +		string_list_clear(&object_info_values, 0);
    @@ fetch-object-info.h (new)
     +	struct oid_array *oids;
     +};
     +
    ++/*
    ++ * Sends git-cat-file object-info command into the request buf and read the
    ++ * results from packets.
    ++ */
     +int fetch_object_info(enum protocol_version version, struct object_info_args *args,
     +		      struct packet_reader *reader, struct object_info *object_info_data,
     +		      int stateless_rpc, int fd_out);
    @@ transport.c: static int fetch_refs_via_pack(struct transport *transport,
      	args.reject_shallow_remote = transport->smart_options->reject_shallow;
     +	args.object_info = transport->smart_options->object_info;
     +
    -+	if (transport->smart_options
    -+		&& transport->smart_options->object_info
    -+		&& transport->smart_options->object_info_oids->nr > 0) {
    ++	if (transport->smart_options && transport->smart_options->object_info
    ++	    && transport->smart_options->object_info_oids->nr > 0) {
     +		struct packet_reader reader;
     +		struct object_info_args obj_info_args = { 0 };
     +
     +		obj_info_args.server_options = transport->server_options;
    -+		obj_info_args.object_info_options = transport->smart_options->object_info_options;
     +		obj_info_args.oids = transport->smart_options->object_info_oids;
    ++		obj_info_args.object_info_options = transport->smart_options->object_info_options;
    ++		string_list_sort(obj_info_args.object_info_options);
     +
     +		connect_setup(transport, 0);
     +		packet_reader_init(&reader, data->fd[0], NULL, 0,
6:  b60863aa5b ! 8:  0795ad53fe cat-file: add remote-object-info to batch-command
    @@ builtin/cat-file.c: static void batch_one_object(const char *obj_name,
     +			die(_("Not a valid object name %s"), argv[i]);
     +		oid_array_append(&object_info_oids, &oid);
     +	}
    -+
    ++	if (object_info_oids.nr == 0) {
    ++		die(_("remote-object-info requires objects"));
    ++	}
     +	gtransport = transport_get(remote, NULL);
     +	if (gtransport->smart_options) {
     +		CALLOC_ARRAY(remote_object_info, object_info_oids.nr);
    @@ builtin/cat-file.c: static void parse_cmd_info(struct batch_options *opt,
     +	opt->use_remote_info = 1;
     +	data->skip_object_info = 1;
     +	for (size_t i = 0; i < object_info_oids.nr; i++) {
    -+
     +		data->oid = object_info_oids.oid[i];
    -+
     +		if (remote_object_info[i].sizep) {
     +			/*
     +			 * When reaching here, it means remote-object-info can retrieve
    @@ object-store-ll.h: int for_each_object_in_pack(struct packed_git *p,
     +
      #endif /* OBJECT_STORE_LL_H */
     
    - ## t/lib-cat-file.sh (new) ##
    -@@
    -+# Library of git-cat-file related tests.
    -+
    -+# Print a string without a trailing newline
    -+echo_without_newline () {
    -+	printf '%s' "$*"
    -+}
    -+
    -+# Print a string without newlines and replaces them with a NULL character (\0).
    -+echo_without_newline_nul () {
    -+	echo_without_newline "$@" | tr '\n' '\0'
    -+}
    -+
    -+# Calculate the length of a string removing any leading spaces.
    -+strlen () {
    -+	echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
    -+}
    -
    - ## t/t1006-cat-file.sh ##
    -@@
    - test_description='git cat-file'
    - 
    - . ./test-lib.sh
    -+. "$TEST_DIRECTORY"/lib-cat-file.sh
    - 
    - test_cmdmode_usage () {
    - 	test_expect_code 129 "$@" 2>err &&
    -@@ t/t1006-cat-file.sh: do
    - 	'
    - done
    - 
    --echo_without_newline () {
    --    printf '%s' "$*"
    --}
    --
    --echo_without_newline_nul () {
    --	echo_without_newline "$@" | tr '\n' '\0'
    --}
    --
    --strlen () {
    --    echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
    --}
    --
    - run_tests () {
    -     type=$1
    -     oid=$2
    -
      ## t/t1017-cat-file-remote-object-info.sh (new) ##
     @@
     +#!/bin/sh
    @@ t/t1017-cat-file-remote-object-info.sh (new)
     +	)
     +'
     +
    -+test_expect_success 'remote-object-info fails on server with legacy protocol' '
    ++test_expect_success 'remote-object-info fails on server with legacy protocol with default filter' '
     +	(
     +		set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
     +		cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
    @@ t/t1017-cat-file-remote-object-info.sh (new)
     +	)
     +'
     +
    ++test_expect_success 'remote-object-info fails on not providing OID' '
    ++	(
    ++		set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
    ++		cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
    ++
    ++		test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
    ++		remote-object-info "$HTTPD_URL/smart/http_parent"
    ++		EOF
    ++		test_grep "remote-object-info requires objects" err
    ++	)
    ++'
    ++
     +
     +# Test --batch-command remote-object-info with 'http://' transport and
     +# transfer.advertiseobjectinfo set to false, i.e. server does not have object-info capability