mbox series

[v2,0/5] commit-graph: use 'struct object_directory *' everywhere

Message ID cover.1580764494.git.me@ttaylorr.com (mailing list archive)
Headers show
Series commit-graph: use 'struct object_directory *' everywhere | expand

Message

Taylor Blau Feb. 3, 2020, 9:17 p.m. UTC
Hi,

Here is a re-roll of the series in this thread to replace string-based
path comparison in 'commit-graph.c' code with raw pointer comparison of
'struct object_directory *'s.

The only thing that has changed substantially since v1 is that I swapped
the order of patches 2/6 and 4/6. What was patch 3/6 got folded into
what is now patch 2/5. This resolves an inconvenience where we had to
define a helper function in 'commit-graph.c', when it really belonged in
'builtin/commit-graph.c'.

I took a few other suggestions from Martin in what is now patch 4/5, and
noticed a few other small things along the way. A range-diff since v1 is
included below.

Thanks as always,
Taylor

Taylor Blau (5):
  t5318: don't pass non-object directory to '--object-dir'
  commit-graph.h: store an odb in 'struct write_commit_graph_context'
  commit-graph.h: store object directory in 'struct commit_graph'
  commit-graph.c: remove path normalization, comparison
  commit-graph.h: use odb in 'load_commit_graph_one_fd_st'

 Documentation/git-commit-graph.txt |   5 +-
 builtin/commit-graph.c             |  34 +++++++--
 builtin/commit.c                   |   2 +-
 builtin/fetch.c                    |   2 +-
 builtin/gc.c                       |   2 +-
 commit-graph.c                     | 115 +++++++++++++----------------
 commit-graph.h                     |  15 ++--
 t/helper/test-read-graph.c         |   8 +-
 t/t5318-commit-graph.sh            |   4 +-
 9 files changed, 100 insertions(+), 87 deletions(-)

Range-diff against v1:
1:  09ba72c85a = 1:  84a8709ad1 t5318: don't pass non-object directory to '--object-dir'
4:  9784a5db3c ! 2:  d9819cfb33 commit-graph.h: store an odb in 'struct write_commit_graph_context'
    @@ Metadata
      ## Commit message ##
         commit-graph.h: store an odb in 'struct write_commit_graph_context'

    -    In a previous patch, the 'char *object_dir' in 'struct commit_graph' was
    -    replaced with a 'struct object_directory'. This patch applies the same
    -    treatement to 'struct write_commit_graph_context', which is an
    -    intermediate step towards getting rid of all path normalization in
    -    'commit-graph.c'.
    -
    -    Instead of taking a 'char *object_dir', functions that construct a
    -    'struct write_commit_graph_context' now take a 'struct object_directory
    -    *'. Any code that needs an object directory path use '->path' instead.
    -
    -    This ensures that all calls to functions that perform path normalization
    -    are given arguments which do not themselves require normalization. This
    -    prepares those functions to drop their normalization entirely, which
    -    will occur in the subsequent patch.
    +    There are lots of places in 'commit-graph.h' where a function either has
    +    (or almost has) a full 'struct object_directory *', accesses '->path',
    +    and then throws away the rest of the struct.
    +
    +    This can cause headaches when comparing the locations of object
    +    directories across alternates (e.g., in the case of deciding if two
    +    commit-graph layers can be merged). These paths are normalized with
    +    'normalize_path_copy()' which mitigates some comparison issues, but not
    +    all [1].
    +
    +    Replace usage of 'char *object_dir' with 'odb->path' by storing a
    +    'struct object_directory *' in the 'write_commit_graph_context'
    +    structure. This is an intermediate step towards getting rid of all path
    +    normalization in 'commit-graph.c'.
    +
    +    Resolving a user-provided '--object-dir' argument now requires that we
    +    compare it to the known alternates for equality.  Prior to this patch,
    +    an unknown '--object-dir' argument would silently exit with status zero.
    +
    +    This can clearly lead to unintended behavior, such as verifying
    +    commit-graphs that aren't in a repository's own object store (or one of
    +    its alternates), or causing a typo to mask a legitimate commit-graph
    +    verification failure. Make this error non-silent by 'die()'-ing when the
    +    given '--object-dir' does not match any known alternate object store.
    +
    +    [1]: In my testing, for example, I can get one side of the commit-graph
    +    code to fill object_dir with "./objects" and the other with just
    +    "objects".

         Signed-off-by: Taylor Blau <me@ttaylorr.com>

    + ## Documentation/git-commit-graph.txt ##
    +@@ Documentation/git-commit-graph.txt: OPTIONS
    + 	file. This parameter exists to specify the location of an alternate
    + 	that only has the objects directory, not a full `.git` directory. The
    + 	commit-graph file is expected to be in the `<dir>/info` directory and
    +-	the packfiles are expected to be in `<dir>/pack`.
    ++	the packfiles are expected to be in `<dir>/pack`. If the directory
    ++	could not be made into an absolute path, or does not match any known
    ++	object directory, `git commit-graph ...` will exit with non-zero
    ++	status.
    +
    + --[no-]progress::
    + 	Turn progress on/off explicitly. If neither is specified, progress is
    +
      ## builtin/commit-graph.c ##
    +@@ builtin/commit-graph.c: static struct opts_commit_graph {
    + 	int progress;
    + } opts;
    +
    ++struct object_directory *find_odb(struct repository *r, const char *obj_dir)
    ++{
    ++	struct object_directory *odb;
    ++	char *obj_dir_real = real_pathdup(obj_dir, 1);
    ++
    ++	prepare_alt_odb(r);
    ++	for (odb = r->objects->odb; odb; odb = odb->next) {
    ++		if (!strcmp(obj_dir_real, real_path(odb->path)))
    ++			break;
    ++	}
    ++
    ++	free(obj_dir_real);
    ++
    ++	if (!odb)
    ++		die(_("could not find object directory matching %s"), obj_dir);
    ++	return odb;
    ++}
    ++
    + static int graph_verify(int argc, const char **argv)
    + {
    + 	struct commit_graph *graph = NULL;
    ++	struct object_directory *odb = NULL;
    + 	char *graph_name;
    + 	int open_ok;
    + 	int fd;
    +@@ builtin/commit-graph.c: static int graph_verify(int argc, const char **argv)
    + 	if (opts.progress)
    + 		flags |= COMMIT_GRAPH_WRITE_PROGRESS;
    +
    +-	graph_name = get_commit_graph_filename(opts.obj_dir);
    ++	odb = find_odb(the_repository, opts.obj_dir);
    ++	graph_name = get_commit_graph_filename(odb->path);
    + 	open_ok = open_commit_graph(graph_name, &fd, &st);
    + 	if (!open_ok && errno != ENOENT)
    + 		die_errno(_("Could not open commit-graph '%s'"), graph_name);
    +@@ builtin/commit-graph.c: static int graph_verify(int argc, const char **argv)
    +
    + 	if (open_ok)
    + 		graph = load_commit_graph_one_fd_st(fd, &st);
    +-	 else
    +-		graph = read_commit_graph_one(the_repository, opts.obj_dir);
    ++	else
    ++		graph = read_commit_graph_one(the_repository, odb->path);
    +
    + 	/* Return failure if open_ok predicted success */
    + 	if (!graph)
     @@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
    - 	odb = find_odb_or_die(the_repository, opts.obj_dir);
    + {
    + 	struct string_list *pack_indexes = NULL;
    + 	struct string_list *commit_hex = NULL;
    ++	struct object_directory *odb = NULL;
    + 	struct string_list lines;
    + 	int result = 0;
    + 	enum commit_graph_write_flags flags = 0;
    +@@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
    + 		flags |= COMMIT_GRAPH_WRITE_PROGRESS;
    +
    + 	read_replace_refs = 0;
    ++	odb = find_odb(the_repository, opts.obj_dir);

      	if (opts.reachable) {
    --		if (write_commit_graph_reachable(odb->path, flags, &split_opts))
    +-		if (write_commit_graph_reachable(opts.obj_dir, flags, &split_opts))
     +		if (write_commit_graph_reachable(odb, flags, &split_opts))
      			return 1;
      		return 0;
    @@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
      		UNLEAK(buf);
      	}

    --	if (write_commit_graph(odb->path,
    +-	if (write_commit_graph(opts.obj_dir,
     +	if (write_commit_graph(odb,
      			       pack_indexes,
      			       commit_hex,
    @@ commit-graph.c: static void split_graph_merge_strategy(struct write_commit_graph

      	while (g && (g->num_commits <= size_mult * num_commits ||
      		    (max_commits && num_commits > max_commits))) {
    --		if (strcmp(g->odb->path, ctx->obj_dir))
    -+		if (strcmp(g->odb->path, ctx->odb->path))
    +-		if (strcmp(g->obj_dir, ctx->obj_dir))
    ++		if (strcmp(g->obj_dir, ctx->odb->path))
      			break;

      		num_commits += g->num_commits;
     @@ commit-graph.c: static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
    - 		char *old_graph_name = get_commit_graph_filename(g->odb->path);
    + 		char *old_graph_name = get_commit_graph_filename(g->obj_dir);

      		if (!strcmp(g->filename, old_graph_name) &&
    --		    strcmp(g->odb->path, ctx->obj_dir)) {
    -+		    strcmp(g->odb->path, ctx->odb->path)) {
    +-		    strcmp(g->obj_dir, ctx->obj_dir)) {
    ++		    strcmp(g->obj_dir, ctx->odb->path)) {
      			ctx->num_commit_graphs_after = 1;
      			ctx->new_base_graph = NULL;
      		}
    @@ commit-graph.c: int write_commit_graph(const char *obj_dir,
      		ctx->oids.alloc = split_opts->max_commits;

      	if (ctx->append) {
    --		struct object_directory *odb = find_odb(ctx->r, ctx->obj_dir);
    --		prepare_commit_graph_one(ctx->r, odb);
    -+		prepare_commit_graph_one(ctx->r, ctx->odb);
    +-		prepare_commit_graph_one(ctx->r, ctx->obj_dir);
    ++		prepare_commit_graph_one(ctx->r, ctx->odb->path);
      		if (ctx->r->objects->commit_graph)
      			ctx->oids.alloc += ctx->r->objects->commit_graph->num_commits;
      	}
    @@ commit-graph.c: int write_commit_graph(const char *obj_dir,
      		for (i = 0; i < ctx->num_commit_graphs_after; i++) {

      ## commit-graph.h ##
    +@@
    + #include "repository.h"
    + #include "string-list.h"
    + #include "cache.h"
    ++#include "object-store.h"
    +
    + #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
    + #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD"
     @@ commit-graph.h: struct split_commit_graph_opts {
       * is not compatible with the commit-graph feature, then the
       * methods will return 0 without writing a commit-graph.
2:  328884f5f6 ! 3:  5fd5cfca6e commit-graph.h: store object directory in 'struct commit_graph'
    @@ Metadata
      ## Commit message ##
         commit-graph.h: store object directory in 'struct commit_graph'

    -    There are lots of places in 'commit-graph.h' where a function either has
    -    (or almost has) a full 'struct object_directory *', accesses '->path',
    -    and then throws away the rest of the struct.
    +    In a previous patch, the 'char *object_dir' in 'struct commit_graph' was
    +    replaced with a 'struct object_directory'. This patch applies the same
    +    treatment to 'struct commit_graph', which is another intermediate step
    +    towards getting rid of all path normalization in 'commit-graph.c'.

    -    This can cause headaches when comparing the locations of object
    -    directories across alternates (e.g., in the case of deciding if two
    -    commit-graph layers can be merged). These paths are normalized with
    -    'normalize_path_copy()' which mitigates some comparison issues, but not
    -    all [1].
    +    Instead of taking a 'char *object_dir', functions that construct a
    +    'struct commit_graph' now take a 'struct object_directory *'. Any code
    +    that needs an object directory path use '->path' instead.

    -    Instead of getting rid of the 'struct object_directory *', store that
    -    insead of a 'char *odb' in 'struct commit_graph'. Once the 'struct
    -    write_commit_graph_context' has an object_directory pointer, too, this
    -    will allow calling code to replace these error-prone path comparisons
    -    with raw pointer comparisons, thereby circumventing any
    -    normalization-related errors. This will be introduced in a subsequent
    -    patch.
    -
    -    [1]: In my testing, for example, I can get one side of the commit-graph
    -    code to fill object_dir with "./objects" and the other with just
    -    "objects".
    +    This ensures that all calls to functions that perform path normalization
    +    are given arguments which do not themselves require normalization. This
    +    prepares those functions to drop their normalization entirely, which
    +    will occur in the subsequent patch.

         Signed-off-by: Taylor Blau <me@ttaylorr.com>

      ## builtin/commit-graph.c ##
     @@ builtin/commit-graph.c: static int graph_verify(int argc, const char **argv)
    -
      	if (open_ok)
      		graph = load_commit_graph_one_fd_st(fd, &st);
    --	 else
    --		graph = read_commit_graph_one(the_repository, opts.obj_dir);
    -+	else {
    -+		struct object_directory *odb;
    -+		if ((odb = find_odb(the_repository, opts.obj_dir)))
    -+			graph = read_commit_graph_one(the_repository, odb);
    -+	}
    + 	else
    +-		graph = read_commit_graph_one(the_repository, odb->path);
    ++		graph = read_commit_graph_one(the_repository, odb);

      	/* Return failure if open_ok predicted success */
      	if (!graph)
    -@@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
    - 	struct string_list lines;
    - 	int result = 0;
    - 	enum commit_graph_write_flags flags = 0;
    -+	struct object_directory *odb = NULL;
    -
    - 	static struct option builtin_commit_graph_write_options[] = {
    - 		OPT_STRING(0, "object-dir", &opts.obj_dir,
    -@@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
    - 		flags |= COMMIT_GRAPH_WRITE_PROGRESS;
    -
    - 	read_replace_refs = 0;
    -+	odb = find_odb(the_repository, opts.obj_dir);
    -
    - 	if (opts.reachable) {
    --		if (write_commit_graph_reachable(opts.obj_dir, flags, &split_opts))
    -+		if (write_commit_graph_reachable(odb->path, flags, &split_opts))
    - 			return 1;
    - 		return 0;
    - 	}
    -@@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
    - 		UNLEAK(buf);
    - 	}
    -
    --	if (write_commit_graph(opts.obj_dir,
    -+	if (write_commit_graph(odb->path,
    - 			       pack_indexes,
    - 			       commit_hex,
    - 			       flags,
    -
    - ## builtin/commit.c ##
    -@@
    - #include "help.h"
    - #include "commit-reach.h"
    - #include "commit-graph.h"
    -+#include "object-store.h"
    -
    - static const char * const builtin_commit_usage[] = {
    - 	N_("git commit [<options>] [--] <pathspec>..."),

      ## commit-graph.c ##
    -@@ commit-graph.c: static uint8_t oid_version(void)
    - 	return 1;
    - }
    -
    -+struct object_directory *find_odb(struct repository *r, const char *obj_dir)
    -+{
    -+	struct object_directory *odb;
    -+	char *obj_dir_real = real_pathdup(obj_dir, 1);
    -+	int cmp = -1;
    -+
    -+	prepare_alt_odb(r);
    -+	for (odb = r->objects->odb; odb; odb = odb->next) {
    -+		cmp = strcmp(obj_dir_real, real_path(odb->path));
    -+		if (!cmp)
    -+			break;
    -+	}
    -+
    -+	free(obj_dir_real);
    -+
    -+	if (cmp)
    -+		odb = NULL;
    -+	return odb;
    -+}
    -+
    - static struct commit_graph *alloc_commit_graph(void)
    - {
    - 	struct commit_graph *g = xcalloc(1, sizeof(*g));
     @@ commit-graph.c: static struct commit_graph *load_commit_graph_one(const char *graph_file)
      	return g;
      }
    @@ commit-graph.c: static void split_graph_merge_strategy(struct write_commit_graph

      	while (g && (g->num_commits <= size_mult * num_commits ||
      		    (max_commits && num_commits > max_commits))) {
    --		if (strcmp(g->obj_dir, ctx->obj_dir))
    -+		if (strcmp(g->odb->path, ctx->obj_dir))
    +-		if (strcmp(g->obj_dir, ctx->odb->path))
    ++		if (strcmp(g->odb->path, ctx->odb->path))
      			break;

      		num_commits += g->num_commits;
    @@ commit-graph.c: static void split_graph_merge_strategy(struct write_commit_graph
     +		char *old_graph_name = get_commit_graph_filename(g->odb->path);

      		if (!strcmp(g->filename, old_graph_name) &&
    --		    strcmp(g->obj_dir, ctx->obj_dir)) {
    -+		    strcmp(g->odb->path, ctx->obj_dir)) {
    +-		    strcmp(g->obj_dir, ctx->odb->path)) {
    ++		    strcmp(g->odb->path, ctx->odb->path)) {
      			ctx->num_commit_graphs_after = 1;
      			ctx->new_base_graph = NULL;
      		}
    -@@ commit-graph.c: int write_commit_graph(const char *obj_dir,
    +@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
      		ctx->oids.alloc = split_opts->max_commits;

      	if (ctx->append) {
    --		prepare_commit_graph_one(ctx->r, ctx->obj_dir);
    -+		struct object_directory *odb = find_odb(ctx->r, ctx->obj_dir);
    -+		prepare_commit_graph_one(ctx->r, odb);
    +-		prepare_commit_graph_one(ctx->r, ctx->odb->path);
    ++		prepare_commit_graph_one(ctx->r, ctx->odb);
      		if (ctx->r->objects->commit_graph)
      			ctx->oids.alloc += ctx->r->objects->commit_graph->num_commits;
      	}

      ## commit-graph.h ##
    -@@
    - #include "repository.h"
    - #include "string-list.h"
    - #include "cache.h"
    -+#include "object-store.h"
    -
    - #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
    - #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD"
    -@@ commit-graph.h: struct commit;
    - char *get_commit_graph_filename(const char *obj_dir);
    - int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
    -
    -+struct object_directory *find_odb(struct repository *r, const char *obj_dir);
    -+
    - /*
    -  * Given a commit struct, try to fill the commit struct info, including:
    -  *  1. tree object
     @@ commit-graph.h: struct commit_graph {
      	uint32_t num_commits;
      	struct object_id oid;
3:  ce884d7742 < -:  ---------- builtin/commit-graph.c: die() with unknown '--object-dir'
5:  c9b2ba46ab ! 4:  f14e95aa7e commit-graph.c: remove path normalization, comparison
    @@ Commit message
         Signed-off-by: Taylor Blau <me@ttaylorr.com>

      ## builtin/commit-graph.c ##
    -@@ builtin/commit-graph.c: static struct object_directory *find_odb_or_die(struct repository *r,
    - static int graph_verify(int argc, const char **argv)
    - {
    - 	struct commit_graph *graph = NULL;
    -+	struct object_directory *odb = NULL;
    - 	char *graph_name;
    - 	int open_ok;
    - 	int fd;
     @@ builtin/commit-graph.c: static int graph_verify(int argc, const char **argv)
    - 	if (opts.progress)
      		flags |= COMMIT_GRAPH_WRITE_PROGRESS;

    --	graph_name = get_commit_graph_filename(opts.obj_dir);
    -+	odb = find_odb_or_die(the_repository, opts.obj_dir);
    + 	odb = find_odb(the_repository, opts.obj_dir);
    +-	graph_name = get_commit_graph_filename(odb->path);
     +	graph_name = get_commit_graph_filename(odb);
      	open_ok = open_commit_graph(graph_name, &fd, &st);
      	if (!open_ok && errno != ENOENT)
      		die_errno(_("Could not open commit-graph '%s'"), graph_name);
    -@@ builtin/commit-graph.c: static int graph_verify(int argc, const char **argv)
    -
    - 	if (open_ok)
    - 		graph = load_commit_graph_one_fd_st(fd, &st);
    --	else {
    --		struct object_directory *odb;
    --		if ((odb = find_odb_or_die(the_repository, opts.obj_dir)))
    --			graph = read_commit_graph_one(the_repository, odb);
    --	}
    -+	else
    -+		graph = read_commit_graph_one(the_repository, odb);
    -
    - 	/* Return failure if open_ok predicted success */
    - 	if (!graph)

      ## commit-graph.c ##
     @@
    @@ commit-graph.h
     +char *get_commit_graph_filename(struct object_directory *odb);
      int open_commit_graph(const char *graph_file, int *fd, struct stat *st);

    - struct object_directory *find_odb(struct repository *r, const char *obj_dir);
    + /*

      ## t/helper/test-read-graph.c ##
     @@ t/helper/test-read-graph.c: int cmd__read_graph(int argc, const char **argv)
6:  e70483f771 = 5:  aa12b7378b commit-graph.h: use odb in 'load_commit_graph_one_fd_st'
--
2.25.0.119.gaa12b7378b

Comments

Jeff King Feb. 5, 2020, 12:30 p.m. UTC | #1
On Mon, Feb 03, 2020 at 01:17:53PM -0800, Taylor Blau wrote:

> Here is a re-roll of the series in this thread to replace string-based
> path comparison in 'commit-graph.c' code with raw pointer comparison of
> 'struct object_directory *'s.
> 
> The only thing that has changed substantially since v1 is that I swapped
> the order of patches 2/6 and 4/6. What was patch 3/6 got folded into
> what is now patch 2/5. This resolves an inconvenience where we had to
> define a helper function in 'commit-graph.c', when it really belonged in
> 'builtin/commit-graph.c'.
> 
> I took a few other suggestions from Martin in what is now patch 4/5, and
> noticed a few other small things along the way. A range-diff since v1 is
> included below.

Thanks, I read over this and the new layout looks quite good.

-Peff