Message ID | 890e0e7136204f5ca47f0703f32b4adb99ad8d7e.1580424766.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New, archived |
Series | commit-graph: use 'struct object_directory *' everywhere |
On Fri, 31 Jan 2020 at 00:03, Taylor Blau <me@ttaylorr.com> wrote:
> Instead of getting rid of the 'struct object_directory *', store that
> insead of a 'char *odb' in 'struct commit_graph'. Once the 'struct

s/insead/instead/

>  	if (open_ok)
>  		graph = load_commit_graph_one_fd_st(fd, &st);
> -	else
> -		graph = read_commit_graph_one(the_repository, opts.obj_dir);
> +	else {
> +		struct object_directory *odb;
> +		if ((odb = find_odb(the_repository, opts.obj_dir)))
> +			graph = read_commit_graph_one(the_repository, odb);
> +	}

I'm a tiny bit allergic to this assignment-within-if. It's wrapped by
another pair of parentheses, which both compilers and humans know to
interpret as "trust me, this is not a mistake", but I still find this
easier to read:

    odb = find_odb(...);
    if (odb)
            ....

> --- a/builtin/commit.c
> +++ b/builtin/commit.c

> +#include "object-store.h"

This is the only change in this file, which looks a bit odd. I haven't
actually applied your patches, to be honest, but is this inclusion
really needed?

> --- a/commit-graph.c
> +++ b/commit-graph.c

> +struct object_directory *find_odb(struct repository *r, const char *obj_dir)

This doesn't look commit-graph related -- could/should it go somewhere
else?

> +{
> +	struct object_directory *odb;
> +	char *obj_dir_real = real_pathdup(obj_dir, 1);
> +	int cmp = -1;
> +
> +	prepare_alt_odb(r);
> +	for (odb = r->objects->odb; odb; odb = odb->next) {
> +		cmp = strcmp(obj_dir_real, real_path(odb->path));
> +		if (!cmp)
> +			break;
> +	}

At this point, either odb is NULL or cmp is zero. Those are the only two
ways out of the loop.

> +	free(obj_dir_real);
> +
> +	if (cmp)
> +		odb = NULL;

Meaning that this doesn't do much? If the most recent comparison failed,
it's because we didn't find anything, so odb will be NULL.

> +	return odb;
> +}

I think you could drop `cmp` and that final check, and write the loop
body as "if (!strcmp(...)) break". You could also have an empty loop
body, but I wouldn't go there -- I'd find that less readable. (Maybe
that's just me.)

Martin
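The simplification suggested above can be sketched outside of git as follows. This is only an illustration under stated assumptions: `struct odb_entry` and `find_entry()` are hypothetical stand-ins for 'struct object_directory' and 'find_odb()', and the path normalization done by `real_pathdup()` and `real_path()` in the patch is left out so that the block stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's 'struct object_directory'. */
struct odb_entry {
	struct odb_entry *next;
	const char *path;
};

/*
 * Return the first entry whose path equals obj_dir, or NULL.
 *
 * The loop has exactly two exits: the break (a match was found, so odb
 * is non-NULL) and the loop condition failing (odb is NULL). Neither a
 * 'cmp' flag nor a check after the loop is needed.
 */
static struct odb_entry *find_entry(struct odb_entry *head, const char *obj_dir)
{
	struct odb_entry *odb;

	assert(obj_dir != NULL);
	for (odb = head; odb; odb = odb->next) {
		if (!strcmp(obj_dir, odb->path))
			break;
	}
	return odb;
}
```

A caller gets either the matching entry or NULL, which is the same contract the original version with `cmp` provided.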
On Fri, Jan 31, 2020 at 07:52:02AM +0100, Martin Ågren wrote:

> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> >
> > +struct object_directory *find_odb(struct repository *r, const char *obj_dir)
>
> This doesn't look commit-graph related -- could/should it go somewhere
> else?

I think the right place is actually as a static inside
builtin/commit-graph.c, as this is really about handling its weird
--object-dir options.

But it can't go there in this patch, because there's a caller in
commit-graph.c. In patch 4, we convert write_commit_graph() to take an
odb, too, and that call goes away. At that point, we could move it into
the builtin as a static.

Ideally we could flip the order of this patch and patch 4, but that
doesn't work either: by switching to an odb we lose our path
normalization, but if the other side hasn't switched either, then we
can't just compare odb pointers. It would be a temporary regression.

So there's a circular dependency between the two patches. I think we
ought to do one of:

  - move find_odb() to a static as a cleanup on top

  - squash those two patches together into a single patch

  - swap the patch order, but have write_commit_graph_ctx store both the
    "odb" _and_ the normalized copy of the path we do now. That leaves
    it correct, and then it can be cleaned up in favor of an odb pointer
    comparison in patch 5, along with the rest of the normalized bits.

I'm OK with any of those. The second two have the added bonus that we
could introduce the die() behavior into find_odb() immediately, and
explain it (there's another temporary weirdness in this patch where
specifying an --object-dir outside of the repository becomes a silent
noop, and then the next patch turns it into an error, but that could all
be done in a single step when we introduce find_odb()).

> I think you could drop `cmp` and that final check, and write the loop
> body as "if (!strcmp(...)) break". You could also have an empty loop
> body, but I wouldn't go there -- I'd find that less readable. (Maybe
> that's just me.)

Yeah, I believe you are correct (and this is a nice simplification worth
doing).

-Peff
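The difference between a silent no-op and an explicit error for an unknown --object-dir can be illustrated with a small sketch. The names here (`struct odb_entry`, `find_entry_checked()`) and the error text are hypothetical. Git's die() would terminate the process; this sketch returns an error code instead so that the behavior is easy to exercise.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for git's 'struct object_directory'. */
struct odb_entry {
	struct odb_entry *next;
	const char *path;
};

/*
 * Look up obj_dir among the known object directories.
 *
 * On success, store the match in *out and return 0. Otherwise report
 * the problem on stderr, store NULL in *out and return -1, so that the
 * caller cannot mistake "not found" for "nothing to do".
 */
static int find_entry_checked(struct odb_entry *head, const char *obj_dir,
			      struct odb_entry **out)
{
	struct odb_entry *odb;

	for (odb = head; odb; odb = odb->next) {
		if (!strcmp(obj_dir, odb->path)) {
			*out = odb;
			return 0;
		}
	}

	*out = NULL;
	fprintf(stderr, "error: could not find object directory matching %s\n",
		obj_dir);
	return -1;
}
```

Reporting the failure at lookup time keeps the error next to its cause, rather than letting a NULL result flow into later code that quietly skips its work.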
On Fri, 31 Jan 2020 at 11:20, Jeff King <peff@peff.net> wrote:
>
> On Fri, Jan 31, 2020 at 07:52:02AM +0100, Martin Ågren wrote:
>
> > > --- a/commit-graph.c
> > > +++ b/commit-graph.c
> > >
> > > +struct object_directory *find_odb(struct repository *r, const char *obj_dir)
> >
> > This doesn't look commit-graph related -- could/should it go somewhere
> > else?
>
> I think the right place is actually as a static inside
> builtin/commit-graph.c, as this is really about handling its weird
> --object-dir options.
>
> But it can't go there in this patch, because there's a caller in
> commit-graph.c. In patch 4, we convert write_commit_graph() to take an
> odb, too, and that call goes away. At that point, we could move it into
> the builtin as a static.

Thanks for explaining the issue.

> Ideally we could flip the order of this patch and patch 4, but that
> doesn't work either: by switching to an odb we lose our path
> normalization, but if the other side hasn't switched either, then we
> can't just compare odb pointers. It would be a temporary regression.
>
> So there's a circular dependency between the two patches. I think we
> ought to do one of:
>
>   - move find_odb() to a static as a cleanup on top
>
>   - squash those two patches together into a single patch
>
>   - swap the patch order, but have write_commit_graph_ctx store both the
>     "odb" _and_ the normalized copy of the path we do now. That leaves
>     it correct, and then it can be cleaned up in favor of an odb pointer
>     comparison in patch 5, along with the rest of the normalized bits.
>
> I'm OK with any of those. The second two have the added bonus that we
> could introduce the die() behavior into find_odb() immediately, and
> explain it (there's another temporary weirdness in this patch where
> specifying an --object-dir outside of the repository becomes a silent
> noop, and then the next patch turns it into an error, but that could all
> be done in a single step when we introduce find_odb()).

... and these ways of addressing it.

Martin
Martin Ågren <martin.agren@gmail.com> writes:

>> +		struct object_directory *odb;
>> +		if ((odb = find_odb(the_repository, opts.obj_dir)))
>> +			graph = read_commit_graph_one(the_repository, odb);
>> +	}
>
> I'm a tiny bit allergic to this assignment-within-if. It's wrapped by
> another pair of parentheses, which both compilers and humans know to
> interpret as "trust me, this is not a mistake", but I still find this
> easier to read:
>
>     odb = find_odb(...);
>     if (odb)
>             ....

Yup.
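For readers less familiar with the two styles under discussion, here is a minimal sketch showing that they behave identically and differ only in readability. The helper names are hypothetical and the lookup is a plain `strchr()` rather than anything from git.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: returns a pointer into 'haystack', or NULL. */
static const char *lookup(const char *haystack, char needle)
{
	return strchr(haystack, needle);
}

/* Style 1: assignment inside the condition; the extra parentheses
 * signal to compilers and readers that '=' is intended. */
static int offset_assign_in_if(const char *s, char c)
{
	const char *p;

	if ((p = lookup(s, c)))
		return (int)(p - s);
	return -1;
}

/* Style 2: assign first, then test; same result, easier to scan. */
static int offset_assign_then_test(const char *s, char c)
{
	const char *p = lookup(s, c);

	if (p)
		return (int)(p - s);
	return -1;
}
```

Both functions return the offset of the first match or -1, so the choice between them is purely a matter of style.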
Hi Martin,

Thanks for your review! Your comments were all quite helpful, and I
applied all of your suggested changes.

On Fri, Jan 31, 2020 at 07:52:02AM +0100, Martin Ågren wrote:
> On Fri, 31 Jan 2020 at 00:03, Taylor Blau <me@ttaylorr.com> wrote:
> > Instead of getting rid of the 'struct object_directory *', store that
> > insead of a 'char *odb' in 'struct commit_graph'. Once the 'struct
>
> s/insead/instead/

Typo. Thanks for noticing. I fixed this in my local copy of this branch.

> >  	if (open_ok)
> >  		graph = load_commit_graph_one_fd_st(fd, &st);
> > -	else
> > -		graph = read_commit_graph_one(the_repository, opts.obj_dir);
> > +	else {
> > +		struct object_directory *odb;
> > +		if ((odb = find_odb(the_repository, opts.obj_dir)))
> > +			graph = read_commit_graph_one(the_repository, odb);
> > +	}
>
> I'm a tiny bit allergic to this assignment-within-if. It's wrapped by
> another pair of parentheses, which both compilers and humans know to
> interpret as "trust me, this is not a mistake", but I still find this
> easier to read:
>
>     odb = find_odb(...);
>     if (odb)
>             ....

To be honest, I'm not such a fan of this style myself, but it seemed odd
to me to write:

    struct object_directory *odb;
    odb = ...;
    if (odb) {
    }

when we were really only trying to call 'find_odb()' and do something
with its result, but only if it was non-NULL. I counted 152 of these
assign-if's lying around with:

    $ git grep 'if ((.* =[^=]' | wc -l

but it seems like they are in poor style (as evidenced by your and
Junio's response later in the thread).

So, I removed this and instead promoted 'odb' to a local variable at the
function level, since we do that promotion anyway a couple of patches
later. This reduces the churn, and avoids either an assign-if, or a
define/assign/check.

> > --- a/builtin/commit.c
> > +++ b/builtin/commit.c
> >
> > +#include "object-store.h"

No; this is a stray left over from some development on this branch. I'll
remove it.

> This is the only change in this file, which looks a bit odd. I haven't
> actually applied your patches, to be honest, but is this inclusion
> really needed?
>
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> >
> > +struct object_directory *find_odb(struct repository *r, const char *obj_dir)
>
> This doesn't look commit-graph related -- could/should it go somewhere
> else?

I'll respond in more complete detail further down in the thread, but the
short answer is "yes, this should go in builtin/commit-graph.c".

> > +{
> > +	struct object_directory *odb;
> > +	char *obj_dir_real = real_pathdup(obj_dir, 1);
> > +	int cmp = -1;
> > +
> > +	prepare_alt_odb(r);
> > +	for (odb = r->objects->odb; odb; odb = odb->next) {
> > +		cmp = strcmp(obj_dir_real, real_path(odb->path));
> > +		if (!cmp)
> > +			break;
> > +	}
>
> At this point, either odb is NULL or cmp is zero. Those are the only two
> ways out of the loop.
>
> > +	free(obj_dir_real);
> > +
> > +	if (cmp)
> > +		odb = NULL;
>
> Meaning that this doesn't do much? If the most recent comparison failed,
> it's because we didn't find anything, so odb will be NULL.
>
> > +	return odb;
> > +}
>
> I think you could drop `cmp` and that final check, and write the loop
> body as "if (!strcmp(...)) break". You could also have an empty loop
> body, but I wouldn't go there -- I'd find that less readable. (Maybe
> that's just me.)

Thanks, I changed this to remove the 'cmp' check outside of the loop,
which I agree is unnecessary.

> Martin

Thanks,
Taylor
On Fri, Jan 31, 2020 at 05:20:49AM -0500, Jeff King wrote:
> On Fri, Jan 31, 2020 at 07:52:02AM +0100, Martin Ågren wrote:
>
> > > --- a/commit-graph.c
> > > +++ b/commit-graph.c
> > >
> > > +struct object_directory *find_odb(struct repository *r, const char *obj_dir)
> >
> > This doesn't look commit-graph related -- could/should it go somewhere
> > else?
>
> I think the right place is actually as a static inside
> builtin/commit-graph.c, as this is really about handling its weird
> --object-dir options.
>
> But it can't go there in this patch, because there's a caller in
> commit-graph.c. In patch 4, we convert write_commit_graph() to take an
> odb, too, and that call goes away. At that point, we could move it into
> the builtin as a static.
>
> Ideally we could flip the order of this patch and patch 4, but that
> doesn't work either: by switching to an odb we lose our path
> normalization, but if the other side hasn't switched either, then we
> can't just compare odb pointers. It would be a temporary regression.
>
> So there's a circular dependency between the two patches. I think we
> ought to do one of:
>
>   - move find_odb() to a static as a cleanup on top
>
>   - squash those two patches together into a single patch
>
>   - swap the patch order, but have write_commit_graph_ctx store both the
>     "odb" _and_ the normalized copy of the path we do now. That leaves
>     it correct, and then it can be cleaned up in favor of an odb pointer
>     comparison in patch 5, along with the rest of the normalized bits.

Thanks for describing the problem, and for laying out a few options! I
went with this option, which was to swap patches 2/6 and 4/6, and then
squash 3/6 into what _was_ 4/6 (but is now 2/5).

That all said, I am not sure that I follow the explanation that we need
to have 'write_commit_graph_ctx' store the normalized object_dir, since
we can use 'odb->path' for this instead. I double checked, and we
already do the same loose normalization in
'sha1-file.c:link_alt_odb_entry()', where we remove trailing slashes.

This isn't the exact same normalization strictly speaking. It differs
only in that we remove more than one trailing slash if it exists, but
this seems acceptable for the 2 patches that this behavior change lives
on for.

I double checked that the 'odb->path' is normalized relative to the
$PWD, in which case I think that we can rely on it as a stand-in for
'object_dir', and don't have to store both.

> I'm OK with any of those. The second two have the added bonus that we
> could introduce the die() behavior into find_odb() immediately, and
> explain it (there's another temporary weirdness in this patch where
> specifying an --object-dir outside of the repository becomes a silent
> noop, and then the next patch turns it into an error, but that could all
> be done in a single step when we introduce find_odb()).
>
> > I think you could drop `cmp` and that final check, and write the loop
> > body as "if (!strcmp(...)) break". You could also have an empty loop
> > body, but I wouldn't go there -- I'd find that less readable. (Maybe
> > that's just me.)
>
> Yeah, I believe you are correct (and this is a nice simplification worth
> doing).
>
> -Peff

Thanks,
Taylor
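The "loose normalization" mentioned above, removing trailing slashes, can be sketched as below. This is not the code from 'link_alt_odb_entry()'; `strip_trailing_slashes()` is a hypothetical helper written only to show what such a normalization does and does not cover.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: trim every trailing '/' from 'path' in place,
 * but leave a lone "/" (the root directory) intact.
 */
static void strip_trailing_slashes(char *path)
{
	size_t len = strlen(path);

	while (len > 1 && path[len - 1] == '/')
		path[--len] = '\0';
}
```

Note that this kind of normalization makes "objects//" and "objects" compare equal, but it does nothing for spellings such as "./objects" versus "objects", which is the case the series ultimately sidesteps by comparing pointers.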
On Sun, Feb 02, 2020 at 08:36:46PM -0800, Taylor Blau wrote:

> That all said, I am not sure that I follow the explanation that we need
> to have 'write_commit_graph_ctx' store the normalized object_dir, since
> we can use 'odb->path' for this instead. I double checked, and we
> already do the same loose normalization in
> 'sha1-file.c:link_alt_odb_entry()', where we remove trailing slashes.

I was assuming that the normalization in write_commit_graph_ctx was
necessary (and I think there are some tests around this), but I admit I
didn't actually try it. If it's not failing tests, I'm OK with comparing
the non-normalized paths as an intermediate step, since we end up
comparing pointers after the final patches.

-Peff
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index e0c6fc4bbf..3edac318e8 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -76,8 +76,11 @@ static int graph_verify(int argc, const char **argv)
 
 	if (open_ok)
 		graph = load_commit_graph_one_fd_st(fd, &st);
-	else
-		graph = read_commit_graph_one(the_repository, opts.obj_dir);
+	else {
+		struct object_directory *odb;
+		if ((odb = find_odb(the_repository, opts.obj_dir)))
+			graph = read_commit_graph_one(the_repository, odb);
+	}
 
 	/* Return failure if open_ok predicted success */
 	if (!graph)
@@ -97,6 +100,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list lines;
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;
+	struct object_directory *odb = NULL;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -145,9 +149,10 @@ static int graph_write(int argc, const char **argv)
 		flags |= COMMIT_GRAPH_WRITE_PROGRESS;
 
 	read_replace_refs = 0;
+	odb = find_odb(the_repository, opts.obj_dir);
 
 	if (opts.reachable) {
-		if (write_commit_graph_reachable(opts.obj_dir, flags, &split_opts))
+		if (write_commit_graph_reachable(odb->path, flags, &split_opts))
 			return 1;
 		return 0;
 	}
@@ -169,7 +174,7 @@ static int graph_write(int argc, const char **argv)
 		UNLEAK(buf);
 	}
 
-	if (write_commit_graph(opts.obj_dir,
+	if (write_commit_graph(odb->path,
 			       pack_indexes,
 			       commit_hex,
 			       flags,
diff --git a/builtin/commit.c b/builtin/commit.c
index aa1332308a..bd071169d7 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -36,6 +36,7 @@
 #include "help.h"
 #include "commit-reach.h"
 #include "commit-graph.h"
+#include "object-store.h"
 
 static const char * const builtin_commit_usage[] = {
 	N_("git commit [<options>] [--] <pathspec>..."),
diff --git a/commit-graph.c b/commit-graph.c
index b205e65ed1..2c06876b26 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -75,6 +75,26 @@ static uint8_t oid_version(void)
 	return 1;
 }
 
+struct object_directory *find_odb(struct repository *r, const char *obj_dir)
+{
+	struct object_directory *odb;
+	char *obj_dir_real = real_pathdup(obj_dir, 1);
+	int cmp = -1;
+
+	prepare_alt_odb(r);
+	for (odb = r->objects->odb; odb; odb = odb->next) {
+		cmp = strcmp(obj_dir_real, real_path(odb->path));
+		if (!cmp)
+			break;
+	}
+
+	free(obj_dir_real);
+
+	if (cmp)
+		odb = NULL;
+	return odb;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
 	struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -327,14 +347,15 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file)
 	return g;
 }
 
-static struct commit_graph *load_commit_graph_v1(struct repository *r, const char *obj_dir)
+static struct commit_graph *load_commit_graph_v1(struct repository *r,
+						 struct object_directory *odb)
 {
-	char *graph_name = get_commit_graph_filename(obj_dir);
+	char *graph_name = get_commit_graph_filename(odb->path);
 	struct commit_graph *g = load_commit_graph_one(graph_name);
 	free(graph_name);
 
 	if (g)
-		g->obj_dir = obj_dir;
+		g->odb = odb;
 
 	return g;
 }
@@ -372,14 +393,15 @@ static int add_graph_to_chain(struct commit_graph *g,
 	return 1;
 }
 
-static struct commit_graph *load_commit_graph_chain(struct repository *r, const char *obj_dir)
+static struct commit_graph *load_commit_graph_chain(struct repository *r,
+						    struct object_directory *odb)
 {
 	struct commit_graph *graph_chain = NULL;
 	struct strbuf line = STRBUF_INIT;
 	struct stat st;
 	struct object_id *oids;
 	int i = 0, valid = 1, count;
-	char *chain_name = get_chain_filename(obj_dir);
+	char *chain_name = get_chain_filename(odb->path);
 	FILE *fp;
 	int stat_res;
 
@@ -418,7 +440,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, const
 			free(graph_name);
 
 			if (g) {
-				g->obj_dir = odb->path;
+				g->odb = odb;
 
 				if (add_graph_to_chain(g, graph_chain, oids, i)) {
 					graph_chain = g;
@@ -442,23 +464,25 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, const
 	return graph_chain;
 }
 
-struct commit_graph *read_commit_graph_one(struct repository *r, const char *obj_dir)
+struct commit_graph *read_commit_graph_one(struct repository *r,
+					   struct object_directory *odb)
 {
-	struct commit_graph *g = load_commit_graph_v1(r, obj_dir);
+	struct commit_graph *g = load_commit_graph_v1(r, odb);
 
 	if (!g)
-		g = load_commit_graph_chain(r, obj_dir);
+		g = load_commit_graph_chain(r, odb);
 
 	return g;
 }
 
-static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
+static void prepare_commit_graph_one(struct repository *r,
+				     struct object_directory *odb)
 {
 
 	if (r->objects->commit_graph)
 		return;
 
-	r->objects->commit_graph = read_commit_graph_one(r, obj_dir);
+	r->objects->commit_graph = read_commit_graph_one(r, odb);
 }
 
 /*
@@ -505,7 +529,7 @@ static int prepare_commit_graph(struct repository *r)
 	for (odb = r->objects->odb;
 	     !r->objects->commit_graph && odb;
 	     odb = odb->next)
-		prepare_commit_graph_one(r, odb->path);
+		prepare_commit_graph_one(r, odb);
 	return !!r->objects->commit_graph;
 }
 
@@ -1470,7 +1494,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 
 	if (ctx->split && ctx->base_graph_name && ctx->num_commit_graphs_after > 1) {
 		char *new_base_hash = xstrdup(oid_to_hex(&ctx->new_base_graph->oid));
-		char *new_base_name = get_split_graph_filename(ctx->new_base_graph->obj_dir, new_base_hash);
+		char *new_base_name = get_split_graph_filename(ctx->new_base_graph->odb->path, new_base_hash);
 
 		free(ctx->commit_graph_filenames_after[ctx->num_commit_graphs_after - 2]);
 		free(ctx->commit_graph_hash_after[ctx->num_commit_graphs_after - 2]);
@@ -1553,7 +1577,7 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 
 	while (g && (g->num_commits <= size_mult * num_commits ||
 		    (max_commits && num_commits > max_commits))) {
-		if (strcmp(g->obj_dir, ctx->obj_dir))
+		if (strcmp(g->odb->path, ctx->obj_dir))
 			break;
 
 		num_commits += g->num_commits;
@@ -1565,10 +1589,10 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 	ctx->new_base_graph = g;
 
 	if (ctx->num_commit_graphs_after == 2) {
-		char *old_graph_name = get_commit_graph_filename(g->obj_dir);
+		char *old_graph_name = get_commit_graph_filename(g->odb->path);
 
 		if (!strcmp(g->filename, old_graph_name) &&
-		    strcmp(g->obj_dir, ctx->obj_dir)) {
+		    strcmp(g->odb->path, ctx->obj_dir)) {
 			ctx->num_commit_graphs_after = 1;
 			ctx->new_base_graph = NULL;
 		}
@@ -1824,7 +1848,8 @@ int write_commit_graph(const char *obj_dir,
 		ctx->oids.alloc = split_opts->max_commits;
 
 	if (ctx->append) {
-		prepare_commit_graph_one(ctx->r, ctx->obj_dir);
+		struct object_directory *odb = find_odb(ctx->r, ctx->obj_dir);
+		prepare_commit_graph_one(ctx->r, odb);
 		if (ctx->r->objects->commit_graph)
 			ctx->oids.alloc += ctx->r->objects->commit_graph->num_commits;
 	}
diff --git a/commit-graph.h b/commit-graph.h
index 7f5c933fa2..9700a6c7c2 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -5,6 +5,7 @@
 #include "repository.h"
 #include "string-list.h"
 #include "cache.h"
+#include "object-store.h"
 
 #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
 #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD"
@@ -14,6 +15,8 @@ struct commit;
 char *get_commit_graph_filename(const char *obj_dir);
 int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
 
+struct object_directory *find_odb(struct repository *r, const char *obj_dir);
+
 /*
  * Given a commit struct, try to fill the commit struct info, including:
  *  1. tree object
@@ -48,7 +51,7 @@ struct commit_graph {
 	uint32_t num_commits;
 	struct object_id oid;
 	char *filename;
-	const char *obj_dir;
+	struct object_directory *odb;
 
 	uint32_t num_commits_in_base;
 	struct commit_graph *base_graph;
@@ -61,7 +64,8 @@ struct commit_graph {
 };
 
 struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st);
-struct commit_graph *read_commit_graph_one(struct repository *r, const char *obj_dir);
+struct commit_graph *read_commit_graph_one(struct repository *r,
+					   struct object_directory *odb);
 
 struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 					size_t graph_size);
There are lots of places in 'commit-graph.h' where a function either has
(or almost has) a full 'struct object_directory *', accesses '->path',
and then throws away the rest of the struct.

This can cause headaches when comparing the locations of object
directories across alternates (e.g., in the case of deciding if two
commit-graph layers can be merged). These paths are normalized with
'normalize_path_copy()' which mitigates some comparison issues, but not
all [1].

Instead of getting rid of the 'struct object_directory *', store that
insead of a 'char *odb' in 'struct commit_graph'. Once the 'struct
write_commit_graph_context' has an object_directory pointer, too, this
will allow calling code to replace these error-prone path comparisons
with raw pointer comparisons, thereby circumventing any
normalization-related errors. This will be introduced in a subsequent
patch.

[1]: In my testing, for example, I can get one side of the commit-graph
code to fill object_dir with "./objects" and the other with just
"objects".

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 13 +++++++---
 builtin/commit.c       |  1 +
 commit-graph.c         | 59 ++++++++++++++++++++++++++++++------------
 commit-graph.h         |  8 ++++--
 4 files changed, 58 insertions(+), 23 deletions(-)
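The motivation in footnote [1] can be made concrete with a short sketch. It assumes hypothetical stand-in types (`struct odb_entry`, `struct graph_layer`) rather than git's real structures. A string comparison treats "./objects" and "objects" as different directories, while a pointer comparison against the shared object-directory entry is indifferent to how the path was spelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's 'struct object_directory'. */
struct odb_entry {
	struct odb_entry *next;
	const char *path;
};

/* Hypothetical stand-in for a commit-graph layer that remembers its odb. */
struct graph_layer {
	struct odb_entry *odb;
};

/* Path-based check: sensitive to every spelling difference. */
static int same_dir_by_path(const char *a, const char *b)
{
	return !strcmp(a, b);
}

/* Pointer-based check: both sides refer to the same list entry or not. */
static int same_dir_by_odb(const struct graph_layer *g,
			   const struct odb_entry *odb)
{
	return g->odb == odb;
}
```

The pointer check only works once every caller holds a pointer obtained from the same list of object directories, which is why the series converts the callers first and switches the comparisons in a later patch.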