Message ID | 4f5bc19903f8a1f5b153b5665de378e743e12744.1580430057.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New, archived |
Series | builtin/commit-graph.c: new split/merge options |
On 1/30/2020 7:28 PM, Taylor Blau wrote:
> With '--split', the commit-graph machinery writes new commits in another
> incremental commit-graph which is part of the existing chain, and
> optionally decides to condense the chain into a single commit-graph.
> This is done to ensure that the aysmptotic behavior of looking up a
> commit in an incremental chain is dominated by the number of
> incrementals in that chain. It can be controlled by the '--max-commits'
> and '--size-multiple' options.
>
> On occasion, callers may want to ensure that 'git commit-graph write
> --split' always writes an incremental, and never spends effort
> condensing the incremental chain [1]. Previously, this was possible by
> passing '--size-multiple=0', but this no longer the case following
> 63020f175f (commit-graph: prefer default size_mult when given zero,
> 2020-01-02).
>
> Reintroduce a less-magical variant of the above with a new pair of
> arguments to '--split': '--split=no-merge' and '--split=merge-all'. When
> '--split=no-merge' is given, the commit-graph machinery will never
> condense an existing chain and will always write a new incremental.
> Conversely, if '--split=merge-all' is given, any invocation including it
> will always condense a chain if one exists. If '--split' is given with
> no arguments, it behaves as before and defers to '--size-multiple', and
> so on.
>
> [1]: This might occur when, for example, a server administrator running
> some program after each push may want to ensure that each job runs
> proportional in time to the size of the push, and does not "jump" when
> the commit-graph machinery decides to trigger a merge.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/git-commit-graph.txt | 18 +++++++++++-----
>  builtin/commit-graph.c             | 33 ++++++++++++++++++++++++++----
>  commit-graph.c                     | 19 +++++++++--------
>  commit-graph.h                     |  7 +++++++
>  t/t5324-split-commit-graph.sh      | 25 ++++++++++++++++++++++
>  5 files changed, 85 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 28d1fee505..8d61ba9f56 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -57,11 +57,19 @@ or `--stdin-packs`.)
>  With the `--append` option, include all commits that are present in the
>  existing commit-graph file.
>  +
> -With the `--split` option, write the commit-graph as a chain of multiple
> -commit-graph files stored in `<dir>/info/commit-graphs`. The new commits
> -not already in the commit-graph are added in a new "tip" file. This file
> -is merged with the existing file if the following merge conditions are
> -met:
> +With the `--split[=<strategy>]` option, write the commit-graph as a
> +chain of multiple commit-graph files stored in
> +`<dir>/info/commit-graphs`. Commit-graph layers are merged based on the
> +strategy and other splitting options. The new commits not already in the
> +commit-graph are added in a new "tip" file. This file is merged with the
> +existing file if the following merge conditions are met:
> +* If `--split=merge-always` is specified, then a merge is always
> +conducted, and the remaining options are ignored. Conversely, if
> +`--split=no-merge` is specified, a merge is never performed, and the
> +remaining options are ignored. A bare `--split` defers to the remaining
> +options. (Note that merging a chain of commit graphs replaces the
> +existing chain with a length-1 chain where the first and only
> +incremental holds the entire graph).
>  +
>  * If `--size-multiple=<X>` is not specified, let `X` equal 2. If the new
>  tip file would have `N` commits and the previous tip has `M` commits and
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index de321c71ad..f03b46d627 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -9,7 +9,9 @@
>
>  static char const * const builtin_commit_graph_usage[] = {
>  	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
> -	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"),
> +	N_("git commit-graph write [--object-dir <objdir>] [--append] "
> +	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> +	   "[--[no-]progress] <split options>"),
>  	NULL
>  };
>
> @@ -19,7 +21,9 @@ static const char * const builtin_commit_graph_verify_usage[] = {
>  };
>
>  static const char * const builtin_commit_graph_write_usage[] = {
> -	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"),
> +	N_("git commit-graph write [--object-dir <objdir>] [--append] "
> +	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> +	   "[--[no-]progress] <split options>"),
>  	NULL
>  };
>
> @@ -101,6 +105,25 @@ static int graph_verify(int argc, const char **argv)
>  extern int read_replace_refs;
>  static struct split_commit_graph_opts split_opts;
>
> +static int write_option_parse_split(const struct option *opt, const char *arg,
> +				    int unset)
> +{
> +	enum commit_graph_split_flags *flags = opt->value;
> +
> +	opts.split = 1;
> +	if (!arg)
> +		return 0;

This allows `--split` to continue working as-is. But should we also
set "*flags = COMMIT_GRAPH_SPLIT_UNSPECIFIED" here? This allows one
to run "git commit-graph write --split=no-merge --split" (which could
happen if "--split=no-merge" is inside an alias).

> +test_expect_success '--split=merge-all always merges incrementals' '
> +	test_when_finished rm -rf a b c &&
> +	rm -rf $graphdir $infodir/commit-graph &&
> +	git reset --hard commits/10 &&
> +	git rev-list -3 HEAD~4 >a &&
> +	git rev-list -2 HEAD~2 >b &&
> +	git rev-list -2 HEAD >c &&
> +	git commit-graph write --split=no-merge --stdin-commits <a &&
> +	git commit-graph write --split=no-merge --stdin-commits <b &&
> +	test_line_count = 2 $graphdir/commit-graph-chain &&
> +	git commit-graph write --split=merge-all --stdin-commits <c &&
> +	test_line_count = 1 $graphdir/commit-graph-chain
> +'
> +
> +test_expect_success '--split=no-merge always writes an incremental' '
> +	test_when_finished rm -rf a b &&
> +	rm -rf $graphdir &&
> +	git reset --hard commits/2 &&
> +	git rev-list HEAD~1 >a &&
> +	git rev-list HEAD >b &&
> +	git commit-graph write --split --stdin-commits <a &&
> +	git commit-graph write --split=no-merge --stdin-commits <b &&
> +	test_line_count = 2 $graphdir/commit-graph-chain
> +'
> +
>  test_done

Good tests!

Thanks,
-Stolee
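
The alias scenario raised above (a bare `--split` following an earlier `--split=no-merge`) can be illustrated with a small stand-alone sketch. This is not the patch's callback: `parse_split_strategy()` and the shortened enum names are hypothetical, and the strategy strings follow the commit message ('no-merge', 'merge-all'). The assumption is that the last `--split` on the command line should win, so a missing argument resets the flags instead of leaving an earlier value in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shortened stand-ins for the patch's commit_graph_split_flags values. */
enum split_flags {
	SPLIT_UNSPECIFIED = 0,
	SPLIT_MERGE_REQUIRED = 1,
	SPLIT_MERGE_PROHIBITED = 2
};

/*
 * Hypothetical helper, not git's parse-options callback.
 * arg == NULL models a bare "--split". Returns 0 on success and -1 on
 * an unknown strategy, leaving *flags untouched in that case.
 */
static int parse_split_strategy(const char *arg, enum split_flags *flags)
{
	assert(flags != NULL);

	if (!arg)
		*flags = SPLIT_UNSPECIFIED;	/* reset: the last option wins */
	else if (!strcmp(arg, "merge-all"))
		*flags = SPLIT_MERGE_REQUIRED;
	else if (!strcmp(arg, "no-merge"))
		*flags = SPLIT_MERGE_PROHIBITED;
	else
		return -1;
	return 0;
}
```

Calling the helper with "no-merge" and then with NULL leaves the flags at `SPLIT_UNSPECIFIED`, which is the behavior the review comment asks for.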
On Fri, 31 Jan 2020 at 01:29, Taylor Blau <me@ttaylorr.com> wrote:
> With '--split', the commit-graph machinery writes new commits in another
> incremental commit-graph which is part of the existing chain, and
> optionally decides to condense the chain into a single commit-graph.
> This is done to ensure that the aysmptotic behavior of looking up a

asymptotic

> commit in an incremental chain is dominated by the number of
> incrementals in that chain. It can be controlled by the '--max-commits'
> and '--size-multiple' options.
>
> On occasion, callers may want to ensure that 'git commit-graph write
> --split' always writes an incremental, and never spends effort
> condensing the incremental chain [1]. Previously, this was possible by
> passing '--size-multiple=0', but this no longer the case following
> 63020f175f (commit-graph: prefer default size_mult when given zero,
> 2020-01-02).
>
> Reintroduce a less-magical variant of the above with a new pair of
> arguments to '--split': '--split=no-merge' and '--split=merge-all'. When
> '--split=no-merge' is given, the commit-graph machinery will never
> condense an existing chain and will always write a new incremental.
> Conversely, if '--split=merge-all' is given, any invocation including it
> will always condense a chain if one exists. If '--split' is given with
> no arguments, it behaves as before and defers to '--size-multiple', and
> so on.

I understand your motivation for doing this -- it all seems quite sound
to me. Not being too familiar with this commit-graph splitting and
merging, I had a hard time grokking this terminology though.

From what I understand, before this patch, `--split` means "write the
commit-graph using the 'split' file-format / in a split fashion". Ok,
that makes sense. From seeing `--split=no-merge`, I have no idea how to
even parse that. Of course I don't want to merge, I want to split! Well
not split, but write split files.

I think it would help me (and others like me) if we could somehow
separate "I want to use 'split' files" from "and here's how I want you
to decide on the merging". That is, which "strategy" to use. Obviously,
talking about a "merge strategy" would be stupid and "split strategy"
also seems a bit odd. "Coalescing strategy"? "Joining strategy"?

Or can you convince me otherwise? From which angle should I look at
this?

> -With the `--split` option, write the commit-graph as a chain of multiple
> -commit-graph files stored in `<dir>/info/commit-graphs`. The new commits
> -not already in the commit-graph are added in a new "tip" file. This file
> -is merged with the existing file if the following merge conditions are
> -met:
> +With the `--split[=<strategy>]` option, write the commit-graph as a
> +chain of multiple commit-graph files stored in
> +`<dir>/info/commit-graphs`. Commit-graph layers are merged based on the
> +strategy and other splitting options. The new commits not already in the
> +commit-graph are added in a new "tip" file. This file is merged with the
> +existing file if the following merge conditions are met:
> +* If `--split=merge-always` is specified, then a merge is always
> +conducted, and the remaining options are ignored. Conversely, if
> +`--split=no-merge` is specified, a merge is never performed, and the
> +remaining options are ignored. A bare `--split` defers to the remaining
> +options. (Note that merging a chain of commit graphs replaces the
> +existing chain with a length-1 chain where the first and only
> +incremental holds the entire graph).

To better understand the background for this patch, I read the manpage
as it stands today. From the section on `--split`, I got this
impression:

Let's say that `--max-commits` is huge, so all that matters is the
`--size-multiple`. Let's say it's two. If the current tip contains three
commits and we're about to write one with two, then 2*2 > 3 so we will
merge, i.e., write a tip file with five commits. Unless of course *that*
is more than half the size of the file before. And so on. We might just
merge once, or maybe "many" files in an avalanche effect. Every now and
then, such avalanches will go all the way back to the first file.

Now this says something different, namely that once we decide to merge,
we do it all the way back, no matter what. The commit message of
1771be90c8 ("commit-graph: merge commit-graph chains", 2019-06-18) seems
to support my original understanding, at least for `--size-multiple`,
but not `--max-commits`, curiously enough.

Can you clarify?

> -		OPT_BOOL(0, "split", &opts.split,
> -			N_("allow writing an incremental commit-graph file")),
> +		OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL,
> +			N_("allow writing an incremental commit-graph file"),

This still sounds very boolean. Cramming in the "strategy" might be hard
-- is this an argument in favor of having two separate options? ;-)

> +enum commit_graph_split_flags {
> +	COMMIT_GRAPH_SPLIT_UNSPECIFIED = 0,
> +	COMMIT_GRAPH_SPLIT_MERGE_REQUIRED = 1,
> +	COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 2
> +};

I wonder if this should be "MERGE_AUTO" rather than "UNSPECIFIED". This
is related to Stolee's comment, I think.

Martin
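
The "avalanche" question above can be explored with a small stand-alone model of the merge loop quoted elsewhere in this thread (the `while` condition in split_graph_merge_strategy()). This is only a sketch under stated assumptions: `layers_after_write()` and the shortened enum names are hypothetical, each layer is reduced to a commit count, and duplicate commits and the object-directory check are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Shortened stand-ins for the patch's commit_graph_split_flags values. */
enum split_flags {
	SPLIT_UNSPECIFIED = 0,
	SPLIT_MERGE_REQUIRED = 1,
	SPLIT_MERGE_PROHIBITED = 2
};

/*
 * Hypothetical model of the merge loop: layers[0] is the base file and
 * layers[n - 1] the current tip; each entry is a commit count.
 * Returns the chain length after the write and stores the size of the
 * new tip in *tip_commits. max_commits == 0 means "no limit".
 */
static size_t layers_after_write(const size_t *layers, size_t n,
				 size_t new_commits, size_t size_mult,
				 size_t max_commits, enum split_flags flags,
				 size_t *tip_commits)
{
	size_t num_commits = new_commits;
	size_t after = n + 1;	/* start by assuming a new tip is appended */

	assert(size_mult > 0 && tip_commits != NULL);

	if (flags != SPLIT_MERGE_PROHIBITED) {
		/* Walk from the tip towards the base, absorbing layers. */
		while (n > 0 &&
		       (layers[n - 1] <= size_mult * num_commits ||
			(max_commits && num_commits > max_commits) ||
			flags == SPLIT_MERGE_REQUIRED)) {
			num_commits += layers[n - 1];
			n--;
			after--;
		}
	}
	*tip_commits = num_commits;
	return after;
}
```

With a chain of {100, 3} and two new commits at a size multiple of 2, the tip of 3 is absorbed (3 <= 2*2) into a 5-commit tip and the walk stops at the 100-commit base (100 > 2*5). With {8, 3} the same write absorbs both layers. In this model the size-multiple test is re-evaluated for every layer, so a merge can stop partway, whereas the max-commits test stays true once it is true because the running total only grows.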
On Thu, Jan 30, 2020 at 04:28:17PM -0800, Taylor Blau wrote:
> diff --git a/commit-graph.c b/commit-graph.c
> index 6d34829f57..02e6ad9d1f 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1565,15 +1565,18 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
>  	num_commits = ctx->commits.nr;
>  	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
>
> -	while (g && (g->num_commits <= size_mult * num_commits ||
> -		    (max_commits && num_commits > max_commits))) {
> -		if (g->odb != ctx->odb)
> -			break;
> +	if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {

This line segfaults in the tests 'fetch.writeCommitGraph' and
'fetch.writeCommitGraph with submodules' in 't5510-fetch.sh', because
'git fetch' doesn't pass a 'split_opts' to the commit-graph functions.

  Thread 1 "git" received signal SIGSEGV, Segmentation fault.
  0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930)
      at commit-graph.c:1568
  1568            if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
  (gdb) p *ctx
  $1 = {r = 0x9ae2a0 <the_repo>, odb = 0x9c0df0, graph_name = 0x0, oids = {
      list = 0xa02660, nr = 12, alloc = 1024}, commits = {list = 0x9caa00,
      nr = 6, alloc = 6}, num_extra_edges = 0, approx_nr_objects = 0,
    progress = 0x0, progress_done = 0, progress_cnt = 0, base_graph_name = 0x0,
    num_commit_graphs_before = 0, num_commit_graphs_after = 1,
    commit_graph_filenames_before = 0x0, commit_graph_filenames_after = 0x0,
    commit_graph_hash_after = 0x0, new_num_commits_in_base = 0,
    new_base_graph = 0x0, append = 0, report_progress = 1, split = 1,
    check_oids = 0, split_opts = 0x0}
                    ^^^^^^^^^^^^^^^^
  (gdb) bt
  #0  0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930)
      at commit-graph.c:1568
  #1  0x0000000000512446 in write_commit_graph (odb=0x9c0df0, pack_indexes=0x0,
      commit_hex=0x7fffffffd550,
      flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT),
      split_opts=0x0) at commit-graph.c:1891
  #2  0x000000000050fd86 in write_commit_graph_reachable (odb=0x9c0df0,
      flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT),
      split_opts=0x0) at commit-graph.c:1174
      ^^^^^^^^^^^^^^
  #3  0x0000000000444ea4 in cmd_fetch (argc=0, argv=0x7fffffffd9b8, prefix=0x0)
      at builtin/fetch.c:1873
  #4  0x0000000000406154 in run_builtin (p=0x969a88 <commands+840>, argc=2,
      argv=0x7fffffffd9b0) at git.c:444
  #5  0x00000000004064a4 in handle_builtin (argc=2, argv=0x7fffffffd9b0)
      at git.c:674
  #6  0x000000000040674c in run_argv (argcp=0x7fffffffd84c, argv=0x7fffffffd840)
      at git.c:741
  #7  0x0000000000406bbd in cmd_main (argc=2, argv=0x7fffffffd9b0) at git.c:872
  #8  0x00000000004cfd4e in main (argc=5, argv=0x7fffffffd998)
      at common-main.c:52

Note that this function split_graph_merge_strategy() does look at
various fields in 'ctx->split_opts' a bit earlier, but those accesses
are protected by a 'if (ctx->split_opts)' condition.
expire_commit_graphs() does the same.

> +		while (g && (g->num_commits <= size_mult * num_commits ||
> +			    (max_commits && num_commits > max_commits) ||
> +			    (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) {
> +			if (g->odb != ctx->odb)
> +				break;
>
> -		num_commits += g->num_commits;
> -		g = g->base_graph;
> +			num_commits += g->num_commits;
> +			g = g->base_graph;
>
> -		ctx->num_commit_graphs_after--;
> +			ctx->num_commit_graphs_after--;
> +		}
>  	}
>
Hi,

On Sat, 1 Feb 2020, SZEDER Gábor wrote:

> On Thu, Jan 30, 2020 at 04:28:17PM -0800, Taylor Blau wrote:
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 6d34829f57..02e6ad9d1f 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -1565,15 +1565,18 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
> >  	num_commits = ctx->commits.nr;
> >  	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
> >
> > -	while (g && (g->num_commits <= size_mult * num_commits ||
> > -		    (max_commits && num_commits > max_commits))) {
> > -		if (g->odb != ctx->odb)
> > -			break;
> > +	if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
>
> This line segfaults in the tests 'fetch.writeCommitGraph' and
> 'fetch.writeCommitGraph with submodules' in 't5510-fetch.sh', because
> 'git fetch' doesn't pass a 'split_opts' to the commit-graph functions.

I noticed the same. This patch seems to fix it for me:

-- snip --
diff --git a/commit-graph.c b/commit-graph.c
index a5d7624073f..af5c58833cf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1566,7 +1566,8 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 	num_commits = ctx->commits.nr;
 	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;

-	if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
+	if (ctx->split_opts &&
+	    ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
 		while (g && (g->num_commits <= size_mult * num_commits ||
 			    (max_commits && num_commits > max_commits) ||
 			    (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) {
-- snap --

For your convenience, I also pushed this up as
`tb/commit-graph-split-merge` to https://github.com/dscho/git

Thanks,
Dscho

>
>   Thread 1 "git" received signal SIGSEGV, Segmentation fault.
>   0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930)
>       at commit-graph.c:1568
>   1568            if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
>   (gdb) p *ctx
>   $1 = {r = 0x9ae2a0 <the_repo>, odb = 0x9c0df0, graph_name = 0x0, oids = {
>       list = 0xa02660, nr = 12, alloc = 1024}, commits = {list = 0x9caa00,
>       nr = 6, alloc = 6}, num_extra_edges = 0, approx_nr_objects = 0,
>     progress = 0x0, progress_done = 0, progress_cnt = 0, base_graph_name = 0x0,
>     num_commit_graphs_before = 0, num_commit_graphs_after = 1,
>     commit_graph_filenames_before = 0x0, commit_graph_filenames_after = 0x0,
>     commit_graph_hash_after = 0x0, new_num_commits_in_base = 0,
>     new_base_graph = 0x0, append = 0, report_progress = 1, split = 1,
>     check_oids = 0, split_opts = 0x0}
>                     ^^^^^^^^^^^^^^^^
>   (gdb) bt
>   #0  0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930)
>       at commit-graph.c:1568
>   #1  0x0000000000512446 in write_commit_graph (odb=0x9c0df0, pack_indexes=0x0,
>       commit_hex=0x7fffffffd550,
>       flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT),
>       split_opts=0x0) at commit-graph.c:1891
>   #2  0x000000000050fd86 in write_commit_graph_reachable (odb=0x9c0df0,
>       flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT),
>       split_opts=0x0) at commit-graph.c:1174
>       ^^^^^^^^^^^^^^
>   #3  0x0000000000444ea4 in cmd_fetch (argc=0, argv=0x7fffffffd9b8, prefix=0x0)
>       at builtin/fetch.c:1873
>   #4  0x0000000000406154 in run_builtin (p=0x969a88 <commands+840>, argc=2,
>       argv=0x7fffffffd9b0) at git.c:444
>   #5  0x00000000004064a4 in handle_builtin (argc=2, argv=0x7fffffffd9b0)
>       at git.c:674
>   #6  0x000000000040674c in run_argv (argcp=0x7fffffffd84c, argv=0x7fffffffd840)
>       at git.c:741
>   #7  0x0000000000406bbd in cmd_main (argc=2, argv=0x7fffffffd9b0) at git.c:872
>   #8  0x00000000004cfd4e in main (argc=5, argv=0x7fffffffd998)
>       at common-main.c:52
>
> Note that this function split_graph_merge_strategy() does look at
> various fields in 'ctx->split_opts' a bit earlier, but those accesses
> are protected by a 'if (ctx->split_opts)' condition.
> expire_commit_graphs() does the same.
>
>
> > +		while (g && (g->num_commits <= size_mult * num_commits ||
> > +			    (max_commits && num_commits > max_commits) ||
> > +			    (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) {
> > +			if (g->odb != ctx->odb)
> > +				break;
> >
> > -		num_commits += g->num_commits;
> > -		g = g->base_graph;
> > +			num_commits += g->num_commits;
> > +			g = g->base_graph;
> >
> > -		ctx->num_commit_graphs_after--;
> > +			ctx->num_commit_graphs_after--;
> > +		}
> >  	}
> >
>
On Sat, Feb 01, 2020 at 10:25:36PM +0100, Johannes Schindelin wrote:
> Hi,
>
> On Sat, 1 Feb 2020, SZEDER Gábor wrote:
>
> > On Thu, Jan 30, 2020 at 04:28:17PM -0800, Taylor Blau wrote:
> > > diff --git a/commit-graph.c b/commit-graph.c
> > > index 6d34829f57..02e6ad9d1f 100644
> > > --- a/commit-graph.c
> > > +++ b/commit-graph.c
> > > @@ -1565,15 +1565,18 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
> > >  	num_commits = ctx->commits.nr;
> > >  	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
> > >
> > > -	while (g && (g->num_commits <= size_mult * num_commits ||
> > > -		    (max_commits && num_commits > max_commits))) {
> > > -		if (g->odb != ctx->odb)
> > > -			break;
> > > +	if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
> >
> > This line segfaults in the tests 'fetch.writeCommitGraph' and
> > 'fetch.writeCommitGraph with submodules' in 't5510-fetch.sh', because
> > 'git fetch' doesn't pass a 'split_opts' to the commit-graph functions.
>
> I noticed the same. This patch seems to fix it for me:
>
> -- snip --
> diff --git a/commit-graph.c b/commit-graph.c
> index a5d7624073f..af5c58833cf 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1566,7 +1566,8 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
>  	num_commits = ctx->commits.nr;
>  	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
>
> -	if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
> +	if (ctx->split_opts &&
> +	    ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
>  		while (g && (g->num_commits <= size_mult * num_commits ||
>  			    (max_commits && num_commits > max_commits) ||
>  			    (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) {
> -- snap --

Yeah, that's what I noted below, but I'm not sure that this is the
right solution. Why doesn't cmd_fetch() call
write_commit_graph_reachable() with a real 'split_opts' parameter in
the first place? Wouldn't it be better if it did?

> >   Thread 1 "git" received signal SIGSEGV, Segmentation fault.
> >   0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930)
> >       at commit-graph.c:1568
> >   1568            if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
> >   (gdb) p *ctx
> >   $1 = {r = 0x9ae2a0 <the_repo>, odb = 0x9c0df0, graph_name = 0x0, oids = {
> >       list = 0xa02660, nr = 12, alloc = 1024}, commits = {list = 0x9caa00,
> >       nr = 6, alloc = 6}, num_extra_edges = 0, approx_nr_objects = 0,
> >     progress = 0x0, progress_done = 0, progress_cnt = 0, base_graph_name = 0x0,
> >     num_commit_graphs_before = 0, num_commit_graphs_after = 1,
> >     commit_graph_filenames_before = 0x0, commit_graph_filenames_after = 0x0,
> >     commit_graph_hash_after = 0x0, new_num_commits_in_base = 0,
> >     new_base_graph = 0x0, append = 0, report_progress = 1, split = 1,
> >     check_oids = 0, split_opts = 0x0}
> >                     ^^^^^^^^^^^^^^^^
> >   (gdb) bt
> >   #0  0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930)
> >       at commit-graph.c:1568
> >   #1  0x0000000000512446 in write_commit_graph (odb=0x9c0df0, pack_indexes=0x0,
> >       commit_hex=0x7fffffffd550,
> >       flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT),
> >       split_opts=0x0) at commit-graph.c:1891
> >   #2  0x000000000050fd86 in write_commit_graph_reachable (odb=0x9c0df0,
> >       flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT),
> >       split_opts=0x0) at commit-graph.c:1174
> >       ^^^^^^^^^^^^^^
> >   #3  0x0000000000444ea4 in cmd_fetch (argc=0, argv=0x7fffffffd9b8, prefix=0x0)
> >       at builtin/fetch.c:1873
> >   #4  0x0000000000406154 in run_builtin (p=0x969a88 <commands+840>, argc=2,
> >       argv=0x7fffffffd9b0) at git.c:444
> >   #5  0x00000000004064a4 in handle_builtin (argc=2, argv=0x7fffffffd9b0)
> >       at git.c:674
> >   #6  0x000000000040674c in run_argv (argcp=0x7fffffffd84c, argv=0x7fffffffd840)
> >       at git.c:741
> >   #7  0x0000000000406bbd in cmd_main (argc=2, argv=0x7fffffffd9b0) at git.c:872
> >   #8  0x00000000004cfd4e in main (argc=5, argv=0x7fffffffd998)
> >       at common-main.c:52
> >
> > Note that this function split_graph_merge_strategy() does look at
> > various fields in 'ctx->split_opts' a bit earlier, but those accesses
> > are protected by a 'if (ctx->split_opts)' condition.
> > expire_commit_graphs() does the same.
> >
> >
> > > +		while (g && (g->num_commits <= size_mult * num_commits ||
> > > +			    (max_commits && num_commits > max_commits) ||
> > > +			    (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) {
> > > +			if (g->odb != ctx->odb)
> > > +				break;
> > >
> > > -		num_commits += g->num_commits;
> > > -		g = g->base_graph;
> > > +			num_commits += g->num_commits;
> > > +			g = g->base_graph;
> > >
> > > -		ctx->num_commit_graphs_after--;
> > > +			ctx->num_commit_graphs_after--;
> > > +		}
> > >  	}
> > >
> >
On Mon, Feb 03, 2020 at 11:47:30AM +0100, SZEDER Gábor wrote:

> > -- snip --
> > diff --git a/commit-graph.c b/commit-graph.c
> > index a5d7624073f..af5c58833cf 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -1566,7 +1566,8 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
> >  	num_commits = ctx->commits.nr;
> >  	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
> >
> > -	if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
> > +	if (ctx->split_opts &&
> > +	    ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
> >  		while (g && (g->num_commits <= size_mult * num_commits ||
> >  			    (max_commits && num_commits > max_commits) ||
> >  			    (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) {
> > -- snap --
>
> Yeah, that's what I noted below, but I'm not sure that this is the
> right solution. Why doesn't cmd_fetch() call
> write_commit_graph_reachable() with a real 'split_opts' parameter in
> the first place? Wouldn't it be better if it did?

It used to provide a "blank" split_opts until 63020f175f (commit-graph:
prefer default size_mult when given zero, 2020-01-02), which caused a
bug. That bug was since fixed, but the idea was to keep things
convenient for callers.

Whether that's a good idea or not I guess is up for debate, but it
certainly is what the commit-graph code has tried to provide for some
time. If we're not going to follow that in this new code, then we should
presumably also rip out all of the other "if (ctx->split_opts)" lines.

-Peff
On Fri, Jan 31, 2020 at 09:19:19AM -0500, Derrick Stolee wrote:
> On 1/30/2020 7:28 PM, Taylor Blau wrote:
> > With '--split', the commit-graph machinery writes new commits in another
> > incremental commit-graph which is part of the existing chain, and
> > optionally decides to condense the chain into a single commit-graph.
> > This is done to ensure that the aysmptotic behavior of looking up a
> > commit in an incremental chain is dominated by the number of
> > incrementals in that chain. It can be controlled by the '--max-commits'
> > and '--size-multiple' options.
> >
> > On occasion, callers may want to ensure that 'git commit-graph write
> > --split' always writes an incremental, and never spends effort
> > condensing the incremental chain [1]. Previously, this was possible by
> > passing '--size-multiple=0', but this no longer the case following
> > 63020f175f (commit-graph: prefer default size_mult when given zero,
> > 2020-01-02).
> >
> > Reintroduce a less-magical variant of the above with a new pair of
> > arguments to '--split': '--split=no-merge' and '--split=merge-all'. When
> > '--split=no-merge' is given, the commit-graph machinery will never
> > condense an existing chain and will always write a new incremental.
> > Conversely, if '--split=merge-all' is given, any invocation including it
> > will always condense a chain if one exists. If '--split' is given with
> > no arguments, it behaves as before and defers to '--size-multiple', and
> > so on.
> >
> > [1]: This might occur when, for example, a server administrator running
> > some program after each push may want to ensure that each job runs
> > proportional in time to the size of the push, and does not "jump" when
> > the commit-graph machinery decides to trigger a merge.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  Documentation/git-commit-graph.txt | 18 +++++++++++-----
> >  builtin/commit-graph.c             | 33 ++++++++++++++++++++++++++----
> >  commit-graph.c                     | 19 +++++++++--------
> >  commit-graph.h                     |  7 +++++++
> >  t/t5324-split-commit-graph.sh      | 25 ++++++++++++++++++++++
> >  5 files changed, 85 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> > index 28d1fee505..8d61ba9f56 100644
> > --- a/Documentation/git-commit-graph.txt
> > +++ b/Documentation/git-commit-graph.txt
> > @@ -57,11 +57,19 @@ or `--stdin-packs`.)
> >  With the `--append` option, include all commits that are present in the
> >  existing commit-graph file.
> >  +
> > -With the `--split` option, write the commit-graph as a chain of multiple
> > -commit-graph files stored in `<dir>/info/commit-graphs`. The new commits
> > -not already in the commit-graph are added in a new "tip" file. This file
> > -is merged with the existing file if the following merge conditions are
> > -met:
> > +With the `--split[=<strategy>]` option, write the commit-graph as a
> > +chain of multiple commit-graph files stored in
> > +`<dir>/info/commit-graphs`. Commit-graph layers are merged based on the
> > +strategy and other splitting options. The new commits not already in the
> > +commit-graph are added in a new "tip" file. This file is merged with the
> > +existing file if the following merge conditions are met:
> > +* If `--split=merge-always` is specified, then a merge is always
> > +conducted, and the remaining options are ignored. Conversely, if
> > +`--split=no-merge` is specified, a merge is never performed, and the
> > +remaining options are ignored. A bare `--split` defers to the remaining
> > +options. (Note that merging a chain of commit graphs replaces the
> > +existing chain with a length-1 chain where the first and only
> > +incremental holds the entire graph).
> >  +
> >  * If `--size-multiple=<X>` is not specified, let `X` equal 2. If the new
> >  tip file would have `N` commits and the previous tip has `M` commits and
> > diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> > index de321c71ad..f03b46d627 100644
> > --- a/builtin/commit-graph.c
> > +++ b/builtin/commit-graph.c
> > @@ -9,7 +9,9 @@
> >
> >  static char const * const builtin_commit_graph_usage[] = {
> >  	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
> > -	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"),
> > +	N_("git commit-graph write [--object-dir <objdir>] [--append] "
> > +	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> > +	   "[--[no-]progress] <split options>"),
> >  	NULL
> >  };
> >
> > @@ -19,7 +21,9 @@ static const char * const builtin_commit_graph_verify_usage[] = {
> >  };
> >
> >  static const char * const builtin_commit_graph_write_usage[] = {
> > -	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"),
> > +	N_("git commit-graph write [--object-dir <objdir>] [--append] "
> > +	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> > +	   "[--[no-]progress] <split options>"),
> >  	NULL
> >  };
> >
> > @@ -101,6 +105,25 @@ static int graph_verify(int argc, const char **argv)
> >  extern int read_replace_refs;
> >  static struct split_commit_graph_opts split_opts;
> >
> > +static int write_option_parse_split(const struct option *opt, const char *arg,
> > +				    int unset)
> > +{
> > +	enum commit_graph_split_flags *flags = opt->value;
> > +
> > +	opts.split = 1;
> > +	if (!arg)
> > +		return 0;
>
> This allows `--split` to continue working as-is. But should we also
> set "*flags = COMMIT_GRAPH_SPLIT_UNSPECIFIED" here? This allows one
> to run "git commit-graph write --split=no-merge --split" (which could
> happen if "--split=no-merge" is inside an alias).

Yeah, this is an oversight on my part. I think that we should set the
split option to 'COMMIT_GRAPH_SPLIT_UNSPECIFIED' when '--split' is
given, for exactly the reason you outlined above. Thanks for the
suggestion!

> > +test_expect_success '--split=merge-all always merges incrementals' '
> > +	test_when_finished rm -rf a b c &&
> > +	rm -rf $graphdir $infodir/commit-graph &&
> > +	git reset --hard commits/10 &&
> > +	git rev-list -3 HEAD~4 >a &&
> > +	git rev-list -2 HEAD~2 >b &&
> > +	git rev-list -2 HEAD >c &&
> > +	git commit-graph write --split=no-merge --stdin-commits <a &&
> > +	git commit-graph write --split=no-merge --stdin-commits <b &&
> > +	test_line_count = 2 $graphdir/commit-graph-chain &&
> > +	git commit-graph write --split=merge-all --stdin-commits <c &&
> > +	test_line_count = 1 $graphdir/commit-graph-chain
> > +'
> > +
> > +test_expect_success '--split=no-merge always writes an incremental' '
> > +	test_when_finished rm -rf a b &&
> > +	rm -rf $graphdir &&
> > +	git reset --hard commits/2 &&
> > +	git rev-list HEAD~1 >a &&
> > +	git rev-list HEAD >b &&
> > +	git commit-graph write --split --stdin-commits <a &&
> > +	git commit-graph write --split=no-merge --stdin-commits <b &&
> > +	test_line_count = 2 $graphdir/commit-graph-chain
> > +'
> > +
> >  test_done
>
> Good tests!

Thanks :-).

> Thanks,
> -Stolee

Thanks,
Taylor
On Mon, Feb 03, 2020 at 06:11:17AM -0500, Jeff King wrote: > On Mon, Feb 03, 2020 at 11:47:30AM +0100, SZEDER Gábor wrote: > > > > -- snip -- > > > diff --git a/commit-graph.c b/commit-graph.c > > > index a5d7624073f..af5c58833cf 100644 > > > --- a/commit-graph.c > > > +++ b/commit-graph.c > > > @@ -1566,7 +1566,8 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) > > > num_commits = ctx->commits.nr; > > > ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1; > > > > > > - if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) { > > > + if (ctx->split_opts && > > > + ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) { > > > while (g && (g->num_commits <= size_mult * num_commits || > > > (max_commits && num_commits > max_commits) || > > > (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) { > > > -- snap -- > > > > Yeah, that's what I noted below, but I'm not sure that this is the > > right solution. Why doesn't cmd_fetch() call > > write_commit_graph_reachable() with a real 'split_opts' parameter in > > the first place? Wouldn't it be better if it did? > > It used to provide a "blank" split_opts until 63020f175f (commit-graph: > prefer default size_mult when given zero, 2020-01-02), which caused a > bug. That bug was since fixed, but the idea was to keep things > convenient for callers. > > Whether that's a good idea or not I guess is up for debate, but it > certainly is what the commit-graph code has tried to provide for some > time. If we're not going to follow that in this new code, then we should > presumably also rip out all of the other "if (ctx->split_opts)" lines. I think that this seems like a good step that we should probably take, but I don't think that it's necessary for the series at hand. The pattern in this function is to define a local variable which has the same value as in split_opts, or a reasonable default if split_opts is NULL (c.f., 'max_commits' and 'size_mult'). 
So, I think that a safe thing to do which prevents the segv and doesn't
change the pattern too much is to write:

    enum commit_graph_split_flags flags = COMMIT_GRAPH_SPLIT_MERGE_AUTO;
    if (ctx->split_opts) {
      /* ... */
      flags = ctx->split_opts->flags;
    }

    /* ... */

    if (flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
      while ( ... ) { ... }
    }

This is adding another local variable, which seems like an odd thing to
do *every* time that we add another member to split_opts. So for that
reason it seems like in the longer-term we should either force the
caller to pass in a blank, or do something else that doesn't require
this, but I think that the intermediate cost isn't too bad.

> -Peff

Thanks,
Taylor
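For readers who want to compile the local-default pattern described in this message, here is a minimal sketch. The structs are pared down to the members that matter, and `struct write_ctx` and `merging_allowed` are hypothetical names used only for illustration; the real context type is `struct write_commit_graph_context`.

```c
#include <assert.h>
#include <stddef.h>

enum commit_graph_split_flags {
	COMMIT_GRAPH_SPLIT_MERGE_AUTO = 0,
	COMMIT_GRAPH_SPLIT_MERGE_REQUIRED = 1,
	COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 2
};

/* Reduced to the one member this sketch needs. */
struct split_commit_graph_opts {
	enum commit_graph_split_flags flags;
};

/* Hypothetical, pared-down stand-in for write_commit_graph_context. */
struct write_ctx {
	const struct split_commit_graph_opts *split_opts; /* may be NULL */
};

/* Returns 1 if merging layers is permitted, 0 if it is prohibited. */
int merging_allowed(const struct write_ctx *ctx)
{
	/* Start from the default, then override if the caller gave options. */
	enum commit_graph_split_flags flags = COMMIT_GRAPH_SPLIT_MERGE_AUTO;

	assert(ctx != NULL);
	if (ctx->split_opts)
		flags = ctx->split_opts->flags;

	/* From here on only the local copy is read, never ctx->split_opts. */
	return flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED;
}
```

A caller such as 'git fetch', which passes no split options, takes the NULL path and gets the default strategy instead of a NULL dereference.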
On Sat, Feb 01, 2020 at 12:34:34AM +0100, SZEDER Gábor wrote: > On Thu, Jan 30, 2020 at 04:28:17PM -0800, Taylor Blau wrote: > > diff --git a/commit-graph.c b/commit-graph.c > > index 6d34829f57..02e6ad9d1f 100644 > > --- a/commit-graph.c > > +++ b/commit-graph.c > > @@ -1565,15 +1565,18 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) > > num_commits = ctx->commits.nr; > > ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1; > > > > - while (g && (g->num_commits <= size_mult * num_commits || > > - (max_commits && num_commits > max_commits))) { > > - if (g->odb != ctx->odb) > > - break; > > + if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) { > > This line segfaults in the tests 'fetch.writeCommitGraph' and > 'fetch.writeCommitGraph with submodules' in 't5510-fetch.sh', because > 'git fetch' doesn't pass a 'split_opts' to the commit-graph functions. Thanks for pointing it out. I responded in more detail lower in the thread with the fix, but I appreciate your testing each patch. Clearly, I just should have run the full suite myself before sending! > Thread 1 "git" received signal SIGSEGV, Segmentation fault. 
> 0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930) > at commit-graph.c:1568 > 1568 if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) { > (gdb) p *ctx > $1 = {r = 0x9ae2a0 <the_repo>, odb = 0x9c0df0, graph_name = 0x0, oids = { > list = 0xa02660, nr = 12, alloc = 1024}, commits = {list = 0x9caa00, > nr = 6, alloc = 6}, num_extra_edges = 0, approx_nr_objects = 0, > progress = 0x0, progress_done = 0, progress_cnt = 0, base_graph_name = 0x0, > num_commit_graphs_before = 0, num_commit_graphs_after = 1, > commit_graph_filenames_before = 0x0, commit_graph_filenames_after = 0x0, > commit_graph_hash_after = 0x0, new_num_commits_in_base = 0, > new_base_graph = 0x0, append = 0, report_progress = 1, split = 1, > check_oids = 0, split_opts = 0x0} > ^^^^^^^^^^^^^^^^ > (gdb) bt > #0 0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930) > at commit-graph.c:1568 > #1 0x0000000000512446 in write_commit_graph (odb=0x9c0df0, pack_indexes=0x0, > commit_hex=0x7fffffffd550, > flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT), > split_opts=0x0) at commit-graph.c:1891 > #2 0x000000000050fd86 in write_commit_graph_reachable (odb=0x9c0df0, > flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT), > split_opts=0x0) at commit-graph.c:1174 > ^^^^^^^^^^^^^^ > #3 0x0000000000444ea4 in cmd_fetch (argc=0, argv=0x7fffffffd9b8, prefix=0x0) > at builtin/fetch.c:1873 > #4 0x0000000000406154 in run_builtin (p=0x969a88 <commands+840>, argc=2, > argv=0x7fffffffd9b0) at git.c:444 > #5 0x00000000004064a4 in handle_builtin (argc=2, argv=0x7fffffffd9b0) > at git.c:674 > #6 0x000000000040674c in run_argv (argcp=0x7fffffffd84c, argv=0x7fffffffd840) > at git.c:741 > #7 0x0000000000406bbd in cmd_main (argc=2, argv=0x7fffffffd9b0) at git.c:872 > #8 0x00000000004cfd4e in main (argc=5, argv=0x7fffffffd998) > at common-main.c:52 > > Note that this function split_graph_merge_strategy() does look at > various fields in 'ctx->split_opts' a bit earlier, but 
those accesses > are protected by a 'if (ctx->split_opts)' condition. > expire_commit_graphs() does the same. > > > > + while (g && (g->num_commits <= size_mult * num_commits || > > + (max_commits && num_commits > max_commits) || > > + (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) { > > + if (g->odb != ctx->odb) > > + break; > > > > - num_commits += g->num_commits; > > - g = g->base_graph; > > + num_commits += g->num_commits; > > + g = g->base_graph; > > > > - ctx->num_commit_graphs_after--; > > + ctx->num_commit_graphs_after--; > > + } > > } > > Thanks, Taylor
On Sat, Feb 01, 2020 at 10:25:36PM +0100, Johannes Schindelin wrote: > Hi, > > On Sat, 1 Feb 2020, SZEDER Gábor wrote: > > > On Thu, Jan 30, 2020 at 04:28:17PM -0800, Taylor Blau wrote: > > > diff --git a/commit-graph.c b/commit-graph.c > > > index 6d34829f57..02e6ad9d1f 100644 > > > --- a/commit-graph.c > > > +++ b/commit-graph.c > > > @@ -1565,15 +1565,18 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) > > > num_commits = ctx->commits.nr; > > > ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1; > > > > > > - while (g && (g->num_commits <= size_mult * num_commits || > > > - (max_commits && num_commits > max_commits))) { > > > - if (g->odb != ctx->odb) > > > - break; > > > + if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) { > > > > This line segfaults in the tests 'fetch.writeCommitGraph' and > > 'fetch.writeCommitGraph with submodules' in 't5510-fetch.sh', because > > 'git fetch' doesn't pass a 'split_opts' to the commit-graph functions. > > I noticed the same. This patch seems to fix it for me: > > -- snip -- > diff --git a/commit-graph.c b/commit-graph.c > index a5d7624073f..af5c58833cf 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -1566,7 +1566,8 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) > num_commits = ctx->commits.nr; > ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1; > > - if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) { > + if (ctx->split_opts && > + ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) { > while (g && (g->num_commits <= size_mult * num_commits || > (max_commits && num_commits > max_commits) || > (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) { > -- snap -- > > For your convenience, I also pushed this up as > `tb/commit-graph-split-merge` to https://github.com/dscho/git Thanks, Dscho. 
I also published it under the same name on my fork at 'https://github.com/ttaylorr/git'. > Thanks, > Dscho > > > > > > Thread 1 "git" received signal SIGSEGV, Segmentation fault. > > 0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930) > > at commit-graph.c:1568 > > 1568 if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) { > > (gdb) p *ctx > > $1 = {r = 0x9ae2a0 <the_repo>, odb = 0x9c0df0, graph_name = 0x0, oids = { > > list = 0xa02660, nr = 12, alloc = 1024}, commits = {list = 0x9caa00, > > nr = 6, alloc = 6}, num_extra_edges = 0, approx_nr_objects = 0, > > progress = 0x0, progress_done = 0, progress_cnt = 0, base_graph_name = 0x0, > > num_commit_graphs_before = 0, num_commit_graphs_after = 1, > > commit_graph_filenames_before = 0x0, commit_graph_filenames_after = 0x0, > > commit_graph_hash_after = 0x0, new_num_commits_in_base = 0, > > new_base_graph = 0x0, append = 0, report_progress = 1, split = 1, > > check_oids = 0, split_opts = 0x0} > > ^^^^^^^^^^^^^^^^ > > (gdb) bt > > #0 0x00000000005113dd in split_graph_merge_strategy (ctx=0x9ca930) > > at commit-graph.c:1568 > > #1 0x0000000000512446 in write_commit_graph (odb=0x9c0df0, pack_indexes=0x0, > > commit_hex=0x7fffffffd550, > > flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT), > > split_opts=0x0) at commit-graph.c:1891 > > #2 0x000000000050fd86 in write_commit_graph_reachable (odb=0x9c0df0, > > flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_SPLIT), > > split_opts=0x0) at commit-graph.c:1174 > > ^^^^^^^^^^^^^^ > > #3 0x0000000000444ea4 in cmd_fetch (argc=0, argv=0x7fffffffd9b8, prefix=0x0) > > at builtin/fetch.c:1873 > > #4 0x0000000000406154 in run_builtin (p=0x969a88 <commands+840>, argc=2, > > argv=0x7fffffffd9b0) at git.c:444 > > #5 0x00000000004064a4 in handle_builtin (argc=2, argv=0x7fffffffd9b0) > > at git.c:674 > > #6 0x000000000040674c in run_argv (argcp=0x7fffffffd84c, argv=0x7fffffffd840) > > at git.c:741 > > #7 0x0000000000406bbd in cmd_main (argc=2, 
argv=0x7fffffffd9b0) at git.c:872 > > #8 0x00000000004cfd4e in main (argc=5, argv=0x7fffffffd998) > > at common-main.c:52 > > > > Note that this function split_graph_merge_strategy() does look at > > various fields in 'ctx->split_opts' a bit earlier, but those accesses > > are protected by a 'if (ctx->split_opts)' condition. > > expire_commit_graphs() does the same. > > > > > > > + while (g && (g->num_commits <= size_mult * num_commits || > > > + (max_commits && num_commits > max_commits) || > > > + (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) { > > > + if (g->odb != ctx->odb) > > > + break; > > > > > > - num_commits += g->num_commits; > > > - g = g->base_graph; > > > + num_commits += g->num_commits; > > > + g = g->base_graph; > > > > > > - ctx->num_commit_graphs_after--; > > > + ctx->num_commit_graphs_after--; > > > + } > > > } > > > > > Thanks, Taylor
On Fri, Jan 31, 2020 at 08:27:27PM +0100, Martin Ågren wrote:
> On Fri, 31 Jan 2020 at 01:29, Taylor Blau <me@ttaylorr.com> wrote:
> > With '--split', the commit-graph machinery writes new commits in another
> > incremental commit-graph which is part of the existing chain, and
> > optionally decides to condense the chain into a single commit-graph.
> > This is done to ensure that the aysmptotic behavior of looking up a
>
> asymptotic

Thanks, fixed.

> > commit in an incremental chain is dominated by the number of
> > incrementals in that chain. It can be controlled by the '--max-commits'
> > and '--size-multiple' options.
> >
> > On occasion, callers may want to ensure that 'git commit-graph write
> > --split' always writes an incremental, and never spends effort
> > condensing the incremental chain [1]. Previously, this was possible by
> > passing '--size-multiple=0', but this no longer the case following
> > 63020f175f (commit-graph: prefer default size_mult when given zero,
> > 2020-01-02).
> >
> > Reintroduce a less-magical variant of the above with a new pair of
> > arguments to '--split': '--split=no-merge' and '--split=merge-all'. When
> > '--split=no-merge' is given, the commit-graph machinery will never
> > condense an existing chain and will always write a new incremental.
> > Conversely, if '--split=merge-all' is given, any invocation including it
> > will always condense a chain if one exists. If '--split' is given with
> > no arguments, it behaves as before and defers to '--size-multiple', and
> > so on.
>
> I understand your motivation for doing this -- it all seems quite sound
> to me. Not being too familiar with this commit-graph splitting and
> merging, I had a hard time groking this terminology though. From what I
> understand, before this patch, `--split` means "write the commit-graph
> using the 'split' file-format / in a split fashion". Ok, that makes
> sense.
>
> From seeing `--split=no-merge`, I have no idea how to even parse that.
> Of course I don't want to merge, I want to split! Well not split, but > write split files. > > I think it would help me (and others like me) if we could somehow > separate "I want to use 'split' files" from "and here's how I want you > to decide on the merging". That is, which "strategy" to use. Obviously, > talking about a "merge strategy" would be stupid and "split strategy" > also seems a bit odd. "Coalescing strategy"? "Joining strategy"? > > Or can you convince me otherwise? From which angle should I look at > this? Heh. This is all very reminiscent of an off-list discussion that I had with Peff and Stolee before sending this upstream. Originally, I had implemented this as: $ git commit-graph write --split --[no-]merge but we decided that this '--merge' and '--no-merge' requiring '--split' seemed to indicate that this was better off as an argument to '--split'. Of course, there's no getting around that it is... odd to say '--split=no-merge' for exactly the reason you suggest. Here's another way of looking at it: the presence of '--split' means "work with split graph files" and the '[no-]merge' argument means: "always/never condense multiple layers". For me, this not only makes the new option language jive, but makes it clearer to me than the combination of '--split', '--split --no-merge' and '--split --merge', where the third one is truly bizarre. At least condensing the second '--' and making 'merge' an argument to 'split' makes it clear that the two work together somehow. If you have a different suggestion, I'd certainly love to hear about it and discuss. But, at least as far as our internal discussions have gone, this is by far the best option that we have been able to come up with. > > -With the `--split` option, write the commit-graph as a chain of multiple > > -commit-graph files stored in `<dir>/info/commit-graphs`. The new commits > > -not already in the commit-graph are added in a new "tip" file. 
This file > > -is merged with the existing file if the following merge conditions are > > -met: > > +With the `--split[=<strategy>]` option, write the commit-graph as a > > +chain of multiple commit-graph files stored in > > +`<dir>/info/commit-graphs`. Commit-graph layers are merged based on the > > +strategy and other splitting options. The new commits not already in the > > +commit-graph are added in a new "tip" file. This file is merged with the > > +existing file if the following merge conditions are met: > > +* If `--split=merge-always` is specified, then a merge is always > > +conducted, and the remaining options are ignored. Conversely, if > > +`--split=no-merge` is specified, a merge is never performed, and the > > +remaining options are ignored. A bare `--split` defers to the remaining > > +options. (Note that merging a chain of commit graphs replaces the > > +existing chain with a length-1 chain where the first and only > > +incremental holds the entire graph). > > To better understand the background for this patch, I read the manpage > as it stands today. From the section on `--split`, I got this > impression: Let's say that `--max-commits` is huge, so all that matters > is the `--size-multiple`. Let's say it's two. If the current tip > contains three commits and we're about to write one with two, then 2*2 > > 3 so we will merge, i.e., write a tip file with five commits. Unless of > course *that* is more than half the size of the file before. And so on. > We might just merge once, or maybe "many" files in an avalanche effect. > Every now and then, such avalanches will go all the way back to the > first file. > > Now this says something different, namely that once we decide to merge, > we do it all the way back, no matter what. > > The commit message of 1771be90c8 ("commit-graph: merge commit-graph > chains", 2019-06-18) seems to support my original understanding, at > least for `--size-multiple`, but not `--max-commits`, curiously enough. 
>
> Can you clarify?

1771be90c8 is right, and this documentation is wrong. Upon re-reading
it, I found the contents of this documentation between those two
parentheses to be confusing rather than helpful. For that reason, I
simply removed it.

> > - OPT_BOOL(0, "split", &opts.split,
> > - N_("allow writing an incremental commit-graph file")),
> > + OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL,
> > + N_("allow writing an incremental commit-graph file"),
>
> This still sounds very boolean. Cramming in the "strategy" might be hard
> -- is this an argument in favor of having two separate options? ;-)

Heh. Exactly how we started these patches when I originally wrote
them...

> > +enum commit_graph_split_flags {
> > + COMMIT_GRAPH_SPLIT_UNSPECIFIED = 0,
> > + COMMIT_GRAPH_SPLIT_MERGE_REQUIRED = 1,
> > + COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 2
> > +};
>
> I wonder if this should be "MERGE_AUTO" rather than "UNSPECIFIED". This
> is related to Stolee's comment, I think.

I think you're right. I changed it in my local v2.

>
> Martin

Thanks,
Taylor
On Mon, Feb 03, 2020 at 07:58:21PM -0800, Taylor Blau wrote:

> I think that this seems like a good step that we should probably take,
> but I don't think that it's necessary for the series at hand. The
> pattern in this function is to define a local variable which has the
> same value as in split_opts, or a reasonable default if split_opts is
> NULL (c.f., 'max_commits' and 'size_mult').
>
> So, I think that a safe thing to do which prevents the segv and doesn't
> change the pattern too much is to write:
>
>   enum commit_graph_split_flags flags = COMMIT_GRAPH_SPLIT_MERGE_AUTO;
>   if (ctx->split_opts) {
>     /* ... */
>     flags = ctx->split_opts->flags;
>   }
>
>   /* ... */
>
>   if (flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
>     while ( ... ) { ... }
>   }
>
> This is adding another local variable, which seems like an odd thing to
> do *every* time that we add another member to split_opts. So for that
> reason it seems like in the longer-term we should either force the
> caller to pass in a blank, or do something else that doesn't require
> this, but I think that the intermediate cost isn't too bad.

It would perhaps be simpler to turn NULL into a _single_ default-filled
split_opts variable, and then just use it throughout the function. And
since presumably zero-initialization would yield good defaults (or we'd
define an INIT macro for the convenience of callers), it would be a
one-liner that we'd only have to do once.

But I think that can wait; the "if" solution discussed seems like a
straightforward way to make this patch correct both on top of master,
and when merged with next.

-Peff
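The alternative floated in this message (turn NULL into one default-filled options struct and use it throughout) might look roughly like the sketch below. Everything here is an assumption made for illustration: the macro name `SPLIT_COMMIT_GRAPH_OPTS_INIT`, the helper `split_opts_or_default`, and the premise that zero-initialization means "use defaults". The `expire_time` member of the real struct is left out to keep the example free of git-specific types.

```c
#include <assert.h>
#include <stddef.h>

enum commit_graph_split_flags {
	COMMIT_GRAPH_SPLIT_MERGE_AUTO = 0,
	COMMIT_GRAPH_SPLIT_MERGE_REQUIRED = 1,
	COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 2
};

struct split_commit_graph_opts {
	int size_multiple;
	int max_commits;
	enum commit_graph_split_flags flags;
};

/* Hypothetical INIT macro; assumes all-zero fields mean "use defaults". */
#define SPLIT_COMMIT_GRAPH_OPTS_INIT { 0 }

/* Returns the caller's options, or a shared default set if given NULL. */
const struct split_commit_graph_opts *
split_opts_or_default(const struct split_commit_graph_opts *opts)
{
	static const struct split_commit_graph_opts defaults =
		SPLIT_COMMIT_GRAPH_OPTS_INIT;

	return opts ? opts : &defaults;
}

/* Example consumer: no per-member NULL checks are needed any more. */
int merge_prohibited(const struct split_commit_graph_opts *caller_opts)
{
	const struct split_commit_graph_opts *opts =
		split_opts_or_default(caller_opts);

	assert(opts != NULL); /* guaranteed by split_opts_or_default() */
	return opts->flags == COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED;
}
```

Compared with the per-member local variables, the NULL check happens in one place, so adding a member to the struct would not require another local default.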
On Tue, 4 Feb 2020 at 05:06, Taylor Blau <me@ttaylorr.com> wrote: > > > Or can you convince me otherwise? From which angle should I look at > > this? > > Heh. This is all very reminiscent of an off-list discussion that I had > with Peff and Stolee before sending this upstream. Originally, I had > implemented this as: > > $ git commit-graph write --split --[no-]merge > > but we decided that this '--merge' and '--no-merge' requiring '--split' > seemed to indicate that this was better off as an argument to '--split'. > Of course, there's no getting around that it is... odd to say > '--split=no-merge' for exactly the reason you suggest. > > Here's another way of looking at it: the presence of '--split' means > "work with split graph files" and the '[no-]merge' argument means: > "always/never condense multiple layers". > > For me, this not only makes the new option language jive, but makes it > clearer to me than the combination of '--split', '--split --no-merge' > and '--split --merge', where the third one is truly bizarre. At least > condensing the second '--' and making 'merge' an argument to 'split' > makes it clear that the two work together somehow. Yes, "--split --merge" sounds no better. :-) > If you have a different suggestion, I'd certainly love to hear about it > and discuss. But, at least as far as our internal discussions have gone, > this is by far the best option that we have been able to come up with. I can't come up with anything better, so please feel free to carry on (as you already have). > > > - OPT_BOOL(0, "split", &opts.split, > > > - N_("allow writing an incremental commit-graph file")), > > > + OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL, > > > + N_("allow writing an incremental commit-graph file"), > > > > This still sounds very boolean. Cramming in the "strategy" might be hard > > -- is this an argument in favor of having two separate options? ;-) > > Heh. Exactly how we started these patches when I originally wrote > them... 
You left this as-is in v2. I don't have any immediate improvements to offer. I could see shortening the original to "use the 'split' file format", in which case maybe one could allude to a strategy here. (I don't think "allow" is really needed, right? Maybe it tries to cover for the situation where there's no commit graph yet, so you might say we wouldn't write an "incremental" one, but the format would still be the same, AFAIU. Anyway, that's outside the scope of this patch.) Martin
On Thu, Feb 06, 2020 at 08:15:03PM +0100, Martin Ågren wrote: > On Tue, 4 Feb 2020 at 05:06, Taylor Blau <me@ttaylorr.com> wrote: > > > > > Or can you convince me otherwise? From which angle should I look at > > > this? > > > > Heh. This is all very reminiscent of an off-list discussion that I had > > with Peff and Stolee before sending this upstream. Originally, I had > > implemented this as: > > > > $ git commit-graph write --split --[no-]merge > > > > but we decided that this '--merge' and '--no-merge' requiring '--split' > > seemed to indicate that this was better off as an argument to '--split'. > > Of course, there's no getting around that it is... odd to say > > '--split=no-merge' for exactly the reason you suggest. > > > > Here's another way of looking at it: the presence of '--split' means > > "work with split graph files" and the '[no-]merge' argument means: > > "always/never condense multiple layers". > > > > For me, this not only makes the new option language jive, but makes it > > clearer to me than the combination of '--split', '--split --no-merge' > > and '--split --merge', where the third one is truly bizarre. At least > > condensing the second '--' and making 'merge' an argument to 'split' > > makes it clear that the two work together somehow. > > Yes, "--split --merge" sounds no better. :-) :-). > > If you have a different suggestion, I'd certainly love to hear about it > > and discuss. But, at least as far as our internal discussions have gone, > > this is by far the best option that we have been able to come up with. > > I can't come up with anything better, so please feel free to carry on > (as you already have). Sounds good. It looks like you might have had some further thoughts a little bit lower down in the thread, so I'll respond to those shortly just to make sure that I didn't miss anything before readying a 'v3' for submission. 
> > > > > - OPT_BOOL(0, "split", &opts.split, > > > > - N_("allow writing an incremental commit-graph file")), > > > > + OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL, > > > > + N_("allow writing an incremental commit-graph file"), > > > > > > This still sounds very boolean. Cramming in the "strategy" might be hard > > > -- is this an argument in favor of having two separate options? ;-) > > > > Heh. Exactly how we started these patches when I originally wrote > > them... > > You left this as-is in v2. I don't have any immediate improvements to > offer. I could see shortening the original to "use the 'split' file > format", in which case maybe one could allude to a strategy here. (I > don't think "allow" is really needed, right? Maybe it tries to cover for > the situation where there's no commit graph yet, so you might say we > wouldn't write an "incremental" one, but the format would still be the > same, AFAIU. Anyway, that's outside the scope of this patch.) Yeah, I agree that the use of "allow" is a little funny for these reasons. That said, I don't think that it's in dire need of changing, and so since we agree that it's outside the scope of this series, I'm happy to ignore it for now. > Martin Thanks, Taylor
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 28d1fee505..8d61ba9f56 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -57,11 +57,19 @@ or `--stdin-packs`.) With the `--append` option, include all commits that are present in the existing commit-graph file. + -With the `--split` option, write the commit-graph as a chain of multiple -commit-graph files stored in `<dir>/info/commit-graphs`. The new commits -not already in the commit-graph are added in a new "tip" file. This file -is merged with the existing file if the following merge conditions are -met: +With the `--split[=<strategy>]` option, write the commit-graph as a +chain of multiple commit-graph files stored in +`<dir>/info/commit-graphs`. Commit-graph layers are merged based on the +strategy and other splitting options. The new commits not already in the +commit-graph are added in a new "tip" file. This file is merged with the +existing file if the following merge conditions are met: +* If `--split=merge-always` is specified, then a merge is always +conducted, and the remaining options are ignored. Conversely, if +`--split=no-merge` is specified, a merge is never performed, and the +remaining options are ignored. A bare `--split` defers to the remaining +options. (Note that merging a chain of commit graphs replaces the +existing chain with a length-1 chain where the first and only +incremental holds the entire graph). + * If `--size-multiple=<X>` is not specified, let `X` equal 2. 
If the new tip file would have `N` commits and the previous tip has `M` commits and diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index de321c71ad..f03b46d627 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -9,7 +9,9 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"), - N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"), + N_("git commit-graph write [--object-dir <objdir>] [--append] " + "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] " + "[--[no-]progress] <split options>"), NULL }; @@ -19,7 +21,9 @@ static const char * const builtin_commit_graph_verify_usage[] = { }; static const char * const builtin_commit_graph_write_usage[] = { - N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"), + N_("git commit-graph write [--object-dir <objdir>] [--append] " + "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] " + "[--[no-]progress] <split options>"), NULL }; @@ -101,6 +105,25 @@ static int graph_verify(int argc, const char **argv) extern int read_replace_refs; static struct split_commit_graph_opts split_opts; +static int write_option_parse_split(const struct option *opt, const char *arg, + int unset) +{ + enum commit_graph_split_flags *flags = opt->value; + + opts.split = 1; + if (!arg) + return 0; + + if (!strcmp(arg, "merge-all")) + *flags = COMMIT_GRAPH_SPLIT_MERGE_REQUIRED; + else if (!strcmp(arg, "no-merge")) + *flags = COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED; + else + die(_("unrecognized --split argument, %s"), arg); + + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list *pack_indexes = NULL; @@ -123,8 +146,10 @@ static int graph_write(int argc, const char **argv) OPT_BOOL(0, 
"append", &opts.append, N_("include all commits already in the commit-graph file")), OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), - OPT_BOOL(0, "split", &opts.split, - N_("allow writing an incremental commit-graph file")), + OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL, + N_("allow writing an incremental commit-graph file"), + PARSE_OPT_OPTARG | PARSE_OPT_NONEG, + write_option_parse_split), OPT_INTEGER(0, "max-commits", &split_opts.max_commits, N_("maximum number of commits in a non-base split commit-graph")), OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple, diff --git a/commit-graph.c b/commit-graph.c index 6d34829f57..02e6ad9d1f 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1565,15 +1565,18 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) num_commits = ctx->commits.nr; ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1; - while (g && (g->num_commits <= size_mult * num_commits || - (max_commits && num_commits > max_commits))) { - if (g->odb != ctx->odb) - break; + if (ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) { + while (g && (g->num_commits <= size_mult * num_commits || + (max_commits && num_commits > max_commits) || + (ctx->split_opts->flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED))) { + if (g->odb != ctx->odb) + break; - num_commits += g->num_commits; - g = g->base_graph; + num_commits += g->num_commits; + g = g->base_graph; - ctx->num_commit_graphs_after--; + ctx->num_commit_graphs_after--; + } } ctx->new_base_graph = g; @@ -1881,7 +1884,7 @@ int write_commit_graph(struct object_directory *odb, goto cleanup; } - if (!ctx->commits.nr) + if (!ctx->commits.nr && (!ctx->split || ctx->split_opts->flags != COMMIT_GRAPH_SPLIT_MERGE_REQUIRED)) goto cleanup; if (ctx->split) { diff --git a/commit-graph.h b/commit-graph.h index 7d9fc9c16a..dadcc03808 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -84,10 +84,17 @@ enum commit_graph_write_flags { 
COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3) }; +enum commit_graph_split_flags { + COMMIT_GRAPH_SPLIT_UNSPECIFIED = 0, + COMMIT_GRAPH_SPLIT_MERGE_REQUIRED = 1, + COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 2 +}; + struct split_commit_graph_opts { int size_multiple; int max_commits; timestamp_t expire_time; + enum commit_graph_split_flags flags; }; /* diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index c24823431f..a165b48afe 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -344,4 +344,29 @@ test_expect_success 'split across alternate where alternate is not split' ' test_cmp commit-graph .git/objects/info/commit-graph ' +test_expect_success '--split=merge-all always merges incrementals' ' + test_when_finished rm -rf a b c && + rm -rf $graphdir $infodir/commit-graph && + git reset --hard commits/10 && + git rev-list -3 HEAD~4 >a && + git rev-list -2 HEAD~2 >b && + git rev-list -2 HEAD >c && + git commit-graph write --split=no-merge --stdin-commits <a && + git commit-graph write --split=no-merge --stdin-commits <b && + test_line_count = 2 $graphdir/commit-graph-chain && + git commit-graph write --split=merge-all --stdin-commits <c && + test_line_count = 1 $graphdir/commit-graph-chain +' + +test_expect_success '--split=no-merge always writes an incremental' ' + test_when_finished rm -rf a b && + rm -rf $graphdir && + git reset --hard commits/2 && + git rev-list HEAD~1 >a && + git rev-list HEAD >b && + git commit-graph write --split --stdin-commits <a && + git commit-graph write --split=no-merge --stdin-commits <b && + test_line_count = 2 $graphdir/commit-graph-chain +' + test_done
With '--split', the commit-graph machinery writes new commits in another
incremental commit-graph which is part of the existing chain, and
optionally decides to condense the chain into a single commit-graph.
This is done to ensure that the asymptotic behavior of looking up a
commit in an incremental chain is dominated by the number of
incrementals in that chain. It can be controlled by the '--max-commits'
and '--size-multiple' options.

On occasion, callers may want to ensure that 'git commit-graph write
--split' always writes an incremental, and never spends effort
condensing the incremental chain [1]. Previously, this was possible by
passing '--size-multiple=0', but this is no longer the case following
63020f175f (commit-graph: prefer default size_mult when given zero,
2020-01-02).

Reintroduce a less-magical variant of the above with a new pair of
arguments to '--split': '--split=no-merge' and '--split=merge-all'. When
'--split=no-merge' is given, the commit-graph machinery will never
condense an existing chain and will always write a new incremental.
Conversely, if '--split=merge-all' is given, any invocation including it
will always condense a chain if one exists. If '--split' is given with
no arguments, it behaves as before and defers to '--size-multiple', and
so on.

[1]: This might occur when, for example, a server administrator running
some program after each push wants to ensure that each job runs in time
proportional to the size of the push, and does not "jump" when the
commit-graph machinery decides to trigger a merge.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt | 18 +++++++++++-----
 builtin/commit-graph.c             | 33 ++++++++++++++++++++++++++----
 commit-graph.c                     | 19 +++++++++--------
 commit-graph.h                     |  7 +++++++
 t/t5324-split-commit-graph.sh      | 25 ++++++++++++++++++++++
 5 files changed, 85 insertions(+), 17 deletions(-)
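To make the three strategies concrete, the merge decision from the commit-graph.c hunk of this patch can be approximated as below. This is a sketch under stated assumptions, not the code in the patch: the chain is modeled as an array of per-layer commit counts (the real code walks `g->base_graph`), the `g->odb != ctx->odb` check that stops at an alternate's boundary is omitted, and `layers_after_write` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

enum commit_graph_split_flags {
	COMMIT_GRAPH_SPLIT_UNSPECIFIED = 0,
	COMMIT_GRAPH_SPLIT_MERGE_REQUIRED = 1,
	COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 2
};

/*
 * layers[0] is the commit count of the current tip, layers[nr - 1] that
 * of the base. Returns how many layers the chain has after a new tip
 * holding new_commits commits is written.
 */
int layers_after_write(const int *layers, int nr, int new_commits,
		       int size_mult, int max_commits,
		       enum commit_graph_split_flags flags)
{
	int num_commits = new_commits;
	int after = nr + 1;
	int i = 0;

	assert(nr == 0 || layers != NULL);

	/* --split=no-merge: always add one layer, never condense. */
	if (flags == COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED)
		return after;

	/*
	 * Absorb layers from the tip downwards while the size-multiple or
	 * max-commits condition holds, or unconditionally for merge-all.
	 */
	while (i < nr &&
	       (layers[i] <= size_mult * num_commits ||
		(max_commits && num_commits > max_commits) ||
		flags == COMMIT_GRAPH_SPLIT_MERGE_REQUIRED)) {
		num_commits += layers[i];
		i++;
		after--;
	}

	return after;
}
```

Taking the example raised in review, a tip of three commits above a base of 100, a new layer of two commits and a size multiple of 2: since 3 <= 2 * 2, the tip is absorbed into a five-commit layer, and since 100 > 2 * 5 the walk stops there, leaving two layers. With the merge-all strategy the same input collapses to one layer, and with no-merge it grows to three.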