Message ID | 4ff11cec37d17d788a3ee076b7c3de1c873a5fbd.1599664389.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Series | more miscellaneous Bloom filter improvements, redux |
On Wed, Sep 09, 2020 at 11:24:00AM -0400, Taylor Blau wrote:
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 17405c73a9..81a2e65903 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -67,6 +67,12 @@ this option is given, future commit-graph writes will automatically assume
>  that this option was intended. Use `--no-changed-paths` to stop storing this
>  data.
>  +
> +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
> +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
> +enforced. Commits whose filters are not calculated are stored as a
> +length zero Bloom filter, and their bit is marked in the `BFXL` chunk.
> +Overrides the `commitGraph.maxNewFilters` configuration.

The BFXL chunk doesn't exist anymore in this iteration, right?

I wondered about having a user-facing "-1" here. My gut feeling is that
we usually use "0" to mean "no limit" in other places, and it probably
makes sense to be consistent. It does look like we use both, though, and
I'm having trouble formulating a grep pattern to find examples that
doesn't produce a lot of noise.

These are "0 is no limit":

  pack.windowMemory
  pack.deltaCacheSize
  git-daemon --max-connections

These are "-1 is no limit":

  git-grep --max-depth
  rev-list --max-parents (I think?)

So I dunno. It's a pretty minor thing, but I think it's good to aim for
consistency, and since this is user-facing we won't be able to change it
later.

-Peff
On Fri, Sep 11, 2020 at 01:52:16PM -0400, Jeff King wrote:
> On Wed, Sep 09, 2020 at 11:24:00AM -0400, Taylor Blau wrote:
> > +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
> > +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
> > +enforced. Commits whose filters are not calculated are stored as a
> > +length zero Bloom filter, and their bit is marked in the `BFXL` chunk.
> > +Overrides the `commitGraph.maxNewFilters` configuration.
>
> The BFXL chunk doesn't exist anymore in this iteration, right?

Ack; I'll have to drop that.

> I wondered about having a user-facing "-1" here. My gut feeling is that
> we usually use "0" to mean "no limit" in other places, and it probably
> makes sense to be consistent. It does look like we use both, though, and
> I'm having trouble formulating a grep pattern to find examples that
> doesn't produce a lot of noise.
>
> These are "0 is no limit":
>
>   pack.windowMemory
>   pack.deltaCacheSize
>   git-daemon --max-connections
>
> These are "-1 is no limit":
>
>   git-grep --max-depth
>   rev-list --max-parents (I think?)
>
> So I dunno. It's a pretty minor thing, but I think it's good to aim for
> consistency, and since this is user-facing we won't be able to change it
> later.

I think that we have to treat "-1" as the no-limit indicator, or
otherwise we'd have to specify some other way to say we don't want to
generate any filters. With this patch, users can write:

  $ git commit-graph write --changed-paths .. --max-new-filters=0

to generate a commit-graph without writing any new filters. This is
important to be able to do since we also have a
'commitGraph.maxNewFilters' configuration, which callers may want to
override.

You may wonder why you wouldn't just write '--no-changed-paths' instead.
Doing so would indeed generate no new filters, but it also wouldn't
write any already existing filters into the new graph. Keeping them
matters when rolling up graph layers that already have incrementals, for
example with '--split'.

I'm happy to include all or none of this in a re-rolled commit message
if you think it's relevant, too.

> -Peff

Thanks,
Taylor
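The difference between `-1` and `0` described in this reply can be sketched in a few lines of C. This is a minimal illustration, not code from the patch: `may_compute_new_filter` is a hypothetical helper, and it assumes that any negative value means "no limit", in line with the patch treating only `max_new_filters >= 0` as a bound.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of Git: decide whether one more Bloom
 * filter may be computed from scratch during a commit-graph write.
 */
static bool may_compute_new_filter(int max_new_filters, int computed_so_far)
{
	assert(computed_so_far >= 0);

	/* -1 (assumed here: any negative value) means "no limit". */
	if (max_new_filters < 0)
		return true;

	/* 0 is a real limit: never compute a new filter. */
	return computed_so_far < max_new_filters;
}
```

With a limit of `0` the helper always answers "no", which corresponds to writing a graph that only carries over filters that already exist.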
On Fri, Sep 11, 2020 at 02:59:34PM -0400, Taylor Blau wrote: > On Fri, Sep 11, 2020 at 01:52:16PM -0400, Jeff King wrote: > > On Wed, Sep 09, 2020 at 11:24:00AM -0400, Taylor Blau wrote: > > > +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom > > > +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is > > > +enforced. Commits whose filters are not calculated are stored as a > > > +length zero Bloom filter, and their bit is marked in the `BFXL` chunk. > > > +Overrides the `commitGraph.maxNewFilters` configuration. > > > > The BFXL chunk doesn't exist anymore in this iteration, right? > > Ack; I'll have to drop that. Junio, I know that I've already sent one replacement patch. If you don't mind, here's another (and if you do mind, I'm happy to re-roll the series). Thanks. --- >8 --- Subject: [PATCH] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Introduce a command-line flag and configuration variable to specify the maximum number of new Bloom filters that a 'git commit-graph write' is willing to compute from scratch. Prior to this patch, a commit-graph write with '--changed-paths' would compute Bloom filters for all selected commits which haven't already been computed (i.e., by a previous commit-graph write with '--split' such that a roll-up or replacement is performed). This behavior can cause prohibitively-long commit-graph writes for a variety of reasons: * There may be lots of filters whose diffs take a long time to generate (for example, they have close to the maximum number of changes, diffing itself takes a long time, etc). * Old-style commit-graphs (which encode filters with too many entries as not having been computed at all) cause us to waste time recomputing filters that appear to have not been computed only to discover that they are too-large. This can make the upper-bound of the time it takes for 'git commit-graph write --changed-paths' to be rather unpredictable. 
To make this command behave more predictably, introduce '--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters from scratch. This lets "computing" already-known filters proceed quickly, while bounding the number of slow tasks that Git is willing to do. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/config/commitgraph.txt | 4 +++ Documentation/git-commit-graph.txt | 6 ++++ bloom.c | 7 ++-- builtin/commit-graph.c | 39 ++++++++++++++++++-- commit-graph.c | 9 +++-- commit-graph.h | 1 + t/t4216-log-bloom.sh | 53 ++++++++++++++++++++++++++++ 7 files changed, 110 insertions(+), 9 deletions(-) diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt index cff0797b54..4582c39fc4 100644 --- a/Documentation/config/commitgraph.txt +++ b/Documentation/config/commitgraph.txt @@ -1,3 +1,7 @@ +commitGraph.maxNewFilters:: + Specifies the default value for the `--max-new-filters` option of `git + commit-graph write` (c.f., linkgit:git-commit-graph[1]). + commitGraph.readChangedPaths:: If true, then git will use the changed-path Bloom filters in the commit-graph file (if it exists, and they are present). Defaults to diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 17405c73a9..60df4e4bfa 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -67,6 +67,12 @@ this option is given, future commit-graph writes will automatically assume that this option was intended. Use `--no-changed-paths` to stop storing this data. + +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is +enforced. Commits whose filters are not calculated are stored as a +length zero Bloom filter. Overrides the `commitGraph.maxNewFilters` +configuration. 
++ With the `--split[=<strategy>]` option, write the commit-graph as a chain of multiple commit-graph files stored in `<dir>/info/commit-graphs`. Commit-graph layers are merged based on the diff --git a/bloom.c b/bloom.c index d24747a1d5..230a515831 100644 --- a/bloom.c +++ b/bloom.c @@ -204,12 +204,11 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); - if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) - return filter; + if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH) + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); } - if (filter->data) + if (filter->data && filter->len) return filter; if (!compute_if_not_present) return NULL; diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index f3243bd982..e7a1539b08 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"), N_("git commit-graph write [--object-dir <objdir>] [--append] " "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] <split options>"), + "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] " + "<split options>"), NULL }; @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = { static const char * const builtin_commit_graph_write_usage[] = { N_("git commit-graph write [--object-dir <objdir>] [--append] " "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] <split options>"), + "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] " + "<split options>"), NULL }; @@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress, return 0; } +static int 
write_option_max_new_filters(const struct option *opt, + const char *arg, + int unset) +{ + int *to = opt->value; + if (unset) + *to = -1; + else { + const char *s; + *to = strtol(arg, (char **)&s, 10); + if (*s) + return error(_("%s expects a numerical value"), + optname(opt, opt->flags)); + } + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list pack_indexes = STRING_LIST_INIT_NODUP; @@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv) N_("maximum ratio between two levels of a split commit-graph")), OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), + OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters, + NULL, N_("maximum number of changed-path Bloom filters to compute"), + 0, write_option_max_new_filters), OPT_END(), }; @@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv) write_opts.size_multiple = 2; write_opts.max_commits = 0; write_opts.expire_time = 0; + write_opts.max_new_filters = -1; trace2_cmd_mode("write"); @@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv) return result; } +static int git_commit_graph_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "commitgraph.maxnewfilters")) { + write_opts.max_new_filters = git_config_int(var, value); + return 0; + } + + return git_default_config(var, value, cb); +} + int cmd_commit_graph(int argc, const char **argv, const char *prefix) { static struct option builtin_commit_graph_options[] = { @@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) usage_with_options(builtin_commit_graph_usage, builtin_commit_graph_options); - git_config(git_default_config, NULL); + git_config(git_commit_graph_config, &opts); argc = parse_options(argc, argv, prefix, builtin_commit_graph_options, builtin_commit_graph_usage, diff --git a/commit-graph.c b/commit-graph.c index dcc27b74e3..1d9f8cc7e9 100644 
--- a/commit-graph.c +++ b/commit-graph.c @@ -1422,6 +1422,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int i; struct progress *progress = NULL; struct commit **sorted_commits; + int max_new_filters; init_bloom_filters(); @@ -1438,13 +1439,16 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) else QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); + max_new_filters = ctx->opts && ctx->opts->max_new_filters >= 0 ? + ctx->opts->max_new_filters : ctx->commits.nr; + for (i = 0; i < ctx->commits.nr; i++) { enum bloom_filter_computed computed = 0; struct commit *c = sorted_commits[i]; struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, - 1, + ctx->count_bloom_filter_computed < max_new_filters, ctx->bloom_settings, &computed); if (computed & BLOOM_COMPUTED) { @@ -1455,7 +1459,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->count_bloom_filter_trunc_large++; } else if (computed & BLOOM_NOT_COMPUTED) ctx->count_bloom_filter_not_computed++; - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; + ctx->total_bloom_filter_data_size += filter + ? 
sizeof(unsigned char) * filter->len : 0; display_progress(progress, i + 1); } diff --git a/commit-graph.h b/commit-graph.h index b7914b0a7a..a22bd86701 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -110,6 +110,7 @@ struct commit_graph_opts { int max_commits; timestamp_t expire_time; enum commit_graph_split_flags split_flags; + int max_new_filters; }; /* diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index a56327ffd4..24deb8104a 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -287,5 +287,58 @@ test_expect_success 'correctly report commits with no changed paths' ' grep "\"filter_trunc_large\":0" trace ) ' +test_bloom_filters_computed () { + commit_graph_args=$1 + rm -f "$TRASH_DIRECTORY/trace.event" && + GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write \ + $commit_graph_args && + grep "\"filter_not_computed\":$2" "$TRASH_DIRECTORY/trace.event" && + grep "\"filter_trunc_large\":$3" "$TRASH_DIRECTORY/trace.event" && + grep "\"filter_computed\":$4" "$TRASH_DIRECTORY/trace.event" +} + +test_expect_success 'Bloom generation is limited by --max-new-filters' ' + ( + cd limits && + test_commit c2 filter && + test_commit c3 filter && + test_commit c4 no-filter && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \ + 3 0 2 + ) +' + +test_expect_success 'Bloom generation backfills previously-skipped filters' ' + ( + cd limits && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \ + 4 0 1 + ) +' + +test_expect_success 'Bloom generation backfills empty commits' ' + git init empty && + test_when_finished "rm -fr empty" && + ( + cd empty && + for i in $(test_seq 1 6) + do + git commit --allow-empty -m "$i" + done && + + # Generate Bloom filters for empty commits 1-6, two at a time. 
+ test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 0 2 && + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 0 2 && + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 0 2 && + + # Finally, make sure that once all commits have filters, that + # none are subsequently recomputed. + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 6 0 0 + ) +' test_done -- 2.27.0.2924.ga64bac9092.dirty
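The option callback in the patch above parses `<n>` with `strtol()` and rejects trailing characters. The same pattern is sketched below as a standalone function. `parse_max_new_filters` is a hypothetical name, and the empty-string and range checks are additions made for illustration; the patch itself only checks for trailing characters.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical standalone parser, not Git code.
 * Returns 0 and stores the value in *out on success, -1 on bad input.
 * *out is left untouched on failure.
 */
static int parse_max_new_filters(const char *arg, int *out)
{
	char *end;
	long v;

	assert(arg && out);

	errno = 0;
	v = strtol(arg, &end, 10);

	/* No digits consumed, or trailing characters such as "2x". */
	if (end == arg || *end != '\0')
		return -1;
	/* Value does not fit in a long or in an int. */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -1;

	*out = (int)v;
	return 0;
}
```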
Jeff King <peff@peff.net> writes:

> I wondered about having a user-facing "-1" here. My gut feeling is that
> we usually use "0" to mean "no limit" in other places, and it probably
> makes sense to be consistent. It does look like we use both, though, and
> I'm having trouble formulating a grep pattern to find examples that
> doesn't produce a lot of noise.
>
> These are "0 is no limit":
>
>   pack.windowMemory
>   pack.deltaCacheSize
>   git-daemon --max-connections
>
> These are "-1 is no limit":
>
>   git-grep --max-depth
>   rev-list --max-parents (I think?)

I am unsure if "limiting to the top-level" is depth 0 or depth 1,
but if it is depth 0, --max-depth=0 that does not recurse is
sensible and cannot be used as a signal for "unlimited". Same for
--max-parents=0 that would be a legit way to ask for "root commits
only".

I do not think the system fundamentally would work with 0 bytes of
window memory or 0 connections, so "0 is unlimited" for them sounds
appropriate.

I would not be surprised if the reason why "0 is unlimited" fields
did not choose to use "-1" as the "unlimited" signal was because the
internal type for these fields is unsigned.
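The two conventions compared in this message can be put side by side in C. This is only a sketch of the reasoning: both helpers are hypothetical, and whether the "0 is unlimited" settings really use unsigned storage is the guess voiced above, not something this thread confirms.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Unsigned limit (hypothetical): there is no negative value to spare,
 * so 0 doubles as "unlimited". This only works when a literal limit
 * of 0 would be useless anyway.
 */
static bool within_unsigned_limit(unsigned long limit, unsigned long used)
{
	return limit == 0 || used <= limit;
}

/*
 * Signed limit (hypothetical): 0 keeps its literal meaning, and a
 * negative value such as -1 signals "unlimited".
 */
static bool within_signed_limit(int limit, int used)
{
	assert(used >= 0);
	return limit < 0 || used <= limit;
}
```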
On Fri, Sep 11, 2020 at 02:59:34PM -0400, Taylor Blau wrote:

> I think that we have to treat "-1" as the no-limit indicator, or
> otherwise we'd have to specify some other way to say we don't want to
> generate any filters. With this patch, users can write:
>
>   $ git commit-graph write --changed-paths .. --max-new-filters=0
>
> to generate a commit-graph without writing any new filters. This is
> important to be able to do since we also have a
> 'commitGraph.maxNewFilters' configuration, which callers may want to
> override.

OK, that makes sense. Consistency would be nice, but I agree it just
wouldn't work here (and we're not entirely consistent anyway, so it's
not that big a loss).

-Peff
On Fri, Sep 11, 2020 at 03:25:55PM -0400, Taylor Blau wrote:
> On Fri, Sep 11, 2020 at 02:59:34PM -0400, Taylor Blau wrote:
> > On Fri, Sep 11, 2020 at 01:52:16PM -0400, Jeff King wrote:
> > > On Wed, Sep 09, 2020 at 11:24:00AM -0400, Taylor Blau wrote:
> > > > +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
> > > > +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
> > > > +enforced. Commits whose filters are not calculated are stored as a
> > > > +length zero Bloom filter, and their bit is marked in the `BFXL` chunk.
> > > > +Overrides the `commitGraph.maxNewFilters` configuration.
> > >
> > > The BFXL chunk doesn't exist anymore in this iteration, right?
> >
> > Ack; I'll have to drop that.
>
> Junio, I know that I've already sent one replacement patch. If you don't
> mind, here's another (and if you do mind, I'm happy to re-roll the
> series).

Just kidding. Let's use *this* version which fixes a bug reading the
commitGraph.maxNewFilters configuration. At this point, the fix-ups are:

  - This patch (attached below the scissors) instead of 12/12, and

  - This [1] patch instead of 10/12.

[1]: https://lore.kernel.org/git/20200910154516.GA32117@nand.local/

Let me know if you'd rather have a full re-roll.

--- 8< ---

Subject: [PATCH] builtin/commit-graph.c: introduce '--max-new-filters=<n>'

Introduce a command-line flag and configuration variable to specify the
maximum number of new Bloom filters that a 'git commit-graph write' is
willing to compute from scratch.

Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).
This behavior can cause prohibitively-long commit-graph writes for a variety of reasons: * There may be lots of filters whose diffs take a long time to generate (for example, they have close to the maximum number of changes, diffing itself takes a long time, etc). * Old-style commit-graphs (which encode filters with too many entries as not having been computed at all) cause us to waste time recomputing filters that appear to have not been computed only to discover that they are too-large. This can make the upper-bound of the time it takes for 'git commit-graph write --changed-paths' to be rather unpredictable. To make this command behave more predictably, introduce '--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters from scratch. This lets "computing" already-known filters proceed quickly, while bounding the number of slow tasks that Git is willing to do. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/config/commitgraph.txt | 4 ++ Documentation/git-commit-graph.txt | 6 +++ bloom.c | 7 ++-- builtin/commit-graph.c | 41 ++++++++++++++++++++- commit-graph.c | 9 ++++- commit-graph.h | 1 + t/t4216-log-bloom.sh | 55 ++++++++++++++++++++++++++++ 7 files changed, 115 insertions(+), 8 deletions(-) diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt index cff0797b54..4582c39fc4 100644 --- a/Documentation/config/commitgraph.txt +++ b/Documentation/config/commitgraph.txt @@ -1,3 +1,7 @@ +commitGraph.maxNewFilters:: + Specifies the default value for the `--max-new-filters` option of `git + commit-graph write` (c.f., linkgit:git-commit-graph[1]). + commitGraph.readChangedPaths:: If true, then git will use the changed-path Bloom filters in the commit-graph file (if it exists, and they are present). 
Defaults to diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 17405c73a9..60df4e4bfa 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -67,6 +67,12 @@ this option is given, future commit-graph writes will automatically assume that this option was intended. Use `--no-changed-paths` to stop storing this data. + +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is +enforced. Commits whose filters are not calculated are stored as a +length zero Bloom filter. Overrides the `commitGraph.maxNewFilters` +configuration. ++ With the `--split[=<strategy>]` option, write the commit-graph as a chain of multiple commit-graph files stored in `<dir>/info/commit-graphs`. Commit-graph layers are merged based on the diff --git a/bloom.c b/bloom.c index d24747a1d5..230a515831 100644 --- a/bloom.c +++ b/bloom.c @@ -204,12 +204,11 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); - if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) - return filter; + if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH) + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); } - if (filter->data) + if (filter->data && filter->len) return filter; if (!compute_if_not_present) return NULL; diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index f3243bd982..988445abdf 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"), N_("git commit-graph write [--object-dir <objdir>] [--append] " "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] 
<split options>"), + "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] " + "<split options>"), NULL }; @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = { static const char * const builtin_commit_graph_write_usage[] = { N_("git commit-graph write [--object-dir <objdir>] [--append] " "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] <split options>"), + "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] " + "<split options>"), NULL }; @@ -162,6 +164,35 @@ static int read_one_commit(struct oidset *commits, struct progress *progress, return 0; } +static int write_option_max_new_filters(const struct option *opt, + const char *arg, + int unset) +{ + int *to = opt->value; + if (unset) + *to = -1; + else { + const char *s; + *to = strtol(arg, (char **)&s, 10); + if (*s) + return error(_("%s expects a numerical value"), + optname(opt, opt->flags)); + } + return 0; +} + +static int git_commit_graph_write_config(const char *var, const char *value, + void *cb) +{ + if (!strcmp(var, "commitgraph.maxnewfilters")) + write_opts.max_new_filters = git_config_int(var, value); + /* + * No need to fall-back to 'git_default_config', since this was already + * called in 'cmd_commit_graph()'. 
+ */ + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list pack_indexes = STRING_LIST_INIT_NODUP; @@ -197,6 +228,9 @@ static int graph_write(int argc, const char **argv) N_("maximum ratio between two levels of a split commit-graph")), OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), + OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters, + NULL, N_("maximum number of changed-path Bloom filters to compute"), + 0, write_option_max_new_filters), OPT_END(), }; @@ -205,9 +239,12 @@ static int graph_write(int argc, const char **argv) write_opts.size_multiple = 2; write_opts.max_commits = 0; write_opts.expire_time = 0; + write_opts.max_new_filters = -1; trace2_cmd_mode("write"); + git_config(git_commit_graph_write_config, &opts); + argc = parse_options(argc, argv, NULL, builtin_commit_graph_write_options, builtin_commit_graph_write_usage, 0); diff --git a/commit-graph.c b/commit-graph.c index dcc27b74e3..1d9f8cc7e9 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1422,6 +1422,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int i; struct progress *progress = NULL; struct commit **sorted_commits; + int max_new_filters; init_bloom_filters(); @@ -1438,13 +1439,16 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) else QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); + max_new_filters = ctx->opts && ctx->opts->max_new_filters >= 0 ? 
+ ctx->opts->max_new_filters : ctx->commits.nr; + for (i = 0; i < ctx->commits.nr; i++) { enum bloom_filter_computed computed = 0; struct commit *c = sorted_commits[i]; struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, - 1, + ctx->count_bloom_filter_computed < max_new_filters, ctx->bloom_settings, &computed); if (computed & BLOOM_COMPUTED) { @@ -1455,7 +1459,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->count_bloom_filter_trunc_large++; } else if (computed & BLOOM_NOT_COMPUTED) ctx->count_bloom_filter_not_computed++; - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; + ctx->total_bloom_filter_data_size += filter + ? sizeof(unsigned char) * filter->len : 0; display_progress(progress, i + 1); } diff --git a/commit-graph.h b/commit-graph.h index b7914b0a7a..a22bd86701 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -110,6 +110,7 @@ struct commit_graph_opts { int max_commits; timestamp_t expire_time; enum commit_graph_split_flags split_flags; + int max_new_filters; }; /* diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index a56327ffd4..3cb766301d 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -287,5 +287,60 @@ test_expect_success 'correctly report commits with no changed paths' ' grep "\"filter_trunc_large\":0" trace ) ' +test_bloom_filters_computed () { + commit_graph_args=$1 + rm -f "$TRASH_DIRECTORY/trace.event" && + GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write \ + $commit_graph_args && + grep "\"filter_not_computed\":$2" "$TRASH_DIRECTORY/trace.event" && + grep "\"filter_trunc_large\":$3" "$TRASH_DIRECTORY/trace.event" && + grep "\"filter_computed\":$4" "$TRASH_DIRECTORY/trace.event" +} + +test_expect_success 'Bloom generation is limited by --max-new-filters' ' + ( + cd limits && + test_commit c2 filter && + test_commit c3 filter && + test_commit c4 no-filter && + test_bloom_filters_computed "--reachable --changed-paths 
--split=replace --max-new-filters=2" \ + 3 0 2 + ) +' + +test_expect_success 'Bloom generation backfills previously-skipped filters' ' + # Check specifying commitGraph.maxNewFilters over "git config" works. + test_config -C limits commitGraph.maxNewFilters 1 && + ( + cd limits && + test_bloom_filters_computed "--reachable --changed-paths --split=replace" \ + 4 0 1 + ) +' + +test_expect_success 'Bloom generation backfills empty commits' ' + git init empty && + test_when_finished "rm -fr empty" && + ( + cd empty && + for i in $(test_seq 1 6) + do + git commit --allow-empty -m "$i" + done && + + # Generate Bloom filters for empty commits 1-6, two at a time. + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 0 2 && + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 0 2 && + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 0 2 && + + # Finally, make sure that once all commits have filters, that + # none are subsequently recomputed. + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 6 0 0 + ) +' test_done -- 2.27.0.2924.ga64bac9092.dirty
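In this version of the patch, `graph_write()` sets the default of `-1`, then reads `commitGraph.maxNewFilters`, then parses the command line, so each later source overrides the earlier one. The sketch below models only that ordering. `resolve_max_new_filters` and `check_precedence` are hypothetical names, and passing unset values as `NULL` pointers is an assumption of this example, not how Git represents them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not Git code. A NULL pointer means "not set".
 * Later sources win: built-in default, then config, then command line.
 */
static int resolve_max_new_filters(const int *config_value,
				   const int *cli_value)
{
	int limit = -1; /* built-in default: no limit */

	if (config_value)
		limit = *config_value; /* commitGraph.maxNewFilters */
	if (cli_value)
		limit = *cli_value;    /* --max-new-filters=<n> */
	return limit;
}

/* Self-check for a few override combinations. */
static void check_precedence(void)
{
	const int minus_one = -1, zero = 0, one = 1, two = 2;

	assert(resolve_max_new_filters(NULL, NULL) == -1);
	assert(resolve_max_new_filters(&one, NULL) == 1);
	assert(resolve_max_new_filters(&one, &two) == 2);
	/* --max-new-filters=0 suppresses new filters despite the config. */
	assert(resolve_max_new_filters(&one, &zero) == 0);
	/* --no-max-new-filters stores -1, lifting a configured limit. */
	assert(resolve_max_new_filters(&one, &minus_one) == -1);
}
```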
On 9/14/2020 4:12 PM, Taylor Blau wrote: > On Fri, Sep 11, 2020 at 03:25:55PM -0400, Taylor Blau wrote: >> On Fri, Sep 11, 2020 at 02:59:34PM -0400, Taylor Blau wrote: >>> On Fri, Sep 11, 2020 at 01:52:16PM -0400, Jeff King wrote: >>>> On Wed, Sep 09, 2020 at 11:24:00AM -0400, Taylor Blau wrote: >>>>> +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom >>>>> +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is >>>>> +enforced. Commits whose filters are not calculated are stored as a >>>>> +length zero Bloom filter, and their bit is marked in the `BFXL` chunk. >>>>> +Overrides the `commitGraph.maxNewFilters` configuration. >>>> >>>> The BFXL chunk doesn't exist anymore in this iteration, right? >>> >>> Ack; I'll have to drop that. >> >> Junio, I know that I've already sent one replacement patch. If you don't >> mind, here's another (and if you do mind, I'm happy to re-roll the >> series). > > Just kidding. Let's use *this* version which fixes a bug reading the > commitGraph.maxNewFilters configuration. At this point, the > fix-ups are: > > - This patch (attached below the scisors) instead of 12/12, and > > - This [1] patch instead of 10/12. > > [1]: https://lore.kernel.org/git/20200910154516.GA32117@nand.local/ > > Let me know if you'd rather have a full re-roll. It's getting a bit difficult to track all of these "use this instead" patches. But, I'm not the one applying them, so maybe that's not actually a problem. You might need a re-roll, anyway, as I have a few comments here: > --- 8< --- > > Subject: [PATCH] builtin/commit-graph.c: introduce '--max-new-filters=<n>' You also introduce commitGraph.maxNewFitlers here, which is not mentioned in the commit message anywhere. In fact, it might be good to include it as a separate patch so its implementation and tests can be isolated from the command-line functionality. > +length zero Bloom filter. Overrides the `commitGraph.maxNewFilters` > +configuration. 
We have found it valuable to demonstrate these overrides in tests. Let's inspect your tests for this. > +test_bloom_filters_computed () { > + commit_graph_args=$1 > + rm -f "$TRASH_DIRECTORY/trace.event" && > + GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write \ > + $commit_graph_args && > + grep "\"filter_not_computed\":$2" "$TRASH_DIRECTORY/trace.event" && > + grep "\"filter_trunc_large\":$3" "$TRASH_DIRECTORY/trace.event" && > + grep "\"filter_computed\":$4" "$TRASH_DIRECTORY/trace.event" > +} If the arguments were moved to the last parameter, then we could do a few interesting things here. test_bloom_filters_computed () { NOT_COMPUTED="\"filter_not_computed\":$1" && shift && TRUNCATED="\"filter_trunc_large\":$1" && shift && COMPUTED="\"filter_computed\":$1" && shift && rm -f "$TRASH_DIRECTORY/trace.event" && GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write $@ && grep "$NOT_COMPUTED" "$TRASH_DIRECTORY/trace.event" && grep "$TRUNCATED" "$TRASH_DIRECTORY/trace.event" && grep "$COMPUTED" "$TRASH_DIRECTORY/trace.event" } (I have not tested this script. It might need some work.) This would make your callers a bit cleaner-looking, for example: test_expect_success 'Bloom generation is limited by --max-new-filters' ' ( cd limits && test_commit c2 filter && test_commit c3 filter && test_commit c4 no-filter && test_bloom_filters_computed 3 0 2 \ --reachable --changed-paths --split=replace --max-new-filters=2 ) ' At least, this looks nicer to me. > +test_expect_success 'Bloom generation backfills previously-skipped filters' ' > + # Check specifying commitGraph.maxNewFilters over "git config" works. > + test_config -C limits commitGraph.maxNewFilters 1 && > + ( > + cd limits && > + test_bloom_filters_computed "--reachable --changed-paths --split=replace" \ > + 4 0 1 > + ) > +' Adding a case for `commitGraph.maxNewFilters=1` and `--max-new-filters=2` might be interesting for the override rules. 
> +
> +test_expect_success 'Bloom generation backfills empty commits' '
> +	git init empty &&
> +	test_when_finished "rm -fr empty" &&
> +	(
> +		cd empty &&
> +		for i in $(test_seq 1 6)
> +		do
> +			git commit --allow-empty -m "$i"
> +		done &&
> +
> +		# Generate Bloom filters for empty commits 1-6, two at a time.
> +		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
> +			4 0 2 &&
> +		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
> +			4 0 2 &&
> +		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
> +			4 0 2 &&

I'm concerned that the max-new-filters limit (2) is a divisor
of the full number of commits (6). It might be good to add one
more commit here and test again with a limit of 2. That would
handle both "equal to limit" and "less than limit" cases.

> +		# Finally, make sure that once all commits have filters, that
> +		# none are subsequently recomputed.
> +		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
> +			6 0 0
> +	)
> +'

Thanks,
-Stolee
On Mon, Sep 14, 2020 at 04:31:03PM -0400, Derrick Stolee wrote:
> On 9/14/2020 4:12 PM, Taylor Blau wrote:
> >   - This patch (attached below the scissors) instead of 12/12, and
> >
> >   - This [1] patch instead of 10/12.
> >
> > [1]: https://lore.kernel.org/git/20200910154516.GA32117@nand.local/
> >
> > Let me know if you'd rather have a full re-roll.
>
> It's getting a bit difficult to track all of these "use this instead"
> patches. But, I'm not the one applying them, so maybe that's not actually
> a problem.

The above list is the only changes that I've made, so I'm happy if Junio
wants to follow what's written there, but I'm equally happy to send a
new reroll.

> You might need a re-roll, anyway, as I have a few comments here:

Let's take a look...

> You also introduce commitGraph.maxNewFilters here, which is not
> mentioned in the commit message anywhere. In fact, it might be
> good to include it as a separate patch so its implementation and
> tests can be isolated from the command-line functionality.

I could go either way on both of these, to be honest. I don't think
there's anything interesting that isn't said in the documentation
changes introduced by that commit that is worth covering there, so I'm
not sure 'commitGraph.maxNewFilters' needs the additional call-out.

> > +length zero Bloom filter. Overrides the `commitGraph.maxNewFilters`
> > +configuration.
>
> We have found it valuable to demonstrate these overrides in tests.
> Let's inspect your tests for this.
>
> > +test_bloom_filters_computed () {
> > +	commit_graph_args=$1
> > +	rm -f "$TRASH_DIRECTORY/trace.event" &&
> > +	GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write \
> > +		$commit_graph_args &&
> > +	grep "\"filter_not_computed\":$2" "$TRASH_DIRECTORY/trace.event" &&
> > +	grep "\"filter_trunc_large\":$3" "$TRASH_DIRECTORY/trace.event" &&
> > +	grep "\"filter_computed\":$4" "$TRASH_DIRECTORY/trace.event"
> > +}
>
> If the arguments were moved to the last parameter, then we could do a few
> interesting things here.
>
> test_bloom_filters_computed () {
> 	NOT_COMPUTED="\"filter_not_computed\":$1" &&
> 	shift &&
> 	TRUNCATED="\"filter_trunc_large\":$1" &&
> 	shift &&
> 	COMPUTED="\"filter_computed\":$1" &&
> 	shift &&
> 	rm -f "$TRASH_DIRECTORY/trace.event" &&
> 	GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write $@ &&
> 	grep "$NOT_COMPUTED" "$TRASH_DIRECTORY/trace.event" &&
> 	grep "$TRUNCATED" "$TRASH_DIRECTORY/trace.event" &&
> 	grep "$COMPUTED" "$TRASH_DIRECTORY/trace.event"
> }
>
>
> (I have not tested this script. It might need some work.)
> This would make your callers a bit cleaner-looking, for example:
>
> test_expect_success 'Bloom generation is limited by --max-new-filters' '
> 	(
> 		cd limits &&
> 		test_commit c2 filter &&
> 		test_commit c3 filter &&
> 		test_commit c4 no-filter &&
> 		test_bloom_filters_computed 3 0 2 \
> 			--reachable --changed-paths --split=replace --max-new-filters=2
> 	)
> '
>
> At least, this looks nicer to me.

Yeah, but I think we're still stuck with the test_config below unless
you write "git $@" instead of "git commit-graph write $@". I don't think
that I have strong feelings about this unless you do.

> > +test_expect_success 'Bloom generation backfills previously-skipped filters' '
> > +	# Check specifying commitGraph.maxNewFilters over "git config" works.
> > +	test_config -C limits commitGraph.maxNewFilters 1 &&
> > +	(
> > +		cd limits &&
> > +		test_bloom_filters_computed "--reachable --changed-paths --split=replace" \
> > +			4 0 1
> > +	)
> > +'
>
> Adding a case for `commitGraph.maxNewFilters=1` and `--max-new-filters=2` might
> be interesting for the override rules.

Potentially. I'm equally happy to do it in a follow-up series. I worry
slightly about adding too many test-cases for somewhat trivial behavior.

> > +
> > +test_expect_success 'Bloom generation backfills empty commits' '
> > +	git init empty &&
> > +	test_when_finished "rm -fr empty" &&
> > +	(
> > +		cd empty &&
> > +		for i in $(test_seq 1 6)
> > +		do
> > +			git commit --allow-empty -m "$i"
> > +		done &&
> > +
> > +		# Generate Bloom filters for empty commits 1-6, two at a time.
> > +		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
> > +			4 0 2 &&
> > +		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
> > +			4 0 2 &&
> > +		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
> > +			4 0 2 &&
>
> I'm concerned that the max-new-filters limit (2) is a divisor
> of the full number of commits (6). It might be good to add one
> more commit here and test again with a limit of 2. That would
> handle both "equal to limit" and "less than limit" cases.

That case is already covered in the test two above this one ("Bloom
generation is limited by --max-new-filters").

> Thanks,
> -Stolee

Thanks,
Taylor
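The override rule discussed in this exchange (a `--max-new-filters` given on the command line wins over `commitGraph.maxNewFilters`, which in turn wins over the built-in "no limit" default) can be modelled in a few lines of C. This is only a sketch of the documented precedence, not the code from the patch: the helper `effective_max_new_filters`, its parameters, and the `NO_LIMIT` macro are hypothetical names introduced for illustration.

```c
/* Sentinel mirroring the documented convention: -1 means "no limit". */
#define NO_LIMIT (-1)

/*
 * Hypothetical helper (not part of Git): pick the effective limit.
 *
 * config_value: value of commitGraph.maxNewFilters, or NO_LIMIT if unset.
 * cli_given:    nonzero if --max-new-filters appeared on the command line.
 * cli_value:    the value passed on the command line (ignored otherwise).
 */
static int effective_max_new_filters(int config_value, int cli_given,
				     int cli_value)
{
	if (cli_given)
		return cli_value;	/* the command line overrides the config */
	return config_value;		/* config value, or NO_LIMIT when unset */
}
```

Under this model, the case suggested in the review (`commitGraph.maxNewFilters=1` together with `--max-new-filters=2`) yields a limit of 2, and `--max-new-filters=0` suppresses new filters even when the configuration allows some.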
On 9/14/2020 4:36 PM, Taylor Blau wrote:
> On Mon, Sep 14, 2020 at 04:31:03PM -0400, Derrick Stolee wrote:
>> On 9/14/2020 4:12 PM, Taylor Blau wrote:
>>>   - This patch (attached below the scissors) instead of 12/12, and
>>>
>>>   - This [1] patch instead of 10/12.
>>>
>>> [1]: https://lore.kernel.org/git/20200910154516.GA32117@nand.local/
>>>
>>> Let me know if you'd rather have a full re-roll.
>>
>> It's getting a bit difficult to track all of these "use this instead"
>> patches. But, I'm not the one applying them, so maybe that's not actually
>> a problem.
>
> The above list is the only changes that I've made, so I'm happy if Junio
> wants to follow what's written there, but I'm equally happy to send a
> new reroll.
>
>> You might need a re-roll, anyway, as I have a few comments here:
>
> Let's take a look...
>
>> You also introduce commitGraph.maxNewFilters here, which is not
>> mentioned in the commit message anywhere. In fact, it might be
>> good to include it as a separate patch so its implementation and
>> tests can be isolated from the command-line functionality.
>
> I could go either way on both of these, to be honest. I don't think
> there's anything interesting that isn't said in the documentation
> changes introduced by that commit that is worth covering there, so I'm
> not sure 'commitGraph.maxNewFilters' needs the additional call-out.

This is fine. Adding an option along with the config version of it
is easy enough. Just a thought for future series.

I'm fine with the series as-is. My nits are just that.

Thanks,
-Stolee
On Mon, Sep 14, 2020 at 08:59:28PM -0400, Derrick Stolee wrote:
> This is fine. Adding an option along with the config version of it
> is easy enough. Just a thought for future series.
>
> I'm fine with the series as-is. My nits are just that.

Thanks for your review, and for all of your thoughts and help on this
series in general. I'm sorry if I seemed dismissive; I just wanted to
avoid holding up some important fixes besides the '--max-new-filters'
feature.

I'm not quite sure how to handle series like these. On the one hand, I'd
like to send a couple of small series to fix the important bugs quickly.
On the other hand, the small series can generate a lot of noise and
burden reviewers and the maintainer when the dependencies between those
series aren't straightforward.

So, I dunno. That's at least shedding a little bit of light on how this
series came to be / got so large, which is part of the reason that it
took so long.

Anyway, thank you again. I'm looking forward to seeing this merged
hopefully soon.

> Thanks,
> -Stolee

Thanks,
Taylor
Taylor Blau <me@ttaylorr.com> writes:

>> It's getting a bit difficult to track all of these "use this instead"
>> patches. But, I'm not the one applying them, so maybe that's not actually
>> a problem.
>
> The above list is the only changes that I've made, so I'm happy if Junio
> wants to follow what's written there, but I'm equally happy to send a
> new reroll.

It's getting so unorganized to follow from sidelines. Even
resending just the few steps that need replacement, indicating
which ones are replaced with them, would be easier to manage (and
full replacement would be the easiest to handle).

Thanks.
On Tue, Sep 15, 2020 at 02:49:37PM -0700, Junio C Hamano wrote:
> It's getting so unorganized to follow from sidelines. Even
> resending just the few steps that need replacement, indicating
> which ones are replaced with them, would be easier to manage (and
> full replacement would be the easiest to handle).
>
> Thanks.

I'll send you a full re-roll, no problem.

Thanks,
Taylor
diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt
index cff0797b54..4582c39fc4 100644
--- a/Documentation/config/commitgraph.txt
+++ b/Documentation/config/commitgraph.txt
@@ -1,3 +1,7 @@
+commitGraph.maxNewFilters::
+	Specifies the default value for the `--max-new-filters` option of `git
+	commit-graph write` (c.f., linkgit:git-commit-graph[1]).
+
 commitGraph.readChangedPaths::
 	If true, then git will use the changed-path Bloom filters in the
 	commit-graph file (if it exists, and they are present). Defaults to
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 17405c73a9..81a2e65903 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -67,6 +67,12 @@ this option is given, future commit-graph writes will automatically assume
 that this option was intended. Use `--no-changed-paths` to stop storing this
 data.
 +
+With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
+filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
+enforced. Commits whose filters are not calculated are stored as a
+length zero Bloom filter, and their bit is marked in the `BFXL` chunk.
+Overrides the `commitGraph.maxNewFilters` configuration.
++
 With the `--split[=<strategy>]` option, write the commit-graph as a
 chain of multiple commit-graph files stored in
 `<dir>/info/commit-graphs`. Commit-graph layers are merged based on the
diff --git a/bloom.c b/bloom.c
index 194b6ab8ad..022dd6e0f9 100644
--- a/bloom.c
+++ b/bloom.c
@@ -197,12 +197,11 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 
 	if (!filter->data) {
 		load_commit_graph_info(r, c);
-		if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH &&
-			load_bloom_filter_from_graph(r->objects->commit_graph, filter, c))
-				return filter;
+		if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH)
+			load_bloom_filter_from_graph(r->objects->commit_graph, filter, c);
 	}
 
-	if (filter->data)
+	if (filter->data && filter->len)
 		return filter;
 	if (!compute_if_not_present)
 		return NULL;
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f3243bd982..e7a1539b08 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
 	N_("git commit-graph write [--object-dir <objdir>] [--append] "
 	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--changed-paths] [--[no-]progress] <split options>"),
+	   "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] "
+	   "<split options>"),
 	NULL
 };
 
@@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = {
 static const char * const builtin_commit_graph_write_usage[] = {
 	N_("git commit-graph write [--object-dir <objdir>] [--append] "
 	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--changed-paths] [--[no-]progress] <split options>"),
+	   "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] "
+	   "<split options>"),
 	NULL
 };
 
@@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
 	return 0;
 }
 
+static int write_option_max_new_filters(const struct option *opt,
+					const char *arg,
+					int unset)
+{
+	int *to = opt->value;
+	if (unset)
+		*to = -1;
+	else {
+		const char *s;
+		*to = strtol(arg, (char **)&s, 10);
+		if (*s)
+			return error(_("%s expects a numerical value"),
+				     optname(opt, opt->flags));
+	}
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
 	struct string_list pack_indexes = STRING_LIST_INIT_NODUP;
@@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv)
 			N_("maximum ratio between two levels of a split commit-graph")),
 		OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time,
 			N_("only expire files older than a given date-time")),
+		OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters,
+			NULL, N_("maximum number of changed-path Bloom filters to compute"),
+			0, write_option_max_new_filters),
 		OPT_END(),
 	};
 
@@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv)
 	write_opts.size_multiple = 2;
 	write_opts.max_commits = 0;
 	write_opts.expire_time = 0;
+	write_opts.max_new_filters = -1;
 
 	trace2_cmd_mode("write");
 
@@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv)
 	return result;
 }
 
+static int git_commit_graph_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "commitgraph.maxnewfilters")) {
+		write_opts.max_new_filters = git_config_int(var, value);
+		return 0;
+	}
+
+	return git_default_config(var, value, cb);
+}
+
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
 	static struct option builtin_commit_graph_options[] = {
@@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_commit_graph_usage,
 				   builtin_commit_graph_options);
 
-	git_config(git_default_config, NULL);
+	git_config(git_commit_graph_config, &opts);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_commit_graph_options,
 			     builtin_commit_graph_usage,
diff --git a/commit-graph.c b/commit-graph.c
index dcc27b74e3..1d9f8cc7e9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1422,6 +1422,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 	int i;
 	struct progress *progress = NULL;
 	struct commit **sorted_commits;
+	int max_new_filters;
 
 	init_bloom_filters();
 
@@ -1438,13 +1439,16 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 	else
 		QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp);
 
+	max_new_filters = ctx->opts && ctx->opts->max_new_filters >= 0 ?
+		ctx->opts->max_new_filters : ctx->commits.nr;
+
 	for (i = 0; i < ctx->commits.nr; i++) {
 		enum bloom_filter_computed computed = 0;
 		struct commit *c = sorted_commits[i];
 		struct bloom_filter *filter = get_or_compute_bloom_filter(
 			ctx->r,
 			c,
-			1,
+			ctx->count_bloom_filter_computed < max_new_filters,
 			ctx->bloom_settings,
 			&computed);
 		if (computed & BLOOM_COMPUTED) {
@@ -1455,7 +1459,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 				ctx->count_bloom_filter_trunc_large++;
 		} else if (computed & BLOOM_NOT_COMPUTED)
 			ctx->count_bloom_filter_not_computed++;
-		ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
+		ctx->total_bloom_filter_data_size += filter
+			? sizeof(unsigned char) * filter->len : 0;
 		display_progress(progress, i + 1);
 	}
 
diff --git a/commit-graph.h b/commit-graph.h
index b7914b0a7a..a22bd86701 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -110,6 +110,7 @@ struct commit_graph_opts {
 	int max_commits;
 	timestamp_t expire_time;
 	enum commit_graph_split_flags split_flags;
+	int max_new_filters;
 };
 
 /*
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index a56327ffd4..24deb8104a 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -287,5 +287,58 @@ test_expect_success 'correctly report commits with no changed paths' '
 		grep "\"filter_trunc_large\":0" trace
 	)
 '
+test_bloom_filters_computed () {
+	commit_graph_args=$1
+	rm -f "$TRASH_DIRECTORY/trace.event" &&
+	GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write \
+		$commit_graph_args &&
+	grep "\"filter_not_computed\":$2" "$TRASH_DIRECTORY/trace.event" &&
+	grep "\"filter_trunc_large\":$3" "$TRASH_DIRECTORY/trace.event" &&
+	grep "\"filter_computed\":$4" "$TRASH_DIRECTORY/trace.event"
+}
+
+test_expect_success 'Bloom generation is limited by --max-new-filters' '
+	(
+		cd limits &&
+		test_commit c2 filter &&
+		test_commit c3 filter &&
+		test_commit c4 no-filter &&
+		test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \
+			3 0 2
+	)
+'
+
+test_expect_success 'Bloom generation backfills previously-skipped filters' '
+	(
+		cd limits &&
+		test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \
+			4 0 1
+	)
+'
+
+test_expect_success 'Bloom generation backfills empty commits' '
+	git init empty &&
+	test_when_finished "rm -fr empty" &&
+	(
+		cd empty &&
+		for i in $(test_seq 1 6)
+		do
+			git commit --allow-empty -m "$i"
+		done &&
+
+		# Generate Bloom filters for empty commits 1-6, two at a time.
+		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
+			4 0 2 &&
+		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
+			4 0 2 &&
+		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
+			4 0 2 &&
+
+		# Finally, make sure that once all commits have filters, that
+		# none are subsequently recomputed.
+		test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \
+			6 0 0
+	)
+'
 
 test_done
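The option callback in the diff above parses its argument with `strtol()` and rejects input that has trailing characters. A standalone sketch of that pattern follows. The helper name `parse_int_strict` is hypothetical and is not part of Git's parse-options API; the checks for empty input and for values that do not fit in an `int` are extra assumptions of this example and do not appear in the patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not Git's parse-options API: parse a decimal
 * integer from `arg` into `*out`. Returns 0 on success, -1 on failure.
 */
static int parse_int_strict(const char *arg, int *out)
{
	char *end;
	long value;

	if (!arg || !*arg)
		return -1;	/* empty input (extra check, not in the patch) */

	errno = 0;
	value = strtol(arg, &end, 10);
	if (*end)
		return -1;	/* trailing junk, like the patch's `if (*s)` test */
	if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
		return -1;	/* does not fit in an int (extra check) */

	*out = (int)value;
	return 0;
}
```

With this helper, "2" and "-1" parse successfully, while "2x", the empty string, and an out-of-range value are all rejected.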
Introduce a command-line flag and configuration variable to specify the
maximum number of new Bloom filters that a 'git commit-graph write' is
willing to compute from scratch.

Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).

This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:

  * There may be lots of filters whose diffs take a long time to
    generate (for example, they have close to the maximum number of
    changes, diffing itself takes a long time, etc).

  * Old-style commit-graphs (which encode filters with too many entries
    as not having been computed at all) cause us to waste time
    recomputing filters that appear to have not been computed only to
    discover that they are too-large.

This can make the upper-bound of the time it takes for 'git commit-graph
write --changed-paths' to be rather unpredictable.

To make this command behave more predictably, introduce
'--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/commitgraph.txt |  4 +++
 Documentation/git-commit-graph.txt   |  6 ++++
 bloom.c                              |  7 ++--
 builtin/commit-graph.c               | 39 ++++++++++++++++++--
 commit-graph.c                       |  9 +++--
 commit-graph.h                       |  1 +
 t/t4216-log-bloom.sh                 | 53 ++++++++++++++++++++++++++++
 7 files changed, 110 insertions(+), 9 deletions(-)
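
The budgeting idea in the commit message (already-known filters are reused cheaply, and at most `<n>` filters are computed from scratch per write) can be illustrated with a simplified model in C. This is a sketch under stated assumptions, not Git's implementation: `struct filter_stats`, `compute_with_budget`, and the `has_filter` array are hypothetical, and the counters are not meant to reproduce the trace2 values checked by the tests in this patch.

```c
#include <stddef.h>

/* Hypothetical counters for one simulated write. */
struct filter_stats {
	size_t computed;	/* filters computed from scratch */
	size_t skipped;		/* commits left without a filter this round */
};

/*
 * Hypothetical model, not Git's code: has_filter[i] is nonzero when
 * commit i already has a stored filter. A negative max_new_filters
 * means "no limit", following the documented convention.
 */
static struct filter_stats compute_with_budget(int *has_filter, size_t nr,
					       int max_new_filters)
{
	struct filter_stats stats = { 0, 0 };
	size_t budget = max_new_filters < 0 ? nr : (size_t)max_new_filters;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (has_filter[i])
			continue;	/* already known: reused, costs no budget */
		if (stats.computed < budget) {
			has_filter[i] = 1;	/* the slow path, bounded by the budget */
			stats.computed++;
		} else {
			stats.skipped++;	/* left for a later write to backfill */
		}
	}
	return stats;
}
```

Calling this repeatedly on seven commits with a limit of 2 computes 2, 2, 2, and then 1 filter, which is the "less than limit" situation raised in the review, and a further call computes nothing.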