Message ID | 5bbdc7124d58526a7a2d7b3bdc807ddd204a6df1.1730366765.git.karthik.188@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | packfile: avoid using the 'the_repository' global variable | expand |
On Thu, Oct 31, 2024 at 10:39:51AM +0100, Karthik Nayak wrote: > @@ -652,20 +688,25 @@ unsigned char *use_pack(struct packed_git *p, > break; > } > if (!win) { > - size_t window_align = packed_git_window_size / 2; > + struct packfile_config config = PACKFILE_CONFIG_INIT; > + size_t window_align; > off_t len; > > + repo_config(p->repo, packfile_config, &config); > + window_align = config.packed_git_window_size / 2; > + Parsing config like this is somewhat expensive (remember we're going to hit your callback for every single config key in the system, user, and repo-level config files). And use_pack() is a relatively hot code path, as we call it any time we need to access bytes from a mapped pack! This "!win" conditional isn't quite as hot, as it only triggers when we establish a new window for a pack. But that still happens at least once per pack, more if we need to move the window around in a big pack, and lots more if we are under memory pressure and need to open/close windows a lot. I think we need to parse these values once and then store them somewhere with cheaper access. Can we grab them in prepare_repo_settings(), for example, which would cache them? We need a repo struct, but we have one (the same packed_git->repo you are using to call repo_config()). -Peff
On Fri, Nov 01, 2024 at 01:45:47PM -0400, Jeff King wrote: > On Thu, Oct 31, 2024 at 10:39:51AM +0100, Karthik Nayak wrote: > > > @@ -652,20 +688,25 @@ unsigned char *use_pack(struct packed_git *p, > > break; > > } > > if (!win) { > > - size_t window_align = packed_git_window_size / 2; > > + struct packfile_config config = PACKFILE_CONFIG_INIT; > > + size_t window_align; > > off_t len; > > > > + repo_config(p->repo, packfile_config, &config); > > + window_align = config.packed_git_window_size / 2; > > + > > Parsing config like this is somewhat expensive (remember we're going to > hit your callback for every single config key in the system, user, and > repo-level config files). > > And use_pack() is a relatively hot code path, as we call it any time we > need to access bytes from a mapped pack! This "!win" conditional isn't > quite as hot, as it only triggers when we establish a new window for a > pack. But that still happens at least once per pack, more if we need to > move the window around in a big pack, and lots more if we are under > memory pressure and need to open/close windows a lot. > > I think we need to parse these values once and then store them somewhere > with cheaper access. Can we grab them in prepare_repo_settings(), for > example, which would cache them? We need a repo struct, but we have one > (the same packed_git->repo you are using to call repo_config()). Oh, wow, I can't believe that I missed this in my earlier reviews. Yes, we should definitely *not* be calling an expensive function which computes the same value every time in a hot path like 'use_pack()'. Thanks for spotting. Thanks, Taylor
Jeff King <peff@peff.net> writes: > On Thu, Oct 31, 2024 at 10:39:51AM +0100, Karthik Nayak wrote: > >> @@ -652,20 +688,25 @@ unsigned char *use_pack(struct packed_git *p, >> break; >> } >> if (!win) { >> - size_t window_align = packed_git_window_size / 2; >> + struct packfile_config config = PACKFILE_CONFIG_INIT; >> + size_t window_align; >> off_t len; >> >> + repo_config(p->repo, packfile_config, &config); >> + window_align = config.packed_git_window_size / 2; >> + > > Parsing config like this is somewhat expensive (remember we're going to > hit your callback for every single config key in the system, user, and > repo-level config files). > > And use_pack() is a relatively hot code path, as we call it any time we > need to access bytes from a mapped pack! This "!win" conditional isn't > quite as hot, as it only triggers when we establish a new window for a > pack. But that still happens at least once per pack, more if we need to > move the window around in a big pack, and lots more if we are under > memory pressure and need to open/close windows a lot. > I must admit, I'm not too aware of the pack objects code base, but that was my assumption indeed, that this conditional wasn't the hot path. But even once per pack seems like quite the regression then. > I think we need to parse these values once and then store them somewhere > with cheaper access. Can we grab them in prepare_repo_settings(), for > example, which would cache them? We need a repo struct, but we have one > (the same packed_git->repo you are using to call repo_config()). > > -Peff This seems like a good idea, I will amend this commit to move the config to `repo_settings`. I think the previous commit doesn't require any changes and can stay. Thanks - Karthik
diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 3ccc4c5722..0ece070260 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -3539,7 +3539,7 @@ static void parse_argv(void) int cmd_fast_import(int argc, const char **argv, const char *prefix, - struct repository *repo UNUSED) + struct repository *repo) { unsigned int i; @@ -3660,7 +3660,7 @@ int cmd_fast_import(int argc, fprintf(stderr, " pools: %10lu KiB\n", (unsigned long)((tree_entry_allocd + fi_mem_pool.pool_alloc) /1024)); fprintf(stderr, " objects: %10" PRIuMAX " KiB\n", (alloc_count*sizeof(struct object_entry))/1024); fprintf(stderr, "---------------------------------------------------------------------\n"); - pack_report(); + pack_report(repo); fprintf(stderr, "---------------------------------------------------------------------\n"); fprintf(stderr, "\n"); } diff --git a/config.c b/config.c index 728ef98e42..2c295f7430 100644 --- a/config.c +++ b/config.c @@ -1493,28 +1493,11 @@ static int git_default_core_config(const char *var, const char *value, return 0; } - if (!strcmp(var, "core.packedgitwindowsize")) { - int pgsz_x2 = getpagesize() * 2; - packed_git_window_size = git_config_ulong(var, value, ctx->kvi); - - /* This value must be multiple of (pagesize * 2) */ - packed_git_window_size /= pgsz_x2; - if (packed_git_window_size < 1) - packed_git_window_size = 1; - packed_git_window_size *= pgsz_x2; - return 0; - } - if (!strcmp(var, "core.bigfilethreshold")) { big_file_threshold = git_config_ulong(var, value, ctx->kvi); return 0; } - if (!strcmp(var, "core.packedgitlimit")) { - packed_git_limit = git_config_ulong(var, value, ctx->kvi); - return 0; - } - if (!strcmp(var, "core.autocrlf")) { if (value && !strcasecmp(value, "input")) { auto_crlf = AUTO_CRLF_INPUT; diff --git a/environment.c b/environment.c index 8e5022c282..8389a27270 100644 --- a/environment.c +++ b/environment.c @@ -49,8 +49,6 @@ int fsync_object_files = -1; int use_fsync = -1; enum fsync_method fsync_method = FSYNC_METHOD_DEFAULT; enum fsync_component fsync_components = FSYNC_COMPONENTS_DEFAULT; -size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE; -size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT; unsigned long big_file_threshold = 512 * 1024 * 1024; char *editor_program; char *askpass_program; diff --git a/packfile.c b/packfile.c index 2ae35dd03f..f626d38071 100644 --- a/packfile.c +++ b/packfile.c @@ -1,4 +1,3 @@ -#define USE_THE_REPOSITORY_VARIABLE #include "git-compat-util.h" #include "environment.h" @@ -27,6 +26,17 @@ #include "config.h" #include "pack-objects.h" +struct packfile_config { + unsigned long packed_git_window_size; + unsigned long packed_git_limit; +}; + +#define PACKFILE_CONFIG_INIT \ +{ \ + .packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE, \ + .packed_git_limit = DEFAULT_PACKED_GIT_LIMIT, \ +} + char *odb_pack_name(struct repository *r, struct strbuf *buf, const unsigned char *hash, const char *ext) { @@ -48,15 +58,41 @@ static size_t pack_mapped; #define SZ_FMT PRIuMAX static inline uintmax_t sz_fmt(size_t s) { return s; } -void pack_report(void) +static int packfile_config(const char *var, const char *value, + const struct config_context *ctx, void *cb) +{ + struct packfile_config *config = cb; + + if (!strcmp(var, "core.packedgitwindowsize")) { + int pgsz_x2 = getpagesize() * 2; + config->packed_git_window_size = git_config_ulong(var, value, ctx->kvi); + + /* This value must be multiple of (pagesize * 2) */ + config->packed_git_window_size /= pgsz_x2; + if (config->packed_git_window_size < 1) + config->packed_git_window_size = 1; + config->packed_git_window_size *= pgsz_x2; + return 0; + } else if (!strcmp(var, "core.packedgitlimit")) { + config->packed_git_limit = git_config_ulong(var, value, ctx->kvi); + return 0; + } else { + return git_default_config(var, value, ctx, cb); + } +} + +void pack_report(struct repository *repo) { + struct packfile_config config = PACKFILE_CONFIG_INIT; + repo_config(repo, packfile_config, &config); + fprintf(stderr, "pack_report: getpagesize() = %10" SZ_FMT "\n" "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n" "pack_report: core.packedGitLimit = %10" SZ_FMT "\n", sz_fmt(getpagesize()), - sz_fmt(packed_git_window_size), - sz_fmt(packed_git_limit)); + sz_fmt(config.packed_git_window_size), + sz_fmt(config.packed_git_limit)); fprintf(stderr, "pack_report: pack_used_ctr = %10u\n" "pack_report: pack_mmap_calls = %10u\n" @@ -652,20 +688,25 @@ unsigned char *use_pack(struct packed_git *p, break; } if (!win) { - size_t window_align = packed_git_window_size / 2; + struct packfile_config config = PACKFILE_CONFIG_INIT; + size_t window_align; off_t len; + repo_config(p->repo, packfile_config, &config); + window_align = config.packed_git_window_size / 2; + if (p->pack_fd == -1 && open_packed_git(p)) die("packfile %s cannot be accessed", p->pack_name); CALLOC_ARRAY(win, 1); win->offset = (offset / window_align) * window_align; len = p->pack_size - win->offset; - if (len > packed_git_window_size) - len = packed_git_window_size; + if (len > config.packed_git_window_size) + len = config.packed_git_window_size; win->len = (size_t)len; pack_mapped += win->len; - while (packed_git_limit < pack_mapped + + while (config.packed_git_limit < pack_mapped && unuse_one_window(p)) ; /* nothing */ win->base = xmmap_gently(NULL, win->len, diff --git a/packfile.h b/packfile.h index addb95b0c4..58104fa009 100644 --- a/packfile.h +++ b/packfile.h @@ -89,7 +89,7 @@ unsigned long repo_approximate_object_count(struct repository *r); struct packed_git *find_oid_pack(const struct object_id *oid, struct packed_git *packs); -void pack_report(void); +void pack_report(struct repository *repo); /* * mmap the index file for the specified packfile (if it is not
The variables `packed_git_window_size` and `packed_git_limit` are global config variables used in the `packfile.c` file. Since it is only used in this file, let's change it from being a global config variable to a local variable for the subsystem. We do this by introducing a new local `packfile_config` struct in `packfile.c` and also adding the required function to parse the said config. We then use this within `packfile.c` to obtain the variables. With this, we rid `packfile.c` from all global variable usage and this means we can also remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- builtin/fast-import.c | 4 +-- config.c | 17 ------------- environment.c | 2 -- packfile.c | 57 +++++++++++++++++++++++++++++++++++++------ packfile.h | 2 +- 5 files changed, 52 insertions(+), 30 deletions(-)