mbox series

[v5,0/9] packfile: avoid using the 'the_repository' global variable

Message ID cover.1730714298.git.karthik.188@gmail.com (mailing list archive)
Headers show
Series packfile: avoid using the 'the_repository' global variable | expand

Message

karthik nayak Nov. 4, 2024, 11:41 a.m. UTC
The `packfile.c` file uses the global variable 'the_repository' extensively
throughout the code. Let's remove all usecases of this, by modifying the
required functions to accept a 'struct repository' instead. This is to clean up
usage of global state.

The first 3 patches are mostly internal to `packfile.c`, we add the repository
field to the `packed_git` struct and this is used to clear up some useages of
the global variables.

The next 3 patches are more disruptive, they modify the function definition of
`odb_pack_name`, `has_object[_kept]_pack` and `for_each_packed_object` to receive
a repository, helping remove other usages of 'the_repository' variable.

Finally, the last two patches deal with global config values. These values are
localized.

For v5, I've rebased the series off the new master: 8f8d6eee53 (The seventh
batch, 2024-11-01), as a dependency for this series 'jk/dumb-http-finalize' was
merged to master. I've found no conflicts while merging with seen & next. But
since this series does touch multiple files, there could be future conflicts.

Karthik Nayak (9):
  packfile: add repository to struct `packed_git`
  packfile: use `repository` from `packed_git` directly
  packfile: pass `repository` to static function in the file
  packfile: pass down repository to `odb_pack_name`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `for_each_packed_object`
  config: make `delta_base_cache_limit` a non-global variable
  config: make `packed_git_(limit|window_size)` non-global variables
  midx: add repository to `multi_pack_index` struct

 builtin/cat-file.c       |   7 +-
 builtin/count-objects.c  |   2 +-
 builtin/fast-import.c    |  15 ++--
 builtin/fsck.c           |  20 +++---
 builtin/gc.c             |   5 +-
 builtin/index-pack.c     |  20 ++++--
 builtin/pack-objects.c   |  11 +--
 builtin/pack-redundant.c |   2 +-
 builtin/repack.c         |   2 +-
 builtin/rev-list.c       |   2 +-
 commit-graph.c           |   4 +-
 config.c                 |  22 ------
 connected.c              |   3 +-
 diff.c                   |   3 +-
 environment.c            |   3 -
 environment.h            |   1 -
 fsck.c                   |   2 +-
 http.c                   |   4 +-
 list-objects.c           |   7 +-
 midx-write.c             |   2 +-
 midx.c                   |   3 +-
 midx.h                   |   3 +
 object-store-ll.h        |   9 ++-
 pack-bitmap.c            |  90 ++++++++++++++----------
 pack-objects.h           |   3 +-
 pack-write.c             |   1 +
 pack.h                   |   1 +
 packfile.c               | 144 ++++++++++++++++++++++-----------------
 packfile.h               |  18 +++--
 promisor-remote.c        |   2 +-
 prune-packed.c           |   2 +-
 reachable.c              |   4 +-
 repo-settings.c          |  14 ++++
 repo-settings.h          |   5 ++
 revision.c               |  13 ++--
 tag.c                    |   2 +-
 36 files changed, 261 insertions(+), 190 deletions(-)

Range-diff against v4:
 1:  b3d518e998 =  1:  6c00e25c86 packfile: add repository to struct `packed_git`
 2:  bb9d9aa744 =  2:  70fc8a79af packfile: use `repository` from `packed_git` directly
 3:  d5df50fa36 =  3:  167a1f3a11 packfile: pass `repository` to static function in the file
 4:  0107801c3b =  4:  b7cfe78217 packfile: pass down repository to `odb_pack_name`
 5:  2d7608a367 =  5:  5566f5554c packfile: pass down repository to `has_object[_kept]_pack`
 6:  2c84026d02 =  6:  1b26e45a9b packfile: pass down repository to `for_each_packed_object`
 7:  84b89c8a0e !  7:  7654bf5e7e config: make `delta_base_cache_limit` a non-global variable
    @@ Commit message
         this value from the repository config, since the value is only used once
         in the entire subsystem.
     
    +    The type of the value is changed from `size_t` to an `unsigned long`
    +    since the default value is small enough to fit inside the latter on all
    +    platforms.
    +
         These changes are made to remove the usage of `delta_base_cache_limit`
         as a global variable in `packfile.c`. This brings us one step closer to
         removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c`
 8:  5bbdc7124d !  8:  2730aacd8e config: make `packed_git_(limit|window_size)` non-global variables
    @@ packfile.c
      
      #include "git-compat-util.h"
      #include "environment.h"
    -@@
    - #include "config.h"
    - #include "pack-objects.h"
    - 
    -+struct packfile_config {
    -+	unsigned long packed_git_window_size;
    -+	unsigned long packed_git_limit;
    -+};
    -+
    -+#define PACKFILE_CONFIG_INIT \
    -+{ \
    -+	.packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE, \
    -+	.packed_git_limit = DEFAULT_PACKED_GIT_LIMIT, \
    -+}
    -+
    - char *odb_pack_name(struct repository *r, struct strbuf *buf,
    - 		    const unsigned char *hash, const char *ext)
    - {
     @@ packfile.c: static size_t pack_mapped;
      #define SZ_FMT PRIuMAX
      static inline uintmax_t sz_fmt(size_t s) { return s; }
      
     -void pack_report(void)
    -+static int packfile_config(const char *var, const char *value,
    -+			   const struct config_context *ctx, void *cb)
    -+{
    -+	struct packfile_config *config = cb;
    -+
    -+	if (!strcmp(var, "core.packedgitwindowsize")) {
    -+		int pgsz_x2 = getpagesize() * 2;
    -+		config->packed_git_window_size = git_config_ulong(var, value, ctx->kvi);
    -+
    -+		/* This value must be multiple of (pagesize * 2) */
    -+		config->packed_git_window_size /= pgsz_x2;
    -+		if (config->packed_git_window_size < 1)
    -+			config->packed_git_window_size = 1;
    -+		config->packed_git_window_size *= pgsz_x2;
    -+		return 0;
    -+	} else if (!strcmp(var, "core.packedgitlimit")) {
    -+		config->packed_git_limit = git_config_ulong(var, value, ctx->kvi);
    -+		return 0;
    -+	} else {
    -+		return git_default_config(var, value, ctx, cb);
    -+	}
    -+}
    -+
     +void pack_report(struct repository *repo)
      {
    -+	struct packfile_config config = PACKFILE_CONFIG_INIT;
    -+	repo_config(repo, packfile_config, &config);
    -+
      	fprintf(stderr,
      		"pack_report: getpagesize()            = %10" SZ_FMT "\n"
      		"pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
    @@ packfile.c: static size_t pack_mapped;
      		sz_fmt(getpagesize()),
     -		sz_fmt(packed_git_window_size),
     -		sz_fmt(packed_git_limit));
    -+		sz_fmt(config.packed_git_window_size),
    -+		sz_fmt(config.packed_git_limit));
    ++		sz_fmt(repo->settings.packed_git_window_size),
    ++		sz_fmt(repo->settings.packed_git_limit));
      	fprintf(stderr,
      		"pack_report: pack_used_ctr            = %10u\n"
      		"pack_report: pack_mmap_calls          = %10u\n"
    @@ packfile.c: unsigned char *use_pack(struct packed_git *p,
      		}
      		if (!win) {
     -			size_t window_align = packed_git_window_size / 2;
    -+			struct packfile_config config = PACKFILE_CONFIG_INIT;
     +			size_t window_align;
      			off_t len;
    - 
    -+			repo_config(p->repo, packfile_config, &config);
    -+			window_align = config.packed_git_window_size / 2;
    ++			struct repo_settings *settings = &p->repo->settings;
     +
    ++			window_align = settings->packed_git_window_size / 2;
    + 
      			if (p->pack_fd == -1 && open_packed_git(p))
      				die("packfile %s cannot be accessed", p->pack_name);
    - 
    +@@ packfile.c: unsigned char *use_pack(struct packed_git *p,
      			CALLOC_ARRAY(win, 1);
      			win->offset = (offset / window_align) * window_align;
      			len = p->pack_size - win->offset;
     -			if (len > packed_git_window_size)
     -				len = packed_git_window_size;
    -+			if (len > config.packed_git_window_size)
    -+				len = config.packed_git_window_size;
    ++			if (len > settings->packed_git_window_size)
    ++				len = settings->packed_git_window_size;
      			win->len = (size_t)len;
      			pack_mapped += win->len;
     -			while (packed_git_limit < pack_mapped
     +
    -+			while (config.packed_git_limit < pack_mapped
    ++			while (settings->packed_git_limit < pack_mapped
      				&& unuse_one_window(p))
      				; /* nothing */
      			win->base = xmmap_gently(NULL, win->len,
    @@ packfile.h: unsigned long repo_approximate_object_count(struct repository *r);
      
      /*
       * mmap the index file for the specified packfile (if it is not
    +
    + ## repo-settings.c ##
    +@@ repo-settings.c: void prepare_repo_settings(struct repository *r)
    + 	const char *strval;
    + 	int manyfiles;
    + 	int read_changed_paths;
    ++	unsigned long longval;
    + 
    + 	if (!r->gitdir)
    + 		BUG("Cannot add settings for uninitialized repository");
    +@@ repo-settings.c: void prepare_repo_settings(struct repository *r)
    + 	 * removed.
    + 	 */
    + 	r->settings.command_requires_full_index = 1;
    ++
    ++	if (!repo_config_get_ulong(r, "core.packedgitwindowsize", &longval)) {
    ++		int pgsz_x2 = getpagesize() * 2;
    ++
    ++		/* This value must be multiple of (pagesize * 2) */
    ++		longval /= pgsz_x2;
    ++		if (longval < 1)
    ++			longval = 1;
    ++		r->settings.packed_git_window_size = longval * pgsz_x2;
    ++	}
    ++
    ++	if (!repo_config_get_ulong(r, "core.packedgitlimit", &longval))
    ++		r->settings.packed_git_limit = longval;
    + }
    + 
    + enum log_refs_config repo_settings_get_log_all_ref_updates(struct repository *repo)
    +
    + ## repo-settings.h ##
    +@@ repo-settings.h: struct repo_settings {
    + 
    + 	int core_multi_pack_index;
    + 	int warn_ambiguous_refs; /* lazily loaded via accessor */
    ++
    ++	size_t packed_git_window_size;
    ++	size_t packed_git_limit;
    + };
    + #define REPO_SETTINGS_INIT { \
    + 	.index_version = -1, \
    + 	.core_untracked_cache = UNTRACKED_CACHE_KEEP, \
    + 	.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE, \
    + 	.warn_ambiguous_refs = -1, \
    ++	.packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE, \
    ++	.packed_git_limit = DEFAULT_PACKED_GIT_LIMIT, \
    + }
    + 
    + void prepare_repo_settings(struct repository *r);
 9:  bb15a0be56 =  9:  8e33d40077 midx: add repository to `multi_pack_index` struct

Comments

Jeff King Nov. 4, 2024, 5:32 p.m. UTC | #1
On Mon, Nov 04, 2024 at 12:41:38PM +0100, Karthik Nayak wrote:

> Range-diff against v4:
>  1:  b3d518e998 =  1:  6c00e25c86 packfile: add repository to struct `packed_git`
>  2:  bb9d9aa744 =  2:  70fc8a79af packfile: use `repository` from `packed_git` directly
>  3:  d5df50fa36 =  3:  167a1f3a11 packfile: pass `repository` to static function in the file
>  4:  0107801c3b =  4:  b7cfe78217 packfile: pass down repository to `odb_pack_name`
>  5:  2d7608a367 =  5:  5566f5554c packfile: pass down repository to `has_object[_kept]_pack`
>  6:  2c84026d02 =  6:  1b26e45a9b packfile: pass down repository to `for_each_packed_object`
>  7:  84b89c8a0e !  7:  7654bf5e7e config: make `delta_base_cache_limit` a non-global variable
>     @@ Commit message
>          this value from the repository config, since the value is only used once
>          in the entire subsystem.
>      
>     +    The type of the value is changed from `size_t` to an `unsigned long`
>     +    since the default value is small enough to fit inside the latter on all
>     +    platforms.
>     +

I think this change is not ideal, for the same reason that the other
type changes were: you can conceivably have a 4GB or larger cache here.
On Windows using "unsigned long" would prevent that. (On most other
systems it is OK either way since "unsigned long" and "size_t" are
generally the same size).

I do think the config parsing should change to use size_t here (like I
mentioned elsewhere in the thread), which would fix it on Windows.
That's outside the scope of your patch, but in the meantime we should
not be making things worse by moving the variable itself to the inferior
type.

-Peff
karthik nayak Nov. 5, 2024, 9:43 a.m. UTC | #2
Jeff King <peff@peff.net> writes:

> On Mon, Nov 04, 2024 at 12:41:38PM +0100, Karthik Nayak wrote:
>
>> Range-diff against v4:
>>  1:  b3d518e998 =  1:  6c00e25c86 packfile: add repository to struct `packed_git`
>>  2:  bb9d9aa744 =  2:  70fc8a79af packfile: use `repository` from `packed_git` directly
>>  3:  d5df50fa36 =  3:  167a1f3a11 packfile: pass `repository` to static function in the file
>>  4:  0107801c3b =  4:  b7cfe78217 packfile: pass down repository to `odb_pack_name`
>>  5:  2d7608a367 =  5:  5566f5554c packfile: pass down repository to `has_object[_kept]_pack`
>>  6:  2c84026d02 =  6:  1b26e45a9b packfile: pass down repository to `for_each_packed_object`
>>  7:  84b89c8a0e !  7:  7654bf5e7e config: make `delta_base_cache_limit` a non-global variable
>>     @@ Commit message
>>          this value from the repository config, since the value is only used once
>>          in the entire subsystem.
>>
>>     +    The type of the value is changed from `size_t` to an `unsigned long`
>>     +    since the default value is small enough to fit inside the latter on all
>>     +    platforms.
>>     +
>
> I think this change is not ideal, for the same reason that the other
> type changes were: you can conceivably have a 4GB or larger cache here.
> On Windows using "unsigned long" would prevent that. (On most other
> systems it is OK either way since "unsigned long" and "size_t" are
> generally the same size).
>
> I do think the config parsing should change to use size_t here (like I
> mentioned elsewhere in the thread), which would fix it on Windows.
> That's outside the scope of your patch, but in the meantime we should
> not be making things worse by moving the variable itself to the inferior
> type.
>

There is a subtle difference though. Both the configs (this and next
commit) although are initialized with 'size_t' they are bounded by
'unsigned long's range, since they do to the functions used to obtain
the information. The only difference being the defaults, which could've
been greater than the range of 'unsigned long'.

While having said that, keeping it 'size_t' makes it easier to know
which configs need to use the yet_to_be introduced 'size_t' variants of
the config parsers. So let me change this too. Thanks.

Karthik