[v8,00/10] packfile: avoid using the 'the_repository' global variable

Message ID cover.1732269840.git.karthik.188@gmail.com (mailing list archive)

karthik nayak Nov. 22, 2024, 10:08 a.m. UTC
The `packfile.c` file uses the global variable 'the_repository' extensively
throughout the code. Let's remove all uses of it by modifying the required
functions to accept a 'struct repository' instead. This cleans up usage of
global state.

The first 3 patches are mostly internal to `packfile.c`: we add a repository
field to the `packed_git` struct and use it to remove some usages of the
global variable.
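
As a rough sketch of the idea (not the actual git definitions: the structs
below are trimmed-down stand-ins and `pack_gitdir()` is a hypothetical helper),
a pack that carries its own repository pointer lets code that already holds a
`packed_git` stop reaching for the global:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in; the real struct in git has many more fields. */
struct repository {
	const char *gitdir;
};

/* Simplified stand-in for the pack descriptor. */
struct packed_git {
	/* repo denotes the repository this packfile belongs to */
	struct repository *repo;
	const char *pack_name;
};

/*
 * Hypothetical helper: everything it needs is reachable from the pack
 * itself, so no global repository variable is consulted.
 */
const char *pack_gitdir(const struct packed_git *p)
{
	assert(p != NULL && p->repo != NULL);
	return p->repo->gitdir;
}
```

Two packs that belong to different repositories then resolve to different
git directories without any shared state.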

The next 3 patches are more disruptive: they change the signatures of
`odb_pack_name`, `has_object[_kept]_pack` and `for_each_packed_object` to receive
a repository, which helps remove other usages of the 'the_repository' variable.

Finally, the next two patches deal with global config values and turn them into
non-global variables. The last patch removes an unnecessary call to
`prepare_packed_git()`.

For v5 onwards, I've rebased the series on top of master: 8f8d6eee53 (The
seventh batch, 2024-11-01), since 'jk/dumb-http-finalize', a dependency of this
series, was merged to master. I've found no conflicts while merging with seen &
next, but since this series touches multiple files, there could be future
conflicts.

Changes in v8:
- Fix typos in comments
- For packfile.c, use delta_base_cache_limit from the repository
settings; this avoids loading the config in hot paths.
- Rename `longval` to `ulongval` to better signify the type.

Changes in v7:
- Clean up stale commit message.
- Add missing space in `if` statement.
- Fix typo s/incase/in case/.

Changes in v6:
- Lazy load repository settings in packfile.c. This ensures that the settings are
always available and we do not rely on callees setting them.
- Use `size_t` for `delta_base_cache_limit`.
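
The lazy loading mentioned above can be pictured with the following minimal
model. It is a sketch under assumptions, not the code in `repo-settings.c`: the
structs are stand-ins, `sketch_config_get_ulong()` is a hypothetical
replacement for the config lookup, and the default value is assumed for
illustration. The point is that the config is consulted at most once per
repository, so hot paths only read a struct field.

```c
#include <assert.h>
#include <stddef.h>

/* Value assumed for illustration only. */
#define DEFAULT_DELTA_BASE_CACHE_LIMIT (96 * 1024 * 1024)

struct repo_settings {
	int initialized;
	size_t delta_base_cache_limit;
};

struct repository {
	struct repo_settings settings;
	/* Hypothetical stand-ins for the config machinery. */
	int has_limit_config;
	unsigned long limit_config;
	int config_reads; /* counts how often the "config" was consulted */
};

/* Stand-in for a config lookup: returns 0 when the key is set. */
int sketch_config_get_ulong(struct repository *r, unsigned long *dest)
{
	r->config_reads++;
	if (!r->has_limit_config)
		return 1;
	*dest = r->limit_config;
	return 0;
}

/* Idempotent: only the first call reads the config. */
void sketch_prepare_repo_settings(struct repository *r)
{
	unsigned long ulongval;

	assert(r != NULL);
	if (r->settings.initialized)
		return;
	r->settings.initialized = 1;
	r->settings.delta_base_cache_limit = DEFAULT_DELTA_BASE_CACHE_LIMIT;
	if (!sketch_config_get_ulong(r, &ulongval))
		r->settings.delta_base_cache_limit = ulongval;
}
```

Calling `sketch_prepare_repo_settings()` at the top of a hot function is then
cheap after the first call, which is the property the v6 and v8 notes rely on.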

Changes in v5:
- Move packed_git* settings to repo_settings to ensure we don't keep reparsing the
settings in `use_pack`.
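
When `core.packedgitwindowsize` is read into the repository settings, the
value is normalised to a multiple of twice the page size (see the
`prepare_repo_settings()` hunk in the range-diff). A standalone sketch of that
arithmetic follows; the helper name is hypothetical, and the page size is
passed in as a parameter rather than obtained from `getpagesize()` so the
example is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a configured window size down to a multiple of (pagesize * 2),
 * but never below one such unit.
 */
size_t sketch_round_window_size(unsigned long val, int pagesize)
{
	unsigned long pgsz_x2;

	assert(pagesize > 0);
	pgsz_x2 = (unsigned long)pagesize * 2;

	val /= pgsz_x2;
	if (val < 1)
		val = 1;
	return (size_t)(val * pgsz_x2);
}
```

With a 4096-byte page, a configured value of 20000 becomes 16384, and anything
below 8192 is raised to 8192.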

Changes in v4:
- Renamed the repository field within `packed_git` and `multi_pack_index` from
`r` to `repo`, while keeping function parameters named `r`.
- Fixed bad braces.

Changes in v3:
- Improved commit messages. The first commit now talks about how the packed_git
struct could also be part of the alternates of a repository. The 7th commit
now talks about the motive behind removing the global variable.
- Changed 'packed_git->repo' to 'packed_git->r' to keep it consistent with the
rest of the code base.
- Replaced 'the_repository' with locally available access to the repository
struct in multiple regions.
- Removed unnecessary inclusion of the 'repository.h' header file by forward
declaring the 'repository' struct.
- Replace memcpy with hashcpy.
- Change the logic in the 7th patch to use if else statements.
- Added an extra commit to clean up `pack-bitmap.c`.

Karthik Nayak (9):
  packfile: add repository to struct `packed_git`
  packfile: use `repository` from `packed_git` directly
  packfile: pass `repository` to static function in the file
  packfile: pass down repository to `odb_pack_name`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `for_each_packed_object`
  config: make `delta_base_cache_limit` a non-global variable
  config: make `packed_git_(limit|window_size)` non-global variables
  midx: add repository to `multi_pack_index` struct

Taylor Blau (1):
  packfile.c: remove unnecessary prepare_packed_git() call

 builtin/cat-file.c       |   7 +-
 builtin/count-objects.c  |   2 +-
 builtin/fast-import.c    |  15 ++--
 builtin/fsck.c           |  20 +++---
 builtin/gc.c             |   8 ++-
 builtin/index-pack.c     |  20 ++++--
 builtin/pack-objects.c   |  11 +--
 builtin/pack-redundant.c |   2 +-
 builtin/repack.c         |   2 +-
 builtin/rev-list.c       |   2 +-
 commit-graph.c           |   4 +-
 config.c                 |  22 ------
 connected.c              |   3 +-
 diff.c                   |   3 +-
 environment.c            |   3 -
 environment.h            |   1 -
 fsck.c                   |   2 +-
 http.c                   |   4 +-
 list-objects.c           |   7 +-
 midx-write.c             |   2 +-
 midx.c                   |   3 +-
 midx.h                   |   3 +
 object-store-ll.h        |   9 ++-
 pack-bitmap.c            |  90 ++++++++++++++----------
 pack-objects.h           |   3 +-
 pack-write.c             |   1 +
 pack.h                   |   2 +
 packfile.c               | 144 ++++++++++++++++++++++-----------------
 packfile.h               |  18 +++--
 promisor-remote.c        |   2 +-
 prune-packed.c           |   2 +-
 reachable.c              |   4 +-
 repo-settings.c          |  18 +++++
 repo-settings.h          |   7 ++
 revision.c               |  13 ++--
 tag.c                    |   2 +-
 36 files changed, 271 insertions(+), 190 deletions(-)

Range-diff against v7:
 1:  6c00e25c86 !  1:  d1fdd6996a packfile: add repository to struct `packed_git`
    @@ object-store-ll.h: struct packed_git {
      	const uint32_t *mtimes_map;
      	size_t mtimes_size;
     +
    -+	/* repo dentoes the repository this packed file belongs to */
    ++	/* repo denotes the repository this packfile belongs to */
     +	struct repository *repo;
     +
      	/* something like ".git/objects/pack/xxxxx.pack" */
 2:  70fc8a79af =  2:  65c09858ce packfile: use `repository` from `packed_git` directly
 3:  167a1f3a11 =  3:  80632934d1 packfile: pass `repository` to static function in the file
 4:  b7cfe78217 =  4:  67d71eab83 packfile: pass down repository to `odb_pack_name`
 5:  5566f5554c =  5:  ee210fa153 packfile: pass down repository to `has_object[_kept]_pack`
 6:  1b26e45a9b =  6:  8db7094f4e packfile: pass down repository to `for_each_packed_object`
 7:  1bdc34f4d8 !  7:  d1b6e8801b config: make `delta_base_cache_limit` a non-global variable
    @@ Commit message
     
         The `delta_base_cache_limit` variable is a global config variable used
         by multiple subsystems. Let's make this non-global, by adding this
    -    variable to the stack of each of the subsystems where it is used.
    +    variable independently to the subsystems where it is used.
     
    -    In `gc.c` we add it to the `gc_config` struct and also the constructor
    -    function. In `index-pack.c` we add it to the `pack_idx_option` struct
    -    and its constructor. Finally, in `packfile.c` we dynamically retrieve
    -    this value from the repository config, since the value is only used once
    -    in the entire subsystem.
    +    First, add the setting to the `repo_settings` struct, this provides
    +    access to the config in places where the repository is available. Use
    +    this in `packfile.c`.
    +
    +    In `index-pack.c` we add it to the `pack_idx_option` struct and its
    +    constructor. While the repository struct is available here, it may not
    +    be set  because `git index-pack` can be used without a repository.
    +
    +    In `gc.c` add it to the `gc_config` struct and also the constructor
    +    function. The gc functions currently do not have direct access to a
    +    repository struct.
     
         These changes are made to remove the usage of `delta_base_cache_limit`
         as a global variable in `packfile.c`. This brings us one step closer to
    @@ pack.h: struct pack_idx_option {
      	 */
      	int anomaly_alloc, anomaly_nr;
      	uint32_t *anomaly;
    -+	unsigned long delta_base_cache_limit;
    ++
    ++	size_t delta_base_cache_limit;
      };
      
      void reset_pack_idx_option(struct pack_idx_option *);
     
      ## packfile.c ##
    -@@
    - #include "commit-graph.h"
    - #include "pack-revindex.h"
    - #include "promisor-remote.h"
    -+#include "config.h"
    -+#include "pack-objects.h"
    - 
    - char *odb_pack_name(struct repository *r, struct strbuf *buf,
    - 		    const unsigned char *hash, const char *ext)
     @@ packfile.c: void clear_delta_base_cache(void)
      }
      
    @@ packfile.c: void clear_delta_base_cache(void)
      	struct delta_base_cache_entry *ent;
      	struct list_head *lru, *tmp;
     @@ packfile.c: void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
    - 	struct unpack_entry_stack_ent *delta_stack = small_delta_stack;
      	int delta_stack_nr = 0, delta_stack_alloc = UNPACK_ENTRY_STACK_PREALLOC;
      	int base_from_cache = 0;
    -+	unsigned long delta_base_cache_limit = DEFAULT_DELTA_BASE_CACHE_LIMIT;
    -+
    -+	repo_config_get_ulong(r, "core.deltabasecachelimit", &delta_base_cache_limit);
      
    ++	prepare_repo_settings(p->repo);
    ++
      	write_pack_access_log(p, obj_offset);
      
    + 	/* PHASE 1: drill down to the innermost base object */
     @@ packfile.c: void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
      		 * before we are done using it.
      		 */
      		if (!external_base)
     -			add_delta_base_cache(p, base_obj_offset, base, base_size, type);
    -+			add_delta_base_cache(p, base_obj_offset, base,
    -+					     base_size, delta_base_cache_limit,
    ++			add_delta_base_cache(p, base_obj_offset, base, base_size,
    ++					     p->repo->settings.delta_base_cache_limit,
     +					     type);
      
      		free(delta_data);
      		free(external_base);
    +
    + ## repo-settings.c ##
    +@@
    + #include "repo-settings.h"
    + #include "repository.h"
    + #include "midx.h"
    ++#include "pack-objects.h"
    + 
    + static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
    + 			  int def)
    +@@ repo-settings.c: void prepare_repo_settings(struct repository *r)
    + 	const char *strval;
    + 	int manyfiles;
    + 	int read_changed_paths;
    ++	unsigned long ulongval;
    + 
    + 	if (!r->gitdir)
    + 		BUG("Cannot add settings for uninitialized repository");
    +@@ repo-settings.c: void prepare_repo_settings(struct repository *r)
    + 	 * removed.
    + 	 */
    + 	r->settings.command_requires_full_index = 1;
    ++
    ++	if (!repo_config_get_ulong(r, "core.deltabasecachelimit", &ulongval))
    ++		r->settings.delta_base_cache_limit = ulongval;
    + }
    + 
    + enum log_refs_config repo_settings_get_log_all_ref_updates(struct repository *repo)
    +
    + ## repo-settings.h ##
    +@@ repo-settings.h: struct repo_settings {
    + 
    + 	int core_multi_pack_index;
    + 	int warn_ambiguous_refs; /* lazily loaded via accessor */
    ++
    ++	size_t delta_base_cache_limit;
    + };
    + #define REPO_SETTINGS_INIT { \
    + 	.index_version = -1, \
    + 	.core_untracked_cache = UNTRACKED_CACHE_KEEP, \
    + 	.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE, \
    + 	.warn_ambiguous_refs = -1, \
    ++	.delta_base_cache_limit = DEFAULT_DELTA_BASE_CACHE_LIMIT, \
    + }
    + 
    + void prepare_repo_settings(struct repository *r);
 8:  7b6baa89ac !  8:  30a52f192f config: make `packed_git_(limit|window_size)` non-global variables
    @@ packfile.h: unsigned long repo_approximate_object_count(struct repository *r);
     
      ## repo-settings.c ##
     @@ repo-settings.c: void prepare_repo_settings(struct repository *r)
    - 	const char *strval;
    - 	int manyfiles;
    - 	int read_changed_paths;
    -+	unsigned long longval;
      
    - 	if (!r->gitdir)
    - 		BUG("Cannot add settings for uninitialized repository");
    -@@ repo-settings.c: void prepare_repo_settings(struct repository *r)
    - 	 * removed.
    - 	 */
    - 	r->settings.command_requires_full_index = 1;
    + 	if (!repo_config_get_ulong(r, "core.deltabasecachelimit", &ulongval))
    + 		r->settings.delta_base_cache_limit = ulongval;
     +
    -+	if (!repo_config_get_ulong(r, "core.packedgitwindowsize", &longval)) {
    ++	if (!repo_config_get_ulong(r, "core.packedgitwindowsize", &ulongval)) {
     +		int pgsz_x2 = getpagesize() * 2;
     +
     +		/* This value must be multiple of (pagesize * 2) */
    -+		longval /= pgsz_x2;
    -+		if (longval < 1)
    -+			longval = 1;
    -+		r->settings.packed_git_window_size = longval * pgsz_x2;
    ++		ulongval /= pgsz_x2;
    ++		if (ulongval < 1)
    ++			ulongval = 1;
    ++		r->settings.packed_git_window_size = ulongval * pgsz_x2;
     +	}
     +
    -+	if (!repo_config_get_ulong(r, "core.packedgitlimit", &longval))
    -+		r->settings.packed_git_limit = longval;
    ++	if (!repo_config_get_ulong(r, "core.packedgitlimit", &ulongval))
    ++		r->settings.packed_git_limit = ulongval;
      }
      
      enum log_refs_config repo_settings_get_log_all_ref_updates(struct repository *repo)
     
      ## repo-settings.h ##
     @@ repo-settings.h: struct repo_settings {
    - 
    - 	int core_multi_pack_index;
      	int warn_ambiguous_refs; /* lazily loaded via accessor */
    -+
    + 
    + 	size_t delta_base_cache_limit;
     +	size_t packed_git_window_size;
     +	size_t packed_git_limit;
      };
      #define REPO_SETTINGS_INIT { \
      	.index_version = -1, \
    - 	.core_untracked_cache = UNTRACKED_CACHE_KEEP, \
    +@@ repo-settings.h: struct repo_settings {
      	.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE, \
      	.warn_ambiguous_refs = -1, \
    + 	.delta_base_cache_limit = DEFAULT_DELTA_BASE_CACHE_LIMIT, \
     +	.packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE, \
     +	.packed_git_limit = DEFAULT_PACKED_GIT_LIMIT, \
      }
 9:  a3667d87ec =  9:  2fe5d2506f midx: add repository to `multi_pack_index` struct
 -:  ---------- > 10:  05989c2e27 packfile.c: remove unnecessary prepare_packed_git() call