mbox series

[v3,00/10] Split up "object-file.c"

Message ID 20250415-pks-split-object-file-v3-0-6aa7db7ad7b0@pks.im (mailing list archive)
Headers show
Series Split up "object-file.c" | expand

Message

Patrick Steinhardt April 15, 2025, 9:38 a.m. UTC
Hi,

"object-file.c" is quite a grab-bag of all kinds of different functions.
Many of these functions aren't really a good fit though and should be
owned by a different subsystem. This patch series tries to split up
concerns a bit better by splitting out this functionality into other
files:

  - `safe_create_leading_directories()` is moved into "dir.c".
  - `xmmap()` is moved into "wrapper.c".
  - `git_open_cloexec()` is moved into "compat/open.c".
  - Several functions attached to `struct index_state` are moved into
    "read-cache.c".
  - Several functions related to `struct object_store` are moved into a
    new file "object-store.c".

"object-file.c" now mostly contains logic to read and write loose object
files, whereas "object-store.c" contains the higher-level logic to
manage different object directories for a repository. Eventually, these
will become the loose object backend as well as the `struct ref_store`
equivalent for objects, respectively.

The series is built on top of 9d22ac51228 (The third batch, 2025-04-07)
with ps/object-wo-the-repository at 9442b1c919a (Merge remote-tracking
branch 'junio/ps/object-wo-the-repository' into HEAD, 2025-04-08) merged
into it.

Changes in v2:
  - Fix a grammar issue in one of the commit messages.
  - Link to v1: https://lore.kernel.org/r/20250408-pks-split-object-file-v1-0-f1fd50191143@pks.im

Changes in v3:
  - Rename `mkdir_in_gitdir()` to `safe_create_dir_in_gitdir()` to match
    naming of similar functions.
  - Move `safe_create_leading_directories()` et al into "path.c" instead
    of into "dir.c". This also requires us to start injecting a repo via
    parameters as "path.c" doesn't have `the_repository` available
    anymore.
  - Drop the commit that moves `index_blob_stream()` and related
    functions.
  - Expand the reasoning why we want to have cached objects per object
    store instead of globally.
  - Drop `index_blob_stream()`, which is a trivial wrapper around
    `index_blob_bulk_checkin()`.
  - Link to v2: https://lore.kernel.org/r/20250411-pks-split-object-file-v2-0-2bea0c9033ae@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (10):
      object-file: move `mkdir_in_gitdir()` into "path.c"
      object-file: move `safe_create_leading_directories()` into "path.c"
      object-file: move `git_open_cloexec()` to "compat/open.c"
      object-file: move `xmmap()` into "wrapper.c"
      object-file: split out functions relating to object store subsystem
      object-file: split up concerns of `HASH_*` flags
      object-file: drop `index_blob_stream()`
      object: split out functions relating to object store subsystem
      object-store: remove global array of cached objects
      object-store: merge "object-store-ll.h" and "object-store.h"

 Makefile                           |    3 +-
 apply.c                            |    2 +-
 archive-tar.c                      |    2 +-
 archive-zip.c                      |    2 +-
 archive.c                          |    2 +-
 attr.c                             |    2 +-
 bisect.c                           |    2 +-
 blame.c                            |    4 +-
 builtin/backfill.c                 |    2 +-
 builtin/blame.c                    |    2 +-
 builtin/bugreport.c                |    4 +-
 builtin/cat-file.c                 |    2 +-
 builtin/checkout.c                 |    3 +-
 builtin/clone.c                    |    6 +-
 builtin/commit-graph.c             |    2 +-
 builtin/commit-tree.c              |    2 +-
 builtin/count-objects.c            |    2 +-
 builtin/credential-cache--daemon.c |    4 +-
 builtin/describe.c                 |    2 +-
 builtin/diagnose.c                 |    4 +-
 builtin/difftool.c                 |   31 +-
 builtin/fast-export.c              |    2 +-
 builtin/fast-import.c              |    4 +-
 builtin/fetch.c                    |    2 +-
 builtin/fsck.c                     |    4 +-
 builtin/gc.c                       |    9 +-
 builtin/grep.c                     |    2 +-
 builtin/hash-object.c              |   25 +-
 builtin/index-pack.c               |    2 +-
 builtin/init-db.c                  |    3 +-
 builtin/log.c                      |    6 +-
 builtin/ls-tree.c                  |    2 +-
 builtin/merge-file.c               |    1 +
 builtin/merge-tree.c               |    2 +-
 builtin/mktag.c                    |    2 +-
 builtin/mktree.c                   |    3 +-
 builtin/multi-pack-index.c         |    2 +-
 builtin/mv.c                       |    3 +-
 builtin/notes.c                    |    3 +-
 builtin/pack-objects.c             |    2 +-
 builtin/pack-redundant.c           |    2 +-
 builtin/prune.c                    |    2 +-
 builtin/rebase.c                   |    3 +-
 builtin/receive-pack.c             |    3 +-
 builtin/remote.c                   |    2 +-
 builtin/repack.c                   |    2 +-
 builtin/replace.c                  |    4 +-
 builtin/rev-list.c                 |    2 +-
 builtin/show-ref.c                 |    2 +-
 builtin/sparse-checkout.c          |    5 +-
 builtin/submodule--helper.c        |    6 +-
 builtin/tag.c                      |    3 +-
 builtin/unpack-file.c              |    3 +-
 builtin/unpack-objects.c           |    3 +-
 builtin/update-index.c             |    2 +-
 builtin/worktree.c                 |    8 +-
 bulk-checkin.c                     |    8 +-
 bulk-checkin.h                     |   15 +
 bundle-uri.c                       |    2 +-
 bundle.c                           |    2 +-
 cache-tree.c                       |    4 +-
 combine-diff.c                     |    2 +-
 commit-graph.c                     |    5 +-
 commit-graph.h                     |    2 +-
 commit.c                           |    3 +-
 compat/open.c                      |   29 +
 config.c                           |    2 +-
 connected.c                        |    2 +-
 convert.c                          |    2 +-
 diagnose.c                         |    2 +-
 diff.c                             |    2 +-
 diffcore-rename.c                  |    2 +-
 dir.c                              |    7 +-
 entry.c                            |    2 +-
 fetch-pack.c                       |    2 +-
 fmt-merge-msg.c                    |    2 +-
 fsck.c                             |    2 +-
 git-compat-util.h                  |    3 +
 grep.c                             |    2 +-
 http-backend.c                     |    2 +-
 http-push.c                        |    3 +-
 http-walker.c                      |    2 +-
 http.c                             |    2 +-
 list-objects-filter.c              |    2 +-
 list-objects.c                     |    2 +-
 log-tree.c                         |    2 +-
 mailmap.c                          |    2 +-
 match-trees.c                      |    3 +-
 merge-blobs.c                      |    2 +-
 merge-ort.c                        |    3 +-
 merge-recursive.c                  |    6 +-
 meson.build                        |    2 +
 midx-write.c                       |    2 +-
 midx.c                             |    1 -
 notes-cache.c                      |    3 +-
 notes-merge.c                      |    8 +-
 notes.c                            |    3 +-
 object-file.c                      | 1220 +-----------------------------------
 object-file.h                      |  121 ++--
 object-name.c                      |    2 +-
 object-store-ll.h                  |  556 ----------------
 object-store.c                     | 1050 +++++++++++++++++++++++++++++++
 object-store.h                     |  516 ++++++++++++++-
 object.c                           |   67 --
 oss-fuzz/fuzz-pack-idx.c           |    2 +-
 pack-bitmap-write.c                |    2 +-
 pack-bitmap.c                      |    3 +-
 pack-check.c                       |    2 +-
 pack-mtimes.c                      |    3 +-
 pack-objects.h                     |    2 +-
 pack-revindex.c                    |    3 +-
 packfile.c                         |    2 +-
 path.c                             |  111 +++-
 path.h                             |   45 ++
 promisor-remote.c                  |    2 +-
 protocol-caps.c                    |    2 +-
 prune-packed.c                     |    2 +-
 reachable.c                        |    2 +-
 read-cache.c                       |    6 +-
 ref-filter.c                       |    2 +-
 reflog.c                           |    2 +-
 refs.c                             |    2 +-
 refs/files-backend.c               |    4 +-
 remote.c                           |    2 +-
 replace-object.c                   |    2 +-
 replace-object.h                   |    2 +-
 repository.c                       |    2 +-
 rerere.c                           |    7 +-
 revision.c                         |    2 +-
 send-pack.c                        |    2 +-
 sequencer.c                        |    6 +-
 server-info.c                      |    4 +-
 shallow.c                          |    2 +-
 streaming.c                        |    2 +-
 submodule-config.c                 |    2 +-
 submodule.c                        |    4 +-
 t/helper/test-pack-mtimes.c        |    2 +-
 t/helper/test-partial-clone.c      |    2 +-
 t/helper/test-read-graph.c         |    2 +-
 t/helper/test-read-midx.c          |    2 +-
 t/helper/test-ref-store.c          |    2 +-
 tag.c                              |    2 +-
 tmp-objdir.c                       |    2 +-
 tree-walk.c                        |    2 +-
 tree.c                             |    2 +-
 unpack-trees.c                     |    2 +-
 upload-pack.c                      |    2 +-
 walker.c                           |    2 +-
 wrapper.c                          |   48 ++
 xdiff-interface.c                  |    2 +-
 150 files changed, 2147 insertions(+), 2067 deletions(-)

Range-diff versus v2:

 1:  ed337338970 <  -:  ----------- object-file: move `safe_create_leading_directories()` into "dir.c"
 -:  ----------- >  1:  8d838f92936 object-file: move `mkdir_in_gitdir()` into "path.c"
 -:  ----------- >  2:  c7c723db86c object-file: move `safe_create_leading_directories()` into "path.c"
 2:  48730ada01e =  3:  2b4db7b2090 object-file: move `git_open_cloexec()` to "compat/open.c"
 3:  a9436d9e4a6 =  4:  7e8b750d652 object-file: move `xmmap()` into "wrapper.c"
 4:  628d0bcce6c !  5:  3b90e8a5841 object-file: split out functions relating to object store subsystem
    @@ object-file.c: static int get_conv_flags(unsigned flags)
      		return 0;
      }
      
    --
     -int odb_mkstemp(struct strbuf *temp_filename, const char *pattern)
     -{
     -	int fd;
    @@ object-file.c: static int get_conv_flags(unsigned flags)
     -	/* slow path */
     -	/* some mkstemp implementations erase temp_filename on failure */
     -	repo_git_path_replace(the_repository, temp_filename, "objects/%s", pattern);
    --	safe_create_leading_directories(temp_filename->buf);
    +-	safe_create_leading_directories(the_repository, temp_filename->buf);
     -	return xmkstemp_mode(temp_filename->buf, mode);
     -}
     -
    @@ object-file.c: static int get_conv_flags(unsigned flags)
     -		return fd;
     -
     -	/* slow path */
    --	safe_create_leading_directories_const(name);
    +-	safe_create_leading_directories_const(the_repository, name);
     -	return open(name, O_RDWR|O_CREAT|O_EXCL, 0600);
     -}
     -
    @@ object-store.c (new)
     +	/* slow path */
     +	/* some mkstemp implementations erase temp_filename on failure */
     +	repo_git_path_replace(the_repository, temp_filename, "objects/%s", pattern);
    -+	safe_create_leading_directories(temp_filename->buf);
    ++	safe_create_leading_directories(the_repository, temp_filename->buf);
     +	return xmkstemp_mode(temp_filename->buf, mode);
     +}
     +
    @@ object-store.c (new)
     +		return fd;
     +
     +	/* slow path */
    -+	safe_create_leading_directories_const(name);
    ++	safe_create_leading_directories_const(the_repository, name);
     +	return open(name, O_RDWR|O_CREAT|O_EXCL, 0600);
     +}
     +
 5:  fd102d9e50a =  6:  6bc5d4811d3 object-file: split up concerns of `HASH_*` flags
 6:  298a867897d <  -:  ----------- object-file: split out functions relating to index subsystem
 -:  ----------- >  7:  c1e323f1673 object-file: drop `index_blob_stream()`
 7:  53e2bd9d117 =  8:  6b46541fe2a object: split out functions relating to object store subsystem
 8:  00b4a52b8ce !  9:  59df5f22b9d object-store: remove global array of cached objects
    @@ Commit message
         object-store: remove global array of cached objects
     
         Cached objects are virtual objects that can be set up without writing
    -    anything into the object store directly. This mechanism for example
    -    allows us to create fake commits in git-blame(1).
    +    anything into the object store directly, which is used by git-blame(1)
    +    to create fake commits for the working tree.
     
    -    The cached objects are stored in a global variable. Refactor the code so
    -    that we instead store the array as part of the raw object store. This is
    -    another step into the direction of libifying our object database.
    +    These cached objects are stored in a global variable, which is another
    +    roadblock for libification of the object subsystem. Refactor the code so
    +    that we instead store the array as part of the raw object store.
    +
    +    This refactoring raises the question whether virtual objects should
    +    really be specific to a single repository (or rather a single object
    +    store). Hypothetical usecases might for example span across submodules,
    +    and here it may or may not be the right thing to provide virtual objects
    +    across submodule boundaries.
    +
    +    The only existing usecase is git-blame(1) though, which does not know to
    +    blame across submodule boundaries in the first place. As such, storing
    +    these objects both globally and per-repository would achieve the same
    +    result right now. But arguably, if we learned to blame across submodule
    +    boundaries, we would likely want to create separate fare working tree
    +    commits for each of the submodules so that the user can learn which
    +    worktree a specific uncommitted change belongs to. And even if we would
    +    want to create the same fake commit for each of the submodules we could
    +    do that when storing separate virtual objects per object store.
    +
    +    While this is all rather hypothetical, the takeaway is that handling
    +    virtual objects per-object store gives us more flexibility compared to
    +    storing them globally. In a hypothetical future where we have achieved
    +    full libification one might be able to handle unrelated repositories in
    +    a single process, where the state of one repository should not have an
    +    impact on the state of another repository. As such, storing these cached
    +    objects per object store will enable more usecases and should lead to
    +    less surprising outcomes overall.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
 9:  6365dc7cd16 ! 10:  df6a749245f object-store: merge "object-store-ll.h" and "object-store.h"
    @@ builtin/replace.c
      #include "object-name.h"
     -#include "object-store-ll.h"
     +#include "object-store.h"
    - #include "read-cache.h"
      #include "replace-object.h"
      #include "tag.h"
    + #include "wildmatch.h"
     
      ## builtin/rev-list.c ##
     @@
    @@ notes-merge.c
     -#include "object-store-ll.h"
     +#include "object-store.h"
      #include "path.h"
    - #include "read-cache.h"
      #include "repository.h"
    + #include "diff.h"
     
      ## object-file.c ##
     @@
    - 
      #include "git-compat-util.h"
      #include "bulk-checkin.h"
    + #include "convert.h"
     +#include "dir.h"
      #include "environment.h"
    + #include "fsck.h"
      #include "gettext.h"
    - #include "hex.h"
     
      ## object-name.c ##
     @@

---
base-commit: 9442b1c919af9aed513eb0a484fe96358a500cf5
change-id: 20250408-pks-split-object-file-c61d7cd2a21f

Comments

Elijah Newren April 16, 2025, 6:41 a.m. UTC | #1
On Tue, Apr 15, 2025 at 2:38 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> Hi,
>
> "object-file.c" is quite a grab-bag of all kinds of different functions.
> Many of these functions aren't really a good fit though and should be
> owned by a different subsystem. This patch series tries to split up
> concerns a bit better by splitting out this functionality into other
> files:
>
>   - `safe_create_leading_directories()` is moved into "dir.c".
>   - `xmmap()` is moved into "wrapper.c".
>   - `git_open_cloexec()` is moved into "compat/open.c".
>   - Several functions attached to `struct index_state` are moved into
>     "read-cache.c".
>   - Several functions related to `struct object_store` are moved into a
>     new file "object-store.c".
>
> "object-file.c" now mostly contains logic to read and write loose object
> files, whereas "object-store.c" contains the higher-level logic to
> manage different object directories for a repository. Eventually, these
> will become the loose object backend as well as the `struct ref_store`
> equivalent for objects, respectively.
>
> The series is built on top of 9d22ac51228 (The third batch, 2025-04-07)
> with ps/object-wo-the-repository at 9442b1c919a (Merge remote-tracking
> branch 'junio/ps/object-wo-the-repository' into HEAD, 2025-04-08) merged
> into it.
>
> Changes in v2:
>   - Fix a grammar issue in one of the commit messages.
>   - Link to v1: https://lore.kernel.org/r/20250408-pks-split-object-file-v1-0-f1fd50191143@pks.im
>
> Changes in v3:
>   - Rename `mkdir_in_gitdir()` to `safe_create_dir_in_gitdir()` to match
>     naming of similar functions.
>   - Move `safe_create_leading_directories()` et al into "path.c" instead
>     of into "dir.c". This also requires us to start injecting a repo via
>     parameters as "path.c" doesn't have `the_repository` available
>     anymore.
>   - Drop the commit that moves `index_blob_stream()` and related
>     functions.
>   - Expand the reasoning why we want to have cached objects per object
>     store instead of globally.
>   - Drop `index_blob_stream()`, which is a trivial wrapper around
>     `index_blob_bulk_checkin()`.
>   - Link to v2: https://lore.kernel.org/r/20250411-pks-split-object-file-v2-0-2bea0c9033ae@pks.im

v3 also addressed my feedback on v1 & v2.

[...]
> Range-diff versus v2:

I read over the range-diff and the three new patches (1, 2, & 7); this
round looks good to me.  I particularly like the extended rationale in
the commit message for what is now patch 9.
Patrick Steinhardt April 16, 2025, 7:44 a.m. UTC | #2
On Tue, Apr 15, 2025 at 11:41:34PM -0700, Elijah Newren wrote:
> On Tue, Apr 15, 2025 at 2:38 AM Patrick Steinhardt <ps@pks.im> wrote:
> > Changes in v3:
> >   - Rename `mkdir_in_gitdir()` to `safe_create_dir_in_gitdir()` to match
> >     naming of similar functions.
> >   - Move `safe_create_leading_directories()` et al into "path.c" instead
> >     of into "dir.c". This also requires us to start injecting a repo via
> >     parameters as "path.c" doesn't have `the_repository` available
> >     anymore.
> >   - Drop the commit that moves `index_blob_stream()` and related
> >     functions.
> >   - Expand the reasoning why we want to have cached objects per object
> >     store instead of globally.
> >   - Drop `index_blob_stream()`, which is a trivial wrapper around
> >     `index_blob_bulk_checkin()`.
> >   - Link to v2: https://lore.kernel.org/r/20250411-pks-split-object-file-v2-0-2bea0c9033ae@pks.im
> 
> v3 also addressed my feedback on v1 & v2.
> 
> [...]
> > Range-diff versus v2:
> 
> I read over the range-diff and the three new patches (1, 2, & 7); this
> round looks good to me.  I particularly like the extended rationale in
> the commit message for what is now patch 9.

Thanks for your review!

Patrick
Junio C Hamano April 16, 2025, 4:21 p.m. UTC | #3
Elijah Newren <newren@gmail.com> writes:

> I read over the range-diff and the three new patches (1, 2, & 7); this
> round looks good to me.  I particularly like the extended rationale in
> the commit message for what is now patch 9.

Thanks, both, this round looked good to me, too.