Message ID | 20250415-pks-split-object-file-v3-0-6aa7db7ad7b0@pks.im (mailing list archive) |
---|---|
Headers | show |
Series | Split up "object-file.c" | expand |
On Tue, Apr 15, 2025 at 2:38 AM Patrick Steinhardt <ps@pks.im> wrote: > > Hi, > > "object-file.c" is quite a grab-bag of all kinds of different functions. > Many of these functions aren't really a good fit though and should be > owned by a different subsystem. This patch series tries to split up > concerns a bit better by splitting out this functionality into other > files: > > - `safe_create_leading_directories()` is moved into "dir.c". > - `xmmap()` is moved into "wrapper.c". > - `git_open_cloexec()` is moved into "compat/open.c". > - Several functions attached to `struct index_state` are moved into > "read-cache.c". > - Several functions related to `struct object_store` are moved into a > new file "object-store.c". > > "object-file.c" now mostly contains logic to read and write loose object > files, whereas "object-store.c" contains the higher-level logic to > manage different object directories for a repository. Eventually, these > will become the loose object backend as well as the `struct ref_store` > equivalent for objects, respectively. > > The series is built on top of 9d22ac51228 (The third batch, 2025-04-07) > with ps/object-wo-the-repository at 9442b1c919a (Merge remote-tracking > branch 'junio/ps/object-wo-the-repository' into HEAD, 2025-04-08) merged > into it. > > Changes in v2: > - Fix a grammar issue in one of the commit messages. > - Link to v1: https://lore.kernel.org/r/20250408-pks-split-object-file-v1-0-f1fd50191143@pks.im > > Changes in v3: > - Rename `mkdir_in_gitdir()` to `safe_create_dir_in_gitdir()` to match > naming of similar functions. > - Move `safe_create_leading_directories()` et al into "path.c" instead > of into "dir.c". This also requires us to start injecting a repo via > parameters as "path.c" doesn't have `the_repository` available > anymore. > - Drop the commit that moves `index_blob_stream()` and related > functions. > - Expand the reasoning why we want to have cached objects per object > store instead of globally. > - Drop `index_blob_stream()`, which is a trivial wrapper around > `index_blob_bulk_checkin()`. > - Link to v2: https://lore.kernel.org/r/20250411-pks-split-object-file-v2-0-2bea0c9033ae@pks.im v3 also addressed my feedback on v1 & v2. [...] > Range-diff versus v2: I read over the range-diff and the three new patches (1, 2, & 7); this round looks good to me. I particularly like the extended rationale in the commit message for what is now patch 9.
On Tue, Apr 15, 2025 at 11:41:34PM -0700, Elijah Newren wrote: > On Tue, Apr 15, 2025 at 2:38 AM Patrick Steinhardt <ps@pks.im> wrote: > > Changes in v3: > > - Rename `mkdir_in_gitdir()` to `safe_create_dir_in_gitdir()` to match > > naming of similar functions. > > - Move `safe_create_leading_directories()` et al into "path.c" instead > > of into "dir.c". This also requires us to start injecting a repo via > > parameters as "path.c" doesn't have `the_repository` available > > anymore. > > - Drop the commit that moves `index_blob_stream()` and related > > functions. > > - Expand the reasoning why we want to have cached objects per object > > store instead of globally. > > - Drop `index_blob_stream()`, which is a trivial wrapper around > > `index_blob_bulk_checkin()`. > > - Link to v2: https://lore.kernel.org/r/20250411-pks-split-object-file-v2-0-2bea0c9033ae@pks.im > > v3 also addressed my feedback on v1 & v2. > > [...] > > Range-diff versus v2: > > I read over the range-diff and the three new patches (1, 2, & 7); this > round looks good to me. I particularly like the extended rationale in > the commit message for what is now patch 9. Thanks for your review! Patrick
Elijah Newren <newren@gmail.com> writes: > I read over the range-diff and the three new patches (1, 2, & 7); this > round looks good to me. I particularly like the extended rationale in > the commit message for what is now patch 9. Thanks, both, this round looked good to me, too.
Hi, "object-file.c" is quite a grab-bag of all kinds of different functions. Many of these functions aren't really a good fit though and should be owned by a different subsystem. This patch series tries to split up concerns a bit better by splitting out this functionality into other files: - `safe_create_leading_directories()` is moved into "dir.c". - `xmmap()` is moved into "wrapper.c". - `git_open_cloexec()` is moved into "compat/open.c". - Several functions attached to `struct index_state` are moved into "read-cache.c". - Several functions related to `struct object_store` are moved into a new file "object-store.c". "object-file.c" now mostly contains logic to read and write loose object files, whereas "object-store.c" contains the higher-level logic to manage different object directories for a repository. Eventually, these will become the loose object backend as well as the `struct ref_store` equivalent for objects, respectively. The series is built on top of 9d22ac51228 (The third batch, 2025-04-07) with ps/object-wo-the-repository at 9442b1c919a (Merge remote-tracking branch 'junio/ps/object-wo-the-repository' into HEAD, 2025-04-08) merged into it. Changes in v2: - Fix a grammar issue in one of the commit messages. - Link to v1: https://lore.kernel.org/r/20250408-pks-split-object-file-v1-0-f1fd50191143@pks.im Changes in v3: - Rename `mkdir_in_gitdir()` to `safe_create_dir_in_gitdir()` to match naming of similar functions. - Move `safe_create_leading_directories()` et al into "path.c" instead of into "dir.c". This also requires us to start injecting a repo via parameters as "path.c" doesn't have `the_repository` available anymore. - Drop the commit that moves `index_blob_stream()` and related functions. - Expand the reasoning why we want to have cached objects per object store instead of globally. - Drop `index_blob_stream()`, which is a trivial wrapper around `index_blob_bulk_checkin()`. - Link to v2: https://lore.kernel.org/r/20250411-pks-split-object-file-v2-0-2bea0c9033ae@pks.im Thanks! Patrick --- Patrick Steinhardt (10): object-file: move `mkdir_in_gitdir()` into "path.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `xmmap()` into "wrapper.c" object-file: split out functions relating to object store subsystem object-file: split up concerns of `HASH_*` flags object-file: drop `index_blob_stream()` object: split out functions relating to object store subsystem object-store: remove global array of cached objects object-store: merge "object-store-ll.h" and "object-store.h" Makefile | 3 +- apply.c | 2 +- archive-tar.c | 2 +- archive-zip.c | 2 +- archive.c | 2 +- attr.c | 2 +- bisect.c | 2 +- blame.c | 4 +- builtin/backfill.c | 2 +- builtin/blame.c | 2 +- builtin/bugreport.c | 4 +- builtin/cat-file.c | 2 +- builtin/checkout.c | 3 +- builtin/clone.c | 6 +- builtin/commit-graph.c | 2 +- builtin/commit-tree.c | 2 +- builtin/count-objects.c | 2 +- builtin/credential-cache--daemon.c | 4 +- builtin/describe.c | 2 +- builtin/diagnose.c | 4 +- builtin/difftool.c | 31 +- builtin/fast-export.c | 2 +- builtin/fast-import.c | 4 +- builtin/fetch.c | 2 +- builtin/fsck.c | 4 +- builtin/gc.c | 9 +- builtin/grep.c | 2 +- builtin/hash-object.c | 25 +- builtin/index-pack.c | 2 +- builtin/init-db.c | 3 +- builtin/log.c | 6 +- builtin/ls-tree.c | 2 +- builtin/merge-file.c | 1 + builtin/merge-tree.c | 2 +- builtin/mktag.c | 2 +- builtin/mktree.c | 3 +- builtin/multi-pack-index.c | 2 +- builtin/mv.c | 3 +- builtin/notes.c | 3 +- builtin/pack-objects.c | 2 +- builtin/pack-redundant.c | 2 +- builtin/prune.c | 2 +- builtin/rebase.c | 3 +- builtin/receive-pack.c | 3 +- builtin/remote.c | 2 +- builtin/repack.c | 2 +- builtin/replace.c | 4 +- builtin/rev-list.c | 2 +- builtin/show-ref.c | 2 +- builtin/sparse-checkout.c | 5 +- builtin/submodule--helper.c | 6 +- builtin/tag.c | 3 +- builtin/unpack-file.c | 3 +- builtin/unpack-objects.c | 3 +- builtin/update-index.c | 2 +- builtin/worktree.c | 8 +- bulk-checkin.c | 8 +- bulk-checkin.h | 15 + bundle-uri.c | 2 +- bundle.c | 2 +- cache-tree.c | 4 +- combine-diff.c | 2 +- commit-graph.c | 5 +- commit-graph.h | 2 +- commit.c | 3 +- compat/open.c | 29 + config.c | 2 +- connected.c | 2 +- convert.c | 2 +- diagnose.c | 2 +- diff.c | 2 +- diffcore-rename.c | 2 +- dir.c | 7 +- entry.c | 2 +- fetch-pack.c | 2 +- fmt-merge-msg.c | 2 +- fsck.c | 2 +- git-compat-util.h | 3 + grep.c | 2 +- http-backend.c | 2 +- http-push.c | 3 +- http-walker.c | 2 +- http.c | 2 +- list-objects-filter.c | 2 +- list-objects.c | 2 +- log-tree.c | 2 +- mailmap.c | 2 +- match-trees.c | 3 +- merge-blobs.c | 2 +- merge-ort.c | 3 +- merge-recursive.c | 6 +- meson.build | 2 + midx-write.c | 2 +- midx.c | 1 - notes-cache.c | 3 +- notes-merge.c | 8 +- notes.c | 3 +- object-file.c | 1220 +----------------------------------- object-file.h | 121 ++-- object-name.c | 2 +- object-store-ll.h | 556 ---------------- object-store.c | 1050 +++++++++++++++++++++++++++++++ object-store.h | 516 ++++++++++++++- object.c | 67 -- oss-fuzz/fuzz-pack-idx.c | 2 +- pack-bitmap-write.c | 2 +- pack-bitmap.c | 3 +- pack-check.c | 2 +- pack-mtimes.c | 3 +- pack-objects.h | 2 +- pack-revindex.c | 3 +- packfile.c | 2 +- path.c | 111 +++- path.h | 45 ++ promisor-remote.c | 2 +- protocol-caps.c | 2 +- prune-packed.c | 2 +- reachable.c | 2 +- read-cache.c | 6 +- ref-filter.c | 2 +- reflog.c | 2 +- refs.c | 2 +- refs/files-backend.c | 4 +- remote.c | 2 +- replace-object.c | 2 +- replace-object.h | 2 +- repository.c | 2 +- rerere.c | 7 +- revision.c | 2 +- send-pack.c | 2 +- sequencer.c | 6 +- server-info.c | 4 +- shallow.c | 2 +- streaming.c | 2 +- submodule-config.c | 2 +- submodule.c | 4 +- t/helper/test-pack-mtimes.c | 2 +- t/helper/test-partial-clone.c | 2 +- t/helper/test-read-graph.c | 2 +- t/helper/test-read-midx.c | 2 +- t/helper/test-ref-store.c | 2 +- tag.c | 2 +- tmp-objdir.c | 2 +- tree-walk.c | 2 +- tree.c | 2 +- unpack-trees.c | 2 +- upload-pack.c | 2 +- walker.c | 2 +- wrapper.c | 48 ++ xdiff-interface.c | 2 +- 150 files changed, 2147 insertions(+), 2067 deletions(-) Range-diff versus v2: 1: ed337338970 < -: ----------- object-file: move `safe_create_leading_directories()` into "dir.c" -: ----------- > 1: 8d838f92936 object-file: move `mkdir_in_gitdir()` into "path.c" -: ----------- > 2: c7c723db86c object-file: move `safe_create_leading_directories()` into "path.c" 2: 48730ada01e = 3: 2b4db7b2090 object-file: move `git_open_cloexec()` to "compat/open.c" 3: a9436d9e4a6 = 4: 7e8b750d652 object-file: move `xmmap()` into "wrapper.c" 4: 628d0bcce6c ! 5: 3b90e8a5841 object-file: split out functions relating to object store subsystem @@ object-file.c: static int get_conv_flags(unsigned flags) return 0; } -- -int odb_mkstemp(struct strbuf *temp_filename, const char *pattern) -{ - int fd; @@ object-file.c: static int get_conv_flags(unsigned flags) - /* slow path */ - /* some mkstemp implementations erase temp_filename on failure */ - repo_git_path_replace(the_repository, temp_filename, "objects/%s", pattern); -- safe_create_leading_directories(temp_filename->buf); +- safe_create_leading_directories(the_repository, temp_filename->buf); - return xmkstemp_mode(temp_filename->buf, mode); -} - @@ object-file.c: static int get_conv_flags(unsigned flags) - return fd; - - /* slow path */ -- safe_create_leading_directories_const(name); +- safe_create_leading_directories_const(the_repository, name); - return open(name, O_RDWR|O_CREAT|O_EXCL, 0600); -} - @@ object-store.c (new) + /* slow path */ + /* some mkstemp implementations erase temp_filename on failure */ + repo_git_path_replace(the_repository, temp_filename, "objects/%s", pattern); -+ safe_create_leading_directories(temp_filename->buf); ++ safe_create_leading_directories(the_repository, temp_filename->buf); + return xmkstemp_mode(temp_filename->buf, mode); +} + @@ object-store.c (new) + return fd; + + /* slow path */ -+ safe_create_leading_directories_const(name); ++ safe_create_leading_directories_const(the_repository, name); + return open(name, O_RDWR|O_CREAT|O_EXCL, 0600); +} + 5: fd102d9e50a = 6: 6bc5d4811d3 object-file: split up concerns of `HASH_*` flags 6: 298a867897d < -: ----------- object-file: split out functions relating to index subsystem -: ----------- > 7: c1e323f1673 object-file: drop `index_blob_stream()` 7: 53e2bd9d117 = 8: 6b46541fe2a object: split out functions relating to object store subsystem 8: 00b4a52b8ce ! 9: 59df5f22b9d object-store: remove global array of cached objects @@ Commit message object-store: remove global array of cached objects Cached objects are virtual objects that can be set up without writing - anything into the object store directly. This mechanism for example - allows us to create fake commits in git-blame(1). + anything into the object store directly, which is used by git-blame(1) + to create fake commits for the working tree. - The cached objects are stored in a global variable. Refactor the code so - that we instead store the array as part of the raw object store. This is - another step into the direction of libifying our object database. + These cached objects are stored in a global variable, which is another + roadblock for libification of the object subsystem. Refactor the code so + that we instead store the array as part of the raw object store. + + This refactoring raises the question whether virtual objects should + really be specific to a single repository (or rather a single object + store). Hypothetical usecases might for example span across submodules, + and here it may or may not be the right thing to provide virtual objects + across submodule boundaries. + + The only existing usecase is git-blame(1) though, which does not know to + blame across submodule boundaries in the first place. As such, storing + these objects both globally and per-repository would achieve the same + result right now. But arguably, if we learned to blame across submodule + boundaries, we would likely want to create separate fare working tree + commits for each of the submodules so that the user can learn which + worktree a specific uncommitted change belongs to. And even if we would + want to create the same fake commit for each of the submodules we could + do that when storing separate virtual objects per object store. + + While this is all rather hypothetical, the takeaway is that handling + virtual objects per-object store gives us more flexibility compared to + storing them globally. In a hypothetical future where we have achieved + full libification one might be able to handle unrelated repositories in + a single process, where the state of one repository should not have an + impact on the state of another repository. As such, storing these cached + objects per object store will enable more usecases and should lead to + less surprising outcomes overall. Signed-off-by: Patrick Steinhardt <ps@pks.im> 9: 6365dc7cd16 ! 10: df6a749245f object-store: merge "object-store-ll.h" and "object-store.h" @@ builtin/replace.c #include "object-name.h" -#include "object-store-ll.h" +#include "object-store.h" - #include "read-cache.h" #include "replace-object.h" #include "tag.h" + #include "wildmatch.h" ## builtin/rev-list.c ## @@ @@ notes-merge.c -#include "object-store-ll.h" +#include "object-store.h" #include "path.h" - #include "read-cache.h" #include "repository.h" + #include "diff.h" ## object-file.c ## @@ - #include "git-compat-util.h" #include "bulk-checkin.h" + #include "convert.h" +#include "dir.h" #include "environment.h" + #include "fsck.h" #include "gettext.h" - #include "hex.h" ## object-name.c ## @@ --- base-commit: 9442b1c919af9aed513eb0a484fe96358a500cf5 change-id: 20250408-pks-split-object-file-c61d7cd2a21f