Message ID | d045b13795b38caa27f8e25340212f736b66bb05.1648616734.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects | expand |
"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Neeraj Singh <neerajsi@microsoft.com> > > Make it clearer in the naming and documentation of the plug_bulk_checkin > and unplug_bulk_checkin APIs that they can be thought of as > a "transaction" to optimize operations on the object database. These > transactions may be nested so that subsystems like the cache-tree > writing code can optimize their operations without caring whether the > top-level code has a transaction active. I can see that "checkin" part of the name is too limiting (you may want to do more than optimize checkin, e.g. fsync), and that you may prefer "begin/end" over "plug/unplug", but I am not sure if we want to limit ourselves to "odb". If we find our code doing things on many instances of something that are not objects (e.g. refs, config variables), don't we want to give them the same chance to be optimized by batching them? {begin,end}_bulk_transaction perhaps? I dunno.
On Wed, Mar 30, 2022 at 10:17 AM Junio C Hamano <gitster@pobox.com> wrote: > > "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Neeraj Singh <neerajsi@microsoft.com> > > > > Make it clearer in the naming and documentation of the plug_bulk_checkin > > and unplug_bulk_checkin APIs that they can be thought of as > > a "transaction" to optimize operations on the object database. These > > transactions may be nested so that subsystems like the cache-tree > > writing code can optimize their operations without caring whether the > > top-level code has a transaction active. > > I can see that "checkin" part of the name is too limiting (you may > want to do more than optimize checkin, e.g. fsync), and that you may > prefer "begin/end" over "plug/unplug", but I am not sure if we want > to limit ourselves to "odb". If we find our code doing things on > many instances of something that are not objects (e.g. refs, config > variables), don't we want to give them the same chance to be optimized > by batching them? > > {begin,end}_bulk_transaction perhaps? I dunno. At least in the current code where the implementation of each 'database table' (odb, refs-db, config, index) is pretty separate, it seems better to keep bulk-checkin.c and its plugging scoped to the ODB. Patrick's (who I previously misnamed as 'Peter') older patch at https://lore.kernel.org/git/d9aa96913b1730f1d0c238d7d52e27c20bc55390.1636544377.git.ps@pks.im/ showed a pretty nice and concise implementation for refs tied to the ref-transaction infrastructure. Thanks, Neeraj
diff --git a/builtin/add.c b/builtin/add.c index 3ffb86a4338..9bf37ceae8e 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -670,7 +670,7 @@ int cmd_add(int argc, const char **argv, const char *prefix) string_list_clear(&only_match_skip_worktree, 0); } - plug_bulk_checkin(); + begin_odb_transaction(); if (add_renormalize) exit_status |= renormalize_tracked_files(&pathspec, flags); @@ -682,7 +682,7 @@ int cmd_add(int argc, const char **argv, const char *prefix) if (chmod_arg && pathspec.nr) exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only); - unplug_bulk_checkin(); + end_odb_transaction(); finish: if (write_locked_index(&the_index, &lock_file, diff --git a/bulk-checkin.c b/bulk-checkin.c index 577b135e39c..8b0fd5c7723 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -10,7 +10,7 @@ #include "packfile.h" #include "object-store.h" -static int bulk_checkin_plugged; +static int odb_transaction_nesting; static struct bulk_checkin_state { char *pack_tmp_name; @@ -280,21 +280,25 @@ int index_bulk_checkin(struct object_id *oid, { int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type, path, flags); - if (!bulk_checkin_plugged) + if (!odb_transaction_nesting) finish_bulk_checkin(&bulk_checkin_state); return status; } -void plug_bulk_checkin(void) +void begin_odb_transaction(void) { - assert(!bulk_checkin_plugged); - bulk_checkin_plugged = 1; + odb_transaction_nesting += 1; } -void unplug_bulk_checkin(void) +void end_odb_transaction(void) { - assert(bulk_checkin_plugged); - bulk_checkin_plugged = 0; + odb_transaction_nesting -= 1; + if (odb_transaction_nesting < 0) + BUG("Unbalanced ODB transaction nesting"); + + if (odb_transaction_nesting) + return; + if (bulk_checkin_state.f) finish_bulk_checkin(&bulk_checkin_state); } diff --git a/bulk-checkin.h b/bulk-checkin.h index b26f3dc3b74..69a94422ac7 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -10,7 +10,17 @@ int index_bulk_checkin(struct object_id *oid, int fd, size_t size, enum object_type type, const char *path, unsigned flags); -void plug_bulk_checkin(void); -void unplug_bulk_checkin(void); +/* + * Tell the object database to optimize for adding + * multiple objects. end_odb_transaction must be called + * to make new objects visible. + */ +void begin_odb_transaction(void); + +/* + * Tell the object database to make any objects from the + * current transaction visible. + */ +void end_odb_transaction(void); #endif