Message ID | 913ce1b3df9cf273f1572c256dffad1cacc192a6.1648097906.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects | expand |
"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes: > +static void end_odb_transaction_if_active(void) > +{ > + if (!odb_transaction_active) > + return; > + > + end_odb_transaction(); > + odb_transaction_active = 0; > +} > __attribute__((format (printf, 1, 2))) > static void report(const char *fmt, ...) > { > @@ -57,6 +68,16 @@ static void report(const char *fmt, ...) > if (!verbose) > return; > > + /* > + * It is possible, though unlikely, that a caller > + * could use the verbose output to synchronize with > + * addition of objects to the object database, so > + * unplug bulk checkin to make sure that future objects > + * are immediately visible. > + */ > + > + end_odb_transaction_if_active(); > + > va_start(vp, fmt); > vprintf(fmt, vp); > putchar('\n'); > @@ -1116,6 +1137,13 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) > */ > parse_options_start(&ctx, argc, argv, prefix, > options, PARSE_OPT_STOP_AT_NON_OPTION); > + > + /* > + * Allow the object layer to optimize adding multiple objects in > + * a batch. > + */ > + begin_odb_transaction(); > + odb_transaction_active = 1; This looks strange. Shouldn't begin/end pair be responsible for knowing if there is a transaction active already? For that matter, didn't the original unplug in plug/unplug pair automatically turned into no-op when it is already unplugged? IOW, I am not sure end_if_active() should exist in the first place. Shouldn't end_transaction() do that instead?
On Thu, Mar 24, 2022 at 11:18 AM Junio C Hamano <gitster@pobox.com> wrote: > > "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > +static void end_odb_transaction_if_active(void) > > +{ > > + if (!odb_transaction_active) > > + return; > > + > > + end_odb_transaction(); > > + odb_transaction_active = 0; > > +} > > > __attribute__((format (printf, 1, 2))) > > static void report(const char *fmt, ...) > > { > > @@ -57,6 +68,16 @@ static void report(const char *fmt, ...) > > if (!verbose) > > return; > > > > + /* > > + * It is possible, though unlikely, that a caller > > + * could use the verbose output to synchronize with > > + * addition of objects to the object database, so > > + * unplug bulk checkin to make sure that future objects > > + * are immediately visible. > > + */ > > + > > + end_odb_transaction_if_active(); > > + > > va_start(vp, fmt); > > vprintf(fmt, vp); > > putchar('\n'); > > @@ -1116,6 +1137,13 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) > > */ > > parse_options_start(&ctx, argc, argv, prefix, > > options, PARSE_OPT_STOP_AT_NON_OPTION); > > + > > + /* > > + * Allow the object layer to optimize adding multiple objects in > > + * a batch. > > + */ > > + begin_odb_transaction(); > > + odb_transaction_active = 1; > > This looks strange. Shouldn't begin/end pair be responsible for > knowing if there is a transaction active already? For that matter, > didn't the original unplug in plug/unplug pair automatically turned > into no-op when it is already unplugged? > > IOW, I am not sure end_if_active() should exist in the first place. > Shouldn't end_transaction() do that instead? > Today there's an "assert(bulk_checkin_plugged)" in end_odb_transaction. In principle we could just drop the assert and allow a transaction to be ended multiple times. But maybe in the long run for composability we'd like to have nested callers to begin/end transaction (e.g. we could have a nested transaction around writing the cache tree to the ODB to minimize fsyncs there). In that world, having a subsystem not maintain a balanced pairing could be a problem. An alternative API here could be to have an "flush_odb_transaction" call to make the objects visible at this point. Lastly, I could take your original suggested approach of adding a new flag to update-index. I preferred the unplug-on-verbose approach since it would automatically optimize most callers to update-index that might exist in the wild, without users having to change anything. Thanks, Neeraj
Neeraj Singh <nksingh85@gmail.com> writes: >> IOW, I am not sure end_if_active() should exist in the first place. >> Shouldn't end_transaction() do that instead? >> > > Today there's an "assert(bulk_checkin_plugged)" in > end_odb_transaction. In principle we could just drop the assert and > allow a transaction to be ended multiple times. But maybe in the long > run for composability we'd like to have nested callers to begin/end > transaction (e.g. we could have a nested transaction around writing > the cache tree to the ODB to minimize fsyncs there). I am not convinced that "transaction" is a good mental model for this mechanism to begin with, in the sense that the sense that it is not a bug or failure of the implementation if two or more operations in the same <begin,end> bracket did not happen (or not happen) atomically, or if 'begin' and 'end' were not properly nested. With the design getting more complex with things like tentative object store that needs to be explicitly migrated after the outermost level of end-transaction, we may end up _requiring_ that sufficient number of 'end' must come once we issued 'begin', which I am not sure is necessarily a good thing. In any case, we aspire/envision to have a nested plug/unplug, I think it is a good thing. A helper for one subsystem may have its large batch of operations inside plug/unplug pair, another help may do the same, and the caller of these two helpers may want to say plug call helper A A does plug A does many things A does unplug call helper B B does plug B does many things B does unplug unplug to "cancel" the unplug helper A and B has. > In that world, > having a subsystem not maintain a balanced pairing could be a problem. And in such a world, you never want to have end-if-active to implement what you are doing here, as you may end up being not properly nested: begin begin do many things if some condtion end_if_active do more things end end > An alternative API here could be to have an "flush_odb_transaction" > call to make the objects visible at this point. Yes, what you want is a forced-flush instead, I think. So I suspect you'd want these three primitives, perhaps? * begin increments the nesting level - if outermost, you may have to do real "setup" things - otherwise, you may not have anything other than just counting the nesting level * flush implements unplug, fsync, etc. and does so immediately, even when plugged. * end decrements the nesting level - if outermost, you'd do "flush". - otherwise, you may only count the nesting level and do nothing else, but doing "flush" when you realize that you've queued too many is not a bug or a crime.
On Thu, Mar 24, 2022 at 2:34 PM Junio C Hamano <gitster@pobox.com> wrote: > > Neeraj Singh <nksingh85@gmail.com> writes: > > >> IOW, I am not sure end_if_active() should exist in the first place. > >> Shouldn't end_transaction() do that instead? > >> > > > > Today there's an "assert(bulk_checkin_plugged)" in > > end_odb_transaction. In principle we could just drop the assert and > > allow a transaction to be ended multiple times. But maybe in the long > > run for composability we'd like to have nested callers to begin/end > > transaction (e.g. we could have a nested transaction around writing > > the cache tree to the ODB to minimize fsyncs there). > > I am not convinced that "transaction" is a good mental model for > this mechanism to begin with, in the sense that the sense that it is > not a bug or failure of the implementation if two or more operations > in the same <begin,end> bracket did not happen (or not happen) > atomically, or if 'begin' and 'end' were not properly nested. With > the design getting more complex with things like tentative object > store that needs to be explicitly migrated after the outermost level > of end-transaction, we may end up _requiring_ that sufficient number > of 'end' must come once we issued 'begin', which I am not sure is > necessarily a good thing. I don't love the tentative object store that keeps things invisble, but that was the safest way to maintain the invariant that no loose-object name appears in the ODB without durable contents. I think we want the "durability/ordering boundary" part of database transactions without necessarily needing full abort/commit semantics. As you say, we don't need full atomicity, but we do need ordering to ensure that blobs are durable before trees pointing them, and so on up the merkle chain. The begin/end pairs help us defer the syncs required for ordering to the end rather than pessimistically assuming that every object write is the end. > In any case, we aspire/envision to have a nested plug/unplug, I > think it is a good thing. A helper for one subsystem may have its > large batch of operations inside plug/unplug pair, another help may > do the same, and the caller of these two helpers may want to say > > plug > call helper A > A does plug > A does many things > A does unplug > call helper B > B does plug > B does many things > B does unplug > unplug > > to "cancel" the unplug helper A and B has. > > > In that world, > > having a subsystem not maintain a balanced pairing could be a problem. > > And in such a world, you never want to have end-if-active to > implement what you are doing here, as you may end up being not > properly nested: > > begin > begin > do many things > if some condtion > end_if_active > do more things > end > end > > > An alternative API here could be to have an "flush_odb_transaction" > > call to make the objects visible at this point. > > Yes, what you want is a forced-flush instead, I think. > > So I suspect you'd want these three primitives, perhaps? > > * begin increments the nesting level > - if outermost, you may have to do real "setup" things > - otherwise, you may not have anything other than just counting > the nesting level > > * flush implements unplug, fsync, etc. and does so immediately, > even when plugged. > > * end decrements the nesting level > - if outermost, you'd do "flush". > - otherwise, you may only count the nesting level and do nothing else, > but doing "flush" when you realize that you've queued too many > is not a bug or a crime. > Yes, I'll move in this direction. Thanks for the feedback.
diff --git a/builtin/update-index.c b/builtin/update-index.c index aafe7eeac2a..ae7887cfe37 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -5,6 +5,7 @@ */ #define USE_THE_INDEX_COMPATIBILITY_MACROS #include "cache.h" +#include "bulk-checkin.h" #include "config.h" #include "lockfile.h" #include "quote.h" @@ -32,6 +33,7 @@ static int allow_replace; static int info_only; static int force_remove; static int verbose; +static int odb_transaction_active; static int mark_valid_only; static int mark_skip_worktree_only; static int mark_fsmonitor_only; @@ -49,6 +51,15 @@ enum uc_mode { UC_FORCE }; +static void end_odb_transaction_if_active(void) +{ + if (!odb_transaction_active) + return; + + end_odb_transaction(); + odb_transaction_active = 0; +} + __attribute__((format (printf, 1, 2))) static void report(const char *fmt, ...) { @@ -57,6 +68,16 @@ static void report(const char *fmt, ...) if (!verbose) return; + /* + * It is possible, though unlikely, that a caller + * could use the verbose output to synchronize with + * addition of objects to the object database, so + * unplug bulk checkin to make sure that future objects + * are immediately visible. + */ + + end_odb_transaction_if_active(); + va_start(vp, fmt); vprintf(fmt, vp); putchar('\n'); @@ -1116,6 +1137,13 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) */ parse_options_start(&ctx, argc, argv, prefix, options, PARSE_OPT_STOP_AT_NON_OPTION); + + /* + * Allow the object layer to optimize adding multiple objects in + * a batch. + */ + begin_odb_transaction(); + odb_transaction_active = 1; while (ctx.argc) { if (parseopt_state != PARSE_OPT_DONE) parseopt_state = parse_options_step(&ctx, options, @@ -1190,6 +1218,11 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) strbuf_release(&buf); } + /* + * By now we have added all of the new objects + */ + end_odb_transaction_if_active(); + if (split_index > 0) { if (git_config_get_split_index() == 0) warning(_("core.splitIndex is set to false; "