| Message ID | 6662e2dae0f5d65c158fba785d186885f9671073.1647760561.git.gitgitgadget@gmail.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects |
"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> The unpack-objects functionality is used by fetch, push, and fast-import
> to turn the transferred data into object database entries when there are
> fewer objects than the 'unpacklimit' setting.
>
> By enabling bulk-checkin when unpacking objects, we can take advantage
> of batched fsyncs.

This feels confused in that we dispatch to unpack-objects (instead
of index-objects) only when the number of loose objects should not
matter from a performance point of view, and bulk-checkin should
shine from a performance point of view only when there are enough
objects to batch.

Also, if we ever add a "too many small loose objects is wasteful,
let's send them into a single 'batch pack'" optimization, it would
create a funny situation where the caller sends the contents of a
small incoming packfile to unpack-objects, but the command chooses
to bunch them all together in a packfile anyway ;-)

So, I dunno.

> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  builtin/unpack-objects.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index dbeb0680a58..c55b6616aed 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -1,5 +1,6 @@
>  #include "builtin.h"
>  #include "cache.h"
> +#include "bulk-checkin.h"
>  #include "config.h"
>  #include "object-store.h"
>  #include "object.h"
> @@ -503,10 +504,12 @@ static void unpack_all(void)
>  	if (!quiet)
>  		progress = start_progress(_("Unpacking objects"), nr_objects);
>  	CALLOC_ARRAY(obj_list, nr_objects);
> +	plug_bulk_checkin();
>  	for (i = 0; i < nr_objects; i++) {
>  		unpack_one(i);
>  		display_progress(progress, i + 1);
>  	}
> +	unplug_bulk_checkin();
>  	stop_progress(&progress);
>
>  	if (delta_list)
On Mon, Mar 21, 2022 at 10:55 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > The unpack-objects functionality is used by fetch, push, and fast-import
> > to turn the transferred data into object database entries when there are
> > fewer objects than the 'unpacklimit' setting.
> >
> > By enabling bulk-checkin when unpacking objects, we can take advantage
> > of batched fsyncs.
>
> This feels confused in that we dispatch to unpack-objects (instead
> of index-objects) only when the number of loose objects should not
> matter from a performance point of view, and bulk-checkin should
> shine from a performance point of view only when there are enough
> objects to batch.
>
> Also, if we ever add a "too many small loose objects is wasteful,
> let's send them into a single 'batch pack'" optimization, it would
> create a funny situation where the caller sends the contents of a
> small incoming packfile to unpack-objects, but the command chooses
> to bunch them all together in a packfile anyway ;-)
>
> So, I dunno.

I'd be happy to just drop this patch. I originally added it to answer
Avarab's question: how does batch mode compare to packfiles? [1] [2].

[1] https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/
[2] https://lore.kernel.org/git/pull.1076.v5.git.git.1632514331.gitgitgadget@gmail.com/
On Mon, Mar 21, 2022 at 4:02 PM Neeraj Singh <nksingh85@gmail.com> wrote:
>
> On Mon, Mar 21, 2022 at 10:55 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > From: Neeraj Singh <neerajsi@microsoft.com>
> > >
> > > The unpack-objects functionality is used by fetch, push, and fast-import
> > > to turn the transferred data into object database entries when there are
> > > fewer objects than the 'unpacklimit' setting.
> > >
> > > By enabling bulk-checkin when unpacking objects, we can take advantage
> > > of batched fsyncs.
> >
> > This feels confused in that we dispatch to unpack-objects (instead
> > of index-objects) only when the number of loose objects should not
> > matter from a performance point of view, and bulk-checkin should
> > shine from a performance point of view only when there are enough
> > objects to batch.
> >
> > Also, if we ever add a "too many small loose objects is wasteful,
> > let's send them into a single 'batch pack'" optimization, it would
> > create a funny situation where the caller sends the contents of a
> > small incoming packfile to unpack-objects, but the command chooses
> > to bunch them all together in a packfile anyway ;-)
> >
> > So, I dunno.
>
> I'd be happy to just drop this patch. I originally added it to answer
> Avarab's question: how does batch mode compare to packfiles? [1] [2].
>
> [1] https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/
> [2] https://lore.kernel.org/git/pull.1076.v5.git.git.1632514331.gitgitgadget@gmail.com/

Well, looking back again at the spreadsheet [3], at 90 objects, which
is below the default transfer.unpackLimit, we see a 3x difference in
performance between batch mode and the default fsync mode. That's a
different interaction class (230 ms versus 760 ms). I'll include a
small table in the commit description with these performance numbers
to help justify it.
[3] https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dbeb0680a58..c55b6616aed 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "object-store.h"
 #include "object.h"
@@ -503,10 +504,12 @@ static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
+	plug_bulk_checkin();
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
 	}
+	unplug_bulk_checkin();
 	stop_progress(&progress);

 	if (delta_list)