mbox series

[v2,0/7] core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects

Message ID pull.1134.v2.git.1647760560.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects | expand

Message

Philippe Blain via GitGitGadget March 20, 2022, 7:15 a.m. UTC
V2 changes:

 * Change doc to indicate that only some repo updates are batched
 * Null and zero out control variables in do_batch_fsync under
   unplug_bulk_checkin
 * Make batch mode default on Windows.
 * Update the description for the initial patch that cleans up the
   bulk-checkin infrastructure.
 * Rebase onto 'seen' at 0cac37f38f9.

--Original definition-- When core.fsync includes loose-object, we issue an
fsync after every written object. For a 'git-add' or similar command that
adds a lot of files to the repo, the costs of these fsyncs adds up. One
major factor in this cost is the time it takes for the physical storage
controller to flush its caches to durable media.

This series takes advantage of the writeout-only mode of git_fsync to issue
OS cache writebacks for all of the objects being added to the repository
followed by a single fsync to a dummy file, which should trigger a
filesystem log flush and storage controller cache flush. This mechanism is
known to be safe on common Windows filesystems and expected to be safe on
macOS. Some linux filesystems, such as XFS, will probably do the right thing
as well. See [1] for previous discussion on the predecessor of this patch
series.

This series is important on Windows, where loose-objects are included in the
fsync set by default in Git-For-Windows. In this series, I'm also setting
the default mode for Windows to turn on loose object fsyncing with batch
mode, so that we can get CI coverage of the actual git-for-windows
configuration upstream. We still don't actually issue fsyncs for the test
suite since GIT_TEST_FSYNC is set to 0, but we exercise all of the
surrounding batch mode code.

This work is based on 'seen' at . It's dependent on ns/core-fsyncmethod.

[1]
https://lore.kernel.org/git/2c1ddef6057157d85da74a7274e03eacf0374e45.1629856293.git.gitgitgadget@gmail.com/

Neeraj Singh (7):
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncmethod: batched disk flushes for loose-objects
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsync: use batch mode and sync loose objects by default on
    Windows
  core.fsyncmethod: tests for batch mode
  core.fsyncmethod: performance tests for add and stash

 Documentation/config/core.txt |  7 +++
 builtin/unpack-objects.c      |  3 ++
 builtin/update-index.c        |  6 +++
 bulk-checkin.c                | 92 +++++++++++++++++++++++++++++++----
 bulk-checkin.h                |  2 +
 cache.h                       | 12 ++++-
 compat/mingw.h                |  3 ++
 config.c                      |  4 +-
 git-compat-util.h             |  2 +
 object-file.c                 |  2 +
 t/lib-unique-files.sh         | 36 ++++++++++++++
 t/perf/p3700-add.sh           | 59 ++++++++++++++++++++++
 t/perf/p3900-stash.sh         | 62 +++++++++++++++++++++++
 t/perf/perf-lib.sh            |  4 +-
 t/t3700-add.sh                | 22 +++++++++
 t/t3903-stash.sh              | 17 +++++++
 t/t5300-pack-object.sh        | 32 +++++++-----
 17 files changed, 340 insertions(+), 25 deletions(-)
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 0cac37f38f94bb93550eb164b5d574cd96e23785
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1134%2Fneerajsi-msft%2Fns%2Fbatched-fsync-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1134/neerajsi-msft/ns/batched-fsync-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1134

Range-diff vs v1:

 1:  a77d02df626 ! 1:  9c2abd12bbb bulk-checkin: rename 'state' variable and separate 'plugged' boolean
     @@ Metadata
       ## Commit message ##
          bulk-checkin: rename 'state' variable and separate 'plugged' boolean
      
     -    Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.
     +    This commit prepares for adding batch-fsync to the bulk-checkin
     +    infrastructure.
     +
     +    The bulk-checkin infrastructure is currently used to batch up addition
     +    of large blobs to a packfile. When a blob is larger than
     +    big_file_threshold, we unconditionally add it to a pack. If bulk
     +    checkins are 'plugged', we allow multiple large blobs to be added to a
     +    single pack until we reach the packfile size limit; otherwise, we simply
     +    make a new packfile for each large blob. The 'unplug' call tells us when
     +    the series of blob additions is done so that we can finish the packfiles
     +    and make their objects available to subsequent operations.
     +
     +    Stated another way, bulk-checkin allows callers to define a transaction
     +    that adds multiple objects to the object database, where the object
     +    database can optimize its internal operations within the transaction
     +    boundary.
     +
     +    Batched fsync will fit into bulk-checkin by taking advantage of the
     +    plug/unplug functionality to determine the appropriate time to fsync
     +    and make newly-added objects available in the primary object database.
      
          * Rename 'state' variable to 'bulk_checkin_state', since we will later
            be adding 'bulk_fsync_objdir'.  This also makes the variable easier to
 2:  d38f20b4430 ! 2:  3ed1dcd9b9b core.fsyncmethod: batched disk flushes for loose-objects
     @@ Documentation/config/core.txt: core.fsyncMethod::
         filesystem and storage hardware, data added to the repository may not be
         durable in the event of a system crash. This is the default mode on macOS.
      +* `batch` enables a mode that uses writeout-only flushes to stage multiple
     -+  updates in the disk writeback cache and then a single full fsync to trigger
     -+  the disk cache flush at the end of the operation. This mode is expected to
     ++  updates in the disk writeback cache and then does a single full fsync of
     ++  a dummy file to trigger the disk cache flush at the end of the operation.
     ++  Currently `batch` mode only applies to loose-object files. Other repository
     ++  data is made durable as if `fsync` was specified. This mode is expected to
      +  be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
      +  and on Windows for repos stored on NTFS or ReFS filesystems.
       
     @@ bulk-checkin.c: clear_exit:
      +		fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
      +		delete_tempfile(&temp);
      +		strbuf_release(&temp_path);
     ++		needs_batch_fsync = 0;
      +	}
      +
     -+	if (bulk_fsync_objdir)
     ++	if (bulk_fsync_objdir) {
      +		tmp_objdir_migrate(bulk_fsync_objdir);
     ++		bulk_fsync_objdir = NULL;
     ++	}
      +}
      +
       static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 3:  b0480f0c814 = 3:  54797dbc520 update-index: use the bulk-checkin infrastructure
 4:  99e3a61b919 = 4:  6662e2dae0f unpack-objects: use the bulk-checkin infrastructure
 5:  4e56c58c8cb ! 5:  03bf591742a core.fsync: use batch mode and sync loose objects by default on Windows
     @@ Commit message
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
     -    change fsyncmethod to batch as well
     -
       ## cache.h ##
      @@ cache.h: enum fsync_component {
       			      FSYNC_COMPONENT_INDEX | \
 6:  88e47047d79 = 6:  1937746df47 core.fsyncmethod: tests for batch mode
 7:  876741f1ef9 = 7:  624244078c7 core.fsyncmethod: performance tests for add and stash

Comments

Junio C Hamano March 21, 2022, 5:03 p.m. UTC | #1
"Neeraj K. Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> V2 changes:
>
>  * Change doc to indicate that only some repo updates are batched

OK.

>  * Null and zero out control variables in do_batch_fsync under
>    unplug_bulk_checkin

OK.

>  * Make batch mode default on Windows.

I do not care either way ;-)

>  * Update the description for the initial patch that cleans up the
>    bulk-checkin infrastructure.

OK.

>  * Rebase onto 'seen' at 0cac37f38f9.

That's unfortunate.  Having to depend on almost everything in 'seen'
is a guaranteed way to ensure that the topic would never graduate to
'next'.

For this topic, ns/core-fsyncmethod is the only thing outside of
'master' that the previous round needed, so I did an equivalent of

    $ git checkout -b ns/batch-fsync b896f729e2
    $ git merge ns/core-fsyncmethod 

to prepare fd008b1442 and then queued the patches on top, i.e.

    $ git am -s mbox

> This work is based on 'seen' at . It's dependent on ns/core-fsyncmethod.

"at ."?

In any case, I've applied them on 0cac37f38f9 and then re-applied
the result on top of fd008b1442 (i.e. the same base as the previous
round was queued), which, with the magic of "am -3", applied
cleanly.  Double checking the result was also simple (i.e. the tip of
such an application on top of fd008b1442 can be merged with
0cac37f38f9 and the result should be identical to the result of
applying them directly on top of 0cac37f38f9) and seems to have
produced the right result.

\Thanks.
Neeraj Singh March 21, 2022, 6:14 p.m. UTC | #2
On Mon, Mar 21, 2022 at 10:03 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Neeraj K. Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> >  * Rebase onto 'seen' at 0cac37f38f9.
>
> That's unfortunate.  Having to depend on almost everything in 'seen'
> is a guaranteed way to ensure that the topic would never graduate to
> 'next'.
>
> For this topic, ns/core-fsyncmethod is the only thing outside of
> 'master' that the previous round needed, so I did an equivalent of
>
>     $ git checkout -b ns/batch-fsync b896f729e2
>     $ git merge ns/core-fsyncmethod
>
> to prepare fd008b1442 and then queued the patches on top, i.e.
>
>     $ git am -s mbox
>
> > This work is based on 'seen' at . It's dependent on ns/core-fsyncmethod.
>
> "at ."?
>
> In any case, I've applied them on 0cac37f38f9 and then re-applied
> the result on top of fd008b1442 (i.e. the same base as the previous
> round was queued), which, with the magic of "am -3", applied
> cleanly.  Double checking the result was also simple (i.e. the tip of
> such an application on top of fd008b1442 can be merged with
> 0cac37f38f9 and the result should be identical to the result of
> applying them directly on top of 0cac37f38f9) and seems to have
> produced the right result.
>
> \Thanks.

Thanks Junio.  I was worried about how to properly represent the dependency
between these two in-flight branches without waiting for ns/core-fsyncmethod to
get into next.   Now ns/core-fsyncmethod appears to be there, so I'm assuming
that branch should have a stable OID until the end of the cycle.

Should I base future versions of this series on the tip of
ns/core-fsyncmethod, or
on the merge point between that branch and 'next'?  I guess it doesn't
really matter
if the merge is clean.

Thanks,
Neeraj
Junio C Hamano March 21, 2022, 8:49 p.m. UTC | #3
Neeraj Singh <nksingh85@gmail.com> writes:

>> In any case, I've applied them on 0cac37f38f9 and then re-applied
>> the result on top of fd008b1442 (i.e. the same base as the previous
>> round was queued), which, with the magic of "am -3", applied
>> cleanly.  Double checking the result was also simple (i.e. the tip of
>> such an application on top of fd008b1442 can be merged with
>> 0cac37f38f9 and the result should be identical to the result of
>> applying them directly on top of 0cac37f38f9) and seems to have
>> produced the right result.
>>
>> \Thanks.
>
> Thanks Junio.  I was worried about how to properly represent the dependency
> between these two in-flight branches without waiting for ns/core-fsyncmethod to
> get into next.   Now ns/core-fsyncmethod appears to be there, so I'm assuming
> that branch should have a stable OID until the end of the cycle.
>
> Should I base future versions of this series on the tip of
> ns/core-fsyncmethod, or
> on the merge point between that branch and 'next'?

Please base it on fd008b1442 (i.e. the same base as this and the
previous round was queued on), unless there is a strong reason to
rebase elsewhere.

Thanks.