mbox series

[v4,00/13] core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects

Message ID pull.1134.v4.git.1648514552.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects | expand

Message

Philippe Blain via GitGitGadget March 29, 2022, 12:42 a.m. UTC
V4 changes:

 * Make ODB transactions nestable.
 * Add an ODB transaction around writing out the cached tree.
 * Change update-index to use a more straightforward way of managing ODB
   transactions.
 * Fix missing 'local's in lib-unique-files
 * Add a per-iteration setup mechanism to test_perf.
 * Fix camelCasing in warning message.

V3 changes:

 * Rebrand plug/unplug-bulk-checkin to "begin_odb_transaction" and
   "end_odb_transaction"
 * Add a patch to pass filenames to fsync_or_die, rather than the string
   "loose object"
 * Update the commit description for "core.fsyncmethod to explain why we do
   not directly expose objects until an fsync occurs.
 * Also explain in the commit description why we're using a dummy file for
   the fsync.
 * Create the bulk-fsync tmp-objdir lazily the first time a loose object is
   added. We now do fsync iff that objdir exists.
 * Do batch fsync if core.fsyncMethod=batch and core.fsync contains
   loose-object, regardless of the core.fsyncObjectFiles setting.
 * Mitigate the risk in update-index of an object not being visible due to
   bulk checkin.
 * Add a perf comment to justify the unpack-objects usage of bulk-checkin.
 * Add a new patch to create helpers for parsing OIDs from git commands.
 * Add a comment to the lib-unique-files.sh helper about uniqueness only
   within a repo.
 * Fix style and add '&&' chaining to test helpers.
 * Comment on some magic numbers in tests.
 * Take the object list as an argument in
   ./t5300-pack-object.sh:check_unpack ()
 * Drop accidental change to t/perf/perf-lib.sh

V2 changes:

 * Change doc to indicate that only some repo updates are batched
 * Null and zero out control variables in do_batch_fsync under
   unplug_bulk_checkin
 * Make batch mode default on Windows.
 * Update the description for the initial patch that cleans up the
   bulk-checkin infrastructure.
 * Rebase onto 'seen' at 0cac37f38f9.

--Original definition-- When core.fsync includes loose-object, we issue an
fsync after every written object. For a 'git-add' or similar command that
adds a lot of files to the repo, the costs of these fsyncs adds up. One
major factor in this cost is the time it takes for the physical storage
controller to flush its caches to durable media.

This series takes advantage of the writeout-only mode of git_fsync to issue
OS cache writebacks for all of the objects being added to the repository
followed by a single fsync to a dummy file, which should trigger a
filesystem log flush and storage controller cache flush. This mechanism is
known to be safe on common Windows filesystems and expected to be safe on
macOS. Some linux filesystems, such as XFS, will probably do the right thing
as well. See [1] for previous discussion on the predecessor of this patch
series.

This series is important on Windows, where loose-objects are included in the
fsync set by default in Git-For-Windows. In this series, I'm also setting
the default mode for Windows to turn on loose object fsyncing with batch
mode, so that we can get CI coverage of the actual git-for-windows
configuration upstream. We still don't actually issue fsyncs for the test
suite since GIT_TEST_FSYNC is set to 0, but we exercise all of the
surrounding batch mode code.

This work is based on 'next' at c54b8eb302. It's dependent on
ns/core-fsyncmethod.

[1]
https://lore.kernel.org/git/2c1ddef6057157d85da74a7274e03eacf0374e45.1629856293.git.gitgitgadget@gmail.com/

Neeraj Singh (13):
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  object-file: pass filename to fsync_or_die
  core.fsyncmethod: batched disk flushes for loose-objects
  cache-tree: use ODB transaction around writing a tree
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsync: use batch mode and sync loose objects by default on
    Windows
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsyncmethod: tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: performance tests for add and stash
  core.fsyncmethod: correctly camel-case warning message

 Documentation/config/core.txt          |   8 ++
 builtin/add.c                          |   4 +-
 builtin/unpack-objects.c               |   3 +
 builtin/update-index.c                 |  24 ++++++
 bulk-checkin.c                         | 101 ++++++++++++++++++++++---
 bulk-checkin.h                         |  17 ++++-
 cache-tree.c                           |   3 +
 cache.h                                |  12 ++-
 compat/mingw.h                         |   3 +
 config.c                               |   6 +-
 git-compat-util.h                      |   2 +
 object-file.c                          |  15 ++--
 t/lib-unique-files.sh                  |  34 +++++++++
 t/perf/p3700-add.sh                    |  59 +++++++++++++++
 t/perf/p4220-log-grep-engines.sh       |   3 +-
 t/perf/p4221-log-grep-engines-fixed.sh |   3 +-
 t/perf/p5302-pack-index.sh             |  15 ++--
 t/perf/p7519-fsmonitor.sh              |  18 +----
 t/perf/p7820-grep-engines.sh           |   6 +-
 t/perf/perf-lib.sh                     |  62 +++++++++++++--
 t/t3700-add.sh                         |  28 +++++++
 t/t3903-stash.sh                       |  20 +++++
 t/t5300-pack-object.sh                 |  41 ++++++----
 t/t5317-pack-objects-filter-objects.sh |  91 +++++++++++-----------
 t/test-lib-functions.sh                |  10 +++
 25 files changed, 469 insertions(+), 119 deletions(-)
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh


base-commit: c54b8eb302ffb72f31e73a26044c8a864e2cb307
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1134%2Fneerajsi-msft%2Fns%2Fbatched-fsync-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1134/neerajsi-msft/ns/batched-fsync-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1134

Range-diff vs v3:

  2:  b2d9766a662 !  1:  c7a2a7efe6d bulk-checkin: rename 'state' variable and separate 'plugged' boolean
     @@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid,
       	return status;
       }
       
     - void begin_odb_transaction(void)
     + void plug_bulk_checkin(void)
       {
      -	state.plugged = 1;
      +	assert(!bulk_checkin_plugged);
      +	bulk_checkin_plugged = 1;
       }
       
     - void end_odb_transaction(void)
     + void unplug_bulk_checkin(void)
       {
      -	state.plugged = 0;
      -	if (state.f)
  1:  53261f0099d !  2:  d045b13795b bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
     @@ Commit message
      
          Make it clearer in the naming and documentation of the plug_bulk_checkin
          and unplug_bulk_checkin APIs that they can be thought of as
     -    a "transaction" to optimize operations on the object database.
     +    a "transaction" to optimize operations on the object database. These
     +    transactions may be nested so that subsystems like the cache-tree
     +    writing code can optimize their operations without caring whether the
     +    top-level code has a transaction active.
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
     @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
       	if (write_locked_index(&the_index, &lock_file,
      
       ## bulk-checkin.c ##
     +@@
     + #include "packfile.h"
     + #include "object-store.h"
     + 
     +-static int bulk_checkin_plugged;
     ++static int odb_transaction_nesting;
     + 
     + static struct bulk_checkin_state {
     + 	char *pack_tmp_name;
      @@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid,
     + {
     + 	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
     + 				     path, flags);
     +-	if (!bulk_checkin_plugged)
     ++	if (!odb_transaction_nesting)
     + 		finish_bulk_checkin(&bulk_checkin_state);
       	return status;
       }
       
      -void plug_bulk_checkin(void)
      +void begin_odb_transaction(void)
       {
     - 	state.plugged = 1;
     +-	assert(!bulk_checkin_plugged);
     +-	bulk_checkin_plugged = 1;
     ++	odb_transaction_nesting += 1;
       }
       
      -void unplug_bulk_checkin(void)
      +void end_odb_transaction(void)
       {
     - 	state.plugged = 0;
     - 	if (state.f)
     +-	assert(bulk_checkin_plugged);
     +-	bulk_checkin_plugged = 0;
     ++	odb_transaction_nesting -= 1;
     ++	if (odb_transaction_nesting < 0)
     ++		BUG("Unbalanced ODB transaction nesting");
     ++
     ++	if (odb_transaction_nesting)
     ++		return;
     ++
     + 	if (bulk_checkin_state.f)
     + 		finish_bulk_checkin(&bulk_checkin_state);
     + }
      
       ## bulk-checkin.h ##
      @@ bulk-checkin.h: int index_bulk_checkin(struct object_id *oid,
  3:  26ce5b8fdda =  3:  2d1bc4568ac object-file: pass filename to fsync_or_die
  4:  52638326790 !  4:  9e7ae22fa4a core.fsyncmethod: batched disk flushes for loose-objects
     @@ bulk-checkin.c
       #include "packfile.h"
       #include "object-store.h"
       
     - static int bulk_checkin_plugged;
     + static int odb_transaction_nesting;
       
      +static struct tmp_objdir *bulk_fsync_objdir;
      +
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
      +	 * callers may not know whether any objects will be
      +	 * added at the time they call begin_odb_transaction.
      +	 */
     -+	if (!bulk_checkin_plugged || bulk_fsync_objdir)
     ++	if (!odb_transaction_nesting || bulk_fsync_objdir)
      +		return;
      +
      +	bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
      +void fsync_loose_object_bulk_checkin(int fd, const char *filename)
      +{
      +	/*
     -+	 * If we have a plugged bulk checkin, we issue a call that
     ++	 * If we have an active ODB transaction, we issue a call that
      +	 * cleans the filesystem page cache but avoids a hardware flush
      +	 * command. Later on we will issue a single hardware flush
      +	 * before as part of do_batch_fsync.
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
       		       int fd, size_t size, enum object_type type,
       		       const char *path, unsigned flags)
      @@ bulk-checkin.c: void end_odb_transaction(void)
     - 	bulk_checkin_plugged = 0;
     + 
       	if (bulk_checkin_state.f)
       		finish_bulk_checkin(&bulk_checkin_state);
      +
  -:  ----------- >  5:  83fa4a5f3a5 cache-tree: use ODB transaction around writing a tree
  5:  913ce1b3df9 !  6:  f03ebee695a update-index: use the bulk-checkin infrastructure
     @@ Commit message
          There is some risk with this change, since under batch fsync, the object
          files will be in a tmp-objdir until update-index is complete, so callers
          using the --stdin option will not see them until update-index is done.
     -    This risk is mitigated by unplugging the batch when reporting verbose
     -    output, which is the only way a --stdin caller might synchronize with
     -    the addition of an object.
     +    This risk is mitigated by not keeping an ODB transaction open around
     +    --stdin processing if in --verbose mode. Without --verbose mode,
     +    a caller feeding update-index via --stdin wouldn't know when
     +    update-index adds an object, event without an ODB transaction.
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
     @@ builtin/update-index.c
       #include "config.h"
       #include "lockfile.h"
       #include "quote.h"
     -@@ builtin/update-index.c: static int allow_replace;
     - static int info_only;
     - static int force_remove;
     - static int verbose;
     -+static int odb_transaction_active;
     - static int mark_valid_only;
     - static int mark_skip_worktree_only;
     - static int mark_fsmonitor_only;
     -@@ builtin/update-index.c: enum uc_mode {
     - 	UC_FORCE
     - };
     - 
     -+static void end_odb_transaction_if_active(void)
     -+{
     -+	if (!odb_transaction_active)
     -+		return;
     -+
     -+	end_odb_transaction();
     -+	odb_transaction_active = 0;
     -+}
     -+
     - __attribute__((format (printf, 1, 2)))
     - static void report(const char *fmt, ...)
     - {
     -@@ builtin/update-index.c: static void report(const char *fmt, ...)
     - 	if (!verbose)
     - 		return;
     - 
     -+	/*
     -+	 * It is possible, though unlikely, that a caller
     -+	 * could use the verbose output to synchronize with
     -+	 * addition of objects to the object database, so
     -+	 * unplug bulk checkin to make sure that future objects
     -+	 * are immediately visible.
     -+	 */
     -+
     -+	end_odb_transaction_if_active();
     -+
     - 	va_start(vp, fmt);
     - 	vprintf(fmt, vp);
     - 	putchar('\n');
      @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
       	 */
       	parse_options_start(&ctx, argc, argv, prefix,
     @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const
      +	 * a batch.
      +	 */
      +	begin_odb_transaction();
     -+	odb_transaction_active = 1;
       	while (ctx.argc) {
       		if (parseopt_state != PARSE_OPT_DONE)
       			parseopt_state = parse_options_step(&ctx, options,
     +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     + 		the_index.version = preferred_index_format;
     + 	}
     + 
     ++	/*
     ++	 * It is possible, though unlikely, that a caller could use the verbose
     ++	 * output to synchronize with addition of objects to the object
     ++	 * database. The current implementation of ODB transactions leaves
     ++	 * objects invisible while a transaction is active, so end the
     ++	 * transaction here if verbose output is enabled.
     ++	 */
     ++
     ++	if (verbose)
     ++		end_odb_transaction();
     ++
     + 	if (read_from_stdin) {
     + 		struct strbuf buf = STRBUF_INIT;
     + 		struct strbuf unquoted = STRBUF_INIT;
      @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
       		strbuf_release(&buf);
       	}
     @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const
      +	/*
      +	 * By now we have added all of the new objects
      +	 */
     -+	end_odb_transaction_if_active();
     ++	if (!verbose)
     ++		end_odb_transaction();
      +
       	if (split_index > 0) {
       		if (git_config_get_split_index() == 0)
  6:  84fd144ef18 =  7:  d85013f7d2c unpack-objects: use the bulk-checkin infrastructure
  7:  447263e8ef1 =  8:  73e54f94c20 core.fsync: use batch mode and sync loose objects by default on Windows
  8:  8f1b01c9ca0 =  9:  124450c86d9 test-lib-functions: add parsing helpers for ls-files and ls-tree
  9:  b5f371e97fe ! 10:  282fbdef792 core.fsyncmethod: tests for batch mode
     @@ t/lib-unique-files.sh (new)
      +	local files="$2" &&
      +	local basedir="$3" &&
      +	local counter=0 &&
     ++	local i &&
     ++	local j &&
      +	test_tick &&
      +	local basedata=$basedir$test_tick &&
      +	rm -rf "$basedir" &&
  -:  ----------- > 11:  ee7ecf4cabe t/perf: add iteration setup mechanism to perf-lib
 10:  b99b32a469c ! 12:  fdf90d45f52 core.fsyncmethod: performance tests for add and stash
     @@ t/perf/p3700-add.sh (new)
      +# core.fsyncMethod=batch mode, which is why we are testing different values
      +# of that setting explicitly and creating a lot of unique objects.
      +
     -+test_description="Tests performance of add"
     ++test_description="Tests performance of adding things to the object database"
      +
      +# Fsync is normally turned off for the test suite.
      +GIT_TEST_FSYNC=1
     @@ t/perf/p3700-add.sh (new)
      +
      +. $TEST_DIRECTORY/lib-unique-files.sh
      +
     -+test_perf_default_repo
     ++test_perf_fresh_repo
      +test_checkout_worktree
      +
      +dir_count=10
      +files_per_dir=50
      +total_files=$((dir_count * files_per_dir))
      +
     -+# We need to create the files each time we run the perf test, but
     -+# we do not want to measure the cost of creating the files, so run
     -+# the test once.
     -+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
     -+then
     -+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
     -+	GIT_PERF_REPEAT_COUNT=1
     -+fi
     -+
     -+for m in false true batch
     ++for mode in false true batch
      +do
     -+	test_expect_success "create the files for object_fsyncing=$m" '
     -+		git reset --hard &&
     -+		# create files across directories
     -+		test_create_unique_files $dir_count $files_per_dir files
     -+	'
     -+
     -+	case $m in
     ++	case $mode in
      +	false)
      +		FSYNC_CONFIG='-c core.fsync=-loose-object -c core.fsyncmethod=fsync'
      +		;;
     @@ t/perf/p3700-add.sh (new)
      +		;;
      +	esac
      +
     -+	test_perf "add $total_files files (object_fsyncing=$m)" "
     -+		git $FSYNC_CONFIG add files
     ++	test_perf "add $total_files files (object_fsyncing=$mode)" \
     ++		--setup "
     ++		(rm -rf .git || 1) &&
     ++		git init &&
     ++		test_create_unique_files $dir_count $files_per_dir files_$mode
     ++	" "
     ++		git $FSYNC_CONFIG add files_$mode
      +	"
     -+done
     -+
     -+test_done
     -
     - ## t/perf/p3900-stash.sh (new) ##
     -@@
     -+#!/bin/sh
     -+#
     -+# This test measures the performance of adding new files to the object database
     -+# and index. The test was originally added to measure the effect of the
     -+# core.fsyncMethod=batch mode, which is why we are testing different values
     -+# of that setting explicitly and creating a lot of unique objects.
     -+
     -+test_description="Tests performance of stash"
     -+
     -+# Fsync is normally turned off for the test suite.
     -+GIT_TEST_FSYNC=1
     -+export GIT_TEST_FSYNC
     -+
     -+. ./perf-lib.sh
     -+
     -+. $TEST_DIRECTORY/lib-unique-files.sh
     -+
     -+test_perf_default_repo
     -+test_checkout_worktree
     -+
     -+dir_count=10
     -+files_per_dir=50
     -+total_files=$((dir_count * files_per_dir))
     -+
     -+# We need to create the files each time we run the perf test, but
     -+# we do not want to measure the cost of creating the files, so run
     -+# the test once.
     -+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
     -+then
     -+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
     -+	GIT_PERF_REPEAT_COUNT=1
     -+fi
     -+
     -+for m in false true batch
     -+do
     -+	test_expect_success "create the files for object_fsyncing=$m" '
     -+		git reset --hard &&
     -+		# create files across directories
     -+		test_create_unique_files $dir_count $files_per_dir files
     -+	'
     -+
     -+	case $m in
     -+	false)
     -+		FSYNC_CONFIG='-c core.fsync=-loose-object -c core.fsyncmethod=fsync'
     -+		;;
     -+	true)
     -+		FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=fsync'
     -+		;;
     -+	batch)
     -+		FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=batch'
     -+		;;
     -+	esac
      +
     -+	# We only stash files in the 'files' subdirectory since
     -+	# the perf test infrastructure creates files in the
     -+	# current working directory that need to be preserved
     -+	test_perf "stash $total_files files (object_fsyncing=$m)" "
     -+		git $FSYNC_CONFIG stash push -u -- files
     ++	test_perf "stash $total_files files (object_fsyncing=$mode)" \
     ++		--setup "
     ++		(rm -rf .git || 1) &&
     ++		git init &&
     ++		test_commit first &&
     ++		test_create_unique_files $dir_count $files_per_dir stash_files_$mode
     ++	" "
     ++		git $FSYNC_CONFIG stash push -u -- stash_files_$mode
      +	"
      +done
      +
 11:  6b832e89bc4 = 13:  fb30bd02c8d core.fsyncmethod: correctly camel-case warning message

Comments

Ævar Arnfjörð Bjarmason March 29, 2022, 10:47 a.m. UTC | #1
On Tue, Mar 29 2022, Neeraj K. Singh via GitGitGadget wrote:

> V4 changes:
>
>  * Make ODB transactions nestable.
>  * Add an ODB transaction around writing out the cached tree.
>  * Change update-index to use a more straightforward way of managing ODB
>    transactions.
>  * Fix missing 'local's in lib-unique-files
>  * Add a per-iteration setup mechanism to test_perf.
>  * Fix camelCasing in warning message.

I haven't looked at the bulk of this in any detail, but:

>  10:  b99b32a469c ! 12:  fdf90d45f52 core.fsyncmethod: performance tests for add and stash
>      @@ t/perf/p3700-add.sh (new)
>       +# core.fsyncMethod=batch mode, which is why we are testing different values
>       +# of that setting explicitly and creating a lot of unique objects.
>       +
>      -+test_description="Tests performance of add"
>      ++test_description="Tests performance of adding things to the object database"

Now having both tests for "add" and "stash" in a test named p3700-add.sh
isn't better, the rest of the perf tests are split up by command,
perhaps just add a helper library and have both use it?

And re the unaddressed feedback I ad of "why the random data"
inhttps://lore.kernel.org/git/220326.86o81sk9ao.gmgdl@evledraar.gmail.com/
I tried patching it on top to do what I suggested there, allowing us to
run these against any arbitrary repository and came up with this:

diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
index ef6024f9897..60abd5ee076 100755
--- a/t/perf/p3700-add.sh
+++ b/t/perf/p3700-add.sh
@@ -13,47 +13,26 @@ export GIT_TEST_FSYNC
 
 . ./perf-lib.sh
 
-. $TEST_DIRECTORY/lib-unique-files.sh
-
-test_perf_fresh_repo
+test_perf_default_repo
 test_checkout_worktree
 
-dir_count=10
-files_per_dir=50
-total_files=$((dir_count * files_per_dir))
-
-for mode in false true batch
+for cfg in \
+	'-c core.fsync=-loose-object -c core.fsyncmethod=fsync' \
+	'-c core.fsync=loose-object -c core.fsyncmethod=fsync' \
+	'-c core.fsync=loose-object -c core.fsyncmethod=batch' \
+	'-c core.fsyncmethod=batch'
 do
-	case $mode in
-	false)
-		FSYNC_CONFIG='-c core.fsync=-loose-object -c core.fsyncmethod=fsync'
-		;;
-	true)
-		FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=fsync'
-		;;
-	batch)
-		FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=batch'
-		;;
-	esac
-
-	test_perf "add $total_files files (object_fsyncing=$mode)" \
-		--setup "
-		(rm -rf .git || 1) &&
-		git init &&
-		test_create_unique_files $dir_count $files_per_dir files_$mode
-	" "
-		git $FSYNC_CONFIG add files_$mode
-	"
-
-	test_perf "stash $total_files files (object_fsyncing=$mode)" \
-		--setup "
-		(rm -rf .git || 1) &&
-		git init &&
-		test_commit first &&
-		test_create_unique_files $dir_count $files_per_dir stash_files_$mode
-	" "
-		git $FSYNC_CONFIG stash push -u -- stash_files_$mode
-	"
+	test_perf "'git add' with '$cfg'" \
+		--setup '
+			mv -v .git .git.old &&
+			git init .
+		' \
+		--cleanup '
+			rm -rf .git &&
+			mv .git.old .git
+		' '
+		git $cfg add -f -- ":!.git.old/"
+	'
 done
 
 test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..12c489069ba
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+
+test_description='performance of "git stash" with different fsync settings'
+
+# Fsync is normally turned off for the test suite.
+GIT_TEST_FSYNC=1
+export GIT_TEST_FSYNC
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+for cfg in \
+	'-c core.fsync=-loose-object -c core.fsyncmethod=fsync' \
+	'-c core.fsync=loose-object -c core.fsyncmethod=fsync' \
+	'-c core.fsync=loose-object -c core.fsyncmethod=batch' \
+	'-c core.fsyncmethod=batch'
+do
+	test_perf "'stash push -u' with '$cfg'" \
+		--setup '
+			mv -v .git .git.old &&
+			git init . &&
+			test_commit dummy
+		' \
+		--cleanup '
+			rm -rf .git &&
+			mv .git.old .git
+		' '
+		git $cfg stash push -a -u ":!.git.old/" ":!test*" "."
+	'
+done
+
+test_done
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index a935ad622d3..24a5108f234 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -194,6 +194,7 @@ test_wrapper_ () {
 	test_start_
 	test_prereq=
 	test_perf_setup_=
+	test_perf_cleanup_=
 	while test $# != 0
 	do
 		case $1 in
@@ -205,6 +206,10 @@ test_wrapper_ () {
 			test_perf_setup_=$2
 			shift
 			;;
+		--cleanup)
+			test_perf_cleanup_=$2
+			shift
+			;;
 		*)
 			break
 			;;
@@ -214,6 +219,7 @@ test_wrapper_ () {
 	test "$#" = 1 || BUG "test_wrapper_ needs 2 positional parameters"
 	export test_prereq
 	export test_perf_setup_
+	export test_perf_cleanup_
 	if ! test_skip "$test_title_" "$@"
 	then
 		base=$(basename "$0" .sh)
@@ -256,6 +262,16 @@ test_perf_ () {
 			test_failure_ "$@"
 			break
 		fi
+		if test -n "$test_perf_cleanup_"
+		then
+			say >&3 "cleanup: $test_perf_cleanup_"
+			if ! test_eval_ $test_perf_cleanup_
+			then
+				test_failure_ "$test_perf_cleanup_"
+				break
+			fi
+
+		fi
 	done
 	if test -z "$verbose"; then
 		echo " ok"


Here it is against Cor.git (a random small-ish repo I had laying around):
	
	$ GIT_SKIP_TESTS='p3[79]00.[12]' GIT_PERF_MAKE_OPTS='CFLAGS=-O3' GIT_PERF_REPO=~/g/Cor/ ./run origin/master HEAD -- p3900-stash.sh
	=== Building abf474a5dd901f28013c52155411a48fd4c09922 (origin/master) ===
	    GEN git-add--interactive
	    GEN git-archimport
	    GEN git-cvsexportcommit
	    GEN git-cvsimport
	    GEN git-cvsserver
	    GEN git-send-email
	    GEN git-svn
	    GEN git-p4
	    SUBDIR templates
	=== Running 1 tests in /home/avar/g/git/t/perf/build/abf474a5dd901f28013c52155411a48fd4c09922/bin-wrappers ===
	ok 1 # skip 'stash push -u' with '-c core.fsync=-loose-object -c core.fsyncmethod=fsync' (GIT_SKIP_TESTS)
	ok 2 # skip 'stash push -u' with '-c core.fsync=loose-object -c core.fsyncmethod=fsync' (GIT_SKIP_TESTS)
	perf 3 - 'stash push -u' with '-c core.fsync=loose-object -c core.fsyncmethod=batch': 1 2 3 ok
	perf 4 - 'stash push -u' with '-c core.fsyncmethod=batch': 1 2 3 ok
	# passed all 4 test(s)
	1..4
	=== Building ecda9c2b029e35d239e369b875b245f45fd2a097 (HEAD) ===
	    GEN git-add--interactive
	    GEN git-archimport
	    GEN git-cvsexportcommit
	    GEN git-cvsimport
	    GEN git-cvsserver
	    GEN git-send-email
	    GEN git-svn
	    GEN git-p4
	    SUBDIR templates
	=== Running 1 tests in /home/avar/g/git/t/perf/build/ecda9c2b029e35d239e369b875b245f45fd2a097/bin-wrappers ===
	ok 1 # skip 'stash push -u' with '-c core.fsync=-loose-object -c core.fsyncmethod=fsync' (GIT_SKIP_TESTS)
	ok 2 # skip 'stash push -u' with '-c core.fsync=loose-object -c core.fsyncmethod=fsync' (GIT_SKIP_TESTS)
	perf 3 - 'stash push -u' with '-c core.fsync=loose-object -c core.fsyncmethod=batch': 1 2 3 ok
	perf 4 - 'stash push -u' with '-c core.fsyncmethod=batch': 1 2 3 ok
	# passed all 4 test(s)
	1..4
	Test       origin/master     HEAD
	---------------------------------------------------
	3900.3:    0.03(0.00+0.00)   0.02(0.00+0.00) -33.3%
	3900.4:    0.02(0.00+0.00)   0.03(0.00+0.00) +50.0%
Ævar Arnfjörð Bjarmason March 29, 2022, 11:45 a.m. UTC | #2
On Tue, Mar 29 2022, Neeraj K. Singh via GitGitGadget wrote:

> V4 changes:
>
>  * Make ODB transactions nestable.
>  * Add an ODB transaction around writing out the cached tree.
>  * Change update-index to use a more straightforward way of managing ODB
>    transactions.
>  * Fix missing 'local's in lib-unique-files
>  * Add a per-iteration setup mechanism to test_perf.
>  * Fix camelCasing in warning message.

Despite my
https://lore.kernel.org/git/220329.86czi52ekn.gmgdl@evledraar.gmail.com/
I eventually gave up on trying to extract meaningful numbers from
t/perf, I can never quite find out if they're because of its
shellscripts shenanigans or actual code.

(And also; I realize I didn't follow-up on
https://lore.kernel.org/git/CANQDOdcFN5GgOPZ3hqCsjHDTiRfRpqoAKxjF1n9D6S8oD9--_A@mail.gmail.com/,
sorry):

But I came up with this (uses my thin
https://gitlab.com/avar/git-hyperfine/ wrapper, and you should be able
to apt get hyperfine):
	
	#!/bin/sh
	set -xe
	
	if ! test -d /tmp/scalar.git
	then
		git clone --bare https://github.com/Microsoft/scalar.git /tmp/scalar.git
		mv /tmp/scalar.git/objects/pack/*.pack /tmp/scalar.git/my.pack
	fi
	git hyperfine \
	        --warmup 1 -r 3 \
		-L rev neeraj-v4,avar-RFC \
		-s 'make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/ && git ls-files -- t >repo/.git/to-add.txt' \
		-p 'rm -rf repo/.git/objects/* repo/.git/index' \
		$@'./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt'
	
	git hyperfine \
	        --warmup 1 -r 3 \
		-L rev neeraj-v4,avar-RFC \
		-s 'make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/' \
		-p 'rm -rf repo/.git/objects/* repo/.git/index' \
		$@'./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .'
	
	git hyperfine \
	        --warmup 1 -r 3 \
		-L rev neeraj-v4,avar-RFC \
	        -s 'make CFLAGS=-O3' \
	        -p 'git init --bare dest.git' \
	        -c 'rm -rf dest.git' \
	        $@'./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack'

Those tags are your v4 here & the v2 of the RFC I sent at
https://lore.kernel.org/git/RFC-cover-v2-0.7-00000000000-20220323T140753Z-avarab@gmail.com/

Which shows my RFC v2 is ~20% faster with:

    $ PFX='strace' ~/g/git.meta/benchmark.sh "strace "

    1.22 ± 0.02 times faster than 'strace ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'neeraj-v4'
    1.22 ± 0.01 times faster than 'strace ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'neeraj-v4'
    1.00 ± 0.01 times faster than 'strace ./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'neeraj-v4'

But only for add/update-index, is the unpack-objects not using the
tmp-objdir? (presumably yes).

As noted before I've found "strace" to be a handy way to "simulate"
slower FS ops on a ramdisk (I get about the same numbers sometimes on
the actual non-SSD disk, but due to load on the system (that I'm not in
full control of[1]) I can't get hyperfine to be happy with the
non-fuzzyness:

    1.06 ± 0.02 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'neeraj-v4'
    1.06 ± 0.03 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'neeraj-v4'
    1.01 ± 0.01 times faster than './git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'neeraj-v4'

FWIW these are my actual non-fuzzy-with-strace numbers on the
not-ramdisk, as you can see the intervals overlap, but for the first two
the "min" time is never close to the RFC v2:
	
	$ XDG_RUNTIME_DIR=/tmp/ghf ~/g/git.meta/benchmark.sh
	+ test -d /tmp/scalar.git
	+ git hyperfine --warmup 1 -r 3 -L rev neeraj-v4,avar-RFC -s make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/ && git ls-files -- t >repo/.git/to-add.txt -p rm -rf repo/.git/objects/* repo/.git/index ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt
	Benchmark 1: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'neeraj-v4
	  Time (mean ± σ):      1.043 s ±  0.143 s    [User: 0.184 s, System: 0.193 s]
	  Range (min … max):    0.943 s …  1.207 s    3 runs
	
	Benchmark 2: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'avar-RFC
	  Time (mean ± σ):     877.6 ms ± 183.4 ms    [User: 197.9 ms, System: 149.4 ms]
	  Range (min … max):   697.8 ms … 1064.4 ms    3 runs
	
	Summary
	  './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'avar-RFC' ran
	    1.19 ± 0.30 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'neeraj-v4'
	+ git hyperfine --warmup 1 -r 3 -L rev neeraj-v4,avar-RFC -s make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/ -p rm -rf repo/.git/objects/* repo/.git/index ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .
	Benchmark 1: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'neeraj-v4
	  Time (mean ± σ):      1.019 s ±  0.057 s    [User: 0.213 s, System: 0.194 s]
	  Range (min … max):    0.963 s …  1.076 s    3 runs
	
	Benchmark 2: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'avar-RFC
	  Time (mean ± σ):     918.6 ms ±  34.4 ms    [User: 207.8 ms, System: 164.1 ms]
	  Range (min … max):   880.6 ms … 947.5 ms    3 runs
	
	Summary
	  './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'avar-RFC' ran
	    1.11 ± 0.07 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'neeraj-v4'
	+ git hyperfine --warmup 1 -r 3 -L rev neeraj-v4,avar-RFC -s make CFLAGS=-O3 -p git init --bare dest.git -c rm -rf dest.git ./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack
	Benchmark 1: ./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'neeraj-v4
	  Time (mean ± σ):      1.362 s ±  0.285 s    [User: 1.021 s, System: 0.186 s]
	  Range (min … max):    1.192 s …  1.691 s    3 runs
	
	  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
	
	Benchmark 2: ./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'avar-RFC
	 ⠏ Performing warmup runs         ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ⠙ Performing warmup runs         ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  Time (mean ± σ):      1.188 s ±  0.009 s    [User: 1.025 s, System: 0.161 s]
	  Range (min … max):    1.180 s …  1.199 s    3 runs
	 
	Summary
	  './git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'avar-RFC' ran
	    1.15 ± 0.24 times faster than './git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'neeraj-v4'

1. I do my git hacking on a bare metal box I rent with some friends, and
   one of them is running one those persistent video game daemons
   written in Java. So I think all my non-RAM I/O numbers are
   continually fuzzed by what players are doing in Minecraft or whatever
   that thing is...
Neeraj Singh March 29, 2022, 4:51 p.m. UTC | #3
On Tue, Mar 29, 2022 at 5:04 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Mar 29 2022, Neeraj K. Singh via GitGitGadget wrote:
>
> > V4 changes:
> >
> >  * Make ODB transactions nestable.
> >  * Add an ODB transaction around writing out the cached tree.
> >  * Change update-index to use a more straightforward way of managing ODB
> >    transactions.
> >  * Fix missing 'local's in lib-unique-files
> >  * Add a per-iteration setup mechanism to test_perf.
> >  * Fix camelCasing in warning message.
>
> Despite my
> https://lore.kernel.org/git/220329.86czi52ekn.gmgdl@evledraar.gmail.com/
> I eventually gave up on trying to extract meaningful numbers from
> t/perf, I can never quite find out if they're because of its
> shellscripts shenanigans or actual code.
>
> (And also; I realize I didn't follow-up on
> https://lore.kernel.org/git/CANQDOdcFN5GgOPZ3hqCsjHDTiRfRpqoAKxjF1n9D6S8oD9--_A@mail.gmail.com/,
> sorry):
>

Looks like we aren't actually hitting fsync in the numbers you
expressed there, if they're down in the 20ms range.  Or we simply
aren't adding enough files.  Or if that's against a ramdisk, the
ramdisk doesn't have enough cost to represent real disk hardware.

> But I came up with this (uses my thin
> https://gitlab.com/avar/git-hyperfine/ wrapper, and you should be able
> to apt get hyperfine):
>
>         #!/bin/sh
>         set -xe
>
>         if ! test -d /tmp/scalar.git
>         then
>                 git clone --bare https://github.com/Microsoft/scalar.git /tmp/scalar.git
>                 mv /tmp/scalar.git/objects/pack/*.pack /tmp/scalar.git/my.pack
>         fi
>         git hyperfine \
>                 --warmup 1 -r 3 \
>                 -L rev neeraj-v4,avar-RFC \
>                 -s 'make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/ && git ls-files -- t >repo/.git/to-add.txt' \
>                 -p 'rm -rf repo/.git/objects/* repo/.git/index' \
>                 $@'./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt'
>
>         git hyperfine \
>                 --warmup 1 -r 3 \
>                 -L rev neeraj-v4,avar-RFC \
>                 -s 'make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/' \
>                 -p 'rm -rf repo/.git/objects/* repo/.git/index' \
>                 $@'./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .'
>
>         git hyperfine \
>                 --warmup 1 -r 3 \
>                 -L rev neeraj-v4,avar-RFC \
>                 -s 'make CFLAGS=-O3' \
>                 -p 'git init --bare dest.git' \
>                 -c 'rm -rf dest.git' \
>                 $@'./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack'
>
> Those tags are your v4 here & the v2 of the RFC I sent at
> https://lore.kernel.org/git/RFC-cover-v2-0.7-00000000000-20220323T140753Z-avarab@gmail.com/
>
> Which shows my RFC v2 is ~20% faster with:
>
>     $ PFX='strace' ~/g/git.meta/benchmark.sh "strace "
>
>     1.22 ± 0.02 times faster than 'strace ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'neeraj-v4'
>     1.22 ± 0.01 times faster than 'strace ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'neeraj-v4'
>     1.00 ± 0.01 times faster than 'strace ./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'neeraj-v4'
>
> But only for add/update-index, is the unpack-objects not using the
> tmp-objdir? (presumably yes).
>
> As noted before I've found "strace" to be a handy way to "simulate"
> slower FS ops on a ramdisk (I get about the same numbers sometimes on
> the actual non-SSD disk, but due to load on the system (that I'm not in
> full control of[1]) I can't get hyperfine to be happy with the
> non-fuzzyness:
>

At least in this case, I don't think 'strace' is representative of
what a real disk would behave like.  Unless you can somehow make
strace of sync_file_range cost less than strace of fsync.

>     1.06 ± 0.02 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'neeraj-v4'
>     1.06 ± 0.03 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'neeraj-v4'
>     1.01 ± 0.01 times faster than './git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'neeraj-v4'
>
> FWIW these are my actual non-fuzzy-with-strace numbers on the
> not-ramdisk, as you can see the intervals overlap, but for the first two
> the "min" time is never close to the RFC v2:
>
>         $ XDG_RUNTIME_DIR=/tmp/ghf ~/g/git.meta/benchmark.sh
>         + test -d /tmp/scalar.git
>         + git hyperfine --warmup 1 -r 3 -L rev neeraj-v4,avar-RFC -s make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/ && git ls-files -- t >repo/.git/to-add.txt -p rm -rf repo/.git/objects/* repo/.git/index ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt
>         Benchmark 1: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'neeraj-v4
>           Time (mean ± σ):      1.043 s ±  0.143 s    [User: 0.184 s, System: 0.193 s]
>           Range (min … max):    0.943 s …  1.207 s    3 runs
>
>         Benchmark 2: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'avar-RFC
>           Time (mean ± σ):     877.6 ms ± 183.4 ms    [User: 197.9 ms, System: 149.4 ms]
>           Range (min … max):   697.8 ms … 1064.4 ms    3 runs
>
>         Summary
>           './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'avar-RFC' ran
>             1.19 ± 0.30 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'neeraj-v4'
>         + git hyperfine --warmup 1 -r 3 -L rev neeraj-v4,avar-RFC -s make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/ -p rm -rf repo/.git/objects/* repo/.git/index ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .
>         Benchmark 1: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'neeraj-v4
>           Time (mean ± σ):      1.019 s ±  0.057 s    [User: 0.213 s, System: 0.194 s]
>           Range (min … max):    0.963 s …  1.076 s    3 runs
>
>         Benchmark 2: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'avar-RFC
>           Time (mean ± σ):     918.6 ms ±  34.4 ms    [User: 207.8 ms, System: 164.1 ms]
>           Range (min … max):   880.6 ms … 947.5 ms    3 runs
>
>         Summary
>           './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'avar-RFC' ran
>             1.11 ± 0.07 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'neeraj-v4'
>         + git hyperfine --warmup 1 -r 3 -L rev neeraj-v4,avar-RFC -s make CFLAGS=-O3 -p git init --bare dest.git -c rm -rf dest.git ./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack
>         Benchmark 1: ./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'neeraj-v4
>           Time (mean ± σ):      1.362 s ±  0.285 s    [User: 1.021 s, System: 0.186 s]
>           Range (min … max):    1.192 s …  1.691 s    3 runs
>
>           Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
>
>         Benchmark 2: ./git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'avar-RFC
>          ⠏ Performing warmup runs         ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ⠙ Performing warmup runs         ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  Time (mean ± σ):      1.188 s ±  0.009 s    [User: 1.025 s, System: 0.161 s]
>           Range (min … max):    1.180 s …  1.199 s    3 runs
>
>         Summary
>           './git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'avar-RFC' ran
>             1.15 ± 0.24 times faster than './git -C dest.git -c core.fsyncMethod=batch unpack-objects </tmp/scalar.git/my.pack' in 'neeraj-v4'
>
> 1. I do my git hacking on a bare metal box I rent with some friends, and
>    one of them is running one those persistent video game daemons
>    written in Java. So I think all my non-RAM I/O numbers are
>    continually fuzzed by what players are doing in Minecraft or whatever
>    that thing is...

Thanks for the numbers.  So if I'm understanding correctly, the
difference on a real disk between quarantine and non-quarantine is 20%
or so on your system?

I did my own experiment by adding a 'batch-no-quarantine' method. No
quarantine was slightly faster.
*  For 'git add' I found a very small difference (.29s vs 30s).
* For 'git stash' it was a bit bigger (.35s vs.55s).

This is with perf-lib, so we're just looking at min-times.  On the
other hand, classic fsync is 1.04s for 'git add' and 1.21s for 'git
stash', all with 500 tiny blobs.  FYI, this is measured on my laptop
running Ubuntu in WSL.

 I don't think it's worth having a knob for no-quarantine for this
small delta.  I believe a better use of time for a follow-on change
would be to implement an appendable pack format for newly-added
objects.

Thanks,
Neeraj
Neeraj Singh March 29, 2022, 5:09 p.m. UTC | #4
On Tue, Mar 29, 2022 at 4:17 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Mar 29 2022, Neeraj K. Singh via GitGitGadget wrote:
>
> > V4 changes:
> >
> >  * Make ODB transactions nestable.
> >  * Add an ODB transaction around writing out the cached tree.
> >  * Change update-index to use a more straightforward way of managing ODB
> >    transactions.
> >  * Fix missing 'local's in lib-unique-files
> >  * Add a per-iteration setup mechanism to test_perf.
> >  * Fix camelCasing in warning message.
>
> I haven't looked at the bulk of this in any detail, but:
>
> >  10:  b99b32a469c ! 12:  fdf90d45f52 core.fsyncmethod: performance tests for add and stash
> >      @@ t/perf/p3700-add.sh (new)
> >       +# core.fsyncMethod=batch mode, which is why we are testing different values
> >       +# of that setting explicitly and creating a lot of unique objects.
> >       +
> >      -+test_description="Tests performance of add"
> >      ++test_description="Tests performance of adding things to the object database"
>
> Now having both tests for "add" and "stash" in a test named p3700-add.sh
> isn't better, the rest of the perf tests are split up by command,
> perhaps just add a helper library and have both use it?
>

I was getting tired of editing two files that were nearly identical
and thought that reviewers would be tired of reading them.  At least
in the main test suite, t/t3700-add.sh covers update-index in addition
to git-add.

> And re the unaddressed feedback I ad of "why the random data"
> inhttps://lore.kernel.org/git/220326.86o81sk9ao.gmgdl@evledraar.gmail.com/
> I tried patching it on top to do what I suggested there, allowing us to
> run these against any arbitrary repository and came up with this:
>

The advantage of the random data is that it's easy to scale the number
of files and change the tree shape.  I wanted the
test_create_unique_files helper anyway, so using it here made sense.
Also, I'm quite confident that I'm really getting new objects added to
the repo with this test scheme.

> diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
> index ef6024f9897..60abd5ee076 100755
> --- a/t/perf/p3700-add.sh
> +++ b/t/perf/p3700-add.sh
> @@ -13,47 +13,26 @@ export GIT_TEST_FSYNC
>
>  . ./perf-lib.sh
>
> -. $TEST_DIRECTORY/lib-unique-files.sh
> -
> -test_perf_fresh_repo
> +test_perf_default_repo
>  test_checkout_worktree
>
> -dir_count=10
> -files_per_dir=50
> -total_files=$((dir_count * files_per_dir))
> -
> -for mode in false true batch
> +for cfg in \
> +       '-c core.fsync=-loose-object -c core.fsyncmethod=fsync' \
> +       '-c core.fsync=loose-object -c core.fsyncmethod=fsync' \
> +       '-c core.fsync=loose-object -c core.fsyncmethod=batch' \
> +       '-c core.fsyncmethod=batch'
>  do
> -       case $mode in
> -       false)
> -               FSYNC_CONFIG='-c core.fsync=-loose-object -c core.fsyncmethod=fsync'
> -               ;;
> -       true)
> -               FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=fsync'
> -               ;;
> -       batch)
> -               FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=batch'
> -               ;;
> -       esac
> -
> -       test_perf "add $total_files files (object_fsyncing=$mode)" \
> -               --setup "
> -               (rm -rf .git || 1) &&
> -               git init &&
> -               test_create_unique_files $dir_count $files_per_dir files_$mode
> -       " "
> -               git $FSYNC_CONFIG add files_$mode
> -       "
> -
> -       test_perf "stash $total_files files (object_fsyncing=$mode)" \
> -               --setup "
> -               (rm -rf .git || 1) &&
> -               git init &&
> -               test_commit first &&
> -               test_create_unique_files $dir_count $files_per_dir stash_files_$mode
> -       " "
> -               git $FSYNC_CONFIG stash push -u -- stash_files_$mode
> -       "
> +       test_perf "'git add' with '$cfg'" \
> +               --setup '
> +                       mv -v .git .git.old &&
> +                       git init .
> +               ' \
> +               --cleanup '
> +                       rm -rf .git &&
> +                       mv .git.old .git
> +               ' '
> +               git $cfg add -f -- ":!.git.old/"
> +       '
>  done
>
>  test_done
> diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
> new file mode 100755
> index 00000000000..12c489069ba
> --- /dev/null
> +++ b/t/perf/p3900-stash.sh
> @@ -0,0 +1,34 @@
> +#!/bin/sh
> +
> +test_description='performance of "git stash" with different fsync settings'
> +
> +# Fsync is normally turned off for the test suite.
> +GIT_TEST_FSYNC=1
> +export GIT_TEST_FSYNC
> +
> +. ./perf-lib.sh
> +
> +test_perf_default_repo
> +test_checkout_worktree
> +
> +for cfg in \
> +       '-c core.fsync=-loose-object -c core.fsyncmethod=fsync' \
> +       '-c core.fsync=loose-object -c core.fsyncmethod=fsync' \
> +       '-c core.fsync=loose-object -c core.fsyncmethod=batch' \
> +       '-c core.fsyncmethod=batch'
> +do
> +       test_perf "'stash push -u' with '$cfg'" \
> +               --setup '
> +                       mv -v .git .git.old &&
> +                       git init . &&
> +                       test_commit dummy
> +               ' \
> +               --cleanup '
> +                       rm -rf .git &&
> +                       mv .git.old .git
> +               ' '
> +               git $cfg stash push -a -u ":!.git.old/" ":!test*" "."
> +       '
> +done
> +
> +test_done
> diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> index a935ad622d3..24a5108f234 100644
> --- a/t/perf/perf-lib.sh
> +++ b/t/perf/perf-lib.sh
> @@ -194,6 +194,7 @@ test_wrapper_ () {
>         test_start_
>         test_prereq=
>         test_perf_setup_=
> +       test_perf_cleanup_=
>         while test $# != 0
>         do
>                 case $1 in
> @@ -205,6 +206,10 @@ test_wrapper_ () {
>                         test_perf_setup_=$2
>                         shift
>                         ;;
> +               --cleanup)
> +                       test_perf_cleanup_=$2
> +                       shift
> +                       ;;
>                 *)
>                         break
>                         ;;
> @@ -214,6 +219,7 @@ test_wrapper_ () {
>         test "$#" = 1 || BUG "test_wrapper_ needs 2 positional parameters"
>         export test_prereq
>         export test_perf_setup_
> +       export test_perf_cleanup_
>         if ! test_skip "$test_title_" "$@"
>         then
>                 base=$(basename "$0" .sh)
> @@ -256,6 +262,16 @@ test_perf_ () {
>                         test_failure_ "$@"
>                         break
>                 fi
> +               if test -n "$test_perf_cleanup_"
> +               then
> +                       say >&3 "cleanup: $test_perf_cleanup_"
> +                       if ! test_eval_ $test_perf_cleanup_
> +                       then
> +                               test_failure_ "$test_perf_cleanup_"
> +                               break
> +                       fi
> +
> +               fi
>         done
>         if test -z "$verbose"; then
>                 echo " ok"
>
>
> Here it is against Cor.git (a random small-ish repo I had laying around):
>
>         $ GIT_SKIP_TESTS='p3[79]00.[12]' GIT_PERF_MAKE_OPTS='CFLAGS=-O3' GIT_PERF_REPO=~/g/Cor/ ./run origin/master HEAD -- p3900-stash.sh
>         === Building abf474a5dd901f28013c52155411a48fd4c09922 (origin/master) ===
>             GEN git-add--interactive
>             GEN git-archimport
>             GEN git-cvsexportcommit
>             GEN git-cvsimport
>             GEN git-cvsserver
>             GEN git-send-email
>             GEN git-svn
>             GEN git-p4
>             SUBDIR templates
>         === Running 1 tests in /home/avar/g/git/t/perf/build/abf474a5dd901f28013c52155411a48fd4c09922/bin-wrappers ===
>         ok 1 # skip 'stash push -u' with '-c core.fsync=-loose-object -c core.fsyncmethod=fsync' (GIT_SKIP_TESTS)
>         ok 2 # skip 'stash push -u' with '-c core.fsync=loose-object -c core.fsyncmethod=fsync' (GIT_SKIP_TESTS)
>         perf 3 - 'stash push -u' with '-c core.fsync=loose-object -c core.fsyncmethod=batch': 1 2 3 ok
>         perf 4 - 'stash push -u' with '-c core.fsyncmethod=batch': 1 2 3 ok
>         # passed all 4 test(s)
>         1..4
>         === Building ecda9c2b029e35d239e369b875b245f45fd2a097 (HEAD) ===
>             GEN git-add--interactive
>             GEN git-archimport
>             GEN git-cvsexportcommit
>             GEN git-cvsimport
>             GEN git-cvsserver
>             GEN git-send-email
>             GEN git-svn
>             GEN git-p4
>             SUBDIR templates
>         === Running 1 tests in /home/avar/g/git/t/perf/build/ecda9c2b029e35d239e369b875b245f45fd2a097/bin-wrappers ===
>         ok 1 # skip 'stash push -u' with '-c core.fsync=-loose-object -c core.fsyncmethod=fsync' (GIT_SKIP_TESTS)
>         ok 2 # skip 'stash push -u' with '-c core.fsync=loose-object -c core.fsyncmethod=fsync' (GIT_SKIP_TESTS)
>         perf 3 - 'stash push -u' with '-c core.fsync=loose-object -c core.fsyncmethod=batch': 1 2 3 ok
>         perf 4 - 'stash push -u' with '-c core.fsyncmethod=batch': 1 2 3 ok
>         # passed all 4 test(s)
>         1..4
>         Test       origin/master     HEAD
>         ---------------------------------------------------
>         3900.3:    0.03(0.00+0.00)   0.02(0.00+0.00) -33.3%
>         3900.4:    0.02(0.00+0.00)   0.03(0.00+0.00) +50.0%
>

Something is wrong with your data here.  Or your repo is too small to
highlight the differences.

I'd suggest that if you want to write a different perf test for this
feature, that it be a follow-on change.

Thanks,
Neeraj