Message ID | 1937746df47eefecfc343e32eb9bf6c0949fb7b9.1647760561.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects | expand |
"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Neeraj Singh <neerajsi@microsoft.com> > > Add test cases to exercise batch mode for: > * 'git add' I was wondering why the obviously safe and good candidate 'git add' is not gaining plug/unplug pair in this series. It is obviously safe, unlike 'update-index', that nobody can interact with it, observe its intermediate output, and expect anything from it. I think the stupid reason of the lack of new plug/unplug is because we already had them, which is good ;-). > * 'git stash' > * 'git update-index' As I said, I suspect that we'd want to do this safely by adding a new option to "update-index" and passing it from "stash" which knows that it does not care about the intermediate state. > These tests ensure that the added data winds up in the object database. In other words, "git add $path; git rev-parse :$path" (and its cousins) would be happy? Like new object files not left hanging in a tentative object store etc. _after_ the commands finish. Good. > In this change we introduce a new test helper lib-unique-files.sh. The > goal of this library is to create a tree of files that have different > oids from any other files that may have been created in the current test > repo. This helps us avoid missing validation of an object being added due > to it already being in the repo. More on this below. > We aren't actually issuing any fsyncs in these tests, since > GIT_TEST_FSYNC is 0, but we still exercise all of the tmp_objdir logic > in bulk-checkin. Shouldn't we manually override that, if it matters? Not a suggestion but a question. > +# Create multiple files with unique contents. Takes the number of > +# directories, the number of files in each directory, and the base > +# directory. This is more honest, compared to the claim made in the proposed log message, in that the uniqueness guarantee is only among the files created by this helper. If we created other test contents without using this helper, that may crash with the ones created here. > +# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files > +# each in my_dir, all with unique > +# contents. > + > +test_create_unique_files() { Style. SP on both sides of (). I.e. test_create_unique_files () { > + test "$#" -ne 3 && BUG "3 param" > + > + local dirs=$1 > + local files=$2 > + local basedir=$3 > + local counter=0 > + test_tick > + local basedata=$test_tick I am not sure if consumption and reliance on tick is a wise thing. $basedir must be unique across all the other directories in this test repository (there is no other $basedir)---can't we key uniqueness off of it? > + rm -rf $basedir Can $basedir have any $IFS character in it? We should "$quote" it. > + for i in $(test_seq $dirs) > + do > + local dir=$basedir/dir$i > + > + mkdir -p "$dir" > + for j in $(test_seq $files) > + do > + counter=$((counter + 1)) > + echo "$basedata.$counter" >"$dir/file$j.txt" An extra SP before ">"? > + done > + done > +} There is no &&- cascade here, and we expect nothing in this to fail. Is that sensible? > +test_expect_success 'git add: core.fsyncmethod=batch' " > + test_create_unique_files 2 4 fsync-files && > + git $BATCH_CONFIGURATION add -- ./fsync-files/ && > + rm -f fsynced_files && > + git ls-files --stage fsync-files/ > fsynced_files && Style. No SP between redirection operator and its target. I.e. git ls-files --stage fsync-files/ >fsynced_files && Mixture of names-with-dash and name_with_understore looks somewhat irritating. > + test_line_count = 8 fsynced_files && The magic "8" matches "2 4" we saw earlier for create_unique_files? > + awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e A test helper that takes the name of a file that has "ls-files -s" output may prove to be useful. I dunno. > diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh > index a11d61206ad..8e2f73cc68f 100755 > --- a/t/t5300-pack-object.sh > +++ b/t/t5300-pack-object.sh > @@ -162,23 +162,25 @@ test_expect_success 'pack-objects with bogus arguments' ' > > check_unpack () { > test_when_finished "rm -rf git2" && > - git init --bare git2 && > - git -C git2 unpack-objects -n <"$1".pack && > - git -C git2 unpack-objects <"$1".pack && > - (cd .git && find objects -type f -print) | > - while read path > - do > - cmp git2/$path .git/$path || { > - echo $path differs. > - return 1 > - } > - done > + git $2 init --bare git2 && > + ( > + git $2 -C git2 unpack-objects -n <"$1".pack && > + git $2 -C git2 unpack-objects <"$1".pack && > + git $2 -C git2 cat-file --batch-check="%(objectname)" > + ) <obj-list >current && > + cmp obj-list current > } I think the change from the old "the existence and the contents of the object files must all match" to the new "cat-file should say that the objects we expect to exist indeed do" is not a bad thing. We used to only depend on the contents of the provided packfile but now we assume that obj-list file gives us the list of objects. Is that sensible? I somehow do not think so. Don't we have the corresponding "$1.idx" that we can feed to "git show-index", e.g. git show-index <"$1.pack" >expect.full && cut -d" " -f2 >expect <expect.full && ... your test in "$2", but feeding expect instead of obj-list ... test_cmp expect actual Also make sure you quote whatever is coming from outside, even if you happen to call the helper with tokens that do not need quoting in the current code. It is a good discipline to help readers. Thanks.
On Mon, Mar 21, 2022 at 11:34 AM Junio C Hamano <gitster@pobox.com> wrote: > > "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Neeraj Singh <neerajsi@microsoft.com> > > > > Add test cases to exercise batch mode for: > > * 'git add' > > I was wondering why the obviously safe and good candidate 'git add' is > not gaining plug/unplug pair in this series. It is obviously safe, > unlike 'update-index', that nobody can interact with it, observe its > intermediate output, and expect anything from it. > > I think the stupid reason of the lack of new plug/unplug is because > we already had them, which is good ;-). > > > * 'git stash' > > * 'git update-index' > > As I said, I suspect that we'd want to do this safely by adding a > new option to "update-index" and passing it from "stash" which knows > that it does not care about the intermediate state. > > > These tests ensure that the added data winds up in the object database. > > In other words, "git add $path; git rev-parse :$path" (and its > cousins) would be happy? Like new object files not left hanging in > a tentative object store etc. _after_ the commands finish. > > Good. > > > In this change we introduce a new test helper lib-unique-files.sh. The > > goal of this library is to create a tree of files that have different > > oids from any other files that may have been created in the current test > > repo. This helps us avoid missing validation of an object being added due > > to it already being in the repo. > > More on this below. To me the idea of putting the 'why' into the commit message and the 'what' into code comments makes sense, since I'd assume people looking into the history care about the why, but people making future changes would read the documentation in the comments for e.g. lib-unique-files. > > > We aren't actually issuing any fsyncs in these tests, since > > GIT_TEST_FSYNC is 0, but we still exercise all of the tmp_objdir logic > > in bulk-checkin. > > Shouldn't we manually override that, if it matters? > Not a suggestion but a question. I manually override it for the performance tests. I think it's sensible to manually override this variable for the small number of tests added in this commit so that we can exercise the underlying system calls, so I'll do that. > > +# Create multiple files with unique contents. Takes the number of > > +# directories, the number of files in each directory, and the base > > +# directory. > > This is more honest, compared to the claim made in the proposed log > message, in that the uniqueness guarantee is only among the files > created by this helper. If we created other test contents without > using this helper, that may crash with the ones created here. > I've revised this comment to indicate that the files are only unique within this test run. > > +# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files > > +# each in my_dir, all with unique > > +# contents. > > + > > +test_create_unique_files() { > > Style. SP on both sides of (). I.e. > > test_create_unique_files () { > Fixed > > + test "$#" -ne 3 && BUG "3 param" > > + > > + local dirs=$1 > > + local files=$2 > > + local basedir=$3 > > + local counter=0 > > + test_tick > > + local basedata=$test_tick > > I am not sure if consumption and reliance on tick is a wise thing. > $basedir must be unique across all the other directories in this > test repository (there is no other $basedir)---can't we key > uniqueness off of it? In the performance tests, we create sets of files with the same basedir but we want the files to have different contents since we don't blow away the repo between tests. The current approach still generates uniqueness if someone simply copy/pastes a test_create_unique_files invocation, rather than subtly failing to make new objects. > > + rm -rf $basedir > > Can $basedir have any $IFS character in it? We should "$quote" it. Fixed. > > > + for i in $(test_seq $dirs) > > + do > > + local dir=$basedir/dir$i > > + > > + mkdir -p "$dir" > > + for j in $(test_seq $files) > > + do > > + counter=$((counter + 1)) > > + echo "$basedata.$counter" >"$dir/file$j.txt" > > An extra SP before ">"? > Fixed. > > + done > > + done > > +} > > There is no &&- cascade here, and we expect nothing in this to > fail. Is that sensible? > I apologize, there's a lot of subtlety about UNIX shell scripting that I simply do not know. I put an '&&' chain in, but I might still have it wrong. > > +test_expect_success 'git add: core.fsyncmethod=batch' " > > + test_create_unique_files 2 4 fsync-files && > > + git $BATCH_CONFIGURATION add -- ./fsync-files/ && > > + rm -f fsynced_files && > > + git ls-files --stage fsync-files/ > fsynced_files && > > Style. No SP between redirection operator and its target. I.e. > > git ls-files --stage fsync-files/ >fsynced_files && > > Mixture of names-with-dash and name_with_understore looks somewhat > irritating. > Will switch to underscores for both. Also will rename the base dir to be more distinct from the list of files that we expect to see. > > + test_line_count = 8 fsynced_files && > > The magic "8" matches "2 4" we saw earlier for create_unique_files? > Will comment to explain the 8. > > + awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e > > A test helper that takes the name of a file that has "ls-files -s" output > may prove to be useful. I dunno. > > > diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh > > index a11d61206ad..8e2f73cc68f 100755 > > --- a/t/t5300-pack-object.sh > > +++ b/t/t5300-pack-object.sh > > @@ -162,23 +162,25 @@ test_expect_success 'pack-objects with bogus arguments' ' > > > > check_unpack () { > > test_when_finished "rm -rf git2" && > > - git init --bare git2 && > > - git -C git2 unpack-objects -n <"$1".pack && > > - git -C git2 unpack-objects <"$1".pack && > > - (cd .git && find objects -type f -print) | > > - while read path > > - do > > - cmp git2/$path .git/$path || { > > - echo $path differs. > > - return 1 > > - } > > - done > > + git $2 init --bare git2 && > > + ( > > + git $2 -C git2 unpack-objects -n <"$1".pack && > > + git $2 -C git2 unpack-objects <"$1".pack && > > + git $2 -C git2 cat-file --batch-check="%(objectname)" > > + ) <obj-list >current && > > + cmp obj-list current > > } > > I think the change from the old "the existence and the contents of > the object files must all match" to the new "cat-file should say > that the objects we expect to exist indeed do" is not a bad thing. > > We used to only depend on the contents of the provided packfile but > now we assume that obj-list file gives us the list of objects. Is > that sensible? I somehow do not think so. Don't we have the > corresponding "$1.idx" that we can feed to "git show-index", e.g. > I believe that "obj-list" is in some sense more authoritative, assuming arbitrary future bugs in the pack implementation. We make sure that the pack we were handed isn't missing any objects. It makes sense to accept obj-list from the outside so that check_unpack itself doesn't depend on that file name. But the previous code wouldn't catch a bug where the pack code mistakenly drops an object. > git show-index <"$1.pack" >expect.full && > cut -d" " -f2 >expect <expect.full && > ... your test in "$2", but feeding expect instead of obj-list ... > test_cmp expect actual > > Also make sure you quote whatever is coming from outside, even if > you happen to call the helper with tokens that do not need quoting > in the current code. It is a good discipline to help readers. > > Thanks. I'll take another pass over the shell code in the morning to make sure I'm following all the conventions and recommendations. Apologize for the set of mistakes. In terms of shell, I am a rookie. Thanks, -Neeraj
diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh new file mode 100644 index 00000000000..a7de4ca8512 --- /dev/null +++ b/t/lib-unique-files.sh @@ -0,0 +1,36 @@ +# Helper to create files with unique contents + + +# Create multiple files with unique contents. Takes the number of +# directories, the number of files in each directory, and the base +# directory. +# +# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files +# each in my_dir, all with unique +# contents. + +test_create_unique_files() { + test "$#" -ne 3 && BUG "3 param" + + local dirs=$1 + local files=$2 + local basedir=$3 + local counter=0 + test_tick + local basedata=$test_tick + + + rm -rf $basedir + + for i in $(test_seq $dirs) + do + local dir=$basedir/dir$i + + mkdir -p "$dir" + for j in $(test_seq $files) + do + counter=$((counter + 1)) + echo "$basedata.$counter" >"$dir/file$j.txt" + done + done +} diff --git a/t/t3700-add.sh b/t/t3700-add.sh index b1f90ba3250..1f349f52ad3 100755 --- a/t/t3700-add.sh +++ b/t/t3700-add.sh @@ -8,6 +8,8 @@ test_description='Test of git add, including the -- option.' TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh +. $TEST_DIRECTORY/lib-unique-files.sh + # Test the file mode "$1" of the file "$2" in the index. test_mode_in_index () { case "$(git ls-files -s "$2")" in @@ -34,6 +36,26 @@ test_expect_success \ 'Test that "git add -- -q" works' \ 'touch -- -q && git add -- -q' +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch' + +test_expect_success 'git add: core.fsyncmethod=batch' " + test_create_unique_files 2 4 fsync-files && + git $BATCH_CONFIGURATION add -- ./fsync-files/ && + rm -f fsynced_files && + git ls-files --stage fsync-files/ > fsynced_files && + test_line_count = 8 fsynced_files && + awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e +" + +test_expect_success 'git update-index: core.fsyncmethod=batch' " + test_create_unique_files 2 4 fsync-files2 && + find fsync-files2 ! -type d -print | xargs git $BATCH_CONFIGURATION update-index --add -- && + rm -f fsynced_files2 && + git ls-files --stage fsync-files2/ > fsynced_files2 && + test_line_count = 8 fsynced_files2 && + awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e +" + test_expect_success \ 'git add: Test that executable bit is not used if core.filemode=0' \ 'git config core.filemode 0 && diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh index 55cd77901a8..5a3996b838f 100755 --- a/t/t3903-stash.sh +++ b/t/t3903-stash.sh @@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME . ./test-lib.sh +. $TEST_DIRECTORY/lib-unique-files.sh test_expect_success 'usage on cmd and subcommand invalid option' ' test_expect_code 129 git stash --invalid-option 2>usage && @@ -1462,6 +1463,22 @@ test_expect_success 'stash handles skip-worktree entries nicely' ' git rev-parse --verify refs/stash:A.t ' + +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch' + +test_expect_success 'stash with core.fsyncmethod=batch' " + test_create_unique_files 2 4 fsync-files && + git $BATCH_CONFIGURATION stash push -u -- ./fsync-files/ && + rm -f fsynced_files && + + # The files were untracked, so use the third parent, + # which contains the untracked files + git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files && + test_line_count = 8 fsynced_files && + awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e +" + + test_expect_success 'git stash succeeds despite directory/file change' ' test_create_repo directory_file_switch_v1 && ( diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index a11d61206ad..8e2f73cc68f 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -162,23 +162,25 @@ test_expect_success 'pack-objects with bogus arguments' ' check_unpack () { test_when_finished "rm -rf git2" && - git init --bare git2 && - git -C git2 unpack-objects -n <"$1".pack && - git -C git2 unpack-objects <"$1".pack && - (cd .git && find objects -type f -print) | - while read path - do - cmp git2/$path .git/$path || { - echo $path differs. - return 1 - } - done + git $2 init --bare git2 && + ( + git $2 -C git2 unpack-objects -n <"$1".pack && + git $2 -C git2 unpack-objects <"$1".pack && + git $2 -C git2 cat-file --batch-check="%(objectname)" + ) <obj-list >current && + cmp obj-list current } test_expect_success 'unpack without delta' ' check_unpack test-1-${packname_1} ' +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch' + +test_expect_success 'unpack without delta (core.fsyncmethod=batch)' ' + check_unpack test-1-${packname_1} "$BATCH_CONFIGURATION" +' + test_expect_success 'pack with REF_DELTA' ' packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) && check_deltas stderr -gt 0 @@ -188,6 +190,10 @@ test_expect_success 'unpack with REF_DELTA' ' check_unpack test-2-${packname_2} ' +test_expect_success 'unpack with REF_DELTA (core.fsyncmethod=batch)' ' + check_unpack test-2-${packname_2} "$BATCH_CONFIGURATION" +' + test_expect_success 'pack with OFS_DELTA' ' packname_3=$(git pack-objects --progress --delta-base-offset test-3 \ <obj-list 2>stderr) && @@ -198,6 +204,10 @@ test_expect_success 'unpack with OFS_DELTA' ' check_unpack test-3-${packname_3} ' +test_expect_success 'unpack with OFS_DELTA (core.fsyncmethod=batch)' ' + check_unpack test-3-${packname_3} "$BATCH_CONFIGURATION" +' + test_expect_success 'compare delta flavors' ' perl -e '\'' defined($_ = -s $_) or die for @ARGV;