Message ID | f1ef9656fc3adf079c8e40a74baeb5356bcf1586.1625150864.git.gitgitgadget@gmail.com (mailing list archive)
State | New, archived
Series | Builtin FSMonitor Feature
On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Change p7519 to use a single "test-tool touch" command to update > the mtime on a series of (thousands) files instead of invoking > thousands of commands to update a single file. > > This is primarily for Windows where process creation is so > very slow and reduces the test run time by minutes. > > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > --- > t/perf/p7519-fsmonitor.sh | 14 ++++++-------- > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh > index 5eb5044a103..f74e6014a0a 100755 > --- a/t/perf/p7519-fsmonitor.sh > +++ b/t/perf/p7519-fsmonitor.sh > @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" ' > fi && > > mkdir 1_file 10_files 100_files 1000_files 10000_files && > - for i in $(test_seq 1 10); do touch 10_files/$i; done && > - for i in $(test_seq 1 100); do touch 100_files/$i; done && > - for i in $(test_seq 1 1000); do touch 1000_files/$i; done && > - for i in $(test_seq 1 10000); do touch 10000_files/$i; done && > + test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 && > + test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 && > + test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 && > + test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 && > + > git add 1_file 10_files 100_files 1000_files 10000_files && > git commit -qm "Add files" && > > @@ -200,15 +201,12 @@ test_fsmonitor_suite() { > # Update the mtimes on upto 100k files to make status think > # that they are dirty. For simplicity, omit any files with > # LFs (i.e. anything that ls-files thinks it needs to dquote). > - # Then fully backslash-quote the paths to capture any > - # whitespace so that they pass thru xargs properly. 
> # > test_perf_w_drop_caches "status (dirty) ($DESC)" ' > git ls-files | \ > head -100000 | \ > grep -v \" | \ > - sed '\''s/\(.\)/\\\1/g'\'' | \ > - xargs test-tool chmtime -300 && > + test-tool touch stdin && > git status > ' Did you try to replace this with some variant of: test_seq 1 10000 | xargs touch Which (depending on your xargs version) would invoke "touch" commands with however many argv items it thinks you can handle.
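Ævar's batching point can be made concrete: xargs packs as many arguments into each invocation as the platform allows, and `-n` caps that explicitly. This sketch (plain POSIX shell, not part of the patch) counts the child processes xargs spawns for 10,000 arguments at a given batch size:

```shell
# Count how many child processes xargs spawns for 10,000 arguments
# at a given -n batch size: each `sh -c 'echo x'` invocation prints
# exactly one line, so the line count equals the process count.
count_invocations () {
	seq 1 10000 | xargs -n "$1" sh -c 'echo x' sh | wc -l
}

echo "-n 100:  $(count_invocations 100) processes"   # 10000/100  = 100
echo "-n 1000: $(count_invocations 1000) processes"  # 10000/1000 = 10
```

Without `-n`, GNU xargs fits tens of thousands of short arguments into a single command line, which is why the number of `touch` processes — and thus the cost on Windows — depends on the xargs version and platform limits.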
On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Change p7519 to use a single "test-tool touch" command to update >> the mtime on a series of (thousands) files instead of invoking >> thousands of commands to update a single file. >> >> This is primarily for Windows where process creation is so >> very slow and reduces the test run time by minutes. >> >> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> >> --- >> t/perf/p7519-fsmonitor.sh | 14 ++++++-------- >> 1 file changed, 6 insertions(+), 8 deletions(-) >> >> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh >> index 5eb5044a103..f74e6014a0a 100755 >> --- a/t/perf/p7519-fsmonitor.sh >> +++ b/t/perf/p7519-fsmonitor.sh >> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" ' >> fi && >> >> mkdir 1_file 10_files 100_files 1000_files 10000_files && >> - for i in $(test_seq 1 10); do touch 10_files/$i; done && >> - for i in $(test_seq 1 100); do touch 100_files/$i; done && >> - for i in $(test_seq 1 1000); do touch 1000_files/$i; done && >> - for i in $(test_seq 1 10000); do touch 10000_files/$i; done && >> + test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 && >> + test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 && >> + test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 && >> + test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 && >> + >> git add 1_file 10_files 100_files 1000_files 10000_files && >> git commit -qm "Add files" && >> >> @@ -200,15 +201,12 @@ test_fsmonitor_suite() { >> # Update the mtimes on upto 100k files to make status think >> # that they are dirty. For simplicity, omit any files with >> # LFs (i.e. anything that ls-files thinks it needs to dquote). 
>> - # Then fully backslash-quote the paths to capture any >> - # whitespace so that they pass thru xargs properly. >> # >> test_perf_w_drop_caches "status (dirty) ($DESC)" ' >> git ls-files | \ >> head -100000 | \ >> grep -v \" | \ >> - sed '\''s/\(.\)/\\\1/g'\'' | \ >> - xargs test-tool chmtime -300 && >> + test-tool touch stdin && >> git status >> ' > > Did you try to replace this with some variant of: > > test_seq 1 10000 | xargs touch > > Which (depending on your xargs version) would invoke "touch" commands > with however many argv items it thinks you can handle. > a quick test on my Windows machine shows that test_seq 1 10000 | xargs touch takes 3.1 seconds. just a simple test_seq 1 10000 >/dev/null take 0.2 seconds. using my test-tool helper cuts that time in half. Jeff
On Tue, Jul 13, 2021 at 10:10 AM Jeff Hostetler <git@jeffhostetler.com> wrote: > > On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote: > > > > Did you try to replace this with some variant of: > > > > test_seq 1 10000 | xargs touch > > > > Which (depending on your xargs version) would invoke "touch" commands > > with however many argv items it thinks you can handle. > > a quick test on my Windows machine shows that > > test_seq 1 10000 | xargs touch > > takes 3.1 seconds. > > just a simple > > test_seq 1 10000 >/dev/null > > take 0.2 seconds. > > using my test-tool helper cuts that time in half. Yeah, test_seq is pretty bad; it's just a loop in shell. Is there a 'seq' on windows, and does using it instead of test_seq make things faster with Ævar's suggested command? I'd really like to modify test_seq to use seq when it's available and fall back to the looping-in-shell when we need to for various platforms. Maybe it'd even make sense to write a 'test-tool seq' and make test_seq use that just so we can rip out that super lame shell looping.
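Elijah's suggested fallback could look something like this — a sketch only; the real `test_seq` in t/test-lib-functions.sh handles more argument forms than the two-argument case shown here:

```shell
# Sketch: prefer the native seq(1) when it exists, fall back to a
# shell loop otherwise (hypothetical simplification of test_seq).
test_seq () {
	if command -v seq >/dev/null 2>&1
	then
		seq "$@"
	else
		i=$1
		while [ "$i" -le "$2" ]
		do
			echo "$i"
			i=$((i + 1))
		done
	fi
}

test_seq 1 5
```

Either branch prints one number per line, so callers piping into xargs behave the same way regardless of which implementation ran.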
Elijah Newren <newren@gmail.com> writes: > On Tue, Jul 13, 2021 at 10:10 AM Jeff Hostetler <git@jeffhostetler.com> wrote: >> >> a quick test on my Windows machine shows that >> >> test_seq 1 10000 | xargs touch >> >> takes 3.1 seconds. >> >> just a simple >> >> test_seq 1 10000 >/dev/null >> >> take 0.2 seconds. >> >> using my test-tool helper cuts that time in half. > > Yeah, test_seq is pretty bad; it's just a loop in shell. Is there a > 'seq' on windows, and does using it instead of test_seq make things > faster with Ævar's suggested command? Unless I am misreading Jeff's message, I do not think that makes sense. Counting to 10000 in shell loop is trivial (0.2 seconds), but letting touch invoked 10000 times to create (or smudge mtime of, but I suspect that is not what is going on here) 10000 files takes 3.1 seconds, and of course a native binary that creates 10000 files with a single invocation would be faster. > I'd really like to modify test_seq to use seq when it's available and > fall back to the looping-in-shell when we need to for various > platforms. So, if I am reading Jeff correctly, that optimizes something that is not a bottleneck. > Maybe it'd even make sense to write a 'test-tool seq' and make > test_seq use that just so we can rip out that super lame shell > looping.
On Tue, Jul 13, 2021 at 10:47 AM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > On Tue, Jul 13, 2021 at 10:10 AM Jeff Hostetler <git@jeffhostetler.com> wrote: > >> > >> a quick test on my Windows machine shows that > >> > >> test_seq 1 10000 | xargs touch > >> > >> takes 3.1 seconds. > >> > >> just a simple > >> > >> test_seq 1 10000 >/dev/null > >> > >> take 0.2 seconds. > >> > >> using my test-tool helper cuts that time in half. > > > > Yeah, test_seq is pretty bad; it's just a loop in shell. Is there a > > 'seq' on windows, and does using it instead of test_seq make things > > faster with Ævar's suggested command? > > Unless I am misreading Jeff's message, I do not think that makes > sense. Counting to 10000 in shell loop is trivial (0.2 seconds), > but letting touch invoked 10000 times to create (or smudge mtime of, > but I suspect that is not what is going on here) 10000 files takes > 3.1 seconds, and of course a native binary that creates 10000 files > with a single invocation would be faster. > > > I'd really like to modify test_seq to use seq when it's available and > > fall back to the looping-in-shell when we need to for various > > platforms. > > So, if I am reading Jeff correctly, that optimizes something that is > not a bottleneck. Oh, indeed. Sorry, I misread.
On 7/13/21 1:36 PM, Elijah Newren wrote: > On Tue, Jul 13, 2021 at 10:10 AM Jeff Hostetler <git@jeffhostetler.com> wrote: >> >> On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote: >>> >>> Did you try to replace this with some variant of: >>> >>> test_seq 1 10000 | xargs touch >>> >>> Which (depending on your xargs version) would invoke "touch" commands >>> with however many argv items it thinks you can handle. >> >> a quick test on my Windows machine shows that >> >> test_seq 1 10000 | xargs touch >> >> takes 3.1 seconds. >> >> just a simple >> >> test_seq 1 10000 >/dev/null >> >> take 0.2 seconds. >> >> using my test-tool helper cuts that time in half. > > Yeah, test_seq is pretty bad; it's just a loop in shell. Is there a > 'seq' on windows, and does using it instead of test_seq make things > faster with Ævar's suggested command? > The Git for Windows SDK bash environment does have a /usr/bin/seq which appears to be from GNU coreutils 8.32. (This is different from the version that I have on my Mac (which doesn't have a version number).) Using it: seq 1 10000 >/dev/null takes 0.04 seconds instead of 0.2. However, it doesn't help the touch. seq 1 10000 | xargs touch still takes ~3.1 seconds. FWIW, the xargs is clustering the 10,000 files into ~4 command lines, so there is a little bit of Windows process overhead, but not that much. seq 1 10000 | xargs wc -l | grep total > I'd really like to modify test_seq to use seq when it's available and > fall back to the looping-in-shell when we need to for various > platforms. > > Maybe it'd even make sense to write a 'test-tool seq' and make > test_seq use that just so we can rip out that super lame shell > looping. >
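Jeff's `wc -l | grep total` trick counts batches because each `wc` invocation that receives more than one file prints its own trailing "total" line. A self-contained version (the scratch directory and the 16 KiB limit are illustrative; `-s` forces visible clustering even on systems where ARG_MAX is large enough for a single batch):

```shell
cd "$(mktemp -d)"                 # scratch directory

# Create 10,000 empty files in as few xargs batches as the
# platform allows.
seq 1 10000 | xargs touch

# Cap the command line at 16 KiB so xargs must split the names into
# several batches; each batch's `wc` emits one "total" line, so
# counting those lines counts the batches.
seq 1 10000 | xargs -s 16384 wc -l | grep -c total
```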
On 7/1/21 10:47 AM, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Change p7519 to use a single "test-tool touch" command to update > the mtime on a series of (thousands) files instead of invoking > thousands of commands to update a single file. > > This is primarily for Windows where process creation is so > very slow and reduces the test run time by minutes. > > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > --- > t/perf/p7519-fsmonitor.sh | 14 ++++++-------- > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh > index 5eb5044a103..f74e6014a0a 100755 > --- a/t/perf/p7519-fsmonitor.sh > +++ b/t/perf/p7519-fsmonitor.sh > @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" ' > fi && > > mkdir 1_file 10_files 100_files 1000_files 10000_files && > - for i in $(test_seq 1 10); do touch 10_files/$i; done && > - for i in $(test_seq 1 100); do touch 100_files/$i; done && > - for i in $(test_seq 1 1000); do touch 1000_files/$i; done && > - for i in $(test_seq 1 10000); do touch 10000_files/$i; done && > + test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 && > + test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 && > + test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 && > + test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 && The big win in taking *minutes* off of the run time of this test was getting rid of the `for` loops and one `touch` invocation per file. So whether we keep my `test-tool touch` command or switch to `test_seq` or `seq` is open for debate. Mine seems quicker, but it is more or less round off error in the larger picture considering what we started with. I'll play with this a bit. Jeff
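The proposed `test-tool touch sequence` interface can be approximated in shell to see exactly what it replaces: one process expanding a printf-style pattern into names, with `touch` invoked as few times as xargs allows. The `touch_sequence` helper name is made up for this sketch and is not part of the patch:

```shell
cd "$(mktemp -d)"                 # scratch directory

# Rough shell equivalent of
#   test-tool touch sequence --pattern=P --start=S --count=N
touch_sequence () {
	pattern=$1 start=$2 count=$3
	i=$start
	while [ "$i" -lt $((start + count)) ]
	do
		printf "$pattern\n" "$i"
		i=$((i + 1))
	done | xargs touch
}

mkdir 10_files 100_files &&
touch_sequence "10_files/%d" 1 10 &&
touch_sequence "100_files/%d" 1 100
```

The name generation stays in-process; only the final `touch` costs process creations, which is the part `test-tool touch` eliminates entirely.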
Jeff Hostetler <git@jeffhostetler.com> writes: > FWIW, the xargs is clustering the 10,000 files into ~4 command lines, > so there is a little bit of Windows process overhead, but not that > much. > > seq 1 10000 | xargs wc -l | grep total > >> I'd really like to modify test_seq to use seq when it's available and >> fall back to the looping-in-shell when we need to for various >> platforms. >> Maybe it'd even make sense to write a 'test-tool seq' and make >> test_seq use that just so we can rip out that super lame shell >> looping. >> So what's lame in this picture is not shell, or process overhead, but I/O performance. I've seen some noises about Windows file creation performance raised as an issue when doing the initial checkout of a "git clone", and an idea floated to create a bunch of open file handles for writing in threads when checkout (really the caller that repeatedly calls entry.c:write_entry() by iterating the in-core index) starts, and write out the contents in parallel, as a workaround. When I heard it, I somehow thought it was meant as a not-so-funny joke, but from the sounds of it, the I/O performance may be so horrible as to require such a hack to be usable there. Sigh...
On Tue, Jul 13 2021, Jeff Hostetler wrote: > On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote: >> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: >> >>> From: Jeff Hostetler <jeffhost@microsoft.com> >>> >>> Change p7519 to use a single "test-tool touch" command to update >>> the mtime on a series of (thousands) files instead of invoking >>> thousands of commands to update a single file. >>> >>> This is primarily for Windows where process creation is so >>> very slow and reduces the test run time by minutes. >>> >>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> >>> --- >>> t/perf/p7519-fsmonitor.sh | 14 ++++++-------- >>> 1 file changed, 6 insertions(+), 8 deletions(-) >>> >>> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh >>> index 5eb5044a103..f74e6014a0a 100755 >>> --- a/t/perf/p7519-fsmonitor.sh >>> +++ b/t/perf/p7519-fsmonitor.sh >>> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" ' >>> fi && >>> mkdir 1_file 10_files 100_files 1000_files 10000_files && >>> - for i in $(test_seq 1 10); do touch 10_files/$i; done && >>> - for i in $(test_seq 1 100); do touch 100_files/$i; done && >>> - for i in $(test_seq 1 1000); do touch 1000_files/$i; done && >>> - for i in $(test_seq 1 10000); do touch 10000_files/$i; done && >>> + test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 && >>> + test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 && >>> + test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 && >>> + test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 && >>> + >>> git add 1_file 10_files 100_files 1000_files 10000_files && >>> git commit -qm "Add files" && >>> @@ -200,15 +201,12 @@ test_fsmonitor_suite() { >>> # Update the mtimes on upto 100k files to make status think >>> # that they are dirty. For simplicity, omit any files with >>> # LFs (i.e. anything that ls-files thinks it needs to dquote). 
>>> - # Then fully backslash-quote the paths to capture any >>> - # whitespace so that they pass thru xargs properly. >>> # >>> test_perf_w_drop_caches "status (dirty) ($DESC)" ' >>> git ls-files | \ >>> head -100000 | \ >>> grep -v \" | \ >>> - sed '\''s/\(.\)/\\\1/g'\'' | \ >>> - xargs test-tool chmtime -300 && >>> + test-tool touch stdin && >>> git status >>> ' >> Did you try to replace this with some variant of: >> test_seq 1 10000 | xargs touch >> Which (depending on your xargs version) would invoke "touch" >> commands >> with however many argv items it thinks you can handle. >> > > a quick test on my Windows machine shows that > > test_seq 1 10000 | xargs touch > > takes 3.1 seconds. > > just a simple > > test_seq 1 10000 >/dev/null > > take 0.2 seconds. > > using my test-tool helper cuts that time in half. There's what Elijah mentioned about test_seq, so maybe it's just that. But what I was suggesting was using the xargs mode where it does N arguments at a time. Does this work for you, and does it cause xargs to invoke "touch" with the relevant N number of arguments, and does it help with the performance? test_seq 1 10000 | xargs touch test_seq 1 10000 | xargs -n 10 touch test_seq 1 10000 | xargs -n 100 touch test_seq 1 10000 | xargs -n 1000 touch etc. Also I didn't notice this before, but the -300 part of "chmtime -300" was redundant before then? I.e. you're implicitly changing it to "=+0" instead with your "touch" helper, are you not?
On 7/13/21 2:07 PM, Junio C Hamano wrote: > Jeff Hostetler <git@jeffhostetler.com> writes: > >> FWIW, the xargs is clustering the 10,000 files into ~4 command lines, >> so there is a little bit of Windows process overhead, but not that >> much. >> >> seq 1 10000 | xargs wc -l | grep total >> >>> I'd really like to modify test_seq to use seq when it's available and >>> fall back to the looping-in-shell when we need to for various >>> platforms. >>> Maybe it'd even make sense to write a 'test-tool seq' and make >>> test_seq use that just so we can rip out that super lame shell >>> looping. >>> > > So what lame in this picture is not shell, or process overhead, but > I/O performance. > > I've seen some noises about Windows file creation performance raised > as an issue when doing initial checkout followed by "git clone", and > an idea floated to create a bunch of open file handles for writing > in threads when checkout (really the caller that repeatedly calls > entry.c:write_entry() by iterating the in-core index) starts, and > write out the contents in parallel, as a workaround. When I heard > it, I somehow thought it was meant as a not-so-funny joke, but from > the sounds of it, the I/O performance may be so horrible to require > such a hack to be usable there. Sigh... > Yes, there are some things here (that I believe to be I/O related) on Windows that I want to look at when I wrap up FSMonitor. And yes, some of them sound pretty stupid. Jeff
On 7/13/21 2:18 PM, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Jul 13 2021, Jeff Hostetler wrote: > >> On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote: >>> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: >>> >>>> From: Jeff Hostetler <jeffhost@microsoft.com> >>>> >>>> Change p7519 to use a single "test-tool touch" command to update >>>> the mtime on a series of (thousands) files instead of invoking >>>> thousands of commands to update a single file. >>>> >>>> This is primarily for Windows where process creation is so >>>> very slow and reduces the test run time by minutes. >>>> >>>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> >>>> --- >>>> t/perf/p7519-fsmonitor.sh | 14 ++++++-------- >>>> 1 file changed, 6 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh >>>> index 5eb5044a103..f74e6014a0a 100755 >>>> --- a/t/perf/p7519-fsmonitor.sh >>>> +++ b/t/perf/p7519-fsmonitor.sh >>>> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" ' >>>> fi && >>>> mkdir 1_file 10_files 100_files 1000_files 10000_files && >>>> - for i in $(test_seq 1 10); do touch 10_files/$i; done && >>>> - for i in $(test_seq 1 100); do touch 100_files/$i; done && >>>> - for i in $(test_seq 1 1000); do touch 1000_files/$i; done && >>>> - for i in $(test_seq 1 10000); do touch 10000_files/$i; done && >>>> + test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 && >>>> + test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 && >>>> + test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 && >>>> + test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 && >>>> + >>>> git add 1_file 10_files 100_files 1000_files 10000_files && >>>> git commit -qm "Add files" && >>>> @@ -200,15 +201,12 @@ test_fsmonitor_suite() { >>>> # Update the mtimes on upto 100k files to make status think >>>> # that they are dirty. 
For simplicity, omit any files with >>>> # LFs (i.e. anything that ls-files thinks it needs to dquote). >>>> - # Then fully backslash-quote the paths to capture any >>>> - # whitespace so that they pass thru xargs properly. >>>> # >>>> test_perf_w_drop_caches "status (dirty) ($DESC)" ' >>>> git ls-files | \ >>>> head -100000 | \ >>>> grep -v \" | \ >>>> - sed '\''s/\(.\)/\\\1/g'\'' | \ >>>> - xargs test-tool chmtime -300 && >>>> + test-tool touch stdin && >>>> git status >>>> ' >>> Did you try to replace this with some variant of: >>> test_seq 1 10000 | xargs touch >>> Which (depending on your xargs version) would invoke "touch" >>> commands >>> with however many argv items it thinks you can handle. >>> >> >> a quick test on my Windows machine shows that >> >> test_seq 1 10000 | xargs touch >> >> takes 3.1 seconds. >> >> just a simple >> >> test_seq 1 10000 >/dev/null >> >> take 0.2 seconds. >> >> using my test-tool helper cuts that time in half. > > There's what Elijah mentioned about test_seq, so maybe it's just that. > > But what I was suggesting was using the xargs mode where it does N > arguments at a time. > > Does this work for you, and does it cause xargs to invoke "touch" with > the relevant N number of arguments, and does it help with the > performance? > > test_seq 1 10000 | xargs touch > test_seq 1 10000 | xargs -n 10 touch > test_seq 1 10000 | xargs -n 100 touch > test_seq 1 10000 | xargs -n 1000 touch The GFW SDK version of xargs does have `-n N` and it does work as advertised. And it does slow down things considerably. Letting it do ~2500 per command in 4 commands took the 3.1 seconds listed above. Add a -n 100 to it takes 5.7 seconds, so process creation overhead is a factor here. > > etc. > > Also I didn't notice this before, but the -300 part of "chmtime -300" > was redundant before then? I.e. you're implicitly changing it to "=+0" > instead with your "touch" helper, are you not? > Right. I'm changing it to the current time. Jeff
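Jeff's numbers can be reproduced with a rough harness along these lines (GNU `date +%s%N` is assumed for sub-second timing; absolute results are platform-dependent, and on Linux the gap is far smaller than the ~2.6s difference he sees on Windows):

```shell
cd "$(mktemp -d)"                 # scratch directory

# Touch 10,000 files at decreasing xargs batch sizes; smaller -n
# means more touch processes, which is the overhead being measured.
for n in 10000 1000 100
do
	start=$(date +%s%N)
	seq 1 10000 | xargs -n "$n" touch
	end=$(date +%s%N)
	echo "-n $n: $(( (end - start) / 1000000 )) ms"
done
```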
On Tue, Jul 13 2021, Jeff Hostetler wrote: > On 7/13/21 2:18 PM, Ævar Arnfjörð Bjarmason wrote: >> On Tue, Jul 13 2021, Jeff Hostetler wrote: >> >>> On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote: >>>> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: >>>> >>>>> From: Jeff Hostetler <jeffhost@microsoft.com> >>>>> >>>>> Change p7519 to use a single "test-tool touch" command to update >>>>> the mtime on a series of (thousands) files instead of invoking >>>>> thousands of commands to update a single file. >>>>> >>>>> This is primarily for Windows where process creation is so >>>>> very slow and reduces the test run time by minutes. >>>>> >>>>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> >>>>> --- >>>>> t/perf/p7519-fsmonitor.sh | 14 ++++++-------- >>>>> 1 file changed, 6 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh >>>>> index 5eb5044a103..f74e6014a0a 100755 >>>>> --- a/t/perf/p7519-fsmonitor.sh >>>>> +++ b/t/perf/p7519-fsmonitor.sh >>>>> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" ' >>>>> fi && >>>>> mkdir 1_file 10_files 100_files 1000_files 10000_files && >>>>> - for i in $(test_seq 1 10); do touch 10_files/$i; done && >>>>> - for i in $(test_seq 1 100); do touch 100_files/$i; done && >>>>> - for i in $(test_seq 1 1000); do touch 1000_files/$i; done && >>>>> - for i in $(test_seq 1 10000); do touch 10000_files/$i; done && >>>>> + test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 && >>>>> + test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 && >>>>> + test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 && >>>>> + test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 && >>>>> + >>>>> git add 1_file 10_files 100_files 1000_files 10000_files && >>>>> git commit -qm "Add files" && >>>>> @@ -200,15 +201,12 @@ test_fsmonitor_suite() { >>>>> # Update the mtimes on upto 100k files to 
make status think >>>>> # that they are dirty. For simplicity, omit any files with >>>>> # LFs (i.e. anything that ls-files thinks it needs to dquote). >>>>> - # Then fully backslash-quote the paths to capture any >>>>> - # whitespace so that they pass thru xargs properly. >>>>> # >>>>> test_perf_w_drop_caches "status (dirty) ($DESC)" ' >>>>> git ls-files | \ >>>>> head -100000 | \ >>>>> grep -v \" | \ >>>>> - sed '\''s/\(.\)/\\\1/g'\'' | \ >>>>> - xargs test-tool chmtime -300 && >>>>> + test-tool touch stdin && >>>>> git status >>>>> ' >>>> Did you try to replace this with some variant of: >>>> test_seq 1 10000 | xargs touch >>>> Which (depending on your xargs version) would invoke "touch" >>>> commands >>>> with however many argv items it thinks you can handle. >>>> >>> >>> a quick test on my Windows machine shows that >>> >>> test_seq 1 10000 | xargs touch >>> >>> takes 3.1 seconds. >>> >>> just a simple >>> >>> test_seq 1 10000 >/dev/null >>> >>> take 0.2 seconds. >>> >>> using my test-tool helper cuts that time in half. >> There's what Elijah mentioned about test_seq, so maybe it's just >> that. >> But what I was suggesting was using the xargs mode where it does N >> arguments at a time. >> Does this work for you, and does it cause xargs to invoke "touch" >> with >> the relevant N number of arguments, and does it help with the >> performance? >> test_seq 1 10000 | xargs touch >> test_seq 1 10000 | xargs -n 10 touch >> test_seq 1 10000 | xargs -n 100 touch >> test_seq 1 10000 | xargs -n 1000 touch > > The GFW SDK version of xargs does have `-n N` and it does work as > advertised. And it does slow down things considerably. Letting it > do ~2500 per command in 4 commands took the 3.1 seconds listed above. > > Add a -n 100 to it takes 5.7 seconds, so process creation overhead > is a factor here. Doesn't -n 2500 being faster than -n 100 suggest the opposite of process overhead being the deciding factor? 
With -n 2500 you'll invoke 4 touch processes, so one takes 3.1/4 =~ 0.8s to run, whereas with -n 100 you invoke 100 of them, so if the overall time is then 5.7 seconds that's 5.7/100 =~ 0.06s. Or am I misunderstanding you, or does some implicit parallelism kick in with that version of xargs depending on -n? >> etc. >> Also I didn't notice this before, but the -300 part of "chmtime >> -300" >> was redundant before then? I.e. you're implicitly changing it to "=+0" >> instead with your "touch" helper, are you not? >> > > Right. I'm changing it to the current time. If that "while we're at it change the behavior of the test" is wanted I think it should be called out in the commit message. Right now it looks like it might be an unintentional regression in the test.
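The division being argued over, worked out explicitly with the timings reported earlier in the thread (3.1s for the default ~4-process batching, 5.7s for `-n 100`):

```shell
# Per-process and marginal costs from the thread's measurements:
# default batching -> 4 processes in 3.1s; -n 100 -> 100 processes
# in 5.7s, on the same 10,000 files.
awk 'BEGIN {
	printf "4 processes:   %.3f s each\n", 3.1 / 4
	printf "100 processes: %.3f s each\n", 5.7 / 100
	printf "96 extra processes cost %.1f s, i.e. %.3f s each\n",
		5.7 - 3.1, (5.7 - 3.1) / 96
}'
```

Dividing total elapsed time by process count folds the fixed I/O work (touching all 10,000 files) into the per-process figure; the marginal cost of the extra processes — roughly 27 ms each here — is the cleaner estimate of pure process-creation overhead that the two readings disagree about.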
diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh index 5eb5044a103..f74e6014a0a 100755 --- a/t/perf/p7519-fsmonitor.sh +++ b/t/perf/p7519-fsmonitor.sh @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" ' fi && mkdir 1_file 10_files 100_files 1000_files 10000_files && - for i in $(test_seq 1 10); do touch 10_files/$i; done && - for i in $(test_seq 1 100); do touch 100_files/$i; done && - for i in $(test_seq 1 1000); do touch 1000_files/$i; done && - for i in $(test_seq 1 10000); do touch 10000_files/$i; done && + test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 && + test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 && + test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 && + test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 && + git add 1_file 10_files 100_files 1000_files 10000_files && git commit -qm "Add files" && @@ -200,15 +201,12 @@ test_fsmonitor_suite() { # Update the mtimes on upto 100k files to make status think # that they are dirty. For simplicity, omit any files with # LFs (i.e. anything that ls-files thinks it needs to dquote). - # Then fully backslash-quote the paths to capture any - # whitespace so that they pass thru xargs properly. # test_perf_w_drop_caches "status (dirty) ($DESC)" ' git ls-files | \ head -100000 | \ grep -v \" | \ - sed '\''s/\(.\)/\\\1/g'\'' | \ - xargs test-tool chmtime -300 && + test-tool touch stdin && git status '