Message ID | 1328fe72-1a27-b214-c226-d239099be673@web.de (mailing list archive)
---|---
State | New, archived
Series | Avoid spawning gzip in git archive
On Wed, Jun 15 2022, René Scharfe wrote:

> Git uses zlib for its own object store, but calls gzip when creating tgz
> archives. Add an option to perform the gzip compression for the latter
> using zlib, without depending on the external gzip binary.
>
> Plug it in by making write_block a function pointer and switching to a
> compressing variant if the filter command has the magic value "git
> archive gzip". Does that indirection slow down tar creation? Not
> really, at least not in this test:
>
> $ hyperfine -w3 -L rev HEAD,origin/main -p 'git checkout {rev} && make' \
>   './git -C ../linux archive --format=tar HEAD # {rev}'

Shameless plug: https://lore.kernel.org/git/211201.86r1aw9gbd.gmgdl@evledraar.gmail.com/

I.e. a "hyperfine" wrapper I wrote to make exactly this sort of thing
easier.

You'll find that you need less or no --warmup with it, since the
checkout flip-flopping and re-making (and resulting FS and other cache
eviction) will go away, as we'll use different "git worktree"'s for the
two "rev".

(Also, putting those on a ramdisk really helps)

> Benchmark #1: ./git -C ../linux archive --format=tar HEAD # HEAD
>   Time (mean ± σ):      4.044 s ±  0.007 s    [User: 3.901 s, System: 0.137 s]
>   Range (min … max):    4.038 s …  4.059 s    10 runs
>
> Benchmark #2: ./git -C ../linux archive --format=tar HEAD # origin/main
>   Time (mean ± σ):      4.047 s ±  0.009 s    [User: 3.903 s, System: 0.138 s]
>   Range (min … max):    4.038 s …  4.066 s    10 runs
>
> How does tgz creation perform?
>
> $ hyperfine -w3 -L command 'gzip -cn','git archive gzip' \
>   './git -c tar.tgz.command="{command}" -C ../linux archive --format=tgz HEAD'
> Benchmark #1: ./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD
>   Time (mean ± σ):     20.404 s ±  0.006 s    [User: 23.943 s, System: 0.401 s]
>   Range (min … max):   20.395 s … 20.414 s    10 runs
>
> Benchmark #2: ./git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD
>   Time (mean ± σ):     23.807 s ±  0.023 s    [User: 23.655 s, System: 0.145 s]
>   Range (min … max):   23.782 s … 23.857 s    10 runs
>
> Summary
>   './git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD' ran
>     1.17 ± 0.00 times faster than './git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD'
>
> So the internal implementation takes 17% longer on the Linux repo, but
> uses 2% less CPU time. That's because the external gzip can run in
> parallel on its own processor, while the internal one works sequentially
> and avoids the inter-process communication overhead.
>
> What are the benefits? Only an internal sequential implementation can
> offer this eco mode, and it allows avoiding the gzip(1) requirement.

I had been keeping one eye on this series, but didn't look at it in any
detail.

I found this after reading 6/6, which I think in any case could really
use some "why" summary, which seems to mostly be covered here.

I.e. it's unclear if the "drop the dependency on gzip(1)" in 6/6 is a
reference to the GZIP test dependency, or that our users are unlikely to
have "gzip(1)" on their systems.

If it's the latter I'd much rather (as a user) take a 17% wallclock
improvement over a 2% cost of CPU. I mostly care about my own time, not
that of the CPU.

Can't we have our 6/6 cake much easier and eat it too by learning a
"fallback" mode, i.e. we try to invoke gzip, and if that doesn't work
use the "internal" one?

Re the "eco mode": I also wonder how much of the overhead you're seeing for both that 17% and 2% would go away if you pin both processes to the same CPU, I can't recall the command offhand, but IIRC taskset or numactl can do that. I.e. is this really measuring IPC overhead, or I-CPU overhead on your system?
Am 15.06.22 um 22:32 schrieb Ævar Arnfjörð Bjarmason:
>
> On Wed, Jun 15 2022, René Scharfe wrote:
>
>> Git uses zlib for its own object store, but calls gzip when creating tgz
>> archives. Add an option to perform the gzip compression for the latter
>> using zlib, without depending on the external gzip binary.
>>
>> Plug it in by making write_block a function pointer and switching to a
>> compressing variant if the filter command has the magic value "git
>> archive gzip". Does that indirection slow down tar creation? Not
>> really, at least not in this test:
>>
>> $ hyperfine -w3 -L rev HEAD,origin/main -p 'git checkout {rev} && make' \
>>   './git -C ../linux archive --format=tar HEAD # {rev}'
>
> Shameless plug: https://lore.kernel.org/git/211201.86r1aw9gbd.gmgdl@evledraar.gmail.com/
>
> I.e. a "hyperfine" wrapper I wrote to make exactly this sort of thing
> easier.
>
> You'll find that you need less or no --warmup with it, since the
> checkout flip-flopping and re-making (and resulting FS and other cache
> eviction) will go away, as we'll use different "git worktree"'s for the
> two "rev".

OK, but requiring hyperfine alone is burden enough for reviewers.  I had
a try anyway and it took me a while to realize that git-hyperfine
requires setting the Git config option hyperfine.run-dir and that it
ignores it on my system.  Had to hard-code it in the script.

> (Also, putting those on a ramdisk really helps)
>
>> Benchmark #1: ./git -C ../linux archive --format=tar HEAD # HEAD
>>   Time (mean ± σ):      4.044 s ±  0.007 s    [User: 3.901 s, System: 0.137 s]
>>   Range (min … max):    4.038 s …  4.059 s    10 runs
>>
>> Benchmark #2: ./git -C ../linux archive --format=tar HEAD # origin/main
>>   Time (mean ± σ):      4.047 s ±  0.009 s    [User: 3.903 s, System: 0.138 s]
>>   Range (min … max):    4.038 s …  4.066 s    10 runs
>>
>> How does tgz creation perform?
>>
>> $ hyperfine -w3 -L command 'gzip -cn','git archive gzip' \
>>   './git -c tar.tgz.command="{command}" -C ../linux archive --format=tgz HEAD'
>> Benchmark #1: ./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD
>>   Time (mean ± σ):     20.404 s ±  0.006 s    [User: 23.943 s, System: 0.401 s]
>>   Range (min … max):   20.395 s … 20.414 s    10 runs
>>
>> Benchmark #2: ./git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD
>>   Time (mean ± σ):     23.807 s ±  0.023 s    [User: 23.655 s, System: 0.145 s]
>>   Range (min … max):   23.782 s … 23.857 s    10 runs
>>
>> Summary
>>   './git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD' ran
>>     1.17 ± 0.00 times faster than './git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD'
>>
>> So the internal implementation takes 17% longer on the Linux repo, but
>> uses 2% less CPU time. That's because the external gzip can run in
>> parallel on its own processor, while the internal one works sequentially
>> and avoids the inter-process communication overhead.
>>
>> What are the benefits? Only an internal sequential implementation can
>> offer this eco mode, and it allows avoiding the gzip(1) requirement.
>
> I had been keeping one eye on this series, but didn't look at it in any
> detail.
>
> I found this after reading 6/6, which I think in any case could really
> use some "why" summary, which seems to mostly be covered here.
>
> I.e. it's unclear if the "drop the dependency on gzip(1)" in 6/6 is a
> reference to the GZIP test dependency, or that our users are unlikely to
> have "gzip(1)" on their systems.

It's to avoid a run dependency; the build/test dependency remains.

> If it's the latter I'd much rather (as a user) take a 17% wallclock
> improvement over a 2% cost of CPU. I mostly care about my own time, not
> that of the CPU.

Understandable, and you can set tar.tgz.command='gzip -cn' to get the
old behavior.  Saving energy is a better default, though.

The runtime in the real world probably includes lots more I/O time.  The
tests above are repeated and warmed up to get consistent measurements,
but big repos are probably not fully kept in memory like that.

> Can't we have our 6/6 cake much easier and eat it too by learning a
> "fallback" mode, i.e. we try to invoke gzip, and if that doesn't work
> use the "internal" one?

Interesting idea, but I think the existing config option suffices.  E.g.
a distro could set it in the system-wide config file if/when gzip is
installed.

> Re the "eco mode": I also wonder how much of the overhead you're seeing
> for both that 17% and 2% would go away if you pin both processes to the
> same CPU, I can't recall the command offhand, but IIRC taskset or
> numactl can do that. I.e. is this really measuring IPC overhead, or
> I-CPU overhead on your system?

I'd expect that running git archive and gzip on the same CPU core takes
more wall-clock time than using zlib because inflating the object files
and deflating the archive are done sequentially in both scenarios.
Can't test it on macOS because it doesn't offer a way to pin programs to
a certain core, but e.g. someone with access to a Linux system can check
that using taskset(1).

René

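
To spell out the opt-out mentioned above: the old behavior stays one
config setting away.  A sketch, assuming gzip(1) is installed and using
--global as one possible scope (tar.tar.gz.command covers the long-form
format name, as in the tests later in the thread):

    git config --global tar.tgz.command 'gzip -cn'
    git config --global tar.tar.gz.command 'gzip -cn'
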
On Thu, Jun 16 2022, René Scharfe wrote:

> Am 15.06.22 um 22:32 schrieb Ævar Arnfjörð Bjarmason:
>> [...]
> Understandable, and you can set tar.tgz.command='gzip -cn' to get the
> old behavior. Saving energy is a better default, though.

I disagree with that in general, a big reason for why git won out over
other VCS's is that it wasn't as slow.  I think we should primarily be
interested in the time a user might end up staring at the screen.

I understand the concern to have "git archive" just work, e.g. if you
uninstall gzip(1) (although that seems rather obscure, but perhaps this
is for more minimal setups).

I don't think saving energy is a virtue, *maybe* it is, but maybe your
computer is powered by hydro, solar or nuclear instead of coal, so even
if we're taking global energy policy into account for changes to git
it's highly context dependent.

In any case, this is also true for pretty much any other git command
that might spawn processes or threads, e.g. "git grep":

    $ hyperfine -w3 -L cpus 0,0-7 'taskset --cpu-list {cpus} ./git grep foo.*bar' -r 10
    Benchmark 1: taskset --cpu-list 0 ./git grep foo.*bar
      Time (mean ± σ):      39.3 ms ±   1.2 ms    [User: 20.0 ms, System: 18.6 ms]
      Range (min … max):    38.2 ms …  41.8 ms    10 runs

    Benchmark 2: taskset --cpu-list 0-7 ./git grep foo.*bar
      Time (mean ± σ):      28.1 ms ±   1.3 ms    [User: 43.5 ms, System: 51.0 ms]
      Range (min … max):    26.6 ms …  31.2 ms    10 runs

    Summary
      'taskset --cpu-list 0-7 ./git grep foo.*bar' ran
        1.40 ± 0.08 times faster than 'taskset --cpu-list 0 ./git grep foo.*bar'

Here we use less than 1/2 the user/system time when I pin it to 1 cpu,
but we're 40% slower.

So this is a bit of a digression, but this particular thing seems much
better left to the OS or your hardware's CPU throttling policy.  To the
extent that we care, perhaps more fitting would be to have a global
core.wrapper-cmd option or something, so you could pass all git commands
through "taskset" (or your local equivalent), or just use shell aliases.

> The runtime in the real world probably includes lots more I/O time. The
> tests above are repeated and warmed up to get consistent measurements,
> but big repos are probably not fully kept in memory like that.
>
>> Can't we have our 6/6 cake much easier and eat it too by learning a
>> "fallback" mode, i.e. we try to invoke gzip, and if that doesn't work
>> use the "internal" one?
>
> Interesting idea, but I think the existing config option suffices. E.g.
> a distro could set it in the system-wide config file if/when gzip is
> installed.

I think in practice distros are unlikely to have such triggers for
"package X is installed, let's set config Y".  I mean, e.g. Debian can
do that with its packaging system, but it's expecting a lot.

Why not flip the default depending on if start_command() fails?

>> Re the "eco mode": I also wonder how much of the overhead you're seeing
>> for both that 17% and 2% would go away if you pin both processes to the
>> same CPU, I can't recall the command offhand, but IIRC taskset or
>> numactl can do that. I.e. is this really measuring IPC overhead, or
>> I-CPU overhead on your system?
>
> I'd expect that running git archive and gzip on the same CPU core takes
> more wall-clock time than using zlib because inflating the object files
> and deflating the archive are done sequentially in both scenarios.
> Can't test it on macOS because it doesn't offer a way to pin programs to
> a certain core, but e.g. someone with access to a Linux system can check
> that using taskset(1).

Here's a benchmark, this is your hyperfine command, just with taskset
added.  It's an 8-core box, so 0-7 is "all CPUs" (I think...):

    hyperfine -w3 \
        -L cpus 0,0-7 \
        -L command 'gzip -cn','git archive gzip' \
        'taskset --cpu-list {cpus} ./git -c tar.tgz.command="{command}" archive --format=tgz HEAD'

Which gives me:

    Benchmark 1: taskset --cpu-list 0 ./git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD
      Time (mean ± σ):      1.561 s ±  0.029 s    [User: 1.503 s, System: 0.058 s]
      Range (min … max):    1.522 s …  1.622 s    10 runs

    Benchmark 2: taskset --cpu-list 0-7 ./git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD
      Time (mean ± σ):      1.337 s ±  0.029 s    [User: 1.535 s, System: 0.075 s]
      Range (min … max):    1.298 s …  1.388 s    10 runs

    Benchmark 3: taskset --cpu-list 0 ./git -c tar.tgz.command="git archive gzip" archive --format=tgz HEAD
      Time (mean ± σ):      1.493 s ±  0.032 s    [User: 1.453 s, System: 0.040 s]
      Range (min … max):    1.462 s …  1.572 s    10 runs

    Benchmark 4: taskset --cpu-list 0-7 ./git -c tar.tgz.command="git archive gzip" archive --format=tgz HEAD
      Time (mean ± σ):      1.503 s ±  0.026 s    [User: 1.466 s, System: 0.036 s]
      Range (min … max):    1.469 s …  1.542 s    10 runs

    Summary
      'taskset --cpu-list 0-7 ./git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD' ran
        1.12 ± 0.03 times faster than 'taskset --cpu-list 0 ./git -c tar.tgz.command="git archive gzip" archive --format=tgz HEAD'
        1.12 ± 0.03 times faster than 'taskset --cpu-list 0-7 ./git -c tar.tgz.command="git archive gzip" archive --format=tgz HEAD'
        1.17 ± 0.03 times faster than 'taskset --cpu-list 0 ./git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD'

Which I think should control for the IPC overhead v.s. the advantage of
multicore.  I.e. we're faster with "gzip -cn" on multicore, but the
internal implementation has an advantage when it comes to running on a
single CPU.

I tried out the fallback method, memory leaks aside (needs to do a
proper cleanup) this seems to work.

Most of the diff is moving the existing code into a function:

diff --git a/archive-tar.c b/archive-tar.c
index 3d77e0f7509..a1b08812ee3 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -458,14 +458,36 @@ static void tgz_write_block(const void *data)
 	tgz_deflate(Z_NO_FLUSH);
 }
 
+static const char default_gzip_command[] = "gzip -cn";
 static const char internal_gzip_command[] = "git archive gzip";
 
-static int write_tar_filter_archive(const struct archiver *ar,
-				    struct archiver_args *args)
+static int do_internal_gzip(const struct archiver *ar,
+			    struct archiver_args *args)
 {
 #if ZLIB_VERNUM >= 0x1221
 	struct gz_header_s gzhead = { .os = 3 }; /* Unix, for reproducibility */
 #endif
+	int r;
+
+	write_block = tgz_write_block;
+	git_deflate_init_gzip(&gzstream, args->compression_level);
+#if ZLIB_VERNUM >= 0x1221
+	if (deflateSetHeader(&gzstream.z, &gzhead) != Z_OK)
+		BUG("deflateSetHeader() called too late");
+#endif
+	gzstream.next_out = outbuf;
+	gzstream.avail_out = sizeof(outbuf);
+
+	r = write_tar_archive(ar, args);
+
+	tgz_deflate(Z_FINISH);
+	git_deflate_end(&gzstream);
+	return r;
+}
+
+static int write_tar_filter_archive(const struct archiver *ar,
+				    struct archiver_args *args)
+{
 	struct strbuf cmd = STRBUF_INIT;
 	struct child_process filter = CHILD_PROCESS_INIT;
 	int r;
@@ -473,33 +495,24 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	if (!ar->filter_command)
 		BUG("tar-filter archiver called with no filter defined");
 
-	if (!strcmp(ar->filter_command, internal_gzip_command)) {
-		write_block = tgz_write_block;
-		git_deflate_init_gzip(&gzstream, args->compression_level);
-#if ZLIB_VERNUM >= 0x1221
-		if (deflateSetHeader(&gzstream.z, &gzhead) != Z_OK)
-			BUG("deflateSetHeader() called too late");
-#endif
-		gzstream.next_out = outbuf;
-		gzstream.avail_out = sizeof(outbuf);
-
-		r = write_tar_archive(ar, args);
-
-		tgz_deflate(Z_FINISH);
-		git_deflate_end(&gzstream);
-		return r;
-	}
+	if (!strcmp(ar->filter_command, internal_gzip_command))
+		return do_internal_gzip(ar, args);
 
 	strbuf_addstr(&cmd, ar->filter_command);
 	if (args->compression_level >= 0)
 		strbuf_addf(&cmd, " -%d", args->compression_level);
 
-	strvec_push(&filter.args, cmd.buf);
-	filter.use_shell = 1;
+	strvec_split(&filter.args, cmd.buf);
 	filter.in = -1;
 
-	if (start_command(&filter) < 0)
+	if (start_command(&filter) < 0) {
+		if (!strcmp(ar->filter_command, default_gzip_command)) {
+			warning_errno(_("could not start '%s' filter, falling back to '%s'"),
+				      ar->filter_command, internal_gzip_command);
+			return do_internal_gzip(ar, args);
+		}
 		die_errno(_("unable to start '%s' filter"), cmd.buf);
+	}
 	close(1);
 	if (dup2(filter.in, 1) < 0)
 		die_errno(_("unable to redirect descriptor"));

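
A rough way to exercise that fallback branch without uninstalling gzip
might be to make the PATH lookup fail for the spawned filter only.  An
untested sketch: the shell resolves "git" before the assignment takes
effect, so only the child's gzip lookup sees the empty directory:

    mkdir -p empty-bin
    PATH=$PWD/empty-bin git archive --format=tgz HEAD >fallback.tgz
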
Am 24.06.22 um 13:13 schrieb Ævar Arnfjörð Bjarmason:
>
> On Thu, Jun 16 2022, René Scharfe wrote:
>
>> Am 15.06.22 um 22:32 schrieb Ævar Arnfjörð Bjarmason:
>>> [...]
>> Understandable, and you can set tar.tgz.command='gzip -cn' to get the
>> old behavior. Saving energy is a better default, though.
>
> I disagree with that in general, a big reason for why git won out over
> other VCS's is that it wasn't as slow. I think we should primarily be
> interested in the time a user might end up staring at the screen.
>
> I understand the concern to have "git archive" just work, e.g. if you
> uninstall gzip(1) (although that seems rather obscure, but perhaps this
> is for more minimal setups).

The previous attempt came from/via Git on Windows.

> I don't think saving energy is a virtue, *maybe* it is, but maybe your
> computer is powered by hydro, solar or nuclear instead of coal, so even
> if we're taking global energy policy into account for changes to git
> it's highly context dependent.

Or a device runs on battery power and saving energy keeps it running a
bit longer.  Or it's housed in a data center and saving energy helps
reduce cooling requirements.

> In any case, this is also true for pretty much any other git command
> that might spawn processes or threads, e.g. "git grep":
>
>     $ hyperfine -w3 -L cpus 0,0-7 'taskset --cpu-list {cpus} ./git grep foo.*bar' -r 10
>     Benchmark 1: taskset --cpu-list 0 ./git grep foo.*bar
>       Time (mean ± σ):      39.3 ms ±   1.2 ms    [User: 20.0 ms, System: 18.6 ms]
>       Range (min … max):    38.2 ms …  41.8 ms    10 runs
>
>     Benchmark 2: taskset --cpu-list 0-7 ./git grep foo.*bar
>       Time (mean ± σ):      28.1 ms ±   1.3 ms    [User: 43.5 ms, System: 51.0 ms]
>       Range (min … max):    26.6 ms …  31.2 ms    10 runs
>
>     Summary
>       'taskset --cpu-list 0-7 ./git grep foo.*bar' ran
>         1.40 ± 0.08 times faster than 'taskset --cpu-list 0 ./git grep foo.*bar'
>
> Here we use less than 1/2 the user/system time when I pin it to 1 cpu,
> but we're 40% slower.
>
> So this is a bit of a digression, but this particular thing seems much
> better left to the OS or your hardware's CPU throttling policy. To the
> extent that we care, perhaps more fitting would be to have a global
> core.wrapper-cmd option or something, so you could pass all git commands
> through "taskset" (or your local equivalent), or just use shell aliases.

Not sure what conclusion to draw from these numbers.  Perhaps that
computation is not the bottleneck here (increasing the number of cores
by 700% increases speed only by 40%)?  That coordination overhead makes
up a big percentage and there might be room for improvement/tuning?

In any case, I agree we should leave scheduling decisions at runtime to
the OS.

>> The runtime in the real world probably includes lots more I/O time. The
>> tests above are repeated and warmed up to get consistent measurements,
>> but big repos are probably not fully kept in memory like that.

On top of that I guess only a few people create tgz files at all.  Most
of them I would expect to be created automatically (and cached) by sites
like kernel.org.  So I imagine people rather create tar.xz, tar.zst or
zip archives these days.  Or use git at both ends (push/pull), as they
should. ;-)  I have no data to support this guess, though.

But yeah, the tradeoff sounds a bit weird: Give 17% duration, get 2%
CPU time back -- sounds like a ripoff.  In your example below it's 12%
longer duration for 5% saved CPU time, which sounds a bit better, but
still not terribly attractive.

Look at it from a different angle: This basic sequential implementation
is better for non-interactive tgz creation due to its slightly lower CPU
usage, which we cannot achieve with any parallel process setup.  It's
easier to deploy because it doesn't need gzip.  Its runtime hit isn't
*that* big, and people interested primarily in speed should parallelize
the expensive part, deflate, not run the cheap tar creation parallel to
a single-threaded deflate.  I.e. they should already run pigz
(https://zlib.net/pigz/).

    $ hyperfine -L gz gzip,pigz -w3 'git -C ../linux archive --format=tar HEAD | {gz} -cn'
    Benchmark 1: git -C ../linux archive --format=tar HEAD | gzip -cn
      Time (mean ± σ):     20.764 s ±  0.007 s    [User: 24.119 s, System: 0.606 s]
      Range (min … max):   20.758 s … 20.781 s    10 runs

    Benchmark 2: git -C ../linux archive --format=tar HEAD | pigz -cn
      Time (mean ± σ):      6.077 s ±  0.023 s    [User: 29.708 s, System: 1.599 s]
      Range (min … max):    6.037 s …  6.125 s    10 runs

    Summary
      'git -C ../linux archive --format=tar HEAD | pigz -cn' ran
        3.42 ± 0.01 times faster than 'git -C ../linux archive --format=tar HEAD | gzip -cn'

>>> Can't we have our 6/6 cake much easier and eat it too by learning a
>>> "fallback" mode, i.e. we try to invoke gzip, and if that doesn't work
>>> use the "internal" one?
>>
>> Interesting idea, but I think the existing config option suffices. E.g.
>> a distro could set it in the system-wide config file if/when gzip is
>> installed.
>
> I think in practice distros are unlikely to have such triggers for
> "package X is installed, let's set config Y". I mean, e.g. Debian can do
> that with its packaging system, but it's expecting a lot.

I don't *expect* any reaction either way, but packagers *can* go with a
custom config if they see the need.

> Why not flip the default depending on if start_command() fails?

Because it's harder to test and support due to its more complicated
behavior, and I don't see why it would be needed.

>>> Re the "eco mode": I also wonder how much of the overhead you're seeing
>>> for both that 17% and 2% would go away if you pin both processes to the
>>> same CPU, I can't recall the command offhand, but IIRC taskset or
>>> numactl can do that. I.e. is this really measuring IPC overhead, or
>>> I-CPU overhead on your system?
>>
>> I'd expect that running git archive and gzip on the same CPU core takes
>> more wall-clock time than using zlib because inflating the object files
>> and deflating the archive are done sequentially in both scenarios.
>> Can't test it on macOS because it doesn't offer a way to pin programs to
>> a certain core, but e.g. someone with access to a Linux system can check
>> that using taskset(1).
>
> Here's a benchmark, this is your hyperfine command, just with taskset
> added.
> It's an 8-core box, so 0-7 is "all CPUs" (I think...):
>
>     hyperfine -w3 \
>         -L cpus 0,0-7 \
>         -L command 'gzip -cn','git archive gzip' \
>         'taskset --cpu-list {cpus} ./git -c tar.tgz.command="{command}" archive --format=tgz HEAD'
>
> Which gives me:
>
>     Benchmark 1: taskset --cpu-list 0 ./git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD
>       Time (mean ± σ):      1.561 s ±  0.029 s    [User: 1.503 s, System: 0.058 s]
>       Range (min … max):    1.522 s …  1.622 s    10 runs
>
>     Benchmark 2: taskset --cpu-list 0-7 ./git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD
>       Time (mean ± σ):      1.337 s ±  0.029 s    [User: 1.535 s, System: 0.075 s]
>       Range (min … max):    1.298 s …  1.388 s    10 runs
>
>     Benchmark 3: taskset --cpu-list 0 ./git -c tar.tgz.command="git archive gzip" archive --format=tgz HEAD
>       Time (mean ± σ):      1.493 s ±  0.032 s    [User: 1.453 s, System: 0.040 s]
>       Range (min … max):    1.462 s …  1.572 s    10 runs
>
>     Benchmark 4: taskset --cpu-list 0-7 ./git -c tar.tgz.command="git archive gzip" archive --format=tgz HEAD
>       Time (mean ± σ):      1.503 s ±  0.026 s    [User: 1.466 s, System: 0.036 s]
>       Range (min … max):    1.469 s …  1.542 s    10 runs
>
>     Summary
>       'taskset --cpu-list 0-7 ./git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD' ran
>         1.12 ± 0.03 times faster than 'taskset --cpu-list 0 ./git -c tar.tgz.command="git archive gzip" archive --format=tgz HEAD'
>         1.12 ± 0.03 times faster than 'taskset --cpu-list 0-7 ./git -c tar.tgz.command="git archive gzip" archive --format=tgz HEAD'
>         1.17 ± 0.03 times faster than 'taskset --cpu-list 0 ./git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD'
>
> Which I think should control for the IPC overhead v.s. the advantage of
> multicore. I.e. we're faster with "gzip -cn" on multicore, but the
> internal implementation has an advantage when it comes to running on a
> single CPU.

Right, #1, #3 and #4 all run sequentially, but #1 has the pipe overhead
to deal with as well, which adds 5 percentage points to its runtime.

René

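
Following up on the pigz aside above: since tar.<format>.command accepts
any filter and pigz is a drop-in parallel gzip (it takes the same -cn
flags used here), wiring it in is just another config value, assuming
pigz is installed:

    git config tar.tgz.command 'pigz -cn'
    git -C ../linux archive --format=tgz HEAD >parallel.tgz
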
diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index ff3f7b0344..b2d1b63d31 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -148,7 +148,8 @@ tar.<format>.command::
 	to the command (e.g., `-9`).
 +
 The `tar.gz` and `tgz` formats are defined automatically and use the
-command `gzip -cn` by default.
+command `gzip -cn` by default. An internal gzip implementation can be
+used by specifying the value `git archive gzip`.
 
 tar.<format>.remote::
 	If true, enable the format for use by remote clients via
diff --git a/archive-tar.c b/archive-tar.c
index 4e6a3deb80..53d0ef685c 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -38,11 +38,13 @@ static int write_tar_filter_archive(const struct archiver *ar,
 #define USTAR_MAX_MTIME 077777777777ULL
 #endif
 
-static void write_block(const void *buf)
+static void tar_write_block(const void *buf)
 {
 	write_or_die(1, buf, BLOCKSIZE);
 }
 
+static void (*write_block)(const void *) = tar_write_block;
+
 /* writes out the whole block, but only if it is full */
 static void write_if_needed(void)
 {
@@ -430,6 +432,34 @@ static int write_tar_archive(const struct archiver *ar,
 	return err;
 }
 
+static git_zstream gzstream;
+static unsigned char outbuf[16384];
+
+static void tgz_deflate(int flush)
+{
+	while (gzstream.avail_in || flush == Z_FINISH) {
+		int status = git_deflate(&gzstream, flush);
+		if (!gzstream.avail_out || status == Z_STREAM_END) {
+			write_or_die(1, outbuf, gzstream.next_out - outbuf);
+			gzstream.next_out = outbuf;
+			gzstream.avail_out = sizeof(outbuf);
+			if (status == Z_STREAM_END)
+				break;
+		}
+		if (status != Z_OK && status != Z_BUF_ERROR)
+			die(_("deflate error (%d)"), status);
+	}
+}
+
+static void tgz_write_block(const void *data)
+{
+	gzstream.next_in = (void *)data;
+	gzstream.avail_in = BLOCKSIZE;
+	tgz_deflate(Z_NO_FLUSH);
+}
+
+static const char internal_gzip_command[] = "git archive gzip";
+
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args)
 {
@@ -440,6 +470,19 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	if (!ar->filter_command)
 		BUG("tar-filter archiver called with no filter defined");
 
+	if (!strcmp(ar->filter_command, internal_gzip_command)) {
+		write_block = tgz_write_block;
+		git_deflate_init_gzip(&gzstream, args->compression_level);
+		gzstream.next_out = outbuf;
+		gzstream.avail_out = sizeof(outbuf);
+
+		r = write_tar_archive(ar, args);
+
+		tgz_deflate(Z_FINISH);
+		git_deflate_end(&gzstream);
+		return r;
+	}
+
 	strbuf_addstr(&cmd, ar->filter_command);
 	if (args->compression_level >= 0)
 		strbuf_addf(&cmd, " -%d", args->compression_level);
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 7f8d2ab0a7..9ac0ec67fe 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -374,6 +374,22 @@ test_expect_success GZIP 'remote tar.gz can be disabled' '
 	>remote.tar.gz
 '
 
+test_expect_success 'git archive --format=tgz (internal gzip)' '
+	test_config tar.tgz.command "git archive gzip" &&
+	git archive --format=tgz HEAD >internal_gzip.tgz
+'
+
+test_expect_success 'git archive --format=tar.gz (internal gzip)' '
+	test_config tar.tar.gz.command "git archive gzip" &&
+	git archive --format=tar.gz HEAD >internal_gzip.tar.gz &&
+	test_cmp_bin internal_gzip.tgz internal_gzip.tar.gz
+'
+
+test_expect_success GZIP 'extract tgz file (internal gzip)' '
+	gzip -d -c <internal_gzip.tgz >internal_gzip.tar &&
+	test_cmp_bin b.tar internal_gzip.tar
+'
+
 test_expect_success 'archive and :(glob)' '
 	git archive -v HEAD -- ":(glob)**/sh" >/dev/null 2>actual &&
 	cat >expect <<EOF &&

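
The new tests above boil down to an invocation like this; a manual
round-trip check under the same assumptions (gzip(1) is only needed on
the decompression side, mirroring the GZIP-prerequisite test):

    git -c tar.tgz.command='git archive gzip' archive --format=tgz HEAD >out.tgz
    git archive --format=tar HEAD >out.tar
    gzip -d -c <out.tgz | cmp - out.tar
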
Git uses zlib for its own object store, but calls gzip when creating tgz
archives.  Add an option to perform the gzip compression for the latter
using zlib, without depending on the external gzip binary.

Plug it in by making write_block a function pointer and switching to a
compressing variant if the filter command has the magic value "git
archive gzip".  Does that indirection slow down tar creation?  Not
really, at least not in this test:

$ hyperfine -w3 -L rev HEAD,origin/main -p 'git checkout {rev} && make' \
  './git -C ../linux archive --format=tar HEAD # {rev}'
Benchmark #1: ./git -C ../linux archive --format=tar HEAD # HEAD
  Time (mean ± σ):      4.044 s ±  0.007 s    [User: 3.901 s, System: 0.137 s]
  Range (min … max):    4.038 s …  4.059 s    10 runs

Benchmark #2: ./git -C ../linux archive --format=tar HEAD # origin/main
  Time (mean ± σ):      4.047 s ±  0.009 s    [User: 3.903 s, System: 0.138 s]
  Range (min … max):    4.038 s …  4.066 s    10 runs

How does tgz creation perform?

$ hyperfine -w3 -L command 'gzip -cn','git archive gzip' \
  './git -c tar.tgz.command="{command}" -C ../linux archive --format=tgz HEAD'
Benchmark #1: ./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD
  Time (mean ± σ):     20.404 s ±  0.006 s    [User: 23.943 s, System: 0.401 s]
  Range (min … max):   20.395 s … 20.414 s    10 runs

Benchmark #2: ./git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD
  Time (mean ± σ):     23.807 s ±  0.023 s    [User: 23.655 s, System: 0.145 s]
  Range (min … max):   23.782 s … 23.857 s    10 runs

Summary
  './git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD' ran
    1.17 ± 0.00 times faster than './git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD'

So the internal implementation takes 17% longer on the Linux repo, but
uses 2% less CPU time.  That's because the external gzip can run in
parallel on its own processor, while the internal one works sequentially
and avoids the inter-process communication overhead.

What are the benefits?  Only an internal sequential implementation can
offer this eco mode, and it allows avoiding the gzip(1) requirement.

This implementation uses the helper functions from our zlib.c instead of
the convenient gz* functions from zlib, because the latter don't give
the control over the generated gzip header that the next patch requires.

Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/git-archive.txt |  3 ++-
 archive-tar.c                 | 45 ++++++++++++++++++++++++++++++++++-
 t/t5000-tar-tree.sh           | 16 +++++++++++++
 3 files changed, 62 insertions(+), 2 deletions(-)

--
2.36.1
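
Since the series pins the gzip header OS byte for reproducibility (in the
follow-up patch) and the external side already uses -n to omit name and
timestamp, repeated runs on the same machine should be byte-for-byte
identical; an easy spot-check, under those assumptions:

    git -c tar.tgz.command='git archive gzip' archive --format=tgz HEAD >a.tgz
    git -c tar.tgz.command='git archive gzip' archive --format=tgz HEAD >b.tgz
    cmp a.tgz b.tgz
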