Message ID | pull.1099.git.1640012469.gitgitgadget@gmail.com |
---|---|
Series | Trace2 stopwatch timers and global counters |
On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> Extend Trace2 to provide multiple "stopwatch timers" and "global counters".
>
> 3. Rationale
>
> Timers and counters are an alternative to the existing "region" and "data"
> events. The latter are intended to trace the major flow (or phases) of the
> program and possibly capture the amount of work performed within a loop, for
> example. The former are offered as a way to measure activity that is not
> localized, such as the time spent in zlib or lstat, which may be called from
> many different parts of the program.
>
> There are currently several places in the Git code where we want to measure
> such activity -- changed-path Bloom filter stats, topo-walk commit counts,
> and tree-walk counts and max-depths. A conversation in [1] suggested that we
> should investigate a more general mechanism to collect stats so that each
> instance doesn't need to recreate its own atexit handling mechanism.
>
> This is an attempt to address that and let us easily explore other areas in
> the future.
>
> This patch series does not attempt to refactor those three instances to use
> the new timers and counters. That should be a separate effort -- in part
> because we may want to retool them rather than just translate them. For
> example, rather than just translating the existing four Bloom filter counts
> (in revision.c) into Trace2 counters, we may instead want to have a "happy
> path timer" and a "sad path timer" if that would provide more insight.

I'm excited for these API features. It might be nice to have an
RFC-quality series demonstrating how these examples could work with the
new API. It makes sense to delay that in case review of this v1 leads
to recommended changes to the API.

I also like your attention to thread contexts. I think these timers
would be very interesting to use in parallel checkout. CC'ing Matheus
for his thoughts on where he would want timer summaries for that
feature.
I would probably want the per-thread summary to know if we are blocked
on one really long thread while the others finish quickly. Within that:
what are the things causing us to be slow? Is it zlib? Is it lstat()?

Thanks,
-Stolee
On Tue, Dec 21, 2021 at 11:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> >
> > 3. Rationale
> >
> > Timers and counters are an alternative to the existing "region" and "data"
> > events. The latter are intended to trace the major flow (or phases) of the
> > program and possibly capture the amount of work performed within a loop, for
> > example. The former are offered as a way to measure activity that is not
> > localized, such as the time spent in zlib or lstat, which may be called from
> > many different parts of the program.
>
> I'm excited for these API features.

Me too! This would have been very useful in some experiments I had to
run in the past. Thanks for working on it, Jeff :)

> I also like your attention to thread contexts. I think these timers
> would be very interesting to use in parallel checkout. CC'ing Matheus
> for his thoughts on where he would want timer summaries for that
> feature.

For parallel checkout, I think it would be interesting to have timer
summaries for open/close, fstat/lstat, write, and
inflation/delta-reconstruction. Perhaps the pkt-line routines too, so
that we can see how much time we spend in inter-process communication.
It would be nice to have timer information for disk reading as well
(more on that below), but I don't think that is possible: we read the
objects through mmap(), so we cannot easily isolate the actual reading
time from the decompression time :(

> I would probably want the per-thread summary to know if we
> are blocked on one really long thread while the others finish quickly.

That would be interesting. Parallel checkout actually uses
subprocesses, but I can see the per-thread summary being useful in
grep, for example. (Nevertheless, the use case you mentioned for the
timers -- evaluating the work balance in parallel checkout -- seems
very interesting.)

> Within that: what are the things causing us to be slow? Is it zlib?
> Is it lstat()?
In my tests, the bottleneck on checkout heavily depended on the
underlying storage type. On HDDs, the bottleneck was object reading
(i.e. page faults on mmap()-ed files), at about 70% to 80% of the
checkout runtime. On SSDs, reading was much faster, so CPU (i.e.
inflation) became the bottleneck, at about 50% of the runtime.
(Inflation only lost to reading when checking out from *many* loose
objects.) Finally, on NFS, file creation with open(O_CREAT | O_EXCL)
and fstat() (which makes the NFS client flush previously cached writes
to the server) were the bottlenecks, at about 40% of the total runtime
each.

These numbers come from a (sequential) `git checkout .` execution on an
empty working tree of the Linux kernel (v5.12), and they were gathered
using eBPF-based profilers. For other operations, especially ones that
require many file removals or more laborious tree merging in
unpack_trees(), I suspect the bottlenecks may change.

If anyone is interested in the flamegraphs and other plots behind these
profiling numbers, I have them at:

https://matheustavares.gitlab.io/annexes/parallel-checkout/profiling

And there is a bit more context at:

https://matheustavares.gitlab.io/posts/parallel-checkout