Message ID | pull.923.git.1617291666.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Builtin FSMonitor Feature | expand |
"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes: > This patch series adds a builtin FSMonitor daemon to Git. This hasn't seen much (if any) activity for a few weeks. Does that mean nobody (other than obviously the author and whoever wanted to have this feature) is interested? What does it need to get this topic unstuck? > Finally, having a builtin daemon eliminates the need for user to download > and install a third-party tool. This makes enterprise deployments simpler > since there are fewer parts to install, maintain, and updates to track. > > This RFC version includes support for Windows and MacOS file system events. > A Linux version will be submitted in a later patch series.
Hi Junio, On Fri, 16 Apr 2021, Junio C Hamano wrote: > "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > This patch series adds a builtin FSMonitor daemon to Git. > > This hasn't seen much (if any) activity for a few weeks. It actually is a good sign: I integrated this into Git for Windows (as an experimental feature) and am running with it for a couple of weeks already (in _all_ worktrees, not just the massively large ones). At first, I ran into a handful Blue Screens of Death, and I was worried that they should be attributed to FSMonitor. But it turns out that this issue was most likely caused by a Windows update, and semi-resolved with another Windows update (and only happens because I use WSL extensively). In other words, those crashes are not related to FSMonitor. So yeah, I find the lack of activity pretty good news. However, I would have hoped that this patch series would see a couple of reviews in the meantime. Since I was involved in the development of this patch series (I started it just before I got dragged into all that security work that led to v2.24.1, and never quite got back to it after that), I wondered whether it would be "self review" if I reviewed those patches, which is something I'd rather avoid. But if nobody else reviews the patches, I will. > Does that mean nobody (other than obviously the author and whoever > wanted to have this feature) is interested? The most likely reason why this does not see more reviews is that it matters most for massive worktrees, and I don't think anybody here works with those. The closest to a massive worktree I have is the `git-sdk-64` repository (which has pretty much nothing to do with source code at all, it is just a matter of convenience that this is a Git repository; Think of it as if somebody mirrored their Ubuntu installation by tracking it in a Git repository and cloning it onto all of their machines). And that is not really all that massive: $ git -C / ls-files | wc -l 162975 That's tiny compared to some worktrees I saw. But we should not mistake the needs of those on the Git mailing list (`git ls-tree -r v2.31.1 | wc -l` says we have only 3901 files/symlinks) for the needs of some of our biggest users. So I would like to respectfully ask for this patch series to be kept under consideration for `next`. > What does it need to get this topic unstuck? The same resource that you keep complaining about, and that seems to be drained more quickly than it can be replenished: reviewers. I am as guilty as the next person, of course, and it does not help that I get Cc:ed on several dozen patches seemingly every couple of days: this is just too much, and I cannot do it, so I admittedly neglect too many patch series (even the ones that I _do_ want to review, such as the `bisect-in-c` one). My inbox is seriously no fun place to visit right now. > > Finally, having a builtin daemon eliminates the need for user to download > > and install a third-party tool. This makes enterprise deployments simpler > > since there are fewer parts to install, maintain, and updates to track. > > > > This RFC version includes support for Windows and MacOS file system events. > > A Linux version will be submitted in a later patch series. I guess this is another reason why this patch series did not see many reviews: the lack of a Linux backend. And I fear that the statement "A Linux version will be submitted in a later patch series" is a bit strong, given that my original implementation of that backend does not really do its job well: it uses `inotify` and therefore requires one handle _per directory_, which in turn drains the number of file handles rather quickly when your worktree has many directories. Meaning: It fails todoes not work in the massive worktrees for which it was intended. Now, I heard rumors that there is a saner way to monitor directory trees in recent Linux kernel versions (Jeff, can you fill in where I am blanking?) and it might be a good idea to solicit volunteers to tackle this backend, so that the Linux-leaning crowd on this here mailing list is interested a bit more? Ciao, Dscho
On 4/20/21 11:27 AM, Johannes Schindelin wrote: > Hi Junio, ... >>> This RFC version includes support for Windows and MacOS file system events. >>> A Linux version will be submitted in a later patch series. > > I guess this is another reason why this patch series did not see many > reviews: the lack of a Linux backend. And I fear that the statement "A > Linux version will be submitted in a later patch series" is a bit strong, > given that my original implementation of that backend does not really do > its job well: it uses `inotify` and therefore requires one handle _per > directory_, which in turn drains the number of file handles rather quickly > when your worktree has many directories. Meaning: It fails todoes not work in the > massive worktrees for which it was intended. > > Now, I heard rumors that there is a saner way to monitor directory trees > in recent Linux kernel versions (Jeff, can you fill in where I am > blanking?) and it might be a good idea to solicit volunteers to tackle > this backend, so that the Linux-leaning crowd on this here mailing list > is interested a bit more? Yes, I removed the early inotify-based version because the kernel limits the number of inotify handles to 8k (at least on my Mint box) and that is a global limit -- shared by any process wanting to use inotify. The first monorepo that I tried it on had 120K directories in my sparse checkout... I'm told there is a newer "fanotify" facility available in newer Linux kernels that behaves more like Windows and MacOS and handles subdirectories. I intend to jump into that shortly (unless someone is already familiar with fanotify and and wants to try it). Jeff
On 4/20/2021 11:27 AM, Johannes Schindelin wrote: > Hi Junio, > > On Fri, 16 Apr 2021, Junio C Hamano wrote: > >> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes: >> >>> This patch series adds a builtin FSMonitor daemon to Git. >> >> This hasn't seen much (if any) activity for a few weeks. ... >> What does it need to get this topic unstuck? > > The same resource that you keep complaining about, and that seems to be > drained more quickly than it can be replenished: reviewers. I purposefully stayed away from reviewing the series since we are on the same team, but I have _not_ been involved in the development. At least that lets me have fresh eyes. If no external community members are willing to review it, then I will dedicate time for a careful review this week. Thanks, -Stolee
On 4/1/21 11:40 AM, Jeff Hostetler via GitGitGadget wrote: > This patch series adds a builtin FSMonitor daemon to Git. > > This daemon uses platform-specific filesystem notifications to keep track of > changes to a working directory. It also listens over the "Simple IPC" > facility for client requests and responds with a list of files/directories > that have been recently modified. ... > This RFC version includes support for Windows and MacOS file system events. > A Linux version will be submitted in a later patch series. Similarly to my message about testing the Windows performance, I repeated those tests on macOS. The same testing procedure was used, except now I'm on a MacBook Pro laptop instead of a desktop, so the CPU power is likely to be significantly less. However, I am pleased to report that the FS Monitor feature is a clear winner in all scenarios. Using the untracked cache is still highly recommended, but not necessary in order to get a speed boost from the builtin FS Montiro. Sparse Index Disabled, Untracked Cache Enabled ---------------------------------------------- Benchmark #1: none (clean) Time (mean ± σ): 3.980 s ± 0.026 s [User: 919.1 ms, System: 1891.8 ms] Range (min … max): 3.940 s … 4.028 s 10 runs Benchmark #2: builtin (clean) Time (mean ± σ): 477.9 ms ± 6.6 ms [User: 772.9 ms, System: 379.7 ms] Range (min … max): 468.1 ms … 489.5 ms 10 runs Summary 'builtin (clean)' ran 8.33 ± 0.13 times faster than 'none (clean)' Benchmark #1: none (dirty) Time (mean ± σ): 5.411 s ± 0.199 s [User: 2.993 s, System: 4.120 s] Range (min … max): 5.026 s … 5.756 s 10 runs Benchmark #2: builtin (dirty) Time (mean ± σ): 2.588 s ± 0.025 s [User: 3.752 s, System: 2.853 s] Range (min … max): 2.540 s … 2.628 s 10 runs Summary 'builtin (dirty)' ran 2.09 ± 0.08 times faster than 'none (dirty)' Sparse Index Disabled, Untracked Cache Disabled ----------------------------------------------- Benchmark #1: none (clean) Time (mean ± σ): 2.993 s ± 0.115 s [User: 1.562 s, System: 2.289 s] Range (min … max): 2.741 s … 3.167 s 10 runs Benchmark #2: builtin (clean) Time (mean ± σ): 939.4 ms ± 10.1 ms [User: 1.452 s, System: 1.519 s] Range (min … max): 925.1 ms … 961.0 ms 10 runs Summary 'builtin (clean)' ran 3.19 ± 0.13 times faster than 'none (clean)' Benchmark #1: none (dirty) Time (mean ± σ): 8.245 s ± 1.118 s [User: 3.204 s, System: 5.684 s] Range (min … max): 5.927 s … 8.985 s 10 runs Benchmark #2: builtin (dirty) Time (mean ± σ): 2.969 s ± 0.034 s [User: 3.832 s, System: 3.160 s] Range (min … max): 2.927 s … 3.023 s 10 runs Summary 'builtin (dirty)' ran 2.78 ± 0.38 times faster than 'none (dirty)' Sparse Index Enabled, Untracked Cache Enabled --------------------------------------------- Benchmark #1: none (clean) Time (mean ± σ): 1.250 s ± 0.050 s [User: 216.9 ms, System: 1836.9 ms] Range (min … max): 1.177 s … 1.300 s 10 runs Benchmark #2: builtin (clean) Time (mean ± σ): 89.3 ms ± 2.9 ms [User: 51.3 ms, System: 22.6 ms] Range (min … max): 81.9 ms … 93.5 ms 31 runs Summary 'builtin (clean)' ran 14.01 ± 0.72 times faster than 'none (clean)' Benchmark #1: none (dirty) Time (mean ± σ): 2.087 s ± 0.095 s [User: 320.9 ms, System: 3327.5 ms] Range (min … max): 1.943 s … 2.242 s 10 runs Benchmark #2: builtin (dirty) Time (mean ± σ): 233.5 ms ± 2.7 ms [User: 165.5 ms, System: 74.1 ms] Range (min … max): 227.8 ms … 237.1 ms 12 runs Summary 'builtin (dirty)' ran 8.94 ± 0.42 times faster than 'none (dirty)' Sparse Index Enabled, Untracked Cache Disabled ---------------------------------------------- Benchmark #1: none (clean) Time (mean ± σ): 1.277 s ± 0.101 s [User: 215.5 ms, System: 1877.9 ms] Range (min … max): 1.138 s … 1.458 s 10 runs Benchmark #2: builtin (clean) Time (mean ± σ): 300.0 ms ± 6.1 ms [User: 119.4 ms, System: 183.1 ms] Range (min … max): 293.0 ms … 313.2 ms 10 runs Summary 'builtin (clean)' ran 4.26 ± 0.35 times faster than 'none (clean)' Benchmark #1: none (dirty) Time (mean ± σ): 2.488 s ± 0.088 s [User: 432.6 ms, System: 3631.6 ms] Range (min … max): 2.328 s … 2.601 s 10 runs Benchmark #2: builtin (dirty) Time (mean ± σ): 636.4 ms ± 12.8 ms [User: 266.2 ms, System: 374.0 ms] Range (min … max): 624.4 ms … 671.0 ms 10 runs Summary 'builtin (dirty)' ran 3.91 ± 0.16 times faster than 'none (dirty)' Here are my results for the Git repository: Untracked Cache Enabled ----------------------- Benchmark #1: none (clean) Time (mean ± σ): 51.2 ms ± 4.0 ms [User: 12.9 ms, System: 61.2 ms] Range (min … max): 46.2 ms … 65.7 ms 54 runs Benchmark #2: builtin (clean) Time (mean ± σ): 38.6 ms ± 1.7 ms [User: 9.9 ms, System: 9.7 ms] Range (min … max): 28.6 ms … 42.4 ms 75 runs Summary 'builtin (clean)' ran 1.33 ± 0.12 times faster than 'none (clean)' Benchmark #1: none (dirty) Time (mean ± σ): 108.1 ms ± 7.2 ms [User: 27.2 ms, System: 126.9 ms] Range (min … max): 97.6 ms … 130.4 ms 25 runs Benchmark #2: builtin (dirty) Time (mean ± σ): 91.7 ms ± 3.8 ms [User: 25.4 ms, System: 27.0 ms] Range (min … max): 88.5 ms … 105.1 ms 32 runs Summary 'builtin (dirty)' ran 1.18 ± 0.09 times faster than 'none (dirty)' Untracked Cache Disabled ------------------------ Benchmark #1: none (clean) Time (mean ± σ): 59.5 ms ± 4.0 ms [User: 15.2 ms, System: 67.7 ms] Range (min … max): 55.5 ms … 71.6 ms 46 runs Benchmark #2: builtin (clean) Time (mean ± σ): 48.9 ms ± 1.0 ms [User: 12.5 ms, System: 17.3 ms] Range (min … max): 46.7 ms … 51.3 ms 58 runs Summary 'builtin (clean)' ran 1.22 ± 0.08 times faster than 'none (clean)' Benchmark #1: none (dirty) Time (mean ± σ): 124.4 ms ± 6.8 ms [User: 31.5 ms, System: 140.2 ms] Range (min … max): 116.8 ms … 140.0 ms 24 runs Benchmark #2: builtin (dirty) Time (mean ± σ): 104.1 ms ± 1.7 ms [User: 27.4 ms, System: 37.8 ms] Range (min … max): 99.7 ms … 106.6 ms 27 runs Summary 'builtin (dirty)' ran 1.19 ± 0.07 times faster than 'none (dirty)' I think it valuable to point out that in my initial tests I had forgotten to disable the Watchman-based FS Monitor hook, and the results looked even more impressive (on the small Git repository). Dropping the hook overhead is a huge benefit here. Thanks, -Stolee