Message ID | pull.1802.v2.git.1727759371110.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] fsmonitor OSX: fix hangs for submodules | expand |
"Koji Nakamaru via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Koji Nakamaru <koji.nakamaru@gree.net> > > fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf > has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets > the value with trailing "/." (as repo_get_git_dir(the_repository) on > Darwin returns ".") so that fsmonitor_classify_path_absolute() returns > IS_OUTSIDE_CONE. > > In this case, fsevent_callback() doesn't update cookie_list so that > fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is > not invoked. > > As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond > that with_lock__mark_cookies_seen() should unlock, the whole daemon > hangs. > > Remove trailing "/." from state->path_gitdir_watch.buf for submodules > and add a corresponding test in t7527-builtin-fsmonitor.sh. > > Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de> > Suggested-by: Junio C Hamano <gitster@pobox.com> In none of the changes described above, I have any input to deserve such credit, though. > +start_git_in_background () { > + git "$@" & > + git_pid=$! > + git_pgid=$(ps -o pgid= -p $git_pid) > + nr_tries_left=10 > + while true > + do > + if test $nr_tries_left -eq 0 > + then > + kill -- -$git_pgid > + exit 1 > + fi > + sleep 1 > + nr_tries_left=$(($nr_tries_left - 1)) > + done >/dev/null 2>&1 & > + watchdog_pid=$! > + wait $git_pid > +} > + > +stop_git () { > + while kill -0 -- -$git_pgid > + do > + kill -- -$git_pgid > + sleep 1 > + done > +} On the "git" side you use process group because you expect that "git" would spawn subprocesses and you want to catch all of them, ... > +stop_watchdog () { > + while kill -0 $watchdog_pid > + do > + kill $watchdog_pid > + sleep 1 > + done > +} ... but "watchdog" you know is a single process, so you'd only need a single process id, is that the idea? What is the motivation behind the change in this iteration to use process group? Was it observed that leftover processes hang around if we killed only the $git_pid, or something? > +test_expect_success "submodule implicitly starts daemon by pull" ' > + test_atexit "stop_watchdog" && > + test_when_finished "stop_git && rm -rf cloned super sub" && If stop_git ever returns with non-zero status, "rm -rf" will be skipped, which I am not sure is a good idea. The whole test_when_finished would fail in such a case, so you would notice the problem right away, which is a plus, though. > + create_super super && > + create_sub sub && > + > + git -C super submodule add ../sub ./dir_1/dir_2/sub && > + git -C super commit -m "add sub" && > + git clone --recurse-submodules super cloned && > + > + git -C cloned/dir_1/dir_2/sub config core.fsmonitor true && > + set -m && I have to wonder how portable (and necessary) this is. POSIX says it shall be supported if the implementation supports the User Portability Utilities option. It also says that it was added to apply only to the UPE because it applies primarily to interactive use, not shell script applications. And our test scripts are of course not interactive. Thanks.
On Tue, Oct 1, 2024 at 8:57 PM Junio C Hamano <gitster@pobox.com> wrote: > > fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf > > has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets > > the value with trailing "/." (as repo_get_git_dir(the_repository) on > > Darwin returns ".") so that fsmonitor_classify_path_absolute() returns > > IS_OUTSIDE_CONE. > > > > In this case, fsevent_callback() doesn't update cookie_list so that > > fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is > > not invoked. > > > > As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond > > that with_lock__mark_cookies_seen() should unlock, the whole daemon > > hangs. > > > > Remove trailing "/." from state->path_gitdir_watch.buf for submodules > > and add a corresponding test in t7527-builtin-fsmonitor.sh. > > > > Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de> > > Suggested-by: Junio C Hamano <gitster@pobox.com> > > In none of the changes described above, I have any input to deserve > such credit, though. Your points are very helpful :) > > +start_git_in_background () { > > + git "$@" & > > + git_pid=$! > > + git_pgid=$(ps -o pgid= -p $git_pid) > > + nr_tries_left=10 > > + while true > > + do > > + if test $nr_tries_left -eq 0 > > + then > > + kill -- -$git_pgid > > + exit 1 > > + fi > > + sleep 1 > > + nr_tries_left=$(($nr_tries_left - 1)) > > + done >/dev/null 2>&1 & > > + watchdog_pid=$! > > + wait $git_pid > > +} > > + > > +stop_git () { > > + while kill -0 -- -$git_pgid > > + do > > + kill -- -$git_pgid > > + sleep 1 > > + done > > +} > > On the "git" side you use process group because you expect that > "git" would spawn subprocesses and you want to catch all of them, > ... > > > +stop_watchdog () { > > + while kill -0 $watchdog_pid > > + do > > + kill $watchdog_pid > > + sleep 1 > > + done > > +} > > ... but "watchdog" you know is a single process, so you'd only need > a single process id, is that the idea? Yes, that is the idea. > What is the motivation behind the change in this iteration to use > process group? Was it observed that leftover processes hang around > if we killed only the $git_pid, or something? Yes, if the issue occurs, three processes remains: git fetch --update-head-ok --recurse-submodules=on git fetch --no-prune --no-prune-tags --update-head-ok --recurse-submodules --recurse-submodules-default yes --submodule-prefix=dir_1/dir_2/sub/ git fsmonitor--daemon run --detach --ipc-threads=8 If there is no issue, only the fsmonitor process remains. > > +test_expect_success "submodule implicitly starts daemon by pull" ' > > + test_atexit "stop_watchdog" && > > + test_when_finished "stop_git && rm -rf cloned super sub" && > > If stop_git ever returns with non-zero status, "rm -rf" will be > skipped, which I am not sure is a good idea. > > The whole test_when_finished would fail in such a case, so you would > notice the problem right away, which is a plus, though. t/README discusses that test_when_finished and test_atexit differ about the "--immediate" option. As git and its subprocesses are the test target, I moved stop_git to the current place. This might be however confusing when someone later reads this test. Should we simply put stop_git and stop_watchdong in test_atexit? > > + create_super super && > > + create_sub sub && > > + > > + git -C super submodule add ../sub ./dir_1/dir_2/sub && > > + git -C super commit -m "add sub" && > > + git clone --recurse-submodules super cloned && > > + > > + git -C cloned/dir_1/dir_2/sub config core.fsmonitor true && > > + set -m && > > I have to wonder how portable (and necessary) this is. > > POSIX says it shall be supported if the implementation supports the > User Portability Utilities option. It also says that it was added > to apply only to the UPE because it applies primarily to interactive > use, not shell script applications. And our test scripts are of > course not interactive. How about the following modification? It still utilizes $git_pgid to filter processes, but avoids "set -m". diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh index 2dd1ca1a7b..23d9a7c953 100755 --- a/t/t7527-builtin-fsmonitor.sh +++ b/t/t7527-builtin-fsmonitor.sh @@ -916,7 +916,7 @@ start_git_in_background () { do if test $nr_tries_left -eq 0 then - kill -- -$git_pgid + kill $git_pid exit 1 fi sleep 1 @@ -927,10 +927,13 @@ start_git_in_background () { } stop_git () { - while kill -0 -- -$git_pgid + for p in $(ps -o pgid=,pid=,comm= | grep "^$git_pgid .*git" | sed 's/^[0-9][0-9]* \([0-9][0-9]*\) .*/\1/') do - kill -- -$git_pgid - sleep 1 + while kill -0 $p + do + kill $p + sleep 1 + done done } @@ -954,7 +957,6 @@ test_expect_success "submodule implicitly starts daemon by pull" ' git clone --recurse-submodules super cloned && git -C cloned/dir_1/dir_2/sub config core.fsmonitor true && - set -m && start_git_in_background -C cloned pull --recurse-submodules ' Koji Nakamaru
Koji Nakamaru <koji.nakamaru@gree.net> writes: >> > +test_expect_success "submodule implicitly starts daemon by pull" ' >> > + test_atexit "stop_watchdog" && >> > + test_when_finished "stop_git && rm -rf cloned super sub" && >> >> If stop_git ever returns with non-zero status, "rm -rf" will be >> skipped, which I am not sure is a good idea. >> >> The whole test_when_finished would fail in such a case, so you would >> notice the problem right away, which is a plus, though. > > t/README discusses that test_when_finished and test_atexit differ about > the "--immediate" option. As git and its subprocesses are the test > target, I moved stop_git to the current place. This might be however > confusing when someone later reads this test. Should we simply put > stop_git and stop_watchdong in test_atexit? That is not what I meant. I was merely questioning the &&-chaining that stops "rm -fr" from running if stop_git ever fails (and your earlier iteration you had multiple "rm -fr" ;-chained, not &&-chained---not using && is often more appropriate in a when_finished handler). >> > + set -m && >> >> I have to wonder how portable (and necessary) this is. >> >> POSIX says it shall be supported if the implementation supports the >> User Portability Utilities option. It also says that it was added >> to apply only to the UPE because it applies primarily to interactive >> use, not shell script applications. And our test scripts are of >> course not interactive. > > How about the following modification? It still utilizes $git_pgid to > filter processes, but avoids "set -m". Nah, your original reads much better, and the code is grabbing and using the process group information anyway (and my question about "-m" was more about "should we be relying on process group features in this test to kill them all?"). I am OK with the idea that we can assume, at least among the platforms that support fsmonitor, that sending a signal to a process group would cause the signal delivered to the member processes just as we expect. Thanks.
On Wed, Oct 2, 2024 at 3:04 AM Junio C Hamano <gitster@pobox.com> wrote: >>> > +test_expect_success "submodule implicitly starts daemon by pull" ' >>> > + test_atexit "stop_watchdog" && >>> > + test_when_finished "stop_git && rm -rf cloned super sub" && >>> >>> If stop_git ever returns with non-zero status, "rm -rf" will be >>> skipped, which I am not sure is a good idea. >>> >>> The whole test_when_finished would fail in such a case, so you would >>> notice the problem right away, which is a plus, though. >> >> t/README discusses that test_when_finished and test_atexit differ about >> the "--immediate" option. As git and its subprocesses are the test >> target, I moved stop_git to the current place. This might be however >> confusing when someone later reads this test. Should we simply put >> stop_git and stop_watchdong in test_atexit? > > That is not what I meant. > > I was merely questioning the &&-chaining that stops "rm -fr" from > running if stop_git ever fails (and your earlier iteration you had > multiple "rm -fr" ;-chained, not &&-chained---not using && is often > more appropriate in a when_finished handler). I see. I'll fix this part. >>> > + set -m && >>> >>> I have to wonder how portable (and necessary) this is. >>> >>> POSIX says it shall be supported if the implementation supports the >>> User Portability Utilities option. It also says that it was added >>> to apply only to the UPE because it applies primarily to interactive >>> use, not shell script applications. And our test scripts are of >>> course not interactive. >> >> How about the following modification? It still utilizes $git_pgid to >> filter processes, but avoids "set -m". > > Nah, your original reads much better, and the code is grabbing and > using the process group information anyway (and my question about > "-m" was more about "should we be relying on process group features > in this test to kill them all?"). > > I am OK with the idea that we can assume, at least among the > platforms that support fsmonitor, that sending a signal to a process > group would cause the signal delivered to the member processes just > as we expect. Thank you for the clarification and the support. Koji Nakamaru
Although I submitted [PATCH v3], it was incomplete about the following: On Wed, Oct 2, 2024 at 3:04 AM Junio C Hamano <gitster@pobox.com> wrote: > I am OK with the idea that we can assume, at least among the > platforms that support fsmonitor, that sending a signal to a process > group would cause the signal delivered to the member processes just > as we expect. On windows, there is no process group so the test cannot run correctly. As hangs corrected with the patch occur only for darwin, I would like to skip MINGW in the test. Is it okay? Koji Nakamaru
Koji Nakamaru <koji.nakamaru@gree.net> writes: > On windows, there is no process group so the test cannot run > correctly. As hangs corrected with the patch occur only for darwin, I > would like to skip MINGW in the test. Is it okay? Surely. But can we do so without spelling MINGW or WINDOWS out? That is, if your test requires process group features available, can we come up with a lazy prerequisite and use that to decide if we skip the test? Earlier in the discussion, you said who are left behind if we did so on systems with process groups, but I wonder what happens when we throw a signal at the top-level "git" process on Windows, and if it behaves better, perhaps we can implement stop_git differently where process groups are not available, instead of skipping the tests altogether? Thanks.
On Thu, Oct 3, 2024 at 3:14 AM Junio C Hamano <gitster@pobox.com> wrote: >> On windows, there is no process group so the test cannot run >> correctly. As hangs corrected with the patch occur only for darwin, I >> would like to skip MINGW in the test. Is it okay? > > Surely. But can we do so without spelling MINGW or WINDOWS out? > > That is, if your test requires process group features available, can > we come up with a lazy prerequisite and use that to decide if we > skip the test? I tweaked fsm-listen-win32.c to cause hangs and tested on windows. I'm sorry that simply saying "there is no process group" was not quite correct. Each mingw process has a process group and its win32 subprocesses can be killed by "kill -- -$pgid" For example, when a hang occurs, the following processes remain. PID PPID PGID WINPID TTY UID STIME COMMAND 56782 40923 56782 9484 pty0 1052296 16:23:22 /mingw64/bin/git # mingw git process 78100 0 0 12564 ? 0 16:23:23 C:\git-sdk-64\mingw64\libexec\git-core\git.exe # win32 process 86108 0 0 20572 ? 0 16:23:23 C:\git-sdk-64\mingw64\libexec\git-core\git.exe # win32 subprocess 73328 0 0 7792 ? 0 16:23:23 C:\git-sdk-64\mingw64\libexec\git-core\git.exe # win32 fsmonitor > Earlier in the discussion, you said who are left behind if we did so > on systems with process groups, but I wonder what happens when we > throw a signal at the top-level "git" process on Windows, and if it > behaves better, perhaps we can implement stop_git differently where > process groups are not available, instead of skipping the tests > altogether? If we do "kill 56782" or "kill -- -56782" for the above example, most of processes are terminated except the win32 fsmonitor. This is because the win32 fsmonitor is detached by FreeConsole(). I also tried "git fsmonitor--daemon stop". It was able to communicate with the win32 fsmonitor and the internal status of the win32 fsmonitor changed, but the win32 daemon didn't terminate. Because it's getting complicated, how about the following: * specify MINGW * note in the commit log: The test is disabled for MINGW because hangs treated with this patch occur only for Darwin and there is no simple way to terminate the win32 fsmonitor daemon that hangs. Koji Nakamaru
Koji Nakamaru <koji.nakamaru@gree.net> writes: > Because it's getting complicated, how about the following: > > * specify MINGW > * note in the commit log: > The test is disabled for MINGW because hangs treated with this patch > occur only for Darwin and there is no simple way to terminate the > win32 fsmonitor daemon that hangs. Sounds good to me. Thanks.
On Fri, Oct 4, 2024 at 2:44 AM Junio C Hamano <gitster@pobox.com> wrote: > > Because it's getting complicated, how about the following: > > > > * specify MINGW > > * note in the commit log: > > The test is disabled for MINGW because hangs treated with this patch > > occur only for Darwin and there is no simple way to terminate the > > win32 fsmonitor daemon that hangs. > > Sounds good to me. Thank you. I've submitted [PATCH v4]. Koji Nakamaru
diff --git a/builtin/fsmonitor--daemon.c b/builtin/fsmonitor--daemon.c index dce8a3b2482..e1e6b96d09e 100644 --- a/builtin/fsmonitor--daemon.c +++ b/builtin/fsmonitor--daemon.c @@ -1314,6 +1314,7 @@ static int fsmonitor_run_daemon(void) strbuf_reset(&state.path_gitdir_watch); strbuf_addstr(&state.path_gitdir_watch, absolute_path(repo_get_git_dir(the_repository))); + strbuf_strip_suffix(&state.path_gitdir_watch, "/."); state.nr_paths_watching = 2; } diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh index 730f3c7f810..2dd1ca1a7b7 100755 --- a/t/t7527-builtin-fsmonitor.sh +++ b/t/t7527-builtin-fsmonitor.sh @@ -907,6 +907,57 @@ test_expect_success "submodule absorbgitdirs implicitly starts daemon" ' test_subcommand git fsmonitor--daemon start <super-sub.trace ' +start_git_in_background () { + git "$@" & + git_pid=$! + git_pgid=$(ps -o pgid= -p $git_pid) + nr_tries_left=10 + while true + do + if test $nr_tries_left -eq 0 + then + kill -- -$git_pgid + exit 1 + fi + sleep 1 + nr_tries_left=$(($nr_tries_left - 1)) + done >/dev/null 2>&1 & + watchdog_pid=$! + wait $git_pid +} + +stop_git () { + while kill -0 -- -$git_pgid + do + kill -- -$git_pgid + sleep 1 + done +} + +stop_watchdog () { + while kill -0 $watchdog_pid + do + kill $watchdog_pid + sleep 1 + done +} + +test_expect_success "submodule implicitly starts daemon by pull" ' + test_atexit "stop_watchdog" && + test_when_finished "stop_git && rm -rf cloned super sub" && + + create_super super && + create_sub sub && + + git -C super submodule add ../sub ./dir_1/dir_2/sub && + git -C super commit -m "add sub" && + git clone --recurse-submodules super cloned && + + git -C cloned/dir_1/dir_2/sub config core.fsmonitor true && + set -m && + start_git_in_background -C cloned pull --recurse-submodules +' + # On a case-insensitive file system, confirm that the daemon # notices when the .git directory is moved/renamed/deleted # regardless of how it is spelled in the FS event.