Message ID | YvZbGAf+82WtNXcJ@danh.dev (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | t7527 intermittent failure on macOS APFS and possible fix | expand |
Đoàn Trần Công Danh <congdanhqx@gmail.com> writes: > Running t7527 on macOS with encrypted APFS filesystem. > I observes intermittent failure, however, when I manually check the > test cases, they're all passed. > > I suspected fileystem caching issue. > I added those sync-s into test steps and the test pass. > I'm not sure if this is the intending "fix" for the tests > since we're testing the fsmonitor with t7527. > > Please advise! fsmonitor experts, please do. My gut feeling is that, unless the production code internally calls the equivalent of "sync" done here and the failure in tests are coming from the fact that the "sync" is missing in "test-tool fsmonitor-client" (i.e. test-tool does not emulate the real world closely enough and fails in cases where the machinery does not fail in the real world), these "sync" calls only sweep the problem under the rug. > P/S: When debugging, I also found out that: > "test-tool fsmonitor-client query" doesn't write the final newline > character, thus making the output harder to read. The diff also have > the final newline added. > > ----- 8< ------- > diff --git a/t/helper/test-fsmonitor-client.c b/t/helper/test-fsmonitor-client.c > index 54a4856c48..98d6cf1440 100644 > --- a/t/helper/test-fsmonitor-client.c > +++ b/t/helper/test-fsmonitor-client.c > @@ -55,6 +55,7 @@ static int do_send_query(const char *token) > > write_in_full(1, answer.buf, answer.len); > strbuf_release(&answer); > + write_in_full(1, "\n", 1); > > return 0; > } > @@ -77,6 +78,7 @@ static int do_send_flush(void) > > write_in_full(1, answer.buf, answer.len); > strbuf_release(&answer); > + write_in_full(1, "\n", 1); > > return 0; > } Aren't these protocol drivers? If the protocol is defined without the trailing LF, would it make sense to update only the sending end to do this? Or does the protocol makes it clear that a trailing LF, or lack of it, should be tolerated by all the implementations? If we are absolutely sure that no implementation of the other side will get upset by seeing an extra LF, It would be fine, but as the original code wants to call write_in_full(), it would be more preferrable to do it this way instead, I suspect. + strbuf_complete(&answer, '\n'); write_in_full(1, answer.buf, answer.len); strbuf_release(&answer);
On 2022-08-12 11:08:30-0700, Junio C Hamano <gitster@pobox.com> wrote: > Đoàn Trần Công Danh <congdanhqx@gmail.com> writes: > > > Running t7527 on macOS with encrypted APFS filesystem. > > I observes intermittent failure, however, when I manually check the > > test cases, they're all passed. > > > > I suspected fileystem caching issue. > > I added those sync-s into test steps and the test pass. > > I'm not sure if this is the intending "fix" for the tests > > since we're testing the fsmonitor with t7527. > > > > Please advise! > > fsmonitor experts, please do. > > My gut feeling is that, unless the production code internally calls > the equivalent of "sync" done here and the failure in tests are > coming from the fact that the "sync" is missing in "test-tool > fsmonitor-client" (i.e. test-tool does not emulate the real world > closely enough and fails in cases where the machinery does not fail > in the real world), these "sync" calls only sweep the problem under > the rug. It's my gut feeling, too. Anyway, I'm running the test again, too confirm your suggestion on t/helper/test-fsmonitor-client.c t7527.63 (Unicode nfc/nfd) also failed intermittently, too. Here is the content of unicode.trace: ---- 8< ---- $ cat unicode.trace statfs('/path/to/src/git/t/trash directory.t7527-builtin-fsmonitor/test_unicode') [type 0x0000001a][flags 0x04909080] 'apfs' statfs('/path/to/src/git/t/trash directory.t7527-builtin-fsmonitor/test_unicode') [type 0x0000001a][flags 0x04909080] 'apfs' Watching: worktree '/path/to/src/git/t/trash directory.t7527-builtin-fsmonitor/test_unicode' statfs('/path/to/src/git/t/trash directory.t7527-builtin-fsmonitor/test_unicode') [type 0x0000001a][flags 0x04909080] 'apfs' requested token: quit ---------- >8 ------------------ > > > P/S: When debugging, I also found out that: > > "test-tool fsmonitor-client query" doesn't write the final newline > > character, thus making the output harder to read. The diff also have > > the final newline added. > > > > ----- 8< ------- > > diff --git a/t/helper/test-fsmonitor-client.c b/t/helper/test-fsmonitor-client.c > > index 54a4856c48..98d6cf1440 100644 > > --- a/t/helper/test-fsmonitor-client.c > > +++ b/t/helper/test-fsmonitor-client.c > > @@ -55,6 +55,7 @@ static int do_send_query(const char *token) > > > > write_in_full(1, answer.buf, answer.len); > > strbuf_release(&answer); > > + write_in_full(1, "\n", 1); > > > > return 0; > > } > > @@ -77,6 +78,7 @@ static int do_send_flush(void) > > > > write_in_full(1, answer.buf, answer.len); > > strbuf_release(&answer); > > + write_in_full(1, "\n", 1); > > > > return 0; > > } > > Aren't these protocol drivers? The "answer" strbuf is the response from fsmonitor daemon, I think. The write_in_full to fd 1 is test-tool writes down the answer to stdout. > If the protocol is defined without > the trailing LF, would it make sense to update only the sending end > to do this? Or does the protocol makes it clear that a trailing LF, > or lack of it, should be tolerated by all the implementations? > > If we are absolutely sure that no implementation of the other side > will get upset by seeing an extra LF, It would be fine, but as the > original code wants to call write_in_full(), it would be more > preferrable to do it this way instead, I suspect. > > + strbuf_complete(&answer, '\n'); > write_in_full(1, answer.buf, answer.len); > strbuf_release(&answer); This could work, since we don't send "answer" back to fsmonitor-daemon.
Đoàn Trần Công Danh <congdanhqx@gmail.com> writes: >> Aren't these protocol drivers? > > The "answer" strbuf is the response from fsmonitor daemon, I think. > The write_in_full to fd 1 is test-tool writes down the answer to > stdout. Ahh, then please scratch everything I said about this part. Your original implementation makes it more clear that we are "dumping" what we got from the daemon, and then adding LF for our own readability. I somehow mistook the code to be preparing and sending protocol message to the other end (iow, fd #1 was connected to "them"). Sorry about the gotcha, and thanks for straightening me out.
On Fri, Aug 12, 2022 at 08:52:24PM +0700, Đoàn Trần Công Danh wrote: > Running t7527 on macOS with encrypted APFS filesystem. > I observes intermittent failure, however, when I manually check the > test cases, they're all passed. > > I suspected fileystem caching issue. > I added those sync-s into test steps and the test pass. > I'm not sure if this is the intending "fix" for the tests > since we're testing the fsmonitor with t7527. I don't know anything about your problem area, but some tools in our test suite that might help you get better results with an intermittent problem (if you didn't already know about them): - add: GIT_TEST_OPTS = -x --verbose-log into config.mak. Then when a "make test" run fails, you can see the log in t/test-results/t7527-*.out. - try running: ./t7527-* --stress which will run a bunch of instances simultaneously, with the idea that the load will introduce timing issues that will trigger the bug. The failed instance will produce a detailed log on stdout. That might help you get more output from the failing cases. -Peff
On 8/12/22 9:52 AM, Đoàn Trần Công Danh wrote: > Hello, > > Running t7527 on macOS with encrypted APFS filesystem. > I observes intermittent failure, however, when I manually check the > test cases, they're all passed. > > I suspected fileystem caching issue. > I added those sync-s into test steps and the test pass. > I'm not sure if this is the intending "fix" for the tests > since we're testing the fsmonitor with t7527. > > Please advise! > > P/S: When debugging, I also found out that: > "test-tool fsmonitor-client query" doesn't write the final newline > character, thus making the output harder to read. The diff also have > the final newline added. Thanks for the report. Sorry for the delay, I'm still digging thru my email backlog after a long vacation. I'll take a look at this and get back to you shortly. Thanks, Jeff
diff --git a/t/helper/test-fsmonitor-client.c b/t/helper/test-fsmonitor-client.c index 54a4856c48..98d6cf1440 100644 --- a/t/helper/test-fsmonitor-client.c +++ b/t/helper/test-fsmonitor-client.c @@ -55,6 +55,7 @@ static int do_send_query(const char *token) write_in_full(1, answer.buf, answer.len); strbuf_release(&answer); + write_in_full(1, "\n", 1); return 0; } @@ -77,6 +78,7 @@ static int do_send_flush(void) write_in_full(1, answer.buf, answer.len); strbuf_release(&answer); + write_in_full(1, "\n", 1); return 0; } diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh index 56c0dfffea..3f2cef251d 100755 --- a/t/t7527-builtin-fsmonitor.sh +++ b/t/t7527-builtin-fsmonitor.sh @@ -430,6 +430,7 @@ test_expect_success 'edit some files' ' edit_files && test-tool fsmonitor-client query --token 0 && + sync && grep "^event: dir1/modified$" .git/trace && grep "^event: dir2/modified$" .git/trace && @@ -445,6 +446,7 @@ test_expect_success 'create some files' ' create_files && test-tool fsmonitor-client query --token 0 && + sync && grep "^event: dir1/new$" .git/trace && grep "^event: dir2/new$" .git/trace && @@ -459,6 +461,7 @@ test_expect_success 'delete some files' ' delete_files && test-tool fsmonitor-client query --token 0 && + sync && grep "^event: dir1/delete$" .git/trace && grep "^event: dir2/delete$" .git/trace && @@ -473,6 +476,7 @@ test_expect_success 'rename some files' ' rename_files && test-tool fsmonitor-client query --token 0 && + sync && grep "^event: dir1/rename$" .git/trace && grep "^event: dir2/rename$" .git/trace && @@ -490,6 +494,7 @@ test_expect_success 'rename directory' ' mv dirtorename dirrenamed && test-tool fsmonitor-client query --token 0 && + sync && grep "^event: dirtorename/*$" .git/trace && grep "^event: dirrenamed/*$" .git/trace @@ -503,6 +508,7 @@ test_expect_success 'file changes to directory' ' file_to_directory && test-tool fsmonitor-client query --token 0 && + sync && grep "^event: delete$" .git/trace && grep "^event: delete/new$" .git/trace @@ -516,6 +522,7 @@ test_expect_success 'directory changes to a file' ' directory_to_file && test-tool fsmonitor-client query --token 0 && + sync && grep "^event: dir1$" .git/trace '