Message ID | 64ffddd791160895b8e6730ebcddfac8458653f2.1621362768.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 2ca245f8be56e3269d02076b658e825b91236e5d |
Headers | show |
Series | Convert index writes to use hashfile API | expand |
I haven't looked much into $subject, but there's an interesting regression in 2ca245f8be5 (csum-file.h: increase hashfile buffer size, 2021-05-18) where it fails with -v, but not --verbose-log. Discovered while running it manually. This is a regression in v2.33.0 (not v2.34.0!), so nothing urgent, and this is pretty obscure anyway. For the original change see: https://lore.kernel.org/git/64ffddd791160895b8e6730ebcddfac8458653f2.1621362768.git.gitgitgadget@gmail.com/ To reproduce it: make test T=t4216-log-bloom.sh GIT_TEST_OPTS=-v DEFAULT_TEST_TARGET=test Removing the -v, or making it --verbose-log will make it succeed. It fails on: + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 GIT_TRACE2_EVENT=/home/avar/g/git/t/trash directory.t4216-log-bloom/limits/trace git commit-graph write --reachable --changed-paths + test_max_changed_paths 10 trace + grep "max_changed_paths":10 trace error: last command exited with $?=1 not ok 135 - correctly report changes over limit If it's run with/without a debugging shimmy to save away the "trace" we can see that it doesn't include the max_changed_paths trace2 payload with -v. I.e. for some reason we either don't run trace2_bloom_filter_settings(), or don't log that data in the same way. It's probably easy to figure out for someone familiar with this code, but I couldn't see it from some quick skimming. Hence this E-Mail. FYI This is our only failure when running the whole test suite with -j1 -x (I didn't test a full run with just -v, takes a while, I was using -j1 to get sensible output at the end without needing to scroll up).
On Thu, Nov 25, 2021 at 01:14:45PM +0100, Ævar Arnfjörð Bjarmason wrote: > I haven't looked much into $subject, but there's an interesting > regression in 2ca245f8be5 (csum-file.h: increase hashfile buffer size, > 2021-05-18) where it fails with -v, but not --verbose-log. Discovered > while running it manually. > > This is a regression in v2.33.0 (not v2.34.0!), so nothing urgent, and > this is pretty obscure anyway. > > For the original change see: > https://lore.kernel.org/git/64ffddd791160895b8e6730ebcddfac8458653f2.1621362768.git.gitgitgadget@gmail.com/ Interesting. This patch makes it go away (the "5" is cargo-culted from earlier in the script): diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 50f206db55..2f3a1cd210 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -231,6 +231,7 @@ test_expect_success 'correctly report changes over limit' ' # Commit has 7 file and 4 directory adds GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT_NESTING=5 \ GIT_TRACE2_EVENT="$(pwd)/trace" \ git commit-graph write --reachable --changed-paths && test_max_changed_paths 10 trace && @@ -263,6 +264,7 @@ test_expect_success 'correctly report changes over limit' ' # start from scratch and rebuild rm -f .git/objects/info/commit-graph && GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT_NESTING=5 \ GIT_TRACE2_EVENT="$(pwd)/trace-edit" \ git commit-graph write --reachable --changed-paths && test_max_changed_paths 10 trace-edit && @@ -280,6 +282,7 @@ test_expect_success 'correctly report changes over limit' ' # start from scratch and rebuild rm -f .git/objects/info/commit-graph && GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=11 \ + GIT_TRACE2_EVENT_NESTING=5 \ GIT_TRACE2_EVENT="$(pwd)/trace-update" \ git commit-graph write --reachable --changed-paths && test_max_changed_paths 11 trace-update && The commit in question (2ca245f8be) puts the writing into a new trace2 region ("chunkfile"), so it makes sense that the nesting increases by one. But what's interesting is that the nesting is different depending on whether stderr is a terminal. I guess because the progress code starts its own region. The default nesting max for trace2 is 2. That seems kind of low given this example, but I don't know enough about the tradeoffs to say what bad things might happen if it's raised. But the above patch really seems like a hack, and that this quiet omission would absolutely confuse real users who are trying to use trace2 for debugging. -Peff
On 11/25/2021 11:08 PM, Jeff King wrote: > On Thu, Nov 25, 2021 at 01:14:45PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> I haven't looked much into $subject, but there's an interesting >> regression in 2ca245f8be5 (csum-file.h: increase hashfile buffer size, >> 2021-05-18) where it fails with -v, but not --verbose-log. Discovered >> while running it manually. >> >> This is a regression in v2.33.0 (not v2.34.0!), so nothing urgent, and >> this is pretty obscure anyway. >> >> For the original change see: >> https://lore.kernel.org/git/64ffddd791160895b8e6730ebcddfac8458653f2.1621362768.git.gitgitgadget@gmail.com/ > > Interesting. This patch makes it go away (the "5" is cargo-culted from > earlier in the script): ... > # Commit has 7 file and 4 directory adds > GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ > + GIT_TRACE2_EVENT_NESTING=5 \ > GIT_TRACE2_EVENT="$(pwd)/trace" \ > git commit-graph write --reachable --changed-paths && > test_max_changed_paths 10 trace && ... > The commit in question (2ca245f8be) puts the writing into a new trace2 > region ("chunkfile"), so it makes sense that the nesting increases by > one. But what's interesting is that the nesting is different depending > on whether stderr is a terminal. I guess because the progress code > starts its own region. > > The default nesting max for trace2 is 2. That seems kind of low given > this example, but I don't know enough about the tradeoffs to say what > bad things might happen if it's raised. But the above patch really seems > like a hack, and that this quiet omission would absolutely confuse real > users who are trying to use trace2 for debugging. Thanks, both, for identifying the problem and the root cause. I have sent a patch series [1] that sets a deeper GIT_TRACE2_EVENT_NESTING across the test suite to avoid this kind of issue in the future (along with removing the existing uses scattered across the tests). Thanks, -Stolee [1] https://lore.kernel.org/git/pull.1085.git.1638193666.gitgitgadget@gmail.com/
diff --git a/chunk-format.c b/chunk-format.c index da191e59a29d..1c3dca62e205 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -58,9 +58,11 @@ void add_chunk(struct chunkfile *cf, int write_chunkfile(struct chunkfile *cf, void *data) { - int i; + int i, result = 0; uint64_t cur_offset = hashfile_total(cf->f); + trace2_region_enter("chunkfile", "write", the_repository); + /* Add the table of contents to the current offset */ cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE; @@ -77,10 +79,10 @@ int write_chunkfile(struct chunkfile *cf, void *data) for (i = 0; i < cf->chunks_nr; i++) { off_t start_offset = hashfile_total(cf->f); - int result = cf->chunks[i].write_fn(cf->f, data); + result = cf->chunks[i].write_fn(cf->f, data); if (result) - return result; + goto cleanup; if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size) BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", @@ -88,7 +90,9 @@ int write_chunkfile(struct chunkfile *cf, void *data) hashfile_total(cf->f) - start_offset); } - return 0; +cleanup: + trace2_region_leave("chunkfile", "write", the_repository); + return result; } int read_table_of_contents(struct chunkfile *cf, diff --git a/csum-file.c b/csum-file.c index 3c26389d4914..3487d28ed7ad 100644 --- a/csum-file.c +++ b/csum-file.c @@ -11,19 +11,24 @@ #include "progress.h" #include "csum-file.h" +static void verify_buffer_or_die(struct hashfile *f, + const void *buf, + unsigned int count) +{ + ssize_t ret = read_in_full(f->check_fd, f->check_buffer, count); + + if (ret < 0) + die_errno("%s: sha1 file read error", f->name); + if (ret != count) + die("%s: sha1 file truncated", f->name); + if (memcmp(buf, f->check_buffer, count)) + die("sha1 file '%s' validation error", f->name); +} + static void flush(struct hashfile *f, const void *buf, unsigned int count) { - if (0 <= f->check_fd && count) { - unsigned char check_buffer[8192]; - ssize_t ret = read_in_full(f->check_fd, check_buffer, count); - - if (ret < 0) - die_errno("%s: sha1 file read error", f->name); - if (ret != count) - die("%s: sha1 file truncated", f->name); - if (memcmp(buf, check_buffer, count)) - die("sha1 file '%s' validation error", f->name); - } + if (0 <= f->check_fd && count) + verify_buffer_or_die(f, buf, count); if (write_in_full(f->fd, buf, count) < 0) { if (errno == ENOSPC) @@ -46,6 +51,13 @@ void hashflush(struct hashfile *f) } } +static void free_hashfile(struct hashfile *f) +{ + free(f->buffer); + free(f->check_buffer); + free(f); +} + int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int flags) { int fd; @@ -75,20 +87,20 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int fl if (close(f->check_fd)) die_errno("%s: sha1 file error on close", f->name); } - free(f); + free_hashfile(f); return fd; } void hashwrite(struct hashfile *f, const void *buf, unsigned int count) { while (count) { - unsigned left = sizeof(f->buffer) - f->offset; + unsigned left = f->buffer_len - f->offset; unsigned nr = count > left ? left : count; if (f->do_crc) f->crc32 = crc32(f->crc32, buf, nr); - if (nr == sizeof(f->buffer)) { + if (nr == f->buffer_len) { /* * Flush a full batch worth of data directly * from the input, skipping the memcpy() to @@ -114,11 +126,6 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) } } -struct hashfile *hashfd(int fd, const char *name) -{ - return hashfd_throughput(fd, name, NULL); -} - struct hashfile *hashfd_check(const char *name) { int sink, check; @@ -132,10 +139,14 @@ struct hashfile *hashfd_check(const char *name) die_errno("unable to open '%s'", name); f = hashfd(sink, name); f->check_fd = check; + f->check_buffer = xmalloc(f->buffer_len); + return f; } -struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp) +static struct hashfile *hashfd_internal(int fd, const char *name, + struct progress *tp, + size_t buffer_len) { struct hashfile *f = xmalloc(sizeof(*f)); f->fd = fd; @@ -146,9 +157,35 @@ struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp f->name = name; f->do_crc = 0; the_hash_algo->init_fn(&f->ctx); + + f->buffer_len = buffer_len; + f->buffer = xmalloc(buffer_len); + f->check_buffer = NULL; + return f; } +struct hashfile *hashfd(int fd, const char *name) +{ + /* + * Since we are not going to use a progress meter to + * measure the rate of data passing through this hashfile, + * use a larger buffer size to reduce fsync() calls. + */ + return hashfd_internal(fd, name, NULL, 128 * 1024); +} + +struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp) +{ + /* + * Since we are expecting to report progress of the + * write into this hashfile, use a smaller buffer + * size so the progress indicators arrive at a more + * frequent rate. + */ + return hashfd_internal(fd, name, tp, 8 * 1024); +} + void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) { hashflush(f); diff --git a/csum-file.h b/csum-file.h index e54d53d1d0b3..3044bd19ab65 100644 --- a/csum-file.h +++ b/csum-file.h @@ -16,7 +16,9 @@ struct hashfile { const char *name; int do_crc; uint32_t crc32; - unsigned char buffer[8192]; + size_t buffer_len; + unsigned char *buffer; + unsigned char *check_buffer; }; /* Checkpoint */