Message ID | 20211012174208.95161-1-chooglen@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Use default values from settings instead of config | expand |
Glen Choo <chooglen@google.com> writes: > And since I am re-rolling, I incorporated Ævar's feedback regarding the commit > messages (thanks!). I considered combining patches 1 and 2, but patch 1 has a > grown a little to fix the CI issues, so I've decided to keep patches 1 and 2 > separate. OK. Will replace. Thanks.
On Tue, Oct 12 2021, Glen Choo wrote: > Hi everyone! This patch was created in response to something we observed in > Google, where fsck failed to detect that the commit graph was invalid. We > initially assumed that fsck never checked the commit graph, but it turns out > that it does so only when core.commitgraph is set, even though we set defaults > for "whether to use the commit graph" in the repository settings. > > Instead of using the config, let's use repository settings where > available. Replace core.commitGraph and core.multiPackIndex with their > equivalent repository settings in fsck and gc. > > This re-roll is primarily motivated by the CI failures noted by Junio in [1]. > The underlying cause is that GIT_TEST_COMMIT_GRAPH=1 (enabled in the linux-gcc > job) causes commands like "git commit" to write commit-graphs, but certain tests > like t/t0410-partial-clone.sh and t/t3800-mktag.sh intentionally create > corruptions that cause commit-graphs to become out-of-sync/invalid. Patch 1 > fixes a bug where fsck did not check commit-graphs by default, which means that > these tests will now fail because they have invalid commit-graphs. The easiest > solution I found is to disable this confusing and noisy behavior with > GIT_TEST_COMMIT_GRAPH=0. That's easy, but... > And since I am re-rolling, I incorporated Ævar's feedback regarding the commit > messages (thanks!). I considered combining patches 1 and 2, but patch 1 has a > grown a little to fix the CI issues, so I've decided to keep patches 1 and 2 > separate. > > This series has also seen a healthy amount of interest in test style and > coverage. We haven't converged on a path for the future, but I'd like to think > that the tests here are still a step in (approximately) the right direction. > Hopefully this version LGT us while we figure out what to do with tests :) > > [1] https://lore.kernel.org/git/xmqqfstafyox.fsf@gitster.g/ > > Changes since v3 > * Disable GIT_TEST_COMMIT_GRAPH in tests that intentionally corrupt things in a > way that is incompatible with commit-graphs. > * Make patch 1 and 2's commit messages more concise (thanks Ævar!). ...how isn't disabling those t3800-mktag.sh tests just plasting over corruption that we're noticing because of your changes to (rightly) fix the bug where "fsck" wasn't checking the graph at all? IOW haven't we just found exactly the sort of bug that "GIT_TEST_COMMIT_GRAPH" is put in place to find for us, but now instead of fixing it we're hiding it? If I comment yout your addition of GIT_TEST_COMMIT_GRAPH=0 in that file I see that we fail N number of tests, but all of them are actually fallout of just this test: git replace $head_parent $head && git replace -f $tree $blob I.e. we've created a replacement object replacing a tree with a blob, as part of tests I added to test how mktag handles those sorts of weird edge cases. This then causes the graph verify code to throw a hissy fit with: root tree OID for commit 0ddfaf193ff13d6ab39b7cbd9eed645e3ee2f050 in commit-graph is da5497437fd67ca928333aab79c4b4b55036ea66 != 0fbca9850869684085d654f9e1380c9780802570 I.e. when we wrote the graph we somehow didn't notice that the root tree node we wrote is to an object that's not actually a tree? Isn't this a bug where some part of the commit graph writing should be doing its own extended OID lookup that's replacement-object aware, it didn't, and we wrote a corrupt graph as a result? If there is a legitimate reason why we're not just hiding a bug we've turned up with these fixes let's disable that one test, not the entire test file. If you don't run the one test that fails (which is split up into 3 individual pieces) there's still 143 other tests that are run, all of those presumably benefit from finding future bugs with GIT_TEST_COMMIT_GRAPH=true, particularly since the test file seems to just have turned up one just now...
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > ...how isn't disabling those t3800-mktag.sh tests just plasting over > corruption that we're noticing because of your changes to (rightly) fix > the bug where "fsck" wasn't checking the graph at all? > > IOW haven't we just found exactly the sort of bug that > "GIT_TEST_COMMIT_GRAPH" is put in place to find for us, but now instead > of fixing it we're hiding it? > > If I comment yout your addition of GIT_TEST_COMMIT_GRAPH=0 in that file > I see that we fail N number of tests, but all of them are actually > fallout of just this test: > > git replace $head_parent $head && > git replace -f $tree $blob > > I.e. we've created a replacement object replacing a tree with a blob, as > part of tests I added to test how mktag handles those sorts of weird > edge cases. > > This then causes the graph verify code to throw a hissy fit with: > > root tree OID for commit 0ddfaf193ff13d6ab39b7cbd9eed645e3ee2f050 in > commit-graph is da5497437fd67ca928333aab79c4b4b55036ea66 != > 0fbca9850869684085d654f9e1380c9780802570 > > I.e. when we wrote the graph we somehow didn't notice that the root tree > node we wrote is to an object that's not actually a tree? Isn't this a > bug where some part of the commit graph writing should be doing its own > extended OID lookup that's replacement-object aware, it didn't, and we > wrote a corrupt graph as a result? > > If there is a legitimate reason why we're not just hiding a bug we've > turned up with these fixes let's disable that one test, not the entire > test file. > > If you don't run the one test that fails (which is split up into 3 > individual pieces) there's still 143 other tests that are run, all of > those presumably benefit from finding future bugs with > GIT_TEST_COMMIT_GRAPH=true, particularly since the test file seems to > just have turned up one just now... I think this falls on my shoulders. I assumed that the failures were expected behavior, not bugs. You are right that we shouldn't be plastering over bugs. I'll have to ask for help here because I don't know enough about mktag to distinguish between 'expected' and 'unexpected' failures. The best I can do is to add GIT_TEST_COMMIT_GRAPH=0 + NEEDSWORK for the failing tests. But if that's good enough for now, I'll just do that :)
On 10/12/2021 4:34 PM, Ævar Arnfjörð Bjarmason wrote:... > If I comment yout your addition of GIT_TEST_COMMIT_GRAPH=0 in that file > I see that we fail N number of tests, but all of them are actually > fallout of just this test: > > git replace $head_parent $head && > git replace -f $tree $blob > > I.e. we've created a replacement object replacing a tree with a blob, as > part of tests I added to test how mktag handles those sorts of weird > edge cases. > > This then causes the graph verify code to throw a hissy fit with: > > root tree OID for commit 0ddfaf193ff13d6ab39b7cbd9eed645e3ee2f050 in > commit-graph is da5497437fd67ca928333aab79c4b4b55036ea66 != > 0fbca9850869684085d654f9e1380c9780802570 > > I.e. when we wrote the graph we somehow didn't notice that the root tree > node we wrote is to an object that's not actually a tree? Isn't this a > bug where some part of the commit graph writing should be doing its own > extended OID lookup that's replacement-object aware, it didn't, and we > wrote a corrupt graph as a result? > > If there is a legitimate reason why we're not just hiding a bug we've > turned up with these fixes let's disable that one test, not the entire > test file. The commit-graph should be disabled if replace-objects are enabled. If there is a bug being introduced here it is because the commit-graph is being checked during fsck even though it would never be read when the replace-objects exist. Thanks, -Stolee
On Wed, Oct 13 2021, Derrick Stolee wrote: > On 10/12/2021 4:34 PM, Ævar Arnfjörð Bjarmason wrote:... >> If I comment yout your addition of GIT_TEST_COMMIT_GRAPH=0 in that file >> I see that we fail N number of tests, but all of them are actually >> fallout of just this test: >> >> git replace $head_parent $head && >> git replace -f $tree $blob >> >> I.e. we've created a replacement object replacing a tree with a blob, as >> part of tests I added to test how mktag handles those sorts of weird >> edge cases. >> >> This then causes the graph verify code to throw a hissy fit with: >> >> root tree OID for commit 0ddfaf193ff13d6ab39b7cbd9eed645e3ee2f050 in >> commit-graph is da5497437fd67ca928333aab79c4b4b55036ea66 != >> 0fbca9850869684085d654f9e1380c9780802570 >> >> I.e. when we wrote the graph we somehow didn't notice that the root tree >> node we wrote is to an object that's not actually a tree? Isn't this a >> bug where some part of the commit graph writing should be doing its own >> extended OID lookup that's replacement-object aware, it didn't, and we >> wrote a corrupt graph as a result? >> >> If there is a legitimate reason why we're not just hiding a bug we've >> turned up with these fixes let's disable that one test, not the entire >> test file. > > The commit-graph should be disabled if replace-objects are enabled. If > there is a bug being introduced here it is because the commit-graph is > being checked during fsck even though it would never be read when the > replace-objects exist. > > Thanks, > -Stolee Thanks, isn't the obvious fix for this to extend your d6538246d3d (commit-graph: not compatible with replace objects, 2018-08-20) to do "read_replace_refs = 0;" in graph_verify()? That works for me on this case. I.e. if we ignore replace refs when writing the graph, and we don't use it if there are any due to commit_graph_compatible(), why would we consider them when verifying it?
On Tue, Oct 12 2021, Glen Choo wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> ...how isn't disabling those t3800-mktag.sh tests just plasting over >> corruption that we're noticing because of your changes to (rightly) fix >> the bug where "fsck" wasn't checking the graph at all? >> >> IOW haven't we just found exactly the sort of bug that >> "GIT_TEST_COMMIT_GRAPH" is put in place to find for us, but now instead >> of fixing it we're hiding it? >> >> If I comment yout your addition of GIT_TEST_COMMIT_GRAPH=0 in that file >> I see that we fail N number of tests, but all of them are actually >> fallout of just this test: >> >> git replace $head_parent $head && >> git replace -f $tree $blob >> >> I.e. we've created a replacement object replacing a tree with a blob, as >> part of tests I added to test how mktag handles those sorts of weird >> edge cases. >> >> This then causes the graph verify code to throw a hissy fit with: >> >> root tree OID for commit 0ddfaf193ff13d6ab39b7cbd9eed645e3ee2f050 in >> commit-graph is da5497437fd67ca928333aab79c4b4b55036ea66 != >> 0fbca9850869684085d654f9e1380c9780802570 >> >> I.e. when we wrote the graph we somehow didn't notice that the root tree >> node we wrote is to an object that's not actually a tree? Isn't this a >> bug where some part of the commit graph writing should be doing its own >> extended OID lookup that's replacement-object aware, it didn't, and we >> wrote a corrupt graph as a result? >> >> If there is a legitimate reason why we're not just hiding a bug we've >> turned up with these fixes let's disable that one test, not the entire >> test file. >> >> If you don't run the one test that fails (which is split up into 3 >> individual pieces) there's still 143 other tests that are run, all of >> those presumably benefit from finding future bugs with >> GIT_TEST_COMMIT_GRAPH=true, particularly since the test file seems to >> just have turned up one just now... > > I think this falls on my shoulders. I assumed that the failures were > expected behavior, not bugs. You are right that we shouldn't be > plastering over bugs. > > I'll have to ask for help here because I don't know enough about mktag > to distinguish between 'expected' and 'unexpected' failures. The best I > can do is to add GIT_TEST_COMMIT_GRAPH=0 + NEEDSWORK for the failing > tests. But if that's good enough for now, I'll just do that :) I don't think your series needs to be tasked with fixing a generic issue in the commit-graph you stumbled across. So for your series I'd think the only required fix would be to narrowlry skip *just* the broken tests, not the entire test file. But in this case (see http://lore.kernel.org/git/87pms8hq4s.fsf@evledraar.gmail.com) the fix seems rather trivial to me. I.e. just adding one line to builtin/commit-graph.c, so perhaps you can see if that works for you and such a "this bug was missed because this mode didn't work" fix could be part of a re-roll of yours?
On 10/13/2021 11:57 AM, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Oct 13 2021, Derrick Stolee wrote: > >> On 10/12/2021 4:34 PM, Ævar Arnfjörð Bjarmason wrote:... >>> If I comment yout your addition of GIT_TEST_COMMIT_GRAPH=0 in that file >>> I see that we fail N number of tests, but all of them are actually >>> fallout of just this test: >>> >>> git replace $head_parent $head && >>> git replace -f $tree $blob >>> >>> I.e. we've created a replacement object replacing a tree with a blob, as >>> part of tests I added to test how mktag handles those sorts of weird >>> edge cases. >>> >>> This then causes the graph verify code to throw a hissy fit with: >>> >>> root tree OID for commit 0ddfaf193ff13d6ab39b7cbd9eed645e3ee2f050 in >>> commit-graph is da5497437fd67ca928333aab79c4b4b55036ea66 != >>> 0fbca9850869684085d654f9e1380c9780802570 >>> >>> I.e. when we wrote the graph we somehow didn't notice that the root tree >>> node we wrote is to an object that's not actually a tree? Isn't this a >>> bug where some part of the commit graph writing should be doing its own >>> extended OID lookup that's replacement-object aware, it didn't, and we >>> wrote a corrupt graph as a result? >>> >>> If there is a legitimate reason why we're not just hiding a bug we've >>> turned up with these fixes let's disable that one test, not the entire >>> test file. >> >> The commit-graph should be disabled if replace-objects are enabled. If >> there is a bug being introduced here it is because the commit-graph is >> being checked during fsck even though it would never be read when the >> replace-objects exist. >> >> Thanks, >> -Stolee > > Thanks, isn't the obvious fix for this to extend your d6538246d3d > (commit-graph: not compatible with replace objects, 2018-08-20) to do > "read_replace_refs = 0;" in graph_verify()? That works for me on this > case. Ignoring the replace refs while verifying will allow you to verify the on-disk commit-graph file without issue. > I.e. if we ignore replace refs when writing the graph, and we don't use > it if there are any due to commit_graph_compatible(), why would we > consider them when verifying it? I generally consider any interaction with the commit-graph while replace refs are enabled to be the bug. We don't read the commit-graph when replace refs are on, so why are we verifying it? But, I understand the desire to check the on-disk content even though it is not being used, so disabling replace refs to do the verify should be sufficient. Thanks, -Stolee
Derrick Stolee <stolee@gmail.com> writes: >>> The commit-graph should be disabled if replace-objects are enabled. If >>> there is a bug being introduced here it is because the commit-graph is >>> being checked during fsck even though it would never be read when the >>> replace-objects exist. >>> >>> Thanks, >>> -Stolee >> >> Thanks, isn't the obvious fix for this to extend your d6538246d3d >> (commit-graph: not compatible with replace objects, 2018-08-20) to do >> "read_replace_refs = 0;" in graph_verify()? That works for me on this >> case. > > Ignoring the replace refs while verifying will allow you to verify the > on-disk commit-graph file without issue. It seems like we've converged on doing read_replace_refs = 0 \o/ If we are going to do this twice in graph_verify() and graph_write(), is there any reason why I shouldn't just do "read_replace_refs = 0" once in cmd_commit_graph()? IOW any time we do anything with commit-graphs, we should just ignore replace refs because they're incompatible.
Derrick Stolee <stolee@gmail.com> writes: > I generally consider any interaction with the commit-graph while > replace refs are enabled to be the bug. We don't read the commit-graph > when replace refs are on, so why are we verifying it? > > But, I understand the desire to check the on-disk content even though > it is not being used, so disabling replace refs to do the verify should > be sufficient. That's quite a sensible argument. Sounds like we have a way to solve this that everybody agrees with. Thanks.
On Thu, Oct 14 2021, Glen Choo wrote: > Derrick Stolee <stolee@gmail.com> writes: > >>>> The commit-graph should be disabled if replace-objects are enabled. If >>>> there is a bug being introduced here it is because the commit-graph is >>>> being checked during fsck even though it would never be read when the >>>> replace-objects exist. >>>> >>>> Thanks, >>>> -Stolee >>> >>> Thanks, isn't the obvious fix for this to extend your d6538246d3d >>> (commit-graph: not compatible with replace objects, 2018-08-20) to do >>> "read_replace_refs = 0;" in graph_verify()? That works for me on this >>> case. >> >> Ignoring the replace refs while verifying will allow you to verify the >> on-disk commit-graph file without issue. > > It seems like we've converged on doing read_replace_refs = 0 \o/ > > If we are going to do this twice in graph_verify() and graph_write(), is > there any reason why I shouldn't just do "read_replace_refs = 0" once in > cmd_commit_graph()? IOW any time we do anything with commit-graphs, we > should just ignore replace refs because they're incompatible. No reason, I think that's the best way to do this. I've submitted a series to fix that verify issue, as noted in the CL these patches on top of it without that disabling of the mktag tests will pass with GIT_TEST_COMMIT_GRAPH=true): https://lore.kernel.org/git/cover-0.3-00000000000-20211014T233343Z-avarab@gmail.com
Junio C Hamano <gitster@pobox.com> writes: > Derrick Stolee <stolee@gmail.com> writes: > >> I generally consider any interaction with the commit-graph while >> replace refs are enabled to be the bug. We don't read the commit-graph >> when replace refs are on, so why are we verifying it? >> >> But, I understand the desire to check the on-disk content even though >> it is not being used, so disabling replace refs to do the verify should >> be sufficient. > > That's quite a sensible argument. Sounds like we have a way to > solve this that everybody agrees with. Would it be sensible to silently ignore, though? Would a warning, when we see replacement is in use and an explicit request by an end user is made to make use of or update the commit-graph, be too loud?