mbox series

[0/3] commit-graph: fix corruption during generation v2 upgrade

Message ID cover.1657667404.git.me@ttaylorr.com (mailing list archive)
Headers show
Series commit-graph: fix corruption during generation v2 upgrade | expand

Message

Taylor Blau July 12, 2022, 11:10 p.m. UTC
This brief series resolves a bug where a commit-graph would become
corrupt when upgrading from generation number v1 to v2, as originally
reported in [1].

Some speculation occurred as to what might be causing that bug in the
thread beginning at [1], until the problem was explained in more detail
by Will Chandler in [2].

The crux of the issue, as is described in [2] and [3], is that the
commit_graph_data slab is reused for read and write operations involving
the commit-graph, leading to situations where data computed in
preparation of a write is clobbered by a read of existing data.

The first patch demonstrates the issue, and the second patch prepares to
fix it by introducing a helper function. The crux of the issue is
described and fixed in the third patch.

[1]: https://lore.kernel.org/git/YqD5dgalb9EPnz85@coredump.intra.peff.net/
[2]: https://lore.kernel.org/git/DD88D523-0ECA-4474-9AA5-1D4A431E532A@wfchandler.org/
[3]: https://lore.kernel.org/git/YsS7H4i5DqUZVQ5i@nand.local/

Taylor Blau (3):
  t5318: demonstrate commit-graph generation v2 corruption
  commit-graph: introduce `repo_find_commit_pos_in_graph()`
  commit-graph: fix corrupt upgrade from generation v1 to v2

 bloom.c                 | 10 +++++-----
 commit-graph.c          | 12 +++++++++---
 commit-graph.h          | 15 +++++++++++++++
 t/t5318-commit-graph.sh | 27 +++++++++++++++++++++++++++
 4 files changed, 56 insertions(+), 8 deletions(-)

Comments

Junio C Hamano July 13, 2022, 5:41 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> This brief series resolves a bug where a commit-graph would become
> corrupt when upgrading from generation number v1 to v2, as originally
> reported in [1].
>
> Some speculation occurred as to what might be causing that bug in the
> thread beginning at [1], until the problem was explained in more detail
> by Will Chandler in [2].
>
> The crux of the issue, as is described in [2] and [3], is that the
> commit_graph_data slab is reused for read and write operations involving
> the commit-graph, leading to situations where data computed in
> preparation of a write is clobbered by a read of existing data.
>
> The first patch demonstrates the issue, and the second patch prepares to
> fix it by introducing a helper function. The crux of the issue is
> described and fixed in the third patch.
>
> [1]: https://lore.kernel.org/git/YqD5dgalb9EPnz85@coredump.intra.peff.net/
> [2]: https://lore.kernel.org/git/DD88D523-0ECA-4474-9AA5-1D4A431E532A@wfchandler.org/
> [3]: https://lore.kernel.org/git/YsS7H4i5DqUZVQ5i@nand.local/

Thanks.  Do we know where this breaks?  Applying [1/3] on Git 2.32,
2.34, and 2.35 seems to claim that "known breakage vanished".
Taylor Blau July 15, 2022, 2:02 a.m. UTC | #2
On Wed, Jul 13, 2022 at 10:41:45AM -0700, Junio C Hamano wrote:
> Thanks.  Do we know where this breaks?  Applying [1/3] on Git 2.32,
> 2.34, and 2.35 seems to claim that "known breakage vanished".

With a script like:

--- >8 ---
#!/bin/sh

set -e

rm -fr repo
git init -q repo
cd repo

echo "x" >x
git add x
GIT_AUTHOR_DATE="@2 +0000" \
  GIT_COMMITTER_DATE="@2 +0000" git commit -q -m "$(cat x)"

git repack -d -q

git.compile -c commitGraph.generationVersion=1 commit-graph write
git.compile -c commitGraph.generationVersion=2 commit-graph write \
  --changed-paths

git.compile rev-list --all
--- 8< ---

You can bisect it to 3b0199d4c3 (commit-graph: start parsing generation
v2 (again), 2022-03-01), but only because that patch teaches Git to
recognize the existence of the generation v2 chunks.

I suspect (but haven't confirmed) that it was probably broken before
3b0199d4c3. But such breakage wouldn't have mattered, since despite
understanding generation v2, previous versions of Git never read those
chunks, which would have masked over this bug.

Thanks,
Taylor
Derrick Stolee July 15, 2022, 3:20 a.m. UTC | #3
On 7/12/2022 7:10 PM, Taylor Blau wrote:
> This brief series resolves a bug where a commit-graph would become
> corrupt when upgrading from generation number v1 to v2, as originally
> reported in [1].
> 
> Some speculation occurred as to what might be causing that bug in the
> thread beginning at [1], until the problem was explained in more detail
> by Will Chandler in [2].
> 
> The crux of the issue, as is described in [2] and [3], is that the
> commit_graph_data slab is reused for read and write operations involving
> the commit-graph, leading to situations where data computed in
> preparation of a write is clobbered by a read of existing data.
> 
> The first patch demonstrates the issue, and the second patch prepares to
> fix it by introducing a helper function. The crux of the issue is
> described and fixed in the third patch.

Thanks, all, for identifying the scenario that causes this breakage.
It was frustrating that we had reports of corruption without knowing
where it was coming from.

This solution looks like the best we can do right now, although it
hints that some refactoring would be good to distinguish between
"commits as read" and "commits to be written" during this upgrade
phase. Something to think about, but not to block this fix.

Thanks,
-Stolee