Message ID | pull.1017.git.1629136135286.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | d9e9b44d7a022ebc57caa3762d1af7e6bf608d5f |
Headers | show |
Series | sparse-index: copy dir_hash in ensure_full_index() | expand |
On 8/16/2021 1:48 PM, Derrick Stolee via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Copy the 'index_state->dir_hash' back to the real istate after expanding > a sparse index. > > A crash was observed in 'git status' during some hashmap lookups with > corrupted hashmap entries. During an index expansion, new cache-entries > are added to the 'index_state->name_hash' and the 'dir_hash' in a > temporary 'index_state' variable 'full'. However, only the 'name_hash' > hashmap from this temp variable was copied back into the real 'istate' > variable. The original copy of the 'dir_hash' was incorrectly > preserved. If the table in the 'full->dir_hash' hashmap were realloced, > the stale version (in 'istate') would be corrupted. > > The test suite does not operate on index sizes sufficiently large to > trigger this reallocation, so they do not cover this behavior. > Increasing the test suite to cover such scale is fragile and likely > wasteful. > > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > sparse-index: copy dir_hash in ensure_full_index() > > This fix is an issue we discovered in our first experimental release of > the sparse index in the microsoft/git fork. We fixed it in the latest > experimental release [1] and then I almost forgot about it until we > started rebasing sparse-index work on top of the 2.33.0 release > candidates. > > [1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp > > This is a change that can be taken anywhere since 4300f8 (sparse-index: > implement ensure_full_index(), 2021-03-30), but this version is based on > v2.33.0-rc2. I sent this patch on the day of v2.33.0, so I'm not surprised that it got lost in the shuffle. Could someone please take a look? It also has not been picked up for the What's Cooking email. Thanks, -Stolee
Hi Stolee, On Mon, 16 Aug 2021, Derrick Stolee via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Copy the 'index_state->dir_hash' back to the real istate after expanding > a sparse index. > > A crash was observed in 'git status' during some hashmap lookups with > corrupted hashmap entries. During an index expansion, new cache-entries > are added to the 'index_state->name_hash' and the 'dir_hash' in a > temporary 'index_state' variable 'full'. However, only the 'name_hash' > hashmap from this temp variable was copied back into the real 'istate' > variable. The original copy of the 'dir_hash' was incorrectly > preserved. If the table in the 'full->dir_hash' hashmap were realloced, > the stale version (in 'istate') would be corrupted. > > The test suite does not operate on index sizes sufficiently large to > trigger this reallocation, so they do not cover this behavior. > Increasing the test suite to cover such scale is fragile and likely > wasteful. That explanation makes sense. And as the finder of the symptom, I can confirm that it fixed the issue. So here is my `Reviewed-by:` ;-) Ciao, Dscho > > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > sparse-index: copy dir_hash in ensure_full_index() > > This fix is an issue we discovered in our first experimental release of > the sparse index in the microsoft/git fork. We fixed it in the latest > experimental release [1] and then I almost forgot about it until we > started rebasing sparse-index work on top of the 2.33.0 release > candidates. > > [1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp > > This is a change that can be taken anywhere since 4300f8 (sparse-index: > implement ensure_full_index(), 2021-03-30), but this version is based on > v2.33.0-rc2. > > While the bug is alarming for users who hit it (seg fault) it requires > sufficient scale and use of the optional sparse index feature. We are > not recommending wide adoption of the sparse index yet because we don't > have a sufficient density of integrated commands. For that reason, I > don't think this should halt progress towards the full v2.33.0 release. > I did want to send this as soon as possible so that could be at the > discretion of the maintainer. > > Thanks, -Stolee > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1017%2Fderrickstolee%2Fsparse-index%2Ffix-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1017/derrickstolee/sparse-index/fix-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1017 > > sparse-index.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/sparse-index.c b/sparse-index.c > index c6b4feec413..56eb65dc349 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -283,6 +283,7 @@ void ensure_full_index(struct index_state *istate) > > /* Copy back into original index. */ > memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); > + memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash)); > istate->sparse_index = 0; > free(istate->cache); > istate->cache = full->cache; > > base-commit: 5d213e46bb7b880238ff5ea3914e940a50ae9369 > -- > gitgitgadget > >
On Mon, Aug 16 2021, Derrick Stolee via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Copy the 'index_state->dir_hash' back to the real istate after expanding > a sparse index. > > A crash was observed in 'git status' during some hashmap lookups with > corrupted hashmap entries. During an index expansion, new cache-entries > are added to the 'index_state->name_hash' and the 'dir_hash' in a > temporary 'index_state' variable 'full'. However, only the 'name_hash' > hashmap from this temp variable was copied back into the real 'istate' > variable. The original copy of the 'dir_hash' was incorrectly > preserved. If the table in the 'full->dir_hash' hashmap were realloced, > the stale version (in 'istate') would be corrupted. > > The test suite does not operate on index sizes sufficiently large to > trigger this reallocation, so they do not cover this behavior. > Increasing the test suite to cover such scale is fragile and likely > wasteful. How large does the index need to be to trigger this? I don't know if a test here is useful, but FWIW if we had such a test then the EXPENSIVE prereq + GIT_TEST_LONG=true might be a good fit for it. > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > sparse-index: copy dir_hash in ensure_full_index() > > This fix is an issue we discovered in our first experimental release of > the sparse index in the microsoft/git fork. We fixed it in the latest > experimental release [1] and then I almost forgot about it until we > started rebasing sparse-index work on top of the 2.33.0 release > candidates. > > [1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp > > This is a change that can be taken anywhere since 4300f8 (sparse-index: > implement ensure_full_index(), 2021-03-30), but this version is based on > v2.33.0-rc2. > > While the bug is alarming for users who hit it (seg fault) it requires > sufficient scale and use of the optional sparse index feature. We are > not recommending wide adoption of the sparse index yet because we don't > have a sufficient density of integrated commands. For that reason, I > don't think this should halt progress towards the full v2.33.0 release. > I did want to send this as soon as possible so that could be at the > discretion of the maintainer. > > Thanks, -Stolee > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1017%2Fderrickstolee%2Fsparse-index%2Ffix-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1017/derrickstolee/sparse-index/fix-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1017 > > sparse-index.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/sparse-index.c b/sparse-index.c > index c6b4feec413..56eb65dc349 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -283,6 +283,7 @@ void ensure_full_index(struct index_state *istate) > > /* Copy back into original index. */ > memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); > + memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash)); > istate->sparse_index = 0; > free(istate->cache); > istate->cache = full->cache; > > base-commit: 5d213e46bb7b880238ff5ea3914e940a50ae9369
On Mon, Aug 23, 2021 at 6:14 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 8/16/2021 1:48 PM, Derrick Stolee via GitGitGadget wrote: > > From: Jeff Hostetler <jeffhost@microsoft.com> > > > > Copy the 'index_state->dir_hash' back to the real istate after expanding > > a sparse index. > > > > A crash was observed in 'git status' during some hashmap lookups with > > corrupted hashmap entries. During an index expansion, new cache-entries > > are added to the 'index_state->name_hash' and the 'dir_hash' in a > > temporary 'index_state' variable 'full'. However, only the 'name_hash' > > hashmap from this temp variable was copied back into the real 'istate' > > variable. The original copy of the 'dir_hash' was incorrectly > > preserved. If the table in the 'full->dir_hash' hashmap were realloced, > > the stale version (in 'istate') would be corrupted. > > > > The test suite does not operate on index sizes sufficiently large to > > trigger this reallocation, so they do not cover this behavior. > > Increasing the test suite to cover such scale is fragile and likely > > wasteful. > > > > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > > --- > > sparse-index: copy dir_hash in ensure_full_index() > > > > This fix is an issue we discovered in our first experimental release of > > the sparse index in the microsoft/git fork. We fixed it in the latest > > experimental release [1] and then I almost forgot about it until we > > started rebasing sparse-index work on top of the 2.33.0 release > > candidates. > > > > [1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp > > > > This is a change that can be taken anywhere since 4300f8 (sparse-index: > > implement ensure_full_index(), 2021-03-30), but this version is based on > > v2.33.0-rc2. > > I sent this patch on the day of v2.33.0, so I'm not surprised that it > got lost in the shuffle. Could someone please take a look? > > It also has not been picked up for the What's Cooking email. Explanation and the code make sense to me. This function uses some pretty low-level knowledge of how index_state is structured to do its job, and if index_state is updated in some significant way (which isn't likely), then someone would have to update this function. I was trying to think whether there was a way to future proof this function against such possibilities to ensure the need to update it is noticed, but I'm not sure there is one. But index_state doesn't seem particularly likely to change, so this is probably fine. So, here's my ack.
On 8/23/21 10:25 AM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Aug 16 2021, Derrick Stolee via GitGitGadget wrote: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Copy the 'index_state->dir_hash' back to the real istate after expanding >> a sparse index. >> >> A crash was observed in 'git status' during some hashmap lookups with >> corrupted hashmap entries. During an index expansion, new cache-entries >> are added to the 'index_state->name_hash' and the 'dir_hash' in a >> temporary 'index_state' variable 'full'. However, only the 'name_hash' >> hashmap from this temp variable was copied back into the real 'istate' >> variable. The original copy of the 'dir_hash' was incorrectly >> preserved. If the table in the 'full->dir_hash' hashmap were realloced, >> the stale version (in 'istate') would be corrupted. >> >> The test suite does not operate on index sizes sufficiently large to >> trigger this reallocation, so they do not cover this behavior. >> Increasing the test suite to cover such scale is fragile and likely >> wasteful. > > How large does the index need to be to trigger this? I don't know if a > test here is useful, but FWIW if we had such a test then the EXPENSIVE > prereq + GIT_TEST_LONG=true might be a good fit for it. > There would need to be enough directories in the repo to cause the dir_hash to grow during an insert into the dir_hash. IIRC the hashmap table is initialized to 64 and auto reallocs when it hits 80% capacity, so somewhere in the area of 50 directories at minimum. Whether the error is observed would also depend on free() trashing the contents of the memory and/or whether the memory was recycled by something else. Jeff
On 8/23/2021 2:18 PM, Jeff Hostetler wrote: > On 8/23/21 10:25 AM, Ævar Arnfjörð Bjarmason wrote: >> >> On Mon, Aug 16 2021, Derrick Stolee via GitGitGadget wrote: >> >>> From: Jeff Hostetler <jeffhost@microsoft.com> >>> >>> Copy the 'index_state->dir_hash' back to the real istate after expanding >>> a sparse index. >>> >>> A crash was observed in 'git status' during some hashmap lookups with >>> corrupted hashmap entries. During an index expansion, new cache-entries >>> are added to the 'index_state->name_hash' and the 'dir_hash' in a >>> temporary 'index_state' variable 'full'. However, only the 'name_hash' >>> hashmap from this temp variable was copied back into the real 'istate' >>> variable. The original copy of the 'dir_hash' was incorrectly >>> preserved. If the table in the 'full->dir_hash' hashmap were realloced, >>> the stale version (in 'istate') would be corrupted. >>> >>> The test suite does not operate on index sizes sufficiently large to >>> trigger this reallocation, so they do not cover this behavior. >>> Increasing the test suite to cover such scale is fragile and likely >>> wasteful. >> >> How large does the index need to be to trigger this? I don't know if a >> test here is useful, but FWIW if we had such a test then the EXPENSIVE >> prereq + GIT_TEST_LONG=true might be a good fit for it. >> > > There would need to be enough directories in the repo to cause > the dir_hash to grow during an insert into the dir_hash. > IIRC the hashmap table is initialized to 64 and auto reallocs when > it hits 80% capacity, so somewhere in the area of 50 directories > at minimum. Further complicating this is that the sparse index size matters relative to the full index size, since the dir_hash has a given size going into ensure_full_index() that depends on the sparse-index size and the reallocation patterns. > Whether the error is observed would also depend on free() trashing > the contents of the memory and/or whether the memory was recycled > by something else. Issues like this cause this to be hard to reproduce. We did not catch it in the performance test p2000-sparse-operations.sh, which aims to measure how the index works at scale. This makes me incredibly pessimistic that such a test will be effective at preventing a regression, let alone catching similar issues in the future. Thanks, -Stolee
diff --git a/sparse-index.c b/sparse-index.c index c6b4feec413..56eb65dc349 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -283,6 +283,7 @@ void ensure_full_index(struct index_state *istate) /* Copy back into original index. */ memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); + memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash)); istate->sparse_index = 0; free(istate->cache); istate->cache = full->cache;