mbox series

[0/8] Sparse index: integrate with sparse-checkout

Message ID pull.1208.git.1652724693.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse index: integrate with sparse-checkout | expand

Message

Philippe Blain via GitGitGadget May 16, 2022, 6:11 p.m. UTC
This series is based on ds/sparse-colon-path.

This series integrates the 'git sparse-checkout' builtin with the sparse
index. This is the last integration that we fast-tracked into the
microsoft/git fork. After this, we have no work in flight that would
conflict with a Google Summer of Code project in this area.

The tricky part about the sparse-checkout builtin is that we actually do
need to expand the index when growing the sparse-checkout boundary. The
trick is to expand it only as far as we need it, and then ensure that we
collapse any removed directories before the command completes.

To do this, we introduce a concept of a "partially expanded" index. In fact,
we break the boolean sparse_index member into an enum with three states:

 * COMPLETELY_FULL (0): No sparse directories exist.

 * COMPLETELY_SPARSE (1): Sparse directories may exist. Files outside the
   sparse-checkout cone are reduced to sparse directory entries whenever
   possible.

 * PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries
   outside the sparse-checkout cone may exist. Running convert_to_sparse()
   may further reduce those files to sparse directory entries.

Most of the patches in this series focus on introducing this enum and
carefully converting previous uses of the boolean to use the enum. Some
additional work is required to refactor ensure_full_index() into a new
expand_to_pattern_list() method, as they are doing essentially the same
thing, but with different scopes.

The result is improved performance on the sparse-checkout builtin as
demonstrated in a new p2000-sparse-operations.sh performance test:


Test HEAD~1 HEAD
================

2000.24: git sparse-checkout ... (sparse-v3) 2.14(1.55+0.58) 1.57(1.03+0.53)
-26.6% 2000.25: git sparse-checkout ... (sparse-v4) 2.20(1.62+0.57)
1.58(0.98+0.59) -28.2%

The improvement here is less dramatic because the operation is dominated by
writing and deleting a lot of files in the worktree. A repeated no-op
operation such as git sparse-checkout set $SPARSE_CONE would show a greater
improvement, but is less interesting since it could gain that improvement
without satisfying the "hard" parts of this builtin.

I specifically chose how to update the tests in t1092 and p2000 to avoid
conflicts with Victoria's 'git stash' work.

Thanks, -Stolee

Derrick Stolee (8):
  sparse-index: create expand_to_pattern_list()
  sparse-index: introduce partially-sparse indexes
  cache-tree: implement cache_tree_find_path()
  sparse-checkout: --no-sparse-index needs a full index
  sparse-index: partially expand directories
  sparse-index: complete partial expansion
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-checkout: integrate with sparse index

 builtin/sparse-checkout.c                |   8 +-
 cache-tree.c                             |  24 +++++
 cache-tree.h                             |   2 +
 cache.h                                  |  32 ++++--
 read-cache.c                             |   6 +-
 sparse-index.c                           | 126 ++++++++++++++++++++---
 sparse-index.h                           |  14 +++
 t/perf/p2000-sparse-operations.sh        |   1 +
 t/t1092-sparse-checkout-compatibility.sh |  25 +++++
 unpack-trees.c                           |   4 +
 10 files changed, 214 insertions(+), 28 deletions(-)


base-commit: 124b05b23005437fa5fb91863bde2a8f5840e164
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1208%2Fderrickstolee%2Fsparse-index%2Fsparse-checkout-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1208/derrickstolee/sparse-index/sparse-checkout-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1208