[v2,00/19] Parallel Checkout (part I)

Message ID: cover.1600814153.git.matheus.bernardino@usp.br

Matheus Tavares Sept. 22, 2020, 10:49 p.m. UTC
This series adds helper workers to checkout, parallelizing the reading,
filtering and writing of multiple blobs to the working tree.

Since v1, I got the chance to benchmark parallel checkout on more
machines. The results showed that the parallelization is most effective
for repositories located on SSDs or on distributed file systems. For
local file systems on spinning disks, it does not always bring good
performance; in fact, it sometimes even causes a slowdown. But given
the results in the first two cases, I think it's worth having the
parallel code as an optional (and non-default) setting.
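
For reference, here is a minimal sketch of how one could opt in,
assuming the configuration knobs added by this series are
checkout.workers and checkout.thresholdForParallelism (the latter is
mentioned in the patch 11 notes below):

  # Use 8 helper workers; the threshold is the minimum number of
  # to-be-updated entries before workers are actually spawned.
  # (checkout.workers is an assumption based on this cover letter; the
  # threshold name comes from the patch 11 notes.)
  git config checkout.workers 8
  git config checkout.thresholdForParallelism 100

  # Subsequent working tree updates may then use the workers, e.g.:
  git checkout .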

The size of the repository being checked out and the compression level
of the packfiles also influence how much performance gain we can get
from parallel checkout. For example, when downloading the Linux repo
from GitHub and from kernel.org, I got packfiles of 2.9GB and 1.4GB,
respectively. The number of objects was the same, but GitHub's packfile
had fewer delta chains with length >= 7 [A]. For this reason, the
sequential checkout after GitHub's clone was considerably faster than
the one after kernel.org's clone, and the speedup from parallel
checkout was more modest (though it was still faster in absolute
terms).

[A]: https://docs.google.com/spreadsheets/d/1dDGLym77JAGCVYhKQHe44r3pqtrsvHrjS4NmD_Hqr6k/edit?usp=sharing
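
For anyone wanting to reproduce this kind of comparison, one possible
way to inspect the delta-chain depth histogram of a local clone is the
command below (just a suggestion, not necessarily how the numbers in
[A] were collected):

  # --stat-only makes verify-pack print only the histogram of delta
  # chain lengths ("chain length = N: M objects"), which can then be
  # compared between the GitHub and kernel.org clones.
  git verify-pack --stat-only .git/objects/pack/pack-*.idx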

V2 got bigger with tests and some additional optimizations, so I decided
to split the original series into two parts to facilitate reviewing.
This one consists of:

- The first 9 patches are preparatory steps in convert.c and entry.c.
- The middle 6 actually implement parallel checkout.
- The last 4 add tests.

Part II will contain some extra optimizations, like work stealing and
the creation of leading directories in parallel. With that, workers
won't need to stat() the path components again before opening the files
for writing. We will also skip some stat() calls during clone.
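
As a rough, Linux-only illustration of the stat() traffic involved
(this is not part of the series, and syscall names vary by platform):

  # Count file-related syscalls issued while force-writing every file in
  # the index back to the working tree (checkout-index also learns to
  # use the parallel workers in this series).
  strace -f -c -e trace=%file git checkout-index --all --force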


Major changes since v1:

General:
- Added tests
- Parallel checkout is no longer the default, since not all machines
  benefit from it.
- Rebased on top of master to use the adjusted mem_pool API of
  en/mem-pool.

Patch 10:
- Converted the BUG() in handle_results() to error() when we finish
  parallel checkout with pending entries. This is not really a bug; it
  can happen when a worker dies before sending all of its results. Also,
  by emitting an error message instead of die()'ing, we can continue
  processing the remaining results and thus avoid wasting successful
  work.
- Added missing initialization of ci->status on enqueue_entry().
- Fixed a bug in which the collision report during clone would be
  incorrect when the file that is written first appears after its
  colliding pair in the cache array.
- Reworded commit message and added comment in handle_results() to
  explain why we retry writing entries with path collisions.
- Renamed CI_RETRY to CI_COLLISION, to make it easier to change the
  behavior on collided entries in the future, if necessary.
- Some other minor changes like:
  * Removed unnecessary PC_HANDLING_RESULTS status.
  * Statically allocated the global parallel_checkout struct.
  * Renamed checkout_item to parallel_checkout_item.

Patch 11:
- Made parse_and_save_result() safer by checking that the received data
  has the expected size, instead of trusting ci->status and possibly
  accessing an invalid address on errors.
- Limited the workers to the number of enqueued entries.
- Added comment in packet_to_ci() mentioning why it's OK to encode
  NULL as a zero length string when sending the working_tree_encoding to
  workers.
- Split the subprocesses' spawning and finalizing loops, to mitigate
  the spawn/wait cost.
- Don't die() when a worker exits with an error code (only report the
  error), so that we can still update the index with the stat
  information from the entries that were successfully written, instead
  of wasting that work.
- Renamed checkout.workersThreshold to checkout.thresholdForParallelism.


Jeff Hostetler (4):
  convert: make convert_attrs() and convert structs public
  convert: add [async_]convert_to_working_tree_ca() variants
  convert: add get_stream_filter_ca() variant
  convert: add conv_attrs classification

Matheus Tavares (15):
  entry: extract a header file for entry.c functions
  entry: make fstat_output() and read_blob_entry() public
  entry: extract cache_entry update from write_entry()
  entry: move conv_attrs lookup up to checkout_entry()
  entry: add checkout_entry_ca() which takes preloaded conv_attrs
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: support progress displaying
  make_transient_cache_entry(): optionally alloc from mem_pool
  builtin/checkout.c: complete parallel checkout support
  checkout-index: add parallel checkout support
  parallel-checkout: add tests for basic operations
  parallel-checkout: add tests related to clone collisions
  parallel-checkout: add tests related to .gitattributes
  ci: run test round with parallel-checkout enabled

 .gitignore                              |   1 +
 Documentation/config/checkout.txt       |  21 +
 Makefile                                |   2 +
 apply.c                                 |   1 +
 builtin.h                               |   1 +
 builtin/checkout--helper.c              | 142 ++++++
 builtin/checkout-index.c                |  17 +
 builtin/checkout.c                      |  21 +-
 builtin/difftool.c                      |   3 +-
 cache.h                                 |  34 +-
 ci/run-build-and-tests.sh               |   1 +
 convert.c                               | 121 +++--
 convert.h                               |  68 +++
 entry.c                                 | 102 ++--
 entry.h                                 |  54 ++
 git.c                                   |   2 +
 parallel-checkout.c                     | 631 ++++++++++++++++++++++++
 parallel-checkout.h                     | 103 ++++
 read-cache.c                            |  12 +-
 t/README                                |   4 +
 t/lib-encoding.sh                       |  25 +
 t/lib-parallel-checkout.sh              |  45 ++
 t/t0028-working-tree-encoding.sh        |  25 +-
 t/t2080-parallel-checkout-basics.sh     | 197 ++++++++
 t/t2081-parallel-checkout-collisions.sh | 116 +++++
 t/t2082-parallel-checkout-attributes.sh | 174 +++++++
 unpack-trees.c                          |  22 +-
 27 files changed, 1793 insertions(+), 152 deletions(-)
 create mode 100644 builtin/checkout--helper.c
 create mode 100644 entry.h
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h
 create mode 100644 t/lib-encoding.sh
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh
 create mode 100755 t/t2081-parallel-checkout-collisions.sh
 create mode 100755 t/t2082-parallel-checkout-attributes.sh