[RFC,0/4] add parallel unlink

Message ID 20231203133911.41594-1-hanyoung@protonmail.com (mailing list archive)

Han Young Dec. 3, 2023, 1:39 p.m. UTC
We have had the parallel_checkout option since 04155bdad, but unlinking is still performed single-threaded.
With a very large repository, a directory rename or reorganization can lead to a large number of unlinked entries.
In some instances, the unlink process can be slower than the parallel checkout.

This series of patches introduces basic support for parallel unlink. The removal of individual files
can be easily multithreaded, but removing empty directories is a little tricky.
If one thread decides to remove a directory, that directory may still contain files that need to be
deleted by another thread. I had to use a mutex-guarded hashset to collect these 'race' directories,
and remove them after all threads have been joined. Maybe there are ways to do this
without a mutex and hashmap?
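
To illustrate the deferred-removal idea described above, here is a minimal sketch. It is not code from this series: the names add_race_dir, unlink_worker and demo_parallel_unlink are hypothetical, a small fixed-size array with a linear scan stands in for the hashset, and it assumes a directory counts as a 'race' directory when rmdir() fails with ENOTEMPTY (or EEXIST) because another worker still has files in it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define NTHREADS 4
#define NFILES 64
#define MAX_RACE 16

/* Mutex-guarded set of directories that could not be removed yet. */
static pthread_mutex_t race_lock = PTHREAD_MUTEX_INITIALIZER;
static char race_dirs[MAX_RACE][256];
static int race_nr;

/* Record dir at most once; the linear scan stands in for a hash lookup. */
static void add_race_dir(const char *dir)
{
	int i;

	pthread_mutex_lock(&race_lock);
	for (i = 0; i < race_nr; i++)
		if (!strcmp(race_dirs[i], dir))
			break;
	if (i == race_nr && race_nr < MAX_RACE)
		snprintf(race_dirs[race_nr++], sizeof(race_dirs[0]), "%s", dir);
	pthread_mutex_unlock(&race_lock);
}

struct task {
	const char *dir;	/* directory holding the files */
	int first, step;	/* this worker removes f<first>, f<first+step>, ... */
};

static void *unlink_worker(void *arg)
{
	struct task *t = arg;
	char path[512];

	for (int i = t->first; i < NFILES; i += t->step) {
		snprintf(path, sizeof(path), "%s/f%d", t->dir, i);
		unlink(path);
	}
	/* Other workers may still own files here: defer instead of failing. */
	if (rmdir(t->dir) && (errno == ENOTEMPTY || errno == EEXIST))
		add_race_dir(t->dir);
	return NULL;
}

/* Returns 0 if a temporary directory full of files is gone afterwards. */
static int demo_parallel_unlink(void)
{
	char dir[] = "/tmp/punlink-XXXXXX";
	char path[512];
	pthread_t th[NTHREADS];
	struct task tasks[NTHREADS];
	struct stat st;

	if (!mkdtemp(dir))
		return -1;
	for (int i = 0; i < NFILES; i++) {
		int fd;

		snprintf(path, sizeof(path), "%s/f%d", dir, i);
		fd = open(path, O_CREAT | O_WRONLY, 0600);
		if (fd < 0)
			return -1;
		close(fd);
	}
	for (int i = 0; i < NTHREADS; i++) {
		tasks[i] = (struct task){ dir, i, NTHREADS };
		if (pthread_create(&th[i], NULL, unlink_worker, &tasks[i]))
			return -1;
	}
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(th[i], NULL);

	/* Single-threaded pass over the directories that lost the race. */
	for (int i = 0; i < race_nr; i++)
		if (rmdir(race_dirs[i]) && errno != ENOENT)
			return -1;
	race_nr = 0;

	return (stat(dir, &st) == -1 && errno == ENOENT) ? 0 : -1;
}
```

In this simplified layout the last worker to finish may already have removed the directory, so the deferred pass tolerates ENOENT. The mutex is only taken on the failure path, so it should not be contended while workers are busy unlinking files.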

The speed of unlinking files seems to vary from system to system. I did some tests with a private repo.
When I check out a commit with 15000 moved files on a Linux machine with btrfs, parallel_unlink yields
a 10% speedup. But on an Intel MacBook Pro with APFS, the speedup is over 100%. I find it difficult to
choose the default threshold for parallel_unlink.

This series is by no means complete. Many functions contain duplicated code, and there are some
memory leaks. I want to know the community's opinion before proceeding: is it worth doing, or a waste of time?

Han Young (4):
  symlinks: add and export threaded rmdir variants
  entry: add threaded_unlink_entry function
  parallel-checkout: add parallel_unlink
  unpack-trees: introduce parallel_unlink

 entry.c             |  16 ++++++
 entry.h             |   3 ++
 parallel-checkout.c |  80 +++++++++++++++++++++++++++++
 parallel-checkout.h |  25 +++++++++
 symlinks.c          | 120 ++++++++++++++++++++++++++++++++++++++++++--
 symlinks.h          |   6 +++
 unpack-trees.c      |  15 +-----
 7 files changed, 249 insertions(+), 16 deletions(-)