From patchwork Tue Oct 8 14:12:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13826571 Received: from mail-ej1-f52.google.com (mail-ej1-f52.google.com [209.85.218.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4554A1DFE2D for ; Tue, 8 Oct 2024 14:12:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.52 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728396743; cv=none; b=tsLZ8ycgSE7C4MHIrZ+uOALo1oZY2D6r2YIxPGXghMQzrTsLo21pAAyJfiB2x75krLcyex8JuRMAvNxWV66IL+fSxaSRwHzD677wXocwq3j8OnFhfR/jaAl+dhsh0LIS6UgqGSHAlN2F3H9nkbV7DQH1YG1QUgVQaFdPbx8aeo0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728396743; c=relaxed/simple; bh=nHr3lAPUy3tOoVWoI3q8tIHRIs+K3x+c2o9FuYt++P8=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=jqU+9NQtJacf1MAPOyDyHORD7NEvz80GD+2zm+deGIam/hck3G3KmF0H33rXGzC5w78HHezsnOYk8jbuwsIqbbS7Gp+fF5HyhubnoH+FNlg7eBf28NcV6YAodgb2CLqWknjsRZNxRUqUL9oJDV4uVo3VMHrMpr4iXIF5fWopI7o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=UB5/xlEs; arc=none smtp.client-ip=209.85.218.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="UB5/xlEs" Received: by mail-ej1-f52.google.com with SMTP id a640c23a62f3a-a9939f20ca0so507288866b.1 for ; Tue, 08 Oct 2024 07:12:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1728396739; x=1729001539; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=H1Dk8CxF9S/s65XsXVAoxKRDpwmgMnTH343/56iclQo=; b=UB5/xlEsZWJrc8+pMjGCAxbS9aCnFCLk8RtHBs280K26hjbiKrlpJsOwtwPrEMlUrZ d/nQkyMCO3luNqsXTVk/BiDnhjbS5jRvUrq9XfUunIIzVCzOWJwmSOfRUVq1VRQDinoj 6dInLEEy67X+s/KVKYJatiZrzpk9W57sIciQBeO6z5yZwYpHLljWd426NY/vrqy4XXzY pd45KP9aOozppt01Y0OBkm00O8EqZ+AUuIzG7GDGg1+PcBKfxRv5+FSDP34JIPYLYZPB +J22Ba5VvcKw1T/QzpjRyXFBu+t4yDD1pZ4FsoFKhCVWS7OzwUlXyqJrcfCI+mCyZf+K YHUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728396739; x=1729001539; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=H1Dk8CxF9S/s65XsXVAoxKRDpwmgMnTH343/56iclQo=; b=hk/vMoJnA9wTCmDQPcvnFM1TvYuxxMLSxoYkbYE6DP/f6jRpzZlzEGT7C27TPX/iK0 5nr21+jazfyqpFVKqUAN503nGToqsO+LhL8mkabyVSQn5ThOvZngned8EnP16ub9k2jy DrNPynPS4muKm9jZimAo5F15myZAN/1ZL554FxMz4kfCVf3C131SJx4+p4wLU4SteaHj hCqGXu47EIgdOI0Zx9mnkKUAie/1M1P1P1Ywe2yziROT3oe24YacU0+MI7xCjfMWU6ax O5ebLtwQaKRWLaWhDVryvJJRUAmc7QwBsloUKVh1lGhXEI80IP32YuO+/DXzd2slT2wK FiDQ== X-Gm-Message-State: AOJu0YxLLTipBsHjBGtKxA3VFpP/G9tUjTQKGd8DCk2G20deDHXuCMwQ 7N4A+lEYAnliAzBMN7yiriGqAWBbQE6xiWsddNaG/6ABeTyRrado1z1SuA== X-Google-Smtp-Source: AGHT+IEXHbtmPVxhP3EYALPgCpjCo3heE3IYTGkuTabytelVKPpnMfCWpK7mRbzrh28/2nIH466rlw== X-Received: by 2002:a17:907:961b:b0:a8a:5ff9:bcd1 with SMTP id a640c23a62f3a-a991bd0bbe2mr1496949166b.21.1728396739013; Tue, 08 Oct 2024 07:12:19 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a994c1c4c57sm353032966b.208.2024.10.08.07.12.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Oct 2024 07:12:18 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Tue, 08 Oct 2024 14:12:02 +0000 Subject: [PATCH 16/17] pack-objects: refactor path-walk delta phase Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee Previously, the --path-walk option to 'git pack-objects' would compute deltas inline with the path-walk logic. This would make the progress indicator look like it is taking a long time to enumerate objects, and then very quickly computed deltas. Instead of computing deltas on each region of objects organized by tree, store a list of regions corresponding to these groups. These can later be pulled from the list for delta compression before doing the "global" delta search. This presents a new progress indicator that can be used in tests to verify that this stage is happening. The current implementation is not integrated with threads, but could be done in a future update. Since we do not attempt to sort objects by size until after exploring all trees, we can remove the previous change to t5530 due to a different error message appearing first. Signed-off-by: Derrick Stolee --- builtin/pack-objects.c | 81 +++++++++++++++++++++++++----------- pack-objects.h | 12 ++++++ t/t5300-pack-object.sh | 8 +++- t/t5530-upload-pack-error.sh | 6 --- 4 files changed, 74 insertions(+), 33 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 6805a55c60d..5c413ac07e6 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3204,6 +3204,50 @@ static int should_attempt_deltas(struct object_entry *entry) return 1; } +static void find_deltas_for_region(struct object_entry *list UNUSED, + struct packing_region *region, + unsigned int *processed) +{ + struct object_entry **delta_list; + uint32_t delta_list_nr = 0; + + ALLOC_ARRAY(delta_list, region->nr); + for (uint32_t i = 0; i < region->nr; i++) { + struct object_entry *entry = to_pack.objects + region->start + i; + if (should_attempt_deltas(entry)) + delta_list[delta_list_nr++] = entry; + } + + QSORT(delta_list, delta_list_nr, type_size_sort); + find_deltas(delta_list, &delta_list_nr, window, depth, processed); + free(delta_list); +} + +static void find_deltas_by_region(struct object_entry *list, + struct packing_region *regions, + uint32_t start, uint32_t nr) +{ + unsigned int processed = 0; + uint32_t progress_nr; + + if (!nr) + return; + + progress_nr = regions[nr - 1].start + regions[nr - 1].nr; + + if (progress) + progress_state = start_progress(_("Compressing objects by path"), + progress_nr); + + while (nr--) + find_deltas_for_region(list, + ®ions[start++], + &processed); + + display_progress(progress_state, progress_nr); + stop_progress(&progress_state); +} + static void prepare_pack(int window, int depth) { struct object_entry **delta_list; @@ -3228,6 +3272,10 @@ static void prepare_pack(int window, int depth) if (!to_pack.nr_objects || !window || !depth) return; + if (path_walk) + find_deltas_by_region(to_pack.objects, to_pack.regions, + 0, to_pack.nr_regions); + ALLOC_ARRAY(delta_list, to_pack.nr_objects); nr_deltas = n = 0; @@ -4165,10 +4213,8 @@ static int add_objects_by_path(const char *path, enum object_type type, void *data) { - struct object_entry **delta_list; size_t oe_start = to_pack.nr_objects; size_t oe_end; - unsigned int sub_list_size; unsigned int *processed = data; /* @@ -4201,32 +4247,17 @@ static int add_objects_by_path(const char *path, if (oe_end == oe_start || !window) return 0; - sub_list_size = 0; - ALLOC_ARRAY(delta_list, oe_end - oe_start); + ALLOC_GROW(to_pack.regions, + to_pack.nr_regions + 1, + to_pack.nr_regions_alloc); - for (size_t i = 0; i < oe_end - oe_start; i++) { - struct object_entry *entry = to_pack.objects + oe_start + i; + to_pack.regions[to_pack.nr_regions].start = oe_start; + to_pack.regions[to_pack.nr_regions].nr = oe_end - oe_start; + to_pack.nr_regions++; - if (!should_attempt_deltas(entry)) - continue; + *processed += oids->nr; + display_progress(progress_state, *processed); - delta_list[sub_list_size++] = entry; - } - - /* - * Find delta bases among this list of objects that all match the same - * path. This causes the delta compression to be interleaved in the - * object walk, which can lead to confusing progress indicators. This is - * also incompatible with threaded delta calculations. In the future, - * consider creating a list of regions in the full to_pack.objects array - * that could be picked up by the threaded delta computation. - */ - if (sub_list_size && window) { - QSORT(delta_list, sub_list_size, type_size_sort); - find_deltas(delta_list, &sub_list_size, window, depth, processed); - } - - free(delta_list); return 0; } diff --git a/pack-objects.h b/pack-objects.h index b9898a4e64b..bde4ba19755 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -118,11 +118,23 @@ struct object_entry { unsigned ext_base:1; /* delta_idx points outside packlist */ }; +/** + * A packing region is a section of the packing_data.objects array + * as given by a starting index and a number of elements. + */ +struct packing_region { + uint32_t start; + uint32_t nr; +}; + struct packing_data { struct repository *repo; struct object_entry *objects; uint32_t nr_objects, nr_alloc; + struct packing_region *regions; + uint32_t nr_regions, nr_regions_alloc; + int32_t *index; uint32_t index_size; diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index 5f6914acae7..4f81613eab1 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -677,7 +677,9 @@ done # Basic "repack everything" test test_expect_success '--path-walk pack everything' ' git -C server rev-parse HEAD >in && - git -C server pack-objects --stdout --revs --path-walk out.pack && + GIT_PROGRESS_DELAY=0 git -C server pack-objects \ + --stdout --revs --path-walk --progress out.pack 2>err && + grep "Compressing objects by path" err && git -C server index-pack --stdin out.pack && + GIT_PROGRESS_DELAY=0 git -C server pack-objects \ + --thin --stdout --revs --path-walk --progress out.pack 2>err && + grep "Compressing objects by path" err && git -C server index-pack --fix-thin --stdin input && - - # The current implementation of path-walk causes a different - # error message. This will be changed by a future refactoring. - GIT_TEST_PACK_PATH_WALK=0 && - export GIT_TEST_PACK_PATH_WALK && - test_must_fail git upload-pack . /dev/null 2>output.err && test_grep "unable to read" output.err && test_grep "pack-objects died" output.err