From patchwork Sun Aug 14 16:55:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhradeep Chakraborty X-Patchwork-Id: 12942930 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E2FCC25B06 for ; Sun, 14 Aug 2022 17:00:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241858AbiHNRAN (ORCPT ); Sun, 14 Aug 2022 13:00:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241540AbiHNQ7p (ORCPT ); Sun, 14 Aug 2022 12:59:45 -0400 Received: from mail-wm1-x332.google.com (mail-wm1-x332.google.com [IPv6:2a00:1450:4864:20::332]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8BA4E62CA for ; Sun, 14 Aug 2022 09:55:17 -0700 (PDT) Received: by mail-wm1-x332.google.com with SMTP id az6-20020a05600c600600b003a530cebbe3so2908830wmb.0 for ; Sun, 14 Aug 2022 09:55:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc; bh=OZX76/OBWtsYaD2plHm2WdewtzV+QcpnjZ+i1Lu1Vaw=; b=Rn6FmiX2lPbftJFdD3AfyTnk+UaV50uiCeUhtgc00sAH9jSU/rwCFnHpLAqNBw6+MK bc4YmlptDw7Ptrt/TJuMY4Vu+8ZS8o2DrpH/+1Bq3Z5y4wy3vVSp7DkrOxDltLHbzA78 6M9q444+ODtLpX/m935cJ6uBJUSssPcMZotyWuem6/wiL2A9jsJVi3m08y5b2h8b+7YJ 0VEVhncpkN7Qn4SYR/7JrKC2czE6fbgSlzD+f6oeUp0aksudUUwJ/P+BRMLedOkqhJkL pJftK/zImn/B7kjNgOiR7DZB1vznx3Wh7i1Eksxm1cumT8q1ApHngOsqFj06bigqBsVE 2IHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc; bh=OZX76/OBWtsYaD2plHm2WdewtzV+QcpnjZ+i1Lu1Vaw=; b=hTxZZSYE7oHg+Yuzg6A7WNys+/6V30DP8uI1DvbTj28HrOTYhaYfHElOJvq/4cNxL8 8eSgVO/B9QGWYAZJPp3rIiZneIlRBWYTDInYFJXGu43ZcT/dr1pVpxNeD98fZJqTG9Aw 0aGR1Ig+fMJgZmZHZeqCFRgaYm9EYKvxjV7MXCbfC55fxRhijyOEUUJRC/t7Og4laHu3 0EWwG8cp/GSKaCLT5HYM0IdgtxzYN2zsdvwFMd8mLr1I5mALdfXuBdhO6XhgMYU6HPXp Aw/Vld6Mzv5Kwd1txdoSmAQp742zGnObg0eKaSm2jNq2IDiDIPznQaEe0a4xLJeEQYbe 24Sg== X-Gm-Message-State: ACgBeo2cgTSb0rF6kzpidqyIQyO11vzWKR02Ziyr7jA4pKIfNL6b1fB0 ERIX3ntTHp1pEmJmpVSe3PDThNtwhmA= X-Google-Smtp-Source: AA6agR4ifwpW0Do1i0YbwjywGw16u0aoQkSSlrPKAvRrmOMZpXgG4REEcemY1PEmD6jKx2IM08GV2g== X-Received: by 2002:a1c:7512:0:b0:3a4:f09d:1cf7 with SMTP id o18-20020a1c7512000000b003a4f09d1cf7mr14093035wmc.67.1660496115853; Sun, 14 Aug 2022 09:55:15 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t16-20020adfdc10000000b0021e5f32ade7sm4920096wri.68.2022.08.14.09.55.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Aug 2022 09:55:14 -0700 (PDT) Message-Id: <67b71be8c85a44b21c3181a9e9532d5dc3f81668.1660496112.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 14 Aug 2022 16:55:06 +0000 Subject: [PATCH v6 1/6] Documentation/technical: describe bitmap lookup table extension Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Taylor Blau , Kaartic Sivaram , Junio C Hamano , Derrick Stolee , Philip Oakley , Martin =?utf-8?b?w4VncmVu?= , =?utf-8?b?w4Z2YXIg?= =?utf-8?b?QXJuZmrDtnLDsA==?= Bjarmason , Eric Sunshine , Johannes Schindelin , Abhradeep Chakraborty , Abhradeep Chakraborty Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhradeep Chakraborty From: Abhradeep Chakraborty When reading bitmap file, Git loads each and every bitmap one by one even if all the bitmaps are not required. A "bitmap lookup table" extension to the bitmap format can reduce the overhead of loading bitmaps which stores a list of bitmapped commit id pos (in the midx or pack, along with their offset and xor offset. This way Git can load only the necessary bitmaps without loading the previous bitmaps. Older versions of Git ignore the lookup table extension and don't throw any kind of warning or error while parsing the bitmap file. Add some information for the new "bitmap lookup table" extension in the bitmap-format documentation. Mentored-by: Taylor Blau Co-Mentored-by: Kaartic Sivaraam Co-Authored-by: Taylor Blau Signed-off-by: Abhradeep Chakraborty --- Documentation/technical/bitmap-format.txt | 39 +++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt index a85f58f5153..c2e652b71a7 100644 --- a/Documentation/technical/bitmap-format.txt +++ b/Documentation/technical/bitmap-format.txt @@ -72,6 +72,17 @@ MIDXs, both the bit-cache and rev-cache extensions are required. pack/MIDX. The format and meaning of the name-hash is described below. + ** {empty} + BITMAP_OPT_LOOKUP_TABLE (0x10): ::: + If present, the end of the bitmap file contains a table + containing a list of `N` + triplets. The format and meaning of the table is described + below. ++ +NOTE: Unlike the xor_offset used to compress an individual bitmap, +`xor_row` stores an *absolute* index into the lookup table, not a location +relative to the current entry. + 4-byte entry count (network byte order): :: The total count of entries (bitmapped commits) in this bitmap index. @@ -216,3 +227,31 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag. If implementations want to choose a different hashing scheme, they are free to do so, but MUST allocate a new header flag (because comparing hashes made under two different schemes would be pointless). + +Commit lookup table +------------------- + +If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)` +bytes (preceding the name-hash cache and trailing hash) of the `.bitmap` +file contains a lookup table specifying the information needed to get +the desired bitmap from the entries without parsing previous unnecessary +bitmaps. + +For a `.bitmap` containing `nr_entries` reachability bitmaps, the table +contains a list of `nr_entries` triplets +(sorted in the ascending order of `commit_pos`). The content of i'th +triplet is - + + * {empty} + commit_pos (4 byte integer, network byte order): :: + It stores the object position of a commit (in the midx or pack + index). + + * {empty} + offset (8 byte integer, network byte order): :: + The offset from which that commit's bitmap can be read. + + * {empty} + xor_row (4 byte integer, network byte order): :: + The position of the triplet whose bitmap is used to compress + this one, or `0xffffffff` if no such bitmap exists. From patchwork Sun Aug 14 16:55:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhradeep Chakraborty X-Patchwork-Id: 12942928 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAC9CC25B06 for ; Sun, 14 Aug 2022 17:00:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241673AbiHNRAH (ORCPT ); Sun, 14 Aug 2022 13:00:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44918 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241569AbiHNQ7p (ORCPT ); Sun, 14 Aug 2022 12:59:45 -0400 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC89B62DF for ; Sun, 14 Aug 2022 09:55:18 -0700 (PDT) Received: by mail-wr1-x430.google.com with SMTP id h13so6630315wrf.6 for ; Sun, 14 Aug 2022 09:55:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc; bh=Fir1tozSe6riEGrxncc2/FDO3x0dsVejSnmsGJfL5Vs=; b=jtjYkriGp3hHCBqMZRZaB2Tqqy/EWQUQN43ORlztEjOxsoVQ9wizLGQwnLsgTAgA4D WUw0rImYq0Pg5QkXUnvHzy+0ey6qWaunmx9dQKsWF5gEjm3rh7Hll00BXTkbbITZB4ch 4rx/fI1jhyP8oKzwWWmQDU1vjF26xVKU/+8utdnNRGYqX5Z70adPq8OUby6Sz1lxi3a4 bFH5zSOxCk+EDRjxQG9KnWxdDYXB3JgGTxxUHpRGFuEZRZZYTww5hJ/5r0OcgjUAXTTF jGjvYuI3jmThL94THuAJNUAu2GU/gDdTl1QX9N/0qvlLeQpfpOiKo43Tb51++cINH8Fq qYKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc; bh=Fir1tozSe6riEGrxncc2/FDO3x0dsVejSnmsGJfL5Vs=; b=AO6NT3dfs++EG7ZBygQqTZrBzzQrpGa8XTkDxe4tFyNwwwksg+5uDhZAf1NgmF0dQD u153MzieQR9eEN5hj2MtZPDk324LnELeu5Chz5eQG9BQXrhMfCSIpHKcGHRUKJiUN8wv fmxBLGvvpAGmvZCXwYXBd7asHgifjv2hcWz2LQHP+6g1goCfhqu/LO0eqbc/Ho8pkB5W Z4g5Gpdj8JUX29IFbYYs1+OJykt+07dmTz2IyOFSbAQorc+4h254jgdcfERPp9jCSyHS +yzhJQ3v602iRVWVoMU4KN0k86BxCoCqavBUKLJumva5rIJHRj34P4FnKfzBWc8KgsRC qxdg== X-Gm-Message-State: ACgBeo1ZOwgFGWSM4/0mt2TqbEAxgiq+jK/oWC72RZGwJERB14wcI1DF NreUb4v2p8aT9dW7wErcLnkSpFJG2Vc= X-Google-Smtp-Source: AA6agR6dBUZeVGJXDtSoZONBii+En4nIJ8YuBQ4Q8S2riU8rGEMQvmNWHy5HSVv8CTtKxTnzso9BOw== X-Received: by 2002:adf:e28d:0:b0:21e:4c3b:b446 with SMTP id v13-20020adfe28d000000b0021e4c3bb446mr6408233wri.300.1660496117015; Sun, 14 Aug 2022 09:55:17 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id x18-20020adff0d2000000b0021e45afa7b0sm4871786wro.109.2022.08.14.09.55.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Aug 2022 09:55:16 -0700 (PDT) Message-Id: <92ca58fbeeb0ac74a411fc2e67fcbceccc819883.1660496112.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 14 Aug 2022 16:55:07 +0000 Subject: [PATCH v6 2/6] bitmap: move `get commit positions` code to `bitmap_writer_finish` Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Taylor Blau , Kaartic Sivaram , Junio C Hamano , Derrick Stolee , Philip Oakley , Martin =?utf-8?b?w4VncmVu?= , =?utf-8?b?w4Z2YXIg?= =?utf-8?b?QXJuZmrDtnLDsA==?= Bjarmason , Eric Sunshine , Johannes Schindelin , Abhradeep Chakraborty , Abhradeep Chakraborty Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhradeep Chakraborty From: Abhradeep Chakraborty The `write_selected_commits_v1` function takes care of writing commit positions along with their corresponding bitmaps in the disk. It is OK because this `search commit position of a given commit` algorithm is needed only once here. But in later changes of the `lookup table extension series`, we need same commit positions which means we have to run the above mentioned algorithm one more time. Move the `search commit position of a given commit` algorithm to `bitmap_writer_finish()` and use the `commit_positions` array to get commit positions of their corresponding bitmaps. Signed-off-by: Abhradeep Chakraborty --- pack-bitmap-write.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 4fcfaed428f..9b1be59f6d3 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -650,20 +650,15 @@ static const struct object_id *oid_access(size_t pos, const void *table) static void write_selected_commits_v1(struct hashfile *f, struct pack_idx_entry **index, - uint32_t index_nr) + uint32_t index_nr, + uint32_t *commit_positions) { int i; for (i = 0; i < writer.selected_nr; ++i) { struct bitmapped_commit *stored = &writer.selected[i]; - int commit_pos = - oid_pos(&stored->commit->object.oid, index, index_nr, oid_access); - - if (commit_pos < 0) - BUG("trying to write commit not in index"); - - hashwrite_be32(f, commit_pos); + hashwrite_be32(f, commit_positions[i]); hashwrite_u8(f, stored->xor_offset); hashwrite_u8(f, stored->flags); @@ -697,6 +692,8 @@ void bitmap_writer_finish(struct pack_idx_entry **index, static uint16_t flags = BITMAP_OPT_FULL_DAG; struct strbuf tmp_file = STRBUF_INIT; struct hashfile *f; + uint32_t *commit_positions = NULL; + uint32_t i; struct bitmap_disk_header header; @@ -715,7 +712,20 @@ void bitmap_writer_finish(struct pack_idx_entry **index, dump_bitmap(f, writer.trees); dump_bitmap(f, writer.blobs); dump_bitmap(f, writer.tags); - write_selected_commits_v1(f, index, index_nr); + + ALLOC_ARRAY(commit_positions, writer.selected_nr); + + for (i = 0; i < writer.selected_nr; i++) { + struct bitmapped_commit *stored = &writer.selected[i]; + int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access); + + if (commit_pos < 0) + BUG(_("trying to write commit not in index")); + + commit_positions[i] = commit_pos; + } + + write_selected_commits_v1(f, index, index_nr, commit_positions); if (options & BITMAP_OPT_HASH_CACHE) write_hash_cache(f, index, index_nr); @@ -730,4 +740,5 @@ void bitmap_writer_finish(struct pack_idx_entry **index, die_errno("unable to rename temporary bitmap file to '%s'", filename); strbuf_release(&tmp_file); + free(commit_positions); } From patchwork Sun Aug 14 16:55:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhradeep Chakraborty X-Patchwork-Id: 12942931 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00F03C282E7 for ; Sun, 14 Aug 2022 17:00:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241636AbiHNRAP (ORCPT ); Sun, 14 Aug 2022 13:00:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41868 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241836AbiHNQ7p (ORCPT ); Sun, 14 Aug 2022 12:59:45 -0400 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DA89B6349 for ; Sun, 14 Aug 2022 09:55:19 -0700 (PDT) Received: by mail-wr1-x430.google.com with SMTP id e27so1916827wra.11 for ; Sun, 14 Aug 2022 09:55:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc; bh=9T1CPmxOfBodrz8wPGdorUScuh/xHb6POfc+zCI05DA=; b=pfRTVjRH32Wp0Ns8JCFnmOjltxReJZUr5l8I/Pt4dIeePJfbJvjfIjS6B3EPb3KD22 rgRq0aRIFgNcEnWnLn7OTg/DooudcgWVsH/YjdLEzNiTeBeBM+HHIhJJDmaZa30s7TAx gZqxnTEanUHPNPBWcH46qpHS4Zp8Dp3OZiUngII+dr5y8sti347NkFgUx7w6/QEsoUnr FL/9FWdMzQTl58rikBb8pp+5Qi75cVieJ+I3JtQqU/0abFAWz969ZH1vd6ZpS/mtA1FL dUgwvgu451Nde7X6PMI87RIk1NgL/qycOT7MOrhVMXKfQcRqDFimE2Gin3vKNQwvcHj8 qKcA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc; bh=9T1CPmxOfBodrz8wPGdorUScuh/xHb6POfc+zCI05DA=; b=77haeYf8g6tHsCe85s29972EPd1Bm2VtkHYXDw4AT8YGTc4L3Zff7CbQzA2fnrizf9 T00bnudd/mTMPbhxqVTxq9dKToVpfqgfkLwyUMV6+H/XaGC27n4EShJpdQtS4vdl2pCj jXhh7wAJs9SRVh0FbXx5ZewT757xlYfRYpus11UR5CxnMlo31ArIIv04TFwLjIS2RV/G YjLRHOpbHgePPGovo4ks/UWBaJ99LreTpCxY7yKwwTAR+Ecra/KV3bzAXtC1vz0g4Ygu EN/iOhaZa1b131RWHv59A/59whVatoZHMnyKX2ctKaEreP13ubZHfPbjN6MnWYpbY75j +Rhg== X-Gm-Message-State: ACgBeo0RAKfQH8BSh3fKGViGYO/YImlWS1HPCFcV1heg3Q7DD7LlVG3v QPfa/kHBYafdthSyh+8Qbn0YuvDaisc= X-Google-Smtp-Source: AA6agR6vvEsK0VD8ISVrDA3YKwgc9gTiZThBJGW2bHgwzInLlQQn/GbiPjomsbXj3pEGTRxLq9jzFg== X-Received: by 2002:a5d:4b03:0:b0:220:6b87:8f0f with SMTP id v3-20020a5d4b03000000b002206b878f0fmr6743588wrq.534.1660496118063; Sun, 14 Aug 2022 09:55:18 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q14-20020a05600000ce00b0021ee28ff76esm4942218wrx.38.2022.08.14.09.55.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Aug 2022 09:55:17 -0700 (PDT) Message-Id: <090becaabe0c47fb5fed4ec5ce7628deeafeded1.1660496112.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 14 Aug 2022 16:55:08 +0000 Subject: [PATCH v6 3/6] pack-bitmap-write.c: write lookup table extension Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Taylor Blau , Kaartic Sivaram , Junio C Hamano , Derrick Stolee , Philip Oakley , Martin =?utf-8?b?w4VncmVu?= , =?utf-8?b?w4Z2YXIg?= =?utf-8?b?QXJuZmrDtnLDsA==?= Bjarmason , Eric Sunshine , Johannes Schindelin , Abhradeep Chakraborty , Abhradeep Chakraborty Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhradeep Chakraborty From: Abhradeep Chakraborty The bitmap lookup table extension was documented by an earlier change, but Git does not yet know how to write that extension. Teach Git to write bitmap lookup table extension. The table contains the list of `N` ` triplets. These triplets are sorted according to their commit pos (ascending order). The meaning of each data in the i'th triplet is given below: - commit_pos stores commit position (in the pack-index or midx). It is a 4 byte network byte order unsigned integer. - offset is the position (in the bitmap file) from which that commit's bitmap can be read. - xor_row is the position of the triplet in the lookup table whose bitmap is used to compress this bitmap, or `0xffffffff` if no such bitmap exists. Mentored-by: Taylor Blau Co-mentored-by: Kaartic Sivaraam Co-authored-by: Taylor Blau Signed-off-by: Abhradeep Chakraborty --- pack-bitmap-write.c | 91 ++++++++++++++++++++++++++++++++++++++++++++- pack-bitmap.h | 5 ++- 2 files changed, 92 insertions(+), 4 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 9b1be59f6d3..2cfc92f2871 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -651,13 +651,17 @@ static const struct object_id *oid_access(size_t pos, const void *table) static void write_selected_commits_v1(struct hashfile *f, struct pack_idx_entry **index, uint32_t index_nr, - uint32_t *commit_positions) + uint32_t *commit_positions, + off_t *offsets) { int i; for (i = 0; i < writer.selected_nr; ++i) { struct bitmapped_commit *stored = &writer.selected[i]; + if (offsets) + offsets[i] = hashfile_total(f); + hashwrite_be32(f, commit_positions[i]); hashwrite_u8(f, stored->xor_offset); hashwrite_u8(f, stored->flags); @@ -666,6 +670,81 @@ static void write_selected_commits_v1(struct hashfile *f, } } +static int table_cmp(const void *_va, const void *_vb, void *_data) +{ + uint32_t *commit_positions = _data; + uint32_t a = commit_positions[*(uint32_t *)_va]; + uint32_t b = commit_positions[*(uint32_t *)_vb]; + + if (a > b) + return 1; + else if (a < b) + return -1; + + return 0; +} + +static void write_lookup_table(struct hashfile *f, + struct pack_idx_entry **index, + uint32_t index_nr, + uint32_t *commit_positions, + off_t *offsets) +{ + uint32_t i; + uint32_t *table, *table_inv; + + ALLOC_ARRAY(table, writer.selected_nr); + ALLOC_ARRAY(table_inv, writer.selected_nr); + + for (i = 0; i < writer.selected_nr; i++) + table[i] = i; + + /* + * At the end of this sort table[j] = i means that the i'th + * bitmap corresponds to j'th bitmapped commit (among the selected + * commits) in lex order of OIDs. + */ + QSORT_S(table, writer.selected_nr, table_cmp, commit_positions); + + /* table_inv helps us discover that relationship (i'th bitmap + * to j'th commit by j = table_inv[i]) + */ + for (i = 0; i < writer.selected_nr; i++) + table_inv[table[i]] = i; + + trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository); + for (i = 0; i < writer.selected_nr; i++) { + struct bitmapped_commit *selected = &writer.selected[table[i]]; + uint32_t xor_offset = selected->xor_offset; + uint32_t xor_row; + + if (xor_offset) { + /* + * xor_index stores the index (in the bitmap entries) + * of the corresponding xor bitmap. But we need to convert + * this index into lookup table's index. So, table_inv[xor_index] + * gives us the index position w.r.t. the lookup table. + * + * If "k = table[i] - xor_offset" then the xor base is the k'th + * bitmap. `table_inv[k]` gives us the position of that bitmap + * in the lookup table. + */ + uint32_t xor_index = table[i] - xor_offset; + xor_row = table_inv[xor_index]; + } else { + xor_row = 0xffffffff; + } + + hashwrite_be32(f, commit_positions[table[i]]); + hashwrite_be64(f, (uint64_t)offsets[table[i]]); + hashwrite_be32(f, xor_row); + } + trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository); + + free(table); + free(table_inv); +} + static void write_hash_cache(struct hashfile *f, struct pack_idx_entry **index, uint32_t index_nr) @@ -693,6 +772,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index, struct strbuf tmp_file = STRBUF_INIT; struct hashfile *f; uint32_t *commit_positions = NULL; + off_t *offsets = NULL; uint32_t i; struct bitmap_disk_header header; @@ -713,6 +793,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index, dump_bitmap(f, writer.blobs); dump_bitmap(f, writer.tags); + if (options & BITMAP_OPT_LOOKUP_TABLE) + CALLOC_ARRAY(offsets, index_nr); + ALLOC_ARRAY(commit_positions, writer.selected_nr); for (i = 0; i < writer.selected_nr; i++) { @@ -725,7 +808,10 @@ void bitmap_writer_finish(struct pack_idx_entry **index, commit_positions[i] = commit_pos; } - write_selected_commits_v1(f, index, index_nr, commit_positions); + write_selected_commits_v1(f, index, index_nr, commit_positions, offsets); + + if (options & BITMAP_OPT_LOOKUP_TABLE) + write_lookup_table(f, index, index_nr, commit_positions, offsets); if (options & BITMAP_OPT_HASH_CACHE) write_hash_cache(f, index, index_nr); @@ -741,4 +827,5 @@ void bitmap_writer_finish(struct pack_idx_entry **index, strbuf_release(&tmp_file); free(commit_positions); + free(offsets); } diff --git a/pack-bitmap.h b/pack-bitmap.h index f3a57ca065f..cb065a263cb 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -24,8 +24,9 @@ struct bitmap_disk_header { #define NEEDS_BITMAP (1u<<22) enum pack_bitmap_opts { - BITMAP_OPT_FULL_DAG = 1, - BITMAP_OPT_HASH_CACHE = 4, + BITMAP_OPT_FULL_DAG = 0x1, + BITMAP_OPT_HASH_CACHE = 0x4, + BITMAP_OPT_LOOKUP_TABLE = 0x10, }; enum pack_bitmap_flags { From patchwork Sun Aug 14 16:55:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhradeep Chakraborty X-Patchwork-Id: 12942933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E5EAC25B06 for ; Sun, 14 Aug 2022 17:00:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241723AbiHNRAV (ORCPT ); Sun, 14 Aug 2022 13:00:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45048 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241900AbiHNQ7q (ORCPT ); Sun, 14 Aug 2022 12:59:46 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E82B16450 for ; Sun, 14 Aug 2022 09:55:21 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id h13so6630402wrf.6 for ; Sun, 14 Aug 2022 09:55:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc; bh=Khfczqb8FeCPc8s1/Pk5pADD4fZNIu98MwdUf2wIkq8=; b=JjFapAk325PVFr2AiY6dYO3W2Z0atrWDw22XnG1UzvvMonHbmfQalZ7n1srLAxnvXq kx0VPuF4p2IKtUXxIz2Xnm1HQf5veSHWqWIQUv59EMipjVbl+R2l2alRt5nCtmzM9V0/ AvD6NOvSetLlCxCybPZaUrkHKQCOfxMNPC6fD9WhOEYWaBHbYUj3gARf1e3uSm+eppl6 Qu3dTgLD5QxClfiWGL/qEluPuhkyWLfIiGkvKy4z5pyeC5FC5IGYb8TPdFkWqIsiL543 hXjED4ueOyDfM5gDC0jtuXK63Uphnfche++cVZ9YtxabzuHMfzahK1vPvBstMSQLqC/q SMFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc; bh=Khfczqb8FeCPc8s1/Pk5pADD4fZNIu98MwdUf2wIkq8=; b=B8+GQBu4ght1kouN5whadyx99VFU1wJMPYTDmhlBowtW1Oo508UNtwGomMNe4rwoFs unuKWsblYuTl4YobKOS9OZNK+8VUJgBMXczsavuOy4kSyZ2i13z2h8dlponk/dpK78kz MNVBvOGbytFA3SPzqqvXP8Hu4QP5cPqHRuWZgVM6GrvDZuDpK4aUYWMRhv2ENJ/ceipi 6yU74jw3UxeqZ4LRzL+ebQYS4YybSqfylJOKKrRbsG2fWDhCFfPv1kNrE4WnbdqIoFJ1 /aJTo31GY/YyntX5M5NlbCfxaQu1o7rdhzW02cWQvN5dgvUrhllhDlq9+unWr9tdGGhz mRFg== X-Gm-Message-State: ACgBeo2qd5CvQFwNoiTfH8Rf0An1ZNdaXOpxJdZiosuzQo5e8kb9ZWR6 vLs3g67pZ8U6BKhCYjHPqLXW5Bgq2/w= X-Google-Smtp-Source: AA6agR5oHdOJSi12gEYWOXx+neT5a59krjeMlSvJM2w+KQAEmU9Pw5AjzieZfHKbvbovANbxOHlVtw== X-Received: by 2002:a05:6000:1051:b0:221:6d46:4f0f with SMTP id c17-20020a056000105100b002216d464f0fmr6554164wrx.163.1660496119583; Sun, 14 Aug 2022 09:55:19 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id b6-20020a05600c4e0600b003a5f4fccd4asm1183241wmq.35.2022.08.14.09.55.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Aug 2022 09:55:18 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Sun, 14 Aug 2022 16:55:09 +0000 Subject: [PATCH v6 4/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Taylor Blau , Kaartic Sivaram , Junio C Hamano , Derrick Stolee , Philip Oakley , Martin =?utf-8?b?w4VncmVu?= , =?utf-8?b?w4Z2YXIg?= =?utf-8?b?QXJuZmrDtnLDsA==?= Bjarmason , Eric Sunshine , Johannes Schindelin , Abhradeep Chakraborty , Abhradeep Chakraborty Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhradeep Chakraborty From: Abhradeep Chakraborty Teach Git to provide a way for users to enable/disable bitmap lookup table extension by providing a config option named 'writeBitmapLookupTable'. Default is false. Also add test to verify writting of lookup table. Mentored-by: Taylor Blau Co-Mentored-by: Kaartic Sivaraam Co-Authored-by: Taylor Blau Signed-off-by: Abhradeep Chakraborty --- Documentation/config/pack.txt | 7 + builtin/multi-pack-index.c | 7 + builtin/pack-objects.c | 8 + midx.c | 3 + midx.h | 1 + t/t5310-pack-bitmaps.sh | 792 ++++++++++++++++-------------- t/t5311-pack-bitmaps-shallow.sh | 53 +- t/t5326-multi-pack-bitmaps.sh | 421 +++++++++------- t/t5327-multi-pack-bitmaps-rev.sh | 24 +- 9 files changed, 733 insertions(+), 583 deletions(-) diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt index ad7f73a1ead..b955ca572ec 100644 --- a/Documentation/config/pack.txt +++ b/Documentation/config/pack.txt @@ -164,6 +164,13 @@ When writing a multi-pack reachability bitmap, no new namehashes are computed; instead, any namehashes stored in an existing bitmap are permuted into their appropriate location when writing a new bitmap. +pack.writeBitmapLookupTable:: + When true, Git will include a "lookup table" section in the + bitmap index (if one is written). This table is used to defer + loading individual bitmaps as late as possible. This can be + beneficial in repositories that have relatively large bitmap + indexes. Defaults to false. + pack.writeReverseIndex:: When true, git will write a corresponding .rev file (see: link:../technical/pack-format.html[Documentation/technical/pack-format.txt]) diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 8f24d59a753..e7cce1d26ee 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -87,6 +87,13 @@ static int git_multi_pack_index_write_config(const char *var, const char *value, opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; } + if (!strcmp(var, "pack.writebitmaplookuptable")) { + if (git_config_bool(var, value)) + opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE; + else + opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE; + } + /* * We should never make a fall-back call to 'git_default_config', since * this was already called in 'cmd_multi_pack_index()'. diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 39e28cfcafc..46e26774963 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3148,6 +3148,14 @@ static int git_pack_config(const char *k, const char *v, void *cb) else write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; } + + if (!strcmp(k, "pack.writebitmaplookuptable")) { + if (git_config_bool(k, v)) + write_bitmap_options |= BITMAP_OPT_LOOKUP_TABLE; + else + write_bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE; + } + if (!strcmp(k, "pack.usebitmaps")) { use_bitmap_index_default = git_config_bool(k, v); return 0; diff --git a/midx.c b/midx.c index 4e956cacb71..3ff6e91e6ee 100644 --- a/midx.c +++ b/midx.c @@ -1070,6 +1070,9 @@ static int write_midx_bitmap(const char *midx_name, if (flags & MIDX_WRITE_BITMAP_HASH_CACHE) options |= BITMAP_OPT_HASH_CACHE; + if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE) + options |= BITMAP_OPT_LOOKUP_TABLE; + /* * Build the MIDX-order index based on pdata.objects (which is already * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of diff --git a/midx.h b/midx.h index 22e8e53288e..5578cd7b835 100644 --- a/midx.h +++ b/midx.h @@ -47,6 +47,7 @@ struct multi_pack_index { #define MIDX_WRITE_REV_INDEX (1 << 1) #define MIDX_WRITE_BITMAP (1 << 2) #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3) +#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4) const unsigned char *get_midx_checksum(struct multi_pack_index *m); void get_midx_filename(struct strbuf *out, const char *object_dir); diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index f775fc1ce69..c0607172827 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -26,22 +26,413 @@ has_any () { grep -Ff "$1" "$2" } -setup_bitmap_history - -test_expect_success 'setup writing bitmaps during repack' ' - git config repack.writeBitmaps true -' - -test_expect_success 'full repack creates bitmaps' ' - GIT_TRACE2_EVENT="$(pwd)/trace" \ +test_bitmap_cases () { + writeLookupTable=false + for i in "$@" + do + case "$i" in + "pack.writeBitmapLookupTable") writeLookupTable=true;; + esac + done + + test_expect_success 'setup test repository' ' + rm -fr * .git && + git init && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' + ' + setup_bitmap_history + + test_expect_success 'setup writing bitmaps during repack' ' + git config repack.writeBitmaps true + ' + + test_expect_success 'full repack creates bitmaps' ' + GIT_TRACE2_EVENT="$(pwd)/trace" \ + git repack -ad && + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && + grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace + ' + + basic_bitmap_tests + + test_expect_success 'pack-objects respects --local (non-local loose)' ' + git init --bare alt.git && + echo $(pwd)/alt.git/objects >.git/objects/info/alternates && + echo content1 >file1 && + # non-local loose object which is not present in bitmapped pack + altblob=$(GIT_DIR=alt.git git hash-object -w file1) && + # non-local loose object which is also present in bitmapped pack + git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin && + git add file1 && + test_tick && + git commit -m commit_file1 && + echo HEAD | git pack-objects --local --stdout --revs >1.pack && + git index-pack 1.pack && + list_packed_objects 1.idx >1.objects && + printf "%s\n" "$altblob" "$blob" >nonlocal-loose && + ! has_any nonlocal-loose 1.objects + ' + + test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' + echo content2 >file2 && + blob2=$(git hash-object -w file2) && + git add file2 && + test_tick && + git commit -m commit_file2 && + printf "%s\n" "$blob2" "$bitmaptip" >keepobjects && + pack2=$(git pack-objects pack2 .git/objects/pack/pack2-$pack2.keep && + rm $(objpath $blob2) && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack && + git index-pack 2a.pack && + list_packed_objects 2a.idx >2a.objects && + ! has_any keepobjects 2a.objects + ' + + test_expect_success 'pack-objects respects --local (non-local pack)' ' + mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ && + echo HEAD | git pack-objects --local --stdout --revs >2b.pack && + git index-pack 2b.pack && + list_packed_objects 2b.idx >2b.objects && + ! has_any keepobjects 2b.objects + ' + + test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + packbitmap=$(basename $(cat output) .bitmap) && + list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects && + test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" && + >.git/objects/pack/$packbitmap.keep && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack && + git index-pack 3a.pack && + list_packed_objects 3a.idx >3a.objects && + ! has_any packbitmap.objects 3a.objects + ' + + test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' + mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ && + rm -f .git/objects/pack/multi-pack-index && + test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" && + echo HEAD | git pack-objects --local --stdout --revs >3b.pack && + git index-pack 3b.pack && + list_packed_objects 3b.idx >3b.objects && + ! has_any packbitmap.objects 3b.objects + ' + + test_expect_success 'pack-objects to file can use bitmap' ' + # make sure we still have 1 bitmap index from previous tests + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + # verify equivalent packs are generated with/without using bitmap index + packasha1=$(git pack-objects --no-use-bitmap-index --all packa packa.objects && + list_packed_objects packb-$packbsha1.idx >packb.objects && + test_cmp packa.objects packb.objects + ' + + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && - ls .git/objects/pack/ | grep bitmap >output && - test_line_count = 1 output && - grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && - grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace -' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output + ' + + test_expect_success 'fetch (full bitmap)' ' + git --git-dir=clone.git fetch origin second:second && + git rev-parse HEAD >expect && + git --git-dir=clone.git rev-parse HEAD >actual && + test_cmp expect actual + ' + + test_expect_success 'create objects for missing-HAVE tests' ' + blob=$(echo "missing have" | git hash-object -w --stdin) && + tree=$(printf "100644 blob $blob\tfile\n" | git mktree) && + parent=$(echo parent | git commit-tree $tree) && + commit=$(echo commit | git commit-tree $tree -p $parent) && + cat >revs <<-EOF + HEAD + ^HEAD^ + ^$commit + EOF + ' + + test_expect_success 'pack-objects respects --incremental' ' + cat >revs2 <<-EOF && + HEAD + $commit + EOF + git pack-objects --incremental --stdout --revs 4.pack && + git index-pack 4.pack && + list_packed_objects 4.idx >4.objects && + test_line_count = 4 4.objects && + git rev-list --objects $commit >revlist && + cut -d" " -f1 revlist |sort >objects && + test_cmp 4.objects objects + ' + + test_expect_success 'pack with missing blob' ' + rm $(objpath $blob) && + git pack-objects --stdout --revs /dev/null + ' + + test_expect_success 'pack with missing tree' ' + rm $(objpath $tree) && + git pack-objects --stdout --revs /dev/null + ' + + test_expect_success 'pack with missing parent' ' + rm $(objpath $parent) && + git pack-objects --stdout --revs /dev/null + ' + + test_expect_success JGIT,SHA1 'we can read jgit bitmaps' ' + git clone --bare . compat-jgit.git && + ( + cd compat-jgit.git && + rm -f objects/pack/*.bitmap && + jgit gc && + git rev-list --test-bitmap HEAD + ) + ' + + test_expect_success JGIT,SHA1 'jgit can read our bitmaps' ' + git clone --bare . compat-us.git && + ( + cd compat-us.git && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && + git repack -adb && + # jgit gc will barf if it does not like our bitmaps + jgit gc + ) + ' + + test_expect_success 'splitting packs does not generate bogus bitmaps' ' + test-tool genrandom foo $((1024 * 1024)) >rand && + git add rand && + git commit -m "commit with big file" && + git -c pack.packSizeLimit=500k repack -adb && + git init --bare no-bitmaps.git && + git -C no-bitmaps.git fetch .. HEAD + ' + + test_expect_success 'set up reusable pack' ' + rm -f .git/objects/pack/*.keep && + git repack -adb && + reusable_pack () { + git for-each-ref --format="%(objectname)" | + git pack-objects --delta-base-offset --revs --stdout "$@" + } + ' + + test_expect_success 'pack reuse respects --honor-pack-keep' ' + test_when_finished "rm -f .git/objects/pack/*.keep" && + for i in .git/objects/pack/*.pack + do + >${i%.pack}.keep || return 1 + done && + reusable_pack --honor-pack-keep >empty.pack && + git index-pack empty.pack && + git show-index actual && + test_must_be_empty actual + ' + + test_expect_success 'pack reuse respects --local' ' + mv .git/objects/pack/* alt.git/objects/pack/ && + test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" && + reusable_pack --local >empty.pack && + git index-pack empty.pack && + git show-index actual && + test_must_be_empty actual + ' + + test_expect_success 'pack reuse respects --incremental' ' + reusable_pack --incremental >empty.pack && + git index-pack empty.pack && + git show-index actual && + test_must_be_empty actual + ' + + test_expect_success 'truncated bitmap fails gracefully (ewah)' ' + test_config pack.writebitmaphashcache false && + git repack -ad && + git rev-list --use-bitmap-index --count --all >expect && + bitmap=$(ls .git/objects/pack/*.bitmap) && + test_when_finished "rm -f $bitmap" && + test_copy_bytes 256 <$bitmap >$bitmap.tmp && + mv -f $bitmap.tmp $bitmap && + git rev-list --use-bitmap-index --count --all >actual 2>stderr && + test_cmp expect actual && + test_i18ngrep corrupt.ewah.bitmap stderr + ' + + test_expect_success 'truncated bitmap fails gracefully (cache)' ' + git repack -ad && + git rev-list --use-bitmap-index --count --all >expect && + bitmap=$(ls .git/objects/pack/*.bitmap) && + test_when_finished "rm -f $bitmap" && + test_copy_bytes 512 <$bitmap >$bitmap.tmp && + mv -f $bitmap.tmp $bitmap && + git rev-list --use-bitmap-index --count --all >actual 2>stderr && + test_cmp expect actual && + test_i18ngrep corrupted.bitmap.index stderr + ' + + # Create a state of history with these properties: + # + # - refs that allow a client to fetch some new history, while sharing some old + # history with the server; we use branches delta-reuse-old and + # delta-reuse-new here + # + # - the new history contains an object that is stored on the server as a delta + # against a base that is in the old history + # + # - the base object is not immediately reachable from the tip of the old + # history; finding it would involve digging down through history we know the + # other side has + # + # This should result in a state where fetching from old->new would not + # traditionally reuse the on-disk delta (because we'd have to dig to realize + # that the client has it), but we will do so if bitmaps can tell us cheaply + # that the other side has it. + test_expect_success 'set up thin delta-reuse parent' ' + # This first commit contains the buried base object. + test-tool genrandom delta 16384 >file && + git add file && + git commit -m "delta base" && + base=$(git rev-parse --verify HEAD:file) && + + # These intermediate commits bury the base back in history. + # This becomes the "old" state. + for i in 1 2 3 4 5 + do + echo $i >file && + git commit -am "intermediate $i" || return 1 + done && + git branch delta-reuse-old && + + # And now our new history has a delta against the buried base. Note + # that this must be smaller than the original file, since pack-objects + # prefers to create deltas from smaller objects to larger. + test-tool genrandom delta 16300 >file && + git commit -am "delta result" && + delta=$(git rev-parse --verify HEAD:file) && + git branch delta-reuse-new && + + # Repack with bitmaps and double check that we have the expected delta + # relationship. + git repack -adb && + have_delta $delta $base + ' + + # Now we can sanity-check the non-bitmap behavior (that the server is not able + # to reuse the delta). This isn't strictly something we care about, so this + # test could be scrapped in the future. But it makes sure that the next test is + # actually triggering the feature we want. + # + # Note that our tools for working with on-the-wire "thin" packs are limited. So + # we actually perform the fetch, retain the resulting pack, and inspect the + # result. + test_expect_success 'fetch without bitmaps ignores delta against old base' ' + test_config pack.usebitmaps false && + test_when_finished "rm -rf client.git" && + git init --bare client.git && + ( + cd client.git && + git config transfer.unpackLimit 1 && + git fetch .. delta-reuse-old:delta-reuse-old && + git fetch .. delta-reuse-new:delta-reuse-new && + have_delta $delta $ZERO_OID + ) + ' + + # And do the same for the bitmap case, where we do expect to find the delta. + test_expect_success 'fetch with bitmaps can reuse old base' ' + test_config pack.usebitmaps true && + test_when_finished "rm -rf client.git" && + git init --bare client.git && + ( + cd client.git && + git config transfer.unpackLimit 1 && + git fetch .. delta-reuse-old:delta-reuse-old && + git fetch .. delta-reuse-new:delta-reuse-new && + have_delta $delta $base + ) + ' + + test_expect_success 'pack.preferBitmapTips' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && + + # create enough commits that not all are receive bitmap + # coverage even if they are all at the tip of some reference. + test_commit_bulk --message="%s" 103 && + + git rev-list HEAD >commits.raw && + sort commits && + + git log --format="create refs/tags/%s %H" HEAD >refs && + git update-ref --stdin bitmaps && + + # remember which commits did not receive bitmaps + comm -13 bitmaps commits >before && + test_file_not_empty before && + + # mark the commits which did not receive bitmaps as preferred, + # and generate the bitmap again + perl -pe "s{^}{create refs/tags/include/$. }" bitmaps && + comm -13 bitmaps commits >after && + + ! test_cmp before after + ) + ' + + test_expect_success 'complains about multiple pack bitmaps' ' + rm -fr repo && + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && + + test_commit base && + + git repack -adb && + bitmap="$(ls .git/objects/pack/pack-*.bitmap)" && + mv "$bitmap" "$bitmap.bak" && + + test_commit other && + git repack -ab && + + mv "$bitmap.bak" "$bitmap" && + + find .git/objects/pack -type f -name "*.pack" >packs && + find .git/objects/pack -type f -name "*.bitmap" >bitmaps && + test_line_count = 2 packs && + test_line_count = 2 bitmaps && + + git rev-list --use-bitmap-index HEAD 2>err && + grep "ignoring extra bitmap file" err + ) + ' +} -basic_bitmap_tests +test_bitmap_cases test_expect_success 'incremental repack fails when bitmaps are requested' ' test_commit more-1 && @@ -54,375 +445,12 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' -test_expect_success 'pack-objects respects --local (non-local loose)' ' - git init --bare alt.git && - echo $(pwd)/alt.git/objects >.git/objects/info/alternates && - echo content1 >file1 && - # non-local loose object which is not present in bitmapped pack - altblob=$(GIT_DIR=alt.git git hash-object -w file1) && - # non-local loose object which is also present in bitmapped pack - git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin && - git add file1 && - test_tick && - git commit -m commit_file1 && - echo HEAD | git pack-objects --local --stdout --revs >1.pack && - git index-pack 1.pack && - list_packed_objects 1.idx >1.objects && - printf "%s\n" "$altblob" "$blob" >nonlocal-loose && - ! has_any nonlocal-loose 1.objects -' - -test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' - echo content2 >file2 && - blob2=$(git hash-object -w file2) && - git add file2 && - test_tick && - git commit -m commit_file2 && - printf "%s\n" "$blob2" "$bitmaptip" >keepobjects && - pack2=$(git pack-objects pack2 .git/objects/pack/pack2-$pack2.keep && - rm $(objpath $blob2) && - echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack && - git index-pack 2a.pack && - list_packed_objects 2a.idx >2a.objects && - ! has_any keepobjects 2a.objects -' - -test_expect_success 'pack-objects respects --local (non-local pack)' ' - mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ && - echo HEAD | git pack-objects --local --stdout --revs >2b.pack && - git index-pack 2b.pack && - list_packed_objects 2b.idx >2b.objects && - ! has_any keepobjects 2b.objects -' - -test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' - ls .git/objects/pack/ | grep bitmap >output && - test_line_count = 1 output && - packbitmap=$(basename $(cat output) .bitmap) && - list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects && - test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" && - >.git/objects/pack/$packbitmap.keep && - echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack && - git index-pack 3a.pack && - list_packed_objects 3a.idx >3a.objects && - ! has_any packbitmap.objects 3a.objects -' - -test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' - mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ && - rm -f .git/objects/pack/multi-pack-index && - test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" && - echo HEAD | git pack-objects --local --stdout --revs >3b.pack && - git index-pack 3b.pack && - list_packed_objects 3b.idx >3b.objects && - ! has_any packbitmap.objects 3b.objects -' - -test_expect_success 'pack-objects to file can use bitmap' ' - # make sure we still have 1 bitmap index from previous tests - ls .git/objects/pack/ | grep bitmap >output && - test_line_count = 1 output && - # verify equivalent packs are generated with/without using bitmap index - packasha1=$(git pack-objects --no-use-bitmap-index --all packa packa.objects && - list_packed_objects packb-$packbsha1.idx >packb.objects && - test_cmp packa.objects packb.objects -' - -test_expect_success 'full repack, reusing previous bitmaps' ' - git repack -ad && - ls .git/objects/pack/ | grep bitmap >output && - test_line_count = 1 output -' - -test_expect_success 'fetch (full bitmap)' ' - git --git-dir=clone.git fetch origin second:second && - git rev-parse HEAD >expect && - git --git-dir=clone.git rev-parse HEAD >actual && - test_cmp expect actual -' - -test_expect_success 'create objects for missing-HAVE tests' ' - blob=$(echo "missing have" | git hash-object -w --stdin) && - tree=$(printf "100644 blob $blob\tfile\n" | git mktree) && - parent=$(echo parent | git commit-tree $tree) && - commit=$(echo commit | git commit-tree $tree -p $parent) && - cat >revs <<-EOF - HEAD - ^HEAD^ - ^$commit - EOF -' - -test_expect_success 'pack-objects respects --incremental' ' - cat >revs2 <<-EOF && - HEAD - $commit - EOF - git pack-objects --incremental --stdout --revs 4.pack && - git index-pack 4.pack && - list_packed_objects 4.idx >4.objects && - test_line_count = 4 4.objects && - git rev-list --objects $commit >revlist && - cut -d" " -f1 revlist |sort >objects && - test_cmp 4.objects objects -' - -test_expect_success 'pack with missing blob' ' - rm $(objpath $blob) && - git pack-objects --stdout --revs /dev/null -' +test_bitmap_cases "pack.writeBitmapLookupTable" -test_expect_success 'pack with missing tree' ' - rm $(objpath $tree) && - git pack-objects --stdout --revs /dev/null -' - -test_expect_success 'pack with missing parent' ' - rm $(objpath $parent) && - git pack-objects --stdout --revs /dev/null -' - -test_expect_success JGIT,SHA1 'we can read jgit bitmaps' ' - git clone --bare . compat-jgit.git && - ( - cd compat-jgit.git && - rm -f objects/pack/*.bitmap && - jgit gc && - git rev-list --test-bitmap HEAD - ) -' - -test_expect_success JGIT,SHA1 'jgit can read our bitmaps' ' - git clone --bare . compat-us.git && - ( - cd compat-us.git && - git repack -adb && - # jgit gc will barf if it does not like our bitmaps - jgit gc - ) -' - -test_expect_success 'splitting packs does not generate bogus bitmaps' ' - test-tool genrandom foo $((1024 * 1024)) >rand && - git add rand && - git commit -m "commit with big file" && - git -c pack.packSizeLimit=500k repack -adb && - git init --bare no-bitmaps.git && - git -C no-bitmaps.git fetch .. HEAD -' - -test_expect_success 'set up reusable pack' ' - rm -f .git/objects/pack/*.keep && - git repack -adb && - reusable_pack () { - git for-each-ref --format="%(objectname)" | - git pack-objects --delta-base-offset --revs --stdout "$@" - } -' - -test_expect_success 'pack reuse respects --honor-pack-keep' ' - test_when_finished "rm -f .git/objects/pack/*.keep" && - for i in .git/objects/pack/*.pack - do - >${i%.pack}.keep || return 1 - done && - reusable_pack --honor-pack-keep >empty.pack && - git index-pack empty.pack && - git show-index actual && - test_must_be_empty actual -' - -test_expect_success 'pack reuse respects --local' ' - mv .git/objects/pack/* alt.git/objects/pack/ && - test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" && - reusable_pack --local >empty.pack && - git index-pack empty.pack && - git show-index actual && - test_must_be_empty actual -' - -test_expect_success 'pack reuse respects --incremental' ' - reusable_pack --incremental >empty.pack && - git index-pack empty.pack && - git show-index actual && - test_must_be_empty actual -' - -test_expect_success 'truncated bitmap fails gracefully (ewah)' ' - test_config pack.writebitmaphashcache false && - git repack -ad && - git rev-list --use-bitmap-index --count --all >expect && - bitmap=$(ls .git/objects/pack/*.bitmap) && - test_when_finished "rm -f $bitmap" && - test_copy_bytes 256 <$bitmap >$bitmap.tmp && - mv -f $bitmap.tmp $bitmap && - git rev-list --use-bitmap-index --count --all >actual 2>stderr && - test_cmp expect actual && - test_i18ngrep corrupt.ewah.bitmap stderr -' - -test_expect_success 'truncated bitmap fails gracefully (cache)' ' - git repack -ad && - git rev-list --use-bitmap-index --count --all >expect && - bitmap=$(ls .git/objects/pack/*.bitmap) && - test_when_finished "rm -f $bitmap" && - test_copy_bytes 512 <$bitmap >$bitmap.tmp && - mv -f $bitmap.tmp $bitmap && - git rev-list --use-bitmap-index --count --all >actual 2>stderr && - test_cmp expect actual && - test_i18ngrep corrupted.bitmap.index stderr -' - -# Create a state of history with these properties: -# -# - refs that allow a client to fetch some new history, while sharing some old -# history with the server; we use branches delta-reuse-old and -# delta-reuse-new here -# -# - the new history contains an object that is stored on the server as a delta -# against a base that is in the old history -# -# - the base object is not immediately reachable from the tip of the old -# history; finding it would involve digging down through history we know the -# other side has -# -# This should result in a state where fetching from old->new would not -# traditionally reuse the on-disk delta (because we'd have to dig to realize -# that the client has it), but we will do so if bitmaps can tell us cheaply -# that the other side has it. -test_expect_success 'set up thin delta-reuse parent' ' - # This first commit contains the buried base object. - test-tool genrandom delta 16384 >file && - git add file && - git commit -m "delta base" && - base=$(git rev-parse --verify HEAD:file) && - - # These intermediate commits bury the base back in history. - # This becomes the "old" state. - for i in 1 2 3 4 5 - do - echo $i >file && - git commit -am "intermediate $i" || return 1 - done && - git branch delta-reuse-old && - - # And now our new history has a delta against the buried base. Note - # that this must be smaller than the original file, since pack-objects - # prefers to create deltas from smaller objects to larger. - test-tool genrandom delta 16300 >file && - git commit -am "delta result" && - delta=$(git rev-parse --verify HEAD:file) && - git branch delta-reuse-new && - - # Repack with bitmaps and double check that we have the expected delta - # relationship. - git repack -adb && - have_delta $delta $base -' - -# Now we can sanity-check the non-bitmap behavior (that the server is not able -# to reuse the delta). This isn't strictly something we care about, so this -# test could be scrapped in the future. But it makes sure that the next test is -# actually triggering the feature we want. -# -# Note that our tools for working with on-the-wire "thin" packs are limited. So -# we actually perform the fetch, retain the resulting pack, and inspect the -# result. -test_expect_success 'fetch without bitmaps ignores delta against old base' ' - test_config pack.usebitmaps false && - test_when_finished "rm -rf client.git" && - git init --bare client.git && - ( - cd client.git && - git config transfer.unpackLimit 1 && - git fetch .. delta-reuse-old:delta-reuse-old && - git fetch .. delta-reuse-new:delta-reuse-new && - have_delta $delta $ZERO_OID - ) -' - -# And do the same for the bitmap case, where we do expect to find the delta. -test_expect_success 'fetch with bitmaps can reuse old base' ' - test_config pack.usebitmaps true && - test_when_finished "rm -rf client.git" && - git init --bare client.git && - ( - cd client.git && - git config transfer.unpackLimit 1 && - git fetch .. delta-reuse-old:delta-reuse-old && - git fetch .. delta-reuse-new:delta-reuse-new && - have_delta $delta $base - ) -' - -test_expect_success 'pack.preferBitmapTips' ' - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && - - # create enough commits that not all are receive bitmap - # coverage even if they are all at the tip of some reference. - test_commit_bulk --message="%s" 103 && - - git rev-list HEAD >commits.raw && - sort commits && - - git log --format="create refs/tags/%s %H" HEAD >refs && - git update-ref --stdin bitmaps && - - # remember which commits did not receive bitmaps - comm -13 bitmaps commits >before && - test_file_not_empty before && - - # mark the commits which did not receive bitmaps as preferred, - # and generate the bitmap again - perl -pe "s{^}{create refs/tags/include/$. }" bitmaps && - comm -13 bitmaps commits >after && - - ! test_cmp before after - ) -' - -test_expect_success 'complains about multiple pack bitmaps' ' - rm -fr repo && - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && - - test_commit base && - - git repack -adb && - bitmap="$(ls .git/objects/pack/pack-*.bitmap)" && - mv "$bitmap" "$bitmap.bak" && - - test_commit other && - git repack -ab && - - mv "$bitmap.bak" "$bitmap" && - - find .git/objects/pack -type f -name "*.pack" >packs && - find .git/objects/pack -type f -name "*.bitmap" >bitmaps && - test_line_count = 2 packs && - test_line_count = 2 bitmaps && - - git rev-list --use-bitmap-index HEAD 2>err && - grep "ignoring extra bitmap file" err - ) +test_expect_success 'verify writing bitmap lookup table when enabled' ' + GIT_TRACE2_EVENT="$(pwd)/trace2" \ + git repack -ad && + grep "\"label\":\"writing_lookup_table\"" trace2 ' test_done diff --git a/t/t5311-pack-bitmaps-shallow.sh b/t/t5311-pack-bitmaps-shallow.sh index 872a95df338..9dae60f73e3 100755 --- a/t/t5311-pack-bitmaps-shallow.sh +++ b/t/t5311-pack-bitmaps-shallow.sh @@ -17,23 +17,40 @@ test_description='check bitmap operation with shallow repositories' # the tree for A. But in a shallow one, we've grafted away # A, and fetching A to B requires that the other side send # us the tree for file=1. -test_expect_success 'setup shallow repo' ' - echo 1 >file && - git add file && - git commit -m orig && - echo 2 >file && - git commit -a -m update && - git clone --no-local --bare --depth=1 . shallow.git && - echo 1 >file && - git commit -a -m repeat -' - -test_expect_success 'turn on bitmaps in the parent' ' - git repack -adb -' - -test_expect_success 'shallow fetch from bitmapped repo' ' - (cd shallow.git && git fetch) -' +test_shallow_bitmaps () { + writeLookupTable=false + + for i in "$@" + do + case $i in + "pack.writeBitmapLookupTable") writeLookupTable=true;; + esac + done + + test_expect_success 'setup shallow repo' ' + rm -rf * .git && + git init && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && + echo 1 >file && + git add file && + git commit -m orig && + echo 2 >file && + git commit -a -m update && + git clone --no-local --bare --depth=1 . shallow.git && + echo 1 >file && + git commit -a -m repeat + ' + + test_expect_success 'turn on bitmaps in the parent' ' + git repack -adb + ' + + test_expect_success 'shallow fetch from bitmapped repo' ' + (cd shallow.git && git fetch) + ' +} + +test_shallow_bitmaps +test_shallow_bitmaps "pack.writeBitmapLookupTable" test_done diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index 4fe57414c13..3b206adcee6 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -15,17 +15,24 @@ GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 sane_unset GIT_TEST_MIDX_WRITE_REV sane_unset GIT_TEST_MIDX_READ_RIDX -midx_bitmap_core - bitmap_reuse_tests() { from=$1 to=$2 + writeLookupTable=false + + for i in $3-${$#} + do + case $i in + "pack.writeBitmapLookupTable") writeLookupTable=true;; + esac + done test_expect_success "setup pack reuse tests ($from -> $to)" ' rm -fr repo && git init repo && ( cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && test_commit_bulk 16 && git tag old-tip && @@ -43,6 +50,7 @@ bitmap_reuse_tests() { test_expect_success "build bitmap from existing ($from -> $to)" ' ( cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && test_commit_bulk --id=further 16 && git tag new-tip && @@ -59,6 +67,7 @@ bitmap_reuse_tests() { test_expect_success "verify resulting bitmaps ($from -> $to)" ' ( cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && git for-each-ref && git rev-list --test-bitmap refs/tags/old-tip && git rev-list --test-bitmap refs/tags/new-tip @@ -66,244 +75,294 @@ bitmap_reuse_tests() { ' } -bitmap_reuse_tests 'pack' 'MIDX' -bitmap_reuse_tests 'MIDX' 'pack' -bitmap_reuse_tests 'MIDX' 'MIDX' +test_midx_bitmap_cases () { + writeLookupTable=false + writeBitmapLookupTable= + + for i in "$@" + do + case $i in + "pack.writeBitmapLookupTable") + writeLookupTable=true + writeBitmapLookupTable="$i" + ;; + esac + done + + test_expect_success 'setup test_repository' ' + rm -rf * .git && + git init && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' + ' -test_expect_success 'missing object closure fails gracefully' ' - rm -fr repo && - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && + midx_bitmap_core - test_commit loose && - test_commit packed && + bitmap_reuse_tests 'pack' 'MIDX' "$writeBitmapLookupTable" + bitmap_reuse_tests 'MIDX' 'pack' "$writeBitmapLookupTable" + bitmap_reuse_tests 'MIDX' 'MIDX' "$writeBitmapLookupTable" - # Do not pass "--revs"; we want a pack without the "loose" - # commit. - git pack-objects $objdir/pack/pack <<-EOF && - $(git rev-parse packed) - EOF + test_expect_success 'missing object closure fails gracefully' ' + rm -fr repo && + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && - test_must_fail git multi-pack-index write --bitmap 2>err && - grep "doesn.t have full closure" err && - test_path_is_missing $midx - ) -' + test_commit loose && + test_commit packed && -midx_bitmap_partial_tests + # Do not pass "--revs"; we want a pack without the "loose" + # commit. + git pack-objects $objdir/pack/pack <<-EOF && + $(git rev-parse packed) + EOF -test_expect_success 'removing a MIDX clears stale bitmaps' ' - rm -fr repo && - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && - test_commit base && - git repack && - git multi-pack-index write --bitmap && + test_must_fail git multi-pack-index write --bitmap 2>err && + grep "doesn.t have full closure" err && + test_path_is_missing $midx + ) + ' - # Write a MIDX and bitmap; remove the MIDX but leave the bitmap. - stale_bitmap=$midx-$(midx_checksum $objdir).bitmap && - rm $midx && + midx_bitmap_partial_tests - # Then write a new MIDX. - test_commit new && - git repack && - git multi-pack-index write --bitmap && + test_expect_success 'removing a MIDX clears stale bitmaps' ' + rm -fr repo && + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && + test_commit base && + git repack && + git multi-pack-index write --bitmap && + + # Write a MIDX and bitmap; remove the MIDX but leave the bitmap. + stale_bitmap=$midx-$(midx_checksum $objdir).bitmap && + rm $midx && + + # Then write a new MIDX. + test_commit new && + git repack && + git multi-pack-index write --bitmap && + + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + test_path_is_missing $stale_bitmap + ) + ' - test_path_is_file $midx && - test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - test_path_is_missing $stale_bitmap - ) -' + test_expect_success 'pack.preferBitmapTips' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && -test_expect_success 'pack.preferBitmapTips' ' - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && + test_commit_bulk --message="%s" 103 && - test_commit_bulk --message="%s" 103 && + git log --format="%H" >commits.raw && + sort commits && - git log --format="%H" >commits.raw && - sort commits && + git log --format="create refs/tags/%s %H" HEAD >refs && + git update-ref --stdin refs && - git update-ref --stdin bitmaps && + comm -13 bitmaps commits >before && + test_line_count = 1 before && - test-tool bitmap list-commits | sort >bitmaps && - comm -13 bitmaps commits >before && - test_line_count = 1 before && + perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \ + bitmaps && + comm -13 bitmaps commits >after && - git -c pack.preferBitmapTips=refs/tags/include \ - multi-pack-index write --bitmap && - test-tool bitmap list-commits | sort >bitmaps && - comm -13 bitmaps commits >after && + ! test_cmp before after + ) + ' - ! test_cmp before after - ) -' + test_expect_success 'writing a bitmap with --refs-snapshot' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && -test_expect_success 'writing a bitmap with --refs-snapshot' ' - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && + test_commit one && + test_commit two && - test_commit one && - test_commit two && + git rev-parse one >snapshot && - git rev-parse one >snapshot && + git repack -ad && - git repack -ad && + # First, write a MIDX which see both refs/tags/one and + # refs/tags/two (causing both of those commits to receive + # bitmaps). + git multi-pack-index write --bitmap && - # First, write a MIDX which see both refs/tags/one and - # refs/tags/two (causing both of those commits to receive - # bitmaps). - git multi-pack-index write --bitmap && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - test_path_is_file $midx && - test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + test-tool bitmap list-commits | sort >bitmaps && + grep "$(git rev-parse one)" bitmaps && + grep "$(git rev-parse two)" bitmaps && - test-tool bitmap list-commits | sort >bitmaps && - grep "$(git rev-parse one)" bitmaps && - grep "$(git rev-parse two)" bitmaps && + rm -fr $midx-$(midx_checksum $objdir).bitmap && + rm -fr $midx && - rm -fr $midx-$(midx_checksum $objdir).bitmap && - rm -fr $midx && + # Then again, but with a refs snapshot which only sees + # refs/tags/one. + git multi-pack-index write --bitmap --refs-snapshot=snapshot && - # Then again, but with a refs snapshot which only sees - # refs/tags/one. - git multi-pack-index write --bitmap --refs-snapshot=snapshot && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - test_path_is_file $midx && - test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + test-tool bitmap list-commits | sort >bitmaps && + grep "$(git rev-parse one)" bitmaps && + ! grep "$(git rev-parse two)" bitmaps + ) + ' - test-tool bitmap list-commits | sort >bitmaps && - grep "$(git rev-parse one)" bitmaps && - ! grep "$(git rev-parse two)" bitmaps - ) -' + test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && -test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' ' - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && + test_commit_bulk --message="%s" 103 && - test_commit_bulk --message="%s" 103 && + git log --format="%H" >commits.raw && + sort commits && - git log --format="%H" >commits.raw && - sort commits && + git log --format="create refs/tags/%s %H" HEAD >refs && + git update-ref --stdin refs && - git update-ref --stdin bitmaps && + comm -13 bitmaps commits >before && + test_line_count = 1 before && - test-tool bitmap list-commits | sort >bitmaps && - comm -13 bitmaps commits >before && - test_line_count = 1 before && + ( + grep -vf before commits.raw && + # mark missing commits as preferred + sed "s/^/+/" before + ) >snapshot && + rm -fr $midx-$(midx_checksum $objdir).bitmap && + rm -fr $midx && + + git multi-pack-index write --bitmap --refs-snapshot=snapshot && + test-tool bitmap list-commits | sort >bitmaps && + comm -13 bitmaps commits >after && + + ! test_cmp before after + ) + ' + + test_expect_success 'hash-cache values are propagated from pack bitmaps' ' + rm -fr repo && + git init repo && + test_when_finished "rm -fr repo" && ( - grep -vf before commits.raw && - # mark missing commits as preferred - sed "s/^/+/" before - ) >snapshot && + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && - rm -fr $midx-$(midx_checksum $objdir).bitmap && - rm -fr $midx && + test_commit base && + test_commit base2 && + git repack -adb && - git multi-pack-index write --bitmap --refs-snapshot=snapshot && - test-tool bitmap list-commits | sort >bitmaps && - comm -13 bitmaps commits >after && + test-tool bitmap dump-hashes >pack.raw && + test_file_not_empty pack.raw && + sort pack.raw >pack.hashes && - ! test_cmp before after - ) -' + test_commit new && + git repack && + git multi-pack-index write --bitmap && -test_expect_success 'hash-cache values are propagated from pack bitmaps' ' - rm -fr repo && - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && + test-tool bitmap dump-hashes >midx.raw && + sort midx.raw >midx.hashes && - test_commit base && - test_commit base2 && - git repack -adb && + # ensure that every namehash in the pack bitmap can be found in + # the midx bitmap (i.e., that there are no oid-namehash pairs + # unique to the pack bitmap). + comm -23 pack.hashes midx.hashes >dropped.hashes && + test_must_be_empty dropped.hashes + ) + ' - test-tool bitmap dump-hashes >pack.raw && - test_file_not_empty pack.raw && - sort pack.raw >pack.hashes && + test_expect_success 'no .bitmap is written without any objects' ' + rm -fr repo && + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && - test_commit new && - git repack && - git multi-pack-index write --bitmap && + empty="$(git pack-objects $objdir/pack/pack packs <<-EOF && + pack-$empty.idx + EOF - test-tool bitmap dump-hashes >midx.raw && - sort midx.raw >midx.hashes && + git multi-pack-index write --bitmap --stdin-packs \ + err && - # ensure that every namehash in the pack bitmap can be found in - # the midx bitmap (i.e., that there are no oid-namehash pairs - # unique to the pack bitmap). - comm -23 pack.hashes midx.hashes >dropped.hashes && - test_must_be_empty dropped.hashes - ) -' + grep "bitmap without any objects" err && -test_expect_success 'no .bitmap is written without any objects' ' - rm -fr repo && - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && + test_path_is_file $midx && + test_path_is_missing $midx-$(midx_checksum $objdir).bitmap + ) + ' + + test_expect_success 'graceful fallback when missing reverse index' ' + rm -fr repo && + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && - empty="$(git pack-objects $objdir/pack/pack packs <<-EOF && - pack-$empty.idx - EOF + test_commit base && - git multi-pack-index write --bitmap --stdin-packs \ - err && + # write a pack and MIDX bitmap containing base + git repack -adb && + git multi-pack-index write --bitmap && - grep "bitmap without any objects" err && + GIT_TEST_MIDX_READ_RIDX=0 \ + git rev-list --use-bitmap-index HEAD 2>err && + ! grep "ignoring extra bitmap file" err + ) + ' +} - test_path_is_file $midx && - test_path_is_missing $midx-$(midx_checksum $objdir).bitmap - ) -' +test_midx_bitmap_cases + +test_midx_bitmap_cases "pack.writeBitmapLookupTable" -test_expect_success 'graceful fallback when missing reverse index' ' +test_expect_success 'multi-pack-index write writes lookup table if enabled' ' rm -fr repo && git init repo && test_when_finished "rm -fr repo" && ( cd repo && - test_commit base && - - # write a pack and MIDX bitmap containing base - git repack -adb && - git multi-pack-index write --bitmap && - - GIT_TEST_MIDX_READ_RIDX=0 \ - git rev-list --use-bitmap-index HEAD 2>err && - ! grep "ignoring extra bitmap file" err + git config pack.writeBitmapLookupTable true && + git repack -ad && + GIT_TRACE2_EVENT="$(pwd)/trace" \ + git multi-pack-index write --bitmap && + grep "\"label\":\"writing_lookup_table\"" trace ) ' diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh index d30ba632c87..e65e311cd73 100755 --- a/t/t5327-multi-pack-bitmaps-rev.sh +++ b/t/t5327-multi-pack-bitmaps-rev.sh @@ -17,7 +17,27 @@ GIT_TEST_MIDX_READ_RIDX=0 export GIT_TEST_MIDX_WRITE_REV export GIT_TEST_MIDX_READ_RIDX -midx_bitmap_core rev -midx_bitmap_partial_tests rev +test_midx_bitmap_rev () { + writeLookupTable=false + + for i in "$@" + do + case $i in + "pack.writeBitmapLookupTable") writeLookupTable=true;; + esac + done + + test_expect_success 'setup bitmap config' ' + rm -rf * .git && + git init && + git config pack.writeBitmapLookupTable '"$writeLookupTable"' + ' + + midx_bitmap_core rev + midx_bitmap_partial_tests rev +} + +test_midx_bitmap_rev +test_midx_bitmap_rev "pack.writeBitmapLookupTable" test_done From patchwork Sun Aug 14 16:55:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhradeep Chakraborty X-Patchwork-Id: 12942932 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90A35C25B06 for ; Sun, 14 Aug 2022 17:00:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240830AbiHNRAR (ORCPT ); Sun, 14 Aug 2022 13:00:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240807AbiHNQ7p (ORCPT ); Sun, 14 Aug 2022 12:59:45 -0400 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E3C7643F for ; Sun, 14 Aug 2022 09:55:21 -0700 (PDT) Received: by mail-wr1-x430.google.com with SMTP id h13so6630416wrf.6 for ; Sun, 14 Aug 2022 09:55:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc; bh=L5ZDQjNSZODqV5ob6I9yfV1lvKfSzqAUZAjTDsOlzLI=; b=jy4DAA/wJ0r9AhLUaP7bDT3gdtdlqyKahnz6eAJ8K8TtSL+94+Mj+8eO9nFS7cOOGN yKCKmFrqcdVz+40+M2+Yme3lSfLtraKsGfez3DX8p03+AXVfcMx9Ao7FYLLxAS7VxhRy WAz9Ok4VHxca2PjsW3QxvL0xD8Ry19P+r3+TbMs7VkMEwKODov0L5xNODV3oZjQlu7ui R0n/YWfosPIrpvEs8uFy6GmJDE9fmq+EBQQ0yCLFWfE/DDI4s4gnVKxSgQDgHpCFBxvH oYiZHdafNZwPzOoX8izf1+/Bh/umJrlTU1SxdoGnbYvSkTET9sNi+W21+diTeD9TADHO gArA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc; bh=L5ZDQjNSZODqV5ob6I9yfV1lvKfSzqAUZAjTDsOlzLI=; b=CeYiirhwdug+AfZJONzsiKPwPw6ULYPXoVHwod9ghI4f8chnYQYqoKxyfBXdGlWUyT Xx49V9mP5xtZktS0gVQEwcRghI0YqU8EoXVI3EiugiBJd7oneg7PGMa8zZzCRFU8keMy FQwkFMrd0AMcmoYoJmCg18PaNF7jzQRPkTEdfp4BgrLO+s8l6cEy9DJQ+07z8fQuAt1m dv6+oMeimS9WcXStIvpOZwZIpQbJClrjrMa7BYg54HrYGO0HU/LE5qviQD8FuwpXG0cA oSuFIttp2u0/2UIHTSmTXqCCnERLvMjVf8mce2LHef9gx73ltKrIbFcrACdL7RLMMWiV nlig== X-Gm-Message-State: ACgBeo2Te2D9fAWH6eVuOuKYufpqm0u9H/8KfdncbQfmz3hlAq8XchIE XH6U+4vz75bpT8lLT/iy8cEBDIObh1g= X-Google-Smtp-Source: AA6agR77X0AUK9jNym6MnmIwQS73YDKk7H3DO2b8i/g3aZTYu7s/bc0zMdpkd0KZrxNTXi3NTyJLXg== X-Received: by 2002:adf:f2c1:0:b0:223:a7d2:4283 with SMTP id d1-20020adff2c1000000b00223a7d24283mr5733593wrp.485.1660496120478; Sun, 14 Aug 2022 09:55:20 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m8-20020a056000008800b0022063e5228bsm4890550wrx.93.2022.08.14.09.55.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Aug 2022 09:55:19 -0700 (PDT) Message-Id: <79842ca590c8f3af41d416bfb5d2b0109b1b918e.1660496112.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 14 Aug 2022 16:55:10 +0000 Subject: [PATCH v6 5/6] pack-bitmap: prepare to read lookup table extension Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Taylor Blau , Kaartic Sivaram , Junio C Hamano , Derrick Stolee , Philip Oakley , Martin =?utf-8?b?w4VncmVu?= , =?utf-8?b?w4Z2YXIg?= =?utf-8?b?QXJuZmrDtnLDsA==?= Bjarmason , Eric Sunshine , Johannes Schindelin , Abhradeep Chakraborty , Abhradeep Chakraborty Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhradeep Chakraborty From: Abhradeep Chakraborty Earlier change teaches Git to write bitmap lookup table. But Git does not know how to parse them. Teach Git to parse the existing bitmap lookup table. The older versions of Git are not affected by it. Those versions ignore the lookup table. Mentored-by: Taylor Blau Co-Mentored-by: Kaartic Sivaraam Signed-off-by: Abhradeep Chakraborty --- pack-bitmap.c | 290 ++++++++++++++++++++++++++++++++++++++-- pack-bitmap.h | 9 ++ t/t5310-pack-bitmaps.sh | 22 +++ 3 files changed, 312 insertions(+), 9 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index ef580be9e3f..9a208abc1fd 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -83,6 +83,12 @@ struct bitmap_index { /* The checksum of the packfile or MIDX; points into map. */ const unsigned char *checksum; + /* + * If not NULL, this point into the commit table extension + * (within the memory mapped region `map`). + */ + unsigned char *table_lookup; + /* * Extended index. * @@ -186,6 +192,16 @@ static int load_bitmap_header(struct bitmap_index *index) index->hashes = (void *)(index_end - cache_size); index_end -= cache_size; } + + if (flags & BITMAP_OPT_LOOKUP_TABLE) { + size_t table_size = st_mult(ntohl(header->entry_count), + BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH); + if (table_size > index_end - index->map - header_size) + return error(_("corrupted bitmap index file (too short to fit lookup table)")); + if (git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1)) + index->table_lookup = (void *)(index_end - table_size); + index_end -= table_size; + } } index->entry_count = ntohl(header->entry_count); @@ -212,9 +228,11 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index, hash_pos = kh_put_oid_map(index->bitmaps, stored->oid, &ret); - /* a 0 return code means the insertion succeeded with no changes, - * because the SHA1 already existed on the map. this is bad, there - * shouldn't be duplicated commits in the index */ + /* + * A 0 return code means the insertion succeeded with no changes, + * because the SHA1 already existed on the map. This is bad, there + * shouldn't be duplicated commits in the index. + */ if (ret == 0) { error(_("duplicate entry in bitmap index: '%s'"), oid_to_hex(oid)); return NULL; @@ -482,7 +500,7 @@ static int load_bitmap(struct bitmap_index *bitmap_git) !(bitmap_git->tags = read_bitmap_1(bitmap_git))) goto failed; - if (load_bitmap_entries_v1(bitmap_git) < 0) + if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0) goto failed; return 0; @@ -570,13 +588,256 @@ struct include_data { struct bitmap *seen; }; +struct bitmap_lookup_table_triplet { + uint32_t commit_pos; + uint64_t offset; + uint32_t xor_row; +}; + +struct bitmap_lookup_table_xor_item { + struct object_id oid; + uint64_t offset; +}; + +/* + * Given a `triplet` struct pointer and pointer `p`, this + * function reads the triplet beginning at `p` into the struct. + * Note that this function assumes that there is enough memory + * left for filling the `triplet` struct from `p`. + */ +static int bitmap_lookup_table_get_triplet_by_pointer(struct bitmap_lookup_table_triplet *triplet, + const unsigned char *p) +{ + if (!triplet) + return -1; + + triplet->commit_pos = get_be32(p); + p += sizeof(uint32_t); + triplet->offset = get_be64(p); + p += sizeof(uint64_t); + triplet->xor_row = get_be32(p); + return 0; +} + +/* + * This function gets the raw triplet from `row`'th row in the + * lookup table and fills that data to the `triplet`. + */ +static int bitmap_lookup_table_get_triplet(struct bitmap_index *bitmap_git, + uint32_t pos, + struct bitmap_lookup_table_triplet *triplet) +{ + unsigned char *p = NULL; + if (pos >= bitmap_git->entry_count) + return error(_("corrupt bitmap lookup table: triplet position out of index")); + + p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH); + + return bitmap_lookup_table_get_triplet_by_pointer(triplet, p); +} + +/* + * Searches for a matching triplet. `commit_pos` is a pointer + * to the wanted commit position value. `table_entry` points to + * a triplet in lookup table. The first 4 bytes of each + * triplet (pointed by `table_entry`) are compared with `*commit_pos`. + */ +static int triplet_cmp(const void *commit_pos, const void *table_entry) +{ + + uint32_t a = *(uint32_t *)commit_pos; + uint32_t b = get_be32(table_entry); + if (a > b) + return 1; + else if (a < b) + return -1; + + return 0; +} + +static uint32_t bitmap_bsearch_pos(struct bitmap_index *bitmap_git, + struct object_id *oid, + uint32_t *result) +{ + int found; + + if (bitmap_is_midx(bitmap_git)) + found = bsearch_midx(oid, bitmap_git->midx, result); + else + found = bsearch_pack(oid, bitmap_git->pack, result); + + return found; +} + +/* + * `bsearch_triplet_by_pos` function searches for the raw triplet + * having commit position same as `commit_pos` and fills `triplet` + * object from the raw triplet. Returns 1 on success and 0 on + * failure. + */ +static int bitmap_bsearch_triplet_by_pos(uint32_t commit_pos, + struct bitmap_index *bitmap_git, + struct bitmap_lookup_table_triplet *triplet) +{ + unsigned char *p = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count, + BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp); + + if (!p) + return -1; + + return bitmap_lookup_table_get_triplet_by_pointer(triplet, p); +} + +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit) +{ + uint32_t commit_pos, xor_row; + uint64_t offset; + int flags; + struct bitmap_lookup_table_triplet triplet; + struct object_id *oid = &commit->object.oid; + struct ewah_bitmap *bitmap; + struct stored_bitmap *xor_bitmap = NULL; + const int bitmap_header_size = 6; + static struct bitmap_lookup_table_xor_item *xor_items = NULL; + static size_t xor_items_nr = 0, xor_items_alloc = 0; + static int is_corrupt = 0; + int xor_flags; + khiter_t hash_pos; + struct bitmap_lookup_table_xor_item *xor_item; + + if (is_corrupt) + return NULL; + + if (!bitmap_bsearch_pos(bitmap_git, oid, &commit_pos)) + return NULL; + + if (bitmap_bsearch_triplet_by_pos(commit_pos, bitmap_git, &triplet) < 0) + return NULL; + + xor_items_nr = 0; + offset = triplet.offset; + xor_row = triplet.xor_row; + + while (xor_row != 0xffffffff) { + ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc); + + if (xor_items_nr + 1 >= bitmap_git->entry_count) { + error(_("corrupt bitmap lookup table: xor chain exceed entry count")); + goto corrupt; + } + + if (bitmap_lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0) + goto corrupt; + + xor_item = &xor_items[xor_items_nr]; + xor_item->offset = triplet.offset; + + if (nth_bitmap_object_oid(bitmap_git, &xor_item->oid, triplet.commit_pos) < 0) { + error(_("corrupt bitmap lookup table: commit index %u out of range"), + triplet.commit_pos); + goto corrupt; + } + + hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_item->oid); + + /* + * If desired bitmap is already stored, we don't need + * to iterate further. Because we know that bitmaps + * that are needed to be parsed to parse this bitmap + * has already been stored. So, assign this stored bitmap + * to the xor_bitmap. + */ + if (hash_pos < kh_end(bitmap_git->bitmaps) && + (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos))) + break; + xor_items_nr++; + xor_row = triplet.xor_row; + } + + while (xor_items_nr) { + xor_item = &xor_items[xor_items_nr - 1]; + bitmap_git->map_pos = xor_item->offset; + if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) { + error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""), + oid_to_hex(&xor_item->oid)); + goto corrupt; + } + + bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t); + xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos); + bitmap = read_bitmap_1(bitmap_git); + + if (!bitmap) + goto corrupt; + + xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags); + xor_items_nr--; + } + + bitmap_git->map_pos = offset; + if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) { + error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""), + oid_to_hex(oid)); + goto corrupt; + } + + /* + * Don't bother reading the commit's index position or its xor + * offset: + * + * - The commit's index position is irrelevant to us, since + * load_bitmap_entries_v1 only uses it to learn the object + * id which is used to compute the hashmap's key. We already + * have an object id, so no need to look it up again. + * + * - The xor_offset is unusable for us, since it specifies how + * many entries previous to ours we should look at. This + * makes sense when reading the bitmaps sequentially (as in + * load_bitmap_entries_v1()), since we can keep track of + * each bitmap as we read them. + * + * But it can't work for us, since the bitmap's don't have a + * fixed size. So we learn the position of the xor'd bitmap + * from the commit table (and resolve it to a bitmap in the + * above if-statement). + * + * Instead, we can skip ahead and immediately read the flags and + * ewah bitmap. + */ + bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t); + flags = read_u8(bitmap_git->map, &bitmap_git->map_pos); + bitmap = read_bitmap_1(bitmap_git); + + if (!bitmap) + goto corrupt; + + return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags); + +corrupt: + free(xor_items); + is_corrupt = 1; + return NULL; +} + struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, struct commit *commit) { khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid); - if (hash_pos >= kh_end(bitmap_git->bitmaps)) - return NULL; + if (hash_pos >= kh_end(bitmap_git->bitmaps)) { + struct stored_bitmap *bitmap = NULL; + if (!bitmap_git->table_lookup) + return NULL; + + trace2_region_enter("pack-bitmap", "reading_lookup_table", the_repository); + /* NEEDSWORK: cache misses aren't recorded */ + bitmap = lazy_bitmap_for_commit(bitmap_git, commit); + trace2_region_leave("pack-bitmap", "reading_lookup_table", the_repository); + if (!bitmap) + return NULL; + return lookup_stored_bitmap(bitmap); + } return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos)); } @@ -1712,8 +1973,10 @@ void test_bitmap_walk(struct rev_info *revs) if (revs->pending.nr != 1) die(_("you must specify exactly one commit to test")); - fprintf_ln(stderr, "Bitmap v%d test (%d entries loaded)", - bitmap_git->version, bitmap_git->entry_count); + fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)", + bitmap_git->version, + bitmap_git->entry_count, + bitmap_git->table_lookup ? "" : " loaded"); root = revs->pending.objects[0].item; bm = bitmap_for_commit(bitmap_git, (struct commit *)root); @@ -1766,13 +2029,22 @@ void test_bitmap_walk(struct rev_info *revs) int test_bitmap_commits(struct repository *r) { - struct bitmap_index *bitmap_git = prepare_bitmap_git(r); struct object_id oid; MAYBE_UNUSED void *value; + struct bitmap_index *bitmap_git = prepare_bitmap_git(r); if (!bitmap_git) die(_("failed to load bitmap indexes")); + /* + * As this function is only used to print bitmap selected + * commits, we don't have to read the commit table. + */ + if (bitmap_git->table_lookup) { + if (load_bitmap_entries_v1(bitmap_git) < 0) + die(_("failed to load bitmap indexes")); + } + kh_foreach(bitmap_git->bitmaps, oid, value, { printf_ln("%s", oid_to_hex(&oid)); }); diff --git a/pack-bitmap.h b/pack-bitmap.h index cb065a263cb..f0180b5276b 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -23,6 +23,15 @@ struct bitmap_disk_header { #define NEEDS_BITMAP (1u<<22) +/* + * The width in bytes of a single triplet in the lookup table + * extension: + * (commit_pos, offset, xor_row) + * + * whose fields ar 32-, 64-, 32- bits wide, respectively. + */ +#define BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH (16) + enum pack_bitmap_opts { BITMAP_OPT_FULL_DAG = 0x1, BITMAP_OPT_HASH_CACHE = 0x4, diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index c0607172827..7e50f8e7653 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -258,6 +258,7 @@ test_bitmap_cases () { test_expect_success 'truncated bitmap fails gracefully (ewah)' ' test_config pack.writebitmaphashcache false && + test_config pack.writebitmaplookuptable false && git repack -ad && git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && @@ -270,6 +271,7 @@ test_bitmap_cases () { ' test_expect_success 'truncated bitmap fails gracefully (cache)' ' + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && git repack -ad && git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && @@ -453,4 +455,24 @@ test_expect_success 'verify writing bitmap lookup table when enabled' ' grep "\"label\":\"writing_lookup_table\"" trace2 ' +test_expect_success 'lookup table is actually used to traverse objects' ' + git repack -adb && + GIT_TRACE2_EVENT="$(pwd)/trace3" \ + git rev-list --use-bitmap-index --count --all && + grep "\"label\":\"reading_lookup_table\"" trace3 +' + +test_expect_success 'truncated bitmap fails gracefully (lookup table)' ' + test_config pack.writebitmaphashcache false && + git repack -adb && + git rev-list --use-bitmap-index --count --all >expect && + bitmap=$(ls .git/objects/pack/*.bitmap) && + test_when_finished "rm -f $bitmap" && + test_copy_bytes 512 <$bitmap >$bitmap.tmp && + mv -f $bitmap.tmp $bitmap && + git rev-list --use-bitmap-index --count --all >actual 2>stderr && + test_cmp expect actual && + test_i18ngrep corrupted.bitmap.index stderr +' + test_done From patchwork Sun Aug 14 16:55:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhradeep Chakraborty X-Patchwork-Id: 12942934 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C249FC25B0F for ; Sun, 14 Aug 2022 17:00:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239845AbiHNRAX (ORCPT ); Sun, 14 Aug 2022 13:00:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40756 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241953AbiHNQ7q (ORCPT ); Sun, 14 Aug 2022 12:59:46 -0400 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D65DDF62 for ; Sun, 14 Aug 2022 09:55:23 -0700 (PDT) Received: by mail-wr1-x434.google.com with SMTP id j7so6640542wrh.3 for ; Sun, 14 Aug 2022 09:55:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc; bh=HbVbbfV0XZun2uRwXAVRnrwH3J9jKCmHcoWiu3EahOI=; b=J6S3lAF1yF+Xnn76al7fNVO1zuOP5+FkDPUNEvL9duslML2sYeRr7wU45JSD/TS8uq 4JovKKv79rqvrn2UdF/DQebbSIqjxY4hEvYLk5Z1nO/2QRAGZ19TO2867YHA+ytpJfzC dK7zzzzzQDbuNNnvpe/gLB6GUYDOy7AMR8VoKa8LLizUZNy6bDUEZbp/IptMYOXZQ2Le 4qQsL18PO8tfUAU+c4b81DfVBthrNlxX5CmeEV4ppJ7oQE8bt8JtkILwwjjLFOk5rz7R EdYid+MUZ5QLWjC+YrHxFyVXN3q3QUEzJ4drzMtvi/Tcz0G2ArWvk8ohSKfNkjaLr6aH IQhQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc; bh=HbVbbfV0XZun2uRwXAVRnrwH3J9jKCmHcoWiu3EahOI=; b=HMsmtxbZgzbr/I6NxtrPl1bL2nrPpQBMrdWAdR67SD2paqXd7d/zJv8tuB0R1sadtC v3KNC/a+ypMvzliRoZosRy/KMXM8Tk8XB9pw3pHRoSzmS9ssSvNu17sujNp7G7Iz3HDg /mLgtkH5V5jeXpID5pWP2YIdPYT6giacsW4/mVEaBSIgV1q8nen1m0NMLVSEqWFtNVnb 0j408XfId1VCN9bXyBYm+fny8O5RmoJvOaqkYLkLwdAoZEq+0ib5T3V5b69bXJWkh85s TTmTvMJRYItzWK/iWVgYD0Ngz8c3Cs7ZYSZgnb80Ndbi/cx12X0XzjpYbxqL9K2NJ0RG hDfQ== X-Gm-Message-State: ACgBeo1YR9nt4Hjo5z9AQpAns1i/m2qV7F8Z/pWzlfU+YIF1FaZuBePy vhjvTUukfEdrOVc2wlaGA6vp+N1PkOY= X-Google-Smtp-Source: AA6agR4lCtV94+EAqHIzBB4hIjDKbSepQ0gzSFG3OSi0uyF1h9MRabjwJDxcleDrGvlBozrVV9sLJQ== X-Received: by 2002:a05:6000:697:b0:224:bfb3:dbca with SMTP id bo23-20020a056000069700b00224bfb3dbcamr3825140wrb.646.1660496121590; Sun, 14 Aug 2022 09:55:21 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q9-20020adff509000000b0021efc75914esm4946525wro.79.2022.08.14.09.55.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Aug 2022 09:55:21 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Sun, 14 Aug 2022 16:55:11 +0000 Subject: [PATCH v6 6/6] bitmap-lookup-table: add performance tests for lookup table Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Taylor Blau , Kaartic Sivaram , Junio C Hamano , Derrick Stolee , Philip Oakley , Martin =?utf-8?b?w4VncmVu?= , =?utf-8?b?w4Z2YXIg?= =?utf-8?b?QXJuZmrDtnLDsA==?= Bjarmason , Eric Sunshine , Johannes Schindelin , Abhradeep Chakraborty , Abhradeep Chakraborty Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhradeep Chakraborty From: Abhradeep Chakraborty Add performance tests to verify the performance of lookup table. `p5310-pack-bitmaps.sh` contain tests with and without lookup table. `p5312-pack-bitmaps-revs.sh` contain same tests with and without lookup table but with `pack.writeReverseIndex` enabled. Lookup table makes Git run faster in most of the cases. Below is the result of `t/perf/p5310-pack-bitmaps.sh`.`perf/p5326-multi-pack-bitmaps.sh` gives similar result. The repository used in the test is linux kernel. Test this tree ----------------------------------------------------------------------- 5310.4: enable lookup table: false 0.01(0.00+0.00) 5310.5: repack to disk 320.89(230.20+23.45) 5310.6: simulated clone 14.04(5.78+1.79) 5310.7: simulated fetch 1.95(3.05+0.20) 5310.8: pack to file (bitmap) 44.73(20.55+7.45) 5310.9: rev-list (commits) 0.78(0.46+0.10) 5310.10: rev-list (objects) 4.07(3.97+0.08) 5310.11: rev-list with tag negated via --not 0.06(0.02+0.03) --all (objects) 5310.12: rev-list with negative tag (objects) 0.21(0.15+0.05) 5310.13: rev-list count with blob:none 0.24(0.17+0.06) 5310.14: rev-list count with blob:limit=1k 7.07(5.92+0.48) 5310.15: rev-list count with tree:0 0.25(0.17+0.07) 5310.16: simulated partial clone 5.67(3.28+0.64) 5310.18: clone (partial bitmap) 16.05(8.34+1.86) 5310.19: pack to file (partial bitmap) 59.76(27.22+7.43) 5310.20: rev-list with tree filter (partial bitmap) 0.90(0.18+0.16) 5310.24: enable lookup table: true 0.01(0.00+0.00) 5310.25: repack to disk 319.73(229.30+23.01) 5310.26: simulated clone 13.69(5.72+1.78) 5310.27: simulated fetch 1.84(3.02+0.16) 5310.28: pack to file (bitmap) 45.63(20.67+7.50) 5310.29: rev-list (commits) 0.56(0.39+0.8) 5310.30: rev-list (objects) 3.77(3.74+0.08) 5310.31: rev-list with tag negated via --not 0.05(0.02+0.03) --all (objects) 5310.32: rev-list with negative tag (objects) 0.21(0.15+0.05) 5310.33: rev-list count with blob:none 0.23(0.17+0.05) 5310.34: rev-list count with blob:limit=1k 6.65(5.72+0.40) 5310.35: rev-list count with tree:0 0.23(0.16+0.06) 5310.36: simulated partial clone 5.57(3.26+0.59) 5310.38: clone (partial bitmap) 15.89(8.39+1.84) 5310.39: pack to file (partial bitmap) 58.32(27.55+7.47) 5310.40: rev-list with tree filter (partial bitmap) 0.73(0.18+0.15) Test 4-15 are tested without using lookup table. Same tests are repeated in 16-30 (using lookup table). Mentored-by: Taylor Blau Co-Mentored-by: Kaartic Sivaraam Signed-off-by: Abhradeep Chakraborty --- t/perf/lib-bitmap.sh | 31 +++++++++ t/perf/p5310-pack-bitmaps.sh | 78 +++++++++------------- t/perf/p5311-pack-bitmaps-fetch.sh | 74 ++++++++++++--------- t/perf/p5312-pack-bitmaps-revs.sh | 35 ++++++++++ t/perf/p5326-multi-pack-bitmaps.sh | 103 +++++++++++++++++------------ 5 files changed, 199 insertions(+), 122 deletions(-) create mode 100755 t/perf/p5312-pack-bitmaps-revs.sh diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh index 63d3bc7cece..55a8feb1dc4 100644 --- a/t/perf/lib-bitmap.sh +++ b/t/perf/lib-bitmap.sh @@ -67,3 +67,34 @@ test_partial_bitmap () { --filter=tree:0 >/dev/null ' } + +test_pack_bitmap () { + test_perf "repack to disk" ' + git repack -ad + ' + + test_full_bitmap + + test_expect_success "create partial bitmap state" ' + # pick a commit to represent the repo tip in the past + cutoff=$(git rev-list HEAD~100 -1) && + orig_tip=$(git rev-parse HEAD) && + + # now kill off all of the refs and pretend we had + # just the one tip + rm -rf .git/logs .git/refs/* .git/packed-refs && + git update-ref HEAD $cutoff && + + # and then repack, which will leave us with a nice + # big bitmap pack of the "old" history, and all of + # the new history will be loose, as if it had been pushed + # up incrementally and exploded via unpack-objects + git repack -Ad && + + # and now restore our original tip, as if the pushes + # had happened + git update-ref HEAD $orig_tip + ' + + test_partial_bitmap +} diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh index 7ad4f237bc3..b1399f1007e 100755 --- a/t/perf/p5310-pack-bitmaps.sh +++ b/t/perf/p5310-pack-bitmaps.sh @@ -4,51 +4,37 @@ test_description='Tests pack performance using bitmaps' . ./perf-lib.sh . "${TEST_DIRECTORY}/perf/lib-bitmap.sh" -test_perf_large_repo - -# note that we do everything through config, -# since we want to be able to compare bitmap-aware -# git versus non-bitmap git -# -# We intentionally use the deprecated pack.writebitmaps -# config so that we can test against older versions of git. -test_expect_success 'setup bitmap config' ' - git config pack.writebitmaps true -' - -# we need to create the tag up front such that it is covered by the repack and -# thus by generated bitmaps. -test_expect_success 'create tags' ' - git tag --message="tag pointing to HEAD" perf-tag HEAD -' - -test_perf 'repack to disk' ' - git repack -ad -' - -test_full_bitmap - -test_expect_success 'create partial bitmap state' ' - # pick a commit to represent the repo tip in the past - cutoff=$(git rev-list HEAD~100 -1) && - orig_tip=$(git rev-parse HEAD) && - - # now kill off all of the refs and pretend we had - # just the one tip - rm -rf .git/logs .git/refs/* .git/packed-refs && - git update-ref HEAD $cutoff && - - # and then repack, which will leave us with a nice - # big bitmap pack of the "old" history, and all of - # the new history will be loose, as if it had been pushed - # up incrementally and exploded via unpack-objects - git repack -Ad && - - # and now restore our original tip, as if the pushes - # had happened - git update-ref HEAD $orig_tip -' - -test_partial_bitmap +test_lookup_pack_bitmap () { + test_expect_success 'start the test from scratch' ' + rm -rf * .git + ' + + test_perf_large_repo + + # note that we do everything through config, + # since we want to be able to compare bitmap-aware + # git versus non-bitmap git + # + # We intentionally use the deprecated pack.writebitmaps + # config so that we can test against older versions of git. + test_expect_success 'setup bitmap config' ' + git config pack.writebitmaps true + ' + + # we need to create the tag up front such that it is covered by the repack and + # thus by generated bitmaps. + test_expect_success 'create tags' ' + git tag --message="tag pointing to HEAD" perf-tag HEAD + ' + + test_perf "enable lookup table: $1" ' + git config pack.writeBitmapLookupTable '"$1"' + ' + + test_pack_bitmap +} + +test_lookup_pack_bitmap false +test_lookup_pack_bitmap true test_done diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh index 47c3fd7581c..426fab87e32 100755 --- a/t/perf/p5311-pack-bitmaps-fetch.sh +++ b/t/perf/p5311-pack-bitmaps-fetch.sh @@ -3,42 +3,52 @@ test_description='performance of fetches from bitmapped packs' . ./perf-lib.sh -test_perf_default_repo - -test_expect_success 'create bitmapped server repo' ' - git config pack.writebitmaps true && - git repack -ad -' - -# simulate a fetch from a repository that last fetched N days ago, for -# various values of N. We do so by following the first-parent chain, -# and assume the first entry in the chain that is N days older than the current -# HEAD is where the HEAD would have been then. -for days in 1 2 4 8 16 32 64 128; do - title=$(printf '%10s' "($days days)") - test_expect_success "setup revs from $days days ago" ' - now=$(git log -1 --format=%ct HEAD) && - then=$(($now - ($days * 86400))) && - tip=$(git rev-list -1 --first-parent --until=$then HEAD) && - { - echo HEAD && - echo ^$tip - } >revs +test_fetch_bitmaps () { + test_expect_success 'setup test directory' ' + rm -fr * .git ' - test_perf "server $title" ' - git pack-objects --stdout --revs \ - --thin --delta-base-offset \ - tmp.pack - ' + test_perf_default_repo - test_size "size $title" ' - wc -c revs + ' + + test_perf "server $title (lookup=$1)" ' + git pack-objects --stdout --revs \ + --thin --delta-base-offset \ + tmp.pack + ' + + test_size "size $title" ' + wc -c