From patchwork Thu Jun 2 13:52:45 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhradeep Chakraborty
X-Patchwork-Id: 12867891
Message-Id: <976361e624a3dd58c8f291358d42f4e4c66eb266.1654177966.git.gitgitgadget@gmail.com>
From: Abhradeep Chakraborty
Date: Thu, 02 Jun 2022 13:52:45 +0000
Subject: [PATCH 1/2] bitmap-format.txt: fix some formatting issues
To: git@vger.kernel.org
Cc: Abhradeep Chakraborty, Abhradeep Chakraborty

The HTML that asciidoc generates for
`Documentation/technical/bitmap-format.txt` is broken, mainly because
nested lists use `-` (which asciidoc does not allow) instead of `*`.
Fix these lists, and also reformat the file (e.g. by removing some
blank lines) so that the HTML page is easier to read.
Signed-off-by: Abhradeep Chakraborty
---
 Documentation/technical/bitmap-format.txt | 96 +++++++++++------------
 1 file changed, 45 insertions(+), 51 deletions(-)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec21785..110d7ddf8ed 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -39,7 +39,7 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 
 == On-disk format
 
-	- A header appears at the beginning:
+	* A header appears at the beginning:
 
 		4-byte signature: {'B', 'I', 'T', 'M'}
 
@@ -48,35 +48,30 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 			of the bitmap index (the same one as JGit).
 
 		2-byte flags (network byte order)
-
 			The following flags are supported:
-
-			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the
-			bitmap index has been generated for a packfile or
-			multi-pack index (MIDX) with full closure (i.e. where
-			every single object in the packfile/MIDX can find its
-			parent links inside the same packfile/MIDX). This is a
-			requirement for the bitmap index format, also present in
-			JGit, that greatly reduces the complexity of the
-			implementation.
-
-			- BITMAP_OPT_HASH_CACHE (0x4)
-			If present, the end of the bitmap file contains
-			`N` 32-bit name-hash values, one per object in the
-			pack/MIDX. The format and meaning of the name-hash is
-			described below.
+			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+			This flag must always be present. It implies that the
+			bitmap index has been generated for a packfile or
+			multi-pack index (MIDX) with full closure (i.e. where
+			every single object in the packfile/MIDX can find its
+			parent links inside the same packfile/MIDX). This is a
+			requirement for the bitmap index format, also present in
+			JGit, that greatly reduces the complexity of the
+			implementation.
+			- BITMAP_OPT_HASH_CACHE (0x4)
+			If present, the end of the bitmap file contains
+			`N` 32-bit name-hash values, one per object in the
+			pack/MIDX. The format and meaning of the name-hash is
+			described below.
 
 		4-byte entry count (network byte order)
-
 			The total count of entries (bitmapped commits) in this bitmap index.
 
 		20-byte checksum
-
 			The SHA1 checksum of the pack/MIDX this bitmap index belongs to.
 
-	- 4 EWAH bitmaps that act as type indexes
+	* 4 EWAH bitmaps that act as type indexes
 
 		Type indexes are serialized after the hash cache in the shape
 		of four EWAH bitmaps stored consecutively (see Appendix A for
@@ -84,7 +79,6 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 
 		There is a bitmap for each Git object type, stored in the following
 		order:
-
 			- Commits
 			- Trees
 			- Blobs
@@ -97,39 +91,39 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 		in a full set (all bits set), and the AND of all 4 bitmaps will
 		result in an empty bitmap (no bits set).
 
-	- N entries with compressed bitmaps, one for each indexed commit
+	* N entries with compressed bitmaps, one for each indexed commit
 
 		Where `N` is the total amount of entries in this bitmap index.
 		Each entry contains the following:
 
-		- 4-byte object position (network byte order)
-			The position **in the index for the packfile or
-			multi-pack index** where the bitmap for this commit is
-			found.
-
-		- 1-byte XOR-offset
-			The xor offset used to compress this bitmap. For an entry
-			in position `x`, a XOR offset of `y` means that the actual
-			bitmap representing this commit is composed by XORing the
-			bitmap for this entry with the bitmap in entry `x-y` (i.e.
-			the bitmap `y` entries before this one).
-
-			Note that this compression can be recursive. In order to
-			XOR this entry with a previous one, the previous entry needs
-			to be decompressed first, and so on.
-
-			The hard-limit for this offset is 160 (an entry can only be
-			xor'ed against one of the 160 entries preceding it). This
-			number is always positive, and hence entries are always xor'ed
-			with **previous** bitmaps, not bitmaps that will come afterwards
-			in the index.
-
-		- 1-byte flags for this bitmap
-			At the moment the only available flag is `0x1`, which hints
-			that this bitmap can be re-used when rebuilding bitmap indexes
-			for the repository.
-
-		- The compressed bitmap itself, see Appendix A.
+		** 4-byte object position (network byte order)
+			The position **in the index for the packfile or
+			multi-pack index** where the bitmap for this commit is
+			found.
+
+		** 1-byte XOR-offset
+			The xor offset used to compress this bitmap. For an entry
+			in position `x`, a XOR offset of `y` means that the actual
+			bitmap representing this commit is composed by XORing the
+			bitmap for this entry with the bitmap in entry `x-y` (i.e.
+			the bitmap `y` entries before this one).
+
+			Note that this compression can be recursive. In order to
+			XOR this entry with a previous one, the previous entry needs
+			to be decompressed first, and so on.
+
+			The hard-limit for this offset is 160 (an entry can only be
+			xor'ed against one of the 160 entries preceding it). This
+			number is always positive, and hence entries are always xor'ed
+			with **previous** bitmaps, not bitmaps that will come afterwards
+			in the index.
+
+		** 1-byte flags for this bitmap
+			At the moment the only available flag is `0x1`, which hints
+			that this bitmap can be re-used when rebuilding bitmap indexes
+			for the repository.
+
+		** The compressed bitmap itself, see Appendix A.
 
 == Appendix A: Serialization format for an EWAH bitmap
 

From patchwork Thu Jun 2 13:52:46 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhradeep Chakraborty
X-Patchwork-Id: 12867890
From: Abhradeep Chakraborty
Date: Thu, 02 Jun 2022 13:52:46 +0000
Subject: [PATCH 2/2] bitmap-format.txt: add information for trailing checksum
To: git@vger.kernel.org
Cc: Abhradeep Chakraborty, Abhradeep Chakraborty

A bitmap file ends with a trailing checksum, but the bitmap-format
documentation does not mention it. Add a trailer section to
`Documentation/technical/bitmap-format.txt` that describes this
trailing checksum.
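The trailer described above can be checked with a few lines of Python. This is a minimal sketch, assuming a SHA-1 repository (so the trailer is 20 bytes) and assuming the trailer is the SHA-1 of every byte that precedes it, as the v2 wording "Index checksum of the above contents. It is a 20-byte SHA1 checksum" suggests. The function name `verify_bitmap_trailer` is hypothetical and is not part of Git.

```python
import hashlib

# Assumption: SHA-1 repository; a SHA-256 repository would use a 32-byte trailer.
TRAILER_LEN = 20


def verify_bitmap_trailer(data: bytes) -> bool:
    """Return True if the last TRAILER_LEN bytes equal the SHA-1 of all preceding bytes."""
    if len(data) < TRAILER_LEN:
        return False  # too short to even hold a trailer
    body, trailer = data[:-TRAILER_LEN], data[-TRAILER_LEN:]
    return hashlib.sha1(body).digest() == trailer
```

For example, `verify_bitmap_trailer(open(path, "rb").read())` would check a `.bitmap` file at some hypothetical `path`. Comparing against the digest of the whole preceding body mirrors how a checksum trailer is usually produced: the writer hashes everything it emits and appends the digest last.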
Signed-off-by: Abhradeep Chakraborty
---
 Documentation/technical/bitmap-format.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 110d7ddf8ed..6846e7221a7 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -125,6 +125,10 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 
 		** The compressed bitmap itself, see Appendix A.
 
+	* TRAILER:
+
+		Index checksum of the above contents.
+
 == Appendix A: Serialization format for an EWAH bitmap
 
 Ewah bitmaps are serialized in the same protocol as the JAVAEWAH

From patchwork Tue Jun 7 17:43:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhradeep Chakraborty
X-Patchwork-Id: 12872214
Message-Id: <2171d31fb2b783371bdc31ba54856dea8224de65.1654623814.git.gitgitgadget@gmail.com>
From: Abhradeep Chakraborty
Date: Tue, 07 Jun 2022 17:43:34 +0000
Subject: [PATCH v2 3/3] bitmap-format.txt: add information for trailing checksum
To: git@vger.kernel.org
Cc: Taylor Blau, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty, Abhradeep Chakraborty

A bitmap file ends with a trailing checksum, but the bitmap-format
documentation does not mention it. Add a trailer section to
`Documentation/technical/bitmap-format.txt` that describes this
trailing checksum.

Signed-off-by: Abhradeep Chakraborty
---
 Documentation/technical/bitmap-format.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f22669b5916..a43d2fe2bbf 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -125,6 +125,10 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 
 		** The compressed bitmap itself, see Appendix A.
 
+	* TRAILER:
+
+		Index checksum of the above contents. It is a 20-byte SHA1 checksum.
+
 == Appendix A: Serialization format for an EWAH bitmap
 
 Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
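The header layout and the XOR-offset scheme that these patches document can be illustrated with a short Python sketch. It assumes a SHA-1 repository (20-byte checksum) and a 2-byte version field between the signature and the flags, in network byte order like the other integer fields. It also models bitmaps that have already been decoded from their EWAH form (Appendix A) as plain Python integers. The names `parse_bitmap_header` and `resolve_xor_chain` are hypothetical illustrations, not Git's implementation.

```python
import struct

BITMAP_OPT_FULL_DAG = 0x1    # REQUIRED: must always be set
BITMAP_OPT_HASH_CACHE = 0x4  # name-hash values are appended at the end of the file
HEADER_LEN = 4 + 2 + 2 + 4 + 20  # signature, version, flags, entry count, checksum


def parse_bitmap_header(data: bytes) -> dict:
    """Decode the fixed-size header; integer fields are network byte order (big-endian)."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated bitmap header")
    signature, version, flags, entry_count = struct.unpack(">4sHHI", data[:12])
    if signature != b"BITM":
        raise ValueError("bad signature: %r" % signature)
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG is required")
    return {
        "version": version,
        "hash_cache": bool(flags & BITMAP_OPT_HASH_CACHE),
        "entry_count": entry_count,
        "checksum": data[12:HEADER_LEN],
    }


def resolve_xor_chain(stored: list, offsets: list) -> list:
    """Undo XOR compression: entry x with offset y > 0 is XORed with resolved entry x - y."""
    resolved = []
    for x, (bits, y) in enumerate(zip(stored, offsets)):
        if y > 160 or y > x:
            raise ValueError("invalid XOR offset %d at entry %d" % (y, x))
        # Offsets only point backwards, so every earlier entry is already fully
        # resolved; the recursive definition becomes one left-to-right pass.
        resolved.append(bits ^ resolved[x - y] if y else bits)
    return resolved
```

Resolving entries in file order is one design choice that works because offsets only point backwards. A reader that needs just one entry could instead follow its chain of offsets back to an entry with offset 0, at the cost of repeated work.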