Message ID | 976361e624a3dd58c8f291358d42f4e4c66eb266.1654177966.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | bitmap-format.txt: fix some formatting issues and include checksum info |
"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> Cc: git@vger.kernel.org, Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Identify those who may have input with "git log --no-merges" and add
them here, perhaps?

> The asciidoc generated html for
> `Documentation/technical/bitmap-format.txt` is broken. This is mainly
> because `-` is used for nested lists (which is not allowed in
> asciidoc) instead of `*`.

Are we missing another step that must come much earlier than this
patch? It seems to me that Documentation/Makefile does not even
consider that we should feed this file to AsciiDoc.

> Fix these and also reformat it (e.g. removing some blank lines) for
> better readability of the html page.

Do these blank lines hurt very badly how the end-result is formatted
in HTML? Does the extra indentation between the line with "The
following flags are supported" on it and the two bullet items in the
header make the output better in a significant way?

These changes make the input text much harder to read, and are not
very welcome, so unless they are part of "fixing generated HTML is
broken", please omit them. As evidenced by the lack of HTML output
in the build system, a lot more folks read this document in text than
in HTML, and readability of the source matters.

Thanks.

> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  Documentation/technical/bitmap-format.txt | 96 +++++++++++------------
>  1 file changed, 45 insertions(+), 51 deletions(-)
>
> diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> index 04b3ec21785..110d7ddf8ed 100644
> --- a/Documentation/technical/bitmap-format.txt
> +++ b/Documentation/technical/bitmap-format.txt
> @@ -39,7 +39,7 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>
>  == On-disk format
>
> -	- A header appears at the beginning:
> +	* A header appears at the beginning:
>
>  		4-byte signature: {'B', 'I', 'T', 'M'}
>
> @@ -48,35 +48,30 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  			of the bitmap index (the same one as JGit).
>
>  		2-byte flags (network byte order)
> -
>  			The following flags are supported:
> -
> -			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
> -			This flag must always be present. It implies that the
> -			bitmap index has been generated for a packfile or
> -			multi-pack index (MIDX) with full closure (i.e. where
> -			every single object in the packfile/MIDX can find its
> -			parent links inside the same packfile/MIDX). This is a
> -			requirement for the bitmap index format, also present in
> -			JGit, that greatly reduces the complexity of the
> -			implementation.
> -
> -			- BITMAP_OPT_HASH_CACHE (0x4)
> -			If present, the end of the bitmap file contains
> -			`N` 32-bit name-hash values, one per object in the
> -			pack/MIDX. The format and meaning of the name-hash is
> -			described below.
> +				- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
> +				This flag must always be present. It implies that the
> +				bitmap index has been generated for a packfile or
> +				multi-pack index (MIDX) with full closure (i.e. where
> +				every single object in the packfile/MIDX can find its
> +				parent links inside the same packfile/MIDX). This is a
> +				requirement for the bitmap index format, also present in
> +				JGit, that greatly reduces the complexity of the
> +				implementation.
> +				- BITMAP_OPT_HASH_CACHE (0x4)
> +				If present, the end of the bitmap file contains
> +				`N` 32-bit name-hash values, one per object in the
> +				pack/MIDX. The format and meaning of the name-hash is
> +				described below.
>
>  		4-byte entry count (network byte order)
> -
>  			The total count of entries (bitmapped commits) in this bitmap index.
>
>  		20-byte checksum
> -
>  			The SHA1 checksum of the pack/MIDX this bitmap index
>  			belongs to.
>
> -	- 4 EWAH bitmaps that act as type indexes
> +	* 4 EWAH bitmaps that act as type indexes
>
>  		Type indexes are serialized after the hash cache in the shape
>  		of four EWAH bitmaps stored consecutively (see Appendix A for
> @@ -84,7 +79,6 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>
>  		There is a bitmap for each Git object type, stored in the following
>  		order:
> -
>  			- Commits
>  			- Trees
>  			- Blobs
> @@ -97,39 +91,39 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  		in a full set (all bits set), and the AND of all 4 bitmaps will
>  		result in an empty bitmap (no bits set).
>
> -	- N entries with compressed bitmaps, one for each indexed commit
> +	* N entries with compressed bitmaps, one for each indexed commit
>
>  		Where `N` is the total amount of entries in this bitmap index.
>  		Each entry contains the following:
>
> -		- 4-byte object position (network byte order)
> -			The position **in the index for the packfile or
> -			multi-pack index** where the bitmap for this commit is
> -			found.
> -
> -		- 1-byte XOR-offset
> -			The xor offset used to compress this bitmap. For an entry
> -			in position `x`, a XOR offset of `y` means that the actual
> -			bitmap representing this commit is composed by XORing the
> -			bitmap for this entry with the bitmap in entry `x-y` (i.e.
> -			the bitmap `y` entries before this one).
> -
> -			Note that this compression can be recursive. In order to
> -			XOR this entry with a previous one, the previous entry needs
> -			to be decompressed first, and so on.
> -
> -			The hard-limit for this offset is 160 (an entry can only be
> -			xor'ed against one of the 160 entries preceding it). This
> -			number is always positive, and hence entries are always xor'ed
> -			with **previous** bitmaps, not bitmaps that will come afterwards
> -			in the index.
> -
> -		- 1-byte flags for this bitmap
> -			At the moment the only available flag is `0x1`, which hints
> -			that this bitmap can be re-used when rebuilding bitmap indexes
> -			for the repository.
> -
> -		- The compressed bitmap itself, see Appendix A.
> +		** 4-byte object position (network byte order)
> +			The position **in the index for the packfile or
> +			multi-pack index** where the bitmap for this commit is
> +			found.
> +
> +		** 1-byte XOR-offset
> +			The xor offset used to compress this bitmap. For an entry
> +			in position `x`, a XOR offset of `y` means that the actual
> +			bitmap representing this commit is composed by XORing the
> +			bitmap for this entry with the bitmap in entry `x-y` (i.e.
> +			the bitmap `y` entries before this one).
> +
> +			Note that this compression can be recursive. In order to
> +			XOR this entry with a previous one, the previous entry needs
> +			to be decompressed first, and so on.
> +
> +			The hard-limit for this offset is 160 (an entry can only be
> +			xor'ed against one of the 160 entries preceding it). This
> +			number is always positive, and hence entries are always xor'ed
> +			with **previous** bitmaps, not bitmaps that will come afterwards
> +			in the index.
> +
> +		** 1-byte flags for this bitmap
> +			At the moment the only available flag is `0x1`, which hints
> +			that this bitmap can be re-used when rebuilding bitmap indexes
> +			for the repository.
> +
> +		** The compressed bitmap itself, see Appendix A.
>
>  == Appendix A: Serialization format for an EWAH bitmap
Junio C Hamano <gitster@pobox.com> wrote:

> Identify those who may have input with "git log --no-merges" and add
> them here, perhaps?

Thanks. I have hopefully cc'd everyone who can give input on the patch,
except Peff. I learned that he is taking a break, so I decided not to cc
him (I will do so if you say). I would love to hear from other people
who have knowledge of asciidoc. I previously informed Taylor and Kaartic
about the patch but forgot to cc them :P

Another thing to note is that the checksum information I included in
the last commit was suggested by Taylor himself. I was having trouble
understanding a portion of `load_bitmap_header()` (because I wasn't
aware of the trailing checksum). He cleared up my doubt by explaining
that a trailing checksum exists, and he also suggested making a PR to
address that:

> I'm glad that it was helpful! If you think others may be confused by
> the same, feel free to write a patch modifying
> Documentation/technical/bitmap-format.txt to point out the trailing
> checksum.

Junio wrote:

> Are we missing another step that must come much earlier than this
> patch? It seems to me that Documentation/Makefile does not even
> consider that we should feed this file to AsciiDoc.

I think so too. At first, I thought this was intentional. When I ran
`make doc` (to test the resulting html file), it didn't generate any
html file for bitmap-format.txt. Thankfully, there is an online
asciidoc editor[1] where you can check the resulting html. You can also
check it by copy-pasting the content of the bitmap-format file from my
github branch[2] into that editor. I will write a patch for it.

The current broken page can be found at:
https://git-scm.com/docs/bitmap-format

> Do these blank lines hurt very badly how the end-result is formatted
> in HTML? Does the extra indentation between the line with "The
> following flags are supported" on it and the two bullet items in the
> header make the output better in a significant way?

To answer the first question: yes, those are necessary to improve the
html readability (you can verify that by adding and removing the blank
lines in the editor and observing the changes). This ensures that all
the related paragraphs are contained in the same block.

The extra indentations are not necessary. I added them because I
thought they would look better to readers of the html page. If you
think they do the opposite, I can remove them. I tried to use two
bullets as little as possible (in most cases, nested lists came under
<pre> blocks, so I didn't have to use two bullets). But in one case, I
had to use them for nested lists (try the editor to see the rendered
output).

> These changes make the input text much harder to read, and are not
> very welcome, so unless they are part of "fixing generated HTML is
> broken", please omit them. As evidenced by the lack of HTML output
> in the build system, a lot more folks read this document in text than
> in HTML, and readability of the source matters.

Okay, I will remove those extra indentations. But besides that, all the
changes are necessary. I admit that readability of the source matters,
but I think html pages are also important (even more important) for
people who don't have the source code and want to learn about git
internals.

Thanks :)

[1] https://asciidoclive.com/edit/scratch/1
[2] https://github.com/Abhra303/git/blob/fix-doc-formatting/Documentation/technical/bitmap-format.txt
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec21785..110d7ddf8ed 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -39,7 +39,7 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 
 == On-disk format
 
-	- A header appears at the beginning:
+	* A header appears at the beginning:
 
 		4-byte signature: {'B', 'I', 'T', 'M'}
 
@@ -48,35 +48,30 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 			of the bitmap index (the same one as JGit).
 
 		2-byte flags (network byte order)
-
 			The following flags are supported:
-
-			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the
-			bitmap index has been generated for a packfile or
-			multi-pack index (MIDX) with full closure (i.e. where
-			every single object in the packfile/MIDX can find its
-			parent links inside the same packfile/MIDX). This is a
-			requirement for the bitmap index format, also present in
-			JGit, that greatly reduces the complexity of the
-			implementation.
-
-			- BITMAP_OPT_HASH_CACHE (0x4)
-			If present, the end of the bitmap file contains
-			`N` 32-bit name-hash values, one per object in the
-			pack/MIDX. The format and meaning of the name-hash is
-			described below.
+				- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+				This flag must always be present. It implies that the
+				bitmap index has been generated for a packfile or
+				multi-pack index (MIDX) with full closure (i.e. where
+				every single object in the packfile/MIDX can find its
+				parent links inside the same packfile/MIDX). This is a
+				requirement for the bitmap index format, also present in
+				JGit, that greatly reduces the complexity of the
+				implementation.
+				- BITMAP_OPT_HASH_CACHE (0x4)
+				If present, the end of the bitmap file contains
+				`N` 32-bit name-hash values, one per object in the
+				pack/MIDX. The format and meaning of the name-hash is
+				described below.
 
 		4-byte entry count (network byte order)
-
 			The total count of entries (bitmapped commits) in this bitmap index.
 
 		20-byte checksum
-
 			The SHA1 checksum of the pack/MIDX this bitmap index
 			belongs to.
 
-	- 4 EWAH bitmaps that act as type indexes
+	* 4 EWAH bitmaps that act as type indexes
 
 		Type indexes are serialized after the hash cache in the shape
 		of four EWAH bitmaps stored consecutively (see Appendix A for
@@ -84,7 +79,6 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 
 		There is a bitmap for each Git object type, stored in the following
 		order:
-
 			- Commits
 			- Trees
 			- Blobs
@@ -97,39 +91,39 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 		in a full set (all bits set), and the AND of all 4 bitmaps will
 		result in an empty bitmap (no bits set).
 
-	- N entries with compressed bitmaps, one for each indexed commit
+	* N entries with compressed bitmaps, one for each indexed commit
 
 		Where `N` is the total amount of entries in this bitmap index.
 		Each entry contains the following:
 
-		- 4-byte object position (network byte order)
-			The position **in the index for the packfile or
-			multi-pack index** where the bitmap for this commit is
-			found.
-
-		- 1-byte XOR-offset
-			The xor offset used to compress this bitmap. For an entry
-			in position `x`, a XOR offset of `y` means that the actual
-			bitmap representing this commit is composed by XORing the
-			bitmap for this entry with the bitmap in entry `x-y` (i.e.
-			the bitmap `y` entries before this one).
-
-			Note that this compression can be recursive. In order to
-			XOR this entry with a previous one, the previous entry needs
-			to be decompressed first, and so on.
-
-			The hard-limit for this offset is 160 (an entry can only be
-			xor'ed against one of the 160 entries preceding it). This
-			number is always positive, and hence entries are always xor'ed
-			with **previous** bitmaps, not bitmaps that will come afterwards
-			in the index.
-
-		- 1-byte flags for this bitmap
-			At the moment the only available flag is `0x1`, which hints
-			that this bitmap can be re-used when rebuilding bitmap indexes
-			for the repository.
-
-		- The compressed bitmap itself, see Appendix A.
+		** 4-byte object position (network byte order)
+			The position **in the index for the packfile or
+			multi-pack index** where the bitmap for this commit is
+			found.
+
+		** 1-byte XOR-offset
+			The xor offset used to compress this bitmap. For an entry
+			in position `x`, a XOR offset of `y` means that the actual
+			bitmap representing this commit is composed by XORing the
+			bitmap for this entry with the bitmap in entry `x-y` (i.e.
+			the bitmap `y` entries before this one).
+
+			Note that this compression can be recursive. In order to
+			XOR this entry with a previous one, the previous entry needs
+			to be decompressed first, and so on.
+
+			The hard-limit for this offset is 160 (an entry can only be
+			xor'ed against one of the 160 entries preceding it). This
+			number is always positive, and hence entries are always xor'ed
+			with **previous** bitmaps, not bitmaps that will come afterwards
+			in the index.
+
+		** 1-byte flags for this bitmap
+			At the moment the only available flag is `0x1`, which hints
+			that this bitmap can be re-used when rebuilding bitmap indexes
+			for the repository.
+
+		** The compressed bitmap itself, see Appendix A.
 
 == Appendix A: Serialization format for an EWAH bitmap
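
For readers following the format text touched by the patch above, the
fixed-size header and the XOR-offset rule can be sketched in Python. This
is only an illustration and not Git's implementation. The helper names
`parse_bitmap_header` and `resolve_xor` are hypothetical, and bitmaps are
modeled as plain Python integers rather than EWAH-compressed data. The
sketch also assumes that an XOR offset of 0 marks an entry stored without
XOR compression, which the document itself does not state.

```python
import struct

# Layout from the header description: 4-byte signature, 2-byte version,
# 2-byte flags, 4-byte entry count, 20-byte checksum. ">" selects
# network (big-endian) byte order with no padding.
HEADER = struct.Struct(">4sHHI20s")

BITMAP_OPT_FULL_DAG = 0x1    # required flag
BITMAP_OPT_HASH_CACHE = 0x4  # name-hash cache present at end of file
MAX_XOR_OFFSET = 160         # hard limit quoted in the document


def parse_bitmap_header(data):
    """Hypothetical helper: decode the fixed-size header into a dict."""
    if len(data) < HEADER.size:
        raise ValueError("truncated bitmap header")
    signature, version, flags, entry_count, checksum = HEADER.unpack_from(data)
    if signature != b"BITM":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError("only version 1 is described")
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG must be set")
    return {
        "version": version,
        "has_hash_cache": bool(flags & BITMAP_OPT_HASH_CACHE),
        "entry_count": entry_count,
        "checksum": checksum.hex(),
    }


def resolve_xor(entries, x):
    """Hypothetical helper: return the real bitmap for entry x.

    entries is a list of (xor_offset, bits) pairs, where bits is an int
    standing in for an already EWAH-decoded bitmap. Assumption: an
    offset of 0 means the entry is stored as-is.
    """
    offset, bits = entries[x]
    if offset == 0:
        return bits
    if offset > MAX_XOR_OFFSET or offset > x:
        raise ValueError("XOR offset points outside the allowed window")
    # The earlier entry may itself be XOR-compressed, so resolve it first.
    return bits ^ resolve_xor(entries, x - offset)
```

Packing a header with `HEADER.pack(b"BITM", 1, 0x5, 3, bytes(20))` and
passing it to `parse_bitmap_header` gives a quick round-trip check. The
recursion in `resolve_xor` mirrors the note that a previous entry has to
be decompressed before it can be used as an XOR base.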