mbox series

[v2,0/3] bitmap-format.txt: fix some formatting issues and include checksum info

Message ID pull.1246.v2.git.1654623814.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series bitmap-format.txt: fix some formatting issues and include checksum info | expand

Message

Philippe Blain via GitGitGadget June 7, 2022, 5:43 p.m. UTC
There are some issues in the bitmap-format html page. For example, some
nested lists are shown as top-level lists (e.g. [1]- Here
BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
top-level list). There is also a need of adding info about trailing checksum
in the docs.

Changes since v1:

 * a new commit addressing bitmap-format.txt html page generation is added
 * Remove extra indentation from the previous change
 * elaborate more about the trailing checksum (as suggested by Kaartic)

initial version:

 * first commit fixes some formatting issues
 * information about trailing checksum in the bitmap file is added in the
   bitmap-format doc.

[1] https://git-scm.com/docs/bitmap-format#_on_disk_format

Abhradeep Chakraborty (3):
  bitmap-format.txt: feed the file to asciidoc to generate html
  bitmap-format.txt: fix some formatting issues
  bitmap-format.txt: add information for trailing checksum

 Documentation/Makefile                    |  1 +
 Documentation/technical/bitmap-format.txt | 24 +++++++++++------------
 2 files changed, 12 insertions(+), 13 deletions(-)


base-commit: 2668e3608e47494f2f10ef2b6e69f08a84816bcb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1246%2FAbhra303%2Ffix-doc-formatting-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1246/Abhra303/fix-doc-formatting-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1246

Range-diff vs v1:

 -:  ----------- > 1:  a1b9bd9af90 bitmap-format.txt: feed the file to asciidoc to generate html
 1:  976361e624a ! 2:  cb919513c14 bitmap-format.txt: fix some formatting issues
     @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac
      -
       			The following flags are supported:
      -
     --			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
     --			This flag must always be present. It implies that the
     --			bitmap index has been generated for a packfile or
     --			multi-pack index (MIDX) with full closure (i.e. where
     --			every single object in the packfile/MIDX can find its
     --			parent links inside the same packfile/MIDX). This is a
     --			requirement for the bitmap index format, also present in
     --			JGit, that greatly reduces the complexity of the
     --			implementation.
     + 			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
     + 			This flag must always be present. It implies that the
     + 			bitmap index has been generated for a packfile or
     +@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     + 			requirement for the bitmap index format, also present in
     + 			JGit, that greatly reduces the complexity of the
     + 			implementation.
      -
     --			- BITMAP_OPT_HASH_CACHE (0x4)
     --			If present, the end of the bitmap file contains
     --			`N` 32-bit name-hash values, one per object in the
     --			pack/MIDX. The format and meaning of the name-hash is
     --			described below.
     -+				- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
     -+				This flag must always be present. It implies that the
     -+				bitmap index has been generated for a packfile or
     -+				multi-pack index (MIDX) with full closure (i.e. where
     -+				every single object in the packfile/MIDX can find its
     -+				parent links inside the same packfile/MIDX). This is a
     -+				requirement for the bitmap index format, also present in
     -+				JGit, that greatly reduces the complexity of the
     -+				implementation.
     -+				- BITMAP_OPT_HASH_CACHE (0x4)
     -+				If present, the end of the bitmap file contains
     -+				`N` 32-bit name-hash values, one per object in the
     -+				pack/MIDX. The format and meaning of the name-hash is
     -+				described below.
     + 			- BITMAP_OPT_HASH_CACHE (0x4)
     + 			If present, the end of the bitmap file contains
     + 			`N` 32-bit name-hash values, one per object in the
     +@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     + 			described below.
       
       		4-byte entry count (network byte order)
      -
     @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac
       		Each entry contains the following:
       
      -		- 4-byte object position (network byte order)
     --			The position **in the index for the packfile or
     --			multi-pack index** where the bitmap for this commit is
     --			found.
     --
     ++		** 4-byte object position (network byte order)
     + 			The position **in the index for the packfile or
     + 			multi-pack index** where the bitmap for this commit is
     + 			found.
     + 
      -		- 1-byte XOR-offset
     --			The xor offset used to compress this bitmap. For an entry
     --			in position `x`, a XOR offset of `y` means that the actual
     --			bitmap representing this commit is composed by XORing the
     --			bitmap for this entry with the bitmap in entry `x-y` (i.e.
     --			the bitmap `y` entries before this one).
     --
     --			Note that this compression can be recursive. In order to
     --			XOR this entry with a previous one, the previous entry needs
     --			to be decompressed first, and so on.
     --
     --			The hard-limit for this offset is 160 (an entry can only be
     --			xor'ed against one of the 160 entries preceding it). This
     --			number is always positive, and hence entries are always xor'ed
     --			with **previous** bitmaps, not bitmaps that will come afterwards
     --			in the index.
     --
     ++		** 1-byte XOR-offset
     + 			The xor offset used to compress this bitmap. For an entry
     + 			in position `x`, a XOR offset of `y` means that the actual
     + 			bitmap representing this commit is composed by XORing the
     +@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     + 			with **previous** bitmaps, not bitmaps that will come afterwards
     + 			in the index.
     + 
      -		- 1-byte flags for this bitmap
     --			At the moment the only available flag is `0x1`, which hints
     --			that this bitmap can be re-used when rebuilding bitmap indexes
     --			for the repository.
     --
     ++		** 1-byte flags for this bitmap
     + 			At the moment the only available flag is `0x1`, which hints
     + 			that this bitmap can be re-used when rebuilding bitmap indexes
     + 			for the repository.
     + 
      -		- The compressed bitmap itself, see Appendix A.
     -+			** 4-byte object position (network byte order)
     -+				The position **in the index for the packfile or
     -+				multi-pack index** where the bitmap for this commit is
     -+				found.
     -+
     -+			** 1-byte XOR-offset
     -+				The xor offset used to compress this bitmap. For an entry
     -+				in position `x`, a XOR offset of `y` means that the actual
     -+				bitmap representing this commit is composed by XORing the
     -+				bitmap for this entry with the bitmap in entry `x-y` (i.e.
     -+				the bitmap `y` entries before this one).
     -+
     -+				Note that this compression can be recursive. In order to
     -+				XOR this entry with a previous one, the previous entry needs
     -+				to be decompressed first, and so on.
     -+
     -+				The hard-limit for this offset is 160 (an entry can only be
     -+				xor'ed against one of the 160 entries preceding it). This
     -+				number is always positive, and hence entries are always xor'ed
     -+				with **previous** bitmaps, not bitmaps that will come afterwards
     -+				in the index.
     -+
     -+			** 1-byte flags for this bitmap
     -+				At the moment the only available flag is `0x1`, which hints
     -+				that this bitmap can be re-used when rebuilding bitmap indexes
     -+				for the repository.
     -+
     -+			** The compressed bitmap itself, see Appendix A.
     ++		** The compressed bitmap itself, see Appendix A.
       
       == Appendix A: Serialization format for an EWAH bitmap
       
 2:  ba534b5d486 ! 3:  2171d31fb2b bitmap-format.txt: add information for trailing checksum
     @@ Commit message
       ## Documentation/technical/bitmap-format.txt ##
      @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
       
     - 			** The compressed bitmap itself, see Appendix A.
     + 		** The compressed bitmap itself, see Appendix A.
       
      +	* TRAILER:
      +
     -+		Index checksum of the above contents.
     ++		Index checksum of the above contents. It is a 20-byte SHA1 checksum.
      +
       == Appendix A: Serialization format for an EWAH bitmap

Comments

Junio C Hamano June 7, 2022, 6:28 p.m. UTC | #1
"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> There are some issues in the bitmap-format html page.

"First, it does not even exist!" before anything else ;-)

> For example, some
> nested lists are shown as top-level lists (e.g. [1]- Here
> BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
> top-level list). There is also a need of adding info about trailing checksum
> in the docs.
>
> Changes since v1:
>
>  * a new commit addressing bitmap-format.txt html page generation is added

Good.

>  * Remove extra indentation from the previous change

Good.

>  * elaborate more about the trailing checksum (as suggested by Kaartic)

Good.

Will take a look (and audiences are requested to do so, too).

Thanks.
Taylor Blau June 7, 2022, 8:58 p.m. UTC | #2
On Tue, Jun 07, 2022 at 11:28:17AM -0700, Junio C Hamano wrote:
> Will take a look (and audiences are requested to do so, too).

I think this is on a good track. The rendered HTML still has much of its
content inside of <pre> elements, but that may be an acceptable
trade-off to maintain readability of the source material.

If there's a way to make the rendered page more appealing without
compromising on the readability of the source, I'd be in favor of that.
But I trust Abhradeep's judgement here, so if there isn't, I'd be happy
with the series (mostly) as-is.

I left a textual suggestion on the third patch, which I'd like to adopt
before picking this up (this will also give Abhradeep a chance to
investigate the formatting improvements on patch 2/3).

In the meantime, it's probably safe to drop Vicent Martí from the CC
list, since he is no longer working on Git (though I miss him very
much!).

Thanks,
Taylor
Junio C Hamano June 7, 2022, 9 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
>> There are some issues in the bitmap-format html page.
>
> "First, it does not even exist!" before anything else ;-)
>
>> For example, some
>> nested lists are shown as top-level lists (e.g. [1]- Here
>> BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
>> top-level list). There is also a need of adding info about trailing checksum
>> in the docs.
> ...

No, this is not quite ready for production.

Almost all the "indented" material are shown in fixed-width
typewriter format in the resulting HTML output.

Look how ugly the output from it is.  Not your fault; it is mostly
because when the original text was written, it was not even meant to
be given to AsciiDoc.

  https://twitter.com/jch2355/status/1534276427607986178/photo/1
  https://pbs.twimg.com/media/FUrYP2nakAAnRaH?format=png

And as I already said, removal of the blank lines made it harder to
see what is going on in the source, and because the output is pretty
much straight copy of the source in the fixed-font, just like reading
the source in the terminal, the output here is equally hard to read.

  https://twitter.com/jch2355/status/1534277664441511937/photo/1
  https://pbs.twimg.com/media/FUrZZXUUsAEmEeT?format=png

If we really want to give it to AsciiDoc, we'd need to reformat it
more extensively, not just tweak it on the surface and making an
equivalent of <pre>...</pre> slightly easier to read, which is what
this patch does.

Thanks.
Abhradeep Chakraborty June 8, 2022, 5:12 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> wrote:

> No, this is not quite ready for production.
>
> Almost all the "indented" material are shown in fixed-width
> typewriter format in the resulting HTML output.
>
> Look how ugly the output from it is.  Not your fault; it is mostly
> because when the original text was written, it was not even meant to
> be given to AsciiDoc.

Actually, I am wondering how git-scm.com is able to produce a html page
for bitmap-format.txt (if it is not passing to asciidoc). The design of
asciidoc generated html pages in `make docs` are not same as the design
of production html page designs. Probably, production uses some extra
css code to beautify the asciidoc generated html files.

So, the generated html file (production version) is not as bad as the
locally built generated html. I need some understanding of the working
of git-scm though (to verify it).

If you see other locally built html pages - they would look similar to
the bitmap-format html page. But in production, they are beautiful enough.

By the way, I forgot to inform that https://git-scm.com/docs/pack-format#_original_version_1_pack_idx_files_have_the_following_format also has
some weird formatting issues. See the <pre> block after the pack-idx structure
drawing. There are other issues also which you can find (like having
unnecessary indentations e.g. here[1] the second block under the "The header
is followed by number of object entries....").

Thanks :)

[1] https://git-scm.com/docs/pack-format#_pack_pack_files_have_the_following_format