[v3,0/3] bitmap-format.txt: fix some formatting issues and include checksum info

Message ID: pull.1246.v3.git.1654858481.gitgitgadget@gmail.com

Abhradeep Chakraborty via GitGitGadget June 10, 2022, 10:54 a.m. UTC
The bitmap-format html page has some formatting issues. For example, some
nested lists are rendered as top-level lists (e.g. in [1],
BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
top-level list items). The docs also need information about the trailing
checksum.

Changes since v2: the last two commits are updated to address the
suggestions. The changes are:

 * Previously omitted blank lines are re-added. In the updated commit, the
   use of <pre> blocks is reduced. Description lists and + are used instead
   to attach more than one paragraph to a list item. Readability of the
   source text might decrease due to the use of +, but other documentation
   files (e.g. git-add.txt) also use it to connect two paragraphs, so I hope
   this is acceptable.

 * Information about the trailing checksum is updated (as suggested by
   Taylor).

Changes since v1:

 * A new commit addressing bitmap-format.txt html page generation is added.
 * Extra indentation from the previous change is removed.
 * The trailing checksum is described in more detail (as suggested by
   Kaartic).

Initial version:

 * The first commit fixes some formatting issues.
 * Information about the trailing checksum in the bitmap file is added to
   the bitmap-format doc.

[1] https://git-scm.com/docs/bitmap-format#_on_disk_format

Abhradeep Chakraborty (3):
  bitmap-format.txt: feed the file to asciidoc to generate html
  bitmap-format.txt: fix some formatting issues
  bitmap-format.txt: add information for trailing checksum

 Documentation/Makefile                    |   1 +
 Documentation/technical/bitmap-format.txt | 113 ++++++++++++----------
 2 files changed, 63 insertions(+), 51 deletions(-)


base-commit: 2668e3608e47494f2f10ef2b6e69f08a84816bcb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1246%2FAbhra303%2Ffix-doc-formatting-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1246/Abhra303/fix-doc-formatting-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1246

Range-diff vs v2:

 1:  a1b9bd9af90 = 1:  a1b9bd9af90 bitmap-format.txt: feed the file to asciidoc to generate html
 2:  cb919513c14 ! 2:  c74b9a52c2a bitmap-format.txt: fix some formatting issues
     @@ Commit message
          format.txt` is broken. This is mainly because `-` is used for nested
          lists (which is not allowed in asciidoc) instead of `*`.
      
     -    Fix these and also reformat it (e.g. removing some blank lines) for
     -    better readability of the html page.
     +    Fix these and also reformat it for better readability of the html page.
      
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
     @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac
      -	- A header appears at the beginning:
      +	* A header appears at the beginning:
       
     - 		4-byte signature: {'B', 'I', 'T', 'M'}
     +-		4-byte signature: {'B', 'I', 'T', 'M'}
     ++		4-byte signature: :: {'B', 'I', 'T', 'M'}
     ++
     ++		2-byte version number (network byte order): ::
       
     -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     +-		2-byte version number (network byte order)
     + 			The current implementation only supports version 1
       			of the bitmap index (the same one as JGit).
       
     - 		2-byte flags (network byte order)
     --
     +-		2-byte flags (network byte order)
     ++		2-byte flags (network byte order): ::
     + 
       			The following flags are supported:
     --
     - 			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
     + 
     +-			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
     ++			** {empty}
     ++			BITMAP_OPT_FULL_DAG (0x1) REQUIRED: :::
     ++
       			This flag must always be present. It implies that the
       			bitmap index has been generated for a packfile or
     + 			multi-pack index (MIDX) with full closure (i.e. where
      @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     - 			requirement for the bitmap index format, also present in
       			JGit, that greatly reduces the complexity of the
       			implementation.
     --
     - 			- BITMAP_OPT_HASH_CACHE (0x4)
     + 
     +-			- BITMAP_OPT_HASH_CACHE (0x4)
     ++			** {empty}
     ++			BITMAP_OPT_HASH_CACHE (0x4): :::
     ++
       			If present, the end of the bitmap file contains
       			`N` 32-bit name-hash values, one per object in the
     -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     + 			pack/MIDX. The format and meaning of the name-hash is
       			described below.
       
     - 		4-byte entry count (network byte order)
     +-		4-byte entry count (network byte order)
      -
     ++		4-byte entry count (network byte order): ::
       			The total count of entries (bitmapped commits) in this bitmap index.
       
     - 		20-byte checksum
     +-		20-byte checksum
      -
     ++		20-byte checksum: ::
       			The SHA1 checksum of the pack/MIDX this bitmap index
       			belongs to.
       
      -	- 4 EWAH bitmaps that act as type indexes
     -+	* 4 EWAH bitmaps that act as type indexes
     - 
     - 		Type indexes are serialized after the hash cache in the shape
     - 		of four EWAH bitmaps stored consecutively (see Appendix A for
     -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     - 
     - 		There is a bitmap for each Git object type, stored in the following
     - 		order:
      -
     - 			- Commits
     - 			- Trees
     - 			- Blobs
     -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     - 		in a full set (all bits set), and the AND of all 4 bitmaps will
     - 		result in an empty bitmap (no bits set).
     - 
     +-		Type indexes are serialized after the hash cache in the shape
     +-		of four EWAH bitmaps stored consecutively (see Appendix A for
     +-		the serialization format of an EWAH bitmap).
     +-
     +-		There is a bitmap for each Git object type, stored in the following
     +-		order:
     +-
     +-			- Commits
     +-			- Trees
     +-			- Blobs
     +-			- Tags
     +-
     +-		In each bitmap, the `n`th bit is set to true if the `n`th object
     +-		in the packfile or multi-pack index is of that type.
     +-
     +-		The obvious consequence is that the OR of all 4 bitmaps will result
     +-		in a full set (all bits set), and the AND of all 4 bitmaps will
     +-		result in an empty bitmap (no bits set).
     +-
      -	- N entries with compressed bitmaps, one for each indexed commit
     -+	* N entries with compressed bitmaps, one for each indexed commit
     - 
     - 		Where `N` is the total amount of entries in this bitmap index.
     - 		Each entry contains the following:
     - 
     +-
     +-		Where `N` is the total amount of entries in this bitmap index.
     +-		Each entry contains the following:
     +-
      -		- 4-byte object position (network byte order)
     -+		** 4-byte object position (network byte order)
     ++	* 4 EWAH bitmaps that act as type indexes
     +++
     ++Type indexes are serialized after the hash cache in the shape
     ++of four EWAH bitmaps stored consecutively (see Appendix A for
     ++the serialization format of an EWAH bitmap).
     +++
     ++There is a bitmap for each Git object type, stored in the following
     ++order:
     +++
     ++	- Commits
     ++	- Trees
     ++	- Blobs
     ++	- Tags
     ++
     +++
     ++In each bitmap, the `n`th bit is set to true if the `n`th object
     ++in the packfile or multi-pack index is of that type.
     ++
     ++    The obvious consequence is that the OR of all 4 bitmaps will result
     ++    in a full set (all bits set), and the AND of all 4 bitmaps will
     ++    result in an empty bitmap (no bits set).
     ++
     ++	* N entries with compressed bitmaps, one for each indexed commit
     +++
     ++Where `N` is the total amount of entries in this bitmap index.
     ++Each entry contains the following:
     ++
     ++		** {empty}
     ++		4-byte object position (network byte order): ::
       			The position **in the index for the packfile or
       			multi-pack index** where the bitmap for this commit is
       			found.
       
      -		- 1-byte XOR-offset
     -+		** 1-byte XOR-offset
     ++		** {empty}
     ++		1-byte XOR-offset: ::
       			The xor offset used to compress this bitmap. For an entry
       			in position `x`, a XOR offset of `y` means that the actual
       			bitmap representing this commit is composed by XORing the
     -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     - 			with **previous** bitmaps, not bitmaps that will come afterwards
     - 			in the index.
     - 
     + 			bitmap for this entry with the bitmap in entry `x-y` (i.e.
     + 			the bitmap `y` entries before this one).
     +-
     +-			Note that this compression can be recursive. In order to
     +-			XOR this entry with a previous one, the previous entry needs
     +-			to be decompressed first, and so on.
     +-
     +-			The hard-limit for this offset is 160 (an entry can only be
     +-			xor'ed against one of the 160 entries preceding it). This
     +-			number is always positive, and hence entries are always xor'ed
     +-			with **previous** bitmaps, not bitmaps that will come afterwards
     +-			in the index.
     +-
      -		- 1-byte flags for this bitmap
     -+		** 1-byte flags for this bitmap
     +++
     ++NOTE: This compression can be recursive. In order to
     ++XOR this entry with a previous one, the previous entry needs
     ++to be decompressed first, and so on.
     +++
     ++The hard-limit for this offset is 160 (an entry can only be
     ++xor'ed against one of the 160 entries preceding it). This
     ++number is always positive, and hence entries are always xor'ed
     ++with **previous** bitmaps, not bitmaps that will come afterwards
     ++in the index.
     ++
     ++		** {empty}
     ++		1-byte flags for this bitmap: ::
       			At the moment the only available flag is `0x1`, which hints
       			that this bitmap can be re-used when rebuilding bitmap indexes
       			for the repository.
 3:  2171d31fb2b ! 3:  b971558e1cb bitmap-format.txt: add information for trailing checksum
     @@ Commit message
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## Documentation/technical/bitmap-format.txt ##
     -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     +@@ Documentation/technical/bitmap-format.txt: in the index.
       
       		** The compressed bitmap itself, see Appendix A.
       
     -+	* TRAILER:
     -+
     -+		Index checksum of the above contents. It is a 20-byte SHA1 checksum.
     ++	* {empty}
     ++	TRAILER: ::
     ++		Trailing checksum of the preceding contents.
      +
       == Appendix A: Serialization format for an EWAH bitmap
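
The header fields and the XOR-offset rule shown in the range-diff above can
be sketched in Python. This is only an illustration under stated
assumptions, not Git's implementation: the names HEADER, BitmapHeader,
parse_bitmap_header and resolve_xor are hypothetical, the checksum is
assumed to be a 20-byte SHA1, an XOR offset of 0 is assumed to mean "not
compressed", and bitmaps are modeled as plain Python integers rather than
EWAH-compressed data.

```python
import struct
from typing import List, NamedTuple

# Flag values as listed in bitmap-format.txt.
BITMAP_OPT_FULL_DAG = 0x1
BITMAP_OPT_HASH_CACHE = 0x4

# 4-byte signature, 2-byte version, 2-byte flags, 4-byte entry count,
# 20-byte pack/MIDX checksum. ">" selects network (big-endian) byte order.
HEADER = struct.Struct(">4sHHI20s")


class BitmapHeader(NamedTuple):
    version: int
    flags: int
    entry_count: int
    pack_checksum: bytes


def parse_bitmap_header(data: bytes) -> BitmapHeader:
    """Parse the fixed-size header (hypothetical helper, SHA1 layout assumed)."""
    if len(data) < HEADER.size:
        raise ValueError("truncated bitmap header")
    signature, version, flags, entry_count, checksum = HEADER.unpack_from(data)
    if signature != b"BITM":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError("only version 1 is supported")
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG must always be present")
    return BitmapHeader(version, flags, entry_count, checksum)


def resolve_xor(stored: List[int], xor_offsets: List[int], x: int) -> int:
    """Return the actual bitmap for entry x.

    stored[i] is the bitmap as stored for entry i (an int standing in for
    an EWAH bitmap); xor_offsets[i] is its 1-byte XOR offset. Offset 0 is
    assumed here to mean the entry is stored uncompressed.
    """
    y = xor_offsets[x]
    if y == 0:
        return stored[x]
    if y > 160 or y > x:
        raise ValueError("invalid XOR offset")
    # The referenced entry x-y may itself be XOR-compressed, so it has to
    # be resolved first; this is the recursion the NOTE describes.
    return stored[x] ^ resolve_xor(stored, xor_offsets, x - y)
```

Because offsets only point backwards, the recursion always ends at an
uncompressed entry. A real reader would likely cache already-resolved
entries instead of recursing on every lookup.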

Comments

Junio C Hamano June 10, 2022, 5:01 p.m. UTC | #1
"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> There are some issues in the bitmap-format html page. For example, some
> nested lists are shown as top-level lists (e.g. [1]- Here
> BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
> top-level list). There is also a need of adding info about trailing checksum
> in the docs.

Quite honestly, I am not sure if a piecemeal "let's make
<pre>...</pre> a bit prettier" is worth our time.  Especially
relative to the importance of adding missing information to the
documentation.

So, if this round (I haven't looked at the formatting changes at all
yet) turns out to be still not doing the HTML properly, I'd suggest
shuffling the patches around, add missing information so that readers
can get the corrections in text regardless of the rest of HTMLify
effort.  We'll see.

Thanks.

Taylor Blau June 15, 2022, 2:28 a.m. UTC | #2
On Fri, Jun 10, 2022 at 10:01:02AM -0700, Junio C Hamano wrote:
> "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > There are some issues in the bitmap-format html page. For example, some
> > nested lists are shown as top-level lists (e.g. [1]- Here
> > BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
> > top-level list). There is also a need of adding info about trailing checksum
> > in the docs.
>
> Quite honestly, I am not sure if a piecemeal "let's make
> <pre>...</pre> a bit prettier" is worth our time.  Especially
> relative to the importance of adding missing information to the
> documentation.
>
> So, if this round (I haven't looked at the formatting changes at all
> yet) turns out to be still not doing the HTML properly, I'd suggest
> shuffling the patches around, add missing information so that readers
> can get the corrections in text regardless of the rest of HTMLify
> effort.  We'll see.

This version of the series significantly improves the readability of the
generated HTML, and I only had a minor comment or two.

So I think that the improvement is worthwhile, though if others disagree
strongly, the third patch should get picked up regardless, since it
addresses a legitimate gap in our documentation.

Thanks,
Taylor

Junio C Hamano June 15, 2022, 10:41 p.m. UTC | #3
Taylor Blau <me@ttaylorr.com> writes:

> This version of the series significantly improves the readability of the
> generated HTML, and I only had a minor comment or two.

Yeah, I looked at the output and it is improved so much to the point
that the remaining paragraph or two that are still typeset in the fixed
font incorrectly start to look even irritating ;-)

I've tentatively queued it in my tree.  I doubt that the topic is
ultra-urgent so if the remaining mark-up issues can be fixed before
the topic hits 'next', that would be great.

Thanks, both.