[2/5] index-format.txt: document SHA-256 index format

Message ID e811455d55cdb222a85d880f3cf3d5e28a8d4c91.1597406877.git.martin.agren@gmail.com (mailing list archive)
State Superseded
Series more SHA-256 documentation

Commit Message

Martin Ågren Aug. 14, 2020, 12:21 p.m. UTC
Similar to a recent commit, document that in SHA-1 repositories, we use
SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all
other uses of "SHA-1" with something more neutral.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/index-format.txt | 27 +++++++++++++-----------
 1 file changed, 15 insertions(+), 12 deletions(-)

Comments

Derrick Stolee Aug. 14, 2020, 12:28 p.m. UTC | #1
On 8/14/2020 8:21 AM, Martin Ågren wrote:
> Similar to a recent commit, document that in SHA-1 repositories, we use
> SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all
> other uses of "SHA-1" with something more neutral.
> 
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  Documentation/technical/index-format.txt | 27 +++++++++++++-----------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
> index faa25c5c52..827ece2ed1 100644
> --- a/Documentation/technical/index-format.txt
> +++ b/Documentation/technical/index-format.txt
> @@ -3,8 +3,11 @@ Git index format
>  
>  == The Git index file has the following format
>  
> -  All binary numbers are in network byte order. Version 2 is described
> -  here unless stated otherwise.
> +  All binary numbers are in network byte order.
> +  In a repository using the traditional SHA-1, checksums and object IDs
> +  (object names) mentioned below are all computed using SHA-1.  Similarly,
> +  in SHA-256 repositories, these values are computed using SHA-256.
> +  Version 2 is described here unless stated otherwise.
>  
>     - A 12-byte header consisting of
>  
> @@ -32,7 +35,7 @@ Git index format
>  
>       Extension data
>  
> -   - 160-bit SHA-1 over the content of the index file before this
> +   - 160-bit hash checksum over the content of the index file before this
>       checksum.

If this hash is flexible, then "160-bit" is not correct anymore, right?

>  == Index entry
> @@ -80,7 +83,7 @@ Git index format
>    32-bit file size
>      This is the on-disk size from stat(2), truncated to 32-bit.
>  
> -  160-bit SHA-1 for the represented object
> +  160-bit object name for the represented object

Same here. The later instances of "160-bit" were dropped.

>    A 16-bit 'flags' field split into (high to low bits)
>  
> @@ -211,8 +214,8 @@ Git index format
>  
>    The extension consists of:
>  
> -  - 160-bit SHA-1 of the shared index file. The shared index file path
> -    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
> +  - Hash of the shared index file. The shared index file path
> +    is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
>      index does not require a shared index file.
>  
>    - An ewah-encoded delete bitmap, each bit represents an entry in the
> @@ -253,10 +256,10 @@ Git index format
>  
>    - 32-bit dir_flags (see struct dir_struct)
>  
> -  - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
> +  - Hash of $GIT_DIR/info/exclude. A null hash means the file
>      does not exist.
>  
> -  - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
> +  - Hash of core.excludesfile. A null hash means the file does
>      not exist.
>  
>    - NUL-terminated string of per-dir exclude file name. This usually
> @@ -285,13 +288,13 @@ The remaining data of each directory block is grouped by type:
>    - An ewah bitmap, the n-th bit records "check-only" bit of
>      read_directory_recursive() for the n-th directory.
>  
> -  - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
> +  - An ewah bitmap, the n-th bit indicates whether hash and stat data
>      is valid for the n-th directory and exists in the next data.
>  
>    - An array of stat data. The n-th data corresponds with the n-th
>      "one" bit in the previous ewah bitmap.
>  
> -  - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
> +  - An array of hashes. The n-th hash corresponds with the n-th "one" bit
>      in the previous ewah bitmap.
>  
>    - One NUL.
> @@ -330,12 +333,12 @@ The remaining data of each directory block is grouped by type:
>  
>    - 32-bit offset to the end of the index entries
>  
> -  - 160-bit SHA-1 over the extension types and their sizes (but not
> +  - Hash over the extension types and their sizes (but not
>  	their contents).  E.g. if we have "TREE" extension that is N-bytes
>  	long, "REUC" extension that is M-bytes long, followed by "EOIE",
>  	then the hash would be:
>  
> -	SHA-1("TREE" + <binary representation of N> +
> +	Hash("TREE" + <binary representation of N> +
>  		"REUC" + <binary representation of M>)
>  
>  == Index Entry Offset Table
> 

Thanks,
-Stolee

Martin Ågren Aug. 14, 2020, 2:05 p.m. UTC | #2
On Fri, 14 Aug 2020 at 14:28, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/14/2020 8:21 AM, Martin Ågren wrote:
> > -   - 160-bit SHA-1 over the content of the index file before this
> > +   - 160-bit hash checksum over the content of the index file before this
> >       checksum.
>
> If this hash is flexible, then "160-bit" is not correct anymore, right?
>
> > -  160-bit SHA-1 for the represented object
> > +  160-bit object name for the represented object
>
> Same here. The later instances of "160-bit" were dropped.

Thanks for pointing out these errors.


Martin

Patch

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index faa25c5c52..827ece2ed1 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -3,8 +3,11 @@  Git index format
 
 == The Git index file has the following format
 
-  All binary numbers are in network byte order. Version 2 is described
-  here unless stated otherwise.
+  All binary numbers are in network byte order.
+  In a repository using the traditional SHA-1, checksums and object IDs
+  (object names) mentioned below are all computed using SHA-1.  Similarly,
+  in SHA-256 repositories, these values are computed using SHA-256.
+  Version 2 is described here unless stated otherwise.
 
    - A 12-byte header consisting of
 
@@ -32,7 +35,7 @@  Git index format
 
      Extension data
 
-   - 160-bit SHA-1 over the content of the index file before this
+   - 160-bit hash checksum over the content of the index file before this
      checksum.
 
 == Index entry
@@ -80,7 +83,7 @@  Git index format
   32-bit file size
     This is the on-disk size from stat(2), truncated to 32-bit.
 
-  160-bit SHA-1 for the represented object
+  160-bit object name for the represented object
 
   A 16-bit 'flags' field split into (high to low bits)
 
@@ -211,8 +214,8 @@  Git index format
 
   The extension consists of:
 
-  - 160-bit SHA-1 of the shared index file. The shared index file path
-    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+  - Hash of the shared index file. The shared index file path
+    is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
     index does not require a shared index file.
 
   - An ewah-encoded delete bitmap, each bit represents an entry in the
@@ -253,10 +256,10 @@  Git index format
 
   - 32-bit dir_flags (see struct dir_struct)
 
-  - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
+  - Hash of $GIT_DIR/info/exclude. A null hash means the file
     does not exist.
 
-  - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
+  - Hash of core.excludesfile. A null hash means the file does
     not exist.
 
   - NUL-terminated string of per-dir exclude file name. This usually
@@ -285,13 +288,13 @@  The remaining data of each directory block is grouped by type:
   - An ewah bitmap, the n-th bit records "check-only" bit of
     read_directory_recursive() for the n-th directory.
 
-  - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
+  - An ewah bitmap, the n-th bit indicates whether hash and stat data
     is valid for the n-th directory and exists in the next data.
 
   - An array of stat data. The n-th data corresponds with the n-th
     "one" bit in the previous ewah bitmap.
 
-  - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
+  - An array of hashes. The n-th hash corresponds with the n-th "one" bit
     in the previous ewah bitmap.
 
   - One NUL.
@@ -330,12 +333,12 @@  The remaining data of each directory block is grouped by type:
 
   - 32-bit offset to the end of the index entries
 
-  - 160-bit SHA-1 over the extension types and their sizes (but not
+  - Hash over the extension types and their sizes (but not
 	their contents).  E.g. if we have "TREE" extension that is N-bytes
 	long, "REUC" extension that is M-bytes long, followed by "EOIE",
 	then the hash would be:
 
-	SHA-1("TREE" + <binary representation of N> +
+	Hash("TREE" + <binary representation of N> +
 		"REUC" + <binary representation of M>)
 
 == Index Entry Offset Table
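
Illustration (not part of the patch): the last hunk describes the EOIE hash as a hash over each preceding extension's type and size, but not its contents. A minimal Python sketch of that computation follows. It assumes each size is serialized as a 32-bit integer in network byte order, in line with the "network byte order" statement at the top of the document. The function name `eoie_hash`, its parameters, and the example sizes are hypothetical and chosen only for this example. The digest width follows the repository's hash algorithm, which is the point raised in review about the leftover "160-bit" wording.

```python
import hashlib
import struct


def eoie_hash(extensions, algo="sha1"):
    """Hash extension signatures and sizes, but not extension contents.

    `extensions` is a sequence of (signature, size) pairs in file order,
    excluding the EOIE extension itself. `algo` is "sha1" or "sha256",
    matching the repository's hash algorithm.
    """
    h = hashlib.new(algo)
    for signature, size in extensions:
        if len(signature) != 4:
            raise ValueError("extension signature must be 4 bytes")
        h.update(signature)
        # Assumption: sizes are 32-bit unsigned, big-endian (network order).
        h.update(struct.pack(">I", size))
    return h.digest()


# Hypothetical sizes: a "TREE" extension of 120 bytes and "REUC" of 48 bytes.
extensions = [(b"TREE", 120), (b"REUC", 48)]
sha1_digest = eoie_hash(extensions, "sha1")      # 20 bytes (160 bits)
sha256_digest = eoie_hash(extensions, "sha256")  # 32 bytes (256 bits)
```

The same input yields a 20-byte value in a SHA-1 repository and a 32-byte value in a SHA-256 repository, so a reader of the index has to derive field widths from the repository's hash algorithm rather than assume 160 bits.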