From patchwork Thu Aug 13 22:49:00 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "brian m. carlson"
X-Patchwork-Id: 11713143
From: "brian m. carlson"
To:
Cc: =?utf-8?q?Martin_=C3=85gren?=
Subject: [PATCH 1/2] docs: document SHA-256 pack and indices
Date: Thu, 13 Aug 2020 22:49:00 +0000
Message-Id: <20200813224901.2652387-2-sandals@crustytoothpaste.net>
X-Mailer: git-send-email 2.28.0.220.ged08abb693
In-Reply-To: <20200813224901.2652387-1-sandals@crustytoothpaste.net>
References: <20200813224901.2652387-1-sandals@crustytoothpaste.net>
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

Now that we have SHA-256 support for packs and indices, let's document
that in SHA-256 repositories, we use SHA-256 instead of SHA-1 for object
names and checksums. Instead of duplicating this information throughout
the document, let's just document that in SHA-1 repositories, we use
SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256.

Signed-off-by: brian m. carlson
---
 Documentation/technical/pack-format.txt | 36 ++++++++++++++-----------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index d3a142c652..f4c8d94f73 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -1,6 +1,12 @@
 Git pack format
 ===============
 
+== Checksums and object IDs
+
+In a repository using the traditional SHA-1, pack checksums, index checksums,
+and object IDs (object names) mentioned below are all computed using SHA-1.
+Similarly, in SHA-256 repositories, these values are computed using SHA-256.
+
 == pack-*.pack files have the following format:
 
    - A header appears at the beginning and consists of the following:
@@ -26,7 +32,7 @@ Git pack format
 
      (deltified representation)
      n-byte type and length (3-bit type, (n-1)*7+4-bit length)
-     20-byte base object name if OBJ_REF_DELTA or a negative relative
+     base object name if OBJ_REF_DELTA or a negative relative
          offset from the delta object's position in the pack if this
          is an OBJ_OFS_DELTA object
      compressed delta data
@@ -34,7 +40,7 @@ Git pack format
      Observation: length of each object is encoded in a variable
      length format and is not constrained to 32-bit or anything.
 
-  - The trailer records 20-byte SHA-1 checksum of all of the above.
+  - The trailer records a pack checksum of all of the above.
 
 === Object types
 
@@ -58,8 +64,8 @@ ofs-delta and ref-delta, which is only valid in a pack file.
 
 Both ofs-delta and ref-delta store the "delta" to be applied to
 another object (called 'base object') to reconstruct the object. The
-difference between them is, ref-delta directly encodes 20-byte base
-object name. If the base object is in the same pack, ofs-delta encodes
+difference between them is, ref-delta directly encodes base object
+name. If the base object is in the same pack, ofs-delta encodes
 the offset of the base object in the pack instead.
 
 The base object could also be deltified if it's in the same pack.
@@ -143,14 +149,14 @@ This is the instruction reserved for future expansion.
         object is stored in the packfile as the offset from the
         beginning.
 
-        20-byte object name.
+        one object name of the appropriate size.
 
  - The file is concluded with a trailer:
 
-    A copy of the 20-byte SHA-1 checksum at the end of
-    corresponding packfile.
+    A copy of the pack checksum at the end of the corresponding
+    packfile.
 
-    20-byte SHA-1-checksum of all of the above.
+    Index checksum of all of the above.
 
 Pack Idx file:
 
@@ -198,7 +204,7 @@ Pack file entry: <+
         If it is not DELTA, then deflated bytes (the size above
                 is the size before compression).
         If it is REF_DELTA, then
-          20-byte base object name SHA-1 (the size above is the
+          base object name (the size above is the
                 size of the delta data that follows).
           delta data, deflated.
         If it is OFS_DELTA, then
@@ -227,9 +233,9 @@ Pack file entry: <+
 
   - A 256-entry fan-out table just like v1.
 
-  - A table of sorted 20-byte SHA-1 object names. These are
-    packed together without offset values to reduce the cache
-    footprint of the binary search for a specific object name.
+  - A table of sorted object names. These are packed together
+    without offset values to reduce the cache footprint of the
+    binary search for a specific object name.
 
   - A table of 4-byte CRC32 values of the packed object data.
     This is new in v2 so compressed data can be copied directly
@@ -248,10 +254,10 @@ Pack file entry: <+
 
   - The same trailer as a v1 pack file:
 
-    A copy of the 20-byte SHA-1 checksum at the end of
+    A copy of the pack checksum at the end of
     corresponding packfile.
 
-    20-byte SHA-1-checksum of all of the above.
+    Index checksum of all of the above.
 
 == multi-pack-index (MIDX) files have the following format:
 
@@ -329,4 +335,4 @@ CHUNK DATA:
 
 TRAILER:
 
-        20-byte SHA1-checksum of the above contents.
+        Index checksum of the above contents.

From patchwork Thu Aug 13 22:49:01 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "brian m. carlson"
X-Patchwork-Id: 11713145
From: "brian m. carlson"
To:
Cc: =?utf-8?q?Martin_=C3=85gren?=
Subject: [PATCH 2/2] docs: fix step in transition plan
Date: Thu, 13 Aug 2020 22:49:01 +0000
Message-Id: <20200813224901.2652387-3-sandals@crustytoothpaste.net>
X-Mailer: git-send-email 2.28.0.220.ged08abb693
In-Reply-To: <20200813224901.2652387-1-sandals@crustytoothpaste.net>
References: <20200813224901.2652387-1-sandals@crustytoothpaste.net>
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

One of the required steps for the objectFormat extension is to implement
the loose object index. However, without support for compatObjectFormat,
we don't even know if the loose object index is needed, so it makes sense
to move that step to the compatObjectFormat section. Do so.

Signed-off-by: brian m. carlson
---
 Documentation/technical/hash-function-transition.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index 5b2db3be1e..6fd20ebbc2 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -650,7 +650,6 @@ Some initial steps can be implemented independently of one another:
 The first user-visible change is the introduction of the objectFormat
 extension (without compatObjectFormat). This requires:
 
-- implementing the loose-object-idx
 - teaching fsck about this mode of operation
 - using the hash function API (vtable) when computing object names
 - signing objects and verifying signatures
@@ -658,6 +657,7 @@ extension (without compatObjectFormat). This requires:
   repository
 
 Next comes introduction of compatObjectFormat:
+- implementing the loose-object-idx
 - translating object names between object formats
 - translating object content between object formats
 - generating and verifying signatures in the compat format
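
Illustrative note on patch 1/2 (not part of the series): the rule that the
pack trailer checksum follows the repository's hash can be sketched in
Python. This is a minimal sketch under stated assumptions, not Git's
implementation. The names HASH_BY_FORMAT, make_empty_pack and
verify_pack_checksum, and the "sha1"/"sha256" keys, are hypothetical. The
12-byte header layout ('PACK' signature, 4-byte version, 4-byte object
count, both in network byte order) is assumed from the pack format
document rather than shown in this diff.

```python
import hashlib
import struct
from typing import Tuple

# Hypothetical mapping from a repository's object format to its hash.
HASH_BY_FORMAT = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}


def split_pack_trailer(pack: bytes, object_format: str) -> Tuple[bytes, bytes]:
    """Split pack bytes into (everything before the trailer, trailer)."""
    size = HASH_BY_FORMAT[object_format]().digest_size  # 20 or 32 bytes
    return pack[:-size], pack[-size:]


def verify_pack_checksum(pack: bytes, object_format: str) -> bool:
    """Check that the trailer is the hash of all preceding bytes."""
    body, trailer = split_pack_trailer(pack, object_format)
    return HASH_BY_FORMAT[object_format](body).digest() == trailer


def make_empty_pack(object_format: str) -> bytes:
    """Build a pack with zero objects: header plus checksum trailer."""
    # Assumed header: 'PACK', version 2, object count 0 (network byte order).
    header = b"PACK" + struct.pack(">II", 2, 0)
    return header + HASH_BY_FORMAT[object_format](header).digest()


if __name__ == "__main__":
    for fmt in ("sha1", "sha256"):
        pack = make_empty_pack(fmt)
        print(fmt, len(pack), verify_pack_checksum(pack, fmt))
```

The only thing that changes between the two object formats here is the
digest function and therefore the trailer length, which is why the patch
can drop the hard-coded "20-byte SHA-1" wording.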