From patchwork Mon Dec 21 07:54:58 2020
From: Martin Ågren
To: Ross Light
Cc: git@vger.kernel.org
Subject: [PATCH 1/2] pack-format.txt: define "varint" format
Date: Mon, 21 Dec 2020 08:54:58 +0100
Message-Id: <42c6206b102cd97290fd9ad207bb39b20660064c.1608537234.git.martin.agren@gmail.com>

We define our varint format pretty much on the fly as we describe a
pack file entry.
In preparation for referring to it in more places in this document,
define "varint" and refer to it.

Signed-off-by: Martin Ågren
---
 Documentation/technical/pack-format.txt | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index f96b2e605f..42198de74c 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -55,6 +55,15 @@ Valid object types are:
 
 Type 5 is reserved for future expansion. Type 0 is invalid.
 
+=== Variable-length integer encoding
+
+This document uses "varint" encoding of non-negative integers: From
+each byte, the seven least significant bits are used to form the
+resulting integer. As long as the most significant bit is 1, this
+process continues; the byte with MSB 0 provides the last seven bits.
+The seven-bit chunks are concatenated. Later values are more
+significant.
+
 === Deltified representation
 
 Conceptually there are only four object types: commit, tree, tag and
@@ -196,10 +205,10 @@ Pack file entry: <+
         1-byte size extension bit (MSB)
                type (next 3 bit)
                size0 (lower 4-bit)
-        n-byte sizeN (as long as MSB is set, each 7-bit)
-                size0..sizeN form 4+7+7+..+7 bit integer, size0
-                is the least significant part, and sizeN is the
-                most significant part.
+        n-byte size1 (varint encoding; present if MSB is set)
+                If the MSB is set, the size is size0 + 16*size1, otherwise
+                it is size0. (Equivalently, the entire packed object header
+                is a varint encoding of (size/16)*128 + type*16 + size%16.)
      packed object data:
         If it is not DELTA, then deflated bytes (the size above
                 is the size before compression).
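To make the two readings of the packed object header concrete, here is a minimal C sketch. The names `varint_decode()`, `pack_header_decode()` and `pack_header_decode_whole()` are hypothetical and are not Git's API. The code assumes sizes fit in a `uint64_t` and does only basic truncation checks. The first decoder follows the "size0 + 16*size1" wording, the second the "equivalently" wording, so both should agree on any well-formed header. It covers only the varint defined in this patch; the OFS_DELTA base offset described elsewhere in pack-format.txt uses a different encoding.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not Git's API): decode one "varint" as defined
 * above. Each byte contributes its seven low bits, later chunks are more
 * significant, and the first byte with MSB 0 ends the number.
 * Returns the number of bytes consumed, or 0 if the input ends while the
 * MSB is still set or runs past ten bytes. (A sketch: it does not detect
 * every value too large for 64 bits.)
 */
static size_t varint_decode(const unsigned char *buf, size_t len,
                            uint64_t *out)
{
    uint64_t value = 0;
    unsigned shift = 0;

    for (size_t i = 0; i < len; i++) {
        if (shift > 63)
            return 0; /* too many continuation bytes */
        value |= (uint64_t)(buf[i] & 0x7f) << shift;
        shift += 7;
        if (!(buf[i] & 0x80)) {
            *out = value;
            return i + 1;
        }
    }
    return 0; /* ran out of input while the MSB was still set */
}

/*
 * Hypothetical helper: the first byte holds the MSB, a 3-bit type and the
 * 4-bit size0; if the MSB is set, a varint size1 follows and
 * size = size0 + 16 * size1.
 */
static size_t pack_header_decode(const unsigned char *buf, size_t len,
                                 unsigned *type, uint64_t *size)
{
    uint64_t size1;
    size_t used = 0;

    if (len == 0)
        return 0;
    *type = (buf[0] >> 4) & 0x07;
    *size = buf[0] & 0x0f;
    if (buf[0] & 0x80) {
        used = varint_decode(buf + 1, len - 1, &size1);
        if (used == 0)
            return 0;
        *size += size1 << 4; /* size0 + 16 * size1 */
    }
    return 1 + used;
}

/*
 * Hypothetical helper for the "equivalently" reading: decode the whole
 * header as one varint v = (size/16)*128 + type*16 + size%16, then split.
 */
static size_t pack_header_decode_whole(const unsigned char *buf, size_t len,
                                       unsigned *type, uint64_t *size)
{
    uint64_t v;
    size_t used = varint_decode(buf, len, &v);

    if (used == 0)
        return 0;
    *type = (unsigned)((v >> 4) & 0x07);  /* bits 4..6 */
    *size = ((v >> 7) << 4) | (v & 0x0f); /* (v/128)*16 + v%16 */
    return used;
}
```

For example, the two bytes `0xB8 0x3E` describe a blob (type 3) of 1000 bytes: the first byte carries the MSB, type `011` and size0 = 8, and the varint `0x3E` gives size1 = 62, so the size is 8 + 16*62 = 1000. Read as a single varint, the same bytes give 7992 = (1000/16)*128 + 3*16 + 1000%16.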
From patchwork Mon Dec 21 07:54:59 2020
From: Martin Ågren
To: Ross Light
Cc: git@vger.kernel.org
Subject: [PATCH 2/2] pack-format.txt: document lengths at start of delta data
Date: Mon, 21 Dec 2020 08:54:59 +0100

We document the delta data as a set of instructions, but forget to
document the two lengths that precede those instructions: the length
of the base object and the length of the object to be reconstructed.
Fix this omission.

Reported-by: Ross Light
Signed-off-by: Martin Ågren
---
 Documentation/technical/pack-format.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 42198de74c..05889a2e43 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -82,7 +82,10 @@ Ref-delta can also refer to an object outside the pack (i.e. the
 so-called "thin pack"). When stored on disk however, the pack should
 be self contained to avoid cyclic dependency.
 
-The delta data is a sequence of instructions to reconstruct an object
+The delta data starts with the length of the base object and the
+length of the object to be reconstructed. These lengths are
+encoded as varints. The remainder of
+the delta data is a sequence of instructions to reconstruct the object
 from the base object. If the base object is deltified, it must be
 converted to canonical form first. Each instruction appends more and
 more data to the target object until it's complete. There are two
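The two lengths can be read with the same varint logic before any instruction is processed. Below is a minimal C sketch, assuming the delta data has already been inflated into memory. `delta_read_varint()` and `delta_parse_sizes()` are hypothetical names, not Git's API; in Git's own source, `get_delta_hdr_size()` in `delta.h` plays this role.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one varint (seven bits per byte, least
 * significant chunk first, MSB set on every byte but the last).
 * Returns bytes consumed, or 0 on truncated or over-long input.
 */
static size_t delta_read_varint(const unsigned char *buf, size_t len,
                                uint64_t *out)
{
    uint64_t value = 0;
    unsigned shift = 0;

    for (size_t i = 0; i < len && shift <= 63; i++) {
        value |= (uint64_t)(buf[i] & 0x7f) << shift;
        shift += 7;
        if (!(buf[i] & 0x80)) {
            *out = value;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Hypothetical helper: read the base-object length and the
 * reconstructed-object length that start the (inflated) delta data.
 * Returns the offset of the first delta instruction, or 0 on error.
 */
static size_t delta_parse_sizes(const unsigned char *delta, size_t len,
                                uint64_t *base_size, uint64_t *result_size)
{
    size_t n1 = delta_read_varint(delta, len, base_size);
    if (n1 == 0)
        return 0;

    size_t n2 = delta_read_varint(delta + n1, len - n1, result_size);
    if (n2 == 0)
        return 0;

    return n1 + n2; /* the instructions begin at this offset */
}
```

For example, delta data starting with `0xE8 0x07 0x80 0x01` declares a 1000-byte base (0x68 + 7*128) and a 128-byte result (0 + 1*128), and its instructions begin at offset 4. A consumer would typically check the first length against the base object it actually has and size its output buffer from the second before applying the instructions.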