[v5,0/5] Fix use of uninitialized hash algorithms

Message ID	20240520231434.1816979-1-gitster@pobox.com (mailing list archive)

From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v5 0/5] Fix use of uninitialized hash algorithms
Date: Mon, 20 May 2024 16:14:29 -0700
Message-ID: <20240520231434.1816979-1-gitster@pobox.com>
In-Reply-To: <cover.1715582857.git.ps@pks.im>
References: <cover.1715582857.git.ps@pks.im>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Message

Junio C Hamano May 20, 2024, 11:14 p.m. UTC

A change recently merged to 'next' stops us from defaulting to
SHA-1 unless other code (such as the logic early in the start-up
sequence that checks which hash the repository we are working in
uses) explicitly sets it, leading to a (deliberate) crash of "git"
in code paths we forgot to cover.

It turns out we have a few.  Notable ones are all operations that
are designed to work outside a repository.  We should go over all
such code paths and give them a reasonable default when there is one
available (e.g. for historical reasons, patch-id is documented to
work with SHA-1 hashes, so arguably it should do so everywhere, at
least when invoked with the "--stable" option: not just in SHA-1
repositories, but also in SHA-256 repositories and outside any
repository).  In the meantime, if an end-user hits such a "bug"
before we can fix it, it would be nice to give them an escape hatch
to restore the historical behaviour of falling back to SHA-1.

These patches are designed to apply on a merge of c8aed5e8
(repository: stop setting SHA1 as the default object hash,
2024-05-07) into 3e4a232f (The third batch, 2024-05-13), which has
been the same base throughout the past iterations.

In this fifth iteration:

 - The first step no longer falls back to GIT_DEFAULT_HASH; the
   escape hatch is a dedicated GIT_TEST_DEFAULT_HASH_ALGO
   environment variable, but hopefully we do not have to advertise
   it all that often.

 - The second step has been simplified somewhat to use the "nongit"
   helper when we only need to run a single "git" command in t1517.
   The way the expected output files were prepared in the previous
   versions did not correctly force use of SHA-1 algorithm, which
   has been corrected.  The third step and fourth step for t1517
   continue to be "flip expect_failure to expect_success", but you
   can see context differences in the range-diff.

 - The fourth step also has a fix for t1007 where the previous
   iterations did not correctly force use of SHA-1 to prepare the
   expected output.

Otherwise this round should be ready, modulo possible typos.
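
To make the escape hatch described above concrete, here is a minimal,
self-contained sketch of the idea in C.  It is a simplified model and
not the code added to repository.c: the enum and the helpers
algo_by_name(), resolve_hash_algo(), hash_algo_at_startup() and
hex_length() are hypothetical names invented for illustration.  Only
the GIT_TEST_DEFAULT_HASH_ALGO variable and its "sha1" value come from
this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical identifiers; git keeps its real tables elsewhere. */
enum hash_algo_id { HASH_UNSET = 0, HASH_SHA1, HASH_SHA256 };

/* Map a name to an identifier; HASH_UNSET if missing or unrecognized. */
static enum hash_algo_id algo_by_name(const char *name)
{
	if (!name)
		return HASH_UNSET;
	if (!strcmp(name, "sha1"))
		return HASH_SHA1;
	if (!strcmp(name, "sha256"))
		return HASH_SHA256;
	return HASH_UNSET;
}

/*
 * An algorithm chosen by the repository always wins.  The escape
 * hatch value is consulted only when nothing was chosen; without it
 * the result stays HASH_UNSET.
 */
static enum hash_algo_id resolve_hash_algo(enum hash_algo_id from_repo,
					   const char *escape_hatch)
{
	if (from_repo != HASH_UNSET)
		return from_repo;
	return algo_by_name(escape_hatch);
}

/* How a caller might wire the environment variable in (assumption). */
static enum hash_algo_id hash_algo_at_startup(enum hash_algo_id from_repo)
{
	return resolve_hash_algo(from_repo,
				 getenv("GIT_TEST_DEFAULT_HASH_ALGO"));
}

/* Models the deliberate crash: using an unset algorithm is a bug. */
static size_t hex_length(enum hash_algo_id id)
{
	assert(id != HASH_UNSET);
	return id == HASH_SHA1 ? 40 : 64;
}
```

Passing the escape hatch value as a parameter keeps the decision logic
separate from the environment lookup, which is a choice made for this
sketch rather than something the series prescribes.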


Junio C Hamano (3):
  setup: add an escape hatch for "no more default hash algorithm" change
  t1517: test commands that are designed to be run outside repository
  apply: fix uninitialized hash function

Patrick Steinhardt (2):
  builtin/patch-id: fix uninitialized hash function
  builtin/hash-object: fix uninitialized hash function

 builtin/apply.c         |  4 +++
 builtin/hash-object.c   |  3 +++
 builtin/patch-id.c      | 13 +++++++++
 repository.c            | 44 ++++++++++++++++++++++++++++++
 t/t1007-hash-object.sh  |  6 +++++
 t/t1517-outside-repo.sh | 59 +++++++++++++++++++++++++++++++++++++++++
 t/t4204-patch-id.sh     | 34 ++++++++++++++++++++++++
 7 files changed, 163 insertions(+)
 create mode 100755 t/t1517-outside-repo.sh

Comments

Patrick Steinhardt May 21, 2024, 7:58 a.m. UTC | #1

On Mon, May 20, 2024 at 04:14:29PM -0700, Junio C Hamano wrote:
> A change recently merged to 'next' stops us from defaulting to
> SHA-1 unless other code (such as the logic early in the start-up
> sequence that checks which hash the repository we are working in
> uses) explicitly sets it, leading to a (deliberate) crash of "git"
> in code paths we forgot to cover.
> 
> It turns out we have a few.  Notable ones are all operations that
> are designed to work outside a repository.  We should go over all
> such code paths and give them a reasonable default when there is one
> available (e.g. for historical reasons, patch-id is documented to
> work with SHA-1 hashes, so arguably it should do so everywhere, at
> least when invoked with the "--stable" option: not just in SHA-1
> repositories, but also in SHA-256 repositories and outside any
> repository).  In the meantime, if an end-user hits such a "bug"
> before we can fix it, it would be nice to give them an escape hatch
> to restore the historical behaviour of falling back to SHA-1.
> 
> These patches are designed to apply on a merge of c8aed5e8
> (repository: stop setting SHA1 as the default object hash,
> 2024-05-07) into 3e4a232f (The third batch, 2024-05-13), which has
> been the same base throughout the past iterations.
> 
> In this fifth iteration:
> 
>  - The first step no longer falls back to GIT_DEFAULT_HASH; the
>    escape hatch is a dedicated GIT_TEST_DEFAULT_HASH_ALGO
>    environment variable, but hopefully we do not have to advertise
>    it all that often.
> 
>  - The second step has been simplified somewhat to use the "nongit"
>    helper when we only need to run a single "git" command in t1517.
>    The way the expected output files were prepared in the previous
>    versions did not correctly force use of SHA-1 algorithm, which
>    has been corrected.  The third step and fourth step for t1517
>    continue to be "flip expect_failure to expect_success", but you
>    can see context differences in the range-diff.
> 
>  - The fourth step also has a fix for t1007 where the previous
>    iterations did not correctly force use of SHA-1 to prepare the
>    expected output.
> 
> Otherwise this round should be ready, modulo possible typos.

I have two smallish comments, but neither of them really has to be
addressed. Overall I very much agree with this iteration and think that
it's the right way to go.

Thanks!

Patrick

Junio C Hamano May 21, 2024, 6:07 p.m. UTC | #2

Patrick Steinhardt <ps@pks.im> writes:

> I have two smallish comments, but neither of them really has to be
> addressed. Overall I very much agree with this iteration and think that
> it's the right way to go.

I've done the following locally, but it probably does not need to
be resent to the list before merging down to 'next'.


1:  b23a93597c ! 1:  d3b2ff75fd setup: add an escape hatch for "no more default hash algorithm" change
    @@ Commit message
         default object hash, 2024-05-07), to keep end-user systems still
         broken when we have gap in our test coverage but yet give them an
         escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGO environment
    -    variable to "sha1" in order to revert to the previous behaviour.
    +    variable to "sha1" in order to revert to the previous behaviour, in
    +    case we haven't done a thorough job in fixing the fallout from
    +    c8aed5e8.  After we build confidence, we should remove the escape
    +    hatch support, but we are not there yet after only fixing three
    +    commands (hash-object, apply, and patch-id) in this series.
     
         Due to the way the end-user facing GIT_DEFAULT_HASH environment
         variable is used in our test suite, we unfortunately cannot reuse it
2:  6a20370944 = 2:  abece6e970 t1517: test commands that are designed to be run outside repository
3:  fa258c5d47 = 3:  4a1c95931f builtin/patch-id: fix uninitialized hash function
4:  164d340cbe = 4:  8d058b8024 builtin/hash-object: fix uninitialized hash function
5:  bd0246eb51 ! 5:  4674ab682d apply: fix uninitialized hash function
    @@ Commit message
         Make sure we explicitly fall back to SHA-1 algorithm for backward
         compatibility.
     
    +    It is of dubious value to make this configurable to other hash
    +    algorithms, as the code does not use the_hash_algo for hashing
    +    purposes when working outside a repository (which is how
    +    the_hash_algo is left to NULL)---it is only used to learn the max
    +    length of the hash when parsing the object names on the "index"
    +    line, but failing to parse the "index" line is not a hard failure,
    +    and the program does not support operations like applying binary
    +    patches and --3way fallback that requires object access outside a
    +    repository.
    +
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## builtin/apply.c ##
    @@ builtin/apply.c: int cmd_apply(int argc, const char **argv, const char *prefix)
      	if (init_apply_state(&state, the_repository, prefix))
      		exit(128);
      
    ++	/*
    ++	 * We could redo the "apply.c" machinery to make this
    ++	 * arbitrary fallback unnecessary, but it is dubious that it
    ++	 * is worth the effort.
    ++	 * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/
    ++	 */
     +	if (!the_hash_algo)
     +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
     +
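
The amended commit message above argues that, outside a repository,
"git apply" needs the hash algorithm only to learn the maximum length
of an object name on the "index" line.  A rough, self-contained sketch
of that idea follows.  It is not the code in apply.c: struct algo,
algo_or_sha1(), hex_run() and parse_index_line() are hypothetical
stand-ins, the parsing is deliberately simplified, and only lowercase
hex digits are accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hash algorithm description. */
struct algo {
	const char *name;
	size_t hexsz;	/* length of a full object name in hex digits */
};

static const struct algo sha1_algo = { "sha1", 40 };
static const struct algo sha256_algo = { "sha256", 64 };

/* Fall back to SHA-1 when nothing (e.g. repository setup) chose one. */
static const struct algo *algo_or_sha1(const struct algo *chosen)
{
	return chosen ? chosen : &sha1_algo;
}

/* Count the leading lowercase hex digits in s. */
static size_t hex_run(const char *s)
{
	return strspn(s, "0123456789abcdef");
}

/*
 * Parse "index <old>..<new>[ <mode>]".  Returns 0 on success, or -1
 * if the line is malformed or an object name is longer than the
 * algorithm allows.  The algorithm is consulted only for its hexsz.
 */
static int parse_index_line(const char *line, const struct algo *algo,
			    size_t *old_len, size_t *new_len)
{
	const char *p = line;
	size_t n;

	assert(algo);	/* callers resolve the fallback first */
	if (strncmp(p, "index ", 6))
		return -1;
	p += 6;

	n = hex_run(p);
	if (!n || n > algo->hexsz || strncmp(p + n, "..", 2))
		return -1;
	*old_len = n;
	p += n + 2;

	n = hex_run(p);
	if (!n || n > algo->hexsz)
		return -1;
	if (p[n] && p[n] != ' ' && p[n] != '\n')
		return -1;
	*new_len = n;
	return 0;
}
```

In this model a caller would write
parse_index_line(line, algo_or_sha1(NULL), &old_len, &new_len) when no
repository is available, and could treat a -1 result as "ignore the
index line" rather than as a fatal error, in line with the remark that
failing to parse it is not a hard failure.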

Patrick Steinhardt May 22, 2024, 4:51 a.m. UTC | #3

On Tue, May 21, 2024 at 11:07:12AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > I have two smallish comments, but neither of them really has to be
> > addressed. Overall I very much agree with this iteration and think that
> > it's the right way to go.
> 
> I've done the following locally, but it probably does not need to
> be resent to the list before merging down to 'next'.

Thanks, the diff looks good to me.

Patrick