mbox series

[v4,0/5] Fix use of uninitialized hash algorithms

Message ID 20240514011437.3779151-1-gitster@pobox.com (mailing list archive)
Headers show
Series Fix use of uninitialized hash algorithms | expand

Message

Junio C Hamano May 14, 2024, 1:14 a.m. UTC
A change recently merged to 'next' stops us from defaulting to using
SHA-1 unless other code (like a logic early in the start-up sequence
to see what hash is being used in the repository we are working in)
explicitly sets it, leading to a (deliberate) crash of "git" when we
forgot to cover certain code paths.

It turns out we have a few.  Notable ones are all operations that
are designed to work outside a repository.  We should go over all
such code paths and give them a reasonable default when there is one
available (e.g. for historical reasons, patch-id is documented to
work with SHA-1 hashes, so arguably it should do so everywhere, not
just in SHA-1 repositories, but outside any repository), and when
there is none, we can use the existing GIT_DEFAULT_HASH environment
variable to let the user control the choice.

In this fourth iteration, I used the existing GIT_DEFAULT_HASH
environment variable that has been used to allow users to opt into
an early experiment of SHA-256 support.  I saw breakages in SHA-256
CI jobs for earlier rounds of this series, but hopefully this is now
good to go.  Knock wood...

Junio C Hamano (3):
  setup: add an escape hatch for "no more default hash algorithm" change
  t1517: test commands that are designed to be run outside repository
  apply: fix uninitialized hash function

Patrick Steinhardt (2):
  builtin/patch-id: fix uninitialized hash function
  builtin/hash-object: fix uninitialized hash function

 builtin/apply.c         |  4 +++
 builtin/hash-object.c   |  3 ++
 builtin/patch-id.c      | 13 +++++++++
 environment.h           |  1 +
 repository.c            | 40 +++++++++++++++++++++++++++
 setup.c                 |  2 --
 t/t1007-hash-object.sh  |  6 ++++
 t/t1517-outside-repo.sh | 61 +++++++++++++++++++++++++++++++++++++++++
 t/t4204-patch-id.sh     | 34 +++++++++++++++++++++++
 9 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100755 t/t1517-outside-repo.sh

Range-diff against v3:
1:  7adbf0cc5f ! 1:  3038f73e1c setup: add an escape hatch for "no more default hash algorithm" change
    @@ Commit message
         Partially revert c8aed5e8 (repository: stop setting SHA1 as the
         default object hash, 2024-05-07), to keep end-user systems still
         broken when we have gap in our test coverage but yet give them an
    -    escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGORITHM environment
    -    variable to "sha1" in order to revert to the previous behaviour.
    +    escape hatch to set the GIT_DEFAULT_HASH environment variable to
    +    "sha1" in order to revert to the previous behaviour.  This variable
    +    has been in use for using SHA-256 hash by default, and it should be
    +    a better fit than inventing a new and test-only knob.
     
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
    + ## environment.h ##
    +@@ environment.h: const char *getenv_safe(struct strvec *argv, const char *name);
    + #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
    + #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
    + #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
    ++#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
    + 
    + /*
    +  * Environment variable used in handshaking the wire protocol.
    +
      ## repository.c ##
    +@@
    + #include "git-compat-util.h"
    + #include "abspath.h"
    ++#include "environment.h"
    + #include "repository.h"
    + #include "object-store-ll.h"
    + #include "config.h"
     @@
      static struct repository the_repo;
      struct repository *the_repository = &the_repo;
    @@ repository.c
     +	const char *hash_name;
     +	int algo;
     +
    -+	hash_name = getenv("GIT_DEFAULT_HASH_ALGORITHM");
    ++	hash_name = getenv(GIT_DEFAULT_HASH_ENVIRONMENT);
     +	if (!hash_name)
     +		return;
     +	algo = hash_algo_by_name(hash_name);
    ++
    ++	/*
    ++	 * NEEDSWORK: after all, falling back to SHA-1 by assigning
    ++	 * GIT_HASH_SHA1 to algo here, instead of returning, may give
    ++	 * us better behaviour.
    ++	 */
     +	if (algo == GIT_HASH_UNKNOWN)
     +		return;
    ++
     +	repo_set_hash_algo(repo, algo);
     +}
     +
    @@ repository.c: void initialize_repository(struct repository *repo)
     +	 * giving them an appropriate default individually; any
     +	 * unconverted codepath that tries to access the_hash_algo
     +	 * will thus fail.  The end-users however have an escape hatch
    -+	 * to set GIT_DEFAULT_HASH_ALGORITHM environment variable to
    -+	 * "sha1" get back the old behaviour of defaulting to SHA-1.
    ++	 * to set GIT_DEFAULT_HASH environment variable to "sha1" get
    ++	 * back the old behaviour of defaulting to SHA-1.
     +	 */
     +	if (repo == the_repository)
     +		set_default_hash_algo(repo);
      }
      
      static void expand_base_dir(char **out, const char *in,
    +
    + ## setup.c ##
    +@@ setup.c: int daemonize(void)
    + #define TEST_FILEMODE 1
    + #endif
    + 
    +-#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
    +-
    + static void copy_templates_1(struct strbuf *path, struct strbuf *template_path,
    + 			     DIR *dir)
    + {
2:  c40b2bc7b5 ! 2:  e56ae68961 t1517: test commands that are designed to be run outside repository
    @@ t/t1517-outside-repo.sh (new)
     +TEST_PASSES_SANITIZE_LEAK=true
     +. ./test-lib.sh
     +
    -+# Do not use the end-user workaround for this test.
    -+# GIT_DEFAULT_HASH_ALGORITHM=sha1; export GIT_DEFAULT_HASH_ALGORITHM
    -+
     +test_expect_success 'set up a non-repo directory and test file' '
     +	GIT_CEILING_DIRECTORIES=$(pwd) &&
     +	export GIT_CEILING_DIRECTORIES &&
3:  48d83fe1fd ! 3:  e1ae0e95fc builtin/patch-id: fix uninitialized hash function
    @@ builtin/patch-id.c: int cmd_patch_id(int argc, const char **argv, const char *pr
     +	 * NEEDSWORK: This hack should be removed in favor of converting
     +	 * the code that computes patch IDs to always use SHA1.
     +	 */
    -+	if (!startup_info->have_repository)
    ++	if (!the_hash_algo)
     +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
     +
      	generate_id_list(opts ? opts > 1 : config.stable,
4:  a5bf0dfb0a ! 4:  1c8ce0d024 builtin/hash-object: fix uninitialized hash function
    @@ Commit message
         SHA1 as the default object hash, 2024-05-07), this will make us hit an
         uninitialized hash function, which subsequently leads to a segfault.
     
    -    Fix this by falling back to SHA-1 explicitly when running outside of a
    -    Git repository. Users can use GIT_DEFAULT_HASH_ALGORITHM environment to
    -    specify what hash algorithm they want, so arguably this code should not
    -    be needed.
    +    Fix this by falling back to SHA-1 explicitly when running outside of
    +    a Git repository. Users can use GIT_DEFAULT_HASH environment to
    +    specify what hash algorithm they want, so arguably this code should
    +    not be needed.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
5:  2b9e486c3f = 5:  5b0ec43757 apply: fix uninitialized hash function