
[0/5] Introduce configs for default repo format

Message ID cover.1723708417.git.ps@pks.im (mailing list archive)

Message

Patrick Steinhardt Aug. 15, 2024, 7:59 a.m. UTC
Hi,

to set up the default object and ref storage formats, users have to set
up some environment variables. This is somewhat unwieldy and not really
in line with how a user typically expects to configure Git, namely by
using the config system. It makes it harder than necessary to globally
default to the different formats and requires the user to edit files
like `.profile` to persist that setting. Needless to say, this is
a bit of an awkward user experience.

This patch series thus introduces two new configs to set the default
object hash and ref storage format for newly created repositories. This
way, folks can simply use the global- or system-level config to adapt
to their needs. This also has the advantage of giving them the ability
to adapt the default formats via guarded includes, such that e.g. repos
in some filesystem hierarchy use format A, whereas others use format B.

This comes from a discussion with Sebastian (Cc'd) at the Git User Group
in Berlin yesterday.

Thanks!

Patrick

Patrick Steinhardt (5):
  t0001: exercise initialization with ref formats more thoroughly
  t0001: delete repositories when object format tests finish
  setup: merge configuration of repository formats
  setup: make object format configurable via config
  setup: make ref storage format configurable via config

 Documentation/config/init.txt |  10 +++
 setup.c                       | 101 ++++++++++++++++-------
 t/t0001-init.sh               | 145 +++++++++++++++++++++++++++++++---
 3 files changed, 216 insertions(+), 40 deletions(-)

Comments

shejialuo Aug. 15, 2024, 3:24 p.m. UTC | #1
On Thu, Aug 15, 2024 at 09:59:54AM +0200, Patrick Steinhardt wrote:
> Hi,
> 
> to set up the default object and ref storage formats, users have to set
> up some environment variables. This is somewhat unwieldy and not really
> in line with how a user typically expects to configure Git, namely by
> using the config system. It makes it harder than necessary to globally
> default to the different formats and requires the user to munge with
> files like `.profile` to persist that setting. Needless to say, this is
> a bit of an awkward user experience.
> 
> This patch series thus introduces two new configs to set the default
> object hash and ref storage format for newly created repositories. Like
> this, folks can simply use the global- or system-level config to adapt
> to their needs. This also has the advantage of giving them the ability
> to adapt the default formats via guarded includes, such that e.g. repos
> in some filesystem hierarchy use format A, whereas others use format B.
> 
> This comes from a discussion with Sebastian (Cc'd) at the Git User Group
> in Berlin yesterday.
> 
> Thanks!
> 
> Patrick
> 

It's a good idea to let users set up the default object and ref storage
formats via "git-config(1)". I have read all the patches and, from my
perspective, there are no major problems.

But I want to ask a question about the following code snippet:

  if (repo_fmt->version >= 0 && hash != GIT_HASH_UNKNOWN && hash != repo_fmt->hash_algo)
          die(_("..."));
  else if (hash != GIT_HASH_UNKNOWN)
          repo_fmt->hash_algo = hash;
  else if (env) {
          ...
          repo_fmt->hash_algo = env_algo;
  } else if (cfg.hash != GIT_HASH_UNKNOWN) {
          repo_fmt->hash_algo = cfg.hash;
  }

Clearly, "hash" has the highest precedence: when a user runs
"git init --object-format=sha256", that explicit choice must win.
However, in the current design, the environment variable takes
precedence over the value set via "git-config(1)".
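
To make the ordering concrete, here is a minimal standalone sketch of the
precedence chain described above. It is not the code from the patch: the
names `hash_algo`, `parse_hash` and `resolve_hash` are hypothetical, the
check against an already existing repository is left out, unrecognized
names are simply treated as unset, and the SHA-1 fallback is an assumption
about the built-in default.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Git's hash algorithm identifiers. */
enum hash_algo { HASH_UNKNOWN = 0, HASH_SHA1, HASH_SHA256 };

/* Map a format name to an identifier; NULL or unrecognized names count as unset. */
enum hash_algo parse_hash(const char *name)
{
    if (!name)
        return HASH_UNKNOWN;
    if (!strcmp(name, "sha1"))
        return HASH_SHA1;
    if (!strcmp(name, "sha256"))
        return HASH_SHA256;
    return HASH_UNKNOWN;
}

/*
 * Resolve the object format for a new repository. The first source that
 * provides a value wins, in this order:
 *   1. cli: the explicit command-line option
 *   2. env: the environment variable
 *   3. cfg: the config value
 *   4. the built-in default (assumed to be SHA-1 in this sketch)
 */
enum hash_algo resolve_hash(const char *cli, const char *env, const char *cfg)
{
    enum hash_algo algo;

    if ((algo = parse_hash(cli)) != HASH_UNKNOWN)
        return algo;
    if ((algo = parse_hash(env)) != HASH_UNKNOWN)
        return algo;
    if ((algo = parse_hash(cfg)) != HASH_UNKNOWN)
        return algo;
    return HASH_SHA1;
}
```

With this ordering, `resolve_hash(NULL, "sha1", "sha256")` returns
`HASH_SHA1`: the environment value shadows the config value whenever no
explicit option is given.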

If the user runs the following commands:

  $ export GIT_DEFAULT_HASH=sha1
  $ git -c init.defaultObjectFormat=sha256 init repo

The repo would be initialized with the sha1 algorithm. I think we should
consider carefully which source should take precedence. I cannot give a
definitive answer here, as I am not familiar with the whole codebase and
do not know all the concerns. But from my own perspective, the config
should take precedence over the environment variable. This is a new
feature: people who want to use it will not use the environment
variable, so for them we could ignore the environment variable. People
who do not know about this feature will continue to use the environment
variable and will never be affected by the configs.

Thanks,
Jialuo

brian m. carlson Aug. 15, 2024, 9:16 p.m. UTC | #2
On 2024-08-15 at 15:24:47, shejialuo wrote:
> If the user runs the following commands:
> 
>   $ export GIT_DEFAULT_HASH=sha1
>   $ git -c init.defaultObjectFormat=sha256 init repo
> 
> The repo would be initialized with the sha1 algorithm. I think we should
> consider carefully which source should take precedence. I cannot give a
> definitive answer here, as I am not familiar with the whole codebase and
> do not know all the concerns. But from my own perspective, the config
> should take precedence over the environment variable. This is a new
> feature: people who want to use it will not use the environment
> variable, so for them we could ignore the environment variable. People
> who do not know about this feature will continue to use the environment
> variable and will never be affected by the configs.

The standard behaviour we have with other environment variables is that
they override the config, such as with `GIT_SSH_COMMAND` and
`GIT_SSH_VARIANT`.  The reason is that the config in this case is
usually per-user or per-system, but it's very common to override
settings on an ephemeral basis with the environment.

Think of the case where you have a language package manager that's
creating a repository or fetching from one.  You can override on the
command line with `GIT_DEFAULT_HASH` or `GIT_SSH_VARIANT`, but you can't
control the actual `git` invocation, since it's baked into the package
manager.  Or if you're writing a test and need all of the Git commands
to use a particular setting, then it's easy to simply set the
environment once and forget about it.  (This is what we do at $DAYJOB to
implement SHA-256 test modes for our repos: the test script sets
`GIT_DEFAULT_HASH` and all of our test repos magically use the right
setting.)

brian m. carlson Aug. 15, 2024, 9:22 p.m. UTC | #3
On 2024-08-15 at 07:59:54, Patrick Steinhardt wrote:
> Hi,
> 
> to set up the default object and ref storage formats, users have to set
> up some environment variables. This is somewhat unwieldy and not really
> in line with how a user typically expects to configure Git, namely by
> using the config system. It makes it harder than necessary to globally
> default to the different formats and requires the user to munge with
> files like `.profile` to persist that setting. Needless to say, this is
> a bit of an awkward user experience.
> 
> This patch series thus introduces two new configs to set the default
> object hash and ref storage format for newly created repositories. Like
> this, folks can simply use the global- or system-level config to adapt
> to their needs. This also has the advantage of giving them the ability
> to adapt the default formats via guarded includes, such that e.g. repos
> in some filesystem hierarchy use format A, whereas others use format B.

I like the idea of this series, which I think is a much nicer way to set
these defaults and can also correctly apply to existing terminal
windows.  As you mentioned, this also allows different configurations
per directory, and it makes it easier if you need to have a system-wide
policy for whatever reason.

I've taken a look at the patches and they seem fine to me.  Thanks for
sending them.

Junio C Hamano Aug. 15, 2024, 9:52 p.m. UTC | #4
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2024-08-15 at 15:24:47, shejialuo wrote:
>> If the user runs the following commands:
>> 
>>   $ export GIT_DEFAULT_HASH=sha1
>>   $ git -c init.defaultObjectFormat=sha256 init repo
>> 
>> The repo would be initialized with the sha1 algorithm. I think we should
>> consider carefully which source should take precedence. I cannot give a
>> definitive answer here, as I am not familiar with the whole codebase and
>> do not know all the concerns. But from my own perspective, the config
>> should take precedence over the environment variable. This is a new
>> feature: people who want to use it will not use the environment
>> variable, so for them we could ignore the environment variable. People
>> who do not know about this feature will continue to use the environment
>> variable and will never be affected by the configs.
>
> The standard behaviour we have with other environment variables is that
> they override the config, such as with `GIT_SSH_COMMAND` and
> `GIT_SSH_VARIANT`.  The reason is that the config in this case is
> usually per-user or per-system, but it's very common to override
> settings on an ephemeral basis with the environment.

Right.  It is good that somebody can give a clear answer when a new
person says they cannot and then gives an answer that contradicts
an established practice ;-).

Patrick Steinhardt Aug. 16, 2024, 8:07 a.m. UTC | #5
On Thu, Aug 15, 2024 at 02:52:21PM -0700, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2024-08-15 at 15:24:47, shejialuo wrote:
> >> If the user runs the following commands:
> >> 
> >>   $ export GIT_DEFAULT_HASH=sha1
> >>   $ git -c init.defaultObjectFormat=sha256 init repo
> >> 
> >> The repo would be initialized with the sha1 algorithm. I think we should
> >> consider carefully which source should take precedence. I cannot give a
> >> definitive answer here, as I am not familiar with the whole codebase and
> >> do not know all the concerns. But from my own perspective, the config
> >> should take precedence over the environment variable. This is a new
> >> feature: people who want to use it will not use the environment
> >> variable, so for them we could ignore the environment variable. People
> >> who do not know about this feature will continue to use the environment
> >> variable and will never be affected by the configs.
> >
> > The standard behaviour we have with other environment variables is that
> > they override the config, such as with `GIT_SSH_COMMAND` and
> > `GIT_SSH_VARIANT`.  The reason is that the config in this case is
> > usually per-user or per-system, but it's very common to override
> > settings on an ephemeral basis with the environment.
> 
> Right.  It is good that somebody can give a clear answer when a new
> person says they cannot and then gives an answer that contradicts
> an established practice ;-).

Yeah, thanks for putting it so clearly. I'll also put a variant of this
into the commit message to clarify.

Patrick