mbox series

[v3,0/4] cache parent project's gitdir in submodules

Message ID 20210819200953.2105230-1-emilyshaffer@google.com (mailing list archive)
Headers show
Series cache parent project's gitdir in submodules | expand

Message

Emily Shaffer Aug. 19, 2021, 8:09 p.m. UTC
Since v2, mostly documentation changes and a handful of small nits from
Junio and Jonathan Tan. Thanks for the reviews, both.

CI run here: https://github.com/nasamuffin/git/actions/runs/1147984974

 - Emily

Emily Shaffer (4):
  t7400-submodule-basic: modernize inspect() helper
  introduce submodule.superprojectGitDir record
  submodule: record superproject gitdir during absorbgitdirs
  submodule: record superproject gitdir during 'update'

 Documentation/config/submodule.txt | 15 ++++++++++
 builtin/submodule--helper.c        |  4 +++
 git-submodule.sh                   | 10 +++++++
 submodule.c                        | 10 +++++++
 t/t7400-submodule-basic.sh         | 48 ++++++++++++++----------------
 t/t7406-submodule-update.sh        | 10 +++++++
 t/t7412-submodule-absorbgitdirs.sh |  9 +++++-
 7 files changed, 79 insertions(+), 27 deletions(-)

Range-diff against v2:
1:  a6718eea80 ! 1:  f1236dc9d7 t7400-submodule-basic: modernize inspect() helper
    @@ t/t7400-submodule-basic.sh: test_expect_success 'setup - repository to add submo
     +	git -C "$dir" clean -n -d -x >untracked
      }
      
    -+
      test_expect_success 'submodule add' '
    - 	echo "refs/heads/main" >expect &&
    - 
     @@ t/t7400-submodule-basic.sh: test_expect_success 'submodule add' '
      	) &&
      
2:  4cebe7bcb5 ! 2:  2caf9eb8ee introduce submodule.superprojectGitDir cache
    @@ Metadata
     Author: Emily Shaffer <emilyshaffer@google.com>
     
      ## Commit message ##
    -    introduce submodule.superprojectGitDir cache
    +    introduce submodule.superprojectGitDir record
     
         Teach submodules a reference to their superproject's gitdir. This allows
         us to A) know that we're running from a submodule, and B) have a
    @@ Commit message
         superproject directory around on the filesystem without breaking the
         submodule's cache.
     
    -    Since this cached value is only introduced during new submodule creation
    +    Since this hint value is only introduced during new submodule creation
         via `git submodule add`, though, there is more work to do to allow the
    -    cache to be created at other times.
    +    record to be created at other times.
    +
    +    If the new config is present, we can do some optional value-added
    +    behavior, like letting "git status" print additional info about the
    +    submodule's status in relation to its superproject, or like letting the
    +    superproject and submodule share an additional config file separate from
    +    either one's local config.
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
    +    Helped-by: Junio C Hamano <gitster@pobox.com>
     
      ## Documentation/config/submodule.txt ##
     @@ Documentation/config/submodule.txt: submodule.alternateErrorStrategy::
    @@ Documentation/config/submodule.txt: submodule.alternateErrorStrategy::
      	clone proceeds as if no alternate was specified.
     +
     +submodule.superprojectGitDir::
    -+	The relative path from the submodule's worktree  to the superproject's
    -+	gitdir. This config should only be present in projects which are
    -+	submodules, but is not guaranteed to be present in every submodule. It
    -+	is set automatically during submodule creation.
    ++	The relative path from the submodule's worktree to its superproject's
    ++	gitdir. When Git is run in a repository, it usually makes no difference
    ++	whether this repository is standalone or a submodule, but if this
    ++	configuration variable is present, additional behavior may be possible,
    ++	such as "git status" printing additional information about this
    ++	submodule's status with respect to its superproject. This config should
    ++	only be present in projects which are submodules, but is not guaranteed
    ++	to be present in every submodule, so only optional value-added behavior
    ++	should be linked to it. It is set automatically during
    ++	submodule creation.
     ++
    -+	In situations where more than one superproject references the same
    -+	submodule worktree, the value of this config and the behavior of
    -+	operations which use it are undefined. To reference a single project
    -+	from multiple superprojects, it is better to create a worktree of the
    -+	submodule for each superproject.
    ++	Because of this configuration variable, it is forbidden to use the
    ++	same submodule worktree shared by multiple superprojects.
     
      ## builtin/submodule--helper.c ##
     @@ builtin/submodule--helper.c: static int module_clone(int argc, const char **argv, const char *prefix)
    @@ t/t7400-submodule-basic.sh: test_expect_success 'setup - repository to add submo
     +	git -C "$sub_dir" clean -n -d -x >untracked
      }
      
    - 
    + test_expect_success 'submodule add' '
     @@ t/t7400-submodule-basic.sh: test_expect_success 'submodule add' '
      	) &&
      
3:  df97a9c2bb ! 3:  d278568a8e submodule: cache superproject gitdir during absorbgitdirs
    @@ Metadata
     Author: Emily Shaffer <emilyshaffer@google.com>
     
      ## Commit message ##
    -    submodule: cache superproject gitdir during absorbgitdirs
    +    submodule: record superproject gitdir during absorbgitdirs
     
    -    Already during 'git submodule add' we cache a pointer to the
    +    Already during 'git submodule add' we record a pointer to the
         superproject's gitdir. However, this doesn't help brand-new
         submodules created with 'git init' and later absorbed with 'git
         submodule absorbgitdir'. Let's start adding that pointer during 'git
4:  a3f3be58ad ! 4:  6866c36620 submodule: cache superproject gitdir during 'update'
    @@ Metadata
     Author: Emily Shaffer <emilyshaffer@google.com>
     
      ## Commit message ##
    -    submodule: cache superproject gitdir during 'update'
    +    submodule: record superproject gitdir during 'update'
     
    -    A cached path to the superproject's gitdir might be added during 'git
    -    submodule add', but in some cases - like submodules which were created
    -    before 'git submodule add' learned to cache that info - it might be
    -    useful to update the cache. Let's do it during 'git submodule update',
    -    when we already have a handle to the superproject while calling
    +    A recorded hint path to the superproject's gitdir might be added during
    +    'git submodule add', but in some cases - like submodules which were
    +    created before 'git submodule add' learned to record that info - it might
    +    be useful to update the hint. Let's do it during 'git submodule
    +    update', when we already have a handle to the superproject while calling
         operations on the submodules.
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>

Comments

Junio C Hamano Aug. 19, 2021, 9:56 p.m. UTC | #1
Emily Shaffer <emilyshaffer@google.com> writes:

> Since v2, mostly documentation changes and a handful of small nits from
> Junio and Jonathan Tan. Thanks for the reviews, both.

Will queue, but ...

>      +submodule.superprojectGitDir::
>     -+	The relative path from the submodule's worktree  to the superproject's
>     -+	gitdir. This config should only be present in projects which are
>     -+	submodules, but is not guaranteed to be present in every submodule. It
>     -+	is set automatically during submodule creation.
>     ++	The relative path from the submodule's worktree to its superproject's
>     ++	gitdir. When Git is run in a repository, it usually makes no difference
>     ++	whether this repository is standalone or a submodule, but if this
>     ++	configuration variable is present, additional behavior may be possible,
>     ++	such as "git status" printing additional information about this
>     ++	submodule's status with respect to its superproject. This config should
>     ++	only be present in projects which are submodules, but is not guaranteed
>     ++	to be present in every submodule, so only optional value-added behavior
>     ++	should be linked to it. It is set automatically during
>     ++	submodule creation.
>      ++
>     -+	In situations where more than one superproject references the same
>     -+	submodule worktree, the value of this config and the behavior of
>     -+	operations which use it are undefined. To reference a single project
>     -+	from multiple superprojects, it is better to create a worktree of the
>     -+	submodule for each superproject.
>     ++	Because of this configuration variable, it is forbidden to use the
>     ++	same submodule worktree shared by multiple superprojects.

... I think this will regress the format from what I've queued in
'seen' unless you dedent the "Because of this..." paragraph.  It is
unfortunate but AsciiDoc wants the follow-up paragraphs that follow
the single '+' line to start at the left edge without indentation.
Derrick Stolee Aug. 20, 2021, 1:09 a.m. UTC | #2
On 8/19/2021 4:09 PM, Emily Shaffer wrote:
> Since v2, mostly documentation changes and a handful of small nits from
> Junio and Jonathan Tan. Thanks for the reviews, both.

Sorry to show up late with many questions (mostly because my understanding
of submodules is not as strong as yours), but I do have some concerns that
we have not covered all the cases that could lead to the relative path
needing an update, such as a 'git checkout' across commits in the super-
project which moves a submodule.

Leading more to my confusion is that I thought 'git submodule update'
was the way to move a submodule, but that does not appear to be the case.
I used 'git mv' to move a submodule and that correctly updated the
.gitmodules file, but did not run any 'git submodule' subcommands (based
on GIT_TRACE2_PERF=1).

You mention in an earlier cover letter that this config does not need to
exist, but it seems to me that if it exists it needs to be correct for us
to depend on it for future features. I'm not convinced that we can rely
on it as-written, but you might be able to convince me by pointing out
why these submodule movements are safe.

Thanks,
-Stolee
Matheus Tavares Sept. 4, 2021, 5:50 p.m. UTC | #3
Hi, Emily

On Thu, Aug 19, 2021 at 5:10 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> Emily Shaffer (4):
>   t7400-submodule-basic: modernize inspect() helper
>   introduce submodule.superprojectGitDir record
>   submodule: record superproject gitdir during absorbgitdirs
>   submodule: record superproject gitdir during 'update'

I left a few comments on patches 2 and 3. They are mostly about things
we have already discussed on the Review Club, but I got the chance to
experiment with the code a bit more after that, and I thought it could
be helpful to provide these inline suggestions. Hope it helps, and
thanks for the series!

Thanks,
Matheus
Emily Shaffer Oct. 13, 2021, 6:51 p.m. UTC | #4
On Thu, Aug 19, 2021 at 09:09:46PM -0400, Derrick Stolee wrote:
> 
> On 8/19/2021 4:09 PM, Emily Shaffer wrote:
> > Since v2, mostly documentation changes and a handful of small nits from
> > Junio and Jonathan Tan. Thanks for the reviews, both.
> 
> Sorry to show up late with many questions (mostly because my understanding
> of submodules is not as strong as yours),

Well, here I am apologizing for showing up even later with a reply ;)

> but I do have some concerns that
> we have not covered all the cases that could lead to the relative path
> needing an update, such as a 'git checkout' across commits in the super-
> project which moves a submodule.
> 
> Leading more to my confusion is that I thought 'git submodule update'
> was the way to move a submodule, but that does not appear to be the case.
> I used 'git mv' to move a submodule and that correctly updated the
> .gitmodules file, but did not run any 'git submodule' subcommands (based
> on GIT_TRACE2_PERF=1).

During a live review a few weeks ago it was pointed out, I forget by
who, that this whole series would make a lot more sense if it was
treated as the path from the submodule's gitdir to the superproject's
gitdir. I think this would also fix your 'git mv' example here, as the
submodule gitdir would not change.

> 
> You mention in an earlier cover letter that this config does not need to
> exist, but it seems to me that if it exists it needs to be correct for us
> to depend on it for future features. I'm not convinced that we can rely
> on it as-written, but you might be able to convince me by pointing out
> why these submodule movements are safe.

Yeah, that is a good point, and I wonder how to be defensive about
this... Maybe it makes sense to BUG() if we don't find the
superproject's gitdir from this config? Maybe it makes sense to warn
(asking users to 'git submodule update') and erase the incorrect config
line?

Both those approaches would require a wrapper around
'git_config_get_string()' to cope with a failure, but it might be worth
it.

I'll try proposing such a wrapper in the reroll, unless I hear back
sooner.

 - Emily
Derrick Stolee Oct. 14, 2021, 5:12 p.m. UTC | #5
On 10/13/2021 2:51 PM, Emily Shaffer wrote:
> On Thu, Aug 19, 2021 at 09:09:46PM -0400, Derrick Stolee wrote:
>> but I do have some concerns that
>> we have not covered all the cases that could lead to the relative path
>> needing an update, such as a 'git checkout' across commits in the super-
>> project which moves a submodule.
>>
>> Leading more to my confusion is that I thought 'git submodule update'
>> was the way to move a submodule, but that does not appear to be the case.
>> I used 'git mv' to move a submodule and that correctly updated the
>> .gitmodules file, but did not run any 'git submodule' subcommands (based
>> on GIT_TRACE2_PERF=1).
> 
> During a live review a few weeks ago it was pointed out, I forget by
> who, that this whole series would make a lot more sense if it was
> treated as the path from the submodule's gitdir to the superproject's
> gitdir. I think this would also fix your 'git mv' example here, as the
> submodule gitdir would not change.

I think that's a great way to deliver similar functionality without
the issues I was thinking about.

>> You mention in an earlier cover letter that this config does not need to
>> exist, but it seems to me that if it exists it needs to be correct for us
>> to depend on it for future features. I'm not convinced that we can rely
>> on it as-written, but you might be able to convince me by pointing out
>> why these submodule movements are safe.
> 
> Yeah, that is a good point, and I wonder how to be defensive about
> this... Maybe it makes sense to BUG() if we don't find the
> superproject's gitdir from this config? Maybe it makes sense to warn
> (asking users to 'git submodule update') and erase the incorrect config
> line?

I think we can complain with something like a die() if the config points
to data that doesn't make sense (not a .git directory), but a BUG() is
too heavy-handed because it can just be a user modifying their config
file incorrectly.

I'm happy to shut down a process because a user messed with config
incorrectly. Since your proposed change allows operations like 'git mv'
to move submodules without needing to change this config, I'm much
happier with the design. It becomes impossible for the config to get
out of sync unless the user does something outside of a Git command.

Thanks,
-Stolee
Emily Shaffer Oct. 14, 2021, 6:52 p.m. UTC | #6
On Thu, Oct 14, 2021 at 01:12:11PM -0400, Derrick Stolee wrote:
> 
> On 10/13/2021 2:51 PM, Emily Shaffer wrote:
> > On Thu, Aug 19, 2021 at 09:09:46PM -0400, Derrick Stolee wrote:
> >> but I do have some concerns that
> >> we have not covered all the cases that could lead to the relative path
> >> needing an update, such as a 'git checkout' across commits in the super-
> >> project which moves a submodule.
> >>
> >> Leading more to my confusion is that I thought 'git submodule update'
> >> was the way to move a submodule, but that does not appear to be the case.
> >> I used 'git mv' to move a submodule and that correctly updated the
> >> .gitmodules file, but did not run any 'git submodule' subcommands (based
> >> on GIT_TRACE2_PERF=1).
> > 
> > During a live review a few weeks ago it was pointed out, I forget by
> > who, that this whole series would make a lot more sense if it was
> > treated as the path from the submodule's gitdir to the superproject's
> > gitdir. I think this would also fix your 'git mv' example here, as the
> > submodule gitdir would not change.
> 
> I think that's a great way to deliver similar functionality without
> the issues I was thinking about.
> 
> >> You mention in an earlier cover letter that this config does not need to
> >> exist, but it seems to me that if it exists it needs to be correct for us
> >> to depend on it for future features. I'm not convinced that we can rely
> >> on it as-written, but you might be able to convince me by pointing out
> >> why these submodule movements are safe.
> > 
> > Yeah, that is a good point, and I wonder how to be defensive about
> > this... Maybe it makes sense to BUG() if we don't find the
> > superproject's gitdir from this config? Maybe it makes sense to warn
> > (asking users to 'git submodule update') and erase the incorrect config
> > line?
> 
> I think we can complain with something like a die() if the config points
> to data that doesn't make sense (not a .git directory), but a BUG() is
> too heavy-handed because it can just be a user modifying their config
> file incorrectly.

Ok. Having re-thought since the other day, I think I will skip the
wrapper for this reroll and instead propose it in a series that actually
uses this pointer (so, the shared submodule-and-superproject config), if
it looks like we'd want one. I expect doing that work in the context of
a caller will make the correct behavior a little more clear - in the
context of this series I'm not totally sure what we'd want to do.

> 
> I'm happy to shut down a process because a user messed with config
> incorrectly. Since your proposed change allows operations like 'git mv'
> to move submodules without needing to change this config, I'm much
> happier with the design. It becomes impossible for the config to get
> out of sync unless the user does something outside of a Git command.

Thanks for pulling the context back up here. I appreciate it. Like I
mentioned in reply to your comment on v4, look for a reroll in the next
30 minutes or so (assuming the CI passes ok).

 - Emily