[v3,0/3] submodule: port subcommand add from shell to C

Message ID 20201214231939.644175-1-periperidip@gmail.com (mailing list archive)

Message

Shourya Shukla Dec. 14, 2020, 11:19 p.m. UTC
Greetings,

This is the v3 of the patch series with the same title. You may view
the v2 here:
https://lore.kernel.org/git/20201007074538.25891-1-shouryashukla.oo@gmail.com/

I have applied almost all of the changes asked before, except a few
which confused me a little. It would be great if I could get some help
about them:

    1. In this mail: https://lore.kernel.org/git/xmqqo8ldznjx.fsf@gitster.c.googlers.com/
       Junio asked me to accommodate for a merge in progress since
       'cache_pos < 0' does not necessarily mean that the path exists.
       I was wondering which function would be the most appropriate for
       the if-statements:
            if (!force) {
                    if (cache_pos >= 0 || dir_in_cache)
            }
       I was thinking of going with 'read_cache_unmerged()'. I thought
       of this by seeing what is triggered in case of a merge conflict
       and came across this. What is your opinion on this?

    2. In this mail: https://lore.kernel.org/git/xmqqimbky6st.fsf@gitster.c.googlers.com/
       In this section:
            /* strip trailing '/' */
            if (is_dir_sep(sm_path[strlen(sm_path) - 1]))
                    sm_path[strlen(sm_path) - 1] = '\0';

       Junio makes a reasonable argument that we need to make sure that
       multiple trailing slashes are eliminated but my code only takes
       care of a single trailing slash. I was looking into the code of
       'normalize_path_copy()' and saw that the function it essentially
       calls: 'normalize_path_copy_len()' does not perform anything on
       the trailing slashes and this behaviour is mentioned as a
       NEEDSWORK.

       I was thinking of correcting the above function instead of
       putting in an extra loop. Is this feasible?

    3. In the following segment:
        /*
         * NEEDSWORK: In a multi-working-tree world, this needs to be
         * set in the per-worktree config.
         */
        if (!git_config_get_string("submodule.active", &var) && var) {

        There was a comment: "What if this were a valueless true
        ("[submodule] active\n" without "= true")?  Wouldn't get_string()
        fail?"

        I was under the impression that even if the above failed, it
        will not really affect the big picture since at the end we will set
        'submodule.name.active' as true irrespective of the above value.
        Is this correct?
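
For item 2, the "extra loop" alternative might look roughly like the sketch below. It is only an illustration: `is_sep()` is a hypothetical stand-in for Git's `is_dir_sep()` that handles just '/', and `strip_trailing_slashes()` is an invented name, not an existing helper. The `len > 1` guard is a choice made for this sketch so that a lone "/" survives and an empty string is never indexed at `len - 1`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Git's is_dir_sep(); handles only '/'. */
static int is_sep(char c)
{
	return c == '/';
}

/*
 * Remove every trailing separator from `path` in place, but keep a
 * lone "/" intact so that the root directory does not become "".
 * An empty string is left untouched.
 */
static void strip_trailing_slashes(char *path)
{
	size_t len = strlen(path);

	while (len > 1 && is_sep(path[len - 1]))
		path[--len] = '\0';
}
```

Called on a writable buffer holding "sub/dir///", this leaves "sub/dir"; a path without a trailing slash is unchanged.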

Feedback and reviews are appreciated.

Regards,
Shourya Shukla

Shourya Shukla (3):
  dir: change the scope of function 'directory_exists_in_index()'
  submodule: port submodule subcommand 'add' from shell to C
  t7400: add test to check 'submodule add' for tracked paths

 builtin/submodule--helper.c | 410 +++++++++++++++++++++++++++++++++++-
 dir.c                       |  30 ++-
 dir.h                       |   9 +
 git-submodule.sh            | 161 +-------------
 t/t7400-submodule-basic.sh  |  13 +-
 5 files changed, 443 insertions(+), 180 deletions(-)

Comments

Junio C Hamano Dec. 15, 2020, 9:44 p.m. UTC | #1
Shourya Shukla <periperidip@gmail.com> writes:

>     3. In the following segment:
>         /*
>          * NEEDSWORK: In a multi-working-tree world, this needs to be
>          * set in the per-worktree config.
>          */
>         if (!git_config_get_string("submodule.active", &var) && var) {
>
>         There was a comment: "What if this were a valueless true
>         ("[submodule] active\n" without "= true")?  Wouldn't get_string()
>         fail?"
>
>         I was under the impression that even if the above failed, it
>         will not really affect the big picture since at the end we will set
>         'submodule.name.active' as true irrespective of the above value.
>         Is this correct?

Let's see what kind of value the "submodule.active" variable is
meant to be set to.  Documentation/config/submodule.txt has this:

    submodule.active::
            A repeated field which contains a pathspec used to match against a
            submodule's path to determine if the submodule is of interest to git
            commands. See linkgit:gitsubmodules[7] for details.

It definitely is a string value, and making it a valueless true is
an error in the configuration.  I wonder if we want to diagnose such
an error, or can we just pretend we didn't see it and keep going?

Also the "var" (one of the values set for this multi-valued
variable) is never used in the body of the "if" statement.  The
other user of "submodule.active" in module_init() seems to use
config_get_value_multi() on it.  The new code may deserve a comment
to explain why that is OK to (1) grab just a single value out of the
multi-valued variable, and (2) not even look at its value.

Shourya Shukla Dec. 17, 2020, 2:16 p.m. UTC | #2
On 15/12 01:44, Junio C Hamano wrote:
> Shourya Shukla <periperidip@gmail.com> writes:
> 
> >     3. In the following segment:
> >         /*
> >          * NEEDSWORK: In a multi-working-tree world, this needs to be
> >          * set in the per-worktree config.
> >          */
> >         if (!git_config_get_string("submodule.active", &var) && var) {
> >
> >         There was a comment: "What if this were a valueless true
> >         ("[submodule] active\n" without "= true")?  Wouldn't get_string()
> >         fail?"
> >
> >         I was under the impression that even if the above failed, it
> >         will not really affect the big picture since at the end we will set
> >         'submodule.name.active' as true irrespective of the above value.
> >         Is this correct?
> 
> Let's see what kind of value the "submodule.active" variable is
> meant to be set to.  Documentation/config/submodule.txt has this:
> 
>     submodule.active::
>             A repeated field which contains a pathspec used to match against a
>             submodule's path to determine if the submodule is of interest to git
>             commands. See linkgit:gitsubmodules[7] for details.
> 
> It definitely is a string value, and making it a valueless true is
> an error in the configuration.

I think that we did not _make_ it a valueless true. It was already there
and we somehow managed to check it. If you mean that we should ensure
that we set it to "true" so that any such errors don't happen later on,
then that is a different thing.

> I wonder if we want to diagnose such
> an error, or can we just pretend we didn't see it and keep going?

I guess we could pretend we did not see it since it isn't affecting the
run of the sub-command. If you think otherwise, please suggest.

> Also the "var" (one of the values set for this multi-valued
> variable) is never used in the body of the "if" statement.  The
> other user of "submodule.active" in module_init() seems to use
> config_get_value_multi() on it.  The new code may deserve a comment
> to explain why that is OK to (1) grab just a single value out of the
> multi-valued variable, and (2) not even look at its value.

Understood. So a comment along the lines of:

	/*
	 * Since we are fetching information only about one submodule,
	 * we need not fetch a list of submodules to check the activity
	 * status of a single submodule.
	 *
	 * In case of a valueless true, i.e., '[submodule] active\n'
	 * without '= true', we need not worry about any errors since
	 * irrespective of the above value, we will set
	 * 'submodule.<name>.active' as true.
	 */

will work? Also, could you please comment on the other two issues I
mentioned in the cover letter so I might as well start work on v4 of
this patch?

Regards,
Shourya Shukla

Junio C Hamano Dec. 17, 2020, 10:20 p.m. UTC | #3
Shourya Shukla <periperidip@gmail.com> writes:

> On 15/12 01:44, Junio C Hamano wrote:
>> Shourya Shukla <periperidip@gmail.com> writes:
>> 
>> >     3. In the following segment:
>> >         /*
>> >          * NEEDSWORK: In a multi-working-tree world, this needs to be
>> >          * set in the per-worktree config.
>> >          */
>> >         if (!git_config_get_string("submodule.active", &var) && var) {
>> >
>> >         There was a comment: "What if this were a valueless true
>> >         ("[submodule] active\n" without "= true")?  Wouldn't get_string()
>> >         fail?"
>> >
>> >         I was under the impression that even if the above failed, it
>> >         will not really affect the big picture since at the end we will set
>> >         'submodule.name.active' as true irrespective of the above value.
>> >         Is this correct?
>> 
>> Let's see what kind of value the "submodule.active" variable is
>> meant to be set to.  Documentation/config/submodule.txt has this:
>> 
>>     submodule.active::
>>             A repeated field which contains a pathspec used to match against a
>>             submodule's path to determine if the submodule is of interest to git
>>             commands. See linkgit:gitsubmodules[7] for details.
>> 
>> It definitely is a string value, and making it a valueless true is
>> an error in the configuration.
>
> I think that we did not _make_ it a valueless true. It was already there
> and we somehow managed to check it. If you mean that we should ensure
> that we set it to "true" so that any such errors don't happen later on,
> then that is a different thing.

Let me rephrase.  When a user has "[submodule] active" in his or her
configuration file, it is a configuration error.  When Git reads
"submodule.active" configuration variable to make a decision (like
the above code) and finds that the user has such an error, the user
would appreciate it if the error is pointed out, so that it can be
corrected, rather than silently ignored.
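
To make the distinction concrete, here is a toy check that tells a valueless "active" line apart from one that carries a value. It is a sketch under loud assumptions: `is_valueless()` is an invented helper, it is not Git's config parser, and it ignores section headers, comments, quoting and case-insensitivity of variable names.

```c
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if `line` consists of the variable `name` alone, i.e. it
 * has no "= value" part.  Toy check only: it does not understand
 * sections, comments or quoting the way a real config parser does.
 */
static int is_valueless(const char *line, const char *name)
{
	size_t n = strlen(name);

	/* Skip leading indentation. */
	while (isspace((unsigned char)*line))
		line++;
	if (strncmp(line, name, n) != 0)
		return 0;
	line += n;
	/* Only whitespace may follow a valueless variable name. */
	while (isspace((unsigned char)*line))
		line++;
	return *line == '\0';
}
```

A caller that gets 1 back for the "active" line under "[submodule]" could then report the problem to the user instead of silently moving on.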

>> Also the "var" (one of the values set for this multi-valued
>> variable) is never used in the body of the "if" statement.  The
>> other user of "submodule.active" in module_init() seems to use
>> config_get_value_multi() on it.  The new code may deserve a comment
>> to explain why that is OK to (1) grab just a single value out of the
>> multi-valued variable, and (2) not even look at its value.
>
> Understood. So a comment along the lines of:
>
> 	/*
> 	 * Since we are fetching information only about one submodule,
> 	 * we need not fetch a list of submodules to check the activity
> 	 * status of a single submodule.

Makes me wonder if I am getting the semantics of submodule.active
variable right.

From the three-line description in the documentation (see above), I
would have guessed that if we have three values for
submodule.active, e.g.

	[submodule]
		active = $a
		active = $b
		active = $c

then when deciding whether a submodule at $sm_path is of interest,
we'd see if $sm_path matches any one of $a, $b, or $c and if it does,
it is determined that the submodule is "of interest to git
commands".

Yes, we may be fetching information only about one submodule at
$sm_path, but given the explanation of how the configuration
variable is designed to work, how can we _not_ fetch the list and
check all of them?
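
The "match against any one of the values" reading can be sketched as below. This is an illustration only: `matches_any()` is an invented name, and POSIX fnmatch() is an assumption standing in for Git's real pathspec matching, which behaves differently in the details.

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Return 1 if `path` matches at least one of the `n` patterns.
 * fnmatch() is an assumption standing in for pathspec matching.
 */
static int matches_any(const char *path, const char *const *patterns, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (fnmatch(patterns[i], path, 0) == 0)
			return 1;
	return 0;
}
```

With the three patterns "lib/*", "vendor/foo" and "docs", looking only at the first value would miss "docs", which is why a single value cannot decide the question.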

So the comment above (for that matter, the one below that talks
about valueless true) does not make any sense to me, sorry.

> 	 * In case of a valueless true, i.e., '[submodule] active\n'
> 	 * without '= true', we need not worry about any errors since
> 	 * irrespective of the above value, we will set
> 	 * 'submodule.<name>.active' as true.
> 	 */
>
> will work? 

The real reason why it is OK to just check the existence of the
submodule.active variable without looking at any of its values is
because the check is done to see if this call is needed at all:

	git submodule--helper is-active "$sm_path"

This "helper" eventually calls submodule.c::is_submodule_active()
that does the real check---it gets the multi-valued submodule.active
and checks them against the path to determine if the submodule is
"of interest".

On the other hand, when we know submodule.active does not exist, all
submodules are of interest when it comes to "submodule add".  That
is how

-	if git config --get submodule.active >/dev/null
-	then
-		# If the submodule being adding isn't already covered by the
-		# current configured pathspec, set the submodule's active flag
-		if ! git submodule--helper is-active "$sm_path"
-		then
-			git config submodule."$sm_name".active "true"
-		fi
-	else
-		git config submodule."$sm_name".active "true"
-	fi

taken from your [2/3] that ignored the values of submodule.active is
"correct"; we know that the real work is done elsewhere--we are only
learning if the is-active check is necessary.

I think explaining why it works correctly to show future readers
that the code was written by folks who knew what they were doing
would be worth the effort to help future code evolution.

This, from your [1/3], is a faithful translation of the above,
but ...

+	if (!git_config_get_string("submodule.active", &var) && var) {
+
+		/*
+		 * If the submodule being adding isn't already covered by the
+		 * current configured pathspec, set the submodule's active flag
+		 */
+		if (!is_submodule_active(the_repository, info->sm_path)) {
+			key = xstrfmt("submodule.%s.active", info->sm_name);
+			git_config_set_gently(key, "true");
+			free(key);
+		}
+	} else {
+		key = xstrfmt("submodule.%s.active", info->sm_name);
+		git_config_set_gently(key, "true");
+		free(key);
+	}

... by knowing why we check submodule.active but discard its value,
future developers (read: I think this is outside the scope of this
series) can rewrite it like so to make it more readable and reduce
the repeated code to set submodule.$sm_name.active to true.

        /*
	 * If submodule.active does not exist, we will activate this
	 * module unconditionally.
	 *
         * Otherwise, is_submodule_active() is asked to determine if
         * the path currently is of interest; because it will obtain
         * and iterate over this multi-valued variable by itself, we
         * do not need its values we obtain from git_config_get_string()
         * call here.  We are only checking if we need to ask the
         * is_submodule_active() helper function.  We explicitly set
	 * the submodule.$sm_name.active if submodule.active patterns
	 * do not cover the path (i.e. is_submodule_active() says "no").
         */
	if (git_config_get_string("submodule.active", &var) ||
	    !is_submodule_active(the_repository, info->sm_path)) {
		key = xstrfmt(...);
		git_config_set_gently(key, "true");
		free(key);
	}

and a comment like this would help such readers, for example.
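
That the flattened condition decides the same thing as the nested if/else can be checked with a small model. The sketch below is not Git code: both function names are invented, `has_active` models "submodule.active exists", and `covered` models "is_submodule_active() says yes". It captures only the boolean structure and leaves out the "&& var" part of the original condition.

```c
#include <stdbool.h>

/* Shape of the shell code and of its direct translation to C. */
static bool must_set_active_nested(bool has_active, bool covered)
{
	if (has_active) {
		if (!covered)
			return true;
		return false;
	}
	return true;
}

/* Shape of the flattened rewrite: one condition, one assignment. */
static bool must_set_active_flat(bool has_active, bool covered)
{
	return !has_active || !covered;
}
```

Both functions agree on all four input combinations. Because "||" short-circuits, the flattened form also keeps the property that the coverage check is consulted only when submodule.active exists.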

By the way, as you might have noticed, your [1/3] contains a lot of
material that ought to be part of [2/3], doesn't it?  [1/3] was
supposed to be just borrowing helper from dir.c but has the new
"add" code implemented in the same patch.

> Also, could you please comment on the other two issues I
> mentioned in the cover letter so I might as well start work on v4 of
> this patch?

I'll leave the other two to other reviewers and mentors for now, but
may come back to them if I beat them.

Thanks.

Junio C Hamano Dec. 22, 2020, 11:42 p.m. UTC | #4
Shourya Shukla <periperidip@gmail.com> writes:

> Feedback and reviews are appreciated.
>
> Regards,
> Shourya Shukla
>
> Shourya Shukla (3):
>   dir: change the scope of function 'directory_exists_in_index()'
>   submodule: port submodule subcommand 'add' from shell to C
>   t7400: add test to check 'submodule add' for tracked paths

Sorry that this is neither feedback nor a review, but we are seeing a
segfault from "git submodule add" when the topic is tested with the
rest of 'seen':

  https://github.com/git/git/runs/1597682274#step:6:3155

It seems that you need to be logged in to see the full CI output to
the line level when visiting the above URL.

Thanks.