diff mbox series

[1/7] fsck_tree(): fix shadowed variable

Message ID 20201005071905.GA2291074@coredump.intra.peff.net (mailing list archive)
State Superseded
Headers show
Series forbidding symlinked .gitattributes and .gitignore | expand

Commit Message

Jeff King Oct. 5, 2020, 7:19 a.m. UTC
Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
fsck_tree(), and we pass it to the report() function when we find
problems. However, that is shadowed within the tree-walking loop by the
existing "oid" variable which we use to store the oid of each tree
entry. As a result, we may report the wrong oid for some problems we
detect within the loop (the entry oid, instead of the tree oid).

Our tests didn't catch this because they checked only that we found the
expected fsck problem, not that it was attached to the correct object.

Let's rename both variables in the function to avoid confusion. This
makes the diff a little noisy (e.g., all of the report() calls outside
the loop wee already correct but need touched), but makes sure we catch
all cases and will avoid similar confusion in the future.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                     | 40 +++++++++++++++++++-------------------
 t/t7415-submodule-names.sh |  5 +++--
 2 files changed, 23 insertions(+), 22 deletions(-)

Comments

Jonathan Nieder Oct. 5, 2020, 7:44 a.m. UTC | #1
Jeff King wrote:

> Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
> fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
> fsck_tree(), and we pass it to the report() function when we find
> problems. However, that is shadowed within the tree-walking loop by the
> existing "oid" variable which we use to store the oid of each tree
> entry. As a result, we may report the wrong oid for some problems we
> detect within the loop (the entry oid, instead of the tree oid).
>
> Our tests didn't catch this because they checked only that we found the
> expected fsck problem, not that it was attached to the correct object.

Oh, goodness.  Does this mean we should be similarly checking oid in
the rest of the fsck test scripts?  (I'm not saying this patch should
do so, just curious about what you think on the subject.)

> Let's rename both variables in the function to avoid confusion. This
> makes the diff a little noisy (e.g., all of the report() calls outside
> the loop wee already correct but need touched), but makes sure we catch

nit: s/wee/are/, s/need touched/need to be touched/

> all cases and will avoid similar confusion in the future.

Thanks for doing that --- it does make reviewing easier.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  fsck.c                     | 40 +++++++++++++++++++-------------------
>  t/t7415-submodule-names.sh |  5 +++--
>  2 files changed, 23 insertions(+), 22 deletions(-)
> 
> diff --git a/fsck.c b/fsck.c
> index f82e2fe9e3..46a108839f 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -633,7 +633,7 @@ static int verify_ordered(unsigned mode1, const char *name1,
>  	return c1 < c2 ? 0 : TREE_UNORDERED;
>  }
>  
> -static int fsck_tree(const struct object_id *oid,
> +static int fsck_tree(const struct object_id *tree_oid,

optional: we could call it "tree".

>  		     const char *buffer, unsigned long size,
>  		     struct fsck_options *options)
>  {
> @@ -654,7 +654,7 @@ static int fsck_tree(const struct object_id *oid,
>  	struct name_stack df_dup_candidates = { NULL };
>  
>  	if (init_tree_desc_gently(&desc, buffer, size)) {
> -		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
> +		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
>  		return retval;
>  	}
>  
> @@ -664,11 +664,11 @@ static int fsck_tree(const struct object_id *oid,
>  	while (desc.size) {
>  		unsigned short mode;
>  		const char *name, *backslash;
> -		const struct object_id *oid;
> +		const struct object_id *entry_oid;
>  
> -		oid = tree_entry_extract(&desc, &name, &mode);
> +		entry_oid = tree_entry_extract(&desc, &name, &mode);

optional: could call it "child".

>  
> -		has_null_sha1 |= is_null_oid(oid);
> +		has_null_sha1 |= is_null_oid(entry_oid);
>  		has_full_path |= !!strchr(name, '/');
>  		has_empty_name |= !*name;
>  		has_dot |= !strcmp(name, ".");
> @@ -678,10 +678,10 @@ static int fsck_tree(const struct object_id *oid,
>  
>  		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
>  			if (!S_ISLNK(mode))
> -				oidset_insert(&gitmodules_found, oid);
> +				oidset_insert(&gitmodules_found, entry_oid);
>  			else
>  				retval += report(options,
> -						 oid, OBJ_TREE,
> +						 tree_oid, OBJ_TREE,
>  						 FSCK_MSG_GITMODULES_SYMLINK,
>  						 ".gitmodules is a symbolic link");

Right, this is a property of the enclosing tree, not the blob
representing the symlink target.  Good.

not about this patch: Could report() notice when the oid doesn't match
the type passed in, to more easily catch this kind of mistake?

[...]
> @@ -692,9 +692,9 @@ static int fsck_tree(const struct object_id *oid,
>  				has_dotgit |= is_ntfs_dotgit(backslash);
>  				if (is_ntfs_dotgitmodules(backslash)) {
>  					if (!S_ISLNK(mode))
> -						oidset_insert(&gitmodules_found, oid);
> +						oidset_insert(&gitmodules_found, entry_oid);
>  					else
> -						retval += report(options, oid, OBJ_TREE,
> +						retval += report(options, tree_oid, OBJ_TREE,
>  								 FSCK_MSG_GITMODULES_SYMLINK,
>  								 ".gitmodules is a symbolic link");

Likewise, good.

[...]
> --- a/t/t7415-submodule-names.sh
> +++ b/t/t7415-submodule-names.sh
> @@ -148,13 +148,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
>  		{
>  			printf "100644 blob $content\t$tricky\n" &&
>  			printf "120000 blob $target\t.gitmodules\n"
> -		} | git mktree &&
> +		} >bad-tree &&
> +		tree=$(git mktree <bad-tree) &&
>  
>  		# Check not only that we fail, but that it is due to the
>  		# symlink detector; this grep string comes from the config
>  		# variable name and will not be translated.
>  		test_must_fail git fsck 2>output &&
> -		test_i18ngrep gitmodulesSymlink output
> +		test_i18ngrep "tree $tree: gitmodulesSymlink" output

Makes sense.

By the way, why does GETTEXT_POISON lose the gitmodulesSymlink
keyword?  Is this just a limitation of GETTEXT_POISON losing
information that's passed in with %s?

With the commit message tweak mentioned above,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.
Jeff King Oct. 5, 2020, 8:20 a.m. UTC | #2
On Mon, Oct 05, 2020 at 12:44:04AM -0700, Jonathan Nieder wrote:

> > Our tests didn't catch this because they checked only that we found the
> > expected fsck problem, not that it was attached to the correct object.
> 
> Oh, goodness.  Does this mean we should be similarly checking oid in
> the rest of the fsck test scripts?  (I'm not saying this patch should
> do so, just curious about what you think on the subject.)

I don't feel strongly either way. It would be a bigger change than you
might hope, I'd expect, because it requires collecting the oid of the
bad object in the test script as we create it. So I'm not planning to
work on it, but if somebody else wants to, be my guest.

> > Let's rename both variables in the function to avoid confusion. This
> > makes the diff a little noisy (e.g., all of the report() calls outside
> > the loop wee already correct but need touched), but makes sure we catch
> 
> nit: s/wee/are/, s/need touched/need to be touched/

Foiled by my last-minute editing. It was originally "are", but I meant
to change it to "were".

> > -static int fsck_tree(const struct object_id *oid,
> > +static int fsck_tree(const struct object_id *tree_oid,
> 
> optional: we could call it "tree".

Yeah, I started that way but wondered if it was confusing with a "struct
tree" (which we don't have here; the point of the change that introduced
the shadowing was getting rid of that). Being longer doesn't hurt too
much and is quite clear.

> > @@ -664,11 +664,11 @@ static int fsck_tree(const struct object_id *oid,
> >  	while (desc.size) {
> >  		unsigned short mode;
> >  		const char *name, *backslash;
> > -		const struct object_id *oid;
> > +		const struct object_id *entry_oid;
> >  
> > -		oid = tree_entry_extract(&desc, &name, &mode);
> > +		entry_oid = tree_entry_extract(&desc, &name, &mode);
> 
> optional: could call it "child".

IMHO that's too vague. We have a child name, a child mode, etc.

Another option is to just refer to desc.entry.oid directly, but that's
longer and seemed a bit too intimate with how tree_entry_extract works.

> not about this patch: Could report() notice when the oid doesn't match
> the type passed in, to more easily catch this kind of mistake?

Hmm. It would incur an extra hash lookup, but since we expect to
report() infrequently, I don't think the cost would be too high. I
suspect we could even drop the "type" field and have report() figure it
out from the oid, but that is working in the opposite direction of your
suggestion. ;)

I'm not 100% sure we'd always be able to look up such a struct, though.
This code path also gets called from index-pack as it's checking objects
(so likewise we might not even have something in the object database
yet).

> >  		# Check not only that we fail, but that it is due to the
> >  		# symlink detector; this grep string comes from the config
> >  		# variable name and will not be translated.
> >  		test_must_fail git fsck 2>output &&
> > -		test_i18ngrep gitmodulesSymlink output
> > +		test_i18ngrep "tree $tree: gitmodulesSymlink" output
> 
> Makes sense.
> 
> By the way, why does GETTEXT_POISON lose the gitmodulesSymlink
> keyword?  Is this just a limitation of GETTEXT_POISON losing
> information that's passed in with %s?

Yes, I think so. It comes from 674ba34038 (fsck: mark strings for
translation, 2018-11-10) which is passing through our string. Arguably
that commit made the comment lines rather confusing and pointless.

Though hmm. Looks like the "tree %s: %s" part is in the translated
string. So a translation _could_ change it, though I'd expect it to only
change the words and not the syntax. Maybe an RTL language would. I
dunno.

-Peff
Jonathan Nieder Oct. 5, 2020, 8:29 a.m. UTC | #3
Jeff King wrote:
> On Mon, Oct 05, 2020 at 12:44:04AM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> -		test_i18ngrep gitmodulesSymlink output
>>> +		test_i18ngrep "tree $tree: gitmodulesSymlink" output
>>
>> Makes sense.
>>
>> By the way, why does GETTEXT_POISON lose the gitmodulesSymlink
>> keyword?  Is this just a limitation of GETTEXT_POISON losing
>> information that's passed in with %s?
>
> Yes, I think so. It comes from 674ba34038 (fsck: mark strings for
> translation, 2018-11-10) which is passing through our string. Arguably
> that commit made the comment lines rather confusing and pointless.
>
> Though hmm. Looks like the "tree %s: %s" part is in the translated
> string. So a translation _could_ change it, though I'd expect it to only
> change the words and not the syntax. Maybe an RTL language would. I
> dunno.

Ah, right.  I can use $n markers in the format string to reorder
parameters, so we don't want scripts to rely on their order.  In
principle one can imagine GETTEXT_POISON printing a random permutation
of its arguments but this kind of nondeterminism in tests would be
harder to debug than the current state where the args aren't passed
through at all.

And after all, the main point of GETTEXT_POISON is to allow us to test
scripted commands to make sure we are not translating any output that
scripts would want to use.  For that purpose, the current poisoning
does what we need.  (In other words, we don't encourage non-test
scripts to rely on these lines at all, even to grep for
gitmodulesSymlink.)

Thanks,
Jonathan
diff mbox series

Patch

diff --git a/fsck.c b/fsck.c
index f82e2fe9e3..46a108839f 100644
--- a/fsck.c
+++ b/fsck.c
@@ -633,7 +633,7 @@  static int verify_ordered(unsigned mode1, const char *name1,
 	return c1 < c2 ? 0 : TREE_UNORDERED;
 }
 
-static int fsck_tree(const struct object_id *oid,
+static int fsck_tree(const struct object_id *tree_oid,
 		     const char *buffer, unsigned long size,
 		     struct fsck_options *options)
 {
@@ -654,7 +654,7 @@  static int fsck_tree(const struct object_id *oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -664,11 +664,11 @@  static int fsck_tree(const struct object_id *oid,
 	while (desc.size) {
 		unsigned short mode;
 		const char *name, *backslash;
-		const struct object_id *oid;
+		const struct object_id *entry_oid;
 
-		oid = tree_entry_extract(&desc, &name, &mode);
+		entry_oid = tree_entry_extract(&desc, &name, &mode);
 
-		has_null_sha1 |= is_null_oid(oid);
+		has_null_sha1 |= is_null_oid(entry_oid);
 		has_full_path |= !!strchr(name, '/');
 		has_empty_name |= !*name;
 		has_dot |= !strcmp(name, ".");
@@ -678,10 +678,10 @@  static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&gitmodules_found, entry_oid);
 			else
 				retval += report(options,
-						 oid, OBJ_TREE,
+						 tree_oid, OBJ_TREE,
 						 FSCK_MSG_GITMODULES_SYMLINK,
 						 ".gitmodules is a symbolic link");
 		}
@@ -692,9 +692,9 @@  static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&gitmodules_found, entry_oid);
 					else
-						retval += report(options, oid, OBJ_TREE,
+						retval += report(options, tree_oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
 								 ".gitmodules is a symbolic link");
 				}
@@ -703,7 +703,7 @@  static int fsck_tree(const struct object_id *oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;
 		}
 
@@ -751,25 +751,25 @@  static int fsck_tree(const struct object_id *oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
 	if (has_dot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
 	if (has_dotdot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
 	if (has_dotgit)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
 	return retval;
 }
 
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index f70368bc2e..5c95247180 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -148,13 +148,14 @@  test_expect_success 'fsck detects symlinked .gitmodules file' '
 		{
 			printf "100644 blob $content\t$tricky\n" &&
 			printf "120000 blob $target\t.gitmodules\n"
-		} | git mktree &&
+		} >bad-tree &&
+		tree=$(git mktree <bad-tree) &&
 
 		# Check not only that we fail, but that it is due to the
 		# symlink detector; this grep string comes from the config
 		# variable name and will not be translated.
 		test_must_fail git fsck 2>output &&
-		test_i18ngrep gitmodulesSymlink output
+		test_i18ngrep "tree $tree: gitmodulesSymlink" output
 	)
 '