Message ID | 20201005071905.GA2291074@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | forbidding symlinked .gitattributes and .gitignore | expand |
Jeff King wrote: > Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for > fsck_tree(), 2019-10-18) introduced a new "oid" parameter to > fsck_tree(), and we pass it to the report() function when we find > problems. However, that is shadowed within the tree-walking loop by the > existing "oid" variable which we use to store the oid of each tree > entry. As a result, we may report the wrong oid for some problems we > detect within the loop (the entry oid, instead of the tree oid). > > Our tests didn't catch this because they checked only that we found the > expected fsck problem, not that it was attached to the correct object. Oh, goodness. Does this mean we should be similarly checking oid in the rest of the fsck test scripts? (I'm not saying this patch should do so, just curious about what you think on the subject.) > Let's rename both variables in the function to avoid confusion. This > makes the diff a little noisy (e.g., all of the report() calls outside > the loop wee already correct but need touched), but makes sure we catch nit: s/wee/are/, s/need touched/need to be touched/ > all cases and will avoid similar confusion in the future. Thanks for doing that --- it does make reviewing easier. > Signed-off-by: Jeff King <peff@peff.net> > --- > fsck.c | 40 +++++++++++++++++++------------------- > t/t7415-submodule-names.sh | 5 +++-- > 2 files changed, 23 insertions(+), 22 deletions(-) > > diff --git a/fsck.c b/fsck.c > index f82e2fe9e3..46a108839f 100644 > --- a/fsck.c > +++ b/fsck.c > @@ -633,7 +633,7 @@ static int verify_ordered(unsigned mode1, const char *name1, > return c1 < c2 ? 0 : TREE_UNORDERED; > } > > -static int fsck_tree(const struct object_id *oid, > +static int fsck_tree(const struct object_id *tree_oid, optional: we could call it "tree". > const char *buffer, unsigned long size, > struct fsck_options *options) > { > @@ -654,7 +654,7 @@ static int fsck_tree(const struct object_id *oid, > struct name_stack df_dup_candidates = { NULL }; > > if (init_tree_desc_gently(&desc, buffer, size)) { > - retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree"); > + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree"); > return retval; > } > > @@ -664,11 +664,11 @@ static int fsck_tree(const struct object_id *oid, > while (desc.size) { > unsigned short mode; > const char *name, *backslash; > - const struct object_id *oid; > + const struct object_id *entry_oid; > > - oid = tree_entry_extract(&desc, &name, &mode); > + entry_oid = tree_entry_extract(&desc, &name, &mode); optional: could call it "child". > > - has_null_sha1 |= is_null_oid(oid); > + has_null_sha1 |= is_null_oid(entry_oid); > has_full_path |= !!strchr(name, '/'); > has_empty_name |= !*name; > has_dot |= !strcmp(name, "."); > @@ -678,10 +678,10 @@ static int fsck_tree(const struct object_id *oid, > > if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) { > if (!S_ISLNK(mode)) > - oidset_insert(&gitmodules_found, oid); > + oidset_insert(&gitmodules_found, entry_oid); > else > retval += report(options, > - oid, OBJ_TREE, > + tree_oid, OBJ_TREE, > FSCK_MSG_GITMODULES_SYMLINK, > ".gitmodules is a symbolic link"); Right, this is a property of the enclosing tree, not the blob representing the symlink target. Good. not about this patch: Could report() notice when the oid doesn't match the type passed in, to more easily catch this kind of mistake? [...] > @@ -692,9 +692,9 @@ static int fsck_tree(const struct object_id *oid, > has_dotgit |= is_ntfs_dotgit(backslash); > if (is_ntfs_dotgitmodules(backslash)) { > if (!S_ISLNK(mode)) > - oidset_insert(&gitmodules_found, oid); > + oidset_insert(&gitmodules_found, entry_oid); > else > - retval += report(options, oid, OBJ_TREE, > + retval += report(options, tree_oid, OBJ_TREE, > FSCK_MSG_GITMODULES_SYMLINK, > ".gitmodules is a symbolic link"); Likewise, good. [...] > --- a/t/t7415-submodule-names.sh > +++ b/t/t7415-submodule-names.sh > @@ -148,13 +148,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' ' > { > printf "100644 blob $content\t$tricky\n" && > printf "120000 blob $target\t.gitmodules\n" > - } | git mktree && > + } >bad-tree && > + tree=$(git mktree <bad-tree) && > > # Check not only that we fail, but that it is due to the > # symlink detector; this grep string comes from the config > # variable name and will not be translated. > test_must_fail git fsck 2>output && > - test_i18ngrep gitmodulesSymlink output > + test_i18ngrep "tree $tree: gitmodulesSymlink" output Makes sense. By the way, why does GETTEXT_POISON lose the gitmodulesSymlink keyword? Is this just a limitation of GETTEXT_POISON losing information that's passed in with %s? With the commit message tweak mentioned above, Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Thanks.
On Mon, Oct 05, 2020 at 12:44:04AM -0700, Jonathan Nieder wrote: > > Our tests didn't catch this because they checked only that we found the > > expected fsck problem, not that it was attached to the correct object. > > Oh, goodness. Does this mean we should be similarly checking oid in > the rest of the fsck test scripts? (I'm not saying this patch should > do so, just curious about what you think on the subject.) I don't feel strongly either way. It would be a bigger change than you might hope, I'd expect, because it requires collecting the oid of the bad object in the test script as we create it. So I'm not planning to work on it, but if somebody else wants to, be my guest. > > Let's rename both variables in the function to avoid confusion. This > > makes the diff a little noisy (e.g., all of the report() calls outside > > the loop wee already correct but need touched), but makes sure we catch > > nit: s/wee/are/, s/need touched/need to be touched/ Foiled by my last-minute editing. It was originally "are", but I meant to change it to "were". > > -static int fsck_tree(const struct object_id *oid, > > +static int fsck_tree(const struct object_id *tree_oid, > > optional: we could call it "tree". Yeah, I started that way but wondered if it was confusing with a "struct tree" (which we don't have here; the point of the change that introduced the shadowing was getting rid of that). Being longer doesn't hurt too much and is quite clear. > > @@ -664,11 +664,11 @@ static int fsck_tree(const struct object_id *oid, > > while (desc.size) { > > unsigned short mode; > > const char *name, *backslash; > > - const struct object_id *oid; > > + const struct object_id *entry_oid; > > > > - oid = tree_entry_extract(&desc, &name, &mode); > > + entry_oid = tree_entry_extract(&desc, &name, &mode); > > optional: could call it "child". IMHO that's too vague. We have a child name, a child mode, etc. Another option is to just refer to desc.entry.oid directly, but that's longer and seemed a bit too intimate with how tree_entry_extract works. > not about this patch: Could report() notice when the oid doesn't match > the type passed in, to more easily catch this kind of mistake? Hmm. It would incur an extra hash lookup, but since we expect to report() infrequently, I don't think the cost would be too high. I suspect we could even drop the "type" field and have report() figure it out from the oid, but that is working in the opposite direction of your suggestion. ;) I'm not 100% sure we'd always be able to look up such a struct, though. This code path also gets called from index-pack as it's checking objects (so likewise we might not even have something in the object database yet). > > # Check not only that we fail, but that it is due to the > > # symlink detector; this grep string comes from the config > > # variable name and will not be translated. > > test_must_fail git fsck 2>output && > > - test_i18ngrep gitmodulesSymlink output > > + test_i18ngrep "tree $tree: gitmodulesSymlink" output > > Makes sense. > > By the way, why does GETTEXT_POISON lose the gitmodulesSymlink > keyword? Is this just a limitation of GETTEXT_POISON losing > information that's passed in with %s? Yes, I think so. It comes from 674ba34038 (fsck: mark strings for translation, 2018-11-10) which is passing through our string. Arguably that commit made the comment lines rather confusing and pointless. Though hmm. Looks like the "tree %s: %s" part is in the translated string. So a translation _could_ change it, though I'd expect it to only change the words and not the syntax. Maybe an RTL language would. I dunno. -Peff
Jeff King wrote: > On Mon, Oct 05, 2020 at 12:44:04AM -0700, Jonathan Nieder wrote: >> Jeff King wrote: >>> - test_i18ngrep gitmodulesSymlink output >>> + test_i18ngrep "tree $tree: gitmodulesSymlink" output >> >> Makes sense. >> >> By the way, why does GETTEXT_POISON lose the gitmodulesSymlink >> keyword? Is this just a limitation of GETTEXT_POISON losing >> information that's passed in with %s? > > Yes, I think so. It comes from 674ba34038 (fsck: mark strings for > translation, 2018-11-10) which is passing through our string. Arguably > that commit made the comment lines rather confusing and pointless. > > Though hmm. Looks like the "tree %s: %s" part is in the translated > string. So a translation _could_ change it, though I'd expect it to only > change the words and not the syntax. Maybe an RTL language would. I > dunno. Ah, right. I can use $n markers in the format string to reorder parameters, so we don't want scripts to rely on their order. In principle one can imagine GETTEXT_POISON printing a random permutation of its arguments but this kind of nondeterminism in tests would be harder to debug than the current state where the args aren't passed through at all. And after all, the main point of GETTEXT_POISON is to allow us to test scripted commands to make sure we are not translating any output that scripts would want to use. For that purpose, the current poisoning does what we need. (In other words, we don't encourage non-test scripts to rely on these lines at all, even to grep for gitmodulesSymlink.) Thanks, Jonathan
diff --git a/fsck.c b/fsck.c index f82e2fe9e3..46a108839f 100644 --- a/fsck.c +++ b/fsck.c @@ -633,7 +633,7 @@ static int verify_ordered(unsigned mode1, const char *name1, return c1 < c2 ? 0 : TREE_UNORDERED; } -static int fsck_tree(const struct object_id *oid, +static int fsck_tree(const struct object_id *tree_oid, const char *buffer, unsigned long size, struct fsck_options *options) { @@ -654,7 +654,7 @@ static int fsck_tree(const struct object_id *oid, struct name_stack df_dup_candidates = { NULL }; if (init_tree_desc_gently(&desc, buffer, size)) { - retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree"); return retval; } @@ -664,11 +664,11 @@ static int fsck_tree(const struct object_id *oid, while (desc.size) { unsigned short mode; const char *name, *backslash; - const struct object_id *oid; + const struct object_id *entry_oid; - oid = tree_entry_extract(&desc, &name, &mode); + entry_oid = tree_entry_extract(&desc, &name, &mode); - has_null_sha1 |= is_null_oid(oid); + has_null_sha1 |= is_null_oid(entry_oid); has_full_path |= !!strchr(name, '/'); has_empty_name |= !*name; has_dot |= !strcmp(name, "."); @@ -678,10 +678,10 @@ static int fsck_tree(const struct object_id *oid, if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) { if (!S_ISLNK(mode)) - oidset_insert(&gitmodules_found, oid); + oidset_insert(&gitmodules_found, entry_oid); else retval += report(options, - oid, OBJ_TREE, + tree_oid, OBJ_TREE, FSCK_MSG_GITMODULES_SYMLINK, ".gitmodules is a symbolic link"); } @@ -692,9 +692,9 @@ static int fsck_tree(const struct object_id *oid, has_dotgit |= is_ntfs_dotgit(backslash); if (is_ntfs_dotgitmodules(backslash)) { if (!S_ISLNK(mode)) - oidset_insert(&gitmodules_found, oid); + oidset_insert(&gitmodules_found, entry_oid); else - retval += report(options, oid, OBJ_TREE, + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_GITMODULES_SYMLINK, ".gitmodules is a symbolic link"); } @@ -703,7 +703,7 @@ static int fsck_tree(const struct object_id *oid, } if (update_tree_entry_gently(&desc)) { - retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree"); break; } @@ -751,25 +751,25 @@ static int fsck_tree(const struct object_id *oid, name_stack_clear(&df_dup_candidates); if (has_null_sha1) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1"); if (has_full_path) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames"); if (has_empty_name) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname"); if (has_dot) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'"); if (has_dotdot) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'"); if (has_dotgit) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'"); if (has_zero_pad) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes"); if (has_bad_modes) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes"); if (has_dup_entries) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries"); if (not_properly_sorted) - retval += report(options, oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted"); + retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted"); return retval; } diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh index f70368bc2e..5c95247180 100755 --- a/t/t7415-submodule-names.sh +++ b/t/t7415-submodule-names.sh @@ -148,13 +148,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' ' { printf "100644 blob $content\t$tricky\n" && printf "120000 blob $target\t.gitmodules\n" - } | git mktree && + } >bad-tree && + tree=$(git mktree <bad-tree) && # Check not only that we fail, but that it is due to the # symlink detector; this grep string comes from the config # variable name and will not be translated. test_must_fail git fsck 2>output && - test_i18ngrep gitmodulesSymlink output + test_i18ngrep "tree $tree: gitmodulesSymlink" output ) '
Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for fsck_tree(), 2019-10-18) introduced a new "oid" parameter to fsck_tree(), and we pass it to the report() function when we find problems. However, that is shadowed within the tree-walking loop by the existing "oid" variable which we use to store the oid of each tree entry. As a result, we may report the wrong oid for some problems we detect within the loop (the entry oid, instead of the tree oid). Our tests didn't catch this because they checked only that we found the expected fsck problem, not that it was attached to the correct object. Let's rename both variables in the function to avoid confusion. This makes the diff a little noisy (e.g., all of the report() calls outside the loop wee already correct but need touched), but makes sure we catch all cases and will avoid similar confusion in the future. Signed-off-by: Jeff King <peff@peff.net> --- fsck.c | 40 +++++++++++++++++++------------------- t/t7415-submodule-names.sh | 5 +++-- 2 files changed, 23 insertions(+), 22 deletions(-)