[v3,6/8] packed-backend: add "packed-refs" entry consistency check

Message ID Z6RPzIGD-fSwIEPV@ArchLinux (mailing list archive)
State Superseded
Series add more ref consistency checks

Commit Message

shejialuo Feb. 6, 2025, 5:59 a.m. UTC
"packed-backend.c::next_record" will parse the ref entry to check the
consistency. This function has already checked the following things:

1. Parse the main line of the ref entry and check that the oid is
   valid. Then check that the next character is a space. Finally,
   check the refname.
2. If the next line starts with '^', parse the peeled oid and check
   that it is immediately followed by '\n'.

As we have decided to implement ref consistency checks for
"packed-refs", let's port these two checks and update the test to
exercise the code.
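
The two checks described above can be sketched in standalone C. This is
only an illustration, not the code in this patch: it assumes SHA-1 (an
oid is 40 hex digits), it does not validate the refname format, and all
of the "sketch_*" names are hypothetical. In both functions "eol" points
at the '\n' that ends the line.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: SHA-1 repository, so an oid is 40 hex digits. */
#define SKETCH_HEXSZ 40

enum sketch_result {
	SKETCH_OK = 0,
	SKETCH_BAD_OID,
	SKETCH_NO_SPACE,
	SKETCH_EMPTY_REFNAME,
	SKETCH_TRAILING_GARBAGE
};

/* Return 1 if [p, eol) begins with SKETCH_HEXSZ hex digits. */
static int sketch_has_oid(const char *p, const char *eol)
{
	size_t i;

	if ((size_t)(eol - p) < SKETCH_HEXSZ)
		return 0;
	for (i = 0; i < SKETCH_HEXSZ; i++)
		if (!isxdigit((unsigned char)p[i]))
			return 0;
	return 1;
}

/* Check a main line of the form "<oid> <refname>". */
static enum sketch_result sketch_check_main_line(const char *start,
						 const char *eol)
{
	const char *p = start;

	if (!sketch_has_oid(p, eol))
		return SKETCH_BAD_OID;
	p += SKETCH_HEXSZ;
	if (p == eol || *p != ' ')
		return SKETCH_NO_SPACE;
	p++;
	if (p == eol)
		return SKETCH_EMPTY_REFNAME;
	/* A real check would also validate the refname format here. */
	return SKETCH_OK;
}

/*
 * Check a peeled line of the form "^<oid>". The caller must have
 * verified that the line starts with '^'.
 */
static enum sketch_result sketch_check_peeled_line(const char *start,
						   const char *eol)
{
	const char *p = start + 1; /* skip the '^' */

	if (!sketch_has_oid(p, eol))
		return SKETCH_BAD_OID;
	p += SKETCH_HEXSZ;
	if (p != eol)
		return SKETCH_TRAILING_GARBAGE;
	return SKETCH_OK;
}
```

A caller would locate "eol" with something like memchr() and dispatch to
sketch_check_peeled_line() only when the line begins with '^'.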

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/packed-backend.c         | 95 ++++++++++++++++++++++++++++++++++-
 t/t0602-reffiles-fsck.sh      | 42 ++++++++++++++++
 4 files changed, 140 insertions(+), 1 deletion(-)

Comments

Patrick Steinhardt Feb. 12, 2025, 9:56 a.m. UTC | #1
On Thu, Feb 06, 2025 at 01:59:40PM +0800, shejialuo wrote:
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index c8bb93bb18..658f6bc7da 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1826,6 +1899,26 @@ static int packed_fsck_ref_content(struct fsck_options *o,
>  		line_number++;
>  	}
>  
> +	while (start < eof) {
> +		strbuf_reset(&packed_entry);
> +		strbuf_addf(&packed_entry, "packed-refs line %lu", line_number);

Instead of greedily computing the name of the line, can we pass in the
line number? The motivation is that in a well-formatted packed-refs file
we won't ever need this string at all, so it's wasteful to proactively
compute it for every single line.
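
To illustrate the idea, here is a rough standalone sketch, not a drop-in
for the patch. The struct, the function names and the "is this line
bad" test are all hypothetical stand-ins; the point is only that the
"packed-refs line N" string is formatted on the error path and never for
a good line.

```c
#include <stdio.h>

/* Hypothetical context: carries the line number, not a formatted string. */
struct sketch_line_ctx {
	unsigned long line_number;
	unsigned long labels_built; /* how many times the label was formatted */
};

/* Format "packed-refs line N" on demand; meant for the error path only. */
static const char *sketch_line_label(struct sketch_line_ctx *ctx,
				     char *buf, size_t len)
{
	snprintf(buf, len, "packed-refs line %lu", ctx->line_number);
	ctx->labels_built++;
	return buf;
}

/*
 * Check one line. As a stand-in for the real checks, a line is "bad"
 * when it is empty or starts with a space. Returns 0 for a good line
 * and -1 for a bad one, in which case a message is written to err.
 */
static int sketch_check_line(struct sketch_line_ctx *ctx, const char *line,
			     char *err, size_t errlen)
{
	char label[64];

	if (line[0] != '\0' && line[0] != ' ')
		return 0; /* good line: nothing was formatted */

	snprintf(err, errlen, "error: %s: badPackedRefEntry",
		 sketch_line_label(ctx, label, sizeof(label)));
	return -1;
}
```

With this shape the loop only has to bump ctx.line_number per line, and
a well-formed file never pays for the string formatting.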

> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> index da321f16c6..3ab6b5bba5 100755
> --- a/t/t0602-reffiles-fsck.sh
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -664,4 +664,46 @@ test_expect_success 'packed-refs header should be checked' '
>  	)
>  '
>  
> +test_expect_success 'packed-refs content should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit default &&
> +		git branch branch-1 &&
> +		git branch branch-2 &&
> +		git tag -a annotated-tag-1 -m tag-1 &&
> +		git tag -a annotated-tag-2 -m tag-2 &&
> +
> +		branch_1_oid=$(git rev-parse branch-1) &&
> +		branch_2_oid=$(git rev-parse branch-2) &&
> +		tag_1_oid=$(git rev-parse annotated-tag-1) &&
> +		tag_2_oid=$(git rev-parse annotated-tag-2) &&
> +		tag_1_peeled_oid=$(git rev-parse annotated-tag-1^{}) &&
> +		tag_2_peeled_oid=$(git rev-parse annotated-tag-2^{}) &&
> +		short_oid=$(printf "%s" $tag_1_peeled_oid | cut -c 1-4) &&
> +
> +		printf "# pack-refs with: peeled fully-peeled sorted \n"  >.git/packed-refs &&
> +		printf "%s\n" "$short_oid refs/heads/branch-1" >>.git/packed-refs &&
> +		printf "%sx\n" "$branch_1_oid" >>.git/packed-refs &&
> +		printf "%s   refs/heads/bad-branch\n" "$branch_2_oid" >>.git/packed-refs &&
> +		printf "%s refs/heads/branch.\n" "$branch_2_oid" >>.git/packed-refs &&
> +		printf "%s refs/tags/annotated-tag-3\n" "$tag_1_oid" >>.git/packed-refs &&
> +		printf "^%s\n" "$short_oid" >>.git/packed-refs &&
> +		printf "%s refs/tags/annotated-tag-4.\n" "$tag_2_oid" >>.git/packed-refs &&
> +		printf "^%s garbage\n" "$tag_2_peeled_oid" >>.git/packed-refs &&

This can be simplified using HERE docs.

        cat >.git/packed-refs <<-EOF &&
        # pack-refs with: peeled fully-peeled sorted 
        $short_oid refs/heads/branch-1
        ${branch_1_oid}x
        $branch_2_oid   refs/heads/bad-branch
        $branch_2_oid refs/heads/branch.
        $tag_1_oid refs/tags/annotated-tag-3
        ^$short_oid
        $tag_2_oid refs/tags/annotated-tag-4.
        ^$tag_2_peeled_oid garbage
        EOF

Patrick
shejialuo Feb. 12, 2025, 10:18 a.m. UTC | #2
On Wed, Feb 12, 2025 at 10:56:50AM +0100, Patrick Steinhardt wrote:
> On Thu, Feb 06, 2025 at 01:59:40PM +0800, shejialuo wrote:
> > diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> > index c8bb93bb18..658f6bc7da 100644
> > --- a/refs/packed-backend.c
> > +++ b/refs/packed-backend.c
> > @@ -1826,6 +1899,26 @@ static int packed_fsck_ref_content(struct fsck_options *o,
> >  		line_number++;
> >  	}
> >  
> > +	while (start < eof) {
> > +		strbuf_reset(&packed_entry);
> > +		strbuf_addf(&packed_entry, "packed-refs line %lu", line_number);
> 
> Instead of greedily computing the name of the line, can we pass in the
> line number? The motivation is that in a well-formatted packed-refs file
> we won't ever need this string at all, so it's wasteful to proactively
> compute it for every single line.
> 

I agree with you here, and I already have an idea of how to do this.
Let me improve this in the next version.

> > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> > index da321f16c6..3ab6b5bba5 100755
> > --- a/t/t0602-reffiles-fsck.sh
> > +++ b/t/t0602-reffiles-fsck.sh
> > @@ -664,4 +664,46 @@ test_expect_success 'packed-refs header should be checked' '
> >  	)
> >  '
> >  
> > +test_expect_success 'packed-refs content should be checked' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		test_commit default &&
> > +		git branch branch-1 &&
> > +		git branch branch-2 &&
> > +		git tag -a annotated-tag-1 -m tag-1 &&
> > +		git tag -a annotated-tag-2 -m tag-2 &&
> > +
> > +		branch_1_oid=$(git rev-parse branch-1) &&
> > +		branch_2_oid=$(git rev-parse branch-2) &&
> > +		tag_1_oid=$(git rev-parse annotated-tag-1) &&
> > +		tag_2_oid=$(git rev-parse annotated-tag-2) &&
> > +		tag_1_peeled_oid=$(git rev-parse annotated-tag-1^{}) &&
> > +		tag_2_peeled_oid=$(git rev-parse annotated-tag-2^{}) &&
> > +		short_oid=$(printf "%s" $tag_1_peeled_oid | cut -c 1-4) &&
> > +
> > +		printf "# pack-refs with: peeled fully-peeled sorted \n"  >.git/packed-refs &&
> > +		printf "%s\n" "$short_oid refs/heads/branch-1" >>.git/packed-refs &&
> > +		printf "%sx\n" "$branch_1_oid" >>.git/packed-refs &&
> > +		printf "%s   refs/heads/bad-branch\n" "$branch_2_oid" >>.git/packed-refs &&
> > +		printf "%s refs/heads/branch.\n" "$branch_2_oid" >>.git/packed-refs &&
> > +		printf "%s refs/tags/annotated-tag-3\n" "$tag_1_oid" >>.git/packed-refs &&
> > +		printf "^%s\n" "$short_oid" >>.git/packed-refs &&
> > +		printf "%s refs/tags/annotated-tag-4.\n" "$tag_2_oid" >>.git/packed-refs &&
> > +		printf "^%s garbage\n" "$tag_2_peeled_oid" >>.git/packed-refs &&
> 
> This can be simplified using HERE docs.
> 
>         cat >.git/packed-refs <<-EOF
>         # pack-refs with: peeled fully-peeled sorted 
>         $short_oid refs/heads/branch-1
>         ${branch_1_oid}x
>         $branch_2_oid   refs/heads/bad-branch
>         $branch_2_oid refs/heads/branch.
>         $tag_1_oid refs/tags/annotated-tag-3
>         ^$short_oid\n
>         $tag_2_oid refs/tags/annotated-tag-4.
>         ^$tag_2_peeled_oid garbage
>         EOF
> 

Thanks for the suggestion, I will improve this in the next version.

> Patrick
Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 11906f90fd..02a7bf0503 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -16,6 +16,9 @@ 
 `badObjectSha1`::
 	(ERROR) An object has a bad sha1.
 
+`badPackedRefEntry`::
+	(ERROR) The "packed-refs" file contains an invalid entry.
+
 `badPackedRefHeader`::
 	(ERROR) The "packed-refs" file contains an invalid
 	header.
diff --git a/fsck.h b/fsck.h
index 67e3c97bc0..14d70f6653 100644
--- a/fsck.h
+++ b/fsck.h
@@ -30,6 +30,7 @@  enum fsck_msg_type {
 	FUNC(BAD_EMAIL, ERROR) \
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
 	FUNC(BAD_PACKED_REF_HEADER, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
 	FUNC(BAD_REF_CONTENT, ERROR) \
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index c8bb93bb18..658f6bc7da 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1809,10 +1809,83 @@  static int packed_fsck_ref_header(struct fsck_options *o,
 	return 0;
 }
 
+static int packed_fsck_ref_peeled_line(struct fsck_options *o,
+				       struct ref_store *ref_store,
+				       struct strbuf *packed_entry,
+				       const char *start, const char *eol)
+{
+	struct fsck_ref_report report = { 0 };
+	struct object_id peeled;
+	const char *p;
+
+	report.path = packed_entry->buf;
+
+	/*
+	 * Skip the '^' and parse the peeled oid.
+	 */
+	start++;
+	if (parse_oid_hex_algop(start, &peeled, &p, ref_store->repo->hash_algo))
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "'%.*s' has invalid peeled oid",
+				       (int)(eol - start), start);
+
+	if (p != eol)
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "has trailing garbage after peeled oid '%.*s'",
+				       (int)(eol - p), p);
+
+	return 0;
+}
+
+static int packed_fsck_ref_main_line(struct fsck_options *o,
+				     struct ref_store *ref_store,
+				     struct strbuf *packed_entry,
+				     struct strbuf *refname,
+				     const char *start, const char *eol)
+{
+	struct fsck_ref_report report = { 0 };
+	struct object_id oid;
+	const char *p;
+
+	report.path = packed_entry->buf;
+
+	if (parse_oid_hex_algop(start, &oid, &p, ref_store->repo->hash_algo))
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "'%.*s' has invalid oid",
+				       (int)(eol - start), start);
+
+	if (p == eol || !isspace(*p))
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "has no space after oid '%s' but with '%.*s'",
+				       oid_to_hex(&oid), (int)(eol - p), p);
+
+	p++;
+	strbuf_reset(refname);
+	strbuf_add(refname, p, eol - p);
+	if (refname_contains_nul(refname))
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "refname '%s' contains NULL binaries",
+				       refname->buf);
+
+	if (check_refname_format(refname->buf, 0))
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_REF_NAME,
+				       "has bad refname '%s'", refname->buf);
+
+	return 0;
+}
+
 static int packed_fsck_ref_content(struct fsck_options *o,
+				   struct ref_store *ref_store,
 				   const char *start, const char *eof)
 {
 	struct strbuf packed_entry = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
 	unsigned long line_number = 1;
 	const char *eol;
 	int ret = 0;
@@ -1826,6 +1899,26 @@  static int packed_fsck_ref_content(struct fsck_options *o,
 		line_number++;
 	}
 
+	while (start < eof) {
+		strbuf_reset(&packed_entry);
+		strbuf_addf(&packed_entry, "packed-refs line %lu", line_number);
+		ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
+		ret |= packed_fsck_ref_main_line(o, ref_store, &packed_entry, &refname, start, eol);
+		start = eol + 1;
+		line_number++;
+		if (start < eof && *start == '^') {
+			strbuf_reset(&packed_entry);
+			strbuf_addf(&packed_entry, "packed-refs line %lu", line_number);
+			ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
+			ret |= packed_fsck_ref_peeled_line(o, ref_store, &packed_entry,
+							   start, eol);
+			start = eol + 1;
+			line_number++;
+		}
+	}
+
+	strbuf_release(&packed_entry);
+	strbuf_release(&refname);
 	strbuf_release(&packed_entry);
 	return ret;
 }
@@ -1873,7 +1966,7 @@  static int packed_fsck(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
-	ret = packed_fsck_ref_content(o, packed_ref_content.buf,
+	ret = packed_fsck_ref_content(o, ref_store, packed_ref_content.buf,
 				      packed_ref_content.buf + packed_ref_content.len);
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index da321f16c6..3ab6b5bba5 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -664,4 +664,46 @@  test_expect_success 'packed-refs header should be checked' '
 	)
 '
 
+test_expect_success 'packed-refs content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit default &&
+		git branch branch-1 &&
+		git branch branch-2 &&
+		git tag -a annotated-tag-1 -m tag-1 &&
+		git tag -a annotated-tag-2 -m tag-2 &&
+
+		branch_1_oid=$(git rev-parse branch-1) &&
+		branch_2_oid=$(git rev-parse branch-2) &&
+		tag_1_oid=$(git rev-parse annotated-tag-1) &&
+		tag_2_oid=$(git rev-parse annotated-tag-2) &&
+		tag_1_peeled_oid=$(git rev-parse annotated-tag-1^{}) &&
+		tag_2_peeled_oid=$(git rev-parse annotated-tag-2^{}) &&
+		short_oid=$(printf "%s" $tag_1_peeled_oid | cut -c 1-4) &&
+
+		printf "# pack-refs with: peeled fully-peeled sorted \n"  >.git/packed-refs &&
+		printf "%s\n" "$short_oid refs/heads/branch-1" >>.git/packed-refs &&
+		printf "%sx\n" "$branch_1_oid" >>.git/packed-refs &&
+		printf "%s   refs/heads/bad-branch\n" "$branch_2_oid" >>.git/packed-refs &&
+		printf "%s refs/heads/branch.\n" "$branch_2_oid" >>.git/packed-refs &&
+		printf "%s refs/tags/annotated-tag-3\n" "$tag_1_oid" >>.git/packed-refs &&
+		printf "^%s\n" "$short_oid" >>.git/packed-refs &&
+		printf "%s refs/tags/annotated-tag-4.\n" "$tag_2_oid" >>.git/packed-refs &&
+		printf "^%s garbage\n" "$tag_2_peeled_oid" >>.git/packed-refs &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: packed-refs line 2: badPackedRefEntry: '\''$short_oid refs/heads/branch-1'\'' has invalid oid
+		error: packed-refs line 3: badPackedRefEntry: has no space after oid '\''$branch_1_oid'\'' but with '\''x'\''
+		error: packed-refs line 4: badRefName: has bad refname '\''  refs/heads/bad-branch'\''
+		error: packed-refs line 5: badRefName: has bad refname '\''refs/heads/branch.'\''
+		error: packed-refs line 7: badPackedRefEntry: '\''$short_oid'\'' has invalid peeled oid
+		error: packed-refs line 8: badRefName: has bad refname '\''refs/tags/annotated-tag-4.'\''
+		error: packed-refs line 9: badPackedRefEntry: has trailing garbage after peeled oid '\'' garbage'\''
+		EOF
+		test_cmp expect err
+	)
+'
+
 test_done