diff mbox series

[GSoC,2/2] refs: add name and content check for file backend

Message ID 20240530122753.1114818-3-shejialuo@gmail.com (mailing list archive)
State Superseded
Headers show
Series ref consistency check infra setup | expand

Commit Message

shejialuo May 30, 2024, 12:27 p.m. UTC
The existing git-fsck does not fully check bad format ref name such as
standalone '@' or name ending with ".lock". And also `git-fsck` does not
fully check ref content, the following contents will not be reported by
the original git-fsck command.

1. "c71dba5654404b9b0d378a6cbff1b743b9d31a34 garbage": with trailing
   characters.
2. "c71dba5654404b9b0d378a6cbff1b743b9d31a34    ": with trailing empty
   characters.
3. "C71DBA5654404b9b0d378a6cbff1b743b9d31A34": with uppercase hex.
4. "z71dba5654404b9b0d378a6cbff1b743b9d31a34": with not hex character.

Provide functionality to support such consistency checks for branch and
tag refs and add a new unit test to verify this functionality.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c     | 112 ++++++++++++++++++++++++++++++++-
 t/t0602-reffiles-fsck.sh | 132 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 243 insertions(+), 1 deletion(-)
 create mode 100755 t/t0602-reffiles-fsck.sh

Comments

Junio C Hamano May 31, 2024, 2:23 p.m. UTC | #1
shejialuo <shejialuo@gmail.com> writes:

> +static int files_fsck_refs_content(const char *refs_check_dir,
> +				struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;

The caller makes sure that this gets called only on a regular file,
so ...

> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {

... the use of strbuf_read_file() is fine (i.e. we do not have to
worry about the iter->path to be pointing at a symbolic link) here.

> +		error(_("%s/%s: unable to read the ref"), refs_check_dir, iter->basename);
> +		goto clean;
> +	}
> +
> +	/*
> +	 * Case 1: check if the ref content length is valid and the last
> +	 * character is a newline.
> +	 */
> +	if (ref_content.len != the_hash_algo->hexsz + 1 ||
> +			ref_content.buf[ref_content.len - 1] != '\n') {

Funny indentation.

	if (ref_content.len != the_hash_algo->hexsz + 1 ||
	    ref_content.buf[ref_content.len - 1] != '\n') {

Also, we do not want {braces} around a single statement block.

In any case, the two checks are good ONLY for regular refs and not
for symbolic refs.  The users are free to create symbolic refs next
to their branches, e.g. here is a way to say, "among the maintenance
tracks maint-2.30, maint-2.31, ... maint-2.44, maint-2.45, what I
consider the primary maintenance track is currently maint-2.45":

    $ git symbolic-ref refs/heads/maint
    refs/heads/maint-2.45

and if your directory walk encounters such a symbolic ref, your
strbuf_read_file() will read something like:

    $ cat .git/refs/heads/maint
    ref: refs/heads/maint-2.45

Also, the caller also needs to be prepared to find a real symbolic
link that is used as a symbolic ref, but for them you'd need to do a
readlink() and the expected contents would not have "ref: " prefix,
so the code to implement actual check would have to be different.
The way how refs/files-backend.c:read_ref_internal() handles S_ISLNK
would be illuminating.

> +		goto failure;
> +	}
> +	/*
> +	 * Case 2: the content should be range of [0-9a-f].
> +	 */
> +	for (size_t i = 0; i < the_hash_algo->hexsz; i++) {
> +		if (!isdigit(ref_content.buf[i]) &&
> +				(ref_content.buf[i] < 'a' || ref_content.buf[i] > 'f')) {
> +			goto failure;

I do not think it is a good idea to suddenly redefine what a valid
way to write object names in a loose ref file after ~20 years.
Given the popularity of Git, I would not be surprised at all if a
third-party tool or two have been writing their own refs with
uppercase hexadecimal and we have been happily using them as
everybody expects.  It is a good idea to be strict when you are a
producer, and we do write object names always in lowercase hex, but
we are lenient when consuming what is possibly written by others.
As long as the hexadecimal string names an existing object
(otherwise we have a dangling ref which would be another kind of
error, I presume), it should be at most FSCK_WARN when it does not
look like what _we_ wrote (e.g., if it uses uppercase hex, or if it
has extra bytes after the 40-hex (or 64-hex) other than the final
LF).  If it is shorter, of course that is an outright FSCK_ERROR.

How well does this interact with the fsck error levels (aka
fsck_msg_type), by the way?  It should be made to work well if the
current design does not.

> +		}
> +	}
> +
> +	strbuf_release(&ref_content);
> +	return 0;
> +
> +failure:
> +	error(_("%s/%s: invalid ref content"), refs_check_dir, iter->basename);

In addition, when we honor fsck error levels, such an unconditional
call to "error" may become an issue.  If the user says a certain
kind of anomaly is tolerated by setting a specific finding to
FSCK_IGNORE, we should be silent about our finding.

> +clean:
> +	strbuf_release(&ref_content);
> +	return -1;
> +}
> +
> +static int files_fsck_refs(struct ref_store *ref_store,
> +				const char* refs_check_dir,
> +				files_fsck_refs_fn *fsck_refs_fns)
> +{
> +	struct dir_iterator *iter;
> +	struct strbuf sb = STRBUF_INIT;
> +	int ret = 0;
> +	int iter_status;
> +
> +	strbuf_addf(&sb, "%s/%s", ref_store->gitdir, refs_check_dir);
> +
> +	iter = dir_iterator_begin(sb.buf, 0);
> +
> +	/*
> +	 * The current implementation does not care about the worktree, the worktree
> +	 * may have no refs/heads or refs/tags directory. Simply return 0 now.
> +	*/
> +	if (!iter) {
> +		return 0;
> +	}
> +
> +	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
> +		if (S_ISDIR(iter->st.st_mode)) {
> +			continue;
> +		} else if (S_ISREG(iter->st.st_mode)) {
> +			for (files_fsck_refs_fn *fsck_refs_fn = fsck_refs_fns;
> +					*fsck_refs_fn; fsck_refs_fn++) {

Funny indentation (again---the patch 2/2 has funny "two extra HT"
indentation style all over the place).  Write it like so:

			for (files_fsck_refs_fn *fn = fsck_refs_fns;
			     fn;
                             fn++)

> +				ret |= (*fsck_refs_fn)(refs_check_dir, iter);

A pointer to a function does not need to be dereferenced explicitly
like so.  Also, writing

				if (fn(refs_check_dir, iter))
					ret = -1;

would be more consistent with the rest of the code around here,
which does not OR int ret but explicitly set it to -1 when any
helper function detects an error.

> +			}
> +		} else {
> +			error(_("unexpected file type for '%s'"), iter->basename);
> +			ret = -1;

This is wrong.  A symbolic link is a valid symbolic ref, even though
we no longer create such a symbolic ref by default.

> +		}
> +	}
> +
> +	if (iter_status != ITER_DONE) {
> +		ret = -1;
> +		error(_("failed to iterate over '%s'"), sb.buf);
> +	}
> +
> +	strbuf_release(&sb);
> +
> +	return ret;
> +}
> +
> +static int files_fsck(struct ref_store *ref_store)
> +{
> +	int ret = 0;
> +
> +	files_fsck_refs_fn fsck_refs_fns[] = {
> +		files_fsck_refs_name,
> +		files_fsck_refs_content,
> +		NULL
> +	};
> +
> +	ret = files_fsck_refs(ref_store, "refs/heads",fsck_refs_fns)

Missing SP after a comma.

> +	    | files_fsck_refs(ref_store, "refs/tags", fsck_refs_fns);

Why only these two hierarchies?  Shouldn't it also be checking the
remote tracking branches and notes?

> +	return ret;
> +}
> +
>  struct ref_storage_be refs_be_files = {
>  	.name = "files",
>  	.init = files_ref_store_create,
shejialuo May 31, 2024, 4 p.m. UTC | #2
On Fri, May 31, 2024 at 07:23:31AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > +static int files_fsck_refs_content(const char *refs_check_dir,
> > +				struct dir_iterator *iter)
> > +{
> > +	struct strbuf ref_content = STRBUF_INIT;
> 
> The caller makes sure that this gets called only on a regular file,
> so ...
> 
> > +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> 
> ... the use of strbuf_read_file() is fine (i.e. we do not have to
> worry about the iter->path to be pointing at a symbolic link) here.
> 
> > +		error(_("%s/%s: unable to read the ref"), refs_check_dir, iter->basename);
> > +		goto clean;
> > +	}
> > +
> > +	/*
> > +	 * Case 1: check if the ref content length is valid and the last
> > +	 * character is a newline.
> > +	 */
> > +	if (ref_content.len != the_hash_algo->hexsz + 1 ||
> > +			ref_content.buf[ref_content.len - 1] != '\n') {
> 
> Funny indentation.
> 
> 	if (ref_content.len != the_hash_algo->hexsz + 1 ||
> 	    ref_content.buf[ref_content.len - 1] != '\n') {
> 
> Also, we do not want {braces} around a single statement block.
> 

I will clean the code. I am used to applying one extra HT.

> In any case, the two checks are good ONLY for regular refs and not
> for symbolic refs.  The users are free to create symbolic refs next
> to their branches, e.g. here is a way to say, "among the maintenance
> tracks maint-2.30, maint-2.31, ... maint-2.44, maint-2.45, what I
> consider the primary maintenance track is currently maint-2.45":
> 
>     $ git symbolic-ref refs/heads/maint
>     refs/heads/maint-2.45
> 
> and if your directory walk encounters such a symbolic ref, your
> strbuf_read_file() will read something like:
> 
>     $ cat .git/refs/heads/maint
>     ref: refs/heads/maint-2.45
> 

Yes, currently I intentionally ignore the symbolic ref. My original idea
was to make this patch as small as possible to set up the infrastructure
of the ref consistence check.

So this is why I only choose "refs/heads" and "refs/tags" at current.
The "refs/remote" will contain "HEAD" which is a symbolic ref. So I
ignore "refs/remote".

I just thought git would ONLY have one symbolic ref "HEAD". I will add
more patches in the next version to support the symbolic ref checks.

> Also, the caller also needs to be prepared to find a real symbolic
> link that is used as a symbolic ref, but for them you'd need to do a
> readlink() and the expected contents would not have "ref: " prefix,
> so the code to implement actual check would have to be different.
> The way how refs/files-backend.c:read_ref_internal() handles S_ISLNK
> would be illuminating.
> 

I will handle this in the next version. However, I am a little curious
about the reason why people use a real symbolic in git. git has
provided the symbolic ref which should be more consistent and generic
against the OS.

> > +		goto failure;
> > +	}
> > +	/*
> > +	 * Case 2: the content should be range of [0-9a-f].
> > +	 */
> > +	for (size_t i = 0; i < the_hash_algo->hexsz; i++) {
> > +		if (!isdigit(ref_content.buf[i]) &&
> > +				(ref_content.buf[i] < 'a' || ref_content.buf[i] > 'f')) {
> > +			goto failure;
> 
> I do not think it is a good idea to suddenly redefine what a valid
> way to write object names in a loose ref file after ~20 years.
> Given the popularity of Git, I would not be surprised at all if a
> third-party tool or two have been writing their own refs with
> uppercase hexadecimal and we have been happily using them as
> everybody expects.  It is a good idea to be strict when you are a
> producer, and we do write object names always in lowercase hex, but
> we are lenient when consuming what is possibly written by others.
> As long as the hexadecimal string names an existing object
> (otherwise we have a dangling ref which would be another kind of
> error, I presume), it should be at most FSCK_WARN when it does not
> look like what _we_ wrote (e.g., if it uses uppercase hex, or if it
> has extra bytes after the 40-hex (or 64-hex) other than the final
> LF).  If it is shorter, of course that is an outright FSCK_ERROR.
> 

Yes, actually I have done some experiments using git-fsck. I am really
surprised by the fact that the few bytes will cause FSCK_ERROR. But the
situation with extra bytes are OK. I don't know this design is
intentional.

> How well does this interact with the fsck error levels (aka
> fsck_msg_type), by the way?  It should be made to work well if the
> current design does not.
> 

Unfortunately, I totally ignored this. I will try to implement the fsck
error levels in the next version.

> > +		}
> > +	}
> > +
> > +	strbuf_release(&ref_content);
> > +	return 0;
> > +
> > +failure:
> > +	error(_("%s/%s: invalid ref content"), refs_check_dir, iter->basename);
> 
> In addition, when we honor fsck error levels, such an unconditional
> call to "error" may become an issue.  If the user says a certain
> kind of anomaly is tolerated by setting a specific finding to
> FSCK_IGNORE, we should be silent about our finding.
> 

Yes, I will add the code to support this.

> > +clean:
> > +	strbuf_release(&ref_content);
> > +	return -1;
> > +}
> > +
> > +static int files_fsck_refs(struct ref_store *ref_store,
> > +				const char* refs_check_dir,
> > +				files_fsck_refs_fn *fsck_refs_fns)
> > +{
> > +	struct dir_iterator *iter;
> > +	struct strbuf sb = STRBUF_INIT;
> > +	int ret = 0;
> > +	int iter_status;
> > +
> > +	strbuf_addf(&sb, "%s/%s", ref_store->gitdir, refs_check_dir);
> > +
> > +	iter = dir_iterator_begin(sb.buf, 0);
> > +
> > +	/*
> > +	 * The current implementation does not care about the worktree, the worktree
> > +	 * may have no refs/heads or refs/tags directory. Simply return 0 now.
> > +	*/
> > +	if (!iter) {
> > +		return 0;
> > +	}
> > +
> > +	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
> > +		if (S_ISDIR(iter->st.st_mode)) {
> > +			continue;
> > +		} else if (S_ISREG(iter->st.st_mode)) {
> > +			for (files_fsck_refs_fn *fsck_refs_fn = fsck_refs_fns;
> > +					*fsck_refs_fn; fsck_refs_fn++) {
> 
> Funny indentation (again---the patch 2/2 has funny "two extra HT"
> indentation style all over the place).  Write it like so:
> 
> 			for (files_fsck_refs_fn *fn = fsck_refs_fns;
> 			     fn;
>                              fn++)
> 
> > +				ret |= (*fsck_refs_fn)(refs_check_dir, iter);
> 
> A pointer to a function does not need to be dereferenced explicitly
> like so.  Also, writing
> 
> 				if (fn(refs_check_dir, iter))
> 					ret = -1;
> 
> would be more consistent with the rest of the code around here,
> which does not OR int ret but explicitly set it to -1 when any
> helper function detects an error.
> 

I will handle this in the next version.

> > +			}
> > +		} else {
> > +			error(_("unexpected file type for '%s'"), iter->basename);
> > +			ret = -1;
> 
> This is wrong.  A symbolic link is a valid symbolic ref, even though
> we no longer create such a symbolic ref by default.
> 

As above said, I have thought git should not handle this situation. I
will implement support for real symbolic file.

> > +		}
> > +	}
> > +
> > +	if (iter_status != ITER_DONE) {
> > +		ret = -1;
> > +		error(_("failed to iterate over '%s'"), sb.buf);
> > +	}
> > +
> > +	strbuf_release(&sb);
> > +
> > +	return ret;
> > +}
> > +
> > +static int files_fsck(struct ref_store *ref_store)
> > +{
> > +	int ret = 0;
> > +
> > +	files_fsck_refs_fn fsck_refs_fns[] = {
> > +		files_fsck_refs_name,
> > +		files_fsck_refs_content,
> > +		NULL
> > +	};
> > +
> > +	ret = files_fsck_refs(ref_store, "refs/heads",fsck_refs_fns)
> 
> Missing SP after a comma.
> 

I will fix it in the next version.

> > +	    | files_fsck_refs(ref_store, "refs/tags", fsck_refs_fns);
> 
> Why only these two hierarchies?  Shouldn't it also be checking the
> remote tracking branches and notes?
> 

I have explained above, because I intentionally ignored the symbolic ref
in this patch, so I don't handle "refs/remote" and notes. I guess I will
handle these situations in the next patch.

> > +	return ret;
> > +}
> > +
> >  struct ref_storage_be refs_be_files = {
> >  	.name = "files",
> >  	.init = files_ref_store_create,

Thanks for your dedicated reviews. I have a more clean direction to
setup the consistence check for file backend.

Thanks,
Jialuo
Eric Sunshine May 31, 2024, 6:31 p.m. UTC | #3
On Fri, May 31, 2024 at 10:24 AM Junio C Hamano <gitster@pobox.com> wrote:
> shejialuo <shejialuo@gmail.com> writes:
> > +                     error(_("unexpected file type for '%s'"), iter->basename);
> > +                     ret = -1;
>
> This is wrong.  A symbolic link is a valid symbolic ref, even though
> we no longer create such a symbolic ref by default.

Very minor additional note: error() unconditionally returns -1 in
order to support this pattern:

    ret = error(_("..."));
Patrick Steinhardt June 3, 2024, 7:22 a.m. UTC | #4
On Thu, May 30, 2024 at 08:27:53PM +0800, shejialuo wrote:
> The existing git-fsck does not fully check bad format ref name such as
> standalone '@' or name ending with ".lock". And also `git-fsck` does not
> fully check ref content, the following contents will not be reported by
> the original git-fsck command.
> 
> 1. "c71dba5654404b9b0d378a6cbff1b743b9d31a34 garbage": with trailing
>    characters.
> 2. "c71dba5654404b9b0d378a6cbff1b743b9d31a34    ": with trailing empty
>    characters.
> 3. "C71DBA5654404b9b0d378a6cbff1b743b9d31A34": with uppercase hex.
> 4. "z71dba5654404b9b0d378a6cbff1b743b9d31a34": with not hex character.
> 
> Provide functionality to support such consistency checks for branch and
> tag refs and add a new unit test to verify this functionality.

I would recommend to only introduce one check per commit. This commit
introduces multiple different checks to the files backend, which makes
it harder to review.

[snip]
> +static int files_fsck_refs_content(const char *refs_check_dir,
> +				struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +
> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		error(_("%s/%s: unable to read the ref"), refs_check_dir, iter->basename);
> +		goto clean;
> +	}
> +
> +	/*
> +	 * Case 1: check if the ref content length is valid and the last
> +	 * character is a newline.
> +	 */
> +	if (ref_content.len != the_hash_algo->hexsz + 1 ||
> +			ref_content.buf[ref_content.len - 1] != '\n') {
> +		goto failure;
> +	}
> +
> +	/*
> +	 * Case 2: the content should be range of [0-9a-f].
> +	 */
> +	for (size_t i = 0; i < the_hash_algo->hexsz; i++) {
> +		if (!isdigit(ref_content.buf[i]) &&
> +				(ref_content.buf[i] < 'a' || ref_content.buf[i] > 'f')) {
> +			goto failure;
> +		}
> +	}
> +
> +	strbuf_release(&ref_content);
> +	return 0;
> +
> +failure:
> +	error(_("%s/%s: invalid ref content"), refs_check_dir, iter->basename);
> +
> +clean:
> +	strbuf_release(&ref_content);
> +	return -1;

Can we reuse `parse_loose_ref_contents()` to get rid of much of the
duplication here? That function would need to learn to indicate to the
caller that there is trailing data, but other than that it's already as
strict as we want it to be.

Other than that, Junio has already added quite a bunch of comments, so
I'll refrain from repeating them here.

Thanks!

Patrick
diff mbox series

Patch

diff --git a/refs/files-backend.c b/refs/files-backend.c
index b6147c588b..3c811f5c03 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3299,11 +3299,121 @@  static int files_init_db(struct ref_store *ref_store,
 	return 0;
 }
 
-static int files_fsck(struct ref_store *ref_store)
+typedef int (*files_fsck_refs_fn)(const char *refs_check_dir,
+				struct dir_iterator *iter);
+
+static int files_fsck_refs_name(const char *refs_check_dir,
+				struct dir_iterator *iter)
 {
+	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
+		error(_("%s/%s: invalid refname format"), refs_check_dir, iter->basename);
+		return -1;
+	}
+
 	return 0;
 }
 
+static int files_fsck_refs_content(const char *refs_check_dir,
+				struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		error(_("%s/%s: unable to read the ref"), refs_check_dir, iter->basename);
+		goto clean;
+	}
+
+	/*
+	 * Case 1: check if the ref content length is valid and the last
+	 * character is a newline.
+	 */
+	if (ref_content.len != the_hash_algo->hexsz + 1 ||
+			ref_content.buf[ref_content.len - 1] != '\n') {
+		goto failure;
+	}
+
+	/*
+	 * Case 2: the content should be range of [0-9a-f].
+	 */
+	for (size_t i = 0; i < the_hash_algo->hexsz; i++) {
+		if (!isdigit(ref_content.buf[i]) &&
+				(ref_content.buf[i] < 'a' || ref_content.buf[i] > 'f')) {
+			goto failure;
+		}
+	}
+
+	strbuf_release(&ref_content);
+	return 0;
+
+failure:
+	error(_("%s/%s: invalid ref content"), refs_check_dir, iter->basename);
+
+clean:
+	strbuf_release(&ref_content);
+	return -1;
+}
+
+static int files_fsck_refs(struct ref_store *ref_store,
+				const char* refs_check_dir,
+				files_fsck_refs_fn *fsck_refs_fns)
+{
+	struct dir_iterator *iter;
+	struct strbuf sb = STRBUF_INIT;
+	int ret = 0;
+	int iter_status;
+
+	strbuf_addf(&sb, "%s/%s", ref_store->gitdir, refs_check_dir);
+
+	iter = dir_iterator_begin(sb.buf, 0);
+
+	/*
+	 * The current implementation does not care about the worktree, the worktree
+	 * may have no refs/heads or refs/tags directory. Simply return 0 now.
+	*/
+	if (!iter) {
+		return 0;
+	}
+
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
+		if (S_ISDIR(iter->st.st_mode)) {
+			continue;
+		} else if (S_ISREG(iter->st.st_mode)) {
+			for (files_fsck_refs_fn *fsck_refs_fn = fsck_refs_fns;
+					*fsck_refs_fn; fsck_refs_fn++) {
+				ret |= (*fsck_refs_fn)(refs_check_dir, iter);
+			}
+		} else {
+			error(_("unexpected file type for '%s'"), iter->basename);
+			ret = -1;
+		}
+	}
+
+	if (iter_status != ITER_DONE) {
+		ret = -1;
+		error(_("failed to iterate over '%s'"), sb.buf);
+	}
+
+	strbuf_release(&sb);
+
+	return ret;
+}
+
+static int files_fsck(struct ref_store *ref_store)
+{
+	int ret = 0;
+
+	files_fsck_refs_fn fsck_refs_fns[] = {
+		files_fsck_refs_name,
+		files_fsck_refs_content,
+		NULL
+	};
+
+	ret = files_fsck_refs(ref_store, "refs/heads",fsck_refs_fns)
+	    | files_fsck_refs(ref_store, "refs/tags", fsck_refs_fns);
+
+	return ret;
+}
+
 struct ref_storage_be refs_be_files = {
 	.name = "files",
 	.init = files_ref_store_create,
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
new file mode 100755
index 0000000000..1c6c3804ff
--- /dev/null
+++ b/t/t0602-reffiles-fsck.sh
@@ -0,0 +1,132 @@ 
+#!/bin/bash
+
+test_description='Test reffiles backend consistency check'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+GIT_TEST_DEFAULT_REF_FORMAT=files
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+test_expect_success 'ref name should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+		git checkout -b branch-1 &&
+		git tag tag-1 &&
+		git commit --allow-empty -m second &&
+		git checkout -b branch-2 &&
+		git tag tag-2
+	) &&
+	(
+		cd repo &&
+		cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/.branch-1: invalid refname format
+		error: ref database is corrupt
+		EOF
+		rm $branch_dir_prefix/.branch-1 &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/tag-1.lock: invalid refname format
+		error: ref database is corrupt
+		EOF
+		rm $tag_dir_prefix/tag-1.lock &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/@: invalid refname format
+		error: ref database is corrupt
+		EOF
+		rm $branch_dir_prefix/@ &&
+		test_cmp expect err
+	)
+'
+
+test_expect_success 'ref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+		git checkout -b branch-1 &&
+		git tag tag-1 &&
+		git commit --allow-empty -m second &&
+		git checkout -b branch-2 &&
+		git tag tag-2
+	) &&
+	(
+		cd repo &&
+		printf "%s garbage" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-garbage &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-1-garbage: invalid ref content
+		error: ref database is corrupt
+		EOF
+		rm $branch_dir_prefix/branch-1-garbage &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		printf "%s garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/tag-1-garbage: invalid ref content
+		error: ref database is corrupt
+		EOF
+		rm $tag_dir_prefix/tag-1-garbage &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		printf "%s    " "$(git rev-parse tag-2)" > $tag_dir_prefix/tag-2-garbage &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/tag-2-garbage: invalid ref content
+		error: ref database is corrupt
+		EOF
+		rm $tag_dir_prefix/tag-2-garbage &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		tr -d "\n" < $branch_dir_prefix/branch-1 > $branch_dir_prefix/branch-1-without-newline &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-1-without-newline: invalid ref content
+		error: ref database is corrupt
+		EOF
+		rm $branch_dir_prefix/branch-1-without-newline &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		tr "[:lower:]" "[:upper:]" < $branch_dir_prefix/branch-2 > $branch_dir_prefix/branch-2-upper &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-2-upper: invalid ref content
+		error: ref database is corrupt
+		EOF
+		rm $branch_dir_prefix/branch-2-upper &&
+		test_cmp expect err
+	)
+'
+
+test_done