mbox series

[v2,0/2] Document fsck msg ids

Message ID pull.1369.v2.git.git.1666667864.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Document fsck msg ids | expand

Message

Philippe Blain via GitGitGadget Oct. 25, 2022, 3:17 a.m. UTC
fsck does a number of checks, and prints warning or error messages based on
the type of check. The only place the config values that control whether to
ignore these checks are documented is through git-help(1). However, often
times peoples' first instinct to look for a list of valid config values is
in the documentation page for git-config(1). These fsck. configuration
values can be hard to find.

Document these so that both git-config and git-fsck documentation has a list
of valid s.

Patch [1/2] removes an unused msg-id BAD_TAG_OBJECT Patch [2/2] adds a
fsck-msgids.txt that lists msg-ids that fsck checks for

Since v1:

 * provided a full list of error messages along with corresponding default
   severity
 * no longer including the whole list in git-config
 * add a comment in fsck.h to remind developers to keep the documentation
   accurate when making changes to the list of error messages

John Cai (2):
  fsck: remove the unused BAD_TAG_OBJECT
  fsck: document msg-id

 Documentation/config/fsck.txt |   4 +
 Documentation/fsck-msgids.txt | 149 ++++++++++++++++++++++++++++++++++
 Documentation/git-fsck.txt    |  12 +++
 fsck.h                        |   7 +-
 4 files changed, 171 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/fsck-msgids.txt


base-commit: 45c9f05c44b1cb6bd2d6cb95a22cf5e3d21d5b63
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1369%2Fjohn-cai%2Fjc%2Fdocument-fsck-msg-id-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1369/john-cai/jc/document-fsck-msg-id-v2
Pull-Request: https://github.com/git/git/pull/1369

Range-diff vs v1:

 1:  f32ff5dc4ee ! 1:  797c6251671 fsck: remove the unused BAD_TAG_OBJECT
     @@ Metadata
       ## Commit message ##
          fsck: remove the unused BAD_TAG_OBJECT
      
     -    The BAD_TAG_OBJECT msg-id is not being used anymore, so we can remove
     -    it.
     +    2175a0c6 (fsck: stop checking tag->tagged, 2019-10-18) stopped
     +    checking the tagged object referred to by a tag object, which is what the
     +    error message BAD_TAG_OBJECT was for. Since then the BAD_TAG_OBJECT
     +    message is no longer used anywhere.
     +
     +    Remove the BAD_TAG_OBJECT msg-id.
      
          Signed-off-by: John Cai <johncai86@gmail.com>
     +    Reviewed-by: Junio C Hamano <gitster@pobox.com>
      
       ## fsck.h ##
      @@ fsck.h: enum fsck_msg_type {
 2:  3aec3d2c9ca ! 2:  ad43adab435 fsck: document msg-id
     @@ Metadata
       ## Commit message ##
          fsck: document msg-id
      
     -    The git-config documentation lacks mention of specific <msg-id> that
     -    are supported. While git-help --config will display a list of these options,
     -    often developers' first instinct is to consult the git docs to find valid
     +    The documentation lacks mention of specific <msg-id> that are supported.
     +    While git-help --config will display a list of these options, often
     +    developers' first instinct is to consult the git docs to find valid
          config values.
      
     -    Add a section under the docs for fsck.<msg-id> with the msg-ids that
     -    git-fsck recognizes.
     +    Add a list of fsck error messages, and link to it from the git-fsck
     +    documentation.
      
          Signed-off-by: John Cai <johncai86@gmail.com>
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/config/fsck.txt ##
      @@ Documentation/config/fsck.txt: allow new instances of the same breakages go unnoticed.
     @@ Documentation/config/fsck.txt: allow new instances of the same breakages go unno
       doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
       will only cause git to warn.
      ++
     -+The following `<msg-id>` are supported:
     -++
     ++See `Fsck Messages` section of linkgit:git-fsck[1] for supported
     ++values of `<msg-id>`.
      +
     -+include::../fsck-msgids.txt[]
       
       fsck.skipList::
       	The path to a list of object names (i.e. one unabbreviated SHA-1 per
      
       ## Documentation/fsck-msgids.txt (new) ##
      @@
     -+--
     -+`badDate`;;
     -+	Invalid date format in an author/committer line.
     ++`badDate`::
     ++	(ERROR) Invalid date format in an author/committer line.
     ++
     ++`badDateOverflow`::
     ++	(ERROR) Invalid date value in an author/committer line.
     ++
     ++`badEmail`::
     ++	(ERROR) Invalid email format in an author/committer line.
      +
     -+`badEmail`;;
     -+	Invalid email format in an author/committer line.
     ++`badFilemode`::
     ++	(INFO) A tree contains a bad filemode entry.
      +
     -+`badFilemode`;;
     -+	A tree contains a bad filemode entry.
     ++`badName`::
     ++	(ERROR) An author/committer name is empty.
      +
     -+`badName`;;
     -+	An author/committer name is empty.
     ++`badObjectSha1`::
     ++	(ERROR) An object has a bad sha1.
      +
     -+`badObjectSha1`;;
     -+	An object has a bad sha1.
     ++`badParentSha1`::
     ++	(ERROR) A commit object has a bad parent sha1.
      +
     -+`badParentSha1`;;
     -+	A commit object has a bad parent sha1.
     ++`badTagName`::
     ++	(INFO) A tag has an invalid format.
      +
     -+`badTagName`;;
     -+	A tag has an invalid format.
     ++`badTimezone`::
     ++	(ERROR) Found an invalid time zone in an author/committer line.
      +
     -+`badTimezone`;;
     -+	Found an invalid time zone in an author/committer line.
     ++`badTree`::
     ++	(ERROR) A tree cannot be parsed.
      +
     -+`badTree`;;
     -+	A tree cannot be parsed.
     ++`badTreeSha1`::
     ++	(ERROR) A tree has an invalid format.
      +
     -+`badTreeSha1`;;
     -+	A tree has an invalid format.
     ++`badType`::
     ++	(ERROR) Found an invalid object type.
      +
     -+`badType`;;
     -+	Found an invalid object type.
     ++`duplicateEntries`::
     ++	(ERROR) A tree contains duplicate file entries.
      +
     -+`duplicateEntries`;;
     -+	A tree contains duplicate file entries.
     ++`emptyName`::
     ++	(WARN) A path contains an empty name.
      +
     -+`emptyName`;;
     -+	A path contains an empty name.
     ++`extraHeaderEntry`::
     ++	(IGNORE) Extra headers found after `tagger`.
      +
     -+`fullPathName`;;
     -+	A path contains the full path starting with "/".
     ++`fullPathName`::
     ++	(WARN) A path contains the full path starting with "/".
      +
     -+`gitAttributesSymlink`;;
     -+	`.gitattributes` is a symlink.
     ++`gitAttributesSymlink`::
     ++	(INFO) `.gitattributes` is a symlink.
      +
     -+`gitignoreSymlink`;;
     -+	`.gitignore` is a symlink.
     ++`gitignoreSymlink`::
     ++	(INFO) `.gitignore` is a symlink.
      +
     -+`gitmodulesBlob`;;
     -+	A non-blob found at `.gitmodules`.
     ++`gitmodulesBlob`::
     ++	(ERROR) A non-blob found at `.gitmodules`.
      +
     -+`gitmodulesMissing`;;
     -+	Unable to read `.gitmodules` blob.
     ++`gitmodulesMissing`::
     ++	(ERROR) Unable to read `.gitmodules` blob.
      +
     -+`gitmodulesName`;;
     -+	A submodule name is invalid.
     ++`gitmodulesName`::
     ++	(ERROR) A submodule name is invalid.
      +
     -+`gitmodulesParse`;;
     -+	Could not parse `.gitmodules` blob.
     ++`gitmodulesParse`::
     ++	(INFO) Could not parse `.gitmodules` blob.
      +
      +`gitmodulesLarge`;
     -+	`.gitmodules` blob is too large to parse.
     ++	(ERROR) `.gitmodules` blob is too large to parse.
     ++
     ++`gitmodulesPath`::
     ++	(ERROR) `.gitmodules` path is invalid.
     ++
     ++`gitmodulesSymlink`::
     ++	(ERROR) `.gitmodules` is a symlink.
      +
     -+`gitmodulesPath`;;
     -+	`.gitmodules` path is invalid.
     ++`gitmodulesUpdate`::
     ++	(ERROR) Found an invalid submodule update setting.
      +
     -+`gitmodulesSymlink`;;
     -+	`.gitmodules` is a symlink.
     ++`gitmodulesUrl`::
     ++	(ERROR) Found an invalid submodule url.
      +
     -+`gitmodulesUpdate`;;
     -+	Found an invalid submodule update setting.
     ++`hasDot`::
     ++	(WARN) A tree contains an entry named `.`.
      +
     -+`gitmodulesUrl`;;
     -+	Found an invalid submodule url.
     ++`hasDotdot`::
     ++	(WARN) A tree contains an entry named `..`.
      +
     -+`hasDot`;;
     -+	A tree contains an entry named `.`.
     ++`hasDotgit`::
     ++	(WARN) A tree contains an entry named `.git`.
      +
     -+`hasDotdot`;;
     -+	A tree contains an entry named `..`.
     ++`mailmapSymlink`::
     ++	(INFO) `.mailmap` is a symlink.
      +
     -+`hasDotgit`;;
     -+	A tree contains an entry named `.git`.
     ++`missingAuthor`::
     ++	(ERROR) Author is missing.
      +
     -+`mailmapSymlink`;;
     -+	`.mailmap` is a symlink.
     ++`missingCommitter`::
     ++	(ERROR) Committer is missing.
      +
     -+`missingAuthor`;;
     -+	Author is missing.
     ++`missingEmail`::
     ++	(ERROR) Email is missing in an author/committer line.
      +
     -+`missingCommitter`;;
     -+	Committer is missing.
     ++`missingNameBeforeEmail`::
     ++	(ERROR) Missing name before an email in an author/committer line.
      +
     -+`missingEmail`;;
     -+	Email is missing in an author/committer line.
     ++`missingObject`::
     ++	(ERROR) Missing `object` line in tag object.
      +
     -+`missingNameBeforeEmail`;;
     -+	Missing space before an email in an author/committer line.
     ++`missingSpaceBeforeDate`::
     ++	(ERROR) Missing space before date in an author/committer line.
      +
     -+`missingObject`;;
     -+	Missing `object` line in tag object.
     ++`missingSpaceBeforeEmail`::
     ++	(ERROR) Missing space before the email in author/committer line.
      +
     -+`missingSpaceBeforeDate`;;
     -+	Missing space before date in an author/committer line.
     ++`missingTag`::
     ++	(ERROR) Unexpected end after `type` line in a tag object.
      +
     -+`missingSpaceBeforeEmail`;;
     -+	Missing space before the email in author/committer line.
     ++`missingTagEntry`::
     ++	(ERROR) Missing `tag` line in a tag object.
      +
     -+`missingTag`;;
     -+	Unexpected end after `type` line in a tag object.
     ++`missingTypeEntry`::
     ++	(ERROR) Missing `type` line in a tag object.
      +
     -+`missingTypeEntry`;;
     -+	Missing `type` line in a tag object.
     ++`multipleAuthors`::
     ++	(ERROR) Multiple author lines found in a commit.
      +
     -+`multipleAuthors`;;`
     -+	Multiple author lines found in a commit.
     ++`nulInCommit`::
     ++	(WARN) Found a NUL byte in the commit object body.
      +
     -+`nulInCommit`;;
     -+	Found a NUL byte in the commit object body.
     ++`nulInHeader`::
     ++	(FATAL) NUL byte exists in the object header.
      +
     -+`treeNotSorted`;;
     -+	A tree is not properly sorted.
     ++`nulInSha1`::
     ++	(WARN) Tree contains entries pointing to a null sha1.
      +
     -+`unknownType`;;
     -+	Found an unknown object type.
     ++`treeNotSorted`::
     ++	(ERROR) A tree is not properly sorted.
      +
     -+`zeroPaddingDate`;;
     -+	Found a zero padded date in an author/commiter line.
     ++`unknownType`::
     ++	(ERROR) Found an unknown object type.
      +
     -+`zeroPaddedFilemode`;;
     -+	Found a zero padded filemode in a tree.
     -+--
     ++`unterminatedHeader`::
     ++	(FATAL) Missing end-of-line in the object header.
     ++
     ++`zeroPaddingDate`::
     ++	(ERROR) Found a zero padded date in an author/commiter line.
     ++
     ++`zeroPaddedFilemode`::
     ++	(WARN) Found a zero padded filemode in a tree.
     +
     + ## Documentation/git-fsck.txt ##
     +@@ Documentation/git-fsck.txt: hash mismatch <object>::
     + 	object database value.
     + 	This indicates a serious data integrity problem.
     + 
     ++
     ++FSCK MESSAGES
     ++-------------
     ++
     ++The following lists the types of errors `git fsck` detects and what
     ++each error means, with their default severity.  The severity of the
     ++error, other than those that are marked as "(FATAL)", can be tweaked
     ++by setting the corresponding `fsck.<msg-id>` configuration variable.
     ++
     ++include::fsck-msgids.txt[]
     ++
     ++
     + Environment Variables
     + ---------------------
     + 
     +
     + ## fsck.h ##
     +@@ fsck.h: enum fsck_msg_type {
     + 	FSCK_WARN,
     + };
     + 
     ++/*
     ++ * Documentation/fsck-msgids.txt documents these; when
     ++ * modifying this list in any way, make sure to keep the
     ++ * two in sync.
     ++ */
     ++
     + #define FOREACH_FSCK_MSG_ID(FUNC) \
     + 	/* fatal errors */ \
     + 	FUNC(NUL_IN_HEADER, FATAL) \

Comments

Junio C Hamano Oct. 25, 2022, 4:30 a.m. UTC | #1
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Patch [1/2] removes an unused msg-id BAD_TAG_OBJECT Patch [2/2] adds a
> fsck-msgids.txt that lists msg-ids that fsck checks for
>
> Since v1:
>
>  * provided a full list of error messages along with corresponding default
>    severity
>  * no longer including the whole list in git-config
>  * add a comment in fsck.h to remind developers to keep the documentation
>    accurate when making changes to the list of error messages

Thanks.  I did a bit of sanity checking and it made my earlier
suspicion stronger.  We MUST have at least an automated checker to
check the doc against the fsck.h header, if not an automated
generator of the doc from the fsck.h header.

 * Just like your [1/2] pointed out an error type that is no longer
   in use, MISSING_TREE_OBJECT is not used.  It seems that this was
   never used since it was introduced at 159e7b08 (fsck: detect
   gitmodules files, 2018-05-02)

 * A few items were misspelled.

 * A handful items were missing.

The latter two are shown in the attached, which I am tempted to just
squash into your 2/2.  I think the MISSING_TREE_OBJECT one deserves
a separate commit between 1/2 and 2/2.

 Documentation/fsck-msgids.txt | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git c/Documentation/fsck-msgids.txt w/Documentation/fsck-msgids.txt
index 08e19bac8a..7af76ff99a 100644
--- c/Documentation/fsck-msgids.txt
+++ w/Documentation/fsck-msgids.txt
@@ -43,10 +43,10 @@
 `extraHeaderEntry`::
 	(IGNORE) Extra headers found after `tagger`.
 
-`fullPathName`::
+`fullPathname`::
 	(WARN) A path contains the full path starting with "/".
 
-`gitAttributesSymlink`::
+`gitattributesSymlink`::
 	(INFO) `.gitattributes` is a symlink.
 
 `gitignoreSymlink`::
@@ -55,6 +55,9 @@
 `gitmodulesBlob`::
 	(ERROR) A non-blob found at `.gitmodules`.
 
+`gitmodulesLarge`::
+	(ERROR) The `.gitmodules` file is too large to parse.
+
 `gitmodulesMissing`::
 	(ERROR) Unable to read `.gitmodules` blob.
 
@@ -118,6 +121,15 @@
 `missingTagEntry`::
 	(ERROR) Missing `tag` line in a tag object.
 
+`missingTaggerEntry`::
+	(INFO) Missing `tagger` line in a tag object.
+
+`missingTree`::
+	(ERROR) Missing `tree` line in a commit object.
+
+`missingType`::
+	(ERROR) Invalid type value on the `type` line in a tag object.
+
 `missingTypeEntry`::
 	(ERROR) Missing `type` line in a tag object.
 
@@ -130,7 +142,7 @@
 `nulInHeader`::
 	(FATAL) NUL byte exists in the object header.
 
-`nulInSha1`::
+`nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
 `treeNotSorted`::
@@ -142,7 +154,7 @@
 `unterminatedHeader`::
 	(FATAL) Missing end-of-line in the object header.
 
-`zeroPaddingDate`::
+`zeroPaddedDate`::
 	(ERROR) Found a zero padded date in an author/commiter line.
 
 `zeroPaddedFilemode`::
Junio C Hamano Oct. 25, 2022, 4:40 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Thanks.  I did a bit of sanity checking and it made my earlier
> suspicion stronger.  We MUST have at least an automated checker to
> check the doc against the fsck.h header, if not an automated
> generator of the doc from the fsck.h header.

FYI, here are a pair of quick-and-dirty Perl scripts that I used for
the sanity checking.  The first one "parses" the fsck-msgs.txt and
formats lines like

    badTagName	INFO

i.e. camelCased error message name, a TAB, and the severity.

The second one reads the FOREACH_FSCK_MSG_ID() definition in fsck.h
that look like "FUNC(BAD_TAG_NAME, INFO)", camelcases the name and
shows what can be compared with the output of the first one.

There are two sanity checks that must pass when a developer updates
the documentation.

 - The output from m.perl on the documentation must already be sorted.

 - The output from n.perl on fsck.h, when sorted, must match the
   output from m.perl on the documentation.

$ cat >m.perl <<\EOF
#!/usr/bin/perl

my ($previous, $current);

while (<>) {
	if (!defined $current) {
		if (/^\`([a-zA-Z0-9]*)\`::/) {
			$current = $1;
			if ((defined $previous) &&
			    ($current le $previous)) {
				print STDERR "$previous >= $current???\n";
			}
		}
	} elsif (/^\s+\(([A-Z]+)\) /) {
		print "$current	$1\n";
		$previous = $current;
		undef $current;
	}
}
EOF
$ cat >n.perl <<\EOF
#!/usr/bin/perl

while (<>) {
	if (/^\s+FUNC\(([0-9A-Z_]+), ([A-Z]+)\)/) {
		my ($name, $severity) = ($1, $2);
		my ($first) = 1;
		$name = join('',
			     map {
				     y/A-Z/a-z/;
				     if (!$first) {
					     s/^(.)/uc($1)/e;
				     } else {
					     $first = 0;
				     }
				     $_;
			     }
			     split(/_/, $name));
		print "$name	$severity\n";
	}
}
EOF
$ perl m.perl Documentation/fsck-msgids.txt >/var/tmp/1
$ sort /var/tmp/1 >/var/tmp/2
$ diff -u /var/tmp/1 /var/tmp/2
#### no output should appear in the above comparison
$ perl n.perl fsck.h | sort >/var/tmp/2
$ diff -u /var/tmp/1 /var/tmp/2
#### no output should appear in the above comparison