Message ID | ZuRzzwZds8ys-JEN@ArchLinux (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v4,1/5] ref: initialize "fsck_ref_report" with zero | expand |
shejialuo <shejialuo@gmail.com> writes: Expect that people do not read the body of the message as completing a paragrpah the title started. I.e. ... > We have already introduced the checks for regular refs. There is no need > to check the consistency of the target which the symref points to. > Instead, we just need to check the content of the symref itself. ... this needs a bit of preamble, like We have code that check regular ref contents, but we do not yet check contents of symbolic refs. > A regular file is accepted as a textual symref if it begins with > "ref:", followed by zero or more whitespaces, followed by the full > refname, followed only by whitespace characters. We always write > a single SP after "ref:" and a single LF after the refname, but > third-party reimplementations of Git may have taken advantage of the > looser syntax. Put it more specific, we accept the following contents > of the symref: > > 1. "ref: refs/heads/master " > 2. "ref: refs/heads/master \n \n" > 3. "ref: refs/heads/master\n\n" > > Thus, we could reuse "refMissingNewline" and "trailingRefContent" > FSCK_INFOs to do the same retroactive tightening as we introduce for > regular references. > > But we do not allow any other trailing garbage. The followings are bad > symref contents which will be reported as fsck error by "git-fsck(1)". This description needs to be updated, as it is unclear if you are talking about errors we already detect, or if you are planning to update fsck to notice and report these errors. > 1. "ref: refs/heads/master garbage\n" > 2. "ref: refs/heads/master \n\n\n garbage " > > And we introduce a new "badReferentName(ERROR)" fsck message to report > above errors to the user. OK. > In order to check the content of the symref, create a function > "files_fsck_symref_target". It will first check whether the "referent" > is under the "refs/" directory, if not, we will report "escapeReferent" > fsck error message to notify the user this situation. > > Then, we will first check whether the symref content misses the newline > by peeking the last byte of the "referent" to see whether it is '\n'. "Then, we will first" -> "Then it checks" or something. You already consumed "first" for the check to limit the referent to those under "refs/" hierarchy. > And we will remember the untrimmed length of the "referent" and call > "strbuf_rtrim()" on "referent". Then, we will call "check_refname_format" > to check whether the trimmed referent format is valid. If not, we will > report to the user that the symref points to referent which has invalid > format. If it is valid, we will compare the untrimmed length and trimmed > length, if they are not the same, we need to warn the user there is some > trailing garbage in the symref content. That is an implementation detail of what you did. But if the implementation were buggy and did not exactly what you intended to do, the above description gives no information to help others to fix it up so that it works as you intended it to work, because you do not explain it. So what did you want to achieve in the third step (the first being "limit to refs/ hiararchy", the second being "no incomplete lines allowed")? Third, we want to make sure that the contents of a textual symref MUST have a single LF after the target refname and NOTHING ELSE. or something. > At last, we need to check whether the referent is a directory. We cannot "a directory" -> "an existing directory"? I am not comfortable to see the word "directory" used in this proposed log message, as some refs could be stored in the packed backend and are referenced by the symbolic ref you are inspecting (this comment also refers to the "refs/ directory" you mentioned earlier as "the first check"). Lastly, a symbolic ref MUST either point to an existing ref, or if the referent does not exist, it MUST NOT be a leading subpath for another existing ref (e.g., when "refs/heads/main" exists, a symbolic ref that points at "refs/heads" is a no-no). or something (but again, I am open to a phrasing better than "subpath"). Design question. What do we want to do when we have no loose refs under the "refs/heads/historical/" hiearchy, (i.e. all of them are in packed-refs file) hence ".git/refs/heads/historical" directory does not exist on the filesystem. And a symbolic ref points at "refs/heads/historical". Shouldn't we give the same error whether the .git/refs/heads/historical directory exist or not, as long as the refs/heads/historical/main branch exists (in the packed-refs backend)? > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > index 8827137ef0..03bcb77972 100644 > --- a/Documentation/fsck-msgids.txt > +++ b/Documentation/fsck-msgids.txt > @@ -28,6 +28,12 @@ > `badRefName`:: > (ERROR) A ref has an invalid format. > > +`badReferentFiletype`:: > + (ERROR) The referent of a symref has a bad file type. > + > +`badReferentName`:: > + (ERROR) The referent name of a symref is invalid. > + > `badTagName`:: > (INFO) A tag has an invalid format. > > @@ -49,6 +55,9 @@ > `emptyName`:: > (WARN) A path contains an empty name. > > +`escapeReferent`:: > + (ERROR) The referent of a symref is outside the "ref" directory. I am not sure starting this as ERROR is wise. Users and third-party tools make creative uses of the system and I cannot offhand think of an argument why it should be forbidden to create a symbolic link to our own HEAD or to some worktree-specific ref in another worktree. > + size_t len = referent->len - 1; > + struct stat st; > + int ret = 0; > + > + if (!starts_with(referent->buf, "refs/")) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_ESCAPE_REFERENT, > + "points to ref outside the refs directory"); > + goto out; > + } > + > + if (referent->buf[referent->len - 1] != '\n') { As you initialized "len" to "referent->len-1" earlier, wouldn't it more natural to use it here? That would match the incrementing of len++ later in this block. > + ret = fsck_report_ref(o, report, > + FSCK_MSG_REF_MISSING_NEWLINE, > + "missing newline"); > + len++; > + } Having said that, the above should be simplified more like: * declare but not initialize "len". better yet, declare "orig_len" and leave it uninitialized. * do not touch "len++" in the above block (actually, you can discard the above "if(it does not end with LF)" block, see below). * instead grab "referent->len" in "len" (or "orig_len") immediately before you first modify referent, i.e. before strbuf_rtrim() call. orig_len = referent->len; orig_last_byte = referent->buf[orig_len - 1]; > + strbuf_rtrim(referent); > + if (check_refname_format(referent->buf, 0)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_REFERENT_NAME, > + "points to refname with invalid format"); Similar to an earlier step, the message does not give any more information than the enum. Wouldn't the user who got this error want to learn what referent->buf said and which part of it was bad in the same message, instead of having to look it up on their own after fsck finishes? > + goto out; > + } At this point we know check_refname_format() is happy with what is left after rtrimming the referent. There are four cases: - rtrim() did not trim anything (orig_len == referent->len); the file lacked the terminating LF. - rtrim() trimmed one byte (orig_len - 1 == referent->len) and the byte was not LF (orig_last_byte != '\n'). The file lacked the terminating LF. - rtrim() trimmed exactly one byte (orig_len - 1 == referent->len) and the byte was LF (orig_last_byte == '\n'). There is no error. - all other cases, i.e., rtrim() trimmed two or more bytes. The file had trailing whitespaces after a valid referent that passed check_refname_format(). So in short, if (referent->len == orig_len || referent->len == orig_len - 1 && orig_last_byte != '\n') { FSCK_MSG_REF_MISSING_NEWLINE; } else if (referent->len < orig_len - 1) { FSCK_MSG_REF_TRAILING_WHITESPACE; } can replace the next block you wrote, and we can also remove the earlier "it is an error if it does not end with '\n'", I think. > + if (len != referent->len) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_TRAILING_REF_CONTENT, > + "trailing garbage in ref"); As check_refname_format() was happy, the difference between orig_len and referent->len are only coming from trailing whitespaces, i.e. it is not that it had arbitrary garbage. Shouldn't we be more explicit about that? > + /* > + * Dangling symrefs are common and so we don't report them. > + */ > + if (lstat(referent_path->buf, &st)) { > + if (errno != ENOENT) { > + ret = error_errno(_("unable to stat '%s'"), > + referent_path->buf); > + } > + goto out; > + } > + > + /* > + * We cannot distinguish whether "refs/heads/a" is a directory or not by > + * using "check_refname_format(referent->buf, 0)". Instead, we need to > + * check the file type of the target. > + */ > + if (S_ISDIR(st.st_mode)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_REFERENT_FILETYPE, > + "points to the directory"); > + goto out; > + } If referent_path->buf refers to "refs/heads/historical/", and all the branches under the hierarchy have been sent to packed-refs, then this check will not trigger. I wonder if this check is the right thing to enforce in the first place, though. As far as the end user is concerned, refs/heads/historical/master branch stil exists, and there is no refs/heads/historical branch, so such a symbolic ref, for all intents and purposes, is the same as any other dangling symbolic refs, no? Of course, "git update-ref SUCH_A_SYMREF HEAD" will complain because there is refs/heads/historical, with something like "refs/heads/historical/master" exists, cannot create "refs/heads/historical" but that is to be expected. If you remove the last branch in the refs/heads/historical hierarchy, you should be able to do such an update-ref to instanciate refs/heads/historical as a regular ref. > @@ -3484,12 +3553,24 @@ static int files_fsck_refs_content(struct ref_store *ref_store, > "trailing garbage in ref"); > goto cleanup; > } > + } else { > + strbuf_addf(&referent_path, "%s/%s", > + ref_store->gitdir, referent.buf); > + /* > + * the referent may contain the spaces and the newline, need to > + * trim for path. > + */ > + strbuf_rtrim(&referent_path); I doubt this is a good design. We have referent, and the symbolic ref checker knows that the true referent refname may be followed by whitespaces, so instead of inventing referent _path here, it would be a better design to let the files_fsck_symref_target() to decide what file to open and check based on referent, no? Give it the refstore or refstore's gitdir and have the concatenation with the rtrimmed contents in the referent->buf after it inspected it instead, perhaps? > + ret = files_fsck_symref_target(o, &report, > + &referent, > + &referent_path);
On Wed, Sep 18, 2024 at 01:19:13PM -0700, Junio C Hamano wrote: > shejialuo <shejialuo@gmail.com> writes: > > Expect that people do not read the body of the message as completing > a paragrpah the title started. I.e. ... > > > We have already introduced the checks for regular refs. There is no need > > to check the consistency of the target which the symref points to. > > Instead, we just need to check the content of the symref itself. > > ... this needs a bit of preamble, like > > We have code that check regular ref contents, but we do not yet > check contents of symbolic refs. > Thanks, I will improve this in the next version. > > A regular file is accepted as a textual symref if it begins with > > "ref:", followed by zero or more whitespaces, followed by the full > > refname, followed only by whitespace characters. We always write > > a single SP after "ref:" and a single LF after the refname, but > > third-party reimplementations of Git may have taken advantage of the > > looser syntax. Put it more specific, we accept the following contents > > of the symref: > > > > 1. "ref: refs/heads/master " > > 2. "ref: refs/heads/master \n \n" > > 3. "ref: refs/heads/master\n\n" > > > > Thus, we could reuse "refMissingNewline" and "trailingRefContent" > > FSCK_INFOs to do the same retroactive tightening as we introduce for > > regular references. > > > > But we do not allow any other trailing garbage. The followings are bad > > symref contents which will be reported as fsck error by "git-fsck(1)". > > This description needs to be updated, as it is unclear if you are > talking about errors we already detect, or if you are planning to > update fsck to notice and report these errors. > Yes, When I was writing this part, I felt a little painful to express my words. I have thought how could I express the connection between the current patch and the previous one. > > And we will remember the untrimmed length of the "referent" and call > > "strbuf_rtrim()" on "referent". Then, we will call "check_refname_format" > > to check whether the trimmed referent format is valid. If not, we will > > report to the user that the symref points to referent which has invalid > > format. If it is valid, we will compare the untrimmed length and trimmed > > length, if they are not the same, we need to warn the user there is some > > trailing garbage in the symref content. > > That is an implementation detail of what you did. But if the > implementation were buggy and did not exactly what you intended to > do, the above description gives no information to help others to fix > it up so that it works as you intended it to work, because you do > not explain it. > > So what did you want to achieve in the third step (the first being > "limit to refs/ hiararchy", the second being "no incomplete lines > allowed")? > > Third, we want to make sure that the contents of a textual > symref MUST have a single LF after the target refname and > NOTHING ELSE. > > or something. > From the above comments, I need to organize the commit message of this patch to make things clear here. > "a directory" -> "an existing directory"? > > I am not comfortable to see the word "directory" used in this > proposed log message, as some refs could be stored in the packed > backend and are referenced by the symbolic ref you are inspecting > (this comment also refers to the "refs/ directory" you mentioned > earlier as "the first check"). > > Lastly, a symbolic ref MUST either point to an existing ref, > or if the referent does not exist, it MUST NOT be a leading > subpath for another existing ref (e.g., when "refs/heads/main" > exists, a symbolic ref that points at "refs/heads" is a no-no). > > or something (but again, I am open to a phrasing better than > "subpath"). > > Design question. What do we want to do when we have no loose refs > under the "refs/heads/historical/" hiearchy, (i.e. all of them are > in packed-refs file) hence ".git/refs/heads/historical" directory > does not exist on the filesystem. And a symbolic ref points at > "refs/heads/historical". Shouldn't we give the same error whether > the .git/refs/heads/historical directory exist or not, as long as > the refs/heads/historical/main branch exists (in the packed-refs > backend)? > I guess I need to think carefully here. Actually, my intention is that I want to concentrate on the loose refs and then take consideration about the packed refs. However, from what you have said above, it seems I could not do this. They are connected. But at current, I am not so familiar with packed refs behavior, I could not answer all the questions above. I decide to understand what packed-ref done. So, this series may be stalled sometime until I have a good knowledge and re-think the design here. > > +`escapeReferent`:: > > + (ERROR) The referent of a symref is outside the "ref" directory. > > I am not sure starting this as ERROR is wise. Users and third-party > tools make creative uses of the system and I cannot offhand think of > an argument why it should be forbidden to create a symbolic link to > our own HEAD or to some worktree-specific ref in another worktree. > Do we allow this cross-access (hack)? It might cause some trouble from my perspective. > > + if (referent->buf[referent->len - 1] != '\n') { > > As you initialized "len" to "referent->len-1" earlier, wouldn't it > more natural to use it here? That would match the incrementing of > len++ later in this block. > Yes, exactly. > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_REF_MISSING_NEWLINE, > > + "missing newline"); > > + len++; > > + } > > Having said that, the above should be simplified more like: > > * declare but not initialize "len". better yet, declare "orig_len" > and leave it uninitialized. > > * do not touch "len++" in the above block (actually, you can > discard the above "if(it does not end with LF)" block, see > below). > > * instead grab "referent->len" in "len" (or "orig_len") immediately > before you first modify referent, i.e. before strbuf_rtrim() call. > > orig_len = referent->len; > orig_last_byte = referent->buf[orig_len - 1]; > I agree. > > + strbuf_rtrim(referent); > > + if (check_refname_format(referent->buf, 0)) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_BAD_REFERENT_NAME, > > + "points to refname with invalid format"); > > Similar to an earlier step, the message does not give any more > information than the enum. Wouldn't the user who got this error > want to learn what referent->buf said and which part of it was bad > in the same message, instead of having to look it up on their own > after fsck finishes? > Yes, I agree. I will improve this. > > + goto out; > > + } > > At this point we know check_refname_format() is happy with what is > left after rtrimming the referent. There are four cases: > > - rtrim() did not trim anything (orig_len == referent->len); the file > lacked the terminating LF. > > - rtrim() trimmed one byte (orig_len - 1 == referent->len) and > the byte was not LF (orig_last_byte != '\n'). The file lacked > the terminating LF. > > - rtrim() trimmed exactly one byte (orig_len - 1 == referent->len) > and the byte was LF (orig_last_byte == '\n'). There is no error. > > - all other cases, i.e., rtrim() trimmed two or more bytes. The > file had trailing whitespaces after a valid referent that passed > check_refname_format(). > That's so clear. My implementation is not good compared with this. > So in short, > > if (referent->len == orig_len || > referent->len == orig_len - 1 && orig_last_byte != '\n') { > FSCK_MSG_REF_MISSING_NEWLINE; > } else if (referent->len < orig_len - 1) { > FSCK_MSG_REF_TRAILING_WHITESPACE; > } > > can replace the next block you wrote, and we can also remove the > earlier "it is an error if it does not end with '\n'", I think. > > > + if (len != referent->len) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_TRAILING_REF_CONTENT, > > + "trailing garbage in ref"); > > As check_refname_format() was happy, the difference between orig_len > and referent->len are only coming from trailing whitespaces, i.e. it > is not that it had arbitrary garbage. Shouldn't we be more explicit > about that? > Yes, I made a lot of mistakes when calling the "fsck_report_ref". I will report the exact garbage content to the user. > > + /* > > + * Dangling symrefs are common and so we don't report them. > > + */ > > + if (lstat(referent_path->buf, &st)) { > > + if (errno != ENOENT) { > > + ret = error_errno(_("unable to stat '%s'"), > > + referent_path->buf); > > + } > > + goto out; > > + } > > + > > + /* > > + * We cannot distinguish whether "refs/heads/a" is a directory or not by > > + * using "check_refname_format(referent->buf, 0)". Instead, we need to > > + * check the file type of the target. > > + */ > > + if (S_ISDIR(st.st_mode)) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_BAD_REFERENT_FILETYPE, > > + "points to the directory"); > > + goto out; > > + } > > If referent_path->buf refers to "refs/heads/historical/", and all > the branches under the hierarchy have been sent to packed-refs, > then this check will not trigger. > Yes, because "refs/heads/historical" will not appear in the filesystem. > I wonder if this check is the right thing to enforce in the first > place, though. > > As far as the end user is concerned, refs/heads/historical/master > branch stil exists, and there is no refs/heads/historical branch, so > such a symbolic ref, for all intents and purposes, is the same as > any other dangling symbolic refs, no? > > Of course, "git update-ref SUCH_A_SYMREF HEAD" will complain because > there is refs/heads/historical, with something like > > "refs/heads/historical/master" exists, cannot create "refs/heads/historical" > > but that is to be expected. If you remove the last branch in the > refs/heads/historical hierarchy, you should be able to do such an > update-ref to instanciate refs/heads/historical as a regular ref. > I am a little shocked here. I do this in action and find the directory will be automatically converted to a regular file in the filesystem. So, I agree with you here. We should never check this, because we allow symref to point to a directory. As long as there is no loose refs and packed refs under this directory, we could use "git update-ref" for this symref. Thanks, > > @@ -3484,12 +3553,24 @@ static int files_fsck_refs_content(struct ref_store *ref_store, > > "trailing garbage in ref"); > > goto cleanup; > > } > > + } else { > > + strbuf_addf(&referent_path, "%s/%s", > > + ref_store->gitdir, referent.buf); > > + /* > > + * the referent may contain the spaces and the newline, need to > > + * trim for path. > > + */ > > + strbuf_rtrim(&referent_path); > > I doubt this is a good design. We have referent, and the symbolic > ref checker knows that the true referent refname may be followed by > whitespaces, so instead of inventing referent _path here, it would > be a better design to let the files_fsck_symref_target() to decide > what file to open and check based on referent, no? Give it the > refstore or refstore's gitdir and have the concatenation with the > rtrimmed contents in the referent->buf after it inspected it > instead, perhaps? > Yes, I agree with you here. We should use "files_fsck_symref_target" to do this. ---- From this review, I think I need to understand more behaviors about files backend and packed backend. Thanks for your so dedicated reviews. I may spend more time to send the next version. And there may be some delay. Thanks, Jialuo
shejialuo <shejialuo@gmail.com> writes: >> > +`escapeReferent`:: >> > + (ERROR) The referent of a symref is outside the "ref" directory. >> >> I am not sure starting this as ERROR is wise. Users and third-party >> tools make creative uses of the system and I cannot offhand think of >> an argument why it should be forbidden to create a symbolic link to >> our own HEAD or to some worktree-specific ref in another worktree. >> > Do we allow this cross-access (hack)? It might cause some trouble from > my perspective. If the current implementation allows users to set up and take advantage of, then it is not a hack. It would cause breakage if we make it an error. Does such a symref successfully refer to the referent right now? I think it does. Thanks.
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt index 8827137ef0..03bcb77972 100644 --- a/Documentation/fsck-msgids.txt +++ b/Documentation/fsck-msgids.txt @@ -28,6 +28,12 @@ `badRefName`:: (ERROR) A ref has an invalid format. +`badReferentFiletype`:: + (ERROR) The referent of a symref has a bad file type. + +`badReferentName`:: + (ERROR) The referent name of a symref is invalid. + `badTagName`:: (INFO) A tag has an invalid format. @@ -49,6 +55,9 @@ `emptyName`:: (WARN) A path contains an empty name. +`escapeReferent`:: + (ERROR) The referent of a symref is outside the "ref" directory. + `extraHeaderEntry`:: (IGNORE) Extra headers found after `tagger`. diff --git a/fsck.h b/fsck.h index b85072df57..c90561c6db 100644 --- a/fsck.h +++ b/fsck.h @@ -34,11 +34,14 @@ enum fsck_msg_type { FUNC(BAD_REF_CONTENT, ERROR) \ FUNC(BAD_REF_FILETYPE, ERROR) \ FUNC(BAD_REF_NAME, ERROR) \ + FUNC(BAD_REFERENT_FILETYPE, ERROR) \ + FUNC(BAD_REFERENT_NAME, ERROR) \ FUNC(BAD_TIMEZONE, ERROR) \ FUNC(BAD_TREE, ERROR) \ FUNC(BAD_TREE_SHA1, ERROR) \ FUNC(BAD_TYPE, ERROR) \ FUNC(DUPLICATE_ENTRIES, ERROR) \ + FUNC(ESCAPE_REFERENT, ERROR) \ FUNC(MISSING_AUTHOR, ERROR) \ FUNC(MISSING_COMMITTER, ERROR) \ FUNC(MISSING_EMAIL, ERROR) \ diff --git a/refs/files-backend.c b/refs/files-backend.c index df4ce270ae..0cb4a2da73 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3434,11 +3434,80 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, const char *refs_check_dir, struct dir_iterator *iter); +/* + * Check the symref "referent" and "referent_path". For textual symref, + * "referent" would be the content after "refs:". + */ +static int files_fsck_symref_target(struct fsck_options *o, + struct fsck_ref_report *report, + struct strbuf *referent, + struct strbuf *referent_path) +{ + size_t len = referent->len - 1; + struct stat st; + int ret = 0; + + if (!starts_with(referent->buf, "refs/")) { + ret = fsck_report_ref(o, report, + FSCK_MSG_ESCAPE_REFERENT, + "points to ref outside the refs directory"); + goto out; + } + + if (referent->buf[referent->len - 1] != '\n') { + ret = fsck_report_ref(o, report, + FSCK_MSG_REF_MISSING_NEWLINE, + "missing newline"); + len++; + } + + strbuf_rtrim(referent); + if (check_refname_format(referent->buf, 0)) { + ret = fsck_report_ref(o, report, + FSCK_MSG_BAD_REFERENT_NAME, + "points to refname with invalid format"); + goto out; + } + + if (len != referent->len) { + ret = fsck_report_ref(o, report, + FSCK_MSG_TRAILING_REF_CONTENT, + "trailing garbage in ref"); + } + + /* + * Dangling symrefs are common and so we don't report them. + */ + if (lstat(referent_path->buf, &st)) { + if (errno != ENOENT) { + ret = error_errno(_("unable to stat '%s'"), + referent_path->buf); + } + goto out; + } + + /* + * We cannot distinguish whether "refs/heads/a" is a directory or not by + * using "check_refname_format(referent->buf, 0)". Instead, we need to + * check the file type of the target. + */ + if (S_ISDIR(st.st_mode)) { + ret = fsck_report_ref(o, report, + FSCK_MSG_BAD_REFERENT_FILETYPE, + "points to the directory"); + goto out; + } + +out: + return ret; +} + static int files_fsck_refs_content(struct ref_store *ref_store, struct fsck_options *o, const char *refs_check_dir, struct dir_iterator *iter) { + struct strbuf referent_path = STRBUF_INIT; struct strbuf ref_content = STRBUF_INIT; struct strbuf referent = STRBUF_INIT; struct strbuf refname = STRBUF_INIT; @@ -3484,12 +3553,24 @@ static int files_fsck_refs_content(struct ref_store *ref_store, "trailing garbage in ref"); goto cleanup; } + } else { + strbuf_addf(&referent_path, "%s/%s", + ref_store->gitdir, referent.buf); + /* + * the referent may contain the spaces and the newline, need to + * trim for path. + */ + strbuf_rtrim(&referent_path); + ret = files_fsck_symref_target(o, &report, + &referent, + &referent_path); } cleanup: strbuf_release(&refname); strbuf_release(&ref_content); strbuf_release(&referent); + strbuf_release(&referent_path); return ret; } diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index a06ad044f2..9580c340ab 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -209,4 +209,121 @@ test_expect_success 'regular ref content should be checked (aggregate)' ' test_cmp expect sorted_err ' +test_expect_success 'textual symref content should be checked (individual)' ' + test_when_finished "rm -rf repo" && + git init repo && + branch_dir_prefix=.git/refs/heads && + tag_dir_prefix=.git/refs/tags && + cd repo && + test_commit default && + mkdir -p "$branch_dir_prefix/a/b" && + + printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good && + git refs verify 2>err && + rm $branch_dir_prefix/branch-good && + test_must_be_empty err && + + printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline + EOF + rm $branch_dir_prefix/branch-no-newline-1 && + test_cmp expect err && + + printf "ref: refs/heads/branch " >$branch_dir_prefix/a/b/branch-trailing-1 && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline + warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref + EOF + rm $branch_dir_prefix/a/b/branch-trailing-1 && + test_cmp expect err && + + printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref + EOF + rm $branch_dir_prefix/a/b/branch-trailing-2 && + test_cmp expect err && + + printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref + EOF + rm $branch_dir_prefix/a/b/branch-trailing-3 && + test_cmp expect err && + + printf "ref: refs/heads/branch \n " >$branch_dir_prefix/a/b/branch-complicated && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline + warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref + EOF + rm $branch_dir_prefix/a/b/branch-complicated && + test_cmp expect err && + + printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format + EOF + rm $branch_dir_prefix/branch-bad-1 && + test_cmp expect err && + + printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory + EOF + rm $branch_dir_prefix/branch-bad-2 && + test_cmp expect err && + + printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory + EOF + rm $branch_dir_prefix/branch-bad-3 && + test_cmp expect err +' + +test_expect_success 'textual symref content should be checked (aggregate)' ' + test_when_finished "rm -rf repo" && + git init repo && + branch_dir_prefix=.git/refs/heads && + tag_dir_prefix=.git/refs/tags && + cd repo && + test_commit default && + mkdir -p "$branch_dir_prefix/a/b" && + + printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good && + printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 && + printf "ref: refs/heads/branch " >$branch_dir_prefix/a/b/branch-trailing-1 && + printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 && + printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 && + printf "ref: refs/heads/branch \n " >$branch_dir_prefix/a/b/branch-complicated && + printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 && + printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 && + printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 && + + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format + error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory + error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory + warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline + warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref + warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline + warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref + warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref + warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref + warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline + EOF + sort err >sorted_err && + test_cmp expect sorted_err +' + test_done
We have already introduced the checks for regular refs. There is no need to check the consistency of the target which the symref points to. Instead, we just need to check the content of the symref itself. A regular file is accepted as a textual symref if it begins with "ref:", followed by zero or more whitespaces, followed by the full refname, followed only by whitespace characters. We always write a single SP after "ref:" and a single LF after the refname, but third-party reimplementations of Git may have taken advantage of the looser syntax. Put it more specific, we accept the following contents of the symref: 1. "ref: refs/heads/master " 2. "ref: refs/heads/master \n \n" 3. "ref: refs/heads/master\n\n" Thus, we could reuse "refMissingNewline" and "trailingRefContent" FSCK_INFOs to do the same retroactive tightening as we introduce for regular references. But we do not allow any other trailing garbage. The followings are bad symref contents which will be reported as fsck error by "git-fsck(1)". 1. "ref: refs/heads/master garbage\n" 2. "ref: refs/heads/master \n\n\n garbage " And we introduce a new "badReferentName(ERROR)" fsck message to report above errors to the user. In order to check the content of the symref, create a function "files_fsck_symref_target". It will first check whether the "referent" is under the "refs/" directory, if not, we will report "escapeReferent" fsck error message to notify the user this situation. Then, we will first check whether the symref content misses the newline by peeking the last byte of the "referent" to see whether it is '\n'. And we will remember the untrimmed length of the "referent" and call "strbuf_rtrim()" on "referent". Then, we will call "check_refname_format" to check whether the trimmed referent format is valid. If not, we will report to the user that the symref points to referent which has invalid format. If it is valid, we will compare the untrimmed length and trimmed length, if they are not the same, we need to warn the user there is some trailing garbage in the symref content. At last, we need to check whether the referent is a directory. We cannot distinguish whether a given reference like "refs/heads/a" is a file or a directory by using "check_refname_format". We have already checked bad file type when iterating the "refs/" directory but we ignore the directory. Thus, we need to explicitly add check here. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- Documentation/fsck-msgids.txt | 9 +++ fsck.h | 3 + refs/files-backend.c | 81 +++++++++++++++++++++++ t/t0602-reffiles-fsck.sh | 117 ++++++++++++++++++++++++++++++++++ 4 files changed, 210 insertions(+)