Message ID | 20221025224224.2352979-5-gitster@pobox.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 3882a0d3ad92ace6559c950921d9afb4cdcc7ea0 |
Headers | show |
Series | document fsck error message ids | expand |
On Tue, Oct 25 2022, Junio C Hamano wrote: > During the initial development of the fsck-msgids.txt feature, it > has become apparent that it is very much error prone to make sure > the description in the documentation file are sorted and correctly > match what is in the fsck.h header file. I have local fixes for the same issues in the list of advice in our docs, some of it's missing, wrong, out of date etc. I tried to quickly adapt the generation script I had for that, which works nicely, and by line count much shorter than the lint :) Having to exhaustively list every *.c file that uses fsck.h is a bit of a bother, but we have the same with the other generated *.h's, so it shouldn't be too bad. And this way, if we get it wrong we get a compile error: $ git -P diff; make diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt index 7af76ff99a5..f0b4308a6e7 100644 --- a/Documentation/fsck-msgids.txt +++ b/Documentation/fsck-msgids.txt @@ -1,6 +1,3 @@ -`badDate`:: - (ERROR) Invalid date format in an author/committer line. - `badDateOverflow`:: (ERROR) Invalid date value in an author/committer line. CC fsck.o fsck.c:31:9: error: excess elements in array initializer [-Werror] 31 | { NULL, NULL, NULL, -1 } | ^ fsck.c:31:9: note: (near initialization for ‘msg_id_info’) fsck.c: In function ‘fsck_ident’: fsck.c:810:51: error: ‘FSCK_MSG_BAD_DATE’ undeclared (first use in this function); did you mean ‘FSCK_MSG_BAD_NAME’? 810 | return report(options, oid, type, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date"); | ^~~~~~~~~~~~~~~~~ | FSCK_MSG_BAD_NAME fsck.c:810:51: note: each undeclared identifier is reported only once for each function it appears in cc1: all warnings being treated as errors make: *** [Makefile:2617: fsck.o] Error 1 So, if you're interested: Makefile | 12 ++++++++++++ fsck.h | 8 +------- generate-fsckmsgids.sh | 42 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 55 insertions(+), 7 deletions(-) diff --git a/Makefile b/Makefile index 85f03c6aed1..21bbc3dfb9f 100644 --- a/Makefile +++ b/Makefile @@ -860,6 +860,7 @@ REFTABLE_TEST_LIB = reftable/libreftable_test.a GENERATED_H += command-list.h GENERATED_H += config-list.h GENERATED_H += hook-list.h +GENERATED_H += fsck-msgids.h .PHONY: generated-hdrs generated-hdrs: $(GENERATED_H) @@ -2289,6 +2290,14 @@ git$X: git.o GIT-LDFLAGS $(BUILTIN_OBJS) $(GITLIBS) $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) \ $(filter %.o,$^) $(LIBS) +# Unfortunately our dependency management of generated headers used +# from within other headers suck, so we'll need to list every user of +# fsck.h here, but not too bad... +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h +FSCK_MSGIDS_H_LIBS = fetch-pack fsck +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h + help.sp help.s help.o: command-list.h builtin/bugreport.sp builtin/bugreport.s builtin/bugreport.o: hook-list.h @@ -2333,6 +2342,9 @@ command-list.h: $(wildcard Documentation/git*.txt) hook-list.h: generate-hooklist.sh Documentation/githooks.txt $(QUIET_GEN)$(SHELL_PATH) ./generate-hooklist.sh >$@ +fsck-msgids.h: generate-fsckmsgids.sh Documentation/fsck-msgids.txt + $(QUIET_GEN)$(SHELL_PATH) ./generate-fsckmsgids.sh >$@ + SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):\ $(localedir_SQ):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\ $(gitwebdir_SQ):$(PERL_PATH_SQ):$(PAGER_ENV):\ diff --git a/fsck.h b/fsck.h index 6fbce68ad67..1a9d68482ea 100644 --- a/fsck.h +++ b/fsck.h @@ -2,6 +2,7 @@ #define GIT_FSCK_H #include "oidset.h" +#include "fsck-msgids.h" enum fsck_msg_type { /* for internal use only */ @@ -79,13 +80,6 @@ enum fsck_msg_type { /* ignored (elevated when requested) */ \ FUNC(EXTRA_HEADER_ENTRY, IGNORE) -#define MSG_ID(id, msg_type) FSCK_MSG_##id, -enum fsck_msg_id { - FOREACH_FSCK_MSG_ID(MSG_ID) - FSCK_MSG_MAX -}; -#undef MSG_ID - struct fsck_options; struct object; diff --git a/generate-fsckmsgids.sh b/generate-fsckmsgids.sh new file mode 100755 index 00000000000..bbf236159aa --- /dev/null +++ b/generate-fsckmsgids.sh @@ -0,0 +1,42 @@ +#!/bin/sh + +HT=' ' + +fsck_list () { + sed -n \ + -e "/::$/ { + s/::\$//; + s/^\`//; + s/\`$//; + p; + }" \ + <Documentation/fsck-msgids.txt +} + +txt2label () { + sed \ + -e 's/\([^_]\)\([[:upper:]]\)/\1_\2/g' \ + -e 's/^/FSCK_MSG_/' | + tr '[:lower:]' '[:upper:]' +} + +fsck_labels () { + fsck_list | + txt2label +} + +listify () { + sed \ + -e "2,\$s/^/$HT/" \ + -e 's/$/,/' +} + +cat <<EOF +/* Automatically generated by generate-fsck-msgids.sh */ + +enum fsck_msg_id { + /* Auto-generated from Documentation/fsck-msgids.txt */ + $(fsck_labels | listify) + FSCK_MSG_MAX +}; +EOF
On Wed, Oct 26, 2022 at 04:43:32AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Oct 25 2022, Junio C Hamano wrote: > > > During the initial development of the fsck-msgids.txt feature, it > > has become apparent that it is very much error prone to make sure > > the description in the documentation file are sorted and correctly > > match what is in the fsck.h header file. > > I have local fixes for the same issues in the list of advice in our > docs, some of it's missing, wrong, out of date etc. > > I tried to quickly adapt the generation script I had for that, which > works nicely, and by line count much shorter than the lint :) Yeah, my instinct here was to generate rather than lint. If you make a mistake and the linter hits you over the head, that is better than quietly letting your mistake go. But better still is making it impossible to make in the first place. The downside is added complexity to the build, but it doesn't seem too bad in this case. (I had a similar thought after getting hit on the head by the recent t0450-txt-doc-vs-help.sh). > Having to exhaustively list every *.c file that uses fsck.h is a bit of > a bother, but we have the same with the other generated *.h's, so it > shouldn't be too bad. It feels like this could be made much shorter by having a separate fsck-msgs.h and not including it from fsck.h. Only fsck.c and mktag.c need the actual list. It would probably have to stop being an "enum", though. Another alternative is to generate the doc from the code, rather than the other way around. > +# Unfortunately our dependency management of generated headers used > +# from within other headers suck, so we'll need to list every user of > +# fsck.h here, but not too bad... > +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects > +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h > +FSCK_MSGIDS_H_LIBS = fetch-pack fsck > +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h I don't understand the "used within other headers" part here. Computed dependencies will get this right. It's only the initial build (before we have any computed dependencies) that needs to know that the C files in question depend on the generated file. But that is true whether they do so directly or indirectly. -Peff
Jeff King <peff@peff.net> writes: > Another alternative is to generate the doc from the code, rather than > the other way around. Yup, that indeed was what I envisioned in the "Possible future directions" comment in the cover letter. It might not even be bad to do #define FOREACH_FSCK_MSG_ID(FUNC) \ /* fatal errors */ \ FUNC(NUL_IN_HEADER, FATAL, N_("NUL byte exists in the object header.")) \ FUNC(UNTERMINATED_HEADER, FATAL, N_("Missing end-of-line in the object header.")) \ /* errors */ \ FUNC(BAD_DATE, ERROR, N_("Invalid date format in an author/committer line.")) \ ... as some of the users of the macro may have use of the translatable messages themselves.
On Wed, Oct 26 2022, Jeff King wrote: > On Wed, Oct 26, 2022 at 04:43:32AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> >> On Tue, Oct 25 2022, Junio C Hamano wrote: >> >> > During the initial development of the fsck-msgids.txt feature, it >> > has become apparent that it is very much error prone to make sure >> > the description in the documentation file are sorted and correctly >> > match what is in the fsck.h header file. >> >> I have local fixes for the same issues in the list of advice in our >> docs, some of it's missing, wrong, out of date etc. >> >> I tried to quickly adapt the generation script I had for that, which >> works nicely, and by line count much shorter than the lint :) > > Yeah, my instinct here was to generate rather than lint. If you make a > mistake and the linter hits you over the head, that is better than > quietly letting your mistake go. But better still is making it > impossible to make in the first place. > > The downside is added complexity to the build, but it doesn't seem too > bad in this case. Yeah, it's not, I have local patches to generate advice-{type,config}.h, and builtin.h. This is a quick POC to do it for fsck-msgids.h. I see I forgot the .gitignore entry, so it's a rough POC :) > (I had a similar thought after getting hit on the head by the recent > t0450-txt-doc-vs-help.sh). Sorry about that. FWIW I've wanted to assert that for a while, and to do it by e.g. having the doc *.txt blurbs generated from running "$buildin -h" during the build. But when I suggested that I gathered That Junio wasn't up for that approach, it does have its downsides thorugh. E.g. to build the docs you'd now have to compile C code, and e.g. that git-scm.com publisher written in Ruby would have to compile the code to generate its docs. Or we could do it the other way around, and scape the *.txt for the *.c generation, but then we need to run a new script for building builtin/*.o. Also possible, and I think eventually both are better than what t0450-txt-doc-vs-help.sh's doing now, but that was a less invasive change than both... >> Having to exhaustively list every *.c file that uses fsck.h is a bit of >> a bother, but we have the same with the other generated *.h's, so it >> shouldn't be too bad. > > It feels like this could be made much shorter by having a separate > fsck-msgs.h and not including it from fsck.h. Only fsck.c and mktag.c > need the actual list. It would probably have to stop being an "enum", > though. Yes, that would make it shorter, but C doesn't have forward decls of enums, so we'd need to make it "int", .... > Another alternative is to generate the doc from the code, rather than > the other way around. *nod* :) >> +# Unfortunately our dependency management of generated headers used >> +# from within other headers suck, so we'll need to list every user of >> +# fsck.h here, but not too bad... >> +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects >> +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h >> +FSCK_MSGIDS_H_LIBS = fetch-pack fsck >> +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h > > I don't understand the "used within other headers" part here. Computed > dependencies will get this right. It's only the initial build (before we > have any computed dependencies) that needs to know that the C files in > question depend on the generated file. But that is true whether they do > so directly or indirectly. I forgot the full story, but luckily I wrote it down in the WIP commits :) FWIW if you want to scour that it's mainly: https://github.com/avar/git/commit/a00f1cb9ea5 # add advice-type.h https://github.com/avar/git/commit/9e080923a11 # generate 'struct advice Also, before generating builtin.h I've got e.g. this: https://github.com/avar/git/commit/5a5360d0134 # just check with 'make' that any file is sorted To actualy generate it, very WIP: http://github.com/avar/git/commit/cf1d02fa6b2 Anyway, in partial summary, why (and sorry this got so long): * Yes, once we build e.g. fsck.o we have the full dependency tree, yay.... * ...but only for gcc & clang, but we support other compilers. * ...for other compilers (or gcc & clang without the dep detection enabled) what should this do: make file-that-does-not-use-generated-header.o It sucks a bit to have e.g. every *.o depend on command-list.h, when we only use it in 1-2 places, ditto the other headers. The approach I took to this was to manually list headers that had "small uses" (1-10), but say that all of *.h dependend on running the *.sh to generate some "widely used" headers. E.g. if we generate builtin.h we'll either need that, or add builtin/%.o as a dependency, with a manual listing of the handful of uses outside of that subdirectory. * .... or just not care about any of that, i.e. to have all of our *.sh run on the "first build" no matter what, which certainly simplifies things, but once you have e.g. 5-10 generated headers doing e.g.: make grep.o and having it build command-list.h or whatever is a bit noisy, but it's a trade-off of manually maintaining deps v.s. not. * Our "COMPUTED_HEADER_DEPENDENCIES" only happens when you build the *.o, but we have *.sp and *.s targets too. I have some local changes *to generalize that, so we e.g. get proper dependencies for building **.s files. * We also have e.g. "make hdr-check", which works just fine now, but it's becausue we have no *.h file that uses another generated *.h file. Actually, I suspect the posted WIP change is broken in that regard (but doing this for real on my topic branch works), i.e. we need to start caring about deps for those auxiliary targets. * I may be wrong (I'm just looking at some old "here be dragons" note of mine), but I think there's also an edge case where the dependency graph of .depend.d/* is correct *once you've always had it*, but if a file now e.g. depends on foo.h, and I make foo.h include a new foo-generated.h things go boom. That issue (it if even exists, sorry, I'm blanking on the details) was solvable by just doing a "make clean", and building again, i.e. once we had re-built once we were OK, it was only a problem with an older checkout with an existing build pulling new sources. Still, I wanted to not make that case annoying either if I added a generated header...
On Wed, Oct 26, 2022 at 01:35:43PM +0200, Ævar Arnfjörð Bjarmason wrote: > > (I had a similar thought after getting hit on the head by the recent > > t0450-txt-doc-vs-help.sh). > > Sorry about that. FWIW I've wanted to assert that for a while, and to do > it by e.g. having the doc *.txt blurbs generated from running "$buildin > -h" during the build. > > But when I suggested that I gathered That Junio wasn't up for that > approach, it does have its downsides thorugh. E.g. to build the docs > you'd now have to compile C code, and e.g. that git-scm.com publisher > written in Ruby would have to compile the code to generate its docs. Yes, it would definitely break the git-scm.com importer. It might be possible to make it work by actually running "make man" (or maybe even some partial targets) in a local repo. The nightly update job pulls all of the data using GitHub's API, but it does run in a heroku dyno that has git and gcc. Doing a shallow clone of the tag to build is probably more expensive, but the cost of an actual build isn't that important. 99% of the scheduled job runs are noops because there's no new version of Git to build manpages for; as long as those remain cheap-ish, we're OK. I also think in the long run that the site should move to a system that builds off a local repo, which we can trigger manually or via GitHub Actions. But that doesn't exist yet, and I don't think anyone's actively working on it. I also think it would be nice if the git-scm.com system relied more on "make man USE_ASCIIDOCTOR=1", possibly by hooking into asciidoctor-extensions.rb or similar, rather than munging files in an ad-hoc manner. But that's also a big change that nobody is actively working on. All of which is to say that yes, having docs depend on C code will cause work on the site, but it may be work that takes us in the right direction. > Or we could do it the other way around, and scape the *.txt for the *.c > generation, but then we need to run a new script for building > builtin/*.o. Also possible, and I think eventually both are better than > what t0450-txt-doc-vs-help.sh's doing now, but that was a less invasive > change than both... I think this particular case is tricky in that direction, because it's a big set of dependencies that aren't necessarily one-to-one. E.g., builtin/log.c needs to depend on git-log.txt, but also on git-show.txt and git-format-patch.txt. > >> +# Unfortunately our dependency management of generated headers used > >> +# from within other headers suck, so we'll need to list every user of > >> +# fsck.h here, but not too bad... > >> +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects > >> +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h > >> +FSCK_MSGIDS_H_LIBS = fetch-pack fsck > >> +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h > > > > I don't understand the "used within other headers" part here. Computed > > dependencies will get this right. It's only the initial build (before we > > have any computed dependencies) that needs to know that the C files in > > question depend on the generated file. But that is true whether they do > > so directly or indirectly. > > I forgot the full story, but luckily I wrote it down in the WIP commits > :) FWIW if you want to scour that it's mainly: > > https://github.com/avar/git/commit/a00f1cb9ea5 # add advice-type.h > https://github.com/avar/git/commit/9e080923a11 # generate 'struct advice That seems like the same problem to me. It's just that if something is in cache.h, then it's needed by basically everything, and so the dependency list is big. > Anyway, in partial summary, why (and sorry this got so long): > > * Yes, once we build e.g. fsck.o we have the full dependency tree, > yay.... > * ...but only for gcc & clang, but we support other compilers. > * ...for other compilers (or gcc & clang without the dep detection > enabled) what should this do: > > make file-that-does-not-use-generated-header.o > > It sucks a bit to have e.g. every *.o depend on command-list.h, when > we only use it in 1-2 places, ditto the other headers. But that is already true of non-generated headers. If your system doesn't support computed deps, then all objects depend on all headers. Yes, that sucks for you. But almost nobody actively develops on such a system, and people building Git to use rarely notice (because they are doing an initial build, or jumping versions, both of which generally require recompilation anyway). That all comes from b8ba629264 (Makefile: fold MISC_H into LIB_H, 2012-06-20). > * We also have e.g. "make hdr-check", which works just fine now, but > it's becausue we have no *.h file that uses another generated *.h > file. I'm not that surprised. Probably it should depend on $(GENERATED_H), and possibly even be checking those files too. > * I may be wrong (I'm just looking at some old "here be dragons" note > of mine), but I think there's also an edge case where the dependency > graph of .depend.d/* is correct *once you've always had it*, but if a > file now e.g. depends on foo.h, and I make foo.h include a new > foo-generated.h things go boom. That should work because the timestamp on foo.h will have been updated, causing anything that includes it to rebuild (and thus updating its computed dependencies to include foo-generated.h). -Peff
On Thu, Oct 27 2022, Jeff King wrote: > On Wed, Oct 26, 2022 at 01:35:43PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > (I had a similar thought after getting hit on the head by the recent >> > t0450-txt-doc-vs-help.sh). >> >> Sorry about that. FWIW I've wanted to assert that for a while, and to do >> it by e.g. having the doc *.txt blurbs generated from running "$buildin >> -h" during the build. >> >> But when I suggested that I gathered That Junio wasn't up for that >> approach, it does have its downsides thorugh. E.g. to build the docs >> you'd now have to compile C code, and e.g. that git-scm.com publisher >> written in Ruby would have to compile the code to generate its docs. > > Yes, it would definitely break the git-scm.com importer. It might be > possible to make it work by actually running "make man" (or maybe even > some partial targets) in a local repo. The nightly update job pulls all > of the data using GitHub's API, but it does run in a heroku dyno that > has git and gcc. Doing a shallow clone of the tag to build is probably > more expensive, but the cost of an actual build isn't that important. > 99% of the scheduled job runs are noops because there's no new version > of Git to build manpages for; as long as those remain cheap-ish, we're > OK. > > I also think in the long run that the site should move to a system that > builds off a local repo, which we can trigger manually or via GitHub > Actions. But that doesn't exist yet, and I don't think anyone's actively > working on it. > > I also think it would be nice if the git-scm.com system relied more on > "make man USE_ASCIIDOCTOR=1", possibly by hooking into > asciidoctor-extensions.rb or similar, rather than munging files in an > ad-hoc manner. But that's also a big change that nobody is actively > working on. > > All of which is to say that yes, having docs depend on C code will cause > work on the site, but it may be work that takes us in the right > direction. Good to know. >> Or we could do it the other way around, and scape the *.txt for the *.c >> generation, but then we need to run a new script for building >> builtin/*.o. Also possible, and I think eventually both are better than >> what t0450-txt-doc-vs-help.sh's doing now, but that was a less invasive >> change than both... > > I think this particular case is tricky in that direction, because it's a > big set of dependencies that aren't necessarily one-to-one. E.g., > builtin/log.c needs to depend on git-log.txt, but also on git-show.txt > and git-format-patch.txt. I was thinking of just a generated usage-strings.c or whatever. I.e. one file with every usage string in it. Then you don't need to hunt down which thing goes in which file. We'll just have a variable named "builtin_format_patch_usage". Include it in "builtin.h" and it doesn't matter if that's in builtin/log.c or builtin/format-patch.c. It does mean you need to rebuild the C code for other built-ins every time one of the SYNOPSIS sections changes, but doesn't happen too often. >> >> +# Unfortunately our dependency management of generated headers used >> >> +# from within other headers suck, so we'll need to list every user of >> >> +# fsck.h here, but not too bad... >> >> +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects >> >> +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h >> >> +FSCK_MSGIDS_H_LIBS = fetch-pack fsck >> >> +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h >> > >> > I don't understand the "used within other headers" part here. Computed >> > dependencies will get this right. It's only the initial build (before we >> > have any computed dependencies) that needs to know that the C files in >> > question depend on the generated file. But that is true whether they do >> > so directly or indirectly. >> >> I forgot the full story, but luckily I wrote it down in the WIP commits >> :) FWIW if you want to scour that it's mainly: >> >> https://github.com/avar/git/commit/a00f1cb9ea5 # add advice-type.h >> https://github.com/avar/git/commit/9e080923a11 # generate 'struct advice > > That seems like the same problem to me. It's just that if something is > in cache.h, then it's needed by basically everything, and so the > dependency list is big. I think I either did or was planning to take it out of cache.h as part of that, we put way too much in cache.h. Even advice.c depends on cache.h for its advice.h *sigh*. Trying it just now putting advice.h in builtin.h instead leaves 10 top-level files not-compiling (and they just need the header). I think it's fine to include common utilties in our "included by everything" headers, but if we've got ~500 *.c files and something is only needed by ~20 of them (as in this case) we should probably just include it in those 20. >> Anyway, in partial summary, why (and sorry this got so long): >> >> * Yes, once we build e.g. fsck.o we have the full dependency tree, >> yay.... >> * ...but only for gcc & clang, but we support other compilers. >> * ...for other compilers (or gcc & clang without the dep detection >> enabled) what should this do: I didn't summarize this well enough, I should have said that whether you have computed deps or not that..... >> >> make file-that-does-not-use-generated-header.o >> >> It sucks a bit to have e.g. every *.o depend on command-list.h, when >> we only use it in 1-2 places, ditto the other headers. > But that is already true of non-generated headers. If your system > doesn't support computed deps, then all objects depend on all headers. ... this does not build e.g. command-list.h: make clean make grep.o But this does: make clean make help.o Because we've manually declared that. [Re-arranging a bit] > That all comes from b8ba629264 (Makefile: fold MISC_H into LIB_H, > 2012-06-20). Yes, but you get that for free whether you have computed deps or not. I.e. your compiler is about to build the code anyway. But it wasn't about to run a bunch of shellscripts to generate headers it doesn't actually need. > Yes, that sucks for you. But almost nobody actively develops on such a > system, and people building Git to use rarely notice (because they are > doing an initial build, or jumping versions, both of which generally > require recompilation anyway). I guess I'm "almost nobody" :) Not because I don't use computed deps, but because I tend to e.g. do large interactive rebases with: git rebase -i 'make grep.o' Or some other subset of our built assets/objects. On that front no one thing is the end of the world, i.e. sure, if we run some shellscript to make a needed-header.h that takes 10ms we can live with it. But it adds up, which is one reason I've been slowly trickling in patches to optimize various common parts of the Makefile & those scripts. I think before I started that the fixed cost even to have just make decide it had nothing to do was often 500ms-1s. I think that's down to 1/4 or so of that on "master", and I've got some local patches to get it consistently down to <50ms. It adds up, especially when combined with ccache & the like. I.e. if i'm testing 10 commits I happen to have cached a "rebase -i" goes from noticeably slow (~10s) to not being so slow as to be distracting. Anyway, all of which is to say that that's one thing I was aiming for: If I was going to submit patches for new generated headers to find some good trade-off between complexity and over-declaring dependencies. >> * We also have e.g. "make hdr-check", which works just fine now, but >> it's becausue we have no *.h file that uses another generated *.h >> file. > > I'm not that surprised. Probably it should depend on $(GENERATED_H), and > possibly even be checking those files too. > >> * I may be wrong (I'm just looking at some old "here be dragons" note >> of mine), but I think there's also an edge case where the dependency >> graph of .depend.d/* is correct *once you've always had it*, but if a >> file now e.g. depends on foo.h, and I make foo.h include a new >> foo-generated.h things go boom. > > That should work because the timestamp on foo.h will have been updated, > causing anything that includes it to rebuild (and thus updating its > computed dependencies to include foo-generated.h). Yes, in general it should work, maybe there's some Heisenbug there. I can't recall or reproduce it, sorry. But that's a good thing, maybe there's no issue :)
Hi Ævar, On 26 Oct 2022, at 7:35, Ævar Arnfjörð Bjarmason wrote: > On Wed, Oct 26 2022, Jeff King wrote: > >> On Wed, Oct 26, 2022 at 04:43:32AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >>> >>> On Tue, Oct 25 2022, Junio C Hamano wrote: >>> >>>> During the initial development of the fsck-msgids.txt feature, it >>>> has become apparent that it is very much error prone to make sure >>>> the description in the documentation file are sorted and correctly >>>> match what is in the fsck.h header file. >>> >>> I have local fixes for the same issues in the list of advice in our >>> docs, some of it's missing, wrong, out of date etc. >>> >>> I tried to quickly adapt the generation script I had for that, which >>> works nicely, and by line count much shorter than the lint :) >> >> Yeah, my instinct here was to generate rather than lint. If you make a >> mistake and the linter hits you over the head, that is better than >> quietly letting your mistake go. But better still is making it >> impossible to make in the first place. >> >> The downside is added complexity to the build, but it doesn't seem too >> bad in this case. > > Yeah, it's not, I have local patches to generate advice-{type,config}.h, > and builtin.h. This is a quick POC to do it for fsck-msgids.h. > > I see I forgot the .gitignore entry, so it's a rough POC :) > >> (I had a similar thought after getting hit on the head by the recent >> t0450-txt-doc-vs-help.sh). > > Sorry about that. FWIW I've wanted to assert that for a while, and to do > it by e.g. having the doc *.txt blurbs generated from running "$buildin > -h" during the build. If we wanted to go this route of generating the docs from the code (which sounds like a better long term solution), how would this work? Would we print out the list of message ids in builtin/fsck.c and write it to Documentation/fsck-msgids.txt ? > > But when I suggested that I gathered That Junio wasn't up for that > approach, it does have its downsides thorugh. E.g. to build the docs > you'd now have to compile C code, and e.g. that git-scm.com publisher > written in Ruby would have to compile the code to generate its docs. > > Or we could do it the other way around, and scape the *.txt for the *.c > generation, but then we need to run a new script for building > builtin/*.o. Also possible, and I think eventually both are better than > what t0450-txt-doc-vs-help.sh's doing now, but that was a less invasive > change than both... > >>> Having to exhaustively list every *.c file that uses fsck.h is a bit of >>> a bother, but we have the same with the other generated *.h's, so it >>> shouldn't be too bad. >> >> It feels like this could be made much shorter by having a separate >> fsck-msgs.h and not including it from fsck.h. Only fsck.c and mktag.c >> need the actual list. It would probably have to stop being an "enum", >> though. > > Yes, that would make it shorter, but C doesn't have forward decls of > enums, so we'd need to make it "int", .... > >> Another alternative is to generate the doc from the code, rather than >> the other way around. > > *nod* :) > >>> +# Unfortunately our dependency management of generated headers used >>> +# from within other headers suck, so we'll need to list every user of >>> +# fsck.h here, but not too bad... >>> +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects >>> +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h >>> +FSCK_MSGIDS_H_LIBS = fetch-pack fsck >>> +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h >> >> I don't understand the "used within other headers" part here. Computed >> dependencies will get this right. It's only the initial build (before we >> have any computed dependencies) that needs to know that the C files in >> question depend on the generated file. But that is true whether they do >> so directly or indirectly. > > I forgot the full story, but luckily I wrote it down in the WIP commits > :) FWIW if you want to scour that it's mainly: > > https://github.com/avar/git/commit/a00f1cb9ea5 # add advice-type.h > https://github.com/avar/git/commit/9e080923a11 # generate 'struct advice > > Also, before generating builtin.h I've got e.g. this: > > https://github.com/avar/git/commit/5a5360d0134 # just check with 'make' that any file is sorted > > To actualy generate it, very WIP: > > http://github.com/avar/git/commit/cf1d02fa6b2 > > Anyway, in partial summary, why (and sorry this got so long): > > * Yes, once we build e.g. fsck.o we have the full dependency tree, > yay.... > * ...but only for gcc & clang, but we support other compilers. > * ...for other compilers (or gcc & clang without the dep detection > enabled) what should this do: > > make file-that-does-not-use-generated-header.o > > It sucks a bit to have e.g. every *.o depend on command-list.h, when > we only use it in 1-2 places, ditto the other headers. > > The approach I took to this was to manually list headers that had > "small uses" (1-10), but say that all of *.h dependend on running the > *.sh to generate some "widely used" headers. > > E.g. if we generate builtin.h we'll either need that, or add > builtin/%.o as a dependency, with a manual listing of the handful of > uses outside of that subdirectory. > > * .... or just not care about any of that, i.e. to have all of our *.sh > run on the "first build" no matter what, which certainly simplifies > things, but once you have e.g. 5-10 generated headers doing e.g.: > > make grep.o > > and having it build command-list.h or whatever is a bit noisy, but > it's a trade-off of manually maintaining deps v.s. not. > > * Our "COMPUTED_HEADER_DEPENDENCIES" only happens when you build the > *.o, but we have *.sp and *.s targets too. I have some local changes > *to generalize that, so we e.g. get proper dependencies for building > **.s files. > > * We also have e.g. "make hdr-check", which works just fine now, but > it's becausue we have no *.h file that uses another generated *.h > file. > > Actually, I suspect the posted WIP change is broken in that regard > (but doing this for real on my topic branch works), i.e. we need to > start caring about deps for those auxiliary targets. > > * I may be wrong (I'm just looking at some old "here be dragons" note > of mine), but I think there's also an edge case where the dependency > graph of .depend.d/* is correct *once you've always had it*, but if a > file now e.g. depends on foo.h, and I make foo.h include a new > foo-generated.h things go boom. > > That issue (it if even exists, sorry, I'm blanking on the details) > was solvable by just doing a "make clean", and building again, > i.e. once we had re-built once we were OK, it was only a problem with > an older checkout with an existing build pulling new sources. > > Still, I wanted to not make that case annoying either if I added a > generated header...
On Thu, Oct 27 2022, John Cai wrote: > Hi Ævar, > > On 26 Oct 2022, at 7:35, Ævar Arnfjörð Bjarmason wrote: > >> On Wed, Oct 26 2022, Jeff King wrote: >> >>> On Wed, Oct 26, 2022 at 04:43:32AM +0200, Ævar Arnfjörð Bjarmason wrote: >>> >>>> >>>> On Tue, Oct 25 2022, Junio C Hamano wrote: >>>> >>>>> During the initial development of the fsck-msgids.txt feature, it >>>>> has become apparent that it is very much error prone to make sure >>>>> the description in the documentation file are sorted and correctly >>>>> match what is in the fsck.h header file. >>>> >>>> I have local fixes for the same issues in the list of advice in our >>>> docs, some of it's missing, wrong, out of date etc. >>>> >>>> I tried to quickly adapt the generation script I had for that, which >>>> works nicely, and by line count much shorter than the lint :) >>> >>> Yeah, my instinct here was to generate rather than lint. If you make a >>> mistake and the linter hits you over the head, that is better than >>> quietly letting your mistake go. But better still is making it >>> impossible to make in the first place. >>> >>> The downside is added complexity to the build, but it doesn't seem too >>> bad in this case. >> >> Yeah, it's not, I have local patches to generate advice-{type,config}.h, >> and builtin.h. This is a quick POC to do it for fsck-msgids.h. >> >> I see I forgot the .gitignore entry, so it's a rough POC :) >> >>> (I had a similar thought after getting hit on the head by the recent >>> t0450-txt-doc-vs-help.sh). >> >> Sorry about that. FWIW I've wanted to assert that for a while, and to do >> it by e.g. having the doc *.txt blurbs generated from running "$buildin >> -h" during the build. > > If we wanted to go this route of generating the docs from the code (which sounds > like a better long term solution), how would this work? Would we print out the > list of message ids in builtin/fsck.c and write it to > Documentation/fsck-msgids.txt ? First, for the purposes of this thread I think Jeff and I are far off into the weeds here :) I think nothing needs to change in how this topic's doing things, we're just takling about the longer term. But if we go for that: I think in this case & most I can think of generating the code from the docs is better (as that rough POC I had showed), because: - You just need a shellscript to scrape the docs to make a *.c or *.h, whereas you'd need a C compiler to make the docs if it's the other way around. But more importantly: - The docs are way easier to scrape with some sed/awk/grep/whatever few-liner than to scrape C code for generating docs. E.g. see config-list.h. - Scraping the C code sucks so much that we'd probably make some dedicated interface for it, e.g. what we have for "git <cmd> --git-completion-helper". In that case it's worth it, but for other things we'n need to make the interface & maintain it (even if it's some test helper just for the build). But mainly it helps to have a use-case, replacing the linter script with e.g. the *.sh I demo'd might be a marginal improvement. But e.g. "git help -c" uses one of those generated files (config-list.h), and actually does something useful ... Is there a good use-case for the fsck data like that? I'd think that we'd want to make sure the docs are in sync with the code, as in we're not adding new warnings/errors etc. without documenting them. But beyond that maybe not much, and people would just run "git help fsck" to get the list of variables..
On Fri, Oct 28, 2022 at 04:04:12AM +0200, Ævar Arnfjörð Bjarmason wrote: > > I think this particular case is tricky in that direction, because it's a > > big set of dependencies that aren't necessarily one-to-one. E.g., > > builtin/log.c needs to depend on git-log.txt, but also on git-show.txt > > and git-format-patch.txt. > > I was thinking of just a generated usage-strings.c or whatever. I.e. one > file with every usage string in it. Then you don't need to hunt down > which thing goes in which file. We'll just have a variable named > "builtin_format_patch_usage". Include it in "builtin.h" and it doesn't > matter if that's in builtin/log.c or builtin/format-patch.c. Yes, though you have the opposite problem, then: what are the source files that can produce that usage-strings.c? If your answer is "Documentation/git-*.txt", then that is a recipe for flaky dependencies. We already have one such for config-list.h. Try this: # introduce a new file printf 'foo.bar::\n\ta fake config var\n' \ >Documentation/config/foo.txt # it shows up in the output, as expected make ./git help --config-for-completion | grep foo # now drop it rm Documentation/config/foo.txt # oops; make won't rebuild anything, and it's still there make ./git help --config-for-completion | grep foo The bug is that config-list.h depends on a glob. So we notice when a file changes, but not when one goes away. And this isn't just hypothetical. Files come and go as you "git checkout" around history (or bisect). I don't remember the details, but I'm pretty sure I've gotten false positive test failures out of this before (maybe a topic branch with a bogus entry that broke t0012 or t9902, and then moving back to master doesn't fix it?). So I'd prefer to avoid introducing more flakiness if we can. You might be able to piggy back on command-list.txt in this case (that's what makes command-list.h not flaky). > It does mean you need to rebuild the C code for other built-ins every > time one of the SYNOPSIS sections changes, but doesn't happen too often. If it's a c/h combo that only makes the variable names public (not the contents of the strings they point to), then it would only trigger a rebuild when a command is added or removed. > I think I either did or was planning to take it out of cache.h as part > of that, we put way too much in cache.h. > > Even advice.c depends on cache.h for its advice.h *sigh*. > > Trying it just now putting advice.h in builtin.h instead leaves 10 > top-level files not-compiling (and they just need the header). > > I think it's fine to include common utilties in our "included by > everything" headers, but if we've got ~500 *.c files and something is > only needed by ~20 of them (as in this case) we should probably just > include it in those 20. Oh, definitely, we should be shrinking cache.h, and not adding more to it. Especially not generated stuff. > >> make file-that-does-not-use-generated-header.o > >> > >> It sucks a bit to have e.g. every *.o depend on command-list.h, when > >> we only use it in 1-2 places, ditto the other headers. > > But that is already true of non-generated headers. If your system > > doesn't support computed deps, then all objects depend on all headers. > > ... this does not build e.g. command-list.h: > > make clean > make grep.o > > But this does: > > make clean > make help.o > > Because we've manually declared that. Right, but...does grep.o actually need command-list.h? If it doesn't (and that seems to be the case), then all is working as intended. If grep included some other header that included command-list.h, then yeah, that would be a bug. But that is true whether grep.c includes it directly or not. Any Makefile dependency needs to take into account recursive includes. > > Yes, that sucks for you. But almost nobody actively develops on such a > > system, and people building Git to use rarely notice (because they are > > doing an initial build, or jumping versions, both of which generally > > require recompilation anyway). > > I guess I'm "almost nobody" :) Not because I don't use computed deps, > but because I tend to e.g. do large interactive rebases with: > > git rebase -i 'make grep.o' > > Or some other subset of our built assets/objects. I do that all the time, too. But with computed deps, it works (by which I mean it rebuilds only stuff needed by grep.o). I'm not even sure what we're talking about anymore. If you are saying that no, we don't want to just say "everything depends on foo.h that is generated", I agree. That is wrong to do, and we should specify the minimal dependencies where appropriate (and take care to keep that set as small as is feasible using small C interfaces). If you're saying "for people without computed dependencies, everything will want to rebuild shell scripts", then I don't care. We decided long ago that maintaining a manual list of header dependencies was not worth doing, and people with sub-par compilers will have to suffer. In the email you're replying to, I was trying to express the second one. But it sounds like you thought I was trying to argue against the first one. -Peff
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > First, for the purposes of this thread I think Jeff and I are far off > into the weeds here :) It is good to clearly separate where we would want to draw the line for this round, to get the already-worked-on-and-immediately-available improvement going, while we envision a future direction for the longer term. > But if we go for that: I think in this case & most I can think of > generating the code from the docs is better (as that rough POC I had > showed), because: > > - You just need a shellscript to scrape the docs to make a *.c or *.h, > whereas you'd need a C compiler to make the docs if it's the other > way around. But more importantly: > > - The docs are way easier to scrape with some sed/awk/grep/whatever > few-liner than to scrape C code for generating docs. E.g. see > config-list.h. Scraping docs is easier because we do not have to choose from multiple choices that are all reasonable ;-). Either way, the source material needs some discipline to keep it scrapable (e.g. to keep the doc scrapable, you'd probably keep each entry a single line, or a fixed format like "<token>::" followed by "^I(<token>) " followed by description. Nothing forbids us from giving developers a similar rule to keep each entry in FOR_EACH_MSG_ID() macro easier to scrape, so it is about the same difficulty going either way. But if you choose to make the code the source of truth, you'd have to see if it makes more sense to "compile and run" instead of scraping. That's another thing to consider and choose from, which makes it harder ;-) > But mainly it helps to have a use-case, replacing the linter script with > e.g. the *.sh I demo'd might be a marginal improvement. But e.g. "git > help -c" uses one of those generated files (config-list.h), and actually > does something useful ... Yes, I've shown how N_("explanation of the error") may fit into the existing scheme in a separate message upthread. If we go from code to doc, it would be a reasonable starting point. Whichever way the aout-generation goes, we'd complicate the build dependency graph, which is a downside. Another is that third-party consumers of docs now need to generate some docs from the source, which may be additional burden for them. > Is there a good use-case for the fsck data like that? I'd think that > we'd want to make sure the docs are in sync with the code, as in we're > not adding new warnings/errors etc. without documenting them. But beyond > that maybe not much, and people would just run "git help fsck" to get > the list of variables.. "git help fsck-error-codes" that does not have a pre-generated documentation (instead we'd just dump the N_("explanation") to the output) is certainly tempting. I am not sure if it would fly well. When was the last time you saw a manpage that says "run this command and view its output for the most up-to-date list" ;-)?
On Fri, Oct 28, 2022 at 05:11:07AM +0200, Ævar Arnfjörð Bjarmason wrote: > - The docs are way easier to scrape with some sed/awk/grep/whatever > few-liner than to scrape C code for generating docs. E.g. see > config-list.h. > > - Scraping the C code sucks so much that we'd probably make some > dedicated interface for it, e.g. what we have for "git <cmd> > --git-completion-helper". In the general case, scraping C code is awful. But if you are looking for one particular thing, it is not unreasonable to say something like "when we write a usage string, we write it like: const char *git_cmdname_usage[] = "..."; > But mainly it helps to have a use-case, replacing the linter script with > e.g. the *.sh I demo'd might be a marginal improvement. But e.g. "git > help -c" uses one of those generated files (config-list.h), and actually > does something useful ... Yeah. In cases like config-list.h, it really is helpful for it to be something we can query at run-time (the other option would be stuffing them into git-completion.bash at build-time, but there are some downsides there). -Peff
On Thu, Oct 27, 2022 at 10:32:30PM -0700, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > > > First, for the purposes of this thread I think Jeff and I are far off > > into the weeds here :) > > It is good to clearly separate where we would want to draw the line > for this round, to get the already-worked-on-and-immediately-available > improvement going, while we envision a future direction for the longer > term. Yeah, to be clear, I'm OK with your linting script in the near term. I didn't look at it _too_ carefully, but if the linter passes, then I think the proof is in the pudding, and we can consider the topic done for now. -Peff
On Fri, Oct 28 2022, Jeff King wrote: > On Fri, Oct 28, 2022 at 04:04:12AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > I think this particular case is tricky in that direction, because it's a >> > big set of dependencies that aren't necessarily one-to-one. E.g., >> > builtin/log.c needs to depend on git-log.txt, but also on git-show.txt >> > and git-format-patch.txt. >> >> I was thinking of just a generated usage-strings.c or whatever. I.e. one >> file with every usage string in it. Then you don't need to hunt down >> which thing goes in which file. We'll just have a variable named >> "builtin_format_patch_usage". Include it in "builtin.h" and it doesn't >> matter if that's in builtin/log.c or builtin/format-patch.c. > > Yes, though you have the opposite problem, then: what are the source > files that can produce that usage-strings.c? If your answer is > "Documentation/git-*.txt", then that is a recipe for flaky dependencies. First, you're not telling me anything new here :) I've looked at literally all of these special-cases recently for t0450*. > We already have one such for config-list.h. Try this: > > # introduce a new file > printf 'foo.bar::\n\ta fake config var\n' \ > >Documentation/config/foo.txt > > # it shows up in the output, as expected > make > ./git help --config-for-completion | grep foo > > # now drop it > rm Documentation/config/foo.txt > > # oops; make won't rebuild anything, and it's still there > make > ./git help --config-for-completion | grep foo Yes, much of the dependency graph in Documentation/Makefile is simply broken, and that's not the worst offender. Try adding a new category to command-list.txt, then remove it, and watch your entire build fail until you 'git clean -dxf' (because Documentation/Makefile will only remove the generated cmds-*.txt it knows about, but another part includes things based on a wildcard. > The bug is that config-list.h depends on a glob. So we notice when a > file changes, but not when one goes away. And this isn't just > hypothetical. Files come and go as you "git checkout" around history (or > bisect). I don't remember the details, but I'm pretty sure I've gotten > false positive test failures out of this before (maybe a topic branch > with a bogus entry that broke t0012 or t9902, and then moving back to > master doesn't fix it?). > > So I'd prefer to avoid introducing more flakiness if we can. You might > be able to piggy back on command-list.txt in this case (that's what > makes command-list.h not flaky). I'm in violent agreement with you that this problem sucks, but it's also trivial to solve: Just don't create such crappy dependency graphs in our Makefiles. So, in this case (and keep in mind this is off the cuff, and I'm not advocating for doing this now) is: 1. We already have an exhaustive list of built-ins in the top-level Makefile 2. Turn those into corresponding Documentation/git-*.txt dependencies, e.g. builtin/rebase.o to Documentation/git-rebase.txt. 3. Some won't exist on the other side, we have no builtin/cherry-pick.o, but a Documentation/git-cherry-pick.txt etc. I know, keep reading... :) 4. Slurp up the *.txt, parse it (this code's mostly in t0450 already), spew out a generated *.c or *.h. Critically this is based on an exhaustive known list at this point, if we add/remove built-ins we *will* re-generate it. It's not using a wildcard dependency. At this point we have e.g. variables like: - git_builtin_bundle_usage - git_builtin_log_usage But because we're not smart enough and/or it doesn't correspond to our *.txt structure we don't have e.g.: - git_builtin_bundle_create_usage (a sub-command) - git_builtin_fsck_objects_usage But that's fine. We can just handle remaining special-cases by not removing the cases we didn't handle from the C sources, we'll still need to maintain those on both sides, and have t0450 hopefully catch them drifting. I.e. a subsequent change to the C code would be something like: diff --git a/builtin/bundle.c b/builtin/bundle.c index e80efce3a42..557e00be5a2 100644 --- a/builtin/bundle.c +++ b/builtin/bundle.c @@ -11,14 +11,7 @@ * bundle supporting "fetch", "pull", and "ls-remote". */ -static const char * const builtin_bundle_usage[] = { - N_("git bundle create [<options>] <file> <git-rev-list args>"), - N_("git bundle verify [<options>] <file>"), - N_("git bundle list-heads <file> [<refname>...]"), - N_("git bundle unbundle <file> [<refname>...]"), - NULL -}; - +/* too dumb for these still... */ static const char * const builtin_bundle_create_usage[] = { N_("git bundle create [<options>] <file> <git-rev-list args>"), NULL We didn't handle everything, but that's OK. Perfect shouldn't be the enemy of the good. We have on the order of 150 built-ins, by far the majority of them are relativtely easy to correlate to their docs & scrape them out. The goal here would be to mostly get rid of the maintenance burden of maintaining these in two places, if we still need to do that in 10-30 cases instead of 150-200 we've solved most of the problem. >> It does mean you need to rebuild the C code for other built-ins every >> time one of the SYNOPSIS sections changes, but doesn't happen too often. > > If it's a c/h combo that only makes the variable names public (not the > contents of the strings they point to), then it would only trigger a > rebuild when a command is added or removed. Yeah, and if we didn't like that we can make it a file per variable (which I think sucks), but a full re-build every time we add a built-in isn't too bad. And we *way* worse now, every time some utility function is added to a few common headers we basically do a re-build. >> I think I either did or was planning to take it out of cache.h as part >> of that, we put way too much in cache.h. >> >> Even advice.c depends on cache.h for its advice.h *sigh*. >> >> Trying it just now putting advice.h in builtin.h instead leaves 10 >> top-level files not-compiling (and they just need the header). >> >> I think it's fine to include common utilties in our "included by >> everything" headers, but if we've got ~500 *.c files and something is >> only needed by ~20 of them (as in this case) we should probably just >> include it in those 20. > > Oh, definitely, we should be shrinking cache.h, and not adding more to > it. Especially not generated stuff. *more violent agreement noises* :) FWIW I have some patches to make headway on that stuff, which I've kept unsubmitted because I thought nobody was interested... I.e. a *very large* part of cache.h and friends is stuff that maybe 3-5 files here and there need, not "nearly everything", as one might assume. We should still have e.g. gettext.h and the like in there, probably. Although for those we should probably split them off into builtin.h and e.g. a new libs.h, so that we don't get translations added in t/helper/*.c and the like, so maybe ... >> >> make file-that-does-not-use-generated-header.o >> >> >> >> It sucks a bit to have e.g. every *.o depend on command-list.h, when >> >> we only use it in 1-2 places, ditto the other headers. >> > But that is already true of non-generated headers. If your system >> > doesn't support computed deps, then all objects depend on all headers. >> >> ... this does not build e.g. command-list.h: >> >> make clean >> make grep.o >> >> But this does: >> >> make clean >> make help.o >> >> Because we've manually declared that. > > Right, but...does grep.o actually need command-list.h? If it doesn't > (and that seems to be the case), then all is working as intended. If > grep included some other header that included command-list.h, then yeah, > that would be a bug. But that is true whether grep.c includes it > directly or not. Any Makefile dependency needs to take into account > recursive includes. Yes, it doesn't need it, and it shouldn't build it. I'm saying that this is the reason we've needed manually maintained dependencies of these generated headers. I.e. if you take the easy way out and say that we have no generated deps yet, let's build all of it you would build that stuff with grep.o for no reason. And yes, I'm aware that that's what we do right now. I'm saying that it's probably OK because we've had 3-4 of these headers, but people might mind if it's more, or a lot more... >> > Yes, that sucks for you. But almost nobody actively develops on such a >> > system, and people building Git to use rarely notice (because they are >> > doing an initial build, or jumping versions, both of which generally >> > require recompilation anyway). >> >> I guess I'm "almost nobody" :) Not because I don't use computed deps, >> but because I tend to e.g. do large interactive rebases with: >> >> git rebase -i 'make grep.o' >> >> Or some other subset of our built assets/objects. > > I do that all the time, too. But with computed deps, it works (by which > I mean it rebuilds only stuff needed by grep.o). > > I'm not even sure what we're talking about anymore. If you are saying > that no, we don't want to just say "everything depends on foo.h that is > generated", I agree. That is wrong to do, and we should specify the > minimal dependencies where appropriate (and take care to keep that set > as small as is feasible using small C interfaces). > > If you're saying "for people without computed dependencies, everything > will want to rebuild shell scripts", then I don't care. We decided long > ago that maintaining a manual list of header dependencies was not worth > doing, and people with sub-par compilers will have to suffer. > > In the email you're replying to, I was trying to express the second one. > But it sounds like you thought I was trying to argue against the first > one. I'm saying there's a large chasm between using generated dependencies, and using them all the time. Maybe my dev setup is just different, but I do most of my development in a primary development tree where I've generally built things already, so all the incremental compilation we support works out nicely. But I also do things like spin up different workrtees for different revisions, build some subset, and destroy them afterwards. Those don't benefit from computed dependencies, so it's nice that we're not over-building things. Anyway: All I was trying to get across with the upthread was that I had further patches in this area that might be useful, but that part of the reason I hadn't submitted them was because I didn't want to introduce suckage with them. We don't make a lot of use of generated code, and the uses we have now are really isolated (one header used by one *.c file, mostly). If we make that a one<->many we should be careful, and there's some dragons lurking e.g. for generated *.h's used by other *.h's (all our use is by *.c's at this point).
diff --git a/Documentation/Makefile b/Documentation/Makefile index d47acb2e25..5e1a7f655c 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -476,8 +476,19 @@ $(LINT_DOCS_MAN_SECTION_ORDER): .build/lint-docs/man-section-order/%.ok: %.txt .PHONY: lint-docs-man-section-order lint-docs-man-section-order: $(LINT_DOCS_MAN_SECTION_ORDER) +.PHONY: lint-docs-fsck-msgids +LINT_DOCS_FSCK_MSGIDS = .build/lint-docs/fsck-msgids.ok +$(LINT_DOCS_FSCK_MSGIDS): lint-fsck-msgids.perl +$(LINT_DOCS_FSCK_MSGIDS): ../fsck.h fsck-msgids.txt + $(call mkdir_p_parent_template) + $(QUIET_GEN)$(PERL_PATH) lint-fsck-msgids.perl \ + ../fsck.h fsck-msgids.txt $@ + +lint-docs-fsck-msgids: $(LINT_DOCS_FSCK_MSGIDS) + ## Lint: list of targets above .PHONY: lint-docs +lint-docs: lint-docs-fsck-msgids lint-docs: lint-docs-gitlink lint-docs: lint-docs-man-end-blurb lint-docs: lint-docs-man-section-order diff --git a/Documentation/lint-fsck-msgids.perl b/Documentation/lint-fsck-msgids.perl new file mode 100755 index 0000000000..1233ffe815 --- /dev/null +++ b/Documentation/lint-fsck-msgids.perl @@ -0,0 +1,70 @@ +#!/usr/bin/perl + +my ($fsck_h, $fsck_msgids_txt, $okfile) = @ARGV; + +my (%in_fsck_h, $fh, $bad); + +open($fh, "<", "$fsck_h") or die; +while (<$fh>) { + if (/^\s+FUNC\(([0-9A-Z_]+), ([A-Z]+)\)/) { + my ($name, $severity) = ($1, $2); + my ($first) = 1; + $name = join('', + map { + y/A-Z/a-z/; + if (!$first) { + s/^(.)/uc($1)/e; + } else { + $first = 0; + } + $_; + } + split(/_/, $name)); + $in_fsck_h{$name} = $severity; + } +} +close($fh); + +open($fh, "<", "$fsck_msgids_txt") or die; +my ($previous, $current); +while (<$fh>) { + if (!defined $current) { + if (/^\`([a-zA-Z0-9]*)\`::/) { + $current = $1; + if ((defined $previous) && + ($current le $previous)) { + print STDERR "$previous >= $current in doc\n"; + $bad = 1; + } + } + } elsif (/^\s+\(([A-Z]+)\) /) { + my ($level) = $1; + if (!exists $in_fsck_h{$current}) { + print STDERR "$current does not exist in fsck.h\n"; + $bad = 1; + } elsif ($in_fsck_h{$current} eq "") { + print STDERR "$current defined twice\n"; + $bad = 1; + } elsif ($in_fsck_h{$current} ne $level) { + print STDERR "$current severity $level != $in_fsck_h{$current}\n"; + $bad = 1; + } + $previous = $current; + $in_fsck_h{$current} = ""; # mark as seen. + undef $current; + } +} +close($fh); + +for my $key (keys %in_fsck_h) { + if ($in_fsck_h{$key} ne "") { + print STDERR "$key not explained in doc.\n"; + $bad = 1; + } +} + +die if ($bad); + +open($fh, ">", "$okfile"); +print $fh "good\n"; +close($fh);
During the initial development of the fsck-msgids.txt feature, it has become apparent that it is very much error prone to make sure the description in the documentation file are sorted and correctly match what is in the fsck.h header file. Add a quick-and-dirty Perl script and doc-lint target to sanity check that the fsck-msgids.txt is consistent with the error type list in the fsck.h header file. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Documentation/Makefile | 11 +++++ Documentation/lint-fsck-msgids.perl | 70 +++++++++++++++++++++++++++++ 2 files changed, 81 insertions(+) create mode 100755 Documentation/lint-fsck-msgids.perl