Message ID | 20240222175033.1489723-2-calvinwan@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v5,1/3] pager: include stdint.h because uintmax_t is used | expand |
Calvin Wan <calvinwan@google.com> writes: > From: Jonathan Tan <jonathantanmy@google.com> > > pager.h uses uintmax_t but does not include stdint.h. Therefore, add > this include statement. > > This was discovered when writing a stub pager.c file. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > pager.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/pager.h b/pager.h > index b77433026d..015bca95e3 100644 > --- a/pager.h > +++ b/pager.h > @@ -1,6 +1,8 @@ > #ifndef PAGER_H > #define PAGER_H > > +#include <stdint.h> > + > struct child_process; > > const char *git_pager(int stdout_is_tty); This is not going in a sensible direction from our portability standard's point of view. The reason why we do not include these system headers directly to our source files, and instead make it a rule to include <git-compat-util.h> as the first header instead, is exactly because there are curiosities in various platforms that Git wants to run on which system include headers give us the declarations for types and functions we rely on, in what order they must be included, and after what feature macros (the ones that give adjustment to what the system headers do, like _POSIX_C_SOURCE) are defined, etc. Given that in <git-compat-util.h>, inclusion of <stdint.h> is conditional behind some #ifdef's, it does not look like a sensible change. It is not very likely for <inttypes.h> and <stdint.h> to declare uintmax_t in an incompatible way, but on a platform where <git-compat-util.h> decides to include <inttypes.h> and use its definition of what uintmax_t is, we should follow the same choice and be consistent. If there is a feature macro that affects sizes of the integer on a platform, this patch will break it even more badly. Perhaps there is a platform whose C-library header requires you to define a feature macro to use 64-bit, and we may define that feature macro in <git-compat-util.h> before including either <inttypes.h> or <stdint.h>, but by including <stdint.h> directly like the above patch does, only this file and the sources that include only this file, refusing to include <git-compat-util.h> as everybody in the Git source tree should, will end up using different notion of what the integral type with maximum width is from everybody else. What this patch _wants_ to do is of course sympathizable, and we have "make hdr-check" rule to enforce "a header must include the headers that declare what it uses", except that it lets the header files being tested assume that the things made available by including <git-compat-util.h> are always available. I think a sensible direction to go for libification purposes is to also make sure that sources that are compiled into gitstdlib.a, and the headers that makes what is in gitstdlib.a available, include the <git-compat-util.h> header file. There may be things declared in the <git-compat-util.h> header that are _too_ specific to what ought to be linked into the final "git" binary and unwanted by library clients that are not "git" binary, and the right way to deal with it is to split <git-compat-util.h> into two parts, i.e. what makes system services available like its conditional inclusion of <stdint.h> vs <inttypes.h>, definition of feature macros, order in which the current <git-compat-util.h> includes system headers, etc., excluding those that made you write this patch to avoid assuming that the client code would have included <git-compat-util.h> before <pager.h>, would be the new <git-compat-core.h>. And everything else will remain in <git-compat-util.h>, which will include the <git-compat-core.h>. The <pager.h> header for library clients would include <git-compat-core.h> instead, to still allow them to use the same types as "git" binary itself that way.
On Thu, Feb 22, 2024 at 9:51 AM Calvin Wan <calvinwan@google.com> wrote: > > From: Jonathan Tan <jonathantanmy@google.com> > > pager.h uses uintmax_t but does not include stdint.h. Therefore, add > this include statement. > > This was discovered when writing a stub pager.c file. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > pager.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/pager.h b/pager.h > index b77433026d..015bca95e3 100644 > --- a/pager.h > +++ b/pager.h > @@ -1,6 +1,8 @@ > #ifndef PAGER_H > #define PAGER_H > > +#include <stdint.h> > + > struct child_process; > > const char *git_pager(int stdout_is_tty); > -- > 2.44.0.rc0.258.g7320e95886-goog > > As far as I can tell, we need pager.h because of the `pager_in_use` symbol. We need that symbol because of its use in date.c's `parse_date_format`. I wonder if we can side step the `#include <stdint.h>` concerns by splitting pager.h into pager.h and pager_in_use.h, and have pager.h include pager_in_use.h instead. This way pager.h (and its [unused] forward declarations) aren't part of git-std-lib at all. I believe this was done for things like hex-ll.h, so maybe we call it pager-ll.h. The goal being to (a) not need the `#include <stdint.h>` because that's currently contentious, but also (b) to identify the minimum set of symbols needed for the stubs library, and not declare things that we don't have any intention of actually providing / stubbing out. I have some more thoughts on this, but they're much more appropriate for the next patch in the series, so I'll leave them there.
Kyle Lippincott <spectral@google.com> writes: > As far as I can tell, we need pager.h because of the `pager_in_use` > symbol. We need that symbol because of its use in date.c's > `parse_date_format`. I wonder if we can side step the `#include > <stdint.h>` concerns by splitting pager.h into pager.h and > pager_in_use.h, and have pager.h include pager_in_use.h instead. This > way pager.h (and its [unused] forward declarations) aren't part of > git-std-lib at all. Step back a bit. Why do you even need to touch pager.h in the first place? Whatever thing that needs to define a mock version of pager_in_use() would need to be able to find out that it is supposed to take nothing as arguments and return an integer, and it can include <pager.h> without modification. Just like everybody else, it has to include <git-compat-util.h> so that the system header that gives us uintmax_t gets include appropriately in platform-dependent way, no? Why do we even need to butcher pager.h into two pieces in the first place? If you just include <git-compat-util.h> and then <pager.h> in stubs/pager.c and you're OK, no? If anything, as I already said, I think it is more reasonable to tweak what <git-compat-util.h> does. For example, it might be unwieldy for gitstdlib's purpose that it unconditionally overrides exit(), in which case it may be OK to introduce some conditional compilation macros to omit that override when building stub code. Or even split parts of the <git-compat-util.h> that both Git's use and gitstdlib's purpose are OK with into a separate header file <git-compat-core.h>, while leaving (hopefully a very minor) other parts in <git-compat-util.h> *and* include <git-compat-core.h> in <git-compat-util.h>. That way, the sources of Git can continue including <git-compat-util.h> while stub code can include <git-compat-core.h>, and we will get system library symbols and system defined types like uintmax_t in a consistent way, both in Git itself and in gitstdlib. But once such a sanitization is done on the compat-util header, other "ordinary" header files that should not have to care about portability (because they can assume that inclusion of git-compat-util.h will give them access to system types and symbols without having to worry about portability issues) and should not have to include system header files themselves. At least, that is the idea behind <git-compat-util.h> in the first place. Including any system headers directly in ordinary headers, or splitting ordinary headers at an arbitrary and artificial boundary, should not be necessary. I'd have to say that such changes are tail wagging the dog. I do not have sufficient cycles to spend actually splitting git-compat-util.h into two myself, but as an illustration, here is how I would tweak cw/git-std-lib topic to make it build without breaking our headers and including system header files directly. git-compat-util.h | 2 ++ pager.h | 2 -- stubs/misc.c | 4 ++-- stubs/pager.c | 1 + 4 files changed, 5 insertions(+), 4 deletions(-) diff --git c/git-compat-util.h w/git-compat-util.h index 7c2a6538e5..981d526d18 100644 --- c/git-compat-util.h +++ w/git-compat-util.h @@ -1475,12 +1475,14 @@ static inline int is_missing_file_error(int errno_) int cmd_main(int, const char **); +#ifndef _GIT_NO_OVERRIDE_EXIT /* * Intercept all calls to exit() and route them to trace2 to * optionally emit a message before calling the real exit(). */ int common_exit(const char *file, int line, int code); #define exit(code) exit(common_exit(__FILE__, __LINE__, (code))) +#endif /* * You can mark a stack variable with UNLEAK(var) to avoid it being diff --git c/pager.h w/pager.h index 015bca95e3..b77433026d 100644 --- c/pager.h +++ w/pager.h @@ -1,8 +1,6 @@ #ifndef PAGER_H #define PAGER_H -#include <stdint.h> - struct child_process; const char *git_pager(int stdout_is_tty); diff --git c/stubs/misc.c w/stubs/misc.c index 8d80581e39..d0379dcb69 100644 --- c/stubs/misc.c +++ w/stubs/misc.c @@ -1,5 +1,5 @@ -#include <assert.h> -#include <stdlib.h> +#define _GIT_NO_OVERRIDE_EXIT +#include <git-compat-util.h> #ifndef NO_GETTEXT /* diff --git c/stubs/pager.c w/stubs/pager.c index 4f575cada7..04517aad4c 100644 --- c/stubs/pager.c +++ w/stubs/pager.c @@ -1,3 +1,4 @@ +#include <git-compat-util.h> #include "pager.h" int pager_in_use(void)
On Thu, Feb 22, 2024 at 1:44 PM Junio C Hamano <gitster@pobox.com> wrote: > > Calvin Wan <calvinwan@google.com> writes: > > > From: Jonathan Tan <jonathantanmy@google.com> > > > > pager.h uses uintmax_t but does not include stdint.h. Therefore, add > > this include statement. > > > > This was discovered when writing a stub pager.c file. > > > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > > Signed-off-by: Calvin Wan <calvinwan@google.com> > > --- > > pager.h | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/pager.h b/pager.h > > index b77433026d..015bca95e3 100644 > > --- a/pager.h > > +++ b/pager.h > > @@ -1,6 +1,8 @@ > > #ifndef PAGER_H > > #define PAGER_H > > > > +#include <stdint.h> > > + > > struct child_process; > > > > const char *git_pager(int stdout_is_tty); > > This is not going in a sensible direction from our portability > standard's point of view. > > The reason why we do not include these system headers directly to > our source files, and instead make it a rule to include > <git-compat-util.h> as the first header instead, is exactly because > there are curiosities in various platforms that Git wants to run on > which system include headers give us the declarations for types and > functions we rely on, in what order they must be included, and after > what feature macros (the ones that give adjustment to what the > system headers do, like _POSIX_C_SOURCE) are defined, etc. > > Given that in <git-compat-util.h>, inclusion of <stdint.h> is > conditional behind some #ifdef's, it does not look like a sensible > change. It is not very likely for <inttypes.h> and <stdint.h> to > declare uintmax_t in an incompatible way, but on a platform where > <git-compat-util.h> decides to include <inttypes.h> and use its > definition of what uintmax_t is, we should follow the same choice > and be consistent. Speaking of this specific header file inclusion and the oddities that have gotten us to where we are: - Originally, it seems that we were including stdint.h - 17 years ago, to work around Solaris not providing stdint.h, but providing inttypes.h, it was switched to being just inttypes.h, with the explanation being that inttypes is a superset of stdint. https://github.com/git/git/commit/007e2ba65902b484fc65a313e54594a009841740 - 13 years ago, to work around some platforms not having inttypes.h, it was made conditional. (https://github.com/git/git/commit/2844923d62a4c408bd59ddb2caacca4aa7eb86bc) The condition added 13 years ago was, IMHO, backwards from what it should have been. The intent is to have stdint.h included. We should include stdint.h. I suspect that 17 years is enough time for that platform to start conforming to what is now a 25 year old standard, and I don't know how we can verify that and have this stop being a haunted graveyard without just trying it and seeing if the build bots or maintainers identify it as a continuing issue. If it's still an issue (and only if), we should reintroduce a conditional, but invert it: if there's no stdint.h, THEN include inttypes.h. Oh, no, that doesn't work. I tried that, and the build bots told me that doesn't work, because we're using things from inttypes.h (PRIuMAX showed up several times in the errors, there may be others). This makes me wonder how the platforms with no inttypes.h work at all. I still think we should do something here because it's a 13-year-old compatibility fix that "shouldn't" be needed anymore, and causes confusion/concerns like this thread. Maybe just see if we can get away with always including inttypes.h in git-compat-util.h, or maybe _both_ inttypes.h and stdint.h (in either order), just to be really obvious that it's acceptable to include stdint.h. > > If there is a feature macro that affects sizes of the integer on a > platform, this patch will break it even more badly. Perhaps there > is a platform whose C-library header requires you to define a > feature macro to use 64-bit, and we may define that feature macro > in <git-compat-util.h> before including either <inttypes.h> or > <stdint.h>, but by including <stdint.h> directly like the above > patch does, only this file and the sources that include only this > file, refusing to include <git-compat-util.h> as everybody in the > Git source tree should, will end up using different notion of what > the integral type with maximum width is from everybody else. I agree that for pager.h, something like the patch in your next email would resolve that particular problem. The stub library is of basically the same stature as git-std-lib: it's code that is provided by the Git project, compiled by Makefile rules owned and maintained by the Git project, and should conform to the Git coding standards. The .c files in the stubs library should include git-compat-util.h, there's basically no reason not to. However, I believe that we'll need a good policy for what to do with libified headers + sources in general. I can see many potential categorizations of source; there's no need to formally define all of them and assign files to each category, but the main categories are basically: 1. files that have code that is an internal part of Git, one of the helper binaries, or one of its tests, whether it's a library or not. These should include git-compat-util.h at the top of the .c files like they do today. The .h files for these translation units are also considered "internal". These header files should assume that git-compat-util.h has been included properly, and don't need to be self-contained, because they're _only_ included by things in this category. 2. files that have code that define the "library interface", probably only the ones defining the library interface _as used by external projects_. I think that we'll likely need to be principled about defining these, and having them be as minimal and compatible as possible. 3. code in external projects that directly uses libraries from the Git project, and thus includes Git headers from category 2 4. the rest of the code in external projects (the code that does not directly use libraries from the Git project) A hypothetical git-compat-core.h being included at the top of the .c files in category 2 is feasible, but needs to be carefully handled due to potential symbol collision (which we're discussing in another thread and I may have a possible solution for, at least on some platforms). On the other hand, a git-compat-core.h being included at the top of the .h files belonging to category 2 doesn't work, because when these .h files are included by code in category 3, it's too late. In this example, gnu-source-header.h below is a system header that changes behavior depending on _GNU_SOURCE (effectively the same concern as you were raising in the quoted paragraph): external-project-category3.c: #include <stdint.h> #include <gnu-source-header.h> #include <git/some-lib-interface.h> git/some-lib-interface.h: #include <git/git-compat-core.h> We can't do anything in git/git-compat-core.h that relies on careful inclusion order, requiring that various things are #defined prior to the first inclusion of certain headers, etc. stdint.h and gnu-source-header.h are already included, and so it's too late for us to #define things that change their behavior, because that won't have any effect. I don't think it's reasonable to expect the external project to #include <git/git-compat-core.h> at the top of their files that are in category3. It's definitely not reasonable to require the external project to do that for all their files (category 3 and category 4). It's slightly more reasonable to have them do some set of #defines for their binaries and libraries, but still quite awkward and potentially not feasible (for things like _FILE_OFFSET_BITS). This is why I split it into 4 categories. I believe that category 2 files need to be maximally compatible, both to platforms [where we provide support for libraries, and I think this probably will end up being a subset of all the platforms, especially at first] and to the environment they're being #included in and interacting with. So they need to be self-contained: they can't rely on stdint.h having been included, but they also can't rely on it _not_ having been included already. The category 2 .h files need to be minimal: just what we want in this external library interface, and ideally nothing else. The category 2 .h and .c files need to be compatible with common #defines being set OR not set. The point of a category 2 .c file is to bridge the gap between the category 1 and category 3 environments. This likely means that we need to be careful about certain structs and typedefs defined by the system (vs. structs defined by the category 2 headers themselves) being passed between the different environments. For example, if we were to have a library interface that included a `struct stat`, and the category 3 files didn't have _FILE_OFFSET_BITS 64, but the category 1 files do? Instant breakage. This aspect of this discussion probably should happen on the next patch, or in a separate thread :) But since I'm here, I'll summarize these thoughts: basically, the next patch, imho, doesn't go far enough, but is a very good first step that we can build on. We need to define what belongs to the "external interface" of the various libraries (category 2 above) and what is considered category 1. pager.h is pretty obviously category 1. strbuf.h, abspath.h, etc? I'm not sure. git-std-lib is weird, because it's so low level and there's not really much "internal" code to this library. So maybe we declare those as category 2. But I don't know how that will actually work in practice. I'll try to find time to write up these thoughts on that patch. > > What this patch _wants_ to do is of course sympathizable, and we > have "make hdr-check" rule to enforce "a header must include the > headers that declare what it uses", except that it lets the header > files being tested assume that the things made available by > including <git-compat-util.h> are always available. > > I think a sensible direction to go for libification purposes is to > also make sure that sources that are compiled into gitstdlib.a, and > the headers that makes what is in gitstdlib.a available, include the > <git-compat-util.h> header file. There may be things declared in > the <git-compat-util.h> header that are _too_ specific to what ought > to be linked into the final "git" binary and unwanted by library > clients that are not "git" binary, and the right way to deal with it > is to split <git-compat-util.h> into two parts, i.e. what makes > system services available like its conditional inclusion of > <stdint.h> vs <inttypes.h>, definition of feature macros, order in > which the current <git-compat-util.h> includes system headers, etc., > excluding those that made you write this patch to avoid assuming > that the client code would have included <git-compat-util.h> before > <pager.h>, would be the new <git-compat-core.h>. And everything > else will remain in <git-compat-util.h>, which will include the > <git-compat-core.h>. The <pager.h> header for library clients would > include <git-compat-core.h> instead, to still allow them to use the > same types as "git" binary itself that way. > > > > > >
Kyle Lippincott <spectral@google.com> writes: > The condition added 13 years ago was, IMHO, backwards from what it > should have been. The intent is to have stdint.h included. We should > include stdint.h. I suspect that 17 years is enough time for that > platform to start conforming to what is now a 25 year old standard, > and I don't know how we can verify that and have this stop being a > haunted graveyard without just trying it and seeing if the build bots > or maintainers identify it as a continuing issue. The nightmare of Solaris might be luckily behind us, but the world does not only run Linux and GNU libc, and it is not just <stdint.h> vs <inttypes.h>. This is about general code hygiene. > If it's still an > issue (and only if), we should reintroduce a conditional, but invert > it: if there's no stdint.h, THEN include inttypes.h. But it would give the wrong order in general in the modern world, where <inttypes.h> is supposed to include <stdint.h> and extends it. We use inttypes.h by default because the C standard already talks about it, and fall back to stdint.h when the platform lacks it. But what I suspect is that nobody compiles with NO_INTTYPES_H and we would unknowingly (but not "unintentionally") start using the extended types that are only available in <inttypes.h> but not in <stdint.h> sometime in the future. It might already have happened, but I do not know. I haven't compiled with NO_INTTYPES_H for some time (to experiment), and I haven't met a platform that actually requires NO_INTTYPES_H defined to build. Once after such a change is made without anybody being knowingly breaking some rare platform, if nobody complains, we can just drop the support to allow us to limit ourselves to <stdint.h>, but since we hear nobody complaining, we should be OK with the current rule of including system header files that is embodied in <git-compat-util.h> header file. In any case, your sources should not include a standard library header directly yourself, period. Instead let <git-compat-util.h> take care of the details of how we need to obtain what we need out of the system on various platforms. Thanks.
On Mon, Feb 26, 2024 at 4:20 PM Junio C Hamano <gitster@pobox.com> wrote: > > Kyle Lippincott <spectral@google.com> writes: > > > The condition added 13 years ago was, IMHO, backwards from what it > > should have been. The intent is to have stdint.h included. We should > > include stdint.h. I suspect that 17 years is enough time for that > > platform to start conforming to what is now a 25 year old standard, > > and I don't know how we can verify that and have this stop being a > > haunted graveyard without just trying it and seeing if the build bots > > or maintainers identify it as a continuing issue. > > The nightmare of Solaris might be luckily behind us, but the world > does not only run Linux and GNU libc, and it is not just <stdint.h> > vs <inttypes.h>. This is about general code hygiene. > > > If it's still an > > issue (and only if), we should reintroduce a conditional, but invert > > it: if there's no stdint.h, THEN include inttypes.h. > > But it would give the wrong order in general in the modern world, > where <inttypes.h> is supposed to include <stdint.h> and extends it. > > We use inttypes.h by default because the C standard already talks > about it, and fall back to stdint.h when the platform lacks it. But > what I suspect is that nobody compiles with NO_INTTYPES_H and we > would unknowingly (but not "unintentionally") start using the > extended types that are only available in <inttypes.h> but not in > <stdint.h> sometime in the future. It might already have happened, It has. We use PRIuMAX, which is from inttypes.h. I think it's only "accidentally" working if anyone uses NO_INTTYPES_H. I changed my stance halfway through this investigation in my previous email, I apologize for not going back and editing it to make it clear at the beginning that I'd done so. My current stance is that <git-compat-util.h> should be either (a) including only inttypes.h (since it includes stdint.h), or (b) including both inttypes.h and stdint.h (in either order), just to demonstrate that we can. > but I do not know. I haven't compiled with NO_INTTYPES_H for some > time (to experiment), and I haven't met a platform that actually > requires NO_INTTYPES_H defined to build. Once after such a change > is made without anybody being knowingly breaking some rare platform, > if nobody complains, we can just drop the support to allow us to > limit ourselves to <stdint.h>, but since we hear nobody complaining, > we should be OK with the current rule of including system header > files that is embodied in <git-compat-util.h> header file. > > In any case, your sources should not include a standard library > header directly yourself, period. Instead let <git-compat-util.h> > take care of the details of how we need to obtain what we need out > of the system on various platforms. I disagree with this statement. We _can't_ use a magic compatibility header file in the library interfaces, for the reasons I outlined further below in my previous message. For those headers, the ones that might be included by code that's not under the Git project's control, they need to be self-contained, minimal, and maximally compatible. > > Thanks.
Kyle Lippincott <spectral@google.com> writes: >> In any case, your sources should not include a standard library >> header directly yourself, period. Instead let <git-compat-util.h> >> take care of the details of how we need to obtain what we need out >> of the system on various platforms. > > I disagree with this statement. We _can't_ use a magic compatibility > header file in the library interfaces, for the reasons I outlined > further below in my previous message. For those headers, the ones that > might be included by code that's not under the Git project's control, > they need to be self-contained, minimal, and maximally compatible. Note that I am not talking about your random outside program that happens to link with gitstdlib.a; it would want to include a header file <gitstdlib.h> that comes with the library. Earlier I suggested that you may want to take a subset of <git-compat-util.h>, because <git-compat-util.h> may have a lot more than what is minimally necessary to allow our sources to be insulated from details of platform dependence. You can think of that subset as a good starting point to build the <gitstdlib.h> header file to be given to the library customers. But the sources that go to the library, as gitstdlib.a is supposed to serve as a subset of gitlib.a to our internal codebase when building the git binary, should still follow our header inclusion rules. Because we would want to make sure that the sources that are made into gitstdlib.a, the sources to the rest of libgit.a, and the sources to the rest of git, all agree on what system features we ask from the system, feature macros that must be defined to certain values before we include system library files (like _XOPEN_SOURCE and _FILE_OFFSET_BITS) must be defined consistently across all of these three pieces. One way to do so may be to ensure that the definition of them would be migrated to <gitstdlib.h> when we separate a subset out of <git-compat-util.h> to it (and of course, we make <git-compat-util.h> to include <gitstdlib.h> so that it would be still sufficient for our in-tree users to include the <git-compat-util.h>) <gitstdlib.h> may have to expose an API function that uses some extended types only available by including system header files, e.g. some function may return ssize_t as its value or take an off_t value as its argument. If our header should include system headers to make these types available to our definitions is probably open to discussion. It is harder to do so portably, unless your world is limited to POSIX.1 and ISO C, than making it the responsibility of library users. But if the platform headers and libraries support feature macros that allows you to tweak these sizes (e.g. the size of off_t may be controlled by setting the _FILE_OFFSET_BITS to an appropriate value), it may be irresponsible to leave that to the library users, as they MUST make sure to define such feature macros exactly the same way as we define for our code, which currently is done in <git-compat-util.h>, before they include their system headers to obtain off_t so that they can use <gitstdlib.h>. So the rules for library clients (random outside programs that happen to link with gitstdlib.a) may not be that they must include <git-compat-util.h> as the first thing, but they probably still have to include <gitstdlib.h> fairly early before including any of their system headers, I would suspect, unless they are willing to accept such responsibility fully to ensure they compile the same way as the gitstdlib library, I would think.
On Mon, Feb 26, 2024 at 04:56:28PM -0800, Kyle Lippincott wrote: > > We use inttypes.h by default because the C standard already talks > > about it, and fall back to stdint.h when the platform lacks it. But > > what I suspect is that nobody compiles with NO_INTTYPES_H and we > > would unknowingly (but not "unintentionally") start using the > > extended types that are only available in <inttypes.h> but not in > > <stdint.h> sometime in the future. It might already have happened, > > It has. We use PRIuMAX, which is from inttypes.h. Is it always, though? That's what C99 says, but if you have a system that does not have inttypes.h in the first place, but does have stdint.h, it seems possible that it provides conversion macros elsewhere (either via stdint.h, or possibly just as part of stdio.h). So it might be that things have been horribly broken on NO_INTTYPES_H systems for a while, and nobody is checking. But I don't think you can really say so without looking at such a system. And looking at config.mak.uname, it looks like Windows is such a system. Does it really have inttypes.h and it is getting included from somewhere else, making format conversion macros work? Or does it provide those macros elsewhere, and really needs stdint? It does look like compat/mingw.h includes it, but I think we wouldn't use that for msvc builds. > I think it's only > "accidentally" working if anyone uses NO_INTTYPES_H. I changed my > stance halfway through this investigation in my previous email, I > apologize for not going back and editing it to make it clear at the > beginning that I'd done so. My current stance is that > <git-compat-util.h> should be either (a) including only inttypes.h > (since it includes stdint.h), or (b) including both inttypes.h and > stdint.h (in either order), just to demonstrate that we can. It is good to clean up old conditionals if they are no longer applicable, as they are a burden to reason about later (as this discussion shows). But I am not sure about your "just to demonstrate we can". It is good to try it out, but it looks like there is a non-zero chance that MSVC on Windows might break. It is probably better to try building there or looping in folks who can, rather than just making a change and seeing if anybody screams. I think the "win+VS" test in the GitHub Actions CI job might cover this case. It is not run by default (because it was considered be mostly redundant with the mingw build), but it shouldn't be too hard to enable it for a one-off test. -Peff
On Tue, Feb 27, 2024 at 03:45:29AM -0500, Jeff King wrote: > It is good to clean up old conditionals if they are no longer > applicable, as they are a burden to reason about later (as this > discussion shows). But I am not sure about your "just to demonstrate we > can". It is good to try it out, but it looks like there is a non-zero > chance that MSVC on Windows might break. It is probably better to try > building there or looping in folks who can, rather than just making a > change and seeing if anybody screams. > > I think the "win+VS" test in the GitHub Actions CI job might cover this > case. It is not run by default (because it was considered be mostly > redundant with the mingw build), but it shouldn't be too hard to enable > it for a one-off test. Here's a successful run with the NO_INTTYPES_H line removed: https://github.com/peff/git/actions/runs/8062063219 Blaming the code around the inttypes.h reference in compat/mingw.h turned up 0ef60afdd4 (MSVC: use shipped headers instead of fallback definitions, 2016-03-30), which claims that inttypes.h was added to VS2013. That sounds old-ish in 2024, but I don't know how old is normal in the Windows world. All of which to me says that cleaning this up is something that should involve Windows folks, who can make the judgement for their platform. -Peff
On Tue, Feb 27, 2024 at 12:45 AM Jeff King <peff@peff.net> wrote: > > On Mon, Feb 26, 2024 at 04:56:28PM -0800, Kyle Lippincott wrote: > > > > We use inttypes.h by default because the C standard already talks > > > about it, and fall back to stdint.h when the platform lacks it. But > > > what I suspect is that nobody compiles with NO_INTTYPES_H and we > > > would unknowingly (but not "unintentionally") start using the > > > extended types that are only available in <inttypes.h> but not in > > > <stdint.h> sometime in the future. It might already have happened, > > > > It has. We use PRIuMAX, which is from inttypes.h. > > Is it always, though? That's what C99 says, but if you have a system > that does not have inttypes.h in the first place, but does have > stdint.h, it seems possible that it provides conversion macros elsewhere > (either via stdint.h, or possibly just as part of stdio.h). It's of course possible that on some platforms, stdio.h or stdint.h defines these types (or includes inttypes.h internally, which defines these types). However, I think that to be "correct" and for a compiler to claim it supports C99 (and the compiler _must_ claim that because of the guard in <git-compat-util.h>), inttypes.h must exist, and it must cause these symbols to appear. If there are platforms that are claiming to be C99 and inttypes.h doesn't exist or doesn't provide the symbols it should, I don't think we should try to support them - they can maintain platform-specific patches for whatever not-actually-C99 language the platform supports. Basically what git for windows is already doing (presumably for other reasons), as far as I can tell. > > So it might be that things have been horribly broken on NO_INTTYPES_H > systems for a while, and nobody is checking. But I don't think you can > really say so without looking at such a system. > > And looking at config.mak.uname, it looks like Windows is such a system. > Does it really have inttypes.h and it is getting included from somewhere > else, making format conversion macros work? Or does it provide those > macros elsewhere, and really needs stdint? It does look like > compat/mingw.h includes it, but I think we wouldn't use that for msvc > builds. > > > I think it's only > > "accidentally" working if anyone uses NO_INTTYPES_H. I changed my > > stance halfway through this investigation in my previous email, I > > apologize for not going back and editing it to make it clear at the > > beginning that I'd done so. My current stance is that > > <git-compat-util.h> should be either (a) including only inttypes.h > > (since it includes stdint.h), or (b) including both inttypes.h and > > stdint.h (in either order), just to demonstrate that we can. > > It is good to clean up old conditionals if they are no longer > applicable, as they are a burden to reason about later (as this > discussion shows). But I am not sure about your "just to demonstrate we > can". Yeah, I'm also not convinced the "just to demonstrate we can" has much value. I was trying to get ahead of future discussions where we claim it's important to never include stdint.h (because people remember this discussion and how contentious it was) and think it might misbehave, and instead just include it in <git-compat-util.h> to prove it _doesn't_ misbehave, and thus start to allow usage in self-contained headers when necessary. > It is good to try it out, but it looks like there is a non-zero > chance that MSVC on Windows might break. It is probably better to try > building there or looping in folks who can, rather than just making a > change and seeing if anybody screams. I think I miscommunicated here, or had too many assumptions about the current state of things that I didn't actually verify. When I wrote "and seeing if the build bots or maintainers identify it as a continuing issue", I was assuming that we had build bots for all major platforms (including windows, with however it gets built: mingw or VC or whatever), and I included the "maintainers" part of it for the long tail of esoteric platforms that we either don't know about, or can't have automated builds on for whatever reason. I agree that making changes that have a high likelihood of breaking supported platforms (which gets back to that platform support thread that was started a few weeks ago) should not be done lightly, and it's not reasonable to make the change and wait for maintainers of these "supported platforms" to complain. I was relying on the build bots covering the "supported platforms" and stopping me from even sending such a patch to the mailing list :) > > I think the "win+VS" test in the GitHub Actions CI job might cover this > case. It is not run by default (because it was considered be mostly > redundant with the mingw build), but it shouldn't be too hard to enable > it for a one-off test. > > -Peff
On Mon, Feb 26, 2024 at 6:45 PM Junio C Hamano <gitster@pobox.com> wrote: > > Kyle Lippincott <spectral@google.com> writes: > > >> In any case, your sources should not include a standard library > >> header directly yourself, period. Instead let <git-compat-util.h> > >> take care of the details of how we need to obtain what we need out > >> of the system on various platforms. > > > > I disagree with this statement. We _can't_ use a magic compatibility > > header file in the library interfaces, for the reasons I outlined > > further below in my previous message. For those headers, the ones that > > might be included by code that's not under the Git project's control, > > they need to be self-contained, minimal, and maximally compatible. > > Note that I am not talking about your random outside program that > happens to link with gitstdlib.a; it would want to include a header > file <gitstdlib.h> that comes with the library. I agree with this. > > Earlier I suggested that you may want to take a subset of > <git-compat-util.h>, because <git-compat-util.h> may have a lot more > than what is minimally necessary to allow our sources to be > insulated from details of platform dependence. You can think of > that subset as a good starting point to build the <gitstdlib.h> > header file to be given to the library customers. > > But the sources that go to the library, as gitstdlib.a is supposed > to serve as a subset of gitlib.a to our internal codebase when > building the git binary, should still follow our header inclusion > rules. If I'm understanding this correctly, I agree with it. The .c files still include <git-compat-util.h>, and don't change. The internal-only .h files (ones that a pre-built-library consumer doesn't need to even have in the filesystem) still assume that <git-compat-util.h> was included, and don't change. <pager.h> falls into this category. > > Because we would want to make sure that the sources that are made > into gitstdlib.a, the sources to the rest of libgit.a, and the > sources to the rest of git, all agree on what system features we ask > from the system, feature macros that must be defined to certain > values before we include system library files (like _XOPEN_SOURCE > and _FILE_OFFSET_BITS) must be defined consistently across all of > these three pieces. One way to do so may be to ensure that the > definition of them would be migrated to <gitstdlib.h> when we > separate a subset out of <git-compat-util.h> to it (and of course, > we make <git-compat-util.h> to include <gitstdlib.h> so that it > would be still sufficient for our in-tree users to include the > <git-compat-util.h>) > > <gitstdlib.h> may have to expose an API function that uses some > extended types only available by including system header files, > e.g. some function may return ssize_t as its value or take an off_t > value as its argument. I agree that these types will be necessary (specifically ssize_t and int##_t, but less so off_t) in the "external" (used by projects other than Git) library interfaces. > > If our header should include system headers to make these types > available to our definitions is probably open to discussion. It is > harder to do so portably, unless your world is limited to POSIX.1 > and ISO C, than making it the responsibility of library users. I think I'm probably missing the nuance here, and may be making this discussion much harder because of it. My understanding is that Git is using C99; is that different from ISO C? There's something at the top of <git-compat-util.h> that enforces that we're using C99. Therefore, I'm assuming that any compiler that claims to be C99 and passes that check at the top of <git-compat-util.h> will support inttypes.h, stdint.h, stdbool.h, and other files defined by the C99 standard to include types that we need in our .h files are able to be included without reservation. To flip it around: any compiler/platform that's missing inttypes.h, or is missing stdint.h, or raises errors if both are included, or requires other headers to be included before them _isn't a C99 compiler_, and _isn't supported_. I'm picking on these files because I think they will be necessary for the external library interfaces. I'm intentionally ignoring any file not mentioned in the C99 standard, because those are platform specific. I acknowledge that there may be some functionality in these files that's only enabled if certain #defines are set. Our external interfaces should strive to not use that functionality, and only do so if we are able to test for this functionality and refuse to compile if it's not available. I have an example with uintmax_t below. > > But if the platform headers and libraries support feature macros > that allows you to tweak these sizes (e.g. the size of off_t may be > controlled by setting the _FILE_OFFSET_BITS to an appropriate > value), it may be irresponsible to leave that to the library users, > as they MUST make sure to define such feature macros exactly the > same way as we define for our code, which currently is done in > <git-compat-util.h>, before they include their system headers to > obtain off_t so that they can use <gitstdlib.h>. I think the only viable solution to this is to not use these types that depend on #defines in the interface available to non-git projects. We can't set _FILE_OFFSET_BITS in the library's external (used by non-Git projects) interface header, as there's a high likelihood that it's either too late (external project #included something that relies on _FILE_OFFSET_BITS already), or, if not, we create the "off_t is a different size" problem for their code. This means that we can't use off_t in these external interface headers (and in the .c files that support them, if any). We can't use `struct stat`. We likely need to limit ourselves to just the typedefs from stdint.h, and probably will need some additional checks that enforce that we have the types and sizes we expect (ex: I could imagine that some platforms define uintmax_t as 32-bit. or 128-bit. Either we can't use it in these external interfaces, or we have to enforce somehow that the simplest file we can imagine (#include <stdint.h>) gets a definition of uintmax_t that is the exact same as the one we'd get if we included <git-compat-util.h>). The external interface headers don't need to be as platform-compatible as the rest of the git code base, because not every platform is going to be a supported target for using the library in non-git projects, especially at first. The external interface headers _do_ need to be as tolerant and well behaved as possible when being included by external projects, which I'm asserting means they need to be self-contained and minimal. If that means these external interfaces don't get to use off_t at all, so be it. If it means they can only be included if sizeof(off_t) == 64, and we have a way of enforcing that at compile time, that's fine with me too. But we can't #define _FILE_OFFSET_BITS ourselves in this external interface to get that behavior, because it just doesn't work. I'm making some assumptions here. I'm assuming that the git binary uses a different interface to a hypothetical libgitobjstore.a than an external project would (i.e. that there'd be some git-obj-store-interface.h that gets included by non-Git projects, but not by git itself). Is git-std-lib an obvious counterexample to this assumption? Yes and no. No one (besides Git itself) is going to include libgitstdlib.a in their project any time soon, so there's no real "external interface" to define right now. Eventually, having git-std-lib types in the hypothetical git-obj-store-interface.h _may_ happen, or it may not. I don't know. ... But I think we're in agreement that pager.h isn't part of git-std-lib's (currently undefined/non-existent) external interface, and so doesn't need to be self-contained, and this patch should probably be dropped? > > So the rules for library clients (random outside programs that > happen to link with gitstdlib.a) may not be that they must include > <git-compat-util.h> as the first thing, but they probably still have > to include <gitstdlib.h> fairly early before including any of their > system headers, I would suspect, unless they are willing to accept > such responsibility fully to ensure they compile the same way as the > gitstdlib library, I would think. > > >
Kyle Lippincott <spectral@google.com> writes: > of <git-compat-util.h> that enforces that we're using C99. Therefore, > I'm assuming that any compiler that claims to be C99 and passes that > check at the top of <git-compat-util.h> will support inttypes.h, > stdint.h, stdbool.h, and other files defined by the C99 standard to > include types that we need in our .h files are able to be included > without reservation. We at the git project is much more conservative than trusting the compilers and take their "claim" to support a standard at the face value, though ;-). As our CodingGuidelines say, we honor real world constraints more than what's written on paper as "standard", and would ... > To flip it around: any compiler/platform that's > missing inttypes.h, or is missing stdint.h, or raises errors if both > are included, or requires other headers to be included before them > _isn't a C99 compiler_, and _isn't supported_. ... refrain from taking such a position as much as possible. > I think the only viable solution to this is to not use these types > that depend on #defines in the interface available to non-git > projects. OK. Now we have that behind us, can we start outlining what kind of things _are_ exported out in the library to the outside world? The only example of the C source file that is a client of the library we have is t/helper/test-stdlib.c but it includes <abspath.h> <hex-ll.h> <parse.h> <strbuf.h> <string-list.h> after including <git-compat-util.h>. Are the services offered by these five header files the initial set of the library, minimally viable demonstration? Has somebody already analyzed and enumerated what kind of system definitions we need to support the interface these five header files offer? Once we know that, perhaps the next task would be to create a <git-absolute-minimum-portability-requirement.h> header by taking a subset of <git-compat-util.h>, and have test-stdlib.c include that instead of <git-compat-util.h>. <git-compat-util.h> will of course include that header to replace the parts it lost to the new header. It does *not* make it a requirement for client programs to include the <git-absolute-minimum-portability-requirement.h>, though. They can include the system headers in the way they like, as long as they do not let them define symbols in ways contradicting to what our headers expect. <git-absolute-minimum-portability-requirement.h> is merely a way for us to tell those who write these client programs what the system symbols we rely on are. So, that's one (or two? first to analyse and enumerate, second to split a new header file out of compat-util) actionable task, I guess.
diff --git a/pager.h b/pager.h index b77433026d..015bca95e3 100644 --- a/pager.h +++ b/pager.h @@ -1,6 +1,8 @@ #ifndef PAGER_H #define PAGER_H +#include <stdint.h> + struct child_process; const char *git_pager(int stdout_is_tty);