diff mbox series

[v5,1/3] pager: include stdint.h because uintmax_t is used

Message ID 20240222175033.1489723-2-calvinwan@google.com (mailing list archive)
State New, archived
Headers show
Series [v5,1/3] pager: include stdint.h because uintmax_t is used | expand

Commit Message

Calvin Wan Feb. 22, 2024, 5:50 p.m. UTC
From: Jonathan Tan <jonathantanmy@google.com>

pager.h uses uintmax_t but does not include stdint.h. Therefore, add
this include statement.

This was discovered when writing a stub pager.c file.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 pager.h | 2 ++
 1 file changed, 2 insertions(+)

Comments

Junio C Hamano Feb. 22, 2024, 9:43 p.m. UTC | #1
Calvin Wan <calvinwan@google.com> writes:

> From: Jonathan Tan <jonathantanmy@google.com>
>
> pager.h uses uintmax_t but does not include stdint.h. Therefore, add
> this include statement.
>
> This was discovered when writing a stub pager.c file.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> ---
>  pager.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/pager.h b/pager.h
> index b77433026d..015bca95e3 100644
> --- a/pager.h
> +++ b/pager.h
> @@ -1,6 +1,8 @@
>  #ifndef PAGER_H
>  #define PAGER_H
>  
> +#include <stdint.h>
> +
>  struct child_process;
>  
>  const char *git_pager(int stdout_is_tty);

This is not going in a sensible direction from our portability
standard's point of view.

The reason why we do not include these system headers directly to
our source files, and instead make it a rule to include
<git-compat-util.h> as the first header instead, is exactly because
there are curiosities in various platforms that Git wants to run on
which system include headers give us the declarations for types and
functions we rely on, in what order they must be included, and after
what feature macros (the ones that give adjustment to what the
system headers do, like _POSIX_C_SOURCE) are defined, etc.

Given that in <git-compat-util.h>, inclusion of <stdint.h> is
conditional behind some #ifdef's, it does not look like a sensible
change.  It is not very likely for <inttypes.h> and <stdint.h> to
declare uintmax_t in an incompatible way, but on a platform where
<git-compat-util.h> decides to include <inttypes.h> and use its
definition of what uintmax_t is, we should follow the same choice
and be consistent.

If there is a feature macro that affects sizes of the integer on a
platform, this patch will break it even more badly.  Perhaps there
is a platform whose C-library header requires you to define a
feature macro to use 64-bit, and we may define that feature macro
in <git-compat-util.h> before including either <inttypes.h> or
<stdint.h>, but by including <stdint.h> directly like the above
patch does, only this file and the sources that include only this
file, refusing to include <git-compat-util.h> as everybody in the
Git source tree should, will end up using different notion of what
the integral type with maximum width is from everybody else.

What this patch _wants_ to do is of course sympathizable, and we
have "make hdr-check" rule to enforce "a header must include the
headers that declare what it uses", except that it lets the header
files being tested assume that the things made available by
including <git-compat-util.h> are always available.

I think a sensible direction to go for libification purposes is to
also make sure that sources that are compiled into gitstdlib.a, and
the headers that makes what is in gitstdlib.a available, include the
<git-compat-util.h> header file.  There may be things declared in
the <git-compat-util.h> header that are _too_ specific to what ought
to be linked into the final "git" binary and unwanted by library
clients that are not "git" binary, and the right way to deal with it
is to split <git-compat-util.h> into two parts, i.e. what makes
system services available like its conditional inclusion of
<stdint.h> vs <inttypes.h>, definition of feature macros, order in
which the current <git-compat-util.h> includes system headers, etc.,
excluding those that made you write this patch to avoid assuming
that the client code would have included <git-compat-util.h> before
<pager.h>, would be the new <git-compat-core.h>.  And everything
else will remain in <git-compat-util.h>, which will include the
<git-compat-core.h>.  The <pager.h> header for library clients would
include <git-compat-core.h> instead, to still allow them to use the
same types as "git" binary itself that way.
Kyle Lippincott Feb. 24, 2024, 1:33 a.m. UTC | #2
On Thu, Feb 22, 2024 at 9:51 AM Calvin Wan <calvinwan@google.com> wrote:
>
> From: Jonathan Tan <jonathantanmy@google.com>
>
> pager.h uses uintmax_t but does not include stdint.h. Therefore, add
> this include statement.
>
> This was discovered when writing a stub pager.c file.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> ---
>  pager.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/pager.h b/pager.h
> index b77433026d..015bca95e3 100644
> --- a/pager.h
> +++ b/pager.h
> @@ -1,6 +1,8 @@
>  #ifndef PAGER_H
>  #define PAGER_H
>
> +#include <stdint.h>
> +
>  struct child_process;
>
>  const char *git_pager(int stdout_is_tty);
> --
> 2.44.0.rc0.258.g7320e95886-goog
>
>

As far as I can tell, we need pager.h because of the `pager_in_use`
symbol. We need that symbol because of its use in date.c's
`parse_date_format`. I wonder if we can side step the `#include
<stdint.h>` concerns by splitting pager.h into pager.h and
pager_in_use.h, and have pager.h include pager_in_use.h instead. This
way pager.h (and its [unused] forward declarations) aren't part of
git-std-lib at all. I believe this was done for things like hex-ll.h,
so maybe we call it pager-ll.h. The goal being to (a) not need the
`#include <stdint.h>` because that's currently contentious, but also
(b) to identify the minimum set of symbols needed for the stubs
library, and not declare things that we don't have any intention of
actually providing / stubbing out.

I have some more thoughts on this, but they're much more appropriate
for the next patch in the series, so I'll leave them there.
Junio C Hamano Feb. 24, 2024, 7:58 a.m. UTC | #3
Kyle Lippincott <spectral@google.com> writes:

> As far as I can tell, we need pager.h because of the `pager_in_use`
> symbol. We need that symbol because of its use in date.c's
> `parse_date_format`. I wonder if we can side step the `#include
> <stdint.h>` concerns by splitting pager.h into pager.h and
> pager_in_use.h, and have pager.h include pager_in_use.h instead. This
> way pager.h (and its [unused] forward declarations) aren't part of
> git-std-lib at all.

Step back a bit.  Why do you even need to touch pager.h in the first
place?  Whatever thing that needs to define a mock version of
pager_in_use() would need to be able to find out that it is supposed
to take nothing as arguments and return an integer, and it can
include <pager.h> without modification.  Just like everybody else,
it has to include <git-compat-util.h> so that the system header that
gives us uintmax_t gets include appropriately in platform-dependent
way, no?  Why do we even need to butcher pager.h into two pieces in
the first place?

If you just include <git-compat-util.h> and then <pager.h> in
stubs/pager.c and you're OK, no?

If anything, as I already said, I think it is more reasonable to
tweak what <git-compat-util.h> does.  For example, it might be
unwieldy for gitstdlib's purpose that it unconditionally overrides
exit(), in which case it may be OK to introduce some conditional
compilation macros to omit that override when building stub code.
Or even split parts of the <git-compat-util.h> that both Git's use
and gitstdlib's purpose are OK with into a separate header file
<git-compat-core.h>, while leaving (hopefully a very minor) other
parts in <git-compat-util.h> *and* include <git-compat-core.h> in
<git-compat-util.h>.  That way, the sources of Git can continue
including <git-compat-util.h> while stub code can include
<git-compat-core.h>, and we will get system library symbols and
system defined types like uintmax_t in a consistent way, both in Git
itself and in gitstdlib.

But once such a sanitization is done on the compat-util header,
other "ordinary" header files that should not have to care about
portability (because they can assume that inclusion of
git-compat-util.h will give them access to system types and symbols
without having to worry about portability issues) and should not
have to include system header files themselves.

At least, that is the idea behind <git-compat-util.h> in the first
place.  Including any system headers directly in ordinary headers,
or splitting ordinary headers at an arbitrary and artificial
boundary, should not be necessary.  I'd have to say that such
changes are tail wagging the dog.

I do not have sufficient cycles to spend actually splitting
git-compat-util.h into two myself, but as an illustration, here is
how I would tweak cw/git-std-lib topic to make it build without
breaking our headers and including system header files directly.

 git-compat-util.h | 2 ++
 pager.h           | 2 --
 stubs/misc.c      | 4 ++--
 stubs/pager.c     | 1 +
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git c/git-compat-util.h w/git-compat-util.h
index 7c2a6538e5..981d526d18 100644
--- c/git-compat-util.h
+++ w/git-compat-util.h
@@ -1475,12 +1475,14 @@ static inline int is_missing_file_error(int errno_)
 
 int cmd_main(int, const char **);
 
+#ifndef _GIT_NO_OVERRIDE_EXIT
 /*
  * Intercept all calls to exit() and route them to trace2 to
  * optionally emit a message before calling the real exit().
  */
 int common_exit(const char *file, int line, int code);
 #define exit(code) exit(common_exit(__FILE__, __LINE__, (code)))
+#endif
 
 /*
  * You can mark a stack variable with UNLEAK(var) to avoid it being
diff --git c/pager.h w/pager.h
index 015bca95e3..b77433026d 100644
--- c/pager.h
+++ w/pager.h
@@ -1,8 +1,6 @@
 #ifndef PAGER_H
 #define PAGER_H
 
-#include <stdint.h>
-
 struct child_process;
 
 const char *git_pager(int stdout_is_tty);
diff --git c/stubs/misc.c w/stubs/misc.c
index 8d80581e39..d0379dcb69 100644
--- c/stubs/misc.c
+++ w/stubs/misc.c
@@ -1,5 +1,5 @@
-#include <assert.h>
-#include <stdlib.h>
+#define _GIT_NO_OVERRIDE_EXIT
+#include <git-compat-util.h>
 
 #ifndef NO_GETTEXT
 /*
diff --git c/stubs/pager.c w/stubs/pager.c
index 4f575cada7..04517aad4c 100644
--- c/stubs/pager.c
+++ w/stubs/pager.c
@@ -1,3 +1,4 @@
+#include <git-compat-util.h>
 #include "pager.h"
 
 int pager_in_use(void)
Kyle Lippincott Feb. 26, 2024, 6:59 p.m. UTC | #4
On Thu, Feb 22, 2024 at 1:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Calvin Wan <calvinwan@google.com> writes:
>
> > From: Jonathan Tan <jonathantanmy@google.com>
> >
> > pager.h uses uintmax_t but does not include stdint.h. Therefore, add
> > this include statement.
> >
> > This was discovered when writing a stub pager.c file.
> >
> > Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> > Signed-off-by: Calvin Wan <calvinwan@google.com>
> > ---
> >  pager.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/pager.h b/pager.h
> > index b77433026d..015bca95e3 100644
> > --- a/pager.h
> > +++ b/pager.h
> > @@ -1,6 +1,8 @@
> >  #ifndef PAGER_H
> >  #define PAGER_H
> >
> > +#include <stdint.h>
> > +
> >  struct child_process;
> >
> >  const char *git_pager(int stdout_is_tty);
>
> This is not going in a sensible direction from our portability
> standard's point of view.
>
> The reason why we do not include these system headers directly to
> our source files, and instead make it a rule to include
> <git-compat-util.h> as the first header instead, is exactly because
> there are curiosities in various platforms that Git wants to run on
> which system include headers give us the declarations for types and
> functions we rely on, in what order they must be included, and after
> what feature macros (the ones that give adjustment to what the
> system headers do, like _POSIX_C_SOURCE) are defined, etc.
>
> Given that in <git-compat-util.h>, inclusion of <stdint.h> is
> conditional behind some #ifdef's, it does not look like a sensible
> change.  It is not very likely for <inttypes.h> and <stdint.h> to
> declare uintmax_t in an incompatible way, but on a platform where
> <git-compat-util.h> decides to include <inttypes.h> and use its
> definition of what uintmax_t is, we should follow the same choice
> and be consistent.

Speaking of this specific header file inclusion and the oddities that
have gotten us to where we are:
- Originally, it seems that we were including stdint.h
- 17 years ago, to work around Solaris not providing stdint.h, but
providing inttypes.h, it was switched to being just inttypes.h, with
the explanation being that inttypes is a superset of stdint.
https://github.com/git/git/commit/007e2ba65902b484fc65a313e54594a009841740
- 13 years ago, to work around some platforms not having inttypes.h,
it was made conditional.
(https://github.com/git/git/commit/2844923d62a4c408bd59ddb2caacca4aa7eb86bc)

The condition added 13 years ago was, IMHO, backwards from what it
should have been. The intent is to have stdint.h included. We should
include stdint.h. I suspect that 17 years is enough time for that
platform to start conforming to what is now a 25 year old standard,
and I don't know how we can verify that and have this stop being a
haunted graveyard without just trying it and seeing if the build bots
or maintainers identify it as a continuing issue. If it's still an
issue (and only if), we should reintroduce a conditional, but invert
it: if there's no stdint.h, THEN include inttypes.h.

Oh, no, that doesn't work. I tried that, and the build bots told me
that doesn't work, because we're using things from inttypes.h (PRIuMAX
showed up several times in the errors, there may be others). This
makes me wonder how the platforms with no inttypes.h work at all. I
still think we should do something here because it's a 13-year-old
compatibility fix that "shouldn't" be needed anymore, and causes
confusion/concerns like this thread. Maybe just see if we can get away
with always including inttypes.h in git-compat-util.h, or maybe _both_
inttypes.h and stdint.h (in either order), just to be really obvious
that it's acceptable to include stdint.h.

>
> If there is a feature macro that affects sizes of the integer on a
> platform, this patch will break it even more badly.  Perhaps there
> is a platform whose C-library header requires you to define a
> feature macro to use 64-bit, and we may define that feature macro
> in <git-compat-util.h> before including either <inttypes.h> or
> <stdint.h>, but by including <stdint.h> directly like the above
> patch does, only this file and the sources that include only this
> file, refusing to include <git-compat-util.h> as everybody in the
> Git source tree should, will end up using different notion of what
> the integral type with maximum width is from everybody else.

I agree that for pager.h, something like the patch in your next email
would resolve that particular problem. The stub library is of
basically the same stature as git-std-lib: it's code that is provided
by the Git project, compiled by Makefile rules owned and maintained by
the Git project, and should conform to the Git coding standards. The
.c files in the stubs library should include git-compat-util.h,
there's basically no reason not to.

However, I believe that we'll need a good policy for what to do with
libified headers + sources in general. I can see many potential
categorizations of source; there's no need to formally define all of
them and assign files to each category, but the main categories are
basically:
1. files that have code that is an internal part of Git, one of the
helper binaries, or one of its tests, whether it's a library or not.
These should include git-compat-util.h at the top of the .c files like
they do today. The .h files for these translation units are also
considered "internal". These header files should assume that
git-compat-util.h has been included properly, and don't need to be
self-contained, because they're _only_ included by things in this
category.
2. files that have code that define the "library interface", probably
only the ones defining the library interface _as used by external
projects_. I think that we'll likely need to be principled about
defining these, and having them be as minimal and compatible as
possible.
3. code in external projects that directly uses libraries from the Git
project, and thus includes Git headers from category 2
4. the rest of the code in external projects (the code that does not
directly use libraries from the Git project)

A hypothetical git-compat-core.h being included at the top of the .c
files in category 2 is feasible, but needs to be carefully handled due
to potential symbol collision (which we're discussing in another
thread and I may have a possible solution for, at least on some
platforms). On the other hand, a git-compat-core.h being included at
the top of the .h files belonging to category 2 doesn't work, because
when these .h files are included by code in category 3, it's too late.

In this example, gnu-source-header.h below is a system header that
changes behavior depending on _GNU_SOURCE (effectively the same
concern as you were raising in the quoted paragraph):

external-project-category3.c:
#include <stdint.h>
#include <gnu-source-header.h>
#include <git/some-lib-interface.h>

git/some-lib-interface.h:
#include <git/git-compat-core.h>

We can't do anything in git/git-compat-core.h that relies on careful
inclusion order, requiring that various things are #defined prior to
the first inclusion of certain headers, etc. stdint.h and
gnu-source-header.h are already included, and so it's too late for us
to #define things that change their behavior, because that won't have
any effect. I don't think it's reasonable to expect the external
project to #include <git/git-compat-core.h> at the top of their files
that are in category3. It's definitely not reasonable to require the
external project to do that for all their files (category 3 and
category 4). It's slightly more reasonable to have them do some set of
#defines for their binaries and libraries, but still quite awkward and
potentially not feasible (for things like _FILE_OFFSET_BITS). This is
why I split it into 4 categories.

I believe that category 2 files need to be maximally compatible, both
to platforms [where we provide support for libraries, and I think this
probably will end up being a subset of all the platforms, especially
at first] and to the environment they're being #included in and
interacting with. So they need to be self-contained: they can't rely
on stdint.h having been included, but they also can't rely on it _not_
having been included already. The category 2 .h files need to be
minimal: just what we want in this external library interface, and
ideally nothing else. The category 2 .h and .c files need to be
compatible with common #defines being set OR not set. The point of a
category 2 .c file is to bridge the gap between the category 1 and
category 3 environments. This likely means that we need to be careful
about certain structs and typedefs defined by the system (vs. structs
defined by the category 2 headers themselves) being passed between the
different environments. For example, if we were to have a library
interface that included a `struct stat`, and the category 3 files
didn't have _FILE_OFFSET_BITS 64, but the category 1 files do? Instant
breakage.

This aspect of this discussion probably should happen on the next
patch, or in a separate thread :) But since I'm here, I'll summarize
these thoughts: basically, the next patch, imho, doesn't go far
enough, but is a very good first step that we can build on. We need to
define what belongs to the "external interface" of the various
libraries (category 2 above) and what is considered category 1.
pager.h is pretty obviously category 1. strbuf.h, abspath.h, etc? I'm
not sure. git-std-lib is weird, because it's so low level and there's
not really much "internal" code to this library. So maybe we declare
those as category 2. But I don't know how that will actually work in
practice. I'll try to find time to write up these thoughts on that
patch.

>
> What this patch _wants_ to do is of course sympathizable, and we
> have "make hdr-check" rule to enforce "a header must include the
> headers that declare what it uses", except that it lets the header
> files being tested assume that the things made available by
> including <git-compat-util.h> are always available.
>
> I think a sensible direction to go for libification purposes is to
> also make sure that sources that are compiled into gitstdlib.a, and
> the headers that makes what is in gitstdlib.a available, include the
> <git-compat-util.h> header file.  There may be things declared in
> the <git-compat-util.h> header that are _too_ specific to what ought
> to be linked into the final "git" binary and unwanted by library
> clients that are not "git" binary, and the right way to deal with it
> is to split <git-compat-util.h> into two parts, i.e. what makes
> system services available like its conditional inclusion of
> <stdint.h> vs <inttypes.h>, definition of feature macros, order in
> which the current <git-compat-util.h> includes system headers, etc.,






> excluding those that made you write this patch to avoid assuming
> that the client code would have included <git-compat-util.h> before
> <pager.h>, would be the new <git-compat-core.h>.  And everything
> else will remain in <git-compat-util.h>, which will include the
> <git-compat-core.h>.  The <pager.h> header for library clients would
> include <git-compat-core.h> instead, to still allow them to use the
> same types as "git" binary itself that way.
>
>
>
>
>
>
Junio C Hamano Feb. 27, 2024, 12:20 a.m. UTC | #5
Kyle Lippincott <spectral@google.com> writes:

> The condition added 13 years ago was, IMHO, backwards from what it
> should have been. The intent is to have stdint.h included. We should
> include stdint.h. I suspect that 17 years is enough time for that
> platform to start conforming to what is now a 25 year old standard,
> and I don't know how we can verify that and have this stop being a
> haunted graveyard without just trying it and seeing if the build bots
> or maintainers identify it as a continuing issue.

The nightmare of Solaris might be luckily behind us, but the world
does not only run Linux and GNU libc, and it is not just <stdint.h>
vs <inttypes.h>.  This is about general code hygiene.

> If it's still an
> issue (and only if), we should reintroduce a conditional, but invert
> it: if there's no stdint.h, THEN include inttypes.h.

But it would give the wrong order in general in the modern world,
where <inttypes.h> is supposed to include <stdint.h> and extends it.

We use inttypes.h by default because the C standard already talks
about it, and fall back to stdint.h when the platform lacks it.  But
what I suspect is that nobody compiles with NO_INTTYPES_H and we
would unknowingly (but not "unintentionally") start using the
extended types that are only available in <inttypes.h> but not in
<stdint.h> sometime in the future.  It might already have happened,
but I do not know.  I haven't compiled with NO_INTTYPES_H for some
time (to experiment), and I haven't met a platform that actually
requires NO_INTTYPES_H defined to build.  Once after such a change
is made without anybody being knowingly breaking some rare platform,
if nobody complains, we can just drop the support to allow us to
limit ourselves to <stdint.h>, but since we hear nobody complaining,
we should be OK with the current rule of including system header
files that is embodied in <git-compat-util.h> header file.

In any case, your sources should not include a standard library
header directly yourself, period.  Instead let <git-compat-util.h>
take care of the details of how we need to obtain what we need out
of the system on various platforms.

Thanks.
Kyle Lippincott Feb. 27, 2024, 12:56 a.m. UTC | #6
On Mon, Feb 26, 2024 at 4:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Kyle Lippincott <spectral@google.com> writes:
>
> > The condition added 13 years ago was, IMHO, backwards from what it
> > should have been. The intent is to have stdint.h included. We should
> > include stdint.h. I suspect that 17 years is enough time for that
> > platform to start conforming to what is now a 25 year old standard,
> > and I don't know how we can verify that and have this stop being a
> > haunted graveyard without just trying it and seeing if the build bots
> > or maintainers identify it as a continuing issue.
>
> The nightmare of Solaris might be luckily behind us, but the world
> does not only run Linux and GNU libc, and it is not just <stdint.h>
> vs <inttypes.h>.  This is about general code hygiene.
>
> > If it's still an
> > issue (and only if), we should reintroduce a conditional, but invert
> > it: if there's no stdint.h, THEN include inttypes.h.
>
> But it would give the wrong order in general in the modern world,
> where <inttypes.h> is supposed to include <stdint.h> and extends it.
>
> We use inttypes.h by default because the C standard already talks
> about it, and fall back to stdint.h when the platform lacks it.  But
> what I suspect is that nobody compiles with NO_INTTYPES_H and we
> would unknowingly (but not "unintentionally") start using the
> extended types that are only available in <inttypes.h> but not in
> <stdint.h> sometime in the future.  It might already have happened,

It has. We use PRIuMAX, which is from inttypes.h. I think it's only
"accidentally" working if anyone uses NO_INTTYPES_H. I changed my
stance halfway through this investigation in my previous email, I
apologize for not going back and editing it to make it clear at the
beginning that I'd done so. My current stance is that
<git-compat-util.h> should be either (a) including only inttypes.h
(since it includes stdint.h), or (b) including both inttypes.h and
stdint.h (in either order), just to demonstrate that we can.

> but I do not know.  I haven't compiled with NO_INTTYPES_H for some
> time (to experiment), and I haven't met a platform that actually
> requires NO_INTTYPES_H defined to build.  Once after such a change
> is made without anybody being knowingly breaking some rare platform,
> if nobody complains, we can just drop the support to allow us to
> limit ourselves to <stdint.h>, but since we hear nobody complaining,
> we should be OK with the current rule of including system header
> files that is embodied in <git-compat-util.h> header file.
>
> In any case, your sources should not include a standard library
> header directly yourself, period.  Instead let <git-compat-util.h>
> take care of the details of how we need to obtain what we need out
> of the system on various platforms.

I disagree with this statement. We _can't_ use a magic compatibility
header file in the library interfaces, for the reasons I outlined
further below in my previous message. For those headers, the ones that
might be included by code that's not under the Git project's control,
they need to be self-contained, minimal, and maximally compatible.

>
> Thanks.
Junio C Hamano Feb. 27, 2024, 2:45 a.m. UTC | #7
Kyle Lippincott <spectral@google.com> writes:

>> In any case, your sources should not include a standard library
>> header directly yourself, period.  Instead let <git-compat-util.h>
>> take care of the details of how we need to obtain what we need out
>> of the system on various platforms.
>
> I disagree with this statement. We _can't_ use a magic compatibility
> header file in the library interfaces, for the reasons I outlined
> further below in my previous message. For those headers, the ones that
> might be included by code that's not under the Git project's control,
> they need to be self-contained, minimal, and maximally compatible.

Note that I am not talking about your random outside program that
happens to link with gitstdlib.a; it would want to include a header
file <gitstdlib.h> that comes with the library.

Earlier I suggested that you may want to take a subset of
<git-compat-util.h>, because <git-compat-util.h> may have a lot more
than what is minimally necessary to allow our sources to be
insulated from details of platform dependence.  You can think of
that subset as a good starting point to build the <gitstdlib.h>
header file to be given to the library customers.

But the sources that go to the library, as gitstdlib.a is supposed
to serve as a subset of gitlib.a to our internal codebase when
building the git binary, should still follow our header inclusion
rules.

Because we would want to make sure that the sources that are made
into gitstdlib.a, the sources to the rest of libgit.a, and the
sources to the rest of git, all agree on what system features we ask
from the system, feature macros that must be defined to certain
values before we include system library files (like _XOPEN_SOURCE
and _FILE_OFFSET_BITS) must be defined consistently across all of
these three pieces.  One way to do so may be to ensure that the
definition of them would be migrated to <gitstdlib.h> when we
separate a subset out of <git-compat-util.h> to it (and of course,
we make <git-compat-util.h> to include <gitstdlib.h> so that it
would be still sufficient for our in-tree users to include the
<git-compat-util.h>)

<gitstdlib.h> may have to expose an API function that uses some
extended types only available by including system header files,
e.g. some function may return ssize_t as its value or take an off_t
value as its argument.

If our header should include system headers to make these types
available to our definitions is probably open to discussion.  It is
harder to do so portably, unless your world is limited to POSIX.1
and ISO C, than making it the responsibility of library users.  

But if the platform headers and libraries support feature macros
that allows you to tweak these sizes (e.g. the size of off_t may be
controlled by setting the _FILE_OFFSET_BITS to an appropriate
value), it may be irresponsible to leave that to the library users,
as they MUST make sure to define such feature macros exactly the
same way as we define for our code, which currently is done in
<git-compat-util.h>, before they include their system headers to
obtain off_t so that they can use <gitstdlib.h>.

So the rules for library clients (random outside programs that
happen to link with gitstdlib.a) may not be that they must include
<git-compat-util.h> as the first thing, but they probably still have
to include <gitstdlib.h> fairly early before including any of their
system headers, I would suspect, unless they are willing to accept
such responsibility fully to ensure they compile the same way as the
gitstdlib library, I would think.
Jeff King Feb. 27, 2024, 8:45 a.m. UTC | #8
On Mon, Feb 26, 2024 at 04:56:28PM -0800, Kyle Lippincott wrote:

> > We use inttypes.h by default because the C standard already talks
> > about it, and fall back to stdint.h when the platform lacks it.  But
> > what I suspect is that nobody compiles with NO_INTTYPES_H and we
> > would unknowingly (but not "unintentionally") start using the
> > extended types that are only available in <inttypes.h> but not in
> > <stdint.h> sometime in the future.  It might already have happened,
> 
> It has. We use PRIuMAX, which is from inttypes.h.

Is it always, though? That's what C99 says, but if you have a system
that does not have inttypes.h in the first place, but does have
stdint.h, it seems possible that it provides conversion macros elsewhere
(either via stdint.h, or possibly just as part of stdio.h).

So it might be that things have been horribly broken on NO_INTTYPES_H
systems for a while, and nobody is checking. But I don't think you can
really say so without looking at such a system.

And looking at config.mak.uname, it looks like Windows is such a system.
Does it really have inttypes.h and it is getting included from somewhere
else, making format conversion macros work? Or does it provide those
macros elsewhere, and really needs stdint? It does look like
compat/mingw.h includes it, but I think we wouldn't use that for msvc
builds.

> I think it's only
> "accidentally" working if anyone uses NO_INTTYPES_H. I changed my
> stance halfway through this investigation in my previous email, I
> apologize for not going back and editing it to make it clear at the
> beginning that I'd done so. My current stance is that
> <git-compat-util.h> should be either (a) including only inttypes.h
> (since it includes stdint.h), or (b) including both inttypes.h and
> stdint.h (in either order), just to demonstrate that we can.

It is good to clean up old conditionals if they are no longer
applicable, as they are a burden to reason about later (as this
discussion shows). But I am not sure about your "just to demonstrate we
can". It is good to try it out, but it looks like there is a non-zero
chance that MSVC on Windows might break. It is probably better to try
building there or looping in folks who can, rather than just making a
change and seeing if anybody screams.

I think the "win+VS" test in the GitHub Actions CI job might cover this
case. It is not run by default (because it was considered be mostly
redundant with the mingw build), but it shouldn't be too hard to enable
it for a one-off test.

-Peff
Jeff King Feb. 27, 2024, 9:05 a.m. UTC | #9
On Tue, Feb 27, 2024 at 03:45:29AM -0500, Jeff King wrote:

> It is good to clean up old conditionals if they are no longer
> applicable, as they are a burden to reason about later (as this
> discussion shows). But I am not sure about your "just to demonstrate we
> can". It is good to try it out, but it looks like there is a non-zero
> chance that MSVC on Windows might break. It is probably better to try
> building there or looping in folks who can, rather than just making a
> change and seeing if anybody screams.
> 
> I think the "win+VS" test in the GitHub Actions CI job might cover this
> case. It is not run by default (because it was considered be mostly
> redundant with the mingw build), but it shouldn't be too hard to enable
> it for a one-off test.

Here's a successful run with the NO_INTTYPES_H line removed:

  https://github.com/peff/git/actions/runs/8062063219

Blaming the code around the inttypes.h reference in compat/mingw.h
turned up 0ef60afdd4 (MSVC: use shipped headers instead of fallback
definitions, 2016-03-30), which claims that inttypes.h was added to
VS2013.

That sounds old-ish in 2024, but I don't know how old is normal in the
Windows world.

All of which to me says that cleaning this up is something that should
involve Windows folks, who can make the judgement for their platform.

-Peff
Kyle Lippincott Feb. 27, 2024, 8:10 p.m. UTC | #10
On Tue, Feb 27, 2024 at 12:45 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Feb 26, 2024 at 04:56:28PM -0800, Kyle Lippincott wrote:
>
> > > We use inttypes.h by default because the C standard already talks
> > > about it, and fall back to stdint.h when the platform lacks it.  But
> > > what I suspect is that nobody compiles with NO_INTTYPES_H and we
> > > would unknowingly (but not "unintentionally") start using the
> > > extended types that are only available in <inttypes.h> but not in
> > > <stdint.h> sometime in the future.  It might already have happened,
> >
> > It has. We use PRIuMAX, which is from inttypes.h.
>
> Is it always, though? That's what C99 says, but if you have a system
> that does not have inttypes.h in the first place, but does have
> stdint.h, it seems possible that it provides conversion macros elsewhere
> (either via stdint.h, or possibly just as part of stdio.h).

It's of course possible that on some platforms, stdio.h or stdint.h
defines these types (or includes inttypes.h internally, which defines
these types). However, I think that to be "correct" and for a compiler
to claim it supports C99 (and the compiler _must_ claim that because
of the guard in <git-compat-util.h>), inttypes.h must exist, and it
must cause these symbols to appear. If there are platforms that are
claiming to be C99 and inttypes.h doesn't exist or doesn't provide the
symbols it should, I don't think we should try to support them - they
can maintain platform-specific patches for whatever not-actually-C99
language the platform supports. Basically what git for windows is
already doing (presumably for other reasons), as far as I can tell.

>
> So it might be that things have been horribly broken on NO_INTTYPES_H
> systems for a while, and nobody is checking. But I don't think you can
> really say so without looking at such a system.
>
> And looking at config.mak.uname, it looks like Windows is such a system.
> Does it really have inttypes.h and it is getting included from somewhere
> else, making format conversion macros work? Or does it provide those
> macros elsewhere, and really needs stdint? It does look like
> compat/mingw.h includes it, but I think we wouldn't use that for msvc
> builds.
>
> > I think it's only
> > "accidentally" working if anyone uses NO_INTTYPES_H. I changed my
> > stance halfway through this investigation in my previous email, I
> > apologize for not going back and editing it to make it clear at the
> > beginning that I'd done so. My current stance is that
> > <git-compat-util.h> should be either (a) including only inttypes.h
> > (since it includes stdint.h), or (b) including both inttypes.h and
> > stdint.h (in either order), just to demonstrate that we can.
>
> It is good to clean up old conditionals if they are no longer
> applicable, as they are a burden to reason about later (as this
> discussion shows). But I am not sure about your "just to demonstrate we
> can".

Yeah, I'm also not convinced the "just to demonstrate we can" has much
value. I was trying to get ahead of future discussions where we claim
it's important to never include stdint.h (because people remember this
discussion and how contentious it was) and think it might misbehave,
and instead just include it in <git-compat-util.h> to prove it
_doesn't_ misbehave, and thus start to allow usage in self-contained
headers when necessary.

> It is good to try it out, but it looks like there is a non-zero
> chance that MSVC on Windows might break. It is probably better to try
> building there or looping in folks who can, rather than just making a
> change and seeing if anybody screams.

I think I miscommunicated here, or had too many assumptions about the
current state of things that I didn't actually verify. When I wrote
"and seeing if the build bots or maintainers identify it as a
continuing issue", I was assuming that we had build bots for all major
platforms (including windows, with however it gets built: mingw or VC
or whatever), and I included the "maintainers" part of it for the long
tail of esoteric platforms that we either don't know about, or can't
have automated builds on for whatever reason. I agree that making
changes that have a high likelihood of breaking supported platforms
(which gets back to that platform support thread that was started a
few weeks ago) should not be done lightly, and it's not reasonable to
make the change and wait for maintainers of these "supported
platforms" to complain. I was relying on the build bots covering the
"supported platforms" and stopping me from even sending such a patch
to the mailing list :)

>
> I think the "win+VS" test in the GitHub Actions CI job might cover this
> case. It is not run by default (because it was considered be mostly
> redundant with the mingw build), but it shouldn't be too hard to enable
> it for a one-off test.
>
> -Peff
Kyle Lippincott Feb. 27, 2024, 10:29 p.m. UTC | #11
On Mon, Feb 26, 2024 at 6:45 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Kyle Lippincott <spectral@google.com> writes:
>
> >> In any case, your sources should not include a standard library
> >> header directly yourself, period.  Instead let <git-compat-util.h>
> >> take care of the details of how we need to obtain what we need out
> >> of the system on various platforms.
> >
> > I disagree with this statement. We _can't_ use a magic compatibility
> > header file in the library interfaces, for the reasons I outlined
> > further below in my previous message. For those headers, the ones that
> > might be included by code that's not under the Git project's control,
> > they need to be self-contained, minimal, and maximally compatible.
>
> Note that I am not talking about your random outside program that
> happens to link with gitstdlib.a; it would want to include a header
> file <gitstdlib.h> that comes with the library.

I agree with this.

>
> Earlier I suggested that you may want to take a subset of
> <git-compat-util.h>, because <git-compat-util.h> may have a lot more
> than what is minimally necessary to allow our sources to be
> insulated from details of platform dependence.  You can think of
> that subset as a good starting point to build the <gitstdlib.h>
> header file to be given to the library customers.
>
> But the sources that go to the library, as gitstdlib.a is supposed
> to serve as a subset of gitlib.a to our internal codebase when
> building the git binary, should still follow our header inclusion
> rules.

If I'm understanding this correctly, I agree with it. The .c files
still include <git-compat-util.h>, and don't change. The internal-only
.h files (ones that a pre-built-library consumer doesn't need to even
have in the filesystem) still assume that <git-compat-util.h> was
included, and don't change. <pager.h> falls into this category.

>
> Because we would want to make sure that the sources that are made
> into gitstdlib.a, the sources to the rest of libgit.a, and the
> sources to the rest of git, all agree on what system features we ask
> from the system, feature macros that must be defined to certain
> values before we include system library files (like _XOPEN_SOURCE
> and _FILE_OFFSET_BITS) must be defined consistently across all of
> these three pieces.  One way to do so may be to ensure that the
> definition of them would be migrated to <gitstdlib.h> when we
> separate a subset out of <git-compat-util.h> to it (and of course,
> we make <git-compat-util.h> to include <gitstdlib.h> so that it
> would be still sufficient for our in-tree users to include the
> <git-compat-util.h>)
>
> <gitstdlib.h> may have to expose an API function that uses some
> extended types only available by including system header files,
> e.g. some function may return ssize_t as its value or take an off_t
> value as its argument.

I agree that these types will be necessary (specifically ssize_t and
int##_t, but less so off_t) in the "external" (used by projects other
than Git) library interfaces.

>
> If our header should include system headers to make these types
> available to our definitions is probably open to discussion.  It is
> harder to do so portably, unless your world is limited to POSIX.1
> and ISO C, than making it the responsibility of library users.

I think I'm probably missing the nuance here, and may be making this
discussion much harder because of it. My understanding is that Git is
using C99; is that different from ISO C? There's something at the top
of <git-compat-util.h> that enforces that we're using C99. Therefore,
I'm assuming that any compiler that claims to be C99 and passes that
check at the top of <git-compat-util.h> will support inttypes.h,
stdint.h, stdbool.h, and other files defined by the C99 standard to
include types that we need in our .h files are able to be included
without reservation. To flip it around: any compiler/platform that's
missing inttypes.h, or is missing stdint.h, or raises errors if both
are included, or requires other headers to be included before them
_isn't a C99 compiler_, and _isn't supported_. I'm picking on these
files because I think they will be necessary for the external library
interfaces. I'm intentionally ignoring any file not mentioned in the
C99 standard, because those are platform specific. I acknowledge that
there may be some functionality in these files that's only enabled if
certain #defines are set. Our external interfaces should strive to not
use that functionality, and only do so if we are able to test for this
functionality and refuse to compile if it's not available. I have an
example with uintmax_t below.

>
> But if the platform headers and libraries support feature macros
> that allows you to tweak these sizes (e.g. the size of off_t may be
> controlled by setting the _FILE_OFFSET_BITS to an appropriate
> value), it may be irresponsible to leave that to the library users,
> as they MUST make sure to define such feature macros exactly the
> same way as we define for our code, which currently is done in
> <git-compat-util.h>, before they include their system headers to
> obtain off_t so that they can use <gitstdlib.h>.

I think the only viable solution to this is to not use these types
that depend on #defines in the interface available to non-git
projects. We can't set _FILE_OFFSET_BITS in the library's external
(used by non-Git projects) interface header, as there's a high
likelihood that it's either too late (external project #included
something that relies on _FILE_OFFSET_BITS already), or, if not, we
create the "off_t is a different size" problem for their code.

This means that we can't use off_t in these external interface headers
(and in the .c files that support them, if any). We can't use `struct
stat`. We likely need to limit ourselves to just the typedefs from
stdint.h, and probably will need some additional checks that enforce
that we have the types and sizes we expect (ex: I could imagine that
some platforms define uintmax_t as 32-bit. or 128-bit. Either we can't
use it in these external interfaces, or we have to enforce somehow
that the simplest file we can imagine (#include <stdint.h>) gets a
definition of uintmax_t that is the exact same as the one we'd get if
we included <git-compat-util.h>). The external interface headers don't
need to be as platform-compatible as the rest of the git code base,
because not every platform is going to be a supported target for using
the library in non-git projects, especially at first. The external
interface headers _do_ need to be as tolerant and well behaved as
possible when being included by external projects, which I'm asserting
means they need to be self-contained and minimal. If that means these
external interfaces don't get to use off_t at all, so be it. If it
means they can only be included if sizeof(off_t) == 64, and we have a
way of enforcing that at compile time, that's fine with me too. But we
can't #define _FILE_OFFSET_BITS ourselves in this external interface
to get that behavior, because it just doesn't work.

I'm making some assumptions here. I'm assuming that the git binary
uses a different interface to a hypothetical libgitobjstore.a than an
external project would (i.e. that there'd be some
git-obj-store-interface.h that gets included by non-Git projects, but
not by git itself). Is git-std-lib an obvious counterexample to this
assumption? Yes and no. No one (besides Git itself) is going to
include libgitstdlib.a in their project any time soon, so there's no
real "external interface" to define right now. Eventually, having
git-std-lib types in the hypothetical git-obj-store-interface.h _may_
happen, or it may not. I don't know.

...

But I think we're in agreement that pager.h isn't part of
git-std-lib's (currently undefined/non-existent) external interface,
and so doesn't need to be self-contained, and this patch should
probably be dropped?
>
> So the rules for library clients (random outside programs that
> happen to link with gitstdlib.a) may not be that they must include
> <git-compat-util.h> as the first thing, but they probably still have
> to include <gitstdlib.h> fairly early before including any of their
> system headers, I would suspect, unless they are willing to accept
> such responsibility fully to ensure they compile the same way as the
> gitstdlib library, I would think.
>
>
>
Junio C Hamano Feb. 27, 2024, 11:25 p.m. UTC | #12
Kyle Lippincott <spectral@google.com> writes:

> of <git-compat-util.h> that enforces that we're using C99. Therefore,
> I'm assuming that any compiler that claims to be C99 and passes that
> check at the top of <git-compat-util.h> will support inttypes.h,
> stdint.h, stdbool.h, and other files defined by the C99 standard to
> include types that we need in our .h files are able to be included
> without reservation.

We at the git project is much more conservative than trusting the
compilers and take their "claim" to support a standard at the face
value, though ;-).  As our CodingGuidelines say, we honor real world
constraints more than what's written on paper as "standard", and
would ...

> To flip it around: any compiler/platform that's
> missing inttypes.h, or is missing stdint.h, or raises errors if both
> are included, or requires other headers to be included before them
> _isn't a C99 compiler_, and _isn't supported_.

... refrain from taking such a position as much as possible.

> I think the only viable solution to this is to not use these types
> that depend on #defines in the interface available to non-git
> projects.

OK.  Now we have that behind us, can we start outlining what kind of
things _are_ exported out in the library to the outside world?  The
only example of the C source file that is a client of the library we
have is t/helper/test-stdlib.c but it includes 

    <abspath.h>
    <hex-ll.h>
    <parse.h>
    <strbuf.h>
    <string-list.h>

after including <git-compat-util.h>.  Are the services offered by
these five header files the initial set of the library, minimally
viable demonstration?  Has somebody already analyzed and enumerated
what kind of system definitions we need to support the interface
these five header files offer?

Once we know that, perhaps the next task would be to create a
<git-absolute-minimum-portability-requirement.h> header by taking a
subset of <git-compat-util.h>, and have test-stdlib.c include that
instead of <git-compat-util.h>.  <git-compat-util.h> will of course
include that header to replace the parts it lost to the new header.

It does *not* make it a requirement for client programs to include
the <git-absolute-minimum-portability-requirement.h>, though.  They
can include the system headers in the way they like, as long as they
do not let them define symbols in ways contradicting to what our
headers expect.  <git-absolute-minimum-portability-requirement.h> is
merely a way for us to tell those who write these client programs
what the system symbols we rely on are.

So, that's one (or two? first to analyse and enumerate, second to
split a new header file out of compat-util) actionable task, I
guess.
diff mbox series

Patch

diff --git a/pager.h b/pager.h
index b77433026d..015bca95e3 100644
--- a/pager.h
+++ b/pager.h
@@ -1,6 +1,8 @@ 
 #ifndef PAGER_H
 #define PAGER_H
 
+#include <stdint.h>
+
 struct child_process;
 
 const char *git_pager(int stdout_is_tty);