diff mbox series

[v3,07/15] sign-compare: 32-bit support

Message ID 20241205-pks-sign-compare-v3-7-e211ee089717@pks.im (mailing list archive)
State Superseded
Headers show
Series Start compiling with `-Wsign-compare` | expand

Commit Message

Patrick Steinhardt Dec. 5, 2024, 9:36 a.m. UTC
From: Junio C Hamano <gitster@pobox.com>

On 32-bit platforms, ssize_t may be "int" while size_t may be
"unsigned int".  At times we compare the number of bytes we read
stored in a ssize_t variable with "unsigned int", but that is done
after we check that we did not get an error return (which is
negative---and that is the whole reason why we used ssize_t and not
size_t), so these comparisons are safe.

But compilers may not realize that.  Cast these to size_t to work
around the false positives.  On platforms with size_t/ssize_t wider
than a normal int, this won't be an issue.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 csum-file.c | 3 +--
 pkt-line.c  | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

Comments

Jeff King Dec. 5, 2024, 7:34 p.m. UTC | #1
On Thu, Dec 05, 2024 at 10:36:29AM +0100, Patrick Steinhardt wrote:

> @@ -24,7 +23,7 @@ static void verify_buffer_or_die(struct hashfile *f,
>  
>  	if (ret < 0)
>  		die_errno("%s: sha1 file read error", f->name);
> -	if (ret != count)
> +	if ((size_t)ret != (size_t)count)
>  		die("%s: sha1 file truncated", f->name);

You really only need the cast on the left-hand side here. "count" is
already an unsigned value (and will get promoted as necessary on a
system where "unsigned int" is smaller than "size_t").

It's probably not hurting too much, but my philosophy is that we should
do as few casts as strictly necessary. Casts are a blunt instrument for
telling the compiler we know what we are doing, and can cover up issues
(in this case a false positive, but imagine "count" was switched to
"int").

IMHO "count" should probably be a size_t here anyway, since we are
dealing with a buffer size. If you look at the call stack, it is based
on hashfile.buffer, which we'd expect to be small. But it is initialized
from a size_t, so really it is one errant hashfd_internal() from being a
truncation bug. That's not a mistake I expect to be likely, but I think
we are better off in general making code as obviously/trivially correct
as possible.

I think truncation is getting out of scope for your series, though, so
probably not worth doing right at this moment.

> diff --git a/pkt-line.c b/pkt-line.c
> index 90ea2b6974b1d0957cfdc5e2f9a2c30720723f12..f48b558ad23dd99f334d2d60e954ce9a83ac6114 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -363,7 +363,7 @@ static int get_packet_data(int fd, char **src_buf, size_t *src_size,
>  	}
>  
>  	/* And complain if we didn't get enough bytes to satisfy the read. */
> -	if (ret != size) {
> +	if ((size_t)ret != (size_t)size) {
>  		if (options & PACKET_READ_GENTLE_ON_EOF)
>  			return -1;

Likewise here, "size" is already unsigned.


I also wondered if there was a safer solution than a bare cast here.
Both of these are OK because the lines immediately before them checked
for the negative value, but there's nothing at the compiler level to
enforce that.

I guess a solution that uses the type system would be akin to Option
from Rust, et al. A helper that checks for negative values and also
promotes to an unsigned type, like:

  ssize_t ret = read_in_full(fd, buf, count);
  size_t bytes_read;

  if (!signed_to_unsigned(ret, &bytes_read))
	die_errno(...); /* error */
  if (bytes_read != count)
        ...

I don't know if there's a more ergonomic way that ditches the extra
variable. Or if there are enough cases like this to merit having a
helper.

-Peff
Patrick Steinhardt Dec. 6, 2024, 8:44 a.m. UTC | #2
On Thu, Dec 05, 2024 at 02:34:39PM -0500, Jeff King wrote:
> On Thu, Dec 05, 2024 at 10:36:29AM +0100, Patrick Steinhardt wrote:
> 
> > @@ -24,7 +23,7 @@ static void verify_buffer_or_die(struct hashfile *f,
> >  
> >  	if (ret < 0)
> >  		die_errno("%s: sha1 file read error", f->name);
> > -	if (ret != count)
> > +	if ((size_t)ret != (size_t)count)
> >  		die("%s: sha1 file truncated", f->name);
> 
> You really only need the cast on the left-hand side here. "count" is
> already an unsigned value (and will get promoted as necessary on a
> system where "unsigned int" is smaller than "size_t").
> 
> It's probably not hurting too much, but my philosophy is that we should
> do as few casts as strictly necessary. Casts are a blunt instrument for
> telling the compiler we know what we are doing, and can cover up issues
> (in this case a false positive, but imagine "count" was switched to
> "int").

Fair, will adapt.

> IMHO "count" should probably be a size_t here anyway, since we are
> dealing with a buffer size. If you look at the call stack, it is based
> on hashfile.buffer, which we'd expect to be small. But it is initialized
> from a size_t, so really it is one errant hashfd_internal() from being a
> truncation bug. That's not a mistake I expect to be likely, but I think
> we are better off in general making code as obviously/trivially correct
> as possible.

Agreed, it's also something I've been pushing for when doing reviews.

> I think truncation is getting out of scope for your series, though, so
> probably not worth doing right at this moment.

Agreed, as well.

> > diff --git a/pkt-line.c b/pkt-line.c
> > index 90ea2b6974b1d0957cfdc5e2f9a2c30720723f12..f48b558ad23dd99f334d2d60e954ce9a83ac6114 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -363,7 +363,7 @@ static int get_packet_data(int fd, char **src_buf, size_t *src_size,
> >  	}
> >  
> >  	/* And complain if we didn't get enough bytes to satisfy the read. */
> > -	if (ret != size) {
> > +	if ((size_t)ret != (size_t)size) {
> >  		if (options & PACKET_READ_GENTLE_ON_EOF)
> >  			return -1;
> 
> Likewise here, "size" is already unsigned.
> 
> 
> I also wondered if there was a safer solution than a bare cast here.
> Both of these are OK because the lines immediately before them checked
> for the negative value, but there's nothing at the compiler level to
> enforce that.
> 
> I guess a solution that uses the type system would be akin to Option
> from Rust, et al. A helper that checks for negative values and also
> promotes to an unsigned type, like:
> 
>   ssize_t ret = read_in_full(fd, buf, count);
>   size_t bytes_read;
> 
>   if (!signed_to_unsigned(ret, &bytes_read))
> 	die_errno(...); /* error */
>   if (bytes_read != count)
>         ...

The function is kind of curious anyway. It returns an `int` that has
been assigned the `ssize_t`, which may overflow. Callers don't care for
the number of bytes read in the first place though, so we can just adapt
the function to return an error code, only. I'll do that.

> I don't know if there's a more ergonomic way that ditches the extra
> variable. Or if there are enough cases like this to merit having a
> helper.

We already have `cast_size_t_to_int()`, `cast_size_t_to_long()` and
`cast_size_t_to_uint32_t()`, all of which cause us to die in case the
cast needs to truncate. I think we can easily extend this mechanism
going forward.

Patrick
diff mbox series

Patch

diff --git a/csum-file.c b/csum-file.c
index c14bacc7f9e5f56fcdb06a3abc7ac9babc45041a..c8ec7b73e640c00b895c1b3ba92f052012a680e0 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -9,7 +9,6 @@ 
  */
 
 #define USE_THE_REPOSITORY_VARIABLE
-#define DISABLE_SIGN_COMPARE_WARNINGS
 
 #include "git-compat-util.h"
 #include "progress.h"
@@ -24,7 +23,7 @@  static void verify_buffer_or_die(struct hashfile *f,
 
 	if (ret < 0)
 		die_errno("%s: sha1 file read error", f->name);
-	if (ret != count)
+	if ((size_t)ret != (size_t)count)
 		die("%s: sha1 file truncated", f->name);
 	if (memcmp(buf, f->check_buffer, count))
 		die("sha1 file '%s' validation error", f->name);
diff --git a/pkt-line.c b/pkt-line.c
index 90ea2b6974b1d0957cfdc5e2f9a2c30720723f12..f48b558ad23dd99f334d2d60e954ce9a83ac6114 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -363,7 +363,7 @@  static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 	}
 
 	/* And complain if we didn't get enough bytes to satisfy the read. */
-	if (ret != size) {
+	if ((size_t)ret != (size_t)size) {
 		if (options & PACKET_READ_GENTLE_ON_EOF)
 			return -1;