diff mbox series

[v2] date: detect underflow when parsing dates with positive timezone offset

Message ID pull.1726.v2.git.git.1717369608923.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2] date: detect underflow when parsing dates with positive timezone offset | expand

Commit Message

darcy June 2, 2024, 11:06 p.m. UTC
From: darcy <acednes@gmail.com>

Overriding the date of a commit to be close to "1970-01-01 00:00:00"
with a large enough timezone for the equivelant GMT time to be before
the epoch is considered valid by `parse_date_basic`.

This leads to an integer underflow in the commit timestamp, which is not
caught by `git-commit`, but will cause other services to fail, such as
`git-fsck`, which reports "badDateOverflow: invalid author/committer
line - date causes integer overflow".

Instead check the timezone offset and fail if the resulting time comes
before the epoch, "1970-01-01T00:00:00Z", when parsing.

Signed-off-by: Darcy Burke <acednes@gmail.com>
---
    fix: prevent date underflow when using positive timezone offset
    
    cc: Patrick Steinhardt ps@pks.im cc: Phillip Wood
    phillip.wood123@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1726%2Fdxrcy%2Fmaster-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1726/dxrcy/master-v2
Pull-Request: https://github.com/git/git/pull/1726

Range-diff vs v1:

 1:  4542d984aab ! 1:  db508b2f533 fix: prevent date underflow when using positive timezone offset
     @@ Metadata
      Author: darcy <acednes@gmail.com>
      
       ## Commit message ##
     -    fix: prevent date underflow when using positive timezone offset
     +    date: detect underflow when parsing dates with positive timezone offset
      
     -    Overriding the date of a commit to be `1970-01-01` with a large enough
     -    timezone for the equivalent GMT time to before 1970 is no longer
     -    accepted.
     +    Overriding the date of a commit to be close to "1970-01-01 00:00:00"
     +    with a large enough timezone for the equivelant GMT time to be before
     +    the epoch is considered valid by `parse_date_basic`.
      
     -    Example: `GIT_COMMITTER_DATE='1970-01-01T00:00:00+10' git commit` would
     -    previously be accepted, only to unexpectedly fail in other parts of the
     -    code, such as `git push`. The timestamp is now checked against postitive
     -    timezone values.
     +    This leads to an integer underflow in the commit timestamp, which is not
     +    caught by `git-commit`, but will cause other services to fail, such as
     +    `git-fsck`, which reports "badDateOverflow: invalid author/committer
     +    line - date causes integer overflow".
      
     -    Signed-off-by: darcy <acednes@gmail.com>
     +    Instead check the timezone offset and fail if the resulting time comes
     +    before the epoch, "1970-01-01T00:00:00Z", when parsing.
     +
     +    Signed-off-by: Darcy Burke <acednes@gmail.com>
      
       ## date.c ##
     -@@ date.c: int parse_date_basic(const char *date, timestamp_t *timestamp, int *offset)
     - 			match = match_alpha(date, &tm, offset);
     - 		else if (isdigit(c))
     - 			match = match_digit(date, &tm, offset, &tm_gmt);
     --		else if ((c == '-' || c == '+') && isdigit(date[1]))
     -+		else if ((c == '-' || c == '+') && isdigit(date[1]) && tm.tm_hour != -1)
     - 			match = match_tz(date, offset);
     - 
     - 		if (!match) {
      @@ date.c: int parse_date_basic(const char *date, timestamp_t *timestamp, int *offset)
       		}
       	}
       
      -	if (!tm_gmt)
      +	if (!tm_gmt) {
     -+		if (*offset > 0 && *offset * 60 > *timestamp) {
     ++		if (*offset > 0 && *offset * 60 > *timestamp)
      +			return -1;
     -+		}
       		*timestamp -= *offset * 60;
      +	}
      +
       	return 0; /* success */
       }
       
     +
     + ## t/t0006-date.sh ##
     +@@ t/t0006-date.sh: check_parse '2008-02-14 20:30:45 -:30' '2008-02-14 20:30:45 +0000'
     + check_parse '2008-02-14 20:30:45 -05:00' '2008-02-14 20:30:45 -0500'
     + check_parse '2008-02-14 20:30:45' '2008-02-14 20:30:45 -0500' EST5
     + check_parse 'Thu, 7 Apr 2005 15:14:13 -0700' '2005-04-07 15:14:13 -0700'
     ++check_parse '1970-01-01 00:00:00' '1970-01-01 00:00:00 +0000'
     ++check_parse '1970-01-01 00:00:00 +00' '1970-01-01 00:00:00 +0000'
     ++check_parse '1970-01-01 00:00:00 Z' '1970-01-01 00:00:00 +0000'
     ++check_parse '1970-01-01 00:00:00 -01' '1970-01-01 00:00:00 -0100'
     ++check_parse '1970-01-01 00:00:00 +01' bad
     ++check_parse '1970-01-01 00:00:00 +11' bad
     ++check_parse '1970-01-01 00:59:59 +01' bad
     ++check_parse '1970-01-01 01:00:00 +01' '1970-01-01 01:00:00 +0100'
     ++check_parse '1970-01-01 01:00:00 +11' bad
     ++check_parse '1970-01-02 00:00:00 +11' '1970-01-02 00:00:00 +1100'
     ++check_parse '1969-12-31 23:59:59' bad
     ++check_parse '1969-12-31 23:59:59 +00' bad
     ++check_parse '1969-12-31 23:59:59 Z' bad
     ++check_parse '1969-12-31 23:59:59 +11' bad
     ++check_parse '1969-12-31 23:59:59 -11' bad
     + 
     + check_approxidate() {
     + 	echo "$1 -> $2 +0000" >expect


 date.c          |  6 +++++-
 t/t0006-date.sh | 15 +++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)


base-commit: 9eaef5822cd76bbeb53b6479ce0ddaad34ee2b14

Comments

Junio C Hamano June 3, 2024, 11:13 a.m. UTC | #1
"darcy via GitGitGadget" <gitgitgadget@gmail.com> writes:

>     fix: prevent date underflow when using positive timezone offset
>     
>     cc: Patrick Steinhardt ps@pks.im cc: Phillip Wood
>     phillip.wood123@gmail.com

You're expected to respond to review comments before you send in
updated patches.  I didn't see the review comments responded to in
the thread:

  https://lore.kernel.org/git/pull.1726.git.git.1716801427015.gitgitgadget@gmail.com/

Please see "A typical life cycle of a patch series" section of the
SubmittingPatches document.

  https://git.github.io/htmldocs/SubmittingPatches.html#patch-flow

Step #3 and Step #4 are distinct.
darcy June 3, 2024, 11:44 a.m. UTC | #2
> Without having a deep understanding of the code I don't quite see the
> connection between this change and the problem description. Is it
> necessary? If so, it might help to explain why it's needed in the commit
> message or in the code.

> ...I guess that
> checking for tm_hour is assuming that TZ offset should always come
> before the values necessary to compute the timestamp comes, but it
> smells like an unwarranted assumption and not explaining the change
> in the proposed log message is a bad sign.

This line has been reverted. The point was it would only parse the
timezone offset if it occurs after the time part of the date, but I
have realized that this is unrelated to the purpose of this change.

> You should also add at least one test.

Yep, thanks, added now to `t0006-date.sh`.

> Do we also want to check for overflows in the other direction (a large timestamp with a negative timezone offset)?

Is this something people want added? I am happy to implement this if
so, though it wasn't my original intention.

Issues with the commit message should also be resolved. Thank you
everyone for your patience :)

On 3/6/24 21:13, Junio C Hamano wrote:

> "darcy via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>      fix: prevent date underflow when using positive timezone offset
>>      
>>      cc: Patrick Steinhardt ps@pks.im cc: Phillip Wood
>>      phillip.wood123@gmail.com
> You're expected to respond to review comments before you send in
> updated patches.  I didn't see the review comments responded to in
> the thread:
>
>    https://lore.kernel.org/git/pull.1726.git.git.1716801427015.gitgitgadget@gmail.com/
>
> Please see "A typical life cycle of a patch series" section of the
> SubmittingPatches document.
>
>    https://git.github.io/htmldocs/SubmittingPatches.html#patch-flow
>
> Step #3 and Step #4 are distinct.
Phillip Wood June 3, 2024, 2:13 p.m. UTC | #3
Hi darcy

On 03/06/2024 12:44, darcy wrote:
>> Do we also want to check for overflows in the other direction (a large 
>> timestamp with a negative timezone offset)?
> 
> Is this something people want added? I am happy to implement this if
> so, though it wasn't my original intention.

I think if we're worried about the date overflowing in one direction it 
makes sense to fix overflows in the other direction at the same time 
especially as I think that the other case involves a signed integer 
overflow which is undefined behavior in C.

Best Wishes

Phillip
darcy June 4, 2024, 8:48 a.m. UTC | #4
On 4/6/24 00:13, Phillip Wood wrote:

> Hi darcy
> On 03/06/2024 12:44, darcy wrote:
>>> Do we also want to check for overflows in the other direction (a large
>>> timestamp with a negative timezone offset)?
>> Is this something people want added? I am happy to implement this if
>> so, though it wasn't my original intention.
> I think if we're worried about the date overflowing in one direction it
> makes sense to fix overflows in the other direction at the same time
> especially as I think that the other case involves a signed integer
> overflow which is undefined behavior in C.

That makes sense.

Though I am reading the `tm_to_time_t` code now and it only allows years
up to 2099.

> 	if (year < 0 || year > 129) /* algo only works for 1970-2099 */
>		return -1;

I can of course add a check here for dates close to the end of 2099, but
it seems that the bigger issue is that some day people will want to use
Git after 2099... Should I see if I can extend this range? I'm not sure
where that specific year comes from, it doesn't seem to be based on a
limit of the size of `time_t`, and the comment or git logs don't seem to
provide a reason.
Jeff King June 4, 2024, 9:33 a.m. UTC | #5
On Tue, Jun 04, 2024 at 06:48:59PM +1000, darcy wrote:

> Though I am reading the `tm_to_time_t` code now and it only allows years
> up to 2099.
> 
> > 	if (year < 0 || year > 129) /* algo only works for 1970-2099 */
> > 		return -1;
> 
> I can of course add a check here for dates close to the end of 2099, but
> it seems that the bigger issue is that some day people will want to use
> Git after 2099... Should I see if I can extend this range? I'm not sure
> where that specific year comes from, it doesn't seem to be based on a
> limit of the size of `time_t`, and the comment or git logs don't seem to
> provide a reason.

I think the 2099 limitation is there because 2100 is not a leap year,
and the code has over-simplified computation there (it counts every 4th
year as a leap year, which is not generally true for century boundaries,
except for 2000 because it is divisible by 400).

From the log message of ecee9d9e79 ([PATCH] Do date parsing by hand...,
2005-04-30), I think the 1970 limitation may be about avoiding
non-60-minute DST (though it is not clear to me how the function cares
about dst either way, as it operates on UTC). You can't represent times
before 1970 in our code base anyway, because our epoch timestamp is
unsigned.

So yes, we could do better than that function. I worked on a series to
handle negative timestamps a while ago, and as part of that, I had to
replace this function. You can see the full series at the
"jk/time-negative-wip" branch of https://github.com/peff/git, but I'll
reproduce the commit in question below for the list archive.

The resulting series worked fine for me, but IIRC there was some issue
on Windows. I think its gmtime() did not understand negative timestamps,
maybe? And we'd need the reverse equivalent of the patch below (which we
could take from the same source).

There are some overflow checks already in date.c (grep for "overflow")
that might cover us, but I won't be surprised if there's a case that's
missing (although we store the timestamps as uintmax_t, so you're
unlikely to overflow that in practice). Underflow is more likely, though
if we finally support negative timestamps, then it becomes a non-issue
(and we'd actually want to revert the patch you're working on).

If you're interested in picking up the negative timestamp work, I'd be
happy to discuss it. It's been on my todo list for a long time, but I
never quite get around to it.

Anyway, here's the tm_to_time_t() cleanup.

-- >8 --
date: make tm_to_time_t() less restricted

We've been using a hand-rolled tm_to_time_t() conversion since
ecee9d9e79 ([PATCH] Do date parsing by hand..., 2005-04-30). It works
fine for recent dates, but it doesn't work before 1970 nor after 2099.

Let's lift those restrictions by using a more robust algorithm. The
days_from_civil() code here is lifted from Howard Hinnant's date
library, which can be found at:

  https://github.com/HowardHinnant/date

The code is in the year_month_day::to_days() function, but I also relied
on his excellent explanation in:

  http://howardhinnant.github.io/date_algorithms.html

The library is under the MIT license, which is GPL compatible. I've also
left a comment in the source file marking the licensing (the code is
substantially similar to the original; I only expanded the variable
names to make it easier to follow).

Ideally we could just rely on a system function to do this, but there
isn't one. POSIX specifies mktime(), but it uses the local timezone.
There's a timegm() function available in glibc and some BSD systems, as
well as a similar function on Windows (with a different name). However,
it's likely that there are still going to be some platforms without it,
so we'd need a fallback anyway. We may as well just use that fallback
everywhere to make sure that Git behaves consistently (and get more
complete test coverage).

Note that some callers, like parse_date_basic(), rely on tm_to_time_t()
complaining when we have negative values in any field. So we'll continue
to detect and complain about this case (the alternative would be to turn
"month -1" into "month 11" of the previous year, and so on).

Signed-off-by: Jeff King <peff@peff.net>
---
 date.c | 58 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/date.c b/date.c
index 4337e18abd..df6c414c78 100644
--- a/date.c
+++ b/date.c
@@ -10,29 +10,55 @@
 #include "pager.h"
 #include "strbuf.h"
 
+/*
+ * Convert a year-month-day time into a number of days since 1970 (possibly
+ * negative). Note that "year" is the full year (not offset from 1900), and
+ * "month" is in the range 1-12 (unlike a "struct tm" 0-11).
+ *
+ * Working in time_t is overkill (since it usually represents seconds), but
+ * this makes sure we don't hit any integer range issues.
+ *
+ * This function is taken from https://github.com/HowardHinnant/date,
+ * which is released under an MIT license.
+ */
+static time_t days_from_civil(int year, int month, int day)
+{
+	time_t era;
+	unsigned year_of_era, day_of_year, day_of_era;
+
+	year -= month <= 2;
+	era = (year >= 0 ? year : year - 399) / 400;
+	year_of_era = year - era * 400;
+	day_of_year = (153 * (month + (month > 2 ? -3 : 9)) + 2) / 5 + day - 1;
+	day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100
+			+ day_of_year;
+	return era * 146097 + (time_t)day_of_era - 719468;
+}
+
 /*
  * This is like mktime, but without normalization of tm_wday and tm_yday.
  * It also always operates in UTC, not the local timezone.
  */
 time_t tm_to_time_t(const struct tm *tm)
 {
-	static const int mdays[] = {
-	    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
-	};
-	int year = tm->tm_year - 70;
-	int month = tm->tm_mon;
-	int day = tm->tm_mday;
-
-	if (year < 0 || year > 129) /* algo only works for 1970-2099 */
-		return -1;
-	if (month < 0 || month > 11) /* array bounds */
-		return -1;
-	if (month < 2 || (year + 2) % 4)
-		day--;
-	if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_sec < 0)
+	time_t days, hours, minutes, seconds;
+
+	if (tm->tm_year < 0 ||
+	    tm->tm_mon < 0 ||
+	    tm->tm_mday < 0 ||
+	    tm->tm_hour < 0 ||
+	    tm->tm_min < 0 ||
+	    tm->tm_sec < 0)
 		return -1;
-	return (year * 365 + (year + 1) / 4 + mdays[month] + day) * 24*60*60UL +
-		tm->tm_hour * 60*60 + tm->tm_min * 60 + tm->tm_sec;
+
+	days = days_from_civil(tm->tm_year + 1900,
+			       tm->tm_mon + 1,
+			       tm->tm_mday);
+	hours = 24 * days + tm->tm_hour;
+	minutes = 60 * hours + tm->tm_min;
+	seconds = 60 * minutes + tm->tm_sec;
+
+	return seconds;
 }
 
 static const char *month_names[] = {
darcy June 5, 2024, 6:52 a.m. UTC | #6
On 4/6/24 19:33, Jeff King wrote:

> I think the 2099 limitation is there because 2100 is not a leap year,
> and the code has over-simplified computation there (it counts every 4th
> year as a leap year, which is not generally true for century boundaries,
> except for 2000 because it is divisible by 400).

Ah, that makes sense, thanks.

> If you're interested in picking up the negative timestamp work, I'd be
> happy to discuss it. It's been on my todo list for a long time, but I
> never quite get around to it.

I may look at this in the future, but this looks a little out of my depth
at the moment, as I am still unfamiliar with the codebase.

For the moment, I will just add the check at year 2099, and possibly will
revisit this when I have more time on my hands.
Phillip Wood June 5, 2024, 1:10 p.m. UTC | #7
On 04/06/2024 09:48, darcy wrote:
> On 4/6/24 00:13, Phillip Wood wrote:
> 
>> Hi darcy
>> On 03/06/2024 12:44, darcy wrote:
>>>> Do we also want to check for overflows in the other direction (a large
>>>> timestamp with a negative timezone offset)?
>>> Is this something people want added? I am happy to implement this if
>>> so, though it wasn't my original intention.
>> I think if we're worried about the date overflowing in one direction it
>> makes sense to fix overflows in the other direction at the same time
>> especially as I think that the other case involves a signed integer
>> overflow which is undefined behavior in C.
> 
> That makes sense.
> 
> Though I am reading the `tm_to_time_t` code now and it only allows years
> up to 2099.
> 
>>     if (year < 0 || year > 129) /* algo only works for 1970-2099 */
>>         return -1;

Thanks for taking a closer look, I wasn't aware of that. Reading Peff's 
reply it seems like timestamp is actually unsigned and as we limit the 
maximum year to 2099 it seems unlikely we'll overflow from too larger 
date after all.

Best Wishes

Phillip

> I can of course add a check here for dates close to the end of 2099, but
> it seems that the bigger issue is that some day people will want to use
> Git after 2099... Should I see if I can extend this range? I'm not sure
> where that specific year comes from, it doesn't seem to be based on a
> limit of the size of `time_t`, and the comment or git logs don't seem to
> provide a reason.
> 
>
Junio C Hamano June 5, 2024, 5:27 p.m. UTC | #8
Phillip Wood <phillip.wood123@gmail.com> writes:

> Thanks for taking a closer look, I wasn't aware of that. Reading
> Peff's reply it seems like timestamp is actually unsigned and as we
> limit the maximum year to 2099 it seems unlikely we'll overflow from
> too larger date after all.

Thanks all.

I haven't seen one question I posed answered, though.  Has
https://git.github.io/htmldocs/SubmittingPatches.html#real-name
been followed in this patch when signing off the patch?
darcy June 6, 2024, 4:56 a.m. UTC | #9
On 6/6/24 03:27, Junio C Hamano wrote:

> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> Thanks for taking a closer look, I wasn't aware of that. Reading
>> Peff's reply it seems like timestamp is actually unsigned and as we
>> limit the maximum year to 2099 it seems unlikely we'll overflow from
>> too larger date after all.
> Thanks all.
>
> I haven't seen one question I posed answered, though.  Has
> https://git.github.io/htmldocs/SubmittingPatches.html#real-name
> been followed in this patch when signing off the patch?

I have changed the line in the commit to properly follow that rule.
diff mbox series

Patch

diff --git a/date.c b/date.c
index 7365a4ad24f..8e3ec1bcb00 100644
--- a/date.c
+++ b/date.c
@@ -937,8 +937,12 @@  int parse_date_basic(const char *date, timestamp_t *timestamp, int *offset)
 		}
 	}
 
-	if (!tm_gmt)
+	if (!tm_gmt) {
+		if (*offset > 0 && *offset * 60 > *timestamp)
+			return -1;
 		*timestamp -= *offset * 60;
+	}
+
 	return 0; /* success */
 }
 
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 3031256d143..cdbb40bec01 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -116,6 +116,21 @@  check_parse '2008-02-14 20:30:45 -:30' '2008-02-14 20:30:45 +0000'
 check_parse '2008-02-14 20:30:45 -05:00' '2008-02-14 20:30:45 -0500'
 check_parse '2008-02-14 20:30:45' '2008-02-14 20:30:45 -0500' EST5
 check_parse 'Thu, 7 Apr 2005 15:14:13 -0700' '2005-04-07 15:14:13 -0700'
+check_parse '1970-01-01 00:00:00' '1970-01-01 00:00:00 +0000'
+check_parse '1970-01-01 00:00:00 +00' '1970-01-01 00:00:00 +0000'
+check_parse '1970-01-01 00:00:00 Z' '1970-01-01 00:00:00 +0000'
+check_parse '1970-01-01 00:00:00 -01' '1970-01-01 00:00:00 -0100'
+check_parse '1970-01-01 00:00:00 +01' bad
+check_parse '1970-01-01 00:00:00 +11' bad
+check_parse '1970-01-01 00:59:59 +01' bad
+check_parse '1970-01-01 01:00:00 +01' '1970-01-01 01:00:00 +0100'
+check_parse '1970-01-01 01:00:00 +11' bad
+check_parse '1970-01-02 00:00:00 +11' '1970-01-02 00:00:00 +1100'
+check_parse '1969-12-31 23:59:59' bad
+check_parse '1969-12-31 23:59:59 +00' bad
+check_parse '1969-12-31 23:59:59 Z' bad
+check_parse '1969-12-31 23:59:59 +11' bad
+check_parse '1969-12-31 23:59:59 -11' bad
 
 check_approxidate() {
 	echo "$1 -> $2 +0000" >expect