Message ID | pull.1726.v2.git.git.1717369608923.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] date: detect underflow when parsing dates with positive timezone offset | expand |
"darcy via GitGitGadget" <gitgitgadget@gmail.com> writes: > fix: prevent date underflow when using positive timezone offset > > cc: Patrick Steinhardt ps@pks.im cc: Phillip Wood > phillip.wood123@gmail.com You're expected to respond to review comments before you send in updated patches. I didn't see the review comments responded to in the thread: https://lore.kernel.org/git/pull.1726.git.git.1716801427015.gitgitgadget@gmail.com/ Please see "A typical life cycle of a patch series" section of the SubmittingPatches document. https://git.github.io/htmldocs/SubmittingPatches.html#patch-flow Step #3 and Step #4 are distinct.
> Without having a deep understanding of the code I don't quite see the > connection between this change and the problem description. Is it > necessary? If so, it might help to explain why it's needed in the commit > message or in the code. > ...I guess that > checking for tm_hour is assuming that TZ offset should always come > before the values necessary to compute the timestamp comes, but it > smells like an unwarranted assumption and not explaining the change > in the proposed log message is a bad sign. This line has been reverted. The point was it would only parse the timezone offset if it occurs after the time part of the date, but I have realized that this is unrelated to the purpose of this change. > You should also add at least one test. Yep, thanks, added now to `t0006-date.sh`. > Do we also want to check for overflows in the other direction (a large timestamp with a negative timezone offset)? Is this something people want added? I am happy to implement this if so, though it wasn't my original intention. Issues with the commit message should also be resolved. Thank you everyone for your patience :) On 3/6/24 21:13, Junio C Hamano wrote: > "darcy via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> fix: prevent date underflow when using positive timezone offset >> >> cc: Patrick Steinhardt ps@pks.im cc: Phillip Wood >> phillip.wood123@gmail.com > You're expected to respond to review comments before you send in > updated patches. I didn't see the review comments responded to in > the thread: > > https://lore.kernel.org/git/pull.1726.git.git.1716801427015.gitgitgadget@gmail.com/ > > Please see "A typical life cycle of a patch series" section of the > SubmittingPatches document. > > https://git.github.io/htmldocs/SubmittingPatches.html#patch-flow > > Step #3 and Step #4 are distinct.
Hi darcy On 03/06/2024 12:44, darcy wrote: >> Do we also want to check for overflows in the other direction (a large >> timestamp with a negative timezone offset)? > > Is this something people want added? I am happy to implement this if > so, though it wasn't my original intention. I think if we're worried about the date overflowing in one direction it makes sense to fix overflows in the other direction at the same time especially as I think that the other case involves a signed integer overflow which is undefined behavior in C. Best Wishes Phillip
On 4/6/24 00:13, Phillip Wood wrote: > Hi darcy > On 03/06/2024 12:44, darcy wrote: >>> Do we also want to check for overflows in the other direction (a large >>> timestamp with a negative timezone offset)? >> Is this something people want added? I am happy to implement this if >> so, though it wasn't my original intention. > I think if we're worried about the date overflowing in one direction it > makes sense to fix overflows in the other direction at the same time > especially as I think that the other case involves a signed integer > overflow which is undefined behavior in C. That makes sense. Though I am reading the `tm_to_time_t` code now and it only allows years up to 2099. > if (year < 0 || year > 129) /* algo only works for 1970-2099 */ > return -1; I can of course add a check here for dates close to the end of 2099, but it seems that the bigger issue is that some day people will want to use Git after 2099... Should I see if I can extend this range? I'm not sure where that specific year comes from, it doesn't seem to be based on a limit of the size of `time_t`, and the comment or git logs don't seem to provide a reason.
On Tue, Jun 04, 2024 at 06:48:59PM +1000, darcy wrote: > Though I am reading the `tm_to_time_t` code now and it only allows years > up to 2099. > > > if (year < 0 || year > 129) /* algo only works for 1970-2099 */ > > return -1; > > I can of course add a check here for dates close to the end of 2099, but > it seems that the bigger issue is that some day people will want to use > Git after 2099... Should I see if I can extend this range? I'm not sure > where that specific year comes from, it doesn't seem to be based on a > limit of the size of `time_t`, and the comment or git logs don't seem to > provide a reason. I think the 2099 limitation is there because 2100 is not a leap year, and the code has over-simplified computation there (it counts every 4th year as a leap year, which is not generally true for century boundaries, except for 2000 because it is divisible by 400). From the log message of ecee9d9e79 ([PATCH] Do date parsing by hand..., 2005-04-30), I think the 1970 limitation may be about avoiding non-60-minute DST (though it is not clear to me how the function cares about dst either way, as it operates on UTC). You can't represent times before 1970 in our code base anyway, because our epoch timestamp is unsigned. So yes, we could do better than that function. I worked on a series to handle negative timestamps a while ago, and as part of that, I had to replace this function. You can see the full series at the "jk/time-negative-wip" branch of https://github.com/peff/git, but I'll reproduce the commit in question below for the list archive. The resulting series worked fine for me, but IIRC there was some issue on Windows. I think its gmtime() did not understand negative timestamps, maybe? And we'd need the reverse equivalent of the patch below (which we could take from the same source). There are some overflow checks already in date.c (grep for "overflow") that might cover us, but I won't be surprised if there's a case that's missing (although we store the timestamps as uintmax_t, so you're unlikely to overflow that in practice). Underflow is more likely, though if we finally support negative timestamps, then it becomes a non-issue (and we'd actually want to revert the patch you're working on). If you're interested in picking up the negative timestamp work, I'd be happy to discuss it. It's been on my todo list for a long time, but I never quite get around to it. Anyway, here's the tm_to_time_t() cleanup. -- >8 -- date: make tm_to_time_t() less restricted We've been using a hand-rolled tm_to_time_t() conversion since ecee9d9e79 ([PATCH] Do date parsing by hand..., 2005-04-30). It works fine for recent dates, but it doesn't work before 1970 nor after 2099. Let's lift those restrictions by using a more robust algorithm. The days_from_civil() code here is lifted from Howard Hinnant's date library, which can be found at: https://github.com/HowardHinnant/date The code is in the year_month_day::to_days() function, but I also relied on his excellent explanation in: http://howardhinnant.github.io/date_algorithms.html The library is under the MIT license, which is GPL compatible. I've also left a comment in the source file marking the licensing (the code is substantially similar to the original; I only expanded the variable names to make it easier to follow). Ideally we could just rely on a system function to do this, but there isn't one. POSIX specifies mktime(), but it uses the local timezone. There's a timegm() function available in glibc and some BSD systems, as well as a similar function on Windows (with a different name). However, it's likely that there are still going to be some platforms without it, so we'd need a fallback anyway. We may as well just use that fallback everywhere to make sure that Git behaves consistently (and get more complete test coverage). Note that some callers, like parse_date_basic(), rely on tm_to_time_t() complaining when we have negative values in any field. So we'll continue to detect and complain about this case (the alternative would be to turn "month -1" into "month 11" of the previous year, and so on). Signed-off-by: Jeff King <peff@peff.net> --- date.c | 58 ++++++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 42 insertions(+), 16 deletions(-) diff --git a/date.c b/date.c index 4337e18abd..df6c414c78 100644 --- a/date.c +++ b/date.c @@ -10,29 +10,55 @@ #include "pager.h" #include "strbuf.h" +/* + * Convert a year-month-day time into a number of days since 1970 (possibly + * negative). Note that "year" is the full year (not offset from 1900), and + * "month" is in the range 1-12 (unlike a "struct tm" 0-11). + * + * Working in time_t is overkill (since it usually represents seconds), but + * this makes sure we don't hit any integer range issues. + * + * This function is taken from https://github.com/HowardHinnant/date, + * which is released under an MIT license. + */ +static time_t days_from_civil(int year, int month, int day) +{ + time_t era; + unsigned year_of_era, day_of_year, day_of_era; + + year -= month <= 2; + era = (year >= 0 ? year : year - 399) / 400; + year_of_era = year - era * 400; + day_of_year = (153 * (month + (month > 2 ? -3 : 9)) + 2) / 5 + day - 1; + day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + + day_of_year; + return era * 146097 + (time_t)day_of_era - 719468; +} + /* * This is like mktime, but without normalization of tm_wday and tm_yday. * It also always operates in UTC, not the local timezone. */ time_t tm_to_time_t(const struct tm *tm) { - static const int mdays[] = { - 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 - }; - int year = tm->tm_year - 70; - int month = tm->tm_mon; - int day = tm->tm_mday; - - if (year < 0 || year > 129) /* algo only works for 1970-2099 */ - return -1; - if (month < 0 || month > 11) /* array bounds */ - return -1; - if (month < 2 || (year + 2) % 4) - day--; - if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_sec < 0) + time_t days, hours, minutes, seconds; + + if (tm->tm_year < 0 || + tm->tm_mon < 0 || + tm->tm_mday < 0 || + tm->tm_hour < 0 || + tm->tm_min < 0 || + tm->tm_sec < 0) return -1; - return (year * 365 + (year + 1) / 4 + mdays[month] + day) * 24*60*60UL + - tm->tm_hour * 60*60 + tm->tm_min * 60 + tm->tm_sec; + + days = days_from_civil(tm->tm_year + 1900, + tm->tm_mon + 1, + tm->tm_mday); + hours = 24 * days + tm->tm_hour; + minutes = 60 * hours + tm->tm_min; + seconds = 60 * minutes + tm->tm_sec; + + return seconds; } static const char *month_names[] = {
On 4/6/24 19:33, Jeff King wrote: > I think the 2099 limitation is there because 2100 is not a leap year, > and the code has over-simplified computation there (it counts every 4th > year as a leap year, which is not generally true for century boundaries, > except for 2000 because it is divisible by 400). Ah, that makes sense, thanks. > If you're interested in picking up the negative timestamp work, I'd be > happy to discuss it. It's been on my todo list for a long time, but I > never quite get around to it. I may look at this in the future, but this looks a little out of my depth at the moment, as I am still unfamiliar with the codebase. For the moment, I will just add the check at year 2099, and possibly will revisit this when I have more time on my hands.
On 04/06/2024 09:48, darcy wrote: > On 4/6/24 00:13, Phillip Wood wrote: > >> Hi darcy >> On 03/06/2024 12:44, darcy wrote: >>>> Do we also want to check for overflows in the other direction (a large >>>> timestamp with a negative timezone offset)? >>> Is this something people want added? I am happy to implement this if >>> so, though it wasn't my original intention. >> I think if we're worried about the date overflowing in one direction it >> makes sense to fix overflows in the other direction at the same time >> especially as I think that the other case involves a signed integer >> overflow which is undefined behavior in C. > > That makes sense. > > Though I am reading the `tm_to_time_t` code now and it only allows years > up to 2099. > >> if (year < 0 || year > 129) /* algo only works for 1970-2099 */ >> return -1; Thanks for taking a closer look, I wasn't aware of that. Reading Peff's reply it seems like timestamp is actually unsigned and as we limit the maximum year to 2099 it seems unlikely we'll overflow from too larger date after all. Best Wishes Phillip > I can of course add a check here for dates close to the end of 2099, but > it seems that the bigger issue is that some day people will want to use > Git after 2099... Should I see if I can extend this range? I'm not sure > where that specific year comes from, it doesn't seem to be based on a > limit of the size of `time_t`, and the comment or git logs don't seem to > provide a reason. > >
Phillip Wood <phillip.wood123@gmail.com> writes: > Thanks for taking a closer look, I wasn't aware of that. Reading > Peff's reply it seems like timestamp is actually unsigned and as we > limit the maximum year to 2099 it seems unlikely we'll overflow from > too larger date after all. Thanks all. I haven't seen one question I posed answered, though. Has https://git.github.io/htmldocs/SubmittingPatches.html#real-name been followed in this patch when signing off the patch?
On 6/6/24 03:27, Junio C Hamano wrote: > Phillip Wood <phillip.wood123@gmail.com> writes: > >> Thanks for taking a closer look, I wasn't aware of that. Reading >> Peff's reply it seems like timestamp is actually unsigned and as we >> limit the maximum year to 2099 it seems unlikely we'll overflow from >> too larger date after all. > Thanks all. > > I haven't seen one question I posed answered, though. Has > https://git.github.io/htmldocs/SubmittingPatches.html#real-name > been followed in this patch when signing off the patch? I have changed the line in the commit to properly follow that rule.
diff --git a/date.c b/date.c index 7365a4ad24f..8e3ec1bcb00 100644 --- a/date.c +++ b/date.c @@ -937,8 +937,12 @@ int parse_date_basic(const char *date, timestamp_t *timestamp, int *offset) } } - if (!tm_gmt) + if (!tm_gmt) { + if (*offset > 0 && *offset * 60 > *timestamp) + return -1; *timestamp -= *offset * 60; + } + return 0; /* success */ } diff --git a/t/t0006-date.sh b/t/t0006-date.sh index 3031256d143..cdbb40bec01 100755 --- a/t/t0006-date.sh +++ b/t/t0006-date.sh @@ -116,6 +116,21 @@ check_parse '2008-02-14 20:30:45 -:30' '2008-02-14 20:30:45 +0000' check_parse '2008-02-14 20:30:45 -05:00' '2008-02-14 20:30:45 -0500' check_parse '2008-02-14 20:30:45' '2008-02-14 20:30:45 -0500' EST5 check_parse 'Thu, 7 Apr 2005 15:14:13 -0700' '2005-04-07 15:14:13 -0700' +check_parse '1970-01-01 00:00:00' '1970-01-01 00:00:00 +0000' +check_parse '1970-01-01 00:00:00 +00' '1970-01-01 00:00:00 +0000' +check_parse '1970-01-01 00:00:00 Z' '1970-01-01 00:00:00 +0000' +check_parse '1970-01-01 00:00:00 -01' '1970-01-01 00:00:00 -0100' +check_parse '1970-01-01 00:00:00 +01' bad +check_parse '1970-01-01 00:00:00 +11' bad +check_parse '1970-01-01 00:59:59 +01' bad +check_parse '1970-01-01 01:00:00 +01' '1970-01-01 01:00:00 +0100' +check_parse '1970-01-01 01:00:00 +11' bad +check_parse '1970-01-02 00:00:00 +11' '1970-01-02 00:00:00 +1100' +check_parse '1969-12-31 23:59:59' bad +check_parse '1969-12-31 23:59:59 +00' bad +check_parse '1969-12-31 23:59:59 Z' bad +check_parse '1969-12-31 23:59:59 +11' bad +check_parse '1969-12-31 23:59:59 -11' bad check_approxidate() { echo "$1 -> $2 +0000" >expect