Message ID | 20250109-b4-pks-blame-truncate-hash-length-v1-1-9ad4bb09e059@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | builtin/blame: fix out-of-bounds read with excessive `--abbrev` | expand |
On Thu, Jan 9, 2025, at 07:21, Patrick Steinhardt wrote: > object ID. This is because fwrite(3p) of course doesn't stop when it > sees a NUL byte, where as printf(3p) does. s/where as/whereas
Hi Patrick, On Thu, 9 Jan 2025, Patrick Steinhardt wrote: > In 6411a0a896 (builtin/blame: fix type of `length` variable when > emitting object ID, 2024-12-06) we have fixed the type of the `length` > variable. In order to avoid a cast from `size_t` to `int` in the call to > printf(3p) with the "%.*s" formatter we have converted the code to > instead use fwrite(3p), which accepts the length as a `size_t`. > > It was reported though that this makes us read over the end of the OID > array when the provided `--abbrev=` length exceeds the length of the > object ID. This is because fwrite(3p) of course doesn't stop when it > sees a NUL byte, where as printf(3p) does. > > Fix the bug by reverting back to printf(3p) and culling the provided > length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an > `int`. > > Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > This fixes the issue reported in [1]. Thanks! Thank you for the quick fix! We will need to adjust it a little more, though: > > Patrick > > [1]: <4d812802-afbc-4635-7a19-73896fcda625@gmx.de> > --- > builtin/blame.c | 4 +++- > t/t8002-blame.sh | 4 ++++ > 2 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/builtin/blame.c b/builtin/blame.c > index 867032e4c16878ffd56df8a73162b89ca4bd2694..ad91fe9e97f90625dd2708fbd44bf2dd24a337a6 100644 > --- a/builtin/blame.c > +++ b/builtin/blame.c > @@ -475,6 +475,8 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > char ch; > size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ? > the_hash_algo->hexsz : (size_t) abbrev; > + if (length > GIT_MAX_HEXSZ) > + length = GIT_MAX_HEXSZ; This causes a subtle change of behavior because there are a couple of conditional code blocks between this change and the `printf()` call decrease `length`, i.e. specifying values larger than the maximal hex size causes potentially-desirable, different behavior (and think about https://www.hyrumslaw.com/). > > if (opt & OUTPUT_COLOR_LINE) { > if (cnt > 0) { > @@ -505,7 +507,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > length--; > putchar('?'); > } > - fwrite(hex, 1, length, stdout); > + printf("%.*s", (int)length, hex); > if (opt & OUTPUT_ANNOTATE_COMPAT) { > const char *name; > if (opt & OUTPUT_SHOW_EMAIL) > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..fcaba8c11f7ede084e069eefd292f337e8396cb4 100755 > --- a/t/t8002-blame.sh > +++ b/t/t8002-blame.sh > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' > check_abbrev $hexsz --no-abbrev > ' > > +test_expect_success 'blame --abbrev gets truncated' ' > + check_abbrev 9000 --abbrev=$hexsz HEAD This is actually incorrect: it passes `--abbrev=$hexsz` instead of a value that needs to be truncated. > +' > + > test_expect_success '--exclude-promisor-objects does not BUG-crash' ' > test_must_fail git blame --exclude-promisor-objects one > ' > > --- > base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e > change-id: 20250109-b4-pks-blame-truncate-hash-length-c875cac66d71 Here is my proposed fixup: -- snipsnap -- [PATCH] fixup! builtin/blame: fix out-of-bounds read with excessive `--abbrev` The test case needs to actually test an excessive `--abbrev` value. Also, when calling `git blame --abbrev=<N>` with an `N` that is larger than the maximal OID hex size, there is a subtle side effect that makes it behave _differently_ than specifying said maximal hex size: When the command outputs boundary, unblamable or ignored commits' OIDs, those outputs are prefixed with characters indicating this, and the `abbrev` value is used to align the information that comes after the OID, clipping it as needed. Specifying a "too large" abbrev value here tells Git that yes, we want the full OIDs and don't you worry about alignment. Thanks to SHA-256 being _larger_ than the default SHA-1-based OIDs, and thanks to clipping at `GIT_MAX_HEXSZ`, this change of behavior can only be observed when running the test in SHA-256 mode. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- builtin/blame.c | 9 +++++++-- t/t8002-blame.sh | 2 +- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index ad91fe9e97f9..5b4976835066 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -475,8 +475,13 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int char ch; size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ? the_hash_algo->hexsz : (size_t) abbrev; - if (length > GIT_MAX_HEXSZ) - length = GIT_MAX_HEXSZ; + + /* + * Leave enough space for ^, * and ? indicators (boundary, + * unblamable, ignored). + */ + if (length > GIT_MAX_HEXSZ + 3) + length = GIT_MAX_HEXSZ + 3; if (opt & OUTPUT_COLOR_LINE) { if (cnt > 0) { diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh index fcaba8c11f7e..71fa70a64679 100755 --- a/t/t8002-blame.sh +++ b/t/t8002-blame.sh @@ -127,7 +127,7 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' ' test_expect_success 'blame --abbrev gets truncated' ' - check_abbrev 9000 --abbrev=$hexsz HEAD + check_abbrev 9000 --abbrev=9000 HEAD.. ' test_expect_success '--exclude-promisor-objects does not BUG-crash' ' -- 2.48.0.rc0.windows.1
On Thu, Jan 09, 2025 at 11:49:43AM +0100, Johannes Schindelin wrote: > > diff --git a/builtin/blame.c b/builtin/blame.c > > index 867032e4c16878ffd56df8a73162b89ca4bd2694..ad91fe9e97f90625dd2708fbd44bf2dd24a337a6 100644 > > --- a/builtin/blame.c > > +++ b/builtin/blame.c > > @@ -475,6 +475,8 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > > char ch; > > size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ? > > the_hash_algo->hexsz : (size_t) abbrev; > > + if (length > GIT_MAX_HEXSZ) > > + length = GIT_MAX_HEXSZ; > > This causes a subtle change of behavior because there are a couple of > conditional code blocks between this change and the `printf()` call > decrease `length`, i.e. specifying values larger than the maximal hex size > causes potentially-desirable, different behavior (and think about > https://www.hyrumslaw.com/). Alternatively we can move this until after we have done the subtractions. Then we don't have to do weird gymnastics. > > > > if (opt & OUTPUT_COLOR_LINE) { > > if (cnt > 0) { > > @@ -505,7 +507,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > > length--; > > putchar('?'); > > } > > - fwrite(hex, 1, length, stdout); > > + printf("%.*s", (int)length, hex); > > if (opt & OUTPUT_ANNOTATE_COMPAT) { > > const char *name; > > if (opt & OUTPUT_SHOW_EMAIL) > > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh > > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..fcaba8c11f7ede084e069eefd292f337e8396cb4 100755 > > --- a/t/t8002-blame.sh > > +++ b/t/t8002-blame.sh > > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' > > check_abbrev $hexsz --no-abbrev > > ' > > > > +test_expect_success 'blame --abbrev gets truncated' ' > > + check_abbrev 9000 --abbrev=$hexsz HEAD > > This is actually incorrect: it passes `--abbrev=$hexsz` instead of a value > that needs to be truncated. Oh dear. The test did manage to catch the bug, but thinking more about it that was only because my initial fix was broken. > diff --git a/builtin/blame.c b/builtin/blame.c > index ad91fe9e97f9..5b4976835066 100644 > --- a/builtin/blame.c > +++ b/builtin/blame.c > @@ -475,8 +475,13 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > char ch; > size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ? > the_hash_algo->hexsz : (size_t) abbrev; > - if (length > GIT_MAX_HEXSZ) > - length = GIT_MAX_HEXSZ; > + > + /* > + * Leave enough space for ^, * and ? indicators (boundary, > + * unblamable, ignored). > + */ > + if (length > GIT_MAX_HEXSZ + 3) > + length = GIT_MAX_HEXSZ + 3; > > if (opt & OUTPUT_COLOR_LINE) { > if (cnt > 0) { How about this instead? diff --git a/builtin/blame.c b/builtin/blame.c index ad91fe9e97..f92e487bed 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -475,8 +475,6 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int char ch; size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ? the_hash_algo->hexsz : (size_t) abbrev; - if (length > GIT_MAX_HEXSZ) - length = GIT_MAX_HEXSZ; if (opt & OUTPUT_COLOR_LINE) { if (cnt > 0) { @@ -507,6 +505,9 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int length--; putchar('?'); } + + if (length > GIT_MAX_HEXSZ) + length = GIT_MAX_HEXSZ; printf("%.*s", (int)length, hex); if (opt & OUTPUT_ANNOTATE_COMPAT) { const char *name; In that case there's no need to juggle with the magic indicators, which makes it a bit easier to reason about. > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh > index fcaba8c11f7e..71fa70a64679 100755 > --- a/t/t8002-blame.sh > +++ b/t/t8002-blame.sh > @@ -127,7 +127,7 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' > ' > > test_expect_success 'blame --abbrev gets truncated' ' > - check_abbrev 9000 --abbrev=$hexsz HEAD > + check_abbrev 9000 --abbrev=9000 HEAD.. > ' This should be `check_abbrev $hexsz --abbrev=9000`, shouldn't it? Patrick
Hi Patrick, On Thu, 9 Jan 2025, Patrick Steinhardt wrote: > On Thu, Jan 09, 2025 at 11:49:43AM +0100, Johannes Schindelin wrote: > > > diff --git a/builtin/blame.c b/builtin/blame.c > > > index 867032e4c16878ffd56df8a73162b89ca4bd2694..ad91fe9e97f90625dd2708fbd44bf2dd24a337a6 100644 > > > --- a/builtin/blame.c > > > +++ b/builtin/blame.c > > > @@ -475,6 +475,8 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > > > char ch; > > > size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ? > > > the_hash_algo->hexsz : (size_t) abbrev; > > > + if (length > GIT_MAX_HEXSZ) > > > + length = GIT_MAX_HEXSZ; > > > > This causes a subtle change of behavior because there are a couple of > > conditional code blocks between this change and the `printf()` call > > decrease `length`, i.e. specifying values larger than the maximal hex size > > causes potentially-desirable, different behavior (and think about > > https://www.hyrumslaw.com/). > > Alternatively we can move this until after we have done the > subtractions. Then we don't have to do weird gymnastics. Or we can even avoid assiging a maximum altogether: if (length < GIT_MAX_HEXSZ) printf("%.*s", (int)length, hex); else printf("%s", hex); Or be more consistent with Git's source code style which often prefers ternaries, favoring succinctness over readability: printf("%.*s", (int)(length < GIT_MAX_HEXSZ ? length : GIT_MAX_HEXSZ), hex); > > > if (opt & OUTPUT_COLOR_LINE) { > > > if (cnt > 0) { > > > @@ -505,7 +507,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > > > length--; > > > putchar('?'); > > > } > > > - fwrite(hex, 1, length, stdout); > > > + printf("%.*s", (int)length, hex); > > > if (opt & OUTPUT_ANNOTATE_COMPAT) { > > > const char *name; > > > if (opt & OUTPUT_SHOW_EMAIL) > > > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh > > > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..fcaba8c11f7ede084e069eefd292f337e8396cb4 100755 > > > --- a/t/t8002-blame.sh > > > +++ b/t/t8002-blame.sh > > > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' > > > check_abbrev $hexsz --no-abbrev > > > ' > > > > > > +test_expect_success 'blame --abbrev gets truncated' ' > > > + check_abbrev 9000 --abbrev=$hexsz HEAD > > > > This is actually incorrect: it passes `--abbrev=$hexsz` instead of a value > > that needs to be truncated. > > Oh dear. The test did manage to catch the bug, but thinking more about > it that was only because my initial fix was broken. > > > diff --git a/builtin/blame.c b/builtin/blame.c > > index ad91fe9e97f9..5b4976835066 100644 > > --- a/builtin/blame.c > > +++ b/builtin/blame.c > > @@ -475,8 +475,13 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > > char ch; > > size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ? > > the_hash_algo->hexsz : (size_t) abbrev; > > - if (length > GIT_MAX_HEXSZ) > > - length = GIT_MAX_HEXSZ; > > + > > + /* > > + * Leave enough space for ^, * and ? indicators (boundary, > > + * unblamable, ignored). > > + */ > > + if (length > GIT_MAX_HEXSZ + 3) > > + length = GIT_MAX_HEXSZ + 3; > > > > if (opt & OUTPUT_COLOR_LINE) { > > if (cnt > 0) { > > How about this instead? > > diff --git a/builtin/blame.c b/builtin/blame.c > index ad91fe9e97..f92e487bed 100644 > --- a/builtin/blame.c > +++ b/builtin/blame.c > @@ -475,8 +475,6 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > char ch; > size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ? > the_hash_algo->hexsz : (size_t) abbrev; > - if (length > GIT_MAX_HEXSZ) > - length = GIT_MAX_HEXSZ; > > if (opt & OUTPUT_COLOR_LINE) { > if (cnt > 0) { > @@ -507,6 +505,9 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > length--; > putchar('?'); > } > + > + if (length > GIT_MAX_HEXSZ) > + length = GIT_MAX_HEXSZ; > printf("%.*s", (int)length, hex); > if (opt & OUTPUT_ANNOTATE_COMPAT) { > const char *name; > > In that case there's no need to juggle with the magic indicators, which > makes it a bit easier to reason about. > > > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh > > index fcaba8c11f7e..71fa70a64679 100755 > > --- a/t/t8002-blame.sh > > +++ b/t/t8002-blame.sh > > @@ -127,7 +127,7 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' > > ' > > > > test_expect_success 'blame --abbrev gets truncated' ' > > - check_abbrev 9000 --abbrev=$hexsz HEAD > > + check_abbrev 9000 --abbrev=9000 HEAD.. > > ' > > This should be `check_abbrev $hexsz --abbrev=9000`, shouldn't it? I kind of liked the idea to keep the same cut-off threshold for the validation. But I won't insist. The construct is obtuse in either case ;-) (Not your fault, of course, you merely imitated existing code, which is the correct thing to do). Ciao, Johannes
diff --git a/builtin/blame.c b/builtin/blame.c index 867032e4c16878ffd56df8a73162b89ca4bd2694..ad91fe9e97f90625dd2708fbd44bf2dd24a337a6 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -475,6 +475,8 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int char ch; size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ? the_hash_algo->hexsz : (size_t) abbrev; + if (length > GIT_MAX_HEXSZ) + length = GIT_MAX_HEXSZ; if (opt & OUTPUT_COLOR_LINE) { if (cnt > 0) { @@ -505,7 +507,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int length--; putchar('?'); } - fwrite(hex, 1, length, stdout); + printf("%.*s", (int)length, hex); if (opt & OUTPUT_ANNOTATE_COMPAT) { const char *name; if (opt & OUTPUT_SHOW_EMAIL) diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..fcaba8c11f7ede084e069eefd292f337e8396cb4 100755 --- a/t/t8002-blame.sh +++ b/t/t8002-blame.sh @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' check_abbrev $hexsz --no-abbrev ' +test_expect_success 'blame --abbrev gets truncated' ' + check_abbrev 9000 --abbrev=$hexsz HEAD +' + test_expect_success '--exclude-promisor-objects does not BUG-crash' ' test_must_fail git blame --exclude-promisor-objects one '
In 6411a0a896 (builtin/blame: fix type of `length` variable when emitting object ID, 2024-12-06) we have fixed the type of the `length` variable. In order to avoid a cast from `size_t` to `int` in the call to printf(3p) with the "%.*s" formatter we have converted the code to instead use fwrite(3p), which accepts the length as a `size_t`. It was reported though that this makes us read over the end of the OID array when the provided `--abbrev=` length exceeds the length of the object ID. This is because fwrite(3p) of course doesn't stop when it sees a NUL byte, where as printf(3p) does. Fix the bug by reverting back to printf(3p) and culling the provided length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an `int`. Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> --- This fixes the issue reported in [1]. Thanks! Patrick [1]: <4d812802-afbc-4635-7a19-73896fcda625@gmx.de> --- builtin/blame.c | 4 +++- t/t8002-blame.sh | 4 ++++ 2 files changed, 7 insertions(+), 1 deletion(-) --- base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e change-id: 20250109-b4-pks-blame-truncate-hash-length-c875cac66d71