diff mbox series

[10/10] lower core.maxTreeDepth default to 2048

Message ID 20230831062320.GJ3185325@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 4d5693ba05ae0d722ad5a6c0e34296caf6be9b74
Headers show
Series tree name and depth limits | expand

Commit Message

Jeff King Aug. 31, 2023, 6:23 a.m. UTC
On my Linux system, all of our recursive tree walking algorithms can run
up to the 4096 default limit without segfaulting. But not all platforms
will have stack sizes as generous (nor might even Linux if we kick off a
recursive walk within a thread).

In particular, several of the tests added in the previous few commits
fail in our Windows CI environment. Through some guess-and-check
pushing, I found that 3072 is still too much, but 2048 is OK.

These are obviously vague heuristics, and there is nothing to promise
that another system might not have trouble at even lower values. But it
seems unlikely anybody will be too angry about a 2048-depth limit (this
is close to the default max-pathname limit on Linux even for a
pathological path like "a/a/a/..."). So let's just lower it.

Some alternatives are:

  - configure separate defaults for Windows versus other platforms.

  - just skip the tests on Windows. This leaves Windows users with the
    annoying case that they can be crashed by running out of stack
    space, but there shouldn't be any security implications (they can't
    go deep enough to hit integer overflow problems).

Since the original default was arbitrary, it seems less confusing to
just lower it, keeping behavior consistent across platforms.

Signed-off-by: Jeff King <peff@peff.net>
---
Arguably this could be squashed into patch 5. But I thought that
following the sequence of logic (from "4096 is probably OK" to "whoops,
it's not") had some value to share. AFAIK GitHub is still running with
the 4096 limit; I discovered the Windows issue only while preparing this
for upstream. But I still find it highly unlikely that somebody would
run into it in practice.

 environment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Oswald Buddenhagen Aug. 31, 2023, 10:39 a.m. UTC | #1
On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote:
>But I thought that
>following the sequence of logic (from "4096 is probably OK" to "whoops,
>it's not") had some value to share.
>
of course, but you can just integrate that into the squashed commit 
message. having it all in one place makes it easier to follow.

regards
Jeff King Aug. 31, 2023, 5:42 p.m. UTC | #2
On Thu, Aug 31, 2023 at 12:39:37PM +0200, Oswald Buddenhagen wrote:

> On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote:
> > But I thought that
> > following the sequence of logic (from "4096 is probably OK" to "whoops,
> > it's not") had some value to share.
> > 
> of course, but you can just integrate that into the squashed commit message.
> having it all in one place makes it easier to follow.

Yes, though I think having it as a separate patch makes it easier to
revisit later (e.g., by reverting or by replacing the patch during a
re-roll).

-Peff
Junio C Hamano Aug. 31, 2023, 9:06 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> On Thu, Aug 31, 2023 at 12:39:37PM +0200, Oswald Buddenhagen wrote:
>
>> On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote:
>> > But I thought that
>> > following the sequence of logic (from "4096 is probably OK" to "whoops,
>> > it's not") had some value to share.
>> > 
>> of course, but you can just integrate that into the squashed commit message.
>> having it all in one place makes it easier to follow.
>
> Yes, though I think having it as a separate patch makes it easier to
> revisit later (e.g., by reverting or by replacing the patch during a
> re-roll).

I am on the fence.  Having it squashed into the same step as it was
introduced may reduce the patch count, but then it would not be easy
to explain why 2048 is a reasonable default at that step when no
code actually uses the variable, so the end result is not all that
easier to follow and read, as that earlier step would be handwaving
"2048 is good at the end of the series, trust me", unlike having it
at the end.  When 4096 is introduced as a "random number that seems
larger than large enough" in the earlier step, it might be worth
mentioning that it is a tentative default and may turn out to be
larger than necessary in which case we may want to shrink it ;-)
Randall S. Becker Aug. 31, 2023, 9:59 p.m. UTC | #4
On Thursday, August 31, 2023 5:06 PM, Junio C Hamano wrote:
>Jeff King <peff@peff.net> writes:
>
>> On Thu, Aug 31, 2023 at 12:39:37PM +0200, Oswald Buddenhagen wrote:
>>
>>> On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote:
>>> > But I thought that
>>> > following the sequence of logic (from "4096 is probably OK" to
>>> > "whoops, it's not") had some value to share.
>>> >
>>> of course, but you can just integrate that into the squashed commit
message.
>>> having it all in one place makes it easier to follow.
>>
>> Yes, though I think having it as a separate patch makes it easier to
>> revisit later (e.g., by reverting or by replacing the patch during a
>> re-roll).
>
>I am on the fence.  Having it squashed into the same step as it was
introduced may
>reduce the patch count, but then it would not be easy to explain why 2048
is a
>reasonable default at that step when no code actually uses the variable, so
the end
>result is not all that easier to follow and read, as that earlier step
would be
>handwaving
>"2048 is good at the end of the series, trust me", unlike having it at the
end.  When
>4096 is introduced as a "random number that seems larger than large enough"
in the
>earlier step, it might be worth mentioning that it is a tentative default
and may turn
>out to be larger than necessary in which case we may want to shrink it ;-)

I have been trying to figure out the implications of this and went down the
wrong rabbit hole. Are we taking about the tree depth of the underlying
Merkel Tree (no) or the tree-ish thing representing the file system
(apparently yes). In this case, a practical depth of 2048 hits the exact max
path size on the NonStop platform, so I have no issue there. My concern is
one of terminology. My assumption of what maxTreeDepth meant, from other
terminology used in git, seemed (wrongly) to align with the use of --depth=n
where n<maxTreeDepth parameters for commands like fetch. From a user
intuition (arguably, if I have any here) is that the parameter should be
more of a path nomenclature, like maxPathHeight or maxHierarchyHeight rather
than what is currently in flight. Just my opinion and I'm fine no matter
which way.

--Randall
--
Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)
NonStop(211288444200000000)
-- In real life, I talk too much.
Jeff King Aug. 31, 2023, 10:31 p.m. UTC | #5
On Thu, Aug 31, 2023 at 05:59:28PM -0400, rsbecker@nexbridge.com wrote:

> I have been trying to figure out the implications of this and went down the
> wrong rabbit hole. Are we taking about the tree depth of the underlying
> Merkel Tree (no) or the tree-ish thing representing the file system
> (apparently yes). In this case, a practical depth of 2048 hits the exact max
> path size on the NonStop platform, so I have no issue there. My concern is
> one of terminology. My assumption of what maxTreeDepth meant, from other
> terminology used in git, seemed (wrongly) to align with the use of --depth=n
> where n<maxTreeDepth parameters for commands like fetch. From a user
> intuition (arguably, if I have any here) is that the parameter should be
> more of a path nomenclature, like maxPathHeight or maxHierarchyHeight rather
> than what is currently in flight. Just my opinion and I'm fine no matter
> which way.

The documentation from the patch is:

  +core.maxTreeDepth::
  +       The maximum depth Git is willing to recurse while traversing a
  +       tree (e.g., "a/b/cde/f" has a depth of 4). This is a fail-safe
  +       to allow Git to abort cleanly, and should not generally need to
  +       be adjusted. The default is 4096.

Does reading that answer your question and make the meaning clear? If
not, can you suggest any changes?

I'd like to stick with "depth" here as it's commonly used in other
places to mean the same thing (e.g., git-grep's --max-depth option).

I also think this is something that most people will remain completely
oblivious to, as you'd only hit it for absurd cases.

-Peff
diff mbox series

Patch

diff --git a/environment.c b/environment.c
index 8e25b5ef02..bb3c2a96a3 100644
--- a/environment.c
+++ b/environment.c
@@ -81,7 +81,7 @@  int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
 enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
-int max_allowed_tree_depth = 4096;
+int max_allowed_tree_depth = 2048;
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0