Message ID | 20230831062320.GJ3185325@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 4d5693ba05ae0d722ad5a6c0e34296caf6be9b74 |
Headers | show |
Series | tree name and depth limits | expand |
On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote: >But I thought that >following the sequence of logic (from "4096 is probably OK" to "whoops, >it's not") had some value to share. > of course, but you can just integrate that into the squashed commit message. having it all in one place makes it easier to follow. regards
On Thu, Aug 31, 2023 at 12:39:37PM +0200, Oswald Buddenhagen wrote: > On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote: > > But I thought that > > following the sequence of logic (from "4096 is probably OK" to "whoops, > > it's not") had some value to share. > > > of course, but you can just integrate that into the squashed commit message. > having it all in one place makes it easier to follow. Yes, though I think having it as a separate patch makes it easier to revisit later (e.g., by reverting or by replacing the patch during a re-roll). -Peff
Jeff King <peff@peff.net> writes: > On Thu, Aug 31, 2023 at 12:39:37PM +0200, Oswald Buddenhagen wrote: > >> On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote: >> > But I thought that >> > following the sequence of logic (from "4096 is probably OK" to "whoops, >> > it's not") had some value to share. >> > >> of course, but you can just integrate that into the squashed commit message. >> having it all in one place makes it easier to follow. > > Yes, though I think having it as a separate patch makes it easier to > revisit later (e.g., by reverting or by replacing the patch during a > re-roll). I am on the fence. Having it squashed into the same step as it was introduced may reduce the patch count, but then it would not be easy to explain why 2048 is a reasonable default at that step when no code actually uses the variable, so the end result is not all that easier to follow and read, as that earlier step would be handwaving "2048 is good at the end of the series, trust me", unlike having it at the end. When 4096 is introduced as a "random number that seems larger than large enough" in the earlier step, it might be worth mentioning that it is a tentative default and may turn out to be larger than necessary in which case we may want to shrink it ;-)
On Thursday, August 31, 2023 5:06 PM, Junio C Hamano wrote: >Jeff King <peff@peff.net> writes: > >> On Thu, Aug 31, 2023 at 12:39:37PM +0200, Oswald Buddenhagen wrote: >> >>> On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote: >>> > But I thought that >>> > following the sequence of logic (from "4096 is probably OK" to >>> > "whoops, it's not") had some value to share. >>> > >>> of course, but you can just integrate that into the squashed commit message. >>> having it all in one place makes it easier to follow. >> >> Yes, though I think having it as a separate patch makes it easier to >> revisit later (e.g., by reverting or by replacing the patch during a >> re-roll). > >I am on the fence. Having it squashed into the same step as it was introduced may >reduce the patch count, but then it would not be easy to explain why 2048 is a >reasonable default at that step when no code actually uses the variable, so the end >result is not all that easier to follow and read, as that earlier step would be >handwaving >"2048 is good at the end of the series, trust me", unlike having it at the end. When >4096 is introduced as a "random number that seems larger than large enough" in the >earlier step, it might be worth mentioning that it is a tentative default and may turn >out to be larger than necessary in which case we may want to shrink it ;-) I have been trying to figure out the implications of this and went down the wrong rabbit hole. Are we taking about the tree depth of the underlying Merkel Tree (no) or the tree-ish thing representing the file system (apparently yes). In this case, a practical depth of 2048 hits the exact max path size on the NonStop platform, so I have no issue there. My concern is one of terminology. My assumption of what maxTreeDepth meant, from other terminology used in git, seemed (wrongly) to align with the use of --depth=n where n<maxTreeDepth parameters for commands like fetch. From a user intuition (arguably, if I have any here) is that the parameter should be more of a path nomenclature, like maxPathHeight or maxHierarchyHeight rather than what is currently in flight. Just my opinion and I'm fine no matter which way. --Randall -- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400) NonStop(211288444200000000) -- In real life, I talk too much.
On Thu, Aug 31, 2023 at 05:59:28PM -0400, rsbecker@nexbridge.com wrote: > I have been trying to figure out the implications of this and went down the > wrong rabbit hole. Are we taking about the tree depth of the underlying > Merkel Tree (no) or the tree-ish thing representing the file system > (apparently yes). In this case, a practical depth of 2048 hits the exact max > path size on the NonStop platform, so I have no issue there. My concern is > one of terminology. My assumption of what maxTreeDepth meant, from other > terminology used in git, seemed (wrongly) to align with the use of --depth=n > where n<maxTreeDepth parameters for commands like fetch. From a user > intuition (arguably, if I have any here) is that the parameter should be > more of a path nomenclature, like maxPathHeight or maxHierarchyHeight rather > than what is currently in flight. Just my opinion and I'm fine no matter > which way. The documentation from the patch is: +core.maxTreeDepth:: + The maximum depth Git is willing to recurse while traversing a + tree (e.g., "a/b/cde/f" has a depth of 4). This is a fail-safe + to allow Git to abort cleanly, and should not generally need to + be adjusted. The default is 4096. Does reading that answer your question and make the meaning clear? If not, can you suggest any changes? I'd like to stick with "depth" here as it's commonly used in other places to mean the same thing (e.g., git-grep's --max-depth option). I also think this is something that most people will remain completely oblivious to, as you'd only hit it for absurd cases. -Peff
diff --git a/environment.c b/environment.c index 8e25b5ef02..bb3c2a96a3 100644 --- a/environment.c +++ b/environment.c @@ -81,7 +81,7 @@ int merge_log_config = -1; int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ unsigned long pack_size_limit_cfg; enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET; -int max_allowed_tree_depth = 4096; +int max_allowed_tree_depth = 2048; #ifndef PROTECT_HFS_DEFAULT #define PROTECT_HFS_DEFAULT 0
On my Linux system, all of our recursive tree walking algorithms can run up to the 4096 default limit without segfaulting. But not all platforms will have stack sizes as generous (nor might even Linux if we kick off a recursive walk within a thread). In particular, several of the tests added in the previous few commits fail in our Windows CI environment. Through some guess-and-check pushing, I found that 3072 is still too much, but 2048 is OK. These are obviously vague heuristics, and there is nothing to promise that another system might not have trouble at even lower values. But it seems unlikely anybody will be too angry about a 2048-depth limit (this is close to the default max-pathname limit on Linux even for a pathological path like "a/a/a/..."). So let's just lower it. Some alternatives are: - configure separate defaults for Windows versus other platforms. - just skip the tests on Windows. This leaves Windows users with the annoying case that they can be crashed by running out of stack space, but there shouldn't be any security implications (they can't go deep enough to hit integer overflow problems). Since the original default was arbitrary, it seems less confusing to just lower it, keeping behavior consistent across platforms. Signed-off-by: Jeff King <peff@peff.net> --- Arguably this could be squashed into patch 5. But I thought that following the sequence of logic (from "4096 is probably OK" to "whoops, it's not") had some value to share. AFAIK GitHub is still running with the 4096 limit; I discovered the Windows issue only while preparing this for upstream. But I still find it highly unlikely that somebody would run into it in practice. environment.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)