mbox series

[0/10] tree name and depth limits

Message ID 20230831061735.GA2702156@coredump.intra.peff.net (mailing list archive)
Headers show
Series tree name and depth limits | expand

Message

Jeff King Aug. 31, 2023, 6:17 a.m. UTC
There aren't currently any limits on the size of pathnames in tree
objects, nor of the depth of recursive trees. This can create two
problems:

  1. We process pathnames from trees as strings in lots of places, and
     we have a history of using "int" to access strings. I don't know of
     a specific problem area, but I have little doubt there is a spot
     where you can convince an "int" to overflow with a 2GB tree.

     In the long run it would be nice to fix all of that code to use
     size_t. And hopefully other audits to use st_add(), etc, for
     allocations mean that we'd avoid actual buffer overflows. But in
     the meantime, it sure would be nice to have another layer
     protecting us.

  2. Because we recurse for most tree algorithms, it's easy for a
     malicious tree to run you out of stack space and cause a segfault.
     By itself this isn't a security problem, but it's confusing for
     users. And if you happen to run a public hosting site and monitor
     your segfaults, it's awfully annoying to investigate each one to
     see if it's the first sign of a real attack, or if somebody is just
     throwing garbage at you.

So here are some patches I wrote for GitHub several years ago. They add
limits on the size of a single pathname component, as well as on the
depth we're willing to recurse when handling trees.

I kind of hate them, because I hate arbitrary limits. But I also think
they can do some good in practice, and the default limits are set such
that I doubt anybody will ever notice, let alone complain (they have
been in place on github.com for 4+ years, and I don't recall ever
getting any feedback from users who were not security researchers poking
at the system).

And most operating systems have arbitrary limits (that are much
smaller!) anyway. So if you plan to do exotic things like "run
git-checkout", you'd run into problems way before you hit either of
these.

So IMHO they'll do more good than harm.

The first three patches are cleanups / prep.

Patch 4 is the name limit.

Patches 5-10 implement the depth limits.

  [01/10]: tree-walk: reduce stack size for recursive functions
  [02/10]: tree-walk: drop MAX_TRAVERSE_TREES macro
  [03/10]: tree-walk: rename "error" variable
  [04/10]: fsck: detect very large tree pathnames
  [05/10]: add core.maxTreeDepth config
  [06/10]: traverse_trees(): respect max_allowed_tree_depth
  [07/10]: read_tree(): respect max_allowed_tree_depth
  [08/10]: list-objects: respect max_allowed_tree_depth
  [09/10]: tree-diff: respect max_allowed_tree_depth
  [10/10]: lower core.maxTreeDepth default to 2048

 Documentation/config/core.txt |  6 +++
 Documentation/fsck-msgids.txt |  7 +++
 config.c                      |  5 ++
 environment.c                 |  1 +
 environment.h                 |  1 +
 fsck.c                        | 24 ++++++++-
 fsck.h                        |  1 +
 list-objects.c                |  8 +++
 sparse-index.c                |  2 +-
 t/t1450-fsck.sh               | 10 ++++
 t/t6700-tree-depth.sh         | 93 +++++++++++++++++++++++++++++++++++
 tree-diff.c                   | 23 ++++++---
 tree-walk.c                   | 20 +++++---
 tree-walk.h                   |  2 -
 tree.c                        |  9 +++-
 tree.h                        |  1 +
 unpack-trees.c                |  9 +++-
 unpack-trees.h                |  2 +-
 wt-status.c                   |  2 +-
 19 files changed, 201 insertions(+), 25 deletions(-)
 create mode 100755 t/t6700-tree-depth.sh