mbox series

[v2,0/11] limit memory allocations for v2 servers

Message ID YUE1alo58cGyTw6/@coredump.intra.peff.net (mailing list archive)
Headers show
Series limit memory allocations for v2 servers | expand

Message

Jeff King Sept. 14, 2021, 11:51 p.m. UTC
Here's a re-roll of my series to limit the memory a v2 server is willing
to allocate on behalf of a client. See v1:

  https://lore.kernel.org/git/YUC%2F6n1hhUbMJiLw@coredump.intra.peff.net/

for an overview. The existing patches are mostly small fixups pointed
out by reviewers (thanks!), but I did take Martin's TOO_MANY_PREFIXES
suggestion in patch 7 (without the change to the name of the constant it
seemed too clever to me, but with it it seems just the right amount of
clever).

There are two new patches here:

 - patch 8 explicitly documents the v2 ref-prefix "best effort" behavior

 - patch 11 teaches ls-refs to reject bogus arguments, which violates
   the spec (the current behavior caused a broken test to go unnoticed)

Full range-diff is below the diffstat.

  [01/11]: serve: rename is_command() to parse_command()
  [02/11]: serve: return capability "value" from get_capability()
  [03/11]: serve: add "receive" method for v2 capabilities table
  [04/11]: serve: provide "receive" function for object-format capability
  [05/11]: serve: provide "receive" function for session-id capability
  [06/11]: serve: drop "keys" strvec
  [07/11]: ls-refs: ignore very long ref-prefix counts
  [08/11]: docs/protocol-v2: clarify some ls-refs ref-prefix details
  [09/11]: serve: reject bogus v2 "command=ls-refs=foo"
  [10/11]: serve: reject commands used as capabilities
  [11/11]: ls-refs: reject unknown arguments

 Documentation/technical/protocol-v2.txt |   6 +-
 ls-refs.c                               |  22 ++++-
 serve.c                                 | 119 +++++++++++++-----------
 t/t5701-git-serve.sh                    |  73 +++++++++++++++
 4 files changed, 162 insertions(+), 58 deletions(-)

 1:  eb8e7b21a1 =  1:  eb8e7b21a1 serve: rename is_command() to parse_command()
 2:  8cc66cae41 =  2:  8cc66cae41 serve: return capability "value" from get_capability()
 3:  3343f9bb0a =  3:  3343f9bb0a serve: add "receive" method for v2 capabilities table
 4:  c4cc80fe7a !  4:  0319b69881 serve: provide "receive" function for object-format capability
    @@ Commit message
         all capabilities (because they might not have sent an object-format line
         at all, and we still have to check that the default matches our
         repository algorithm). Since the check_algorithm() function would now be
    -    done to a single if() statement, I've just inlined it in its only
    +    down to a single if() statement, I've just inlined it in its only
         caller.
     
         There should be no change of behavior here, except for two
 5:  c8527ca5a7 =  5:  efe207c35c serve: provide "receive" function for session-id capability
 6:  250e4723ba !  6:  463aa7faa3 serve: drop "keys" strvec
    @@ serve.c: static int process_request(void)
      			else
      				die("unknown capability '%s'", reader.line);
      
    -+
    - 			/* Consume the peeked line */
    - 			packet_reader_read(&reader);
    - 			break;
     @@ serve.c: static int process_request(void)
      			 * If no command and no keys were given then the client
      			 * wanted to terminate the connection.
 7:  1218d62247 !  7:  da2043f42f ls-refs: ignore very long ref-prefix counts
    @@ ls-refs.c: static void ensure_config_read(void)
      }
      
     +/*
    -+ * The maximum number of "ref-prefix" lines we'll allow the client to send.
    -+ * If they go beyond this, we'll avoid using the prefix feature entirely.
    ++ * If we see this many or more "ref-prefix" lines from the client, we consider
    ++ * it "too many" and will avoid using the prefix feature entirely.
     + */
    -+#define MAX_ALLOWED_PREFIXES 65536
    ++#define TOO_MANY_PREFIXES 65536
     +
      /*
       * Check if one of the prefixes is a prefix of the ref.
       * If no prefixes were provided, all refs match.
    -@@ ls-refs.c: static int ls_refs_config(const char *var, const char *value, void *data)
    - int ls_refs(struct repository *r, struct packet_reader *request)
    - {
    - 	struct ls_refs_data data;
    -+	int too_many_prefixes = 0;
    - 
    - 	memset(&data, 0, sizeof(data));
    - 	strvec_init(&data.prefixes);
     @@ ls-refs.c: int ls_refs(struct repository *r, struct packet_reader *request)
      			data.peel = 1;
      		else if (!strcmp("symrefs", arg))
      			data.symrefs = 1;
     -		else if (skip_prefix(arg, "ref-prefix ", &out))
     -			strvec_push(&data.prefixes, out);
     +		else if (skip_prefix(arg, "ref-prefix ", &out)) {
    -+			if (too_many_prefixes) {
    -+				/* ignore any further ones */
    -+			} else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
    -+				strvec_clear(&data.prefixes);
    -+				too_many_prefixes = 1;
    -+			} else {
    ++			if (data.prefixes.nr < TOO_MANY_PREFIXES)
     +				strvec_push(&data.prefixes, out);
    -+			}
     +		}
      		else if (!strcmp("unborn", arg))
      			data.unborn = allow_unborn;
      	}
    + 
    + 	if (request->status != PACKET_READ_FLUSH)
    + 		die(_("expected flush after ls-refs arguments"));
    + 
    ++	/*
    ++	 * If we saw too many prefixes, we must avoid using them at all; as
    ++	 * soon as we have any prefix, they are meant to form a comprehensive
    ++	 * list.
    ++	 */
    ++	if (data.prefixes.nr >= TOO_MANY_PREFIXES)
    ++		strvec_clear(&data.prefixes);
    ++
    + 	send_possibly_unborn_head(&data);
    + 	if (!data.prefixes.nr)
    + 		strvec_push(&data.prefixes, "");
     
      ## t/t5701-git-serve.sh ##
     @@ t/t5701-git-serve.sh: test_expect_success 'refs/heads prefix' '
    @@ t/t5701-git-serve.sh: test_expect_success 'refs/heads prefix' '
      
     +test_expect_success 'ignore very large set of prefixes' '
     +	# generate a large number of ref-prefixes that we expect
    -+	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
    ++	# to match nothing; the value here exceeds TOO_MANY_PREFIXES
     +	# from ls-refs.c.
     +	{
     +		echo command=ls-refs &&
     +		echo object-format=$(test_oid algo)
     +		echo 0001 &&
    -+		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
    ++		perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
     +		echo 0000
     +	} |
     +	test-tool pkt-line pack >in &&
 -:  ---------- >  8:  ee540a4ef7 docs/protocol-v2: clarify some ls-refs ref-prefix details
 8:  b1567fdc82 =  9:  481c07cfac serve: reject bogus v2 "command=ls-refs=foo"
 9:  9786b9a11f = 10:  dff965c1d2 serve: reject commands used as capabilities
 -:  ---------- > 11:  f7339f924b ls-refs: reject unknown arguments

Comments

Taylor Blau Sept. 15, 2021, 4:17 a.m. UTC | #1
On Tue, Sep 14, 2021 at 07:51:06PM -0400, Jeff King wrote:
> Here's a re-roll of my series to limit the memory a v2 server is willing
> to allocate on behalf of a client. See v1:
>
>   https://lore.kernel.org/git/YUC%2F6n1hhUbMJiLw@coredump.intra.peff.net/
>
> for an overview. The existing patches are mostly small fixups pointed
> out by reviewers (thanks!), but I did take Martin's TOO_MANY_PREFIXES
> suggestion in patch 7 (without the change to the name of the constant it
> seemed too clever to me, but with it it seems just the right amount of
> clever).

Thanks for this. The two new patches look good to me, and I took a light
skim over the ones which were modified since v1. Everything I saw seemed
very reasonable to me.

    Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor
Jeff King Sept. 15, 2021, 6:33 p.m. UTC | #2
OK, hopefully third time's the charm. The changes from v2 are pretty
minor:

 - test typos noticed by Eric (none affected the outcome of the tests)

 - I added "object-format=$(test_oid algo)" to the input of a few of the
   new tests even where it doesn't change the outcome, for consistency
   with the existing tests

 - dropped out-dated "collect" comment noticed by Junio

 - explained the command=ls-refs=foo case a little further, including
   specific references in how it violates the spec

Range-diff is below.

  [01/11]: serve: rename is_command() to parse_command()
  [02/11]: serve: return capability "value" from get_capability()
  [03/11]: serve: add "receive" method for v2 capabilities table
  [04/11]: serve: provide "receive" function for object-format capability
  [05/11]: serve: provide "receive" function for session-id capability
  [06/11]: serve: drop "keys" strvec
  [07/11]: ls-refs: ignore very long ref-prefix counts
  [08/11]: docs/protocol-v2: clarify some ls-refs ref-prefix details
  [09/11]: serve: reject bogus v2 "command=ls-refs=foo"
  [10/11]: serve: reject commands used as capabilities
  [11/11]: ls-refs: reject unknown arguments

 Documentation/technical/protocol-v2.txt |   6 +-
 ls-refs.c                               |  22 ++++-
 serve.c                                 | 120 +++++++++++++-----------
 t/t5701-git-serve.sh                    |  75 +++++++++++++++
 4 files changed, 164 insertions(+), 59 deletions(-)

 1:  e642ced1e8 =  1:  71003eb21a serve: rename is_command() to parse_command()
 2:  75d119ae49 =  2:  1e86c31477 serve: return capability "value" from get_capability()
 3:  9fb21cab1d =  3:  12a159d5c8 serve: add "receive" method for v2 capabilities table
 4:  5b661c9ed5 =  4:  f5c29b5cdf serve: provide "receive" function for object-format capability
 5:  34e0ce5c12 =  5:  063cb60d1e serve: provide "receive" function for session-id capability
 6:  8e42fa9aec !  6:  0ed0b946ea serve: drop "keys" strvec
    @@ serve.c: static int process_request(void)
      
      	packet_reader_init(&reader, 0, NULL, 0,
     @@ serve.c: static int process_request(void)
    - 			/* collect request; a sequence of keys and values */
    + 		case PACKET_READ_EOF:
    + 			BUG("Should have already died when seeing EOF");
    + 		case PACKET_READ_NORMAL:
    +-			/* collect request; a sequence of keys and values */
      			if (parse_command(reader.line, &command) ||
      			    receive_client_capability(reader.line))
     -				strvec_push(&keys, reader.line);
 7:  de7ac11ad3 !  7:  a9392d0e68 ls-refs: ignore very long ref-prefix counts
    @@ t/t5701-git-serve.sh: test_expect_success 'refs/heads prefix' '
     +	# from ls-refs.c.
     +	{
     +		echo command=ls-refs &&
    -+		echo object-format=$(test_oid algo)
    ++		echo object-format=$(test_oid algo) &&
     +		echo 0001 &&
     +		perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
     +		echo 0000
 8:  3f78422fd3 =  8:  0a982f676a docs/protocol-v2: clarify some ls-refs ref-prefix details
 9:  6c55a7412d !  9:  553db6f83e serve: reject bogus v2 "command=ls-refs=foo"
    @@ Commit message
         When we see a line from the client like "command=ls-refs", we parse
         everything after the equals sign as a capability, which we check against
         our capabilities table. If we don't recognize the command (e.g.,
    -    "command=foo"), we'll reject it. But we use the same parser that checks
    -    for regular capabilities like "object-format=sha256". And so we'll
    -    accept "ls-refs=foo", even though everything after the equals is bogus,
    -    and simply ignored.
    +    "command=foo"), we'll reject it.
     
    -    This isn't really hurting anything, but the request does violate the
    -    spec. Let's tighten it up to prevent any surprising behavior.
    +    But in parse_command(), we use the same get_capability() parser for
    +    parsing non-command lines. So if we see "command=ls-refs=foo", we will
    +    feed "ls-refs=foo" to get_capability(), which will say "OK, that's
    +    ls-refs, with value 'foo'". But then we simply ignore the value
    +    entirely.
    +
    +    The client is violating the spec here, which says:
    +
    +          command = PKT-LINE("command=" key LF)
    +          key = 1*(ALPHA | DIGIT | "-_")
    +
    +    I.e., the key is not even allowed to have an equals sign in it. Whereas
    +    a real non-command capability does allow a value:
    +
    +          capability = PKT-LINE(key[=value] LF)
    +
    +    So by reusing the same get_capability() parser, we are mixing up the
    +    "key" and "capability" tokens. However, since that parser tells us
    +    whether it saw an "=", we can still use it; we just reject any input
    +    that produces a non-NULL value field.
    +
    +    The current behavior isn't really hurting anything (the client should
    +    never send such a request, and if it does, we just ignore the "value"
    +    part). But since it does violate the spec, let's tighten it up to
    +    prevent any surprising behavior.
     
         Signed-off-by: Jeff King <peff@peff.net>
     
    @@ t/t5701-git-serve.sh: test_expect_success 'request invalid command' '
      '
      
     +test_expect_success 'requested command is command=value' '
    -+	test-tool pkt-line pack >in <<-\EOF &&
    ++	test-tool pkt-line pack >in <<-EOF &&
     +	command=ls-refs=whatever
     +	object-format=$(test_oid algo)
     +	0000
10:  bbb12669b9 ! 10:  93216b9eaa serve: reject commands used as capabilities
    @@ t/t5701-git-serve.sh: test_expect_success 'request invalid command' '
      '
      
     +test_expect_success 'request capability as command' '
    -+	test-tool pkt-line pack >in <<-\EOF &&
    ++	test-tool pkt-line pack >in <<-EOF &&
     +	command=agent
    ++	object-format=$(test_oid algo)
     +	0000
     +	EOF
     +	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
     +	grep invalid.command.*agent err
     +'
     +
     +test_expect_success 'request command as capability' '
    -+	test-tool pkt-line pack >in <<-\EOF &&
    ++	test-tool pkt-line pack >in <<-EOF &&
     +	command=ls-refs
    ++	object-format=$(test_oid algo)
     +	fetch
     +	0000
     +	EOF
    @@ t/t5701-git-serve.sh: test_expect_success 'request invalid command' '
     +'
     +
      test_expect_success 'requested command is command=value' '
    - 	test-tool pkt-line pack >in <<-\EOF &&
    + 	test-tool pkt-line pack >in <<-EOF &&
      	command=ls-refs=whatever
11:  375a85b9a6 = 11:  30802eeb54 ls-refs: reject unknown arguments