From patchwork Fri Mar 22 00:03:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thalia Archibald
X-Patchwork-Id: 13599479
Date: Fri, 22 Mar 2024 00:03:18 +0000
To: git@vger.kernel.org
From: Thalia Archibald
Cc: Elijah Newren, Thalia Archibald
Subject: [PATCH 1/6] fast-import: tighten parsing of paths
Message-ID: <20240322000304.76810-2-thalia@archibald.dev>
In-Reply-To: <20240322000304.76810-1-thalia@archibald.dev>
References: <20240322000304.76810-1-thalia@archibald.dev>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

Path parsing in fast-import is inconsistent and many unquoting errors
are suppressed. `<path>` appears in the grammar in these places:

    filemodify ::= 'M' SP <mode> SP (<dataref> | 'inline') SP <path> LF
    filedelete ::= 'D' SP <path> LF
    filecopy   ::= 'C' SP <path> SP <path> LF
    filerename ::= 'R' SP <path> SP <path> LF
    ls         ::= 'ls' SP <dataref> SP <path> LF
    ls-commit  ::= 'ls' SP <path> LF

and fast-import.c parses them in five different ways:

1. For filemodify and filedelete:
   If `<path>` is a valid quoted string, unquote it;
   otherwise, treat it as literal bytes (including any number of SP).
2. For filecopy (source) and filerename (source):
   If `<path>` is a valid quoted string, unquote it;
   otherwise, treat it as literal bytes until the next SP.
3. For filecopy (dest) and filerename (dest):
   Like 1., but an unquoted empty string is an error.
4. For ls:
   If `<path>` starts with `"`, unquote it and report parse errors;
   otherwise, treat it as literal bytes (including any number of SP).
5. For ls-commit:
   Unquote `<path>` and report parse errors.
   (It must start with `"` to disambiguate from ls.)

In the first three, any errors from trying to unquote a string are
suppressed, so a quoted string that contains invalid escapes would be
interpreted as literal bytes. For example, `"\xff"` would fail to
unquote (because hex escapes are not supported), and it would instead be
interpreted as the byte sequence `"` `\` `x` `f` `f` `"`, which is
certainly not intended. Some front-ends erroneously use their language's
standard quoting routine and could silently introduce escapes that would
be incorrectly parsed due to this.

The documentation states that “To use a source path that contains SP the
path must be quoted.”, so it is expected that some implementations
depend on spaces being allowed in paths in the final position. Thus we
have two documented ways to parse paths, so simplify the implementation
to that.

Now we have:

1. `parse_path_eol` for filemodify, filedelete, filecopy (dest),
   filerename (dest), ls, and ls-commit:

   If `<path>` starts with `"`, unquote it and report parse errors;
   otherwise, treat it as literal bytes (including any number of SP).
   Garbage after a quoted string or an unquoted empty string are errors.
   (In ls-commit, it must be quoted to disambiguate from ls.)

2. `parse_path_space` for filecopy (source) and filerename (source):

   If `<path>` starts with `"`, unquote it and report parse errors;
   otherwise, treat it as literal bytes until the next SP. It must be
   followed by a SP. An unquoted empty string is an error.

Signed-off-by: Thalia Archibald
---
 Documentation/git-fast-import.txt |   3 +-
 builtin/fast-import.c             | 115 ++++++++------
 t/t9300-fast-import.sh            | 252 +++++++++++++++++++++++++++++-
 3 files changed, 316 insertions(+), 54 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index b2607366b9..271bd63a10 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -649,7 +649,8 @@ The value of `<path>` must be in canonical form. That is it must not:
 * contain the special component `.` or `..` (e.g. `foo/./bar` and
   `foo/../bar` are invalid).
 
-The root of the tree can be represented by an empty string as `<path>`.
+The root of the tree can be represented by a quoted empty string (`""`)
+as `<path>`.
 
 It is recommended that `<path>` always be encoded using UTF-8.
 
diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 71a195ca22..b2adec8d9a 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2224,7 +2224,7 @@ static int parse_mapped_oid_hex(const char *hex, struct object_id *oid, const ch
  *
  *   idnum ::= ':' bigint;
  *
- * Return the first character after the value in *endptr.
+ * Update *endptr to point to the first character after the value.
  *
  * Complain if the following character is not what is expected,
  * either a space or end of the string.
@@ -2257,8 +2257,8 @@ static uintmax_t parse_mark_ref_eol(const char *p)
 }
 
 /*
- * Parse the mark reference, demanding a trailing space. Return a
- * pointer to the space.
+ * Parse the mark reference, demanding a trailing space. Update *p to
+ * point to the first character after the space.
  */
 static uintmax_t parse_mark_ref_space(const char **p)
 {
@@ -2272,10 +2272,57 @@ static uintmax_t parse_mark_ref_space(const char **p)
 	return mark;
 }
 
+/*
+ * Parse the path string into the strbuf. It may be quoted with escape sequences
+ * or unquoted without escape sequences. When unquoted, it may only contain a
+ * space if `allow_spaces` is nonzero.
+ */
+static void parse_path(struct strbuf *sb, const char *p, const char **endp, int allow_spaces, const char *field)
+{
+	strbuf_reset(sb);
+	if (*p == '"') {
+		if (unquote_c_style(sb, p, endp))
+			die("Invalid %s: %s", field, command_buf.buf);
+	} else {
+		if (allow_spaces)
+			*endp = p + strlen(p);
+		else
+			*endp = strchr(p, ' ');
+		if (*endp == p)
+			die("Missing %s: %s", field, command_buf.buf);
+		strbuf_add(sb, p, *endp - p);
+	}
+}
+
+/*
+ * Parse the path string into the strbuf, and complain if this is not the end of
+ * the string. It may contain spaces even when unquoted.
+ */
+static void parse_path_eol(struct strbuf *sb, const char *p, const char *field)
+{
+	const char *end;
+
+	parse_path(sb, p, &end, 1, field);
+	if (*end)
+		die("Garbage after %s: %s", field, command_buf.buf);
+}
+
+/*
+ * Parse the path string into the strbuf, and ensure it is followed by a space.
+ * It may not contain spaces when unquoted. Update *endp to point to the first
+ * character after the space.
+ */
+static void parse_path_space(struct strbuf *sb, const char *p, const char **endp, const char *field)
+{
+	parse_path(sb, p, endp, 0, field);
+	if (**endp != ' ')
+		die("Missing space after %s: %s", field, command_buf.buf);
+	(*endp)++;
+}
+
 static void file_change_m(const char *p, struct branch *b)
 {
 	static struct strbuf uq = STRBUF_INIT;
-	const char *endp;
 	struct object_entry *oe;
 	struct object_id oid;
 	uint16_t mode, inline_data = 0;
@@ -2312,12 +2359,8 @@ static void file_change_m(const char *p, struct branch *b)
 			die("Missing space after SHA1: %s", command_buf.buf);
 	}
 
-	strbuf_reset(&uq);
-	if (!unquote_c_style(&uq, p, &endp)) {
-		if (*endp)
-			die("Garbage after path in: %s", command_buf.buf);
-		p = uq.buf;
-	}
+	parse_path_eol(&uq, p, "path");
+	p = uq.buf;
 
 	/* Git does not track empty, non-toplevel directories. */
 	if (S_ISDIR(mode) && is_empty_tree_oid(&oid) && *p) {
@@ -2381,48 +2424,23 @@ static void file_change_m(const char *p, struct branch *b)
 static void file_change_d(const char *p, struct branch *b)
 {
 	static struct strbuf uq = STRBUF_INIT;
-	const char *endp;
 
-	strbuf_reset(&uq);
-	if (!unquote_c_style(&uq, p, &endp)) {
-		if (*endp)
-			die("Garbage after path in: %s", command_buf.buf);
-		p = uq.buf;
-	}
+	parse_path_eol(&uq, p, "path");
+	p = uq.buf;
 	tree_content_remove(&b->branch_tree, p, NULL, 1);
 }
 
-static void file_change_cr(const char *s, struct branch *b, int rename)
+static void file_change_cr(const char *p, struct branch *b, int rename)
 {
-	const char *d;
+	const char *s, *d;
 	static struct strbuf s_uq = STRBUF_INIT;
 	static struct strbuf d_uq = STRBUF_INIT;
-	const char *endp;
 	struct tree_entry leaf;
 
-	strbuf_reset(&s_uq);
-	if (!unquote_c_style(&s_uq, s, &endp)) {
-		if (*endp != ' ')
-			die("Missing space after source: %s", command_buf.buf);
-	} else {
-		endp = strchr(s, ' ');
-		if (!endp)
-			die("Missing space after source: %s", command_buf.buf);
-		strbuf_add(&s_uq, s, endp - s);
-	}
+	parse_path_space(&s_uq, p, &p, "source");
+	parse_path_eol(&d_uq, p, "dest");
 	s = s_uq.buf;
-
-	endp++;
-	if (!*endp)
-		die("Missing dest: %s", command_buf.buf);
-
-	d = endp;
-	strbuf_reset(&d_uq);
-	if (!unquote_c_style(&d_uq, d, &endp)) {
-		if (*endp)
-			die("Garbage after dest in: %s", command_buf.buf);
-		d = d_uq.buf;
-	}
+	d = d_uq.buf;
 
 	memset(&leaf, 0, sizeof(leaf));
 	if (rename)
@@ -3168,6 +3186,7 @@ static void parse_ls(const char *p, struct branch *b)
 {
 	struct tree_entry *root = NULL;
 	struct tree_entry leaf = {NULL};
+	static struct strbuf uq = STRBUF_INIT;
 
 	/* ls SP (<tree-ish> SP)? <path> */
 	if (*p == '"') {
@@ -3182,16 +3201,8 @@ static void parse_ls(const char *p, struct branch *b)
 		root->versions[1].mode = S_IFDIR;
 		load_tree(root);
 	}
-	if (*p == '"') {
-		static struct strbuf uq = STRBUF_INIT;
-		const char *endp;
-		strbuf_reset(&uq);
-		if (unquote_c_style(&uq, p, &endp))
-			die("Invalid path: %s", command_buf.buf);
-		if (*endp)
-			die("Garbage after path in: %s", command_buf.buf);
-		p = uq.buf;
-	}
+	parse_path_eol(&uq, p, "path");
+	p = uq.buf;
 	tree_content_get(root, p, &leaf, 1);
 	/*
 	 * A directory in preparation would have a sha1 of zero
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index dbb5042b0b..ef04b43f46 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2146,6 +2146,7 @@ test_expect_success 'Q: deny note on empty branch' '
 	EOF
 	test_must_fail git fast-import has a
+# special case. Test every occurrence of <path> in the grammar against every
+# error case.
+#
+
+#
+# Valid paths at the end of a line: filemodify, filedelete, filecopy (dest),
+# filerename (dest), and ls.
+#
+# commit :301 from root -- modify hello.c
+# commit :302 from :301 -- modify $path
+# commit :303 from :302 -- delete $path
+# commit :304 from :301 -- copy hello.c $path
+# commit :305 from :301 -- rename hello.c $path
+# ls :305 $path
+#
+test_path_eol_success () {
+	test="$1" path="$2" unquoted_path="$3"
+	test_expect_success "S: paths at EOL with $test must work" '
+		git fast-import --export-marks=marks.out <<-EOF >out 2>err &&
+		blob
+		mark :401
+		data < $GIT_COMMITTER_DATE
+		data < $GIT_COMMITTER_DATE
+		data < $GIT_COMMITTER_DATE
+		data < $GIT_COMMITTER_DATE
+		data < $GIT_COMMITTER_DATE
+		data <tree_m.exp &&
+		git ls-tree $commit_m | sort >tree_m.out &&
+		test_cmp tree_m.exp tree_m.out &&
+
+		printf "100644 blob $blob1\thello.c\n" >tree_d.exp &&
+		git ls-tree $commit_d >tree_d.out &&
+		test_cmp tree_d.exp tree_d.out &&
+
+		( printf "100644 blob $blob1\t'"$unquoted_path"'\n" &&
+		  printf "100644 blob $blob1\thello.c\n" ) | sort >tree_c.exp &&
+		git ls-tree $commit_c | sort >tree_c.out &&
+		test_cmp tree_c.exp tree_c.out &&
+
+		printf "100644 blob $blob1\t'"$unquoted_path"'\n" >tree_r.exp &&
+		git ls-tree $commit_r >tree_r.out &&
+		test_cmp tree_r.exp tree_r.out &&
+
+		test_cmp out tree_r.exp &&
+
+		git branch -D path-eol
+	'
+}
+
+test_path_eol_success 'quoted spaces' '" hello world.c "' ' hello world.c '
+test_path_eol_success 'unquoted spaces' ' hello world.c ' ' hello world.c '
+
+#
+# Valid paths before a space: filecopy (source) and filerename (source).
+#
+# commit :301 from root -- modify $path
+# commit :302 from :301 -- copy $path hello2.c
+# commit :303 from :301 -- rename $path hello2.c
+#
+test_path_space_success () {
+	test="$1" path="$2" unquoted_path="$3"
+	test_expect_success "S: paths before space with $test must work" '
+		git fast-import --export-marks=marks.out <<-EOF 2>err &&
+		blob
+		mark :401
+		data < $GIT_COMMITTER_DATE
+		data < $GIT_COMMITTER_DATE
+		data < $GIT_COMMITTER_DATE
+		data <tree_c.exp &&
+		git ls-tree $commit_c | sort >tree_c.out &&
+		test_cmp tree_c.exp tree_c.out &&
+
+		printf "100644 blob $blob\thello2.c\n" >tree_r.exp &&
+		git ls-tree $commit_r >tree_r.out &&
+		test_cmp tree_r.exp tree_r.out &&
+
+		git branch -D path-space
+	'
+}
+
+test_path_space_success 'quoted spaces' '" hello world.c "' ' hello world.c '
+test_path_space_success 'no unquoted spaces' 'hello_world.c' 'hello_world.c'
+
+#
+# Test a single commit change with an invalid path. Run it with all occurrences
+# of <path> in the grammar against all error kinds.
+#
+test_path_fail () {
+	what="$1" path="$2" err_grep="$3"
+	test_expect_success "S: $change with $what must fail" '
+		test_must_fail git fast-import <<-EOF 2>err &&
+		blob
+		mark :1
+		data < $GIT_COMMITTER_DATE
+		data < $GIT_COMMITTER_DATE
+		data <, the must be quoted.
+change='ls (without tree-ish in commit)' prefix='ls ' field=path suffix='' \
+test_path_eol_quoted_fail &&
+test_path_fail 'empty unquoted path' '' "Invalid dataref"
+
 ###
 ### series T (ls)
 ###

From patchwork Fri Mar 22 00:03:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thalia Archibald
X-Patchwork-Id: 13599480
Date: Fri, 22 Mar 2024 00:03:25 +0000
To: git@vger.kernel.org
From: Thalia Archibald
Cc: Elijah Newren, Thalia Archibald
Subject: [PATCH 2/6] fast-import: directly use strbufs for paths
Message-ID: <20240322000304.76810-3-thalia@archibald.dev>
In-Reply-To: <20240322000304.76810-1-thalia@archibald.dev>
References: <20240322000304.76810-1-thalia@archibald.dev>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

Previously, one case would not write the path to the strbuf: when the
path is unquoted and at the end of the string. It was essentially
copy-on-write. However, with the logic simplification of the previous
commit, this case was eliminated and the strbuf is always populated.

Directly use the strbufs now instead of an alias.

Since this already changes all the lines that use the strbufs, rename
them from `uq` to be more descriptive. That they are unquoted is not
their most important property, so name them after what they carry.

Additionally, `file_change_m` no longer needs to copy the path before
reading inline data.

Signed-off-by: Thalia Archibald
---
 builtin/fast-import.c | 54 ++++++++++++++++++-------------------------
 1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index b2adec8d9a..1b3d6784c1 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2322,7 +2322,7 @@ static void parse_path_space(struct strbuf *sb, const char *p, const char **endp
 
 static void file_change_m(const char *p, struct branch *b)
 {
-	static struct strbuf uq = STRBUF_INIT;
+	static struct strbuf path = STRBUF_INIT;
 	struct object_entry *oe;
 	struct object_id oid;
 	uint16_t mode, inline_data = 0;
@@ -2359,12 +2359,11 @@ static void file_change_m(const char *p, struct branch *b)
 			die("Missing space after SHA1: %s", command_buf.buf);
 	}
 
-	parse_path_eol(&uq, p, "path");
-	p = uq.buf;
+	parse_path_eol(&path, p, "path");
 
 	/* Git does not track empty, non-toplevel directories. */
-	if (S_ISDIR(mode) && is_empty_tree_oid(&oid) && *p) {
-		tree_content_remove(&b->branch_tree, p, NULL, 0);
+	if (S_ISDIR(mode) && is_empty_tree_oid(&oid) && *path.buf) {
+		tree_content_remove(&b->branch_tree, path.buf, NULL, 0);
 		return;
 	}
 
@@ -2385,10 +2384,6 @@ static void file_change_m(const char *p, struct branch *b)
 		if (S_ISDIR(mode))
 			die("Directories cannot be specified 'inline': %s",
 				command_buf.buf);
-		if (p != uq.buf) {
-			strbuf_addstr(&uq, p);
-			p = uq.buf;
-		}
 		while (read_next_command() != EOF) {
 			const char *v;
 			if (skip_prefix(command_buf.buf, "cat-blob ", &v))
@@ -2414,49 +2409,45 @@ static void file_change_m(const char *p, struct branch *b)
 				command_buf.buf);
 	}
 
-	if (!*p) {
+	if (!*path.buf) {
 		tree_content_replace(&b->branch_tree, &oid, mode, NULL);
 		return;
 	}
-	tree_content_set(&b->branch_tree, p, &oid, mode, NULL);
+	tree_content_set(&b->branch_tree, path.buf, &oid, mode, NULL);
 }
 
 static void file_change_d(const char *p, struct branch *b)
 {
-	static struct strbuf uq = STRBUF_INIT;
+	static struct strbuf path = STRBUF_INIT;
 
-	parse_path_eol(&uq, p, "path");
-	p = uq.buf;
-	tree_content_remove(&b->branch_tree, p, NULL, 1);
+	parse_path_eol(&path, p, "path");
+	tree_content_remove(&b->branch_tree, path.buf, NULL, 1);
 }
 
 static void file_change_cr(const char *p, struct branch *b, int rename)
 {
-	const char *s, *d;
-	static struct strbuf s_uq = STRBUF_INIT;
-	static struct strbuf d_uq = STRBUF_INIT;
+	static struct strbuf source = STRBUF_INIT;
+	static struct strbuf dest = STRBUF_INIT;
 	struct tree_entry leaf;
 
-	parse_path_space(&s_uq, p, &p, "source");
-	parse_path_eol(&d_uq, p, "dest");
-	s = s_uq.buf;
-	d = d_uq.buf;
+	parse_path_space(&source, p, &p, "source");
+	parse_path_eol(&dest, p, "dest");
 
 	memset(&leaf, 0, sizeof(leaf));
 	if (rename)
-		tree_content_remove(&b->branch_tree, s, &leaf, 1);
+		tree_content_remove(&b->branch_tree, source.buf, &leaf, 1);
 	else
-		tree_content_get(&b->branch_tree, s, &leaf, 1);
+		tree_content_get(&b->branch_tree, source.buf, &leaf, 1);
 	if (!leaf.versions[1].mode)
-		die("Path %s not in branch", s);
-	if (!*d) {	/* C "path/to/subdir" "" */
+		die("Path %s not in branch", source.buf);
+	if (!*dest.buf) {	/* C "path/to/subdir" "" */
 		tree_content_replace(&b->branch_tree,
 			&leaf.versions[1].oid,
 			leaf.versions[1].mode,
 			leaf.tree);
 		return;
 	}
-	tree_content_set(&b->branch_tree, d,
+	tree_content_set(&b->branch_tree, dest.buf,
 		&leaf.versions[1].oid,
 		leaf.versions[1].mode,
 		leaf.tree);
@@ -3186,7 +3177,7 @@ static void parse_ls(const char *p, struct branch *b)
 {
 	struct tree_entry *root = NULL;
 	struct tree_entry leaf = {NULL};
-	static struct strbuf uq = STRBUF_INIT;
+	static struct strbuf path = STRBUF_INIT;
 
 	/* ls SP (<tree-ish> SP)? <path> */
 	if (*p == '"') {
@@ -3201,9 +3192,8 @@ static void parse_ls(const char *p, struct branch *b)
 		root->versions[1].mode = S_IFDIR;
 		load_tree(root);
 	}
-	parse_path_eol(&uq, p, "path");
-	p = uq.buf;
-	tree_content_get(root, p, &leaf, 1);
+	parse_path_eol(&path, p, "path");
+	tree_content_get(root, path.buf, &leaf, 1);
 	/*
 	 * A directory in preparation would have a sha1 of zero
 	 * until it is saved. Save, for simplicity.
@@ -3211,7 +3201,7 @@ static void parse_ls(const char *p, struct branch *b)
 	if (S_ISDIR(leaf.versions[1].mode))
 		store_tree(&leaf);
 
-	print_ls(leaf.versions[1].mode, leaf.versions[1].oid.hash, p);
+	print_ls(leaf.versions[1].mode, leaf.versions[1].oid.hash, path.buf);
 	if (leaf.tree)
 		release_tree_content_recursive(leaf.tree);
 	if (!b || root != &b->branch_tree)

From patchwork Fri Mar 22 00:03:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thalia Archibald
X-Patchwork-Id: 13599478
Date: Fri, 22 Mar 2024 00:03:33 +0000
To: git@vger.kernel.org
From: Thalia Archibald
Cc: Elijah Newren, Thalia Archibald
Subject: [PATCH 3/6] fast-import: release unfreed strbufs
Message-ID: <20240322000304.76810-4-thalia@archibald.dev>
In-Reply-To: <20240322000304.76810-1-thalia@archibald.dev>
References: <20240322000304.76810-1-thalia@archibald.dev>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

These strbufs are owned.
Release them at the end of their scopes.

Signed-off-by: Thalia Archibald
---
 builtin/fast-import.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 1b3d6784c1..d6f998f363 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2364,6 +2364,7 @@ static void file_change_m(const char *p, struct branch *b)
 	/* Git does not track empty, non-toplevel directories. */
 	if (S_ISDIR(mode) && is_empty_tree_oid(&oid) && *path.buf) {
 		tree_content_remove(&b->branch_tree, path.buf, NULL, 0);
+		strbuf_release(&path);
 		return;
 	}
 
@@ -2409,11 +2410,11 @@ static void file_change_m(const char *p, struct branch *b)
 				command_buf.buf);
 	}
 
-	if (!*path.buf) {
+	if (*path.buf)
+		tree_content_set(&b->branch_tree, path.buf, &oid, mode, NULL);
+	else
 		tree_content_replace(&b->branch_tree, &oid, mode, NULL);
-		return;
-	}
-	tree_content_set(&b->branch_tree, path.buf, &oid, mode, NULL);
+	strbuf_release(&path);
 }
 
 static void file_change_d(const char *p, struct branch *b)
@@ -2422,6 +2423,7 @@ static void file_change_d(const char *p, struct branch *b)
 
 	parse_path_eol(&path, p, "path");
 	tree_content_remove(&b->branch_tree, path.buf, NULL, 1);
+	strbuf_release(&path);
 }
 
 static void file_change_cr(const char *p, struct branch *b, int rename)
@@ -2440,17 +2442,18 @@ static void file_change_cr(const char *p, struct branch *b, int rename)
 		tree_content_get(&b->branch_tree, source.buf, &leaf, 1);
 	if (!leaf.versions[1].mode)
 		die("Path %s not in branch", source.buf);
-	if (!*dest.buf) {	/* C "path/to/subdir" "" */
+	if (*dest.buf)
+		tree_content_set(&b->branch_tree, dest.buf,
+			&leaf.versions[1].oid,
+			leaf.versions[1].mode,
+			leaf.tree);
+	else /* C "path/to/subdir" "" */
 		tree_content_replace(&b->branch_tree,
 			&leaf.versions[1].oid,
 			leaf.versions[1].mode,
 			leaf.tree);
-		return;
-	}
-	tree_content_set(&b->branch_tree, dest.buf,
-		&leaf.versions[1].oid,
-		leaf.versions[1].mode,
-		leaf.tree);
+	strbuf_release(&source);
+	strbuf_release(&dest);
 }
 
 static void note_change_n(const char *p, struct branch *b, unsigned char *old_fanout)
@@ -2804,6 +2807,7 @@ static void parse_new_commit(const char *arg)
 	free(author);
 	free(committer);
 	free(encoding);
+	strbuf_release(&msg);
 
 	if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark))
 		b->pack_id = pack_id;
@@ -2886,6 +2890,7 @@ static void parse_new_tag(const char *arg)
 	strbuf_addch(&new_data, '\n');
 	strbuf_addbuf(&new_data, &msg);
 	free(tagger);
+	strbuf_release(&msg);
 
 	if (store_object(OBJ_TAG, &new_data, NULL, &t->oid, next_mark))
 		t->pack_id = MAX_PACK_ID;
@@ -3171,6 +3176,7 @@ static void print_ls(int mode, const unsigned char *hash, const char *path)
 		strbuf_addch(&line, '\n');
 	}
 	cat_blob_write(line.buf, line.len);
+	strbuf_release(&line);
 }
 
 static void parse_ls(const char *p, struct branch *b)
@@ -3206,6 +3212,7 @@ static void parse_ls(const char *p, struct branch *b)
 		release_tree_content_recursive(leaf.tree);
 	if (!b || root != &b->branch_tree)
 		release_tree_entry(root);
+	strbuf_release(&path);
 }
 
 static void checkpoint(void)

From patchwork Fri Mar 22 00:03:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thalia Archibald
X-Patchwork-Id: 13599481
Date: Fri, 22 Mar 2024 00:03:40 +0000
To: git@vger.kernel.org
From: Thalia Archibald
Cc: Elijah Newren, Thalia Archibald
Subject: [PATCH 4/6] fast-import: remove dead strbuf
Message-ID: <20240322000304.76810-5-thalia@archibald.dev>
In-Reply-To: <20240322000304.76810-1-thalia@archibald.dev>
References: <20240322000304.76810-1-thalia@archibald.dev>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

The strbuf in `note_change_n` has been unused since the function was
created in a8dd2e7d2b (fast-import: Add support for importing commit
notes, 2009-10-09) and looks to be a fossil from adapting
`file_change_m`. Remove it.

Signed-off-by: Thalia Archibald
---
 builtin/fast-import.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index d6f998f363..ae8494d0ac 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2458,7 +2458,6 @@ static void file_change_cr(const char *p, struct branch *b, int rename)
 
 static void note_change_n(const char *p, struct branch *b, unsigned char *old_fanout)
 {
-	static struct strbuf uq = STRBUF_INIT;
 	struct object_entry *oe;
 	struct branch *s;
 	struct object_id oid, commit_oid;
@@ -2523,10 +2522,6 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 		die("Invalid ref name or SHA1 expression: %s", p);
 
 	if (inline_data) {
-		if (p != uq.buf) {
-			strbuf_addstr(&uq, p);
-			p = uq.buf;
-		}
 		read_next_command();
 		parse_and_store_blob(&last_blob, &oid, 0);
 	} else if (oe) {

From patchwork Fri Mar 22 00:03:47 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thalia Archibald
X-Patchwork-Id: 13599483
Date: Fri, 22 Mar 2024 00:03:47 +0000
To: git@vger.kernel.org
From: Thalia Archibald
Cc: Elijah Newren , Thalia
Archibald Subject: [PATCH 5/6] fast-import: document C-style escapes for paths Message-ID: <20240322000304.76810-6-thalia@archibald.dev> In-Reply-To: <20240322000304.76810-1-thalia@archibald.dev> References: <20240322000304.76810-1-thalia@archibald.dev> Feedback-ID: 63908566:user:proton Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Simply saying “C-style” string quoting is imprecise, as only a subset of C escapes are supported. Document the exact escapes. Signed-off-by: Thalia Archibald --- Documentation/git-fast-import.txt | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index 271bd63a10..4aa8ccbefd 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -630,18 +630,23 @@ in octal. Git only supports the following modes: In both formats `` is the complete path of the file to be added (if not already existing) or modified (if already existing). -A `` string must use UNIX-style directory separators (forward -slash `/`), may contain any byte other than `LF`, and must not -start with double quote (`"`). +A `` string may contain any byte other than `LF`, and must not +start with double quote (`"`). It is interpreted as literal bytes +without escaping. A path can use C-style string quoting; this is accepted in all cases and mandatory if the filename starts with double quote or contains -`LF`. In C-style quoting, the complete name should be surrounded with -double quotes, and any `LF`, backslash, or double quote characters -must be escaped by preceding them with a backslash (e.g., -`"path/with\n, \\ and \" in it"`). +`LF`. In C-style quoting, the complete name is surrounded with +double quotes (`"`) and certain characters must be escaped by preceding +them with a backslash: `LF` is written as `\n`, backslash as `\\`, and +double quote as `\"`. 
Additionally, some characters may optionally +be written with escape sequences: `\a` for bell, `\b` for backspace, +`\f` for form feed, `\n` for line feed, `\r` for carriage return, `\t` +for horizontal tab, and `\v` for vertical tab. Any byte can be written +with 3-digit octal codes (e.g., `\033`). -The value of `` must be in canonical form. That is it must not: +A `` must use UNIX-style directory separators (forward slash `/`) +and must be in canonical form. That is it must not: * contain an empty directory component (e.g. `foo//bar` is invalid), * end with a directory separator (e.g. `foo/` is invalid), From patchwork Fri Mar 22 00:03:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thalia Archibald X-Patchwork-Id: 13599482 Received: from mail-4317.proton.ch (mail-4317.proton.ch [185.70.43.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D6526FA7 for ; Fri, 22 Mar 2024 00:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.70.43.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711065854; cv=none; b=BRnR49vF1QApRI844nkWnp4AH/ES96Hfm34GN8J0NuNuVIwrD5EFG221601KjxUC4wuqDh99UCQZ+Jfh5nj4jnAgjWZvydB/ZPB2ZQH26PFUpnN1Aqc+5gxaJn81v8zu+1aA8sm9g1Tv6L94bVtXCRk70VWAVG1b+aiW+S6I4k4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711065854; c=relaxed/simple; bh=Cdxjvkcvts9GRDS6Mn5tch/ecFZG2SziGfTPEUMhQYE=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=bVqpmlNdXZmSfCK7oAuJwQib7zI8MXkAaiG4jhRBNE9x190+CHDMs7LLHzxoprYzjnOJTprxNtYiQSwgMRpKzU3lDi+e0S5E1rQpMT/vKi37E1qznCGbc66As0deLBO0AhqxZpxJ/e2u3BNFOd8NEVFV5jwtF0pnwzB54VYeHaY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=archibald.dev;
spf=pass smtp.mailfrom=archibald.dev; dkim=pass (2048-bit key) header.d=archibald.dev header.i=@archibald.dev header.b=qXMFPeHd; arc=none smtp.client-ip=185.70.43.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=archibald.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=archibald.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=archibald.dev header.i=@archibald.dev header.b="qXMFPeHd" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=archibald.dev; s=protonmail3; t=1711065847; x=1711325047; bh=lnr5Yo35SRcJxBK8Gpcc0T08RtF1yx9dss8eTsKgAdo=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector; b=qXMFPeHdTXWfFoWgu56WPFHiCTRY36Znl8LkllwS+Bdp200T2NQC3QfFGl3KCeAuN 4JhADnWAg3225cmx3B1H0l9+o3LwoJetv129Z8SKQgl3g4UElE9wq9MTj94yUJVyG/ 9UaeEieFho4/v1XUG+j/xyiY5IQdz87piXmm9+Iw4VkzGbwfreBWGHDsw0gCAvr89S DqFfosSDUhFUNr3cW0+KVhQzJrDC3AyUGTe0ylvjiGhQSOr3KPRtcHrCq1WyCc0AH+ /dSyXxUGirDDWGj1I9KxTRvorkwMMPFDPm0OGYuzRR7uiEe4PUUk+Va/Wx7u9OlkC3 lD/1Duf6HVpAw== Date: Fri, 22 Mar 2024 00:03:55 +0000 To: git@vger.kernel.org From: Thalia Archibald Cc: Elijah Newren , Thalia Archibald Subject: [PATCH 6/6] fast-import: forbid escaped NUL in paths Message-ID: <20240322000304.76810-7-thalia@archibald.dev> In-Reply-To: <20240322000304.76810-1-thalia@archibald.dev> References: <20240322000304.76810-1-thalia@archibald.dev> Feedback-ID: 63908566:user:proton Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 NUL cannot appear in paths. Even disregarding filesystem path limitations, the tree object format delimits with NUL, so such a path cannot be encoded by Git. When a quoted path is unquoted, it could possibly contain NUL from "\000". Forbid it so it isn't truncated. 
fast-import still has other issues with NUL, but those will be addressed later. Signed-off-by: Thalia Archibald --- Documentation/git-fast-import.txt | 1 + builtin/fast-import.c | 2 ++ t/t9300-fast-import.sh | 1 + 3 files changed, 4 insertions(+) diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index 4aa8ccbefd..411413e8c3 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -657,6 +657,7 @@ and must be in canonical form. That is it must not: The root of the tree can be represented by a quoted empty string (`""`) as ``. +`` cannot contain NUL, either literally or escaped as `\000`. It is recommended that `` always be encoded using UTF-8. `filedelete` diff --git a/builtin/fast-import.c b/builtin/fast-import.c index ae8494d0ac..e36f59084e 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -2283,6 +2283,8 @@ static void parse_path(struct strbuf *sb, const char *p, const char **endp, int if (*p == '"') { if (unquote_c_style(sb, p, endp)) die("Invalid %s: %s", field, command_buf.buf); + if (strlen(sb->buf) != sb->len) + die("NUL in %s: %s", field, command_buf.buf); } else { if (allow_spaces) *endp = p + strlen(p); diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index ef04b43f46..994a80e442 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -3285,6 +3285,7 @@ test_path_fail () { test_path_base_fail () { test_path_fail 'unclosed " in '"$field" '"hello.c' "Invalid $field" test_path_fail "invalid escape in quoted $field" '"hello\xff"' "Invalid $field" + test_path_fail "escaped NUL in quoted $field" '"hello\000"' "NUL in $field" } test_path_eol_quoted_fail () { test_path_base_fail