diff mbox series

[v6+,2/7] archive --add-virtual-file: allow paths containing colons

Message ID 20220528231118.3504387-3-gitster@pobox.com (mailing list archive)
State Accepted
Commit de1f68a968e64b3e1e2979222238fec1f045bbf3
Headers show
Series js/scalar-diagnose rebased | expand

Commit Message

Junio C Hamano May 28, 2022, 11:11 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 * Tightened shell variable quoting

 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

Comments

Adam Dinwoodie June 15, 2022, 6:16 p.m. UTC | #1
On Sat, May 28, 2022 at 04:11:13PM -0700, Junio C Hamano wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> By allowing the path to be enclosed in double-quotes, we can avoid
> the limitation that paths cannot contain colons.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  * Tightened shell variable quoting
> 
> <snip>
>
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index d6027189e2..3992d08158 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -207,13 +207,21 @@ check_zip with_untracked
>  check_added with_untracked untracked untracked
>  
>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> +	if test_have_prereq FUNNYNAMES
> +	then
> +		PATHNAME="pathname with : colon"
> +	else
> +		PATHNAME="pathname without colon"
> +	fi &&
>  	git archive --format=zip >with_file_with_content.zip \
> +		--add-virtual-file=\""$PATHNAME"\": \
>  		--add-virtual-file=hello:world $EMPTY_TREE &&
>  	test_when_finished "rm -rf tmp-unpack" &&
>  	mkdir tmp-unpack && (
>  		cd tmp-unpack &&
>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>  		test_path_is_file hello &&
> +		test_path_is_file "$PATHNAME" &&
>  		test world = $(cat hello)
>  	)
>  '

This test is currently failing on Cygwin: it looks like it's exposing a
bug in Cygwin that means files with colons in their name aren't
correctly extracted from zip archives.  I'm going to report that to the
Cygwin mailing list, but I wanted to note it for the record here, too.

Adam
Junio C Hamano June 15, 2022, 8 p.m. UTC | #2
Adam Dinwoodie <adam@dinwoodie.org> writes:

>> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
>> index d6027189e2..3992d08158 100755
>> --- a/t/t5003-archive-zip.sh
>> +++ b/t/t5003-archive-zip.sh
>> @@ -207,13 +207,21 @@ check_zip with_untracked
>>  check_added with_untracked untracked untracked
>>  
>>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
>> +	if test_have_prereq FUNNYNAMES
>> +	then
>> +		PATHNAME="pathname with : colon"
>> +	else
>> +		PATHNAME="pathname without colon"
>> +	fi &&
>>  	git archive --format=zip >with_file_with_content.zip \
>> +		--add-virtual-file=\""$PATHNAME"\": \
>>  		--add-virtual-file=hello:world $EMPTY_TREE &&
>>  	test_when_finished "rm -rf tmp-unpack" &&
>>  	mkdir tmp-unpack && (
>>  		cd tmp-unpack &&
>>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>>  		test_path_is_file hello &&
>> +		test_path_is_file "$PATHNAME" &&
>>  		test world = $(cat hello)
>>  	)
>>  '
>
> This test is currently failing on Cygwin: it looks like it's exposing a
> bug in Cygwin that means files with colons in their name aren't
> correctly extracted from zip archives.  I'm going to report that to the
> Cygwin mailing list, but I wanted to note it for the record here, too.

Does this mean that our code to set FUNNYNAMES prerequiste is
slightly broken?  IOW, should we check with a path with a colon in
it, as well as whatever we use currently for FUNNYNAMES?

Something like the attached patch?  

Or does Cygwin otherwise work perfectly well with a path with a
colon in it, but only $GIT_UNZIP command has problem with it?  If
that is the case, then please disregard the attached.

Thanks.

 t/test-lib.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git i/t/test-lib.sh w/t/test-lib.sh
index 55857af601..5dce7d95c9 100644
--- i/t/test-lib.sh
+++ w/t/test-lib.sh
@@ -1620,6 +1620,7 @@ test_lazy_prereq FUNNYNAMES '
 	touch -- \
 		"FUNNYNAMES tab	embedded" \
 		"FUNNYNAMES \"quote embedded\"" \
+		"FUNNYNAMES colon : embedded" \
 		"FUNNYNAMES newline
 embedded" 2>/dev/null &&
 	rm -- \
Adam Dinwoodie June 15, 2022, 9:36 p.m. UTC | #3
On Wed, Jun 15, 2022 at 01:00:07PM -0700, Junio C Hamano wrote:
> Adam Dinwoodie <adam@dinwoodie.org> writes:
> 
> >> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> >> index d6027189e2..3992d08158 100755
> >> --- a/t/t5003-archive-zip.sh
> >> +++ b/t/t5003-archive-zip.sh
> >> @@ -207,13 +207,21 @@ check_zip with_untracked
> >>  check_added with_untracked untracked untracked
> >>  
> >>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> >> +	if test_have_prereq FUNNYNAMES
> >> +	then
> >> +		PATHNAME="pathname with : colon"
> >> +	else
> >> +		PATHNAME="pathname without colon"
> >> +	fi &&
> >>  	git archive --format=zip >with_file_with_content.zip \
> >> +		--add-virtual-file=\""$PATHNAME"\": \
> >>  		--add-virtual-file=hello:world $EMPTY_TREE &&
> >>  	test_when_finished "rm -rf tmp-unpack" &&
> >>  	mkdir tmp-unpack && (
> >>  		cd tmp-unpack &&
> >>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> >>  		test_path_is_file hello &&
> >> +		test_path_is_file "$PATHNAME" &&
> >>  		test world = $(cat hello)
> >>  	)
> >>  '
> >
> > This test is currently failing on Cygwin: it looks like it's exposing a
> > bug in Cygwin that means files with colons in their name aren't
> > correctly extracted from zip archives.  I'm going to report that to the
> > Cygwin mailing list, but I wanted to note it for the record here, too.
> 
> Does this mean that our code to set FUNNYNAMES prerequiste is
> slightly broken?  IOW, should we check with a path with a colon in
> it, as well as whatever we use currently for FUNNYNAMES?
> 
> Something like the attached patch?  
> 
> Or does Cygwin otherwise work perfectly well with a path with a
> colon in it, but only $GIT_UNZIP command has problem with it?  If
> that is the case, then please disregard the attached.

The latter: Cygwin works perfectly with paths containing colons, except
that Cygwin's `unzip` is seemingly buggy and doesn't work.  The file
systems Cygwin runs on don't support colons in paths, but Cygwin hides
that problem by rewriting ASCII colons to some high Unicode code point
on the filesystem, meaning Cygwin-native applications see a regular
colon, while Windows-native applications see an unusual but perfectly
valid Unicode character.

I tested the same patch to FUNNYNAMES myself before reporting, and the
test fails exactly the same way.  If we wanted to catch this, I think
we'd need a test that explicitly attempted to unzip an archive
containing a path with a colon.

(The code to set FUNNYNAMES *is* slightly broken, per the discussions
around 6d340dfaef ("t9902: split test to run on appropriate systems",
2022-04-08), and my to-do list still features tidying up and
resubmitting the patch Ævar wrote in that discussion thread.  But it
wouldn't help here because this issue is specific to Cygwin's `unzip`,
rather than a general limitation of running on Cygwin.)
Johannes Schindelin June 18, 2022, 8:19 p.m. UTC | #4
Hi Adam,

On Wed, 15 Jun 2022, Adam Dinwoodie wrote:

> On Wed, Jun 15, 2022 at 01:00:07PM -0700, Junio C Hamano wrote:
> > Adam Dinwoodie <adam@dinwoodie.org> writes:
> >
> > >> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> > >> index d6027189e2..3992d08158 100755
> > >> --- a/t/t5003-archive-zip.sh
> > >> +++ b/t/t5003-archive-zip.sh
> > >> @@ -207,13 +207,21 @@ check_zip with_untracked
> > >>  check_added with_untracked untracked untracked
> > >>
> > >>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> > >> +	if test_have_prereq FUNNYNAMES
> > >> +	then
> > >> +		PATHNAME="pathname with : colon"
> > >> +	else
> > >> +		PATHNAME="pathname without colon"
> > >> +	fi &&
> > >>  	git archive --format=zip >with_file_with_content.zip \
> > >> +		--add-virtual-file=\""$PATHNAME"\": \
> > >>  		--add-virtual-file=hello:world $EMPTY_TREE &&
> > >>  	test_when_finished "rm -rf tmp-unpack" &&
> > >>  	mkdir tmp-unpack && (
> > >>  		cd tmp-unpack &&
> > >>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> > >>  		test_path_is_file hello &&
> > >> +		test_path_is_file "$PATHNAME" &&
> > >>  		test world = $(cat hello)
> > >>  	)
> > >>  '
> > >
> > > This test is currently failing on Cygwin: it looks like it's exposing a
> > > bug in Cygwin that means files with colons in their name aren't
> > > correctly extracted from zip archives.  I'm going to report that to the
> > > Cygwin mailing list, but I wanted to note it for the record here, too.
> >
> > Does this mean that our code to set FUNNYNAMES prerequiste is
> > slightly broken?  IOW, should we check with a path with a colon in
> > it, as well as whatever we use currently for FUNNYNAMES?
> >
> > Something like the attached patch?
> >
> > Or does Cygwin otherwise work perfectly well with a path with a
> > colon in it, but only $GIT_UNZIP command has problem with it?  If
> > that is the case, then please disregard the attached.
>
> The latter: Cygwin works perfectly with paths containing colons, except
> that Cygwin's `unzip` is seemingly buggy and doesn't work.  The file
> systems Cygwin runs on don't support colons in paths, but Cygwin hides
> that problem by rewriting ASCII colons to some high Unicode code point
> on the filesystem,

Let me throw in a bit more detail: The forbidden characters are mapped
into the Unicode page U+f0XX, which is supposed to be used "for private
purposes". Even more detail can be found here:
https://github.com/cygwin/cygwin/blob/cygwin-3_3_5-release/winsup/cygwin/strfuncs.cc#L19-L23

> meaning Cygwin-native applications see a regular colon, while
> Windows-native applications see an unusual but perfectly valid Unicode
> character.

Now, I have two questions:

- Why does `unzip` not use Cygwin's regular functions (which should all be
  aware of that U+f0XX <-> U+00XX mapping)?

- Even more importantly: would the test case pass if we simply used
  another forbidden character, such as `?` or `*`?

> I tested the same patch to FUNNYNAMES myself before reporting, and the
> test fails exactly the same way.  If we wanted to catch this, I think
> we'd need a test that explicitly attempted to unzip an archive
> containing a path with a colon.
>
> (The code to set FUNNYNAMES *is* slightly broken, per the discussions
> around 6d340dfaef ("t9902: split test to run on appropriate systems",
> 2022-04-08), and my to-do list still features tidying up and
> resubmitting the patch Ævar wrote in that discussion thread.  But it
> wouldn't help here because this issue is specific to Cygwin's `unzip`,
> rather than a general limitation of running on Cygwin.)

I'd rather avoid changing FUNNYNAMES at this stage, if we can help it.

Thanks,
Dscho
Junio C Hamano June 18, 2022, 10:05 p.m. UTC | #5
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I'd rather avoid changing FUNNYNAMES at this stage, if we can help it.

I wonder if it is sufficient to ask "unzip -l" the names of the
files in the archive, without having to materialize these files on
the filesystem.  Would that bypass the whole FUNNYNAMES business, or
is "unzip" paranoid enough to reject an archive, even when it is not
extracting into the local filesystem, with a path that it would not
be able to extract if it were asked to?

I do not know how standardized different implementations of "unzip"
is, and how similar output "unzip -l" implementations produce are,
but the following seems to pass for me locally.

 t/t5003-archive-zip.sh | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git c/t/t5003-archive-zip.sh w/t/t5003-archive-zip.sh
index 3992d08158..f2fdf2c235 100755
--- c/t/t5003-archive-zip.sh
+++ w/t/t5003-archive-zip.sh
@@ -207,23 +207,13 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
-	if test_have_prereq FUNNYNAMES
-	then
-		PATHNAME="pathname with : colon"
-	else
-		PATHNAME="pathname without colon"
-	fi &&
+	PATHNAME="pathname with : colon" &&
 	git archive --format=zip >with_file_with_content.zip \
 		--add-virtual-file=\""$PATHNAME"\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
-	test_when_finished "rm -rf tmp-unpack" &&
-	mkdir tmp-unpack && (
-		cd tmp-unpack &&
-		"$GIT_UNZIP" ../with_file_with_content.zip &&
-		test_path_is_file hello &&
-		test_path_is_file "$PATHNAME" &&
-		test world = $(cat hello)
-	)
+	"$GIT_UNZIP" -l with_file_with_content.zip >toc &&
+	grep -e " $PATHNAME\$" toc &&
+	grep -e " hello\$" toc
 '
 
 test_expect_success 'git archive --format=zip --add-file twice' '
Adam Dinwoodie June 20, 2022, 9:41 a.m. UTC | #6
On Sat, Jun 18, 2022 at 10:19:28PM +0200, Johannes Schindelin wrote:
> Hi Adam,
> 
> On Wed, 15 Jun 2022, Adam Dinwoodie wrote:
> 
> > On Wed, Jun 15, 2022 at 01:00:07PM -0700, Junio C Hamano wrote:
> > > Adam Dinwoodie <adam@dinwoodie.org> writes:
> > >
> > > >> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> > > >> index d6027189e2..3992d08158 100755
> > > >> --- a/t/t5003-archive-zip.sh
> > > >> +++ b/t/t5003-archive-zip.sh
> > > >> @@ -207,13 +207,21 @@ check_zip with_untracked
> > > >>  check_added with_untracked untracked untracked
> > > >>
> > > >>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> > > >> +	if test_have_prereq FUNNYNAMES
> > > >> +	then
> > > >> +		PATHNAME="pathname with : colon"
> > > >> +	else
> > > >> +		PATHNAME="pathname without colon"
> > > >> +	fi &&
> > > >>  	git archive --format=zip >with_file_with_content.zip \
> > > >> +		--add-virtual-file=\""$PATHNAME"\": \
> > > >>  		--add-virtual-file=hello:world $EMPTY_TREE &&
> > > >>  	test_when_finished "rm -rf tmp-unpack" &&
> > > >>  	mkdir tmp-unpack && (
> > > >>  		cd tmp-unpack &&
> > > >>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> > > >>  		test_path_is_file hello &&
> > > >> +		test_path_is_file "$PATHNAME" &&
> > > >>  		test world = $(cat hello)
> > > >>  	)
> > > >>  '
> > > >
> > > > This test is currently failing on Cygwin: it looks like it's exposing a
> > > > bug in Cygwin that means files with colons in their name aren't
> > > > correctly extracted from zip archives.  I'm going to report that to the
> > > > Cygwin mailing list, but I wanted to note it for the record here, too.
> > >
> > > Does this mean that our code to set FUNNYNAMES prerequiste is
> > > slightly broken?  IOW, should we check with a path with a colon in
> > > it, as well as whatever we use currently for FUNNYNAMES?
> > >
> > > Something like the attached patch?
> > >
> > > Or does Cygwin otherwise work perfectly well with a path with a
> > > colon in it, but only $GIT_UNZIP command has problem with it?  If
> > > that is the case, then please disregard the attached.
> >
> > The latter: Cygwin works perfectly with paths containing colons, except
> > that Cygwin's `unzip` is seemingly buggy and doesn't work.  The file
> > systems Cygwin runs on don't support colons in paths, but Cygwin hides
> > that problem by rewriting ASCII colons to some high Unicode code point
> > on the filesystem,
> 
> Let me throw in a bit more detail: The forbidden characters are mapped
> into the Unicode page U+f0XX, which is supposed to be used "for private
> purposes". Even more detail can be found here:
> https://github.com/cygwin/cygwin/blob/cygwin-3_3_5-release/winsup/cygwin/strfuncs.cc#L19-L23
> 
> > meaning Cygwin-native applications see a regular colon, while
> > Windows-native applications see an unusual but perfectly valid Unicode
> > character.
> 
> Now, I have two questions:
> 
> - Why does `unzip` not use Cygwin's regular functions (which should all be
>   aware of that U+f0XX <-> U+00XX mapping)?

That is an excellent question!  This behaviour came from an `#ifdef
__CYGWIN__` in the upstream unzip package; with that #ifdef removed,
everything works as expected.  The folk on the Cygwin mailing list had
no idea *why* that #ifdef was there, given it's evidently unnecessary;
my best guess is that it was added a long time ago before Cygwin could
handle those characters in the general case.

Since my report, the Cygwin package has picked up a new maintainer who
has released a version of the unzip package with that #ifdef removed, so
this test is now passing.

> - Even more importantly: would the test case pass if we simply used
>   another forbidden character, such as `?` or `*`?

The set of characters that had special handling in unzip was "*:?|<> all
of which are handled appropriately by Cygwin applications in general,
and all of which had this unnecessary handling in `unzip`

> > I tested the same patch to FUNNYNAMES myself before reporting, and the
> > test fails exactly the same way.  If we wanted to catch this, I think
> > we'd need a test that explicitly attempted to unzip an archive
> > containing a path with a colon.
> >
> > (The code to set FUNNYNAMES *is* slightly broken, per the discussions
> > around 6d340dfaef ("t9902: split test to run on appropriate systems",
> > 2022-04-08), and my to-do list still features tidying up and
> > resubmitting the patch Ævar wrote in that discussion thread.  But it
> > wouldn't help here because this issue is specific to Cygwin's `unzip`,
> > rather than a general limitation of running on Cygwin.)
> 
> I'd rather avoid changing FUNNYNAMES at this stage, if we can help it.

Oh yes, I definitely wasn't proposing changing things for 2.37.0!  I
just wanted to acknowledge that there is a known issue here that has
been discussed on this list previously, that we (I) would hopefully get
around to fixing at some point.

Adam
diff mbox series

Patch

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index b41cc5bc2e..56989a2f34 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -69,10 +69,16 @@  OPTIONS
 	by concatenating the value of the last `--prefix` option (if any)
 	before this `--add-virtual-file` and `<path>`.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; the contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d26f4ef945..48aba4ac46 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@ 
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -535,22 +536,31 @@  static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	} else {
 		BUG("add_file_cb() called for %s", opt->long_name);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index d6027189e2..3992d08158 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@  check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	if test_have_prereq FUNNYNAMES
+	then
+		PATHNAME="pathname with : colon"
+	else
+		PATHNAME="pathname without colon"
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=\""$PATHNAME"\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file "$PATHNAME" &&
 		test world = $(cat hello)
 	)
 '