Message ID | acc5591517fa519fc45c07c27defb309c891dea0.1635515959.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Allow clean/smudge filters to handle huge files in the LLP64 data model | expand |
"Matt Cooper via GitGitGadget" <gitgitgadget@gmail.com> writes: > +# This clean filter writes down the size of input it receives. By checking against > +# the actual size, we ensure that cleaning doesn't mangle large files on 64-bit Windows. > +test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \ > + 'files over 4GB convert on input' ' > + test-tool genzeros $((5*1024*1024*1024)) >big && > + test_config filter.checklarge.clean "wc -c >big.size" && > + echo "big filter=checklarge" >.gitattributes && > + git add big && > + test $(test_file_size big) -eq $(cat big.size) > +' I would have expected that the clean filter to be sending the count to its standard output (to be hashed and made into a blob object), and the test wuld be doing "git cat-file blob :big" to read the contents of the raw blob, bypassing the filter system. But we are testing with only a single path anyway, use of this single extra file is OK. Looking good.
Hi Junio, On Fri, 29 Oct 2021, Junio C Hamano wrote: > "Matt Cooper via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > +# This clean filter writes down the size of input it receives. By checking against > > +# the actual size, we ensure that cleaning doesn't mangle large files on 64-bit Windows. > > +test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \ > > + 'files over 4GB convert on input' ' > > + test-tool genzeros $((5*1024*1024*1024)) >big && > > + test_config filter.checklarge.clean "wc -c >big.size" && > > + echo "big filter=checklarge" >.gitattributes && > > + git add big && > > + test $(test_file_size big) -eq $(cat big.size) > > +' > > I would have expected that the clean filter to be sending the count > to its standard output (to be hashed and made into a blob object), > and the test wuld be doing "git cat-file blob :big" to read the > contents of the raw blob, bypassing the filter system. That was exactly what Matt had in his first iteration. But I dislike unnecessarily spawned processes, they are not "free" on Windows, so I shortened the design to take this shortcut. > But we are testing with only a single path anyway, use of this single > extra file is OK. Precisely. Ciao, Dscho
diff --git a/convert.c b/convert.c index fd9c84b0257..5ad6dfc08a0 100644 --- a/convert.c +++ b/convert.c @@ -613,7 +613,7 @@ static int crlf_to_worktree(const char *src, size_t len, struct strbuf *buf, struct filter_params { const char *src; - unsigned long size; + size_t size; int fd; const char *cmd; const char *path; diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh index 8b23d862600..d4cfe8bf5de 100755 --- a/t/t1051-large-conversion.sh +++ b/t/t1051-large-conversion.sh @@ -97,4 +97,15 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \ test "$size" -ge $((5 * 1024 * 1024 * 1024)) ' +# This clean filter writes down the size of input it receives. By checking against +# the actual size, we ensure that cleaning doesn't mangle large files on 64-bit Windows. +test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \ + 'files over 4GB convert on input' ' + test-tool genzeros $((5*1024*1024*1024)) >big && + test_config filter.checklarge.clean "wc -c >big.size" && + echo "big filter=checklarge" >.gitattributes && + git add big && + test $(test_file_size big) -eq $(cat big.size) +' + test_done