Message ID | edba3791caf8bacc2f722f7874369f6776ecffe0.1536885967.git.matvore@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | filter: support for excluding all trees and blobs |
Matthew DeVore <matvore@google.com> writes:

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index c0e2bd6a0..14f251de4 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -50,6 +50,20 @@ static int gently_parse_list_objects_filter(
>  			return 0;
>  		}
>
> +	} else if (skip_prefix(arg, "tree:", &v0)) {
> +		unsigned long depth;
> +		if (!git_parse_ulong(v0, &depth) || depth != 0) {
> +			if (errbuf) {
> +				strbuf_init(errbuf, 0);
> +				strbuf_addstr(
> +					errbuf,
> +					_("only 'tree:0' is supported"));

This is not a new issue with this patch, but I think strbuf_init()
at the location of filling done like this is a bad idea. If the
caller gave you an errbuf that is pre-filled with something, we'd
leak memory and lose information. It only makes sense to _init() if
the caller gave us an uninitialized garbage (or a strbuf that has
just been initialized and is empty). The existing callers seem to do
STRBUF_INIT before passing it to this function, so we probably
should not do strbuf_init() here (and other two places in this
function) and simply add to it.

> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index bbbe7537d..8eeb85fbc 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -154,6 +154,44 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
>  	grep "git index-pack.*--fsck-objects" trace
>  '
>
> +test_expect_success 'use fsck before and after manually fetching a missing subtree' '
> +	# push new commit so server has a subtree
> +	mkdir src/dir &&
> +	echo "in dir" >src/dir/file.txt &&
> +	git -C src add dir/file.txt &&
> +	git -C src commit -m "file in dir" &&
> +	git -C src push -u srv master &&
> +	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
> +
> +	rm -rf dst &&
> +	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> +	git -C dst fsck &&
> +
> +	# Make sure we only have commits, and all trees and blobs are missing.
> +	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> +	awk -f print_1.awk fetched_objects \
> +		| xargs -n1 git -C dst cat-file -t >fetched_types &&

Break line after pipe "|", not before, and lose the backslash. You
do not need to over-indent the command on the downstream of the
pipe, i.e.

	awk ... |
	xargs -n1 git -C ... &&

Same comment applies elsewhere in this patch, not limited to this file.

> +	sort fetched_types -u >unique_types.observed &&

Make it a habit not to add dashed options after real arguments, i.e.

	sort -u fetched_types

> +	echo commit >unique_types.expected &&
> +	test_cmp unique_types.observed unique_types.expected &&

Always compare "expect" with "actual", not in the reverse order, i.e.

	test_cmp expect actual

not

	test_cmp actual expect

This is important because test_cmp reports failures by showing you
an output of "diff expect actual" and from "sh t5616-part*.sh -v"
you can see what additional/excess things were produced by the test
over what is expected, prefixed with "+", and what your code failed
to produce are shown prefixed with "-".

Thanks.
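To make the point about argument order concrete, here is a minimal sketch that calls plain "diff -u" directly, on the assumption (taken from the explanation above) that test_cmp shows the equivalent of "diff expect actual". The scratch directory and the file contents are made up for illustration.

```shell
# Scratch directory; "expect" and "actual" follow the test-suite naming.
dir=$(mktemp -d)

# What the test wants to see, and what the (hypothetical) code produced.
printf "blob\ncommit\ntree\n" >"$dir/expect"
printf "commit\ntag\ntree\n" >"$dir/actual"

# diff exits non-zero when the files differ, so do not let that
# status abort the script while capturing the output.
diff_out=$(diff -u "$dir/expect" "$dir/actual" || true)

# With "expect" given first, lines the code failed to produce start
# with "-" and excess lines start with "+".  The patterns skip the
# "--- expect" and "+++ actual" header lines.
missing=$(printf "%s\n" "$diff_out" | grep "^-[^-]")
excess=$(printf "%s\n" "$diff_out" | grep "^+[^+]")

echo "missing from output: $missing"
echo "excess in output: $excess"
```

Swapping the two file names on the diff command line would flip the signs, which is why a consistent "expect actual" order matters when reading failure output.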
Junio C Hamano <gitster@pobox.com> writes:

>> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
>> index bbbe7537d..8eeb85fbc 100755
>> --- a/t/t5616-partial-clone.sh
>> +++ b/t/t5616-partial-clone.sh
>> ...
>
> Break line after pipe "|", not before, and lose the backslash. You
> do not need to over-indent the command on the downstream of the
> pipe, i.e.
>
>	awk ... |
>	xargs -n1 git -C ... &&
>
> Same comment applies elsewhere in this patch, not limited to this file.
>
>> +	sort fetched_types -u >unique_types.observed &&
>
> Make it a habit not to add dashed options after real arguments, i.e.
>
>	sort -u fetched_types
>
>> +	echo commit >unique_types.expected &&
>> +	test_cmp unique_types.observed unique_types.expected &&
>
> Always compare "expect" with "actual", not in the reverse order, i.e.
>
>	test_cmp expect actual
>
> not
>
>	test_cmp actual expect
>
> This is important because test_cmp reports failures by showing you
> an output of "diff expect actual" and from "sh t5616-part*.sh -v"
> you can see what additional/excess things were produced by the test
> over what is expected, prefixed with "+", and what your code failed
> to produce are shown prefixed with "-".

I notice that patches to other files like 6112 in this series also
spread the above mistakes from existing lines. Please do not view
what you see in these two test scripts before you start touching as
a good example to follow---rather, treat them as antipattern X-<.
5616 is not as bad as 6112, but they both need to be cleaned up.

We could alternatively do a post clean-up, but ideally we should
first have a clean-up patch before this series to co.

Thanks.
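The two shell conventions quoted above (break after the pipe, options before file operands) can be tried outside the test suite with a sketch like the following. The input file and its "<id> <type>" layout are invented for illustration and only stand in for the rev-list output used in the real test.

```shell
dir=$(mktemp -d)

# Hypothetical stand-in for the object listing: "<id> <type>" per line.
printf "%s\n" "1 commit" "2 tree" "3 commit" "4 blob" >"$dir/objects"

# A line ending in "|" is an incomplete command, so the shell keeps
# reading the next line; no backslash and no extra indent are needed.
# "-u" comes before any file operand, as POSIX utility syntax expects;
# some sort implementations would treat a trailing "-u" as a file name.
awk "{print \$2}" "$dir/objects" |
sort -u >"$dir/actual"

# Expected result first, observed result second.
printf "blob\ncommit\ntree\n" >"$dir/expect"
diff -u "$dir/expect" "$dir/actual"
```

GNU sort happens to accept options after operands, which is probably why the original ordering went unnoticed; keeping options first avoids relying on that.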
On Fri, Sep 14, 2018 at 10:47 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Break line after pipe "|", not before, and lose the backslash. You
> > do not need to over-indent the command on the downstream of the
> > pipe, i.e.
> >
> >         awk ... |
> >         xargs -n1 git -C ... &&
> >
> > Same comment applies elsewhere in this patch, not limited to this file.
> >
> >> +        sort fetched_types -u >unique_types.observed &&
> >
> > Make it a habit not to add dashed options after real arguments, i.e.
> >
> >         sort -u fetched_types
> >
Done. I'm not sure why I made this mistake, since I usually prefer to
order flags before positional args. I didn't actually clean this up in
existing code as I did with the other mistakes, since it is very hard
to find and do thoroughly.

> >> +        echo commit >unique_types.expected &&
> >> +        test_cmp unique_types.observed unique_types.expected &&
> >
> > Always compare "expect" with "actual", not in the reverse order, i.e.
> >
> >         test_cmp expect actual
> >
> > not
> >
> >         test_cmp actual expect
> >
Done.

> > This is important because test_cmp reports failures by showing you
> > an output of "diff expect actual" and from "sh t5616-part*.sh -v"
> > you can see what additional/excess things were produced by the test
> > over what is expected, prefixed with "+", and what your code failed
> > to produce are shown prefixed with "-".

Hmm... I didn't know about the -v flag. That's quite good to know, thanks!

>
> I notice that patches to other files like 6112 in this series also
> spread the above mistakes from existing lines. Please do not view
> what you see in these two test scripts before you start touching as
> a good example to follow---rather, treat them as antipattern X-<.
> 5616 is not as bad as 6112, but they both need to be cleaned up.
>
> We could alternatively do a post clean-up, but ideally we should
> first have a clean-up patch before this series to co.
I cleaned up existing tests in a new patchset here:
https://public-inbox.org/git/cover.1536969438.git.matvore@google.com/T/#t
- that new patch corrects the pipe placement and test_cmp argument
ordering. There is no dependency between this patchset and the new
one, though I assume you want to commit the clean-up first so as to
maintain consistency.

Here is an interdiff for this particular patch series (I replaced \t
with 8 spaces so it would be readable after my mail client mangles
it):

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 14f251de4..e8da2e858 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -30,7 +30,6 @@ static int gently_parse_list_objects_filter(
 
         if (filter_options->choice) {
                 if (errbuf) {
-                        strbuf_init(errbuf, 0);
                         strbuf_addstr(
                                 errbuf,
                                 _("multiple filter-specs cannot be combined"));
@@ -54,7 +53,6 @@ static int gently_parse_list_objects_filter(
                 unsigned long depth;
                 if (!git_parse_ulong(v0, &depth) || depth != 0) {
                         if (errbuf) {
-                                strbuf_init(errbuf, 0);
                                 strbuf_addstr(
                                         errbuf,
                                         _("only 'tree:0' is supported"));
@@ -85,10 +83,9 @@ static int gently_parse_list_objects_filter(
                 return 0;
         }
 
-        if (errbuf) {
-                strbuf_init(errbuf, 0);
+        if (errbuf)
                 strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
-        }
+
         memset(filter_options, 0, sizeof(*filter_options));
         return 1;
 }
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index f02b9ae37..5bc5b4445 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -216,7 +216,7 @@ test_expect_success 'missing non-root tree object and rev-list' '
         rm -rf repo &&
         test_create_repo repo &&
         mkdir repo/dir &&
-        echo foo > repo/dir/foo &&
+        echo foo >repo/dir/foo &&
         git -C repo add dir/foo &&
         git -C repo commit -m "commit dir/foo" &&
 
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 7a4d49ea1..510d3537f 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -61,7 +61,7 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 
 test_expect_success 'get an error for missing tree object' '
         git init r5 &&
-        echo foo > r5/foo &&
+        echo foo >r5/foo &&
         git -C r5 add foo &&
         git -C r5 commit -m "foo" &&
         del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
@@ -97,7 +97,7 @@ test_expect_success 'grab tree directly when using tree:0' '
         git -C r1 verify-pack -v ../commitsonly.pack >objs &&
         awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
         git -C r1 rev-parse HEAD: >expected &&
-        test_cmp trees_and_blobs expected
+        test_cmp expected trees_and_blobs
 '
 
 # Test blob:limit=<n>[kmg] filter.
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 8eeb85fbc..7b6294ca5 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -169,11 +169,12 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
 
         # Make sure we only have commits, and all trees and blobs are missing.
         git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
-        awk -f print_1.awk fetched_objects \
-                | xargs -n1 git -C dst cat-file -t >fetched_types &&
-        sort fetched_types -u >unique_types.observed &&
+        awk -f print_1.awk fetched_objects |
+        xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+        sort -u fetched_types >unique_types.observed &&
         echo commit >unique_types.expected &&
-        test_cmp unique_types.observed unique_types.expected &&
+        test_cmp unique_types.expected unique_types.observed &&
 
         # Auto-fetch a tree with cat-file.
         git -C dst cat-file -p $SUBTREE >tree_contents &&
@@ -185,11 +186,13 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
         # Auto-fetch all remaining trees and blobs with --missing=error
         git -C dst rev-list master --missing=error --objects >fetched_objects &&
         test_line_count = 70 fetched_objects &&
-        awk -f print_1.awk fetched_objects \
-                | xargs -n1 git -C dst cat-file -t >fetched_types &&
-        sort fetched_types -u >unique_types.observed &&
+
+        awk -f print_1.awk fetched_objects |
+        xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+        sort -u fetched_types >unique_types.observed &&
         printf "blob\ncommit\ntree\n" >unique_types.expected &&
-        test_cmp unique_types.observed unique_types.expected
+        test_cmp unique_types.expected unique_types.observed
 '
 
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index a989a7082..6e5c41a68 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -31,11 +31,13 @@ test_expect_success 'verify blob:none omits all 5 blobs' '
 '
 
 test_expect_success 'specify blob explicitly prevents filtering' '
-        file_3=$(git -C r1 ls-files -s file.3 \
-                | awk -f print_2.awk) &&
-        file_4=$(git -C r1 ls-files -s file.4 \
-                | awk -f print_2.awk) &&
-        git -C r1 rev-list HEAD --objects --filter=blob:none HEAD $file_3 >observed &&
+        file_3=$(git -C r1 ls-files -s file.3 |
+                 awk -f print_2.awk) &&
+
+        file_4=$(git -C r1 ls-files -s file.4 |
+                 awk -f print_2.awk) &&
+
+        git -C r1 rev-list --objects --filter=blob:none HEAD $file_3 >observed &&
         grep -q "$file_3" observed &&
         test_must_fail grep -q "$file_4" observed
 '
@@ -225,13 +227,14 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 # Test tree:0 filter.
 
 test_expect_success 'verify tree:0 includes trees in "filtered" output' '
-        git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
-                | awk -f print_1.awk \
-                | sed s/~// \
-                | xargs -n1 git -C r3 cat-file -t \
-                | sort -u >filtered_types &&
-        printf "blob\ntree\n" > expected &&
-        test_cmp filtered_types expected
+        git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 |
+        awk -f print_1.awk |
+        sed s/~// |
+        xargs -n1 git -C r3 cat-file -t |
+        sort -u >filtered_types &&
+
+        printf "blob\ntree\n" >expected &&
+        test_cmp expected filtered_types
 '
 
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..14f251de4 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,20 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (skip_prefix(arg, "tree:", &v0)) {
+		unsigned long depth;
+		if (!git_parse_ulong(v0, &depth) || depth != 0) {
+			if (errbuf) {
+				strbuf_init(errbuf, 0);
+				strbuf_addstr(
+					errbuf,
+					_("only 'tree:0' is supported"));
+			}
+			return 1;
+		}
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void* filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -371,6 +419,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 5e35f33bf..7a4d49ea1 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" >r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
+	git -C r1 rev-parse HEAD: >expected &&
+	test_cmp trees_and_blobs expected
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter. The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..8eeb85fbc 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,44 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" >src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+
+	# Make sure we only have commits, and all trees and blobs are missing.
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	echo commit >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected &&
+
+	# Auto-fetch a tree with cat-file.
+	git -C dst cat-file -p $SUBTREE >tree_contents &&
+	grep file.txt tree_contents &&
+
+	# fsck still works after an auto-fetch of a tree.
+	git -C dst fsck &&
+
+	# Auto-fetch all remaining trees and blobs with --missing=error
+	git -C dst rev-list master --missing=error --objects >fetched_objects &&
+	test_line_count = 70 fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 2e07dadf0..a989a7082 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -222,6 +222,18 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 	test_must_be_empty rev_list_err
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
+		| awk -f print_1.awk \
+		| sed s/~// \
+		| xargs -n1 git -C r3 cat-file -t \
+		| sort -u >filtered_types &&
+	printf "blob\ntree\n" > expected &&
+	test_cmp filtered_types expected
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  5 +++
 list-objects-filter-options.c          | 14 ++++++++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 49 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 28 +++++++++++++++
 t/t5616-partial-clone.sh               | 38 ++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 12 +++++++
 7 files changed, 147 insertions(+)
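As a rough illustration of what the commit message describes, the sketch below builds a throwaway repository and compares an unfiltered object listing with one that uses "--filter=tree:0". It assumes a git that includes this patch (or a later release with the feature); the repository layout, identity values and variable names are invented for the example and are not part of the patch.

```shell
# Throwaway repository with one commit containing dir/file.txt.
repo=$(mktemp -d)
git init -q "$repo"
mkdir "$repo/dir"
echo "in dir" >"$repo/dir/file.txt"
git -C "$repo" add dir/file.txt
git -C "$repo" -c user.name="A U Thor" -c user.email=author@example.com \
	commit -q -m "file in dir"

# Unfiltered listing: the commit, the root tree, the dir/ tree and the blob.
total=$(git -C "$repo" rev-list --objects HEAD | wc -l)

# With tree:0, every tree and blob is omitted, so only commit objects
# should remain in the listing.
types=$(
	git -C "$repo" rev-list --objects --filter=tree:0 HEAD |
	awk "{print \$1}" |
	xargs -n1 git -C "$repo" cat-file -t |
	sort -u
)

echo "objects without filter: $total"
echo "object types with tree:0: $types"
```

Adding "--filter-print-omitted" to the filtered command lists the omitted objects as well, each prefixed with "~", which is what the t6112 test in this patch relies on.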