From patchwork Sun May 30 13:01:56 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philippe Blain via GitGitGadget
X-Patchwork-Id: 12288257
From: "ZheNing Hu via GitGitGadget"
Date: Sun, 30 May 2021 13:01:56 +0000
Subject: [PATCH v2 0/2] [GSOC] ref-filter: add %(raw) atom
To: git@vger.kernel.org
Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
    Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood, ZheNing Hu
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

In order to make git cat-file --batch use ref-filter logic, I add the %(raw)
atom to ref-filter.

Changes since the last version:

1.
Use a more elegant memcasecmp().
2. Allow %(raw:size) to be used with --.
3. Remove the redundant BUG() in then_atom_handler().
4. Roll back to the original function name grab_sub_body_contents().
5. Split the object type check in grab_sub_body_contents() out into the
   previous patch.

ZheNing Hu (2):
  [GSOC] ref-filter: add obj-type check in grab contents
  [GSOC] ref-filter: add %(raw) atom

 Documentation/git-for-each-ref.txt |  14 ++
 ref-filter.c                       | 158 ++++++++++++++++++-----
 t/t6300-for-each-ref.sh            | 200 +++++++++++++++++++++++++++++
 3 files changed, 338 insertions(+), 34 deletions(-)

base-commit: 5d5b1473453400224ebb126bf3947e0a3276bdf5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-963%2Fadlternative%2Fref-filter-raw-atom-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-963/adlternative/ref-filter-raw-atom-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/963

Range-diff vs v1:

 -:  ------------ > 1:  e6c26d19a3f3 [GSOC] ref-filter: add obj-type check in grab contents
 1:  b3848f24f2d3 ! 2:  e44a2ed0db59 [GSOC] ref-filter: add %(raw) atom
    @@ Commit message
         The raw data of blob, tree objects may contain '\0', but most of
         the logic in `ref-filter` depands on the output of the atom being
    -    a structured string (end with '\0').
    +    text (specifically, no embedded NULs in it).

         E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
         add the data to the buffer. The raw data of a tree object is
    @@ Commit message
         can record raw object size, it can help us add raw object data to
         the buffer or compare two buffers which contain raw object data.

    -    Beyond, `--format=%(raw)` should not combine with `--python`, `--shell`,
    +    Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
         `--tcl`, `--perl` because if our binary raw data is passed to a variable
         in the host language, the host languages may cause escape errors.
    +    Helped-by: Felipe Contreras
    +    Helped-by: Phillip Wood
    +    Helped-by: Junio C Hamano
         Based-on-patch-by: Olga Telezhnaya
         Signed-off-by: ZheNing Hu
    @@ Documentation/git-for-each-ref.txt: and `date` to extract the named component.
     +raw:size::
     +	The raw data size of the object.
     +
    -+Note that `--format=%(raw)` should not combine with `--python`, `--shell`, `--tcl`,
    ++Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
     +`--perl` because if our binary raw data is passed to a variable in the host language,
     +the host languages may cause escape errors.
     +
    @@ ref-filter.c: static int contents_atom_parser(const struct ref_format *format, s
     +static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
     +				const char *arg, struct strbuf *err)
     +{
    -+	if (!arg) {
    ++	if (!arg)
     +		atom->u.raw_data.option = RAW_BARE;
    -+	} else if (!strcmp(arg, "size"))
    ++	else if (!strcmp(arg, "size"))
     +		atom->u.raw_data.option = RAW_LENGTH;
     +	else
     +		return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
    @@ ref-filter.c: static int parse_ref_filter_atom(const struct ref_format *format,
      		return strbuf_addf_ret(err, -1, _("malformed field name: %.*s"),
      				       (int)(ep-atom), atom);
    -+	if (format->quote_style && starts_with(sp, "raw"))
    -+		return strbuf_addf_ret(err, -1, _("--format=%.*s should not combine with"
    +-	/* Do we have the atom already used elsewhere? */
    +-	for (i = 0; i < used_atom_cnt; i++) {
    +-		int len = strlen(used_atom[i].name);
    +-		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
    +-			return i;
    +-	}
    +-
    + 	/*
    + 	 * If the atom name has a colon, strip it and everything after
    + 	 * it off - it specifies the format for this entry, and
    +@@ ref-filter.c: static int parse_ref_filter_atom(const struct ref_format *format,
    + 	arg = memchr(sp, ':', ep - sp);
    + 	atom_len = (arg ? arg : ep) - sp;
    + 
    ++	if (format->quote_style && !strncmp(sp, "raw", 3) && !arg)
    ++		return strbuf_addf_ret(err, -1, _("--format=%.*s cannot be used with"
     +				"--python, --shell, --tcl, --perl"), (int)(ep-atom), atom);
     +
    - 	/* Do we have the atom already used elsewhere? */
    - 	for (i = 0; i < used_atom_cnt; i++) {
    - 		int len = strlen(used_atom[i].name);
    ++	/* Do we have the atom already used elsewhere? */
    ++	for (i = 0; i < used_atom_cnt; i++) {
    ++		int len = strlen(used_atom[i].name);
    ++		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
    ++			return i;
    ++	}
    ++
    + 	/* Is the atom a valid one? */
    + 	for (i = 0; i < ARRAY_SIZE(valid_atom); i++) {
    + 		int len = strlen(valid_atom[i].name);
    @@ ref-filter.c: static int parse_ref_filter_atom(const struct ref_format *format,
      	return at;
      }
    @@ ref-filter.c: static int then_atom_handler(struct atom_value *atomv, struct ref_
      	 */
      	if (if_then_else->cmp_status == COMPARE_EQUAL) {
     -		if (!strcmp(if_then_else->str, cur->output.buf))
    -+		if (!if_then_else->str)
    -+			BUG("when if_then_else->cmp_status == COMPARE_EQUAL,"
    -+			    "if_then_else->str must not be null");
     +		if (str_len == cur->output.len &&
     +		    !memcmp(if_then_else->str, cur->output.buf, cur->output.len))
      			if_then_else->condition_satisfied = 1;
      	} else if (if_then_else->cmp_status == COMPARE_UNEQUAL) {
     -		if (strcmp(if_then_else->str, cur->output.buf))
    -+		if (!if_then_else->str)
    -+			BUG("when if_then_else->cmp_status == COMPARE_UNEQUAL,"
    -+			    "if_then_else->str must not be null");
     +		if (str_len != cur->output.len ||
     +		    memcmp(if_then_else->str, cur->output.buf, cur->output.len))
      			if_then_else->condition_satisfied = 1;
    @@ ref-filter.c: static int end_atom_handler(struct atom_value *atomv, struct ref_f
      	}
      	strbuf_release(&s);
    @@ ref-filter.c: static void append_lines(struct strbuf *out, const char *buf, unsigned long size
    - }
      /* See grab_values */
    --static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
    -+static void grab_raw_data(struct atom_value *val, int deref, void *buf, unsigned long buf_size, struct object *obj)
    + static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
    +-				   struct object *obj)
    ++				   unsigned long buf_size, struct object *obj)
      {
      	int i;
      	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
    -@@ ref-filter.c: static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
    - 			continue;
    +@@ ref-filter.c: static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
      		if (deref)
      			name++;
    --		if (strcmp(name, "body") &&
    --		    !starts_with(name, "subject") &&
    --		    !starts_with(name, "trailers") &&
    --		    !starts_with(name, "contents"))
    -+
    + 
     +		if (starts_with(name, "raw")) {
     +			if (atom->u.raw_data.option == RAW_BARE) {
     +				v->s = xmemdupz(buf, buf_size);
     +				v->s_size = buf_size;
    -+			} else if (atom->u.raw_data.option == RAW_LENGTH)
    ++			} else if (atom->u.raw_data.option == RAW_LENGTH) {
     +				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
    ++			}
     +			continue;
     +		}
     +
    -+		if ((obj->type != OBJ_TAG &&
    -+		     obj->type != OBJ_COMMIT) ||
    -+		    (strcmp(name, "body") &&
    -+		     !starts_with(name, "subject") &&
    -+		     !starts_with(name, "trailers") &&
    -+		     !starts_with(name, "contents")))
    - 			continue;
    - 		if (!subpos)
    - 			find_subpos(buf,
    + 		if ((obj->type != OBJ_TAG &&
    + 		     obj->type != OBJ_COMMIT) ||
    + 		    (strcmp(name, "body") &&
    @@ ref-filter.c: static void fill_missing_values(struct atom_value *val)
       * pointed at by the ref itself; otherwise it is the object the
       * ref (which is a tag) refers to.
    @@ ref-filter.c: static void fill_missing_values(struct atom_value *val)
      	switch (obj->type) {
      	case OBJ_TAG:
      		grab_tag_values(val, deref, obj);
    --		grab_sub_body_contents(val, deref, buf);
    -+		grab_raw_data(val, deref, buf, buf_size, obj);
    +-		grab_sub_body_contents(val, deref, buf, obj);
    ++		grab_sub_body_contents(val, deref, buf, buf_size, obj);
      		grab_person("tagger", val, deref, buf);
      		break;
      	case OBJ_COMMIT:
      		grab_commit_values(val, deref, obj);
    --		grab_sub_body_contents(val, deref, buf);
    -+		grab_raw_data(val, deref, buf, buf_size, obj);
    +-		grab_sub_body_contents(val, deref, buf, obj);
    ++		grab_sub_body_contents(val, deref, buf, buf_size, obj);
      		grab_person("author", val, deref, buf);
      		grab_person("committer", val, deref, buf);
      		break;
      	case OBJ_TREE:
      		/* grab_tree_values(val, deref, obj, buf, sz); */
    -+		grab_raw_data(val, deref, buf, buf_size, obj);
    ++		grab_sub_body_contents(val, deref, buf, buf_size, obj);
      		break;
      	case OBJ_BLOB:
      		/* grab_blob_values(val, deref, obj, buf, sz); */
    -+		grab_raw_data(val, deref, buf, buf_size, obj);
    ++		grab_sub_body_contents(val, deref, buf, buf_size, obj);
      		break;
      	default:
      		die("Eh? Object of type %d?", obj->type);
    @@ ref-filter.c: static int compare_detached_head(struct ref_array_item *a, struct
     +static int memcasecmp(const void *vs1, const void *vs2, size_t n)
     +{
    -+	size_t i;
    -+	const char *s1 = (const char *)vs1;
    -+	const char *s2 = (const char *)vs2;
    ++	const char *s1 = (const void *)vs1;
    ++	const char *s2 = (const void *)vs2;
    ++	const char *end = s1 + n;
     +
    -+	for (i = 0; i < n; i++) {
    -+		unsigned char u1 = s1[i];
    -+		unsigned char u2 = s2[i];
    -+		int U1 = toupper (u1);
    -+		int U2 = toupper (u2);
    -+		int diff = (UCHAR_MAX <= INT_MAX ? U1 - U2
    -+			    : U1 < U2 ? -1 : U2 < U1);
    ++	for (; s1 < end; s1++, s2++) {
    ++		int diff = tolower(*s1) - tolower(*s2);
     +		if (diff)
     +			return diff;
     +	}
    @@ ref-filter.c: static int compare_detached_head(struct ref_array_item *a, struct
      static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, struct ref_array_item *b)
      {
      	struct atom_value *va, *vb;
    -@@ ref-filter.c: static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
    - 	int cmp_detached_head = 0;
    - 	cmp_type cmp_type = used_atom[s->atom].type;
    - 	struct strbuf err = STRBUF_INIT;
    -+	size_t slen = 0;
    - 
    - 	if (get_ref_atom_value(a, s->atom, &va, &err))
    - 		die("%s", err.buf);
    @@ ref-filter.c: static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
      	} else if (s->sort_flags & REF_SORTING_VERSION) {
      		cmp = versioncmp(va->s, vb->s);
    @@ ref-filter.c: static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array
     +			int (*cmp_fn)(const void *, const void *, size_t);
     +			cmp_fn = s->sort_flags & REF_SORTING_ICASE
     +				? memcasecmp : memcmp;
    ++			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
    ++					strlen(va->s) : va->s_size;
    ++			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
    ++					strlen(vb->s) : vb->s_size;
     +
    -+			if (va->s_size != ATOM_VALUE_S_SIZE_INIT &&
    -+			    vb->s_size != ATOM_VALUE_S_SIZE_INIT) {
    -+				cmp = cmp_fn(va->s, vb->s, va->s_size > vb->s_size ?
    -+					     vb->s_size : va->s_size);
    -+			} else if (va->s_size == ATOM_VALUE_S_SIZE_INIT) {
    -+				slen = strlen(va->s);
    -+				cmp = cmp_fn(va->s, vb->s, slen > vb->s_size ?
    -+					     vb->s_size : slen);
    -+			} else {
    -+				slen = strlen(vb->s);
    -+				cmp = cmp_fn(va->s, vb->s, slen > va->s_size ?
    -+					     slen : va->s_size);
    ++			cmp = cmp_fn(va->s, vb->s, b_size > a_size ?
    ++				     a_size : b_size);
    ++			if (!cmp) {
    ++				if (a_size > b_size)
    ++					cmp = 1;
    ++				else if (a_size < b_size)
    ++					cmp = -1;
     +			}
    -+			cmp = cmp ? cmp : va->s_size - vb->s_size;
     +		}
      	} else {
      		if (va->value < vb->value)
    @@ t/t6300-for-each-ref.sh: test_atom refs/myblobs/first contents:body ""
     +	refs/myblobs/first not empty
     +	EOF
     +	git for-each-ref --format="%(refname) %(if)%(raw)%(then)not empty%(else)empty%(end)" \
    -+	refs/myblobs/ >actual &&
    ++		refs/myblobs/ >actual &&
     +	test_cmp expected actual
     +'
     +
    @@ t/t6300-for-each-ref.sh: test_atom refs/myblobs/first contents:body ""
     +	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
     +'
     +
    ++test_expect_success '%(raw:size) with --shell' '
    ++	git for-each-ref --format="%(raw:size)" | while read line
    ++	do
    ++		echo "'\''$line'\''" >>expect
    ++	done &&
    ++	git for-each-ref --format="%(raw:size)" --shell >actual &&
    ++	test_cmp expect actual
    ++'
    ++
     +test_expect_success 'for-each-ref --format compare with cat-file --batch' '
     +	git rev-parse refs/mytrees/first | git cat-file --batch >expected &&
     +	git for-each-ref --format="%(objectname) %(objecttype) %(objectsize)
 2:  aa6d73f3e526 < -:  ------------ [GSOC] ref-filter: add %(header) atom