From patchwork Fri Dec 17 11:26:24 2021
X-Patchwork-Id: 12684315
From: Han Xin
To: Junio C Hamano, Git List, Jeff King, Jiang Xin, Philip Oakley, Ævar Arnfjörð Bjarmason, Derrick Stolee
Cc: Han Xin
Subject: [PATCH v6 1/6] object-file.c: release strbuf in write_loose_object()
Date: Fri, 17 Dec 2021 19:26:24 +0800
Message-Id: <20211217112629.12334-2-chiyutianyi@gmail.com>
In-Reply-To: <20211210103435.83656-1-chiyutianyi@gmail.com>
References: <20211210103435.83656-1-chiyutianyi@gmail.com>

From: Han Xin

Fix a strbuf leak in "write_loose_object()", as suggested by Ævar Arnfjörð Bjarmason.
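
For illustration only (not part of the patch): the fix replaces early returns with a single exit path that releases both buffers. The sketch below shows that pattern in plain C under stated assumptions. "buf_new()", "buf_release()", "write_thing()" and the "live_buffers" counter are hypothetical stand-ins for the strbuf handling and are not git APIs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter: number of buffers currently allocated. */
static int live_buffers;

/* Hypothetical stand-in for filling a strbuf: returns a heap copy of "s". */
static char *buf_new(const char *s)
{
	char *p = malloc(strlen(s) + 1);

	if (!p)
		return NULL;
	strcpy(p, s);
	live_buffers++;
	return p;
}

/* Hypothetical stand-in for strbuf_release(). */
static void buf_release(char *p)
{
	if (p) {
		free(p);
		live_buffers--;
	}
}

/*
 * Hypothetical writer. "fail_open" simulates a failure to create the
 * temporary file. Every path, error or success, leaves through the
 * "cleanup" label, so both buffers are always released.
 */
static int write_thing(int fail_open)
{
	int ret;
	char *filename = buf_new("objects/ab/cdef");
	char *tmp_file = buf_new("objects/ab/tmp_obj_XXXXXX");

	if (!filename || !tmp_file || fail_open) {
		ret = -1;
		goto cleanup; /* no early return */
	}
	ret = 0;
cleanup:
	buf_release(filename);
	buf_release(tmp_file);
	return ret;
}
```

With this shape, calling "write_thing(1)" takes the error path and still releases both buffers, so "live_buffers" is back at 0 afterwards.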
Helped-by: Ævar Arnfjörð Bjarmason
Signed-off-by: Han Xin
---
 object-file.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/object-file.c b/object-file.c
index eb1426f98c..32acf1dad6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1874,11 +1874,14 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	fd = create_tmpfile(&tmp_file, filename.buf);
 	if (fd < 0) {
 		if (flags & HASH_SILENT)
-			return -1;
+			ret = -1;
 		else if (errno == EACCES)
-			return error(_("insufficient permission for adding an object to repository database %s"), get_object_directory());
+			ret = error(_("insufficient permission for adding an "
+				      "object to repository database %s"),
+				    get_object_directory());
 		else
-			return error_errno(_("unable to create temporary file"));
+			ret = error_errno(_("unable to create temporary file"));
+		goto cleanup;
 	}
 
 	/* Set it up */
@@ -1930,7 +1933,11 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 			warning_errno(_("failed utime() on %s"), tmp_file.buf);
 	}
 
-	return finalize_object_file(tmp_file.buf, filename.buf);
+	ret = finalize_object_file(tmp_file.buf, filename.buf);
+cleanup:
+	strbuf_release(&filename);
+	strbuf_release(&tmp_file);
+	return ret;
 }
 
 static int freshen_loose_object(const struct object_id *oid)

From patchwork Fri Dec 17 11:26:25 2021
X-Patchwork-Id: 12684317
From: Han Xin
Subject: [PATCH v6 2/6] object-file.c: refactor object header generation into a function
Date: Fri, 17 Dec 2021 19:26:25 +0800
Message-Id: <20211217112629.12334-3-chiyutianyi@gmail.com>
In-Reply-To: <20211210103435.83656-1-chiyutianyi@gmail.com>
References: <20211210103435.83656-1-chiyutianyi@gmail.com>

From: Han Xin

There are 3 places where "xsnprintf" is used to generate the object header, and I originally planned to add a fourth in a later patch.

As Ævar Arnfjörð Bjarmason suggested, although it is just one line, it is code that is very central to git, so refactor it into a function, which will help readability later on.

Helped-by: Ævar Arnfjörð Bjarmason
Signed-off-by: Han Xin
---
 object-file.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/object-file.c b/object-file.c
index 32acf1dad6..95fcd5435d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1006,6 +1006,14 @@ void *xmmap(void *start, size_t length,
 	return ret;
 }
 
+static inline int generate_object_header(char *buf, int bufsz,
+					 const char *type_name,
+					 unsigned long size)
+{
+	return xsnprintf(buf, bufsz, "%s %"PRIuMAX, type_name,
+			 (uintmax_t)size) + 1;
+}
+
 /*
  * With an in-core object data in "map", rehash it to make sure the
  * object name actually matches "oid" to detect object corruption.
@@ -1034,7 +1042,7 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 		return -1;
 
 	/* Generate the header */
-	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(obj_type), (uintmax_t)size) + 1;
+	hdrlen = generate_object_header(hdr, sizeof(hdr), type_name(obj_type), size);
 
 	/* Sha1.. */
 	r->hash_algo->init_fn(&c);
@@ -1734,7 +1742,7 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
 	git_hash_ctx c;
 
 	/* Generate the header */
-	*hdrlen = xsnprintf(hdr, *hdrlen, "%s %"PRIuMAX , type, (uintmax_t)len)+1;
+	*hdrlen = generate_object_header(hdr, *hdrlen, type, len);
 
 	/* Sha1.. */
 	algo->init_fn(&c);
@@ -2013,7 +2021,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 	buf = read_object(the_repository, oid, &type, &len);
 	if (!buf)
 		return error(_("cannot read object for %s"), oid_to_hex(oid));
-	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(type), (uintmax_t)len) + 1;
+	hdrlen = generate_object_header(hdr, sizeof(hdr), type_name(type), len);
 	ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0);
 	free(buf);
 

From patchwork Fri Dec 17 11:26:26 2021
X-Patchwork-Id: 12684319
From: Han Xin
Subject: [PATCH v6 3/6] object-file.c: refactor write_loose_object() to reuse in stream version
Date: Fri, 17 Dec 2021 19:26:26 +0800
Message-Id: <20211217112629.12334-4-chiyutianyi@gmail.com>
In-Reply-To: <20211210103435.83656-1-chiyutianyi@gmail.com>
References: <20211210103435.83656-1-chiyutianyi@gmail.com>

From: Han Xin

We used to call "get_data()" in "unpack_non_delta_entry()" to read the entire contents of a blob object, no matter how big it is. This implementation may consume all the memory and cause OOM.

This can be improved by feeding data to "stream_loose_object()" as a stream instead of reading it into one whole buffer.

As this new method "stream_loose_object()" has many similarities with "write_loose_object()", we split up "write_loose_object()" into some steps:

 1. Figuring out a path for the (temp) object file.
 2. Creating the tempfile.
 3. Setting up zlib and writing the header.
 4. Writing the object data and handling errors.
 5. Optionally, doing something after the write, e.g. applying "mtime" when forcing a loose object.

Helped-by: Ævar Arnfjörð Bjarmason
Signed-off-by: Han Xin
---
 object-file.c | 98 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 35 deletions(-)

diff --git a/object-file.c b/object-file.c
index 95fcd5435d..dd29e5372e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1751,6 +1751,25 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
 	algo->final_oid_fn(oid, &c);
 }
 
+/*
+ * Move the just written object with proper mtime into its final resting place.
+ */
+static int finalize_object_file_with_mtime(const char *tmpfile,
+					   const char *filename,
+					   time_t mtime,
+					   unsigned flags)
+{
+	struct utimbuf utb;
+
+	if (mtime) {
+		utb.actime = mtime;
+		utb.modtime = mtime;
+		if (utime(tmpfile, &utb) < 0 && !(flags & HASH_SILENT))
+			warning_errno(_("failed utime() on %s"), tmpfile);
+	}
+	return finalize_object_file(tmpfile, filename);
+}
+
 /*
  * Move the just written object into its final resting place.
  */
@@ -1836,7 +1855,8 @@ static inline int directory_size(const char *filename)
  * We want to avoid cross-directory filename renames, because those
  * can have problems on various filesystems (FAT, NFS, Coda).
  */
-static int create_tmpfile(struct strbuf *tmp, const char *filename)
+static int create_tmpfile(struct strbuf *tmp, const char *filename,
+			  unsigned flags)
 {
 	int fd, dirlen = directory_size(filename);
 
@@ -1844,7 +1864,9 @@ static int create_tmpfile(struct strbuf *tmp, const char *filename)
 	strbuf_add(tmp, filename, dirlen);
 	strbuf_addstr(tmp, "tmp_obj_XXXXXX");
 	fd = git_mkstemp_mode(tmp->buf, 0444);
-	if (fd < 0 && dirlen && errno == ENOENT) {
+	do {
+		if (fd >= 0 || !dirlen || errno != ENOENT)
+			break;
 		/*
 		 * Make sure the directory exists; note that the contents
 		 * of the buffer are undefined after mkstemp returns an
@@ -1854,17 +1876,48 @@ static int create_tmpfile(struct strbuf *tmp, const char *filename)
 		strbuf_reset(tmp);
 		strbuf_add(tmp, filename, dirlen - 1);
 		if (mkdir(tmp->buf, 0777) && errno != EEXIST)
-			return -1;
+			break;
 		if (adjust_shared_perm(tmp->buf))
-			return -1;
+			break;
 
 		/* Try again */
 		strbuf_addstr(tmp, "/tmp_obj_XXXXXX");
 		fd = git_mkstemp_mode(tmp->buf, 0444);
+	} while (0);
+
+	if (fd < 0 && !(flags & HASH_SILENT)) {
+		if (errno == EACCES)
+			return error(_("insufficient permission for adding an "
+				       "object to repository database %s"),
+				     get_object_directory());
+		else
+			return error_errno(_("unable to create temporary file"));
 	}
+
 	return fd;
 }
 
+static void setup_stream_and_header(git_zstream *stream,
+				    unsigned char *compressed,
+				    unsigned long compressed_size,
+				    git_hash_ctx *c,
+				    char *hdr,
+				    int hdrlen)
+{
+	/* Set it up */
+	git_deflate_init(stream, zlib_compression_level);
+	stream->next_out = compressed;
+	stream->avail_out = compressed_size;
+	the_hash_algo->init_fn(c);
+
+	/* First header.. */
+	stream->next_in = (unsigned char *)hdr;
+	stream->avail_in = hdrlen;
+	while (git_deflate(stream, 0) == Z_OK)
+		; /* nothing */
+	the_hash_algo->update_fn(c, hdr, hdrlen);
+}
+
 static int write_loose_object(const struct object_id *oid, char *hdr,
 			      int hdrlen, const void *buf, unsigned long len,
 			      time_t mtime, unsigned flags)
@@ -1879,31 +1932,15 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 
 	loose_object_path(the_repository, &filename, oid);
 
-	fd = create_tmpfile(&tmp_file, filename.buf);
+	fd = create_tmpfile(&tmp_file, filename.buf, flags);
 	if (fd < 0) {
-		if (flags & HASH_SILENT)
-			ret = -1;
-		else if (errno == EACCES)
-			ret = error(_("insufficient permission for adding an "
-				      "object to repository database %s"),
-				    get_object_directory());
-		else
-			ret = error_errno(_("unable to create temporary file"));
+		ret = -1;
 		goto cleanup;
 	}
 
-	/* Set it up */
-	git_deflate_init(&stream, zlib_compression_level);
-	stream.next_out = compressed;
-	stream.avail_out = sizeof(compressed);
-	the_hash_algo->init_fn(&c);
-
-	/* First header.. */
-	stream.next_in = (unsigned char *)hdr;
-	stream.avail_in = hdrlen;
-	while (git_deflate(&stream, 0) == Z_OK)
-		; /* nothing */
-	the_hash_algo->update_fn(&c, hdr, hdrlen);
+	/* Set it up and write header */
+	setup_stream_and_header(&stream, compressed, sizeof(compressed),
+				&c, hdr, hdrlen);
 
 	/* Then the data itself.. */
 	stream.next_in = (void *)buf;
@@ -1932,16 +1969,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 
 	close_loose_object(fd);
 
-	if (mtime) {
-		struct utimbuf utb;
-		utb.actime = mtime;
-		utb.modtime = mtime;
-		if (utime(tmp_file.buf, &utb) < 0 &&
-		    !(flags & HASH_SILENT))
-			warning_errno(_("failed utime() on %s"), tmp_file.buf);
-	}
-
-	ret = finalize_object_file(tmp_file.buf, filename.buf);
+	ret = finalize_object_file_with_mtime(tmp_file.buf, filename.buf, mtime, flags);
 cleanup:
 	strbuf_release(&filename);
 	strbuf_release(&tmp_file);

From patchwork Fri Dec 17 11:26:27 2021
X-Patchwork-Id: 12684321
From: Han Xin
Subject: [PATCH v6 4/6] object-file.c: make "write_object_file_flags()" to support read in stream
Date: Fri, 17 Dec 2021 19:26:27 +0800
Message-Id: <20211217112629.12334-5-chiyutianyi@gmail.com>
In-Reply-To: <20211210103435.83656-1-chiyutianyi@gmail.com>
References: <20211210103435.83656-1-chiyutianyi@gmail.com>

From: Han Xin

We used to call "get_data()" in "unpack_non_delta_entry()" to read the entire contents of a blob object, no matter how big it is. This implementation may consume all the memory and cause OOM.

This can be improved by feeding data to "stream_loose_object()" in a stream. The input stream is implemented as an interface.

When streaming a large blob object to "write_loose_object()", we have no chance to run "write_object_file_prepare()" to calculate the oid in advance. So we need to handle an undetermined oid in a new function called "stream_loose_object()".

In "write_loose_object()", we know the oid and can write the temporary file in the same directory as the final object. For an object with an undetermined oid, we do not know the exact directory for the object, so we have to save the temporary file in the ".git/objects/" directory instead.

We will reuse "write_object_file_flags()" in "unpack_non_delta_entry()" to read the entire data contents as a stream, so a new flag "HASH_STREAM" is added. When reading from a stream, we need not prepare the "oid" before "write_loose_object()"; we only generate the header. "freshen_packed_object()" or "freshen_loose_object()" will be called inside "stream_loose_object()" after obtaining the "oid".
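
For illustration only (not part of the patch): the pull-style input stream described above can be sketched as below. This is a sketch under stated assumptions. Only "struct input_stream" mirrors the declaration this patch adds to object-store.h; "struct mem_source", "mem_read()" and "consume_stream()" are hypothetical, and a running byte sum stands in for the hash update and deflate calls.

```c
#include <stddef.h>

/* Same shape as the "struct input_stream" added to object-store.h. */
struct input_stream {
	const void *(*read)(const struct input_stream *, unsigned long *len);
	void *data;
};

/* Hypothetical in-memory source handing out at most "chunk" bytes per call. */
struct mem_source {
	const unsigned char *buf;
	unsigned long size;
	unsigned long pos;
	unsigned long chunk;
};

/* Hypothetical read callback: returns the next chunk and stores its length. */
static const void *mem_read(const struct input_stream *in, unsigned long *len)
{
	struct mem_source *src = in->data;
	const unsigned char *p = src->buf + src->pos;
	unsigned long left = src->size - src->pos;

	*len = left < src->chunk ? left : src->chunk;
	src->pos += *len;
	return p;
}

/*
 * Hypothetical consumer: pulls chunks until "expected" bytes have been
 * seen, so only one chunk is in flight at a time. The byte sum is a
 * placeholder for hashing and compressing each chunk.
 */
static unsigned long consume_stream(const struct input_stream *in,
				    unsigned long expected,
				    unsigned long *calls)
{
	unsigned long total = 0, sum = 0;

	*calls = 0;
	while (total < expected) {
		unsigned long len = 0, i;
		const unsigned char *p = in->read(in, &len);

		(*calls)++;
		if (!len)
			break; /* source ran dry before "expected" bytes */
		for (i = 0; i < len; i++)
			sum += p[i];
		total += len;
	}
	return sum;
}
```

The consumer never needs the whole object in memory; the result over all chunks is only known once the last chunk has been read, which is why the oid cannot be computed before writing starts.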
Helped-by: Ævar Arnfjörð Bjarmason Helped-by: Jiang Xin Signed-off-by: Han Xin --- cache.h | 1 + object-file.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++ object-store.h | 5 +++ 3 files changed, 98 insertions(+) diff --git a/cache.h b/cache.h index cfba463aa9..6d68fd10a3 100644 --- a/cache.h +++ b/cache.h @@ -898,6 +898,7 @@ int ie_modified(struct index_state *, const struct cache_entry *, struct stat *, #define HASH_FORMAT_CHECK 2 #define HASH_RENORMALIZE 4 #define HASH_SILENT 8 +#define HASH_STREAM 16 int index_fd(struct index_state *istate, struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags); int index_path(struct index_state *istate, struct object_id *oid, const char *path, struct stat *st, unsigned flags); diff --git a/object-file.c b/object-file.c index dd29e5372e..2ef1d4fb00 100644 --- a/object-file.c +++ b/object-file.c @@ -1994,6 +1994,88 @@ static int freshen_packed_object(const struct object_id *oid) return 1; } +static int stream_loose_object(struct object_id *oid, char *hdr, int hdrlen, + const struct input_stream *in_stream, + unsigned long len, time_t mtime, unsigned flags) +{ + int fd, ret, err = 0, flush = 0; + unsigned char compressed[4096]; + git_zstream stream; + git_hash_ctx c; + struct object_id parano_oid; + static struct strbuf tmp_file = STRBUF_INIT; + static struct strbuf filename = STRBUF_INIT; + int dirlen; + + /* When oid is not determined, save tmp file to odb path. */ + strbuf_addf(&filename, "%s/", get_object_directory()); + + fd = create_tmpfile(&tmp_file, filename.buf, flags); + if (fd < 0) { + err = -1; + goto cleanup; + } + + /* Set it up and write header */ + setup_stream_and_header(&stream, compressed, sizeof(compressed), + &c, hdr, hdrlen); + + /* Then the data itself.. 
*/ + do { + unsigned char *in0 = stream.next_in; + if (!stream.avail_in) { + const void *in = in_stream->read(in_stream, &stream.avail_in); + stream.next_in = (void *)in; + in0 = (unsigned char *)in; + /* All data has been read. */ + if (len + hdrlen == stream.total_in + stream.avail_in) + flush = Z_FINISH; + } + ret = git_deflate(&stream, flush); + the_hash_algo->update_fn(&c, in0, stream.next_in - in0); + if (write_buffer(fd, compressed, stream.next_out - compressed) < 0) + die(_("unable to write loose object file")); + stream.next_out = compressed; + stream.avail_out = sizeof(compressed); + } while (ret == Z_OK || ret == Z_BUF_ERROR); + + if (ret != Z_STREAM_END) + die(_("unable to deflate new object streamingly (%d)"), ret); + ret = git_deflate_end_gently(&stream); + if (ret != Z_OK) + die(_("deflateEnd on object streamingly failed (%d)"), ret); + the_hash_algo->final_oid_fn(¶no_oid, &c); + + close_loose_object(fd); + + oidcpy(oid, ¶no_oid); + + if (freshen_packed_object(oid) || freshen_loose_object(oid)) { + unlink_or_warn(tmp_file.buf); + goto cleanup; + } + + loose_object_path(the_repository, &filename, oid); + + /* We finally know the object path, and create the missing dir. 
*/ + dirlen = directory_size(filename.buf); + if (dirlen) { + struct strbuf dir = STRBUF_INIT; + strbuf_add(&dir, filename.buf, dirlen - 1); + + if (mkdir_in_gitdir(dir.buf) < 0) { + err = -1; + goto cleanup; + } + } + + err = finalize_object_file_with_mtime(tmp_file.buf, filename.buf, mtime, flags); +cleanup: + strbuf_release(&tmp_file); + strbuf_release(&filename); + return err; +} + int write_object_file_flags(const void *buf, unsigned long len, const char *type, struct object_id *oid, unsigned flags) @@ -2001,6 +2083,16 @@ int write_object_file_flags(const void *buf, unsigned long len, char hdr[MAX_HEADER_LEN]; int hdrlen = sizeof(hdr); + /* When streaming a large blob object (marked as HASH_STREAM), + * we have no chance to run "write_object_file_prepare()" to + * calculate the "oid" in advance. Call "stream_loose_object()" + * to write loose object in stream. + */ + if (flags & HASH_STREAM) { + hdrlen = generate_object_header(hdr, hdrlen, type, len); + return stream_loose_object(oid, hdr, hdrlen, buf, len, 0, flags); + } + /* Normally if we have it in the pack then we do not bother writing * it out into .git/objects/??/?{38} file. 
*/ diff --git a/object-store.h b/object-store.h index 952efb6a4b..4040e2c40a 100644 --- a/object-store.h +++ b/object-store.h @@ -34,6 +34,11 @@ struct object_directory { char *path; }; +struct input_stream { + const void *(*read)(const struct input_stream *, unsigned long *len); + void *data; +}; + KHASH_INIT(odb_path_map, const char * /* key: odb_path */, struct object_directory *, 1, fspathhash, fspatheq) From patchwork Fri Dec 17 11:26:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han Xin X-Patchwork-Id: 12684323 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84B24C433F5 for ; Fri, 17 Dec 2021 11:28:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233593AbhLQL2x (ORCPT ); Fri, 17 Dec 2021 06:28:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56852 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233694AbhLQL2v (ORCPT ); Fri, 17 Dec 2021 06:28:51 -0500 Received: from mail-pg1-x535.google.com (mail-pg1-x535.google.com [IPv6:2607:f8b0:4864:20::535]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 99A95C061746 for ; Fri, 17 Dec 2021 03:28:51 -0800 (PST) Received: by mail-pg1-x535.google.com with SMTP id j11so1823786pgs.2 for ; Fri, 17 Dec 2021 03:28:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=brw6puewM6wIOapjreSySZDw9+q290fiA4aXnkwiQ+A=; b=oBwrmOYgsA9i5VmbduC61BTJYhVtjwkrmHhLstdP9upLEEt3a6y5kcznRmKOhyFJGy ntVx5K8KtgwpL+Ojvb5ntoJhUvdBKOKMAqQWy4Z1KutlO4bJG3fFsOBj3xs7O9vAw3oU GEFPTpJbSAW5RCwt5+OlMxo19ICgttAhaDhAf0tiBvdBu6Jh/SBLBig7NEuFNn2SDO/t 
QD+QKBZHGZm1t1gzl5l6C5aosPgwkn3U4wq8YRLBmdZtIXjBAFOSZqaA1z5Dc52RAq0m O6FHzH8oCwcEOPi+mqlrCQ34gX2qQ48OLkWPbGXJoQYY+548vHc05iKQQw2YFSJEFULz dPLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=brw6puewM6wIOapjreSySZDw9+q290fiA4aXnkwiQ+A=; b=UBv43wZ6tRqzyy5pYC8SLHP45TsIQFoG2jXBtW0qSNd5EeNNg7f4Fs+qNCbAmuJJvj EStT/zpdSdFYEAW70sjgEsnumpyf8YZNuEgsEAk4GbuFFPMTqyTgAuevr78CZ5puftTr 5Nua8eq9ksOS+FP8+0JwFDGyRdczJM+UEDUI49ohYIHPGgVKscbR/NCrJk/kXOLuTPZ6 zlIgs3O5crGkOOH0fd6dSdkE7esFtjrnaOD8E1FntHat9c03MN6n2qi3Wb2E5Yrev/v1 FjOCJoGT6gfiw6EySafuj9VayCU2TT1MGT214bO2DH0pPvb3PDVwPV3CJ0+S30YJgzsB hicg== X-Gm-Message-State: AOAM530wyr70jBsxujkoXfBEUyP7yOzfHv4vXvG2XJIA1ktmehRFz2Ef syFEqUlQCY94ZtXAnmmt2fWyR7QLhEC4bcmD X-Google-Smtp-Source: ABdhPJyM07ExTbTIiDhOGgXrBGcZH3mSvpO4FWiikkjBUSNJZm8fhc70Pu/1Sawd+VXEEmRrXRBzXw== X-Received: by 2002:a63:1926:: with SMTP id z38mr2575595pgl.3.1639740531142; Fri, 17 Dec 2021 03:28:51 -0800 (PST) Received: from localhost.localdomain ([205.204.117.97]) by smtp.gmail.com with ESMTPSA id f10sm5194673pge.33.2021.12.17.03.28.48 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 17 Dec 2021 03:28:50 -0800 (PST) From: Han Xin To: Junio C Hamano , Git List , Jeff King , Jiang Xin , Philip Oakley , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBC?= =?utf-8?b?amFybWFzb24=?= , Derrick Stolee Cc: Han Xin Subject: [PATCH v6 5/6] unpack-objects.c: add dry_run mode for get_data() Date: Fri, 17 Dec 2021 19:26:28 +0800 Message-Id: <20211217112629.12334-6-chiyutianyi@gmail.com> X-Mailer: git-send-email 2.34.1.52.gfcc2252aea.agit.6.5.6 In-Reply-To: <20211210103435.83656-1-chiyutianyi@gmail.com> References: <20211210103435.83656-1-chiyutianyi@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han Xin In dry_run mode, "get_data()" is used to verify the 
inflation of data, and the returned buffer will not be used at all and
will be freed immediately. Even in dry_run mode, it is dangerous to
allocate a full-size buffer for a large blob object. Therefore, allocate
only a small buffer (at most 8192 bytes) when calling "get_data()" in
dry_run mode, and reuse it for each round of inflation.

Suggested-by: Jiang Xin
Signed-off-by: Han Xin
---
 builtin/unpack-objects.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 4a9466295b..c4a17bdb44 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -96,15 +96,21 @@ static void use(int bytes)
 	display_throughput(progress, consumed_bytes);
 }
 
-static void *get_data(unsigned long size)
+static void *get_data(unsigned long size, int dry_run)
 {
 	git_zstream stream;
-	void *buf = xmallocz(size);
+	unsigned long bufsize;
+	void *buf;
 
 	memset(&stream, 0, sizeof(stream));
+	if (dry_run && size > 8192)
+		bufsize = 8192;
+	else
+		bufsize = size;
+	buf = xmallocz(bufsize);
 
 	stream.next_out = buf;
-	stream.avail_out = size;
+	stream.avail_out = bufsize;
 	stream.next_in = fill(1);
 	stream.avail_in = len;
 	git_inflate_init(&stream);
@@ -124,6 +130,11 @@ static void *get_data(unsigned long size)
 		}
 		stream.next_in = fill(1);
 		stream.avail_in = len;
+		if (dry_run) {
+			/* reuse the buffer in dry_run mode */
+			stream.next_out = buf;
+			stream.avail_out = bufsize;
+		}
 	}
 	git_inflate_end(&stream);
 	return buf;
@@ -323,7 +334,7 @@ static void added_object(unsigned nr, enum object_type type,
 static void unpack_non_delta_entry(enum object_type type, unsigned long size,
 				   unsigned nr)
 {
-	void *buf = get_data(size);
+	void *buf = get_data(size, dry_run);
 
 	if (!dry_run && buf)
 		write_object(nr, type, buf, size);
@@ -357,7 +368,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 	if (type == OBJ_REF_DELTA) {
 		oidread(&base_oid, fill(the_hash_algo->rawsz));
 		use(the_hash_algo->rawsz);
-		delta_data = get_data(delta_size);
+		delta_data = get_data(delta_size, dry_run);
 		if (dry_run || !delta_data) {
 			free(delta_data);
 			return;
@@ -396,7 +407,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 		if (base_offset <= 0 || base_offset >= obj_list[nr].offset)
 			die("offset value out of bound for delta base object");
 
-		delta_data = get_data(delta_size);
+		delta_data = get_data(delta_size, dry_run);
 		if (dry_run || !delta_data) {
 			free(delta_data);
 			return;

From patchwork Fri Dec 17 11:26:29 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Han Xin
X-Patchwork-Id: 12684325
From: Han Xin
To: Junio C Hamano, Git List, Jeff King, Jiang Xin, Philip Oakley,
 Ævar Arnfjörð Bjarmason, Derrick Stolee
Cc: Han Xin
Subject: [PATCH v6 6/6] unpack-objects: unpack_non_delta_entry() read data in a stream
Date: Fri, 17 Dec 2021 19:26:29 +0800
Message-Id: <20211217112629.12334-7-chiyutianyi@gmail.com>
X-Mailer: git-send-email 2.34.1.52.gfcc2252aea.agit.6.5.6
In-Reply-To: <20211210103435.83656-1-chiyutianyi@gmail.com>
References: <20211210103435.83656-1-chiyutianyi@gmail.com>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

From: Han Xin

We used to call "get_data()" in "unpack_non_delta_entry()" to read the
entire contents of a blob object, no matter how big it is. This
implementation may consume all the memory and cause OOM.

By implementing a zstream version of the input_stream interface, we can
use a small fixed buffer for "unpack_non_delta_entry()".

However, unpacking non-delta objects from a stream instead of from an
entire buffer has a 10% performance penalty. Therefore, only unpack
objects larger than "core.bigFileStreamingThreshold" through the
zstream. See the following benchmarks:

    hyperfine \
      --setup \
      'if ! test -d scalar.git; then git clone --bare https://github.com/microsoft/scalar.git; cp scalar.git/objects/pack/*.pack small.pack; fi' \
      --prepare 'rm -rf dest.git && git init --bare dest.git'

    Summary
      './git -C dest.git -c core.bigfilethreshold=512m unpack-objects

Helped-by: Derrick Stolee
Helped-by: Jiang Xin
Signed-off-by: Han Xin
---
 Documentation/config/core.txt       | 11 ++++
 builtin/unpack-objects.c            | 73 +++++++++++++++++++++++-
 cache.h                             |  1 +
 config.c                            |  5 ++
 environment.c                       |  1 +
 t/t5590-unpack-non-delta-objects.sh | 87 +++++++++++++++++++++++++++++
 6 files changed, 177 insertions(+), 1 deletion(-)
 create mode 100755 t/t5590-unpack-non-delta-objects.sh

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a..601b7a2418 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -424,6 +424,17 @@ be delta compressed, but larger binary media files won't be.
 +
 Common unit suffixes of 'k', 'm', or 'g' are supported.
 
+core.bigFileStreamingThreshold::
+	Files larger than this will be streamed out to a temporary
+	object file while being hashed, which will then be renamed
+	in-place to a loose object, particularly if the
+	`core.bigFileThreshold` setting dictates that they're always
+	written out as loose objects.
++
+Default is 128 MiB on all platforms.
++
+Common unit suffixes of 'k', 'm', or 'g' are supported.
+
 core.excludesFile::
 	Specifies the pathname to the file that contains patterns to
 	describe paths that are not meant to be tracked, in addition
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index c4a17bdb44..42e1033d85 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -331,11 +331,82 @@ static void added_object(unsigned nr, enum object_type type,
 	}
 }
 
+struct input_zstream_data {
+	git_zstream *zstream;
+	unsigned char buf[8192];
+	int status;
+};
+
+static const void *feed_input_zstream(const struct input_stream *in_stream,
+				      unsigned long *readlen)
+{
+	struct input_zstream_data *data = in_stream->data;
+	git_zstream *zstream = data->zstream;
+	void *in = fill(1);
+
+	if (!len || data->status == Z_STREAM_END) {
+		*readlen = 0;
+		return NULL;
+	}
+
+	zstream->next_out = data->buf;
+	zstream->avail_out = sizeof(data->buf);
+	zstream->next_in = in;
+	zstream->avail_in = len;
+
+	data->status = git_inflate(zstream, 0);
+	use(len - zstream->avail_in);
+	*readlen = sizeof(data->buf) - zstream->avail_out;
+
+	return data->buf;
+}
+
+static void write_stream_blob(unsigned nr, unsigned long size)
+{
+	git_zstream zstream;
+	struct input_zstream_data data;
+	struct input_stream in_stream = {
+		.read = feed_input_zstream,
+		.data = &data,
+	};
+
+	memset(&zstream, 0, sizeof(zstream));
+	memset(&data, 0, sizeof(data));
+	data.zstream = &zstream;
+	git_inflate_init(&zstream);
+
+	if (write_object_file_flags(&in_stream, size,
+				    type_name(OBJ_BLOB),
+				    &obj_list[nr].oid,
+				    HASH_STREAM))
+		die(_("failed to write object in stream"));
+
+	if (zstream.total_out != size || data.status != Z_STREAM_END)
+		die(_("inflate returned %d"), data.status);
+	git_inflate_end(&zstream);
+
+	if (strict) {
+		struct blob *blob = lookup_blob(the_repository, &obj_list[nr].oid);
+		if (blob)
+			blob->object.flags |= FLAG_WRITTEN;
+		else
+			die(_("invalid blob object from stream"));
+	}
+	obj_list[nr].obj = NULL;
+}
+
 static void unpack_non_delta_entry(enum object_type type, unsigned long size,
 				   unsigned nr)
 {
-	void *buf = get_data(size, dry_run);
+	void *buf;
+
+	/* Write large blob in stream without allocating full buffer. */
+	if (!dry_run && type == OBJ_BLOB && size > big_file_streaming_threshold) {
+		write_stream_blob(nr, size);
+		return;
+	}
 
+	buf = get_data(size, dry_run);
 	if (!dry_run && buf)
 		write_object(nr, type, buf, size);
 	else
diff --git a/cache.h b/cache.h
index 6d68fd10a3..976f9cf656 100644
--- a/cache.h
+++ b/cache.h
@@ -975,6 +975,7 @@ extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
 extern unsigned long big_file_threshold;
+extern unsigned long big_file_streaming_threshold;
 extern unsigned long pack_size_limit_cfg;
 
 /*
diff --git a/config.c b/config.c
index c5873f3a70..7b122a142a 100644
--- a/config.c
+++ b/config.c
@@ -1408,6 +1408,11 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.bigfilestreamingthreshold")) {
+		big_file_streaming_threshold = git_config_ulong(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.packedgitlimit")) {
 		packed_git_limit = git_config_ulong(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index 0d06a31024..04bba593de 100644
--- a/environment.c
+++ b/environment.c
@@ -47,6 +47,7 @@ size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
 unsigned long big_file_threshold = 512 * 1024 * 1024;
+unsigned long big_file_streaming_threshold = 128 * 1024 * 1024;
 int pager_use_color = 1;
 const char *editor_program;
 const char *askpass_program;
diff --git a/t/t5590-unpack-non-delta-objects.sh b/t/t5590-unpack-non-delta-objects.sh
new file mode 100755
index 0000000000..11c70e192c
--- /dev/null
+++ b/t/t5590-unpack-non-delta-objects.sh
@@ -0,0 +1,87 @@
+#!/bin/sh
+#
+# Copyright (c) 2021 Han Xin
+#
+
+test_description='Test unpack-objects when receiving a pack'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+prepare_dest () {
+	test_when_finished "rm -rf dest.git" &&
+	git init --bare dest.git &&
+	git -C dest.git config core.bigFileStreamingThreshold $1 &&
+	git -C dest.git config core.bigFileThreshold $1
+}
+
+test_expect_success "setup repo with big blobs (1.5 MB)" '
+	test-tool genrandom foo 1500000 >big-blob &&
+	test_commit --append foo big-blob &&
+	test-tool genrandom bar 1500000 >big-blob &&
+	test_commit --append bar big-blob &&
+	(
+		cd .git &&
+		find objects/?? -type f | sort
+	) >expect &&
+	PACK=$(echo main | git pack-objects --revs test)
+'
+
+test_expect_success 'setup env: GIT_ALLOC_LIMIT to 1MB' '
+	GIT_ALLOC_LIMIT=1m &&
+	export GIT_ALLOC_LIMIT
+'
+
+test_expect_success 'fail to unpack-objects: cannot allocate' '
+	prepare_dest 2m &&
+	test_must_fail git -C dest.git unpack-objects err &&
+	grep "fatal: attempting to allocate" err &&
+	(
+		cd dest.git &&
+		find objects/?? -type f | sort
+	) >actual &&
+	test_file_not_empty actual &&
+	! test_cmp expect actual
+'
+
+test_expect_success 'unpack big object in stream' '
+	prepare_dest 1m &&
+	git -C dest.git unpack-objects actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'unpack big object in stream with existing oids' '
+	prepare_dest 1m &&
+	git -C dest.git index-pack --stdin actual &&
+	test_must_be_empty actual &&
+	git -C dest.git unpack-objects actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'unpack-objects dry-run' '
+	prepare_dest 1m &&
+	git -C dest.git unpack-objects -n actual &&
+	test_must_be_empty actual
+'
+
+test_done
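
For readers who want to experiment with the "input_stream" callback contract outside of git, the following is a minimal standalone sketch, not code from this series. Only the struct layout is taken from the object-store.h hunk; "mem_source", "mem_read", "fnv1a_update" and "hash_stream" are hypothetical names, an in-memory source stands in for the zlib-backed feed_input_zstream(), and FNV-1a stands in for real object hashing. It assumes the same end-of-stream convention as the patch: the callback returns NULL with a zero length.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the struct this series adds to object-store.h. */
struct input_stream {
	const void *(*read)(const struct input_stream *, unsigned long *len);
	void *data;
};

/* Hypothetical in-memory source standing in for the inflating one. */
struct mem_source {
	const unsigned char *src;	/* bytes not handed out yet */
	unsigned long remaining;	/* how many of them are left */
	unsigned char buf[8];		/* deliberately tiny fixed buffer */
};

/* Copy the next chunk into the fixed buffer; NULL with *len == 0 means EOF. */
static const void *mem_read(const struct input_stream *in, unsigned long *len)
{
	struct mem_source *m = in->data;
	unsigned long n = m->remaining;

	if (n > sizeof(m->buf))
		n = sizeof(m->buf);
	if (!n) {
		*len = 0;
		return NULL;
	}
	memcpy(m->buf, m->src, n);
	m->src += n;
	m->remaining -= n;
	*len = n;
	return m->buf;
}

/* 32-bit FNV-1a, used here only as a stand-in for real object hashing. */
static uint32_t fnv1a_update(uint32_t h, const void *p, unsigned long n)
{
	const unsigned char *c = p;

	while (n--) {
		h ^= *c++;
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical consumer: hash the whole stream one chunk at a time. */
static uint32_t hash_stream(const struct input_stream *in,
			    unsigned long *total)
{
	uint32_t h = 2166136261u;
	unsigned long len;
	const void *chunk;

	*total = 0;
	while ((chunk = in->read(in, &len)) != NULL && len) {
		h = fnv1a_update(h, chunk, len);
		*total += len;
	}
	return h;
}
```

A caller would fill in a "struct mem_source", point an "input_stream" at it with ".read = mem_read", and call hash_stream(). The consumer never holds more than one buffer's worth of data at a time, which is the property the streaming path relies on to stay below GIT_ALLOC_LIMIT in the tests above.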