From patchwork Fri Dec 9 21:44:22 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Tan
X-Patchwork-Id: 13070140
Date: Fri, 9 Dec 2022 13:44:22 -0800
X-Mailer: git-send-email 2.39.0.rc1.256.g54fd8350bd-goog
Subject: [PATCH v4 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE
From: Jonathan Tan
To: git@vger.kernel.org
Cc: Jonathan Tan, peff@peff.net, avarab@gmail.com
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

Its last user was removed in 97b2fa08b6 (fetch-pack: drop custom loose
object cache, 2018-11-12), so we can remove it.

Helped-by: Jeff King
Signed-off-by: Jonathan Tan
---
 object-file.c  | 3 ---
 object-store.h | 4 +---
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 26290554bb..cf724bc19b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1575,9 +1575,6 @@ static int do_oid_object_info_extended(struct repository *r,
 		if (find_pack_entry(r, real, &e))
 			break;
 
-		if (flags & OBJECT_INFO_IGNORE_LOOSE)
-			return -1;
-
 		/* Most likely it's a loose object. */
 		if (!loose_object_info(r, real, oi, flags))
 			return 0;
diff --git a/object-store.h b/object-store.h
index 1be57abaf1..b1ec0bde82 100644
--- a/object-store.h
+++ b/object-store.h
@@ -434,13 +434,11 @@ struct object_info {
 #define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
-/* Do not check loose object */
-#define OBJECT_INFO_IGNORE_LOOSE 16
 /*
  * Do not attempt to fetch the object if missing (even if fetch_is_missing is
  * nonzero).
  */
-#define OBJECT_INFO_SKIP_FETCH_OBJECT 32
+#define OBJECT_INFO_SKIP_FETCH_OBJECT 16
 /*
  * This is meant for bulk prefetching of missing blobs in a partial
  * clone. Implies OBJECT_INFO_SKIP_FETCH_OBJECT and OBJECT_INFO_QUICK

From patchwork Fri Dec 9 21:44:23 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Tan
X-Patchwork-Id: 13070141
Date: Fri, 9 Dec 2022 13:44:23 -0800
X-Mailer: git-send-email 2.39.0.rc1.256.g54fd8350bd-goog
Message-ID: <4b2fb687432c2ce1471d9eb02e86b3acc43cc953.1670622176.git.jonathantanmy@google.com>
Subject: [PATCH v4 2/4] object-file: refactor map_loose_object_1()
From: Jonathan Tan
To: git@vger.kernel.org
Cc: Jonathan Tan, peff@peff.net, avarab@gmail.com
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

This function can do 3 things:
 1. Get an fd given a path
 2. Simultaneously get a path and an fd given an OID
 3. Memory-map an fd

Keep 3 (renaming the function accordingly) and inline 1 and 2 into their
respective callers.

Signed-off-by: Jonathan Tan
---
 object-file.c | 50 ++++++++++++++++++++++++--------------------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/object-file.c b/object-file.c
index cf724bc19b..429e3a746d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1211,35 +1211,25 @@ static int quick_has_loose(struct repository *r,
 }
 
 /*
- * Map the loose object at "path" if it is not NULL, or the path found by
- * searching for a loose object named "oid".
+ * Map and close the given loose object fd. The path argument is used for
+ * error reporting.
  */
-static void *map_loose_object_1(struct repository *r, const char *path,
-			     const struct object_id *oid, unsigned long *size)
+static void *map_fd(int fd, const char *path, unsigned long *size)
 {
-	void *map;
-	int fd;
-
-	if (path)
-		fd = git_open(path);
-	else
-		fd = open_loose_object(r, oid, &path);
-	map = NULL;
-	if (fd >= 0) {
-		struct stat st;
+	void *map = NULL;
+	struct stat st;
 
-		if (!fstat(fd, &st)) {
-			*size = xsize_t(st.st_size);
-			if (!*size) {
-				/* mmap() is forbidden on empty files */
-				error(_("object file %s is empty"), path);
-				close(fd);
-				return NULL;
-			}
-			map = xmmap(NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (!fstat(fd, &st)) {
+		*size = xsize_t(st.st_size);
+		if (!*size) {
+			/* mmap() is forbidden on empty files */
+			error(_("object file %s is empty"), path);
+			close(fd);
+			return NULL;
 		}
-		close(fd);
+		map = xmmap(NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0);
 	}
+	close(fd);
 	return map;
 }
 
@@ -1247,7 +1237,12 @@ void *map_loose_object(struct repository *r,
 		       const struct object_id *oid,
 		       unsigned long *size)
 {
-	return map_loose_object_1(r, NULL, oid, size);
+	const char *p;
+	int fd = open_loose_object(r, oid, &p);
+
+	if (fd < 0)
+		return NULL;
+	return map_fd(fd, p, size);
 }
 
 enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
@@ -2789,13 +2784,16 @@ int read_loose_object(const char *path,
 		      struct object_info *oi)
 {
 	int ret = -1;
+	int fd;
 	void *map = NULL;
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	unsigned long *size = oi->sizep;
 
-	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
+	fd = git_open(path);
+	if (fd >= 0)
+		map = map_fd(fd, path, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
 		goto out;

From patchwork Fri Dec 9 21:44:24 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Tan
X-Patchwork-Id: 13070142
Date: Fri, 9 Dec 2022 13:44:24 -0800
X-Mailer: git-send-email 2.39.0.rc1.256.g54fd8350bd-goog
Message-ID: <07d28db92c2c61358755b3d501bc5bd35a760de1.1670622176.git.jonathantanmy@google.com>
Subject: [PATCH v4 3/4] object-file: emit corruption errors when detected
From: Jonathan Tan
To: git@vger.kernel.org
Cc: Jonathan Tan, peff@peff.net, avarab@gmail.com
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

Instead of relying on errno being preserved across function calls, teach
do_oid_object_info_extended() to itself report object corruption when
it first detects it. There are 3 types of corruption being detected:
 - when a replacement object is missing
 - when a loose object is corrupt
 - when a packed object is corrupt and the object cannot be read
   in another way

Note that this patch also removes a check for ENOENT that was
introduced in 3ba7a06552 (A loose object is not corrupt if it cannot be
read due to EMFILE, 2010-10-28). The purpose of that check was to avoid
a false report of corruption if errno contained something like EMFILE
(or anything other than ENOENT), in which case a more generic report
was presented. Because, as of this patch, we no longer rely on such a
heuristic to determine corruption, but surface the error message at the
point where we read something that we did not expect, the check is no
longer necessary.

Besides being more resilient, this also prepares for a future patch in
which an indirect caller of do_oid_object_info_extended() will need
such functionality.

Helped-by: Jeff King
Signed-off-by: Jonathan Tan
---
 object-file.c  | 55 +++++++++++++++++++++++++-------------------------
 object-store.h |  3 +++
 2 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/object-file.c b/object-file.c
index 429e3a746d..2a0df39822 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1422,7 +1422,9 @@ static int loose_object_info(struct repository *r,
 			     struct object_info *oi, int flags)
 {
 	int status = 0;
+	int fd;
 	unsigned long mapsize;
+	const char *path;
 	void *map;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
@@ -1443,7 +1445,6 @@ static int loose_object_info(struct repository *r,
 	 * object even exists.
 	 */
 	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
-		const char *path;
 		struct stat st;
 		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
 			return quick_has_loose(r, oid) ? 0 : -1;
@@ -1454,7 +1455,13 @@ static int loose_object_info(struct repository *r,
 		return 0;
 	}
 
-	map = map_loose_object(r, oid, &mapsize);
+	fd = open_loose_object(r, oid, &path);
+	if (fd < 0) {
+		if (errno != ENOENT)
+			error_errno(_("unable to open loose object %s"), path);
+		return -1;
+	}
+	map = map_fd(fd, path, &mapsize);
 	if (!map)
 		return -1;
 
@@ -1492,6 +1499,10 @@ static int loose_object_info(struct repository *r,
 		break;
 	}
 
+	if (status && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
+		die(_("loose object %s (stored in %s) is corrupt"),
+		    oid_to_hex(oid), path);
+
 	git_inflate_end(&stream);
 cleanup:
 	munmap(map, mapsize);
@@ -1601,6 +1612,15 @@ static int do_oid_object_info_extended(struct repository *r,
 			continue;
 		}
 
+		if (flags & OBJECT_INFO_DIE_IF_CORRUPT) {
+			const struct packed_git *p;
+			if ((flags & OBJECT_INFO_LOOKUP_REPLACE) && !oideq(real, oid))
+				die(_("replacement %s not found for %s"),
+				    oid_to_hex(real), oid_to_hex(oid));
+			if ((p = has_packed_and_bad(r, real)))
+				die(_("packed object %s (stored in %s) is corrupt"),
+				    oid_to_hex(real), p->pack_name);
+		}
 		return -1;
 	}
 
@@ -1653,7 +1673,8 @@ int oid_object_info(struct repository *r,
 
 static void *read_object(struct repository *r,
 			 const struct object_id *oid, enum object_type *type,
-			 unsigned long *size)
+			 unsigned long *size,
+			 int die_if_corrupt)
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	void *content;
@@ -1661,7 +1682,8 @@ static void *read_object(struct repository *r,
 	oi.sizep = size;
 	oi.contentp = &content;
 
-	if (oid_object_info_extended(r, oid, &oi, 0) < 0)
+	if (oid_object_info_extended(r, oid, &oi, die_if_corrupt
+				     ? OBJECT_INFO_DIE_IF_CORRUPT : 0) < 0)
 		return NULL;
 	return content;
 }
@@ -1697,35 +1719,14 @@ void *read_object_file_extended(struct repository *r,
 				int lookup_replace)
 {
 	void *data;
-	const struct packed_git *p;
-	const char *path;
-	struct stat st;
 	const struct object_id *repl = lookup_replace ?
 		lookup_replace_object(r, oid) : oid;
 
 	errno = 0;
-	data = read_object(r, repl, type, size);
+	data = read_object(r, repl, type, size, 1);
 	if (data)
 		return data;
 
-	obj_read_lock();
-	if (errno && errno != ENOENT)
-		die_errno(_("failed to read object %s"), oid_to_hex(oid));
-
-	/* die if we replaced an object with one that does not exist */
-	if (repl != oid)
-		die(_("replacement %s not found for %s"),
-		    oid_to_hex(repl), oid_to_hex(oid));
-
-	if (!stat_loose_object(r, repl, &st, &path))
-		die(_("loose object %s (stored in %s) is corrupt"),
-		    oid_to_hex(repl), path);
-
-	if ((p = has_packed_and_bad(r, repl)))
-		die(_("packed object %s (stored in %s) is corrupt"),
-		    oid_to_hex(repl), p->pack_name);
-	obj_read_unlock();
-
 	return NULL;
 }
 
@@ -2268,7 +2269,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 
 	if (has_loose_object(oid))
 		return 0;
-	buf = read_object(the_repository, oid, &type, &len);
+	buf = read_object(the_repository, oid, &type, &len, 0);
 	if (!buf)
 		return error(_("cannot read object for %s"), oid_to_hex(oid));
 	hdrlen = format_object_header(hdr, sizeof(hdr), type, len);
diff --git a/object-store.h b/object-store.h
index b1ec0bde82..98c1d67946 100644
--- a/object-store.h
+++ b/object-store.h
@@ -445,6 +445,9 @@ struct object_info {
  */
 #define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)
 
+/* Die if object corruption (not just an object being missing) was detected. */
+#define OBJECT_INFO_DIE_IF_CORRUPT 32
+
 int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);

From patchwork Fri Dec 9 21:44:25 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Tan
X-Patchwork-Id: 13070143
Date: Fri, 9 Dec 2022 13:44:25 -0800
X-Mailer: git-send-email 2.39.0.rc1.256.g54fd8350bd-goog
Message-ID: <1a0cd5b244652fc821714380bfd3cb5425388c8b.1670622176.git.jonathantanmy@google.com>
Subject: [PATCH v4 4/4] commit: don't lazy-fetch commits
From: Jonathan Tan
To: git@vger.kernel.org
Cc: Jonathan Tan, peff@peff.net, avarab@gmail.com, Junio C Hamano
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

When parsing a commit, fail fast when the commit is missing or corrupt,
instead of attempting to fetch it. This is done by inlining
repo_read_object_file() and setting the flag that prevents fetching.

This is motivated by a situation in which, through a bug (not
necessarily in Git), there was corruption in the object store of a
partial clone. In this particular case, the problem was exposed when
"git gc" tried to expire reflogs, which calls repo_parse_commit(), which
triggers fetches of the missing commits.

(There are other possible solutions to this problem, including passing
an argument from "git gc" to "git reflog" to inhibit all lazy fetches,
but I think that such a fix would be at the wrong level: fixing
"git reflog" means that this particular command works fine, or so we
think (it will fail if it somehow needs to read a legitimately missing
blob, say, a .gitmodules file), but fixing repo_parse_commit() will fix
a whole class of bugs.)

Signed-off-by: Jonathan Tan
Signed-off-by: Junio C Hamano
---
 commit.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index 572301b80a..a02723f06b 100644
--- a/commit.c
+++ b/commit.c
@@ -508,6 +508,17 @@ int repo_parse_commit_internal(struct repository *r,
 	enum object_type type;
 	void *buffer;
 	unsigned long size;
+	struct object_info oi = {
+		.typep = &type,
+		.sizep = &size,
+		.contentp = &buffer,
+	};
+	/*
+	 * Git does not support partial clones that exclude commits, so set
+	 * OBJECT_INFO_SKIP_FETCH_OBJECT to fail fast when an object is missing.
+	 */
+	int flags = OBJECT_INFO_LOOKUP_REPLACE | OBJECT_INFO_SKIP_FETCH_OBJECT |
+		OBJECT_INFO_DIE_IF_CORRUPT;
 	int ret;
 
 	if (!item)
@@ -516,8 +527,8 @@ int repo_parse_commit_internal(struct repository *r,
 		return 0;
 	if (use_commit_graph && parse_commit_in_graph(r, item))
 		return 0;
-	buffer = repo_read_object_file(r, &item->object.oid, &type, &size);
-	if (!buffer)
+
+	if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));
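
Reader's note: the way the flags combined in this last patch affect a lookup can be summarised in a small standalone model. This is only a sketch under stated assumptions, not Git code. The flag values mirror object-store.h as it stands after this series, except that OBJECT_INFO_LOOKUP_REPLACE being 1 is an assumption, since its definition is not part of the diff. The names enum object_state, enum lookup_outcome and model_lookup() are hypothetical and exist only for illustration. The model assumes a partial clone with a promisor remote, follows the loose-object path, and ignores replace refs.

```c
/* Flag values as defined in object-store.h after this series. */
#define OBJECT_INFO_LOOKUP_REPLACE 1 /* assumed value; not shown in the diff */
#define OBJECT_INFO_SKIP_FETCH_OBJECT 16
#define OBJECT_INFO_DIE_IF_CORRUPT 32

/* Hypothetical: the state of one object in the local object store. */
enum object_state { OBJ_PRESENT, OBJ_MISSING, OBJ_CORRUPT };

/* Hypothetical: what a lookup ends up doing for that object. */
enum lookup_outcome {
	LOOKUP_OK,         /* object read successfully */
	LOOKUP_FAIL,       /* lookup returns -1 to the caller */
	LOOKUP_LAZY_FETCH, /* lookup asks the promisor remote for the object */
	LOOKUP_DIE         /* lookup reports corruption and dies */
};

/*
 * Simplified model of a lookup in a partial clone. Corruption is
 * reported at the point of detection when OBJECT_INFO_DIE_IF_CORRUPT is
 * set; otherwise a corrupt object is handled like a missing one.
 */
static enum lookup_outcome model_lookup(enum object_state state, unsigned flags)
{
	if (state == OBJ_PRESENT)
		return LOOKUP_OK;
	if (state == OBJ_CORRUPT && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
		return LOOKUP_DIE;
	if (flags & OBJECT_INFO_SKIP_FETCH_OBJECT)
		return LOOKUP_FAIL;
	return LOOKUP_LAZY_FETCH;
}
```

With the flags that repo_parse_commit_internal() passes above (OBJECT_INFO_LOOKUP_REPLACE | OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_DIE_IF_CORRUPT), the model gives LOOKUP_FAIL for a missing commit and LOOKUP_DIE for a corrupt one, so neither case reaches the lazy fetch. With flags of 0, both cases would attempt a fetch, which is the behaviour the commit message describes as the original problem.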