From patchwork Fri Aug 23 22:46:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776096 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 265D41494D6 for ; Fri, 23 Aug 2024 22:46:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453207; cv=none; b=RUrSbv4BBz2U7O6S3YIaCs9RYw2dVjwUlUMHU1X57chOfOFFxKYEkcdF+5ag0P8/Eelo9dFm9ETIZJW5e9S6sGP7Mgtjy9gPjAUbDbyhrZCdCXeBE/oCgzHdrSth/EIlhlimjshZTMhR/X9xgHrv6W86eDR5MXDI9hU9zwtBoCA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453207; c=relaxed/simple; bh=SArNgN2UzZkvQadUCzwbYY3hHG/wMQSy2RtMHHcigeo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rFHR4xnFQdvySHlbbr4oUhVarCEl4ZEfrbkr6tF46dlpPk0RRp62UYXyBFoYClVYaXdRw8d+HnpcAs3mKb5Y3529iID2ZxllzBls9MwTvycNUo3xoDvOy+0stcLYiSyxAv1RsWu7zMoNqBoNZon1IQt2NsiX6Q24tE66pG+QvrM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b=0azNF5v+; arc=none smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b="0azNF5v+" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 3A0DD1F47C; Fri, 23 Aug 2024 22:46:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=80x24.org; s=selector1; t=1724453191; bh=SArNgN2UzZkvQadUCzwbYY3hHG/wMQSy2RtMHHcigeo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=0azNF5v+uGWW9adkzRuIn9QgWNQkooPZFEyD9q1jXMKMkb729SjwkJ9VvuPPXwRY0 9yskjXfeIEsJF5pMZi2jkWCGmAKf16Yeh87gaNwV4EzHgouRSJDqhsOVKugKIS9IAC QGBcuPvtt3VJO8YZaqQ+HvHjH++KHWIbI32vRz3o= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 01/10] packfile: move sizep computation Date: Fri, 23 Aug 2024 22:46:21 +0000 Message-ID: <20240823224630.1180772-2-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff King Moving the sizep computation now makes the next commit, which avoids redundant object info lookups, easier to understand. There is no user-visible change here. [ew: commit message] Signed-off-by: Jeff King Signed-off-by: Eric Wong --- packfile.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/packfile.c b/packfile.c index 813584646f..4028763947 100644 --- a/packfile.c +++ b/packfile.c @@ -1536,24 +1536,24 @@ int packed_object_info(struct repository *r, struct packed_git *p, type = OBJ_BAD; } else { type = unpack_object_header(p, &w_curs, &curpos, &size); - } - if (!oi->contentp && oi->sizep) { - if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) { - off_t tmp_pos = curpos; - off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos, - type, obj_offset); - if (!base_offset) { - type = OBJ_BAD; - goto out; + if (oi->sizep) { + if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) { + off_t tmp_pos = curpos; + off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos, + type, obj_offset); + if (!base_offset) { + type = OBJ_BAD; + goto out; + } + *oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos); + if (*oi->sizep == 0) { + type = OBJ_BAD; + 
goto out; + } + } else { + *oi->sizep = size; } - *oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos); - if (*oi->sizep == 0) { - type = OBJ_BAD; - goto out; - } - } else { - *oi->sizep = size; } } From patchwork Fri Aug 23 22:46:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776097 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4DB091494D6 for ; Fri, 23 Aug 2024 22:46:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453214; cv=none; b=XgCHF21rsXySIO2zXnTvRipl+XdWN2Zx1HYT2LuwbutDpKCKOry/Vx7olafvOG8WSKy+4UlT2KStdTp5a885ZhrYYsTqohe/TUaQUgPfmn5ATE1o1n3Rk9XAj9bMgNbwjshXYaChkkeQhbq7bVsch4672sH8E2/1v0FSQ4BWpBg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453214; c=relaxed/simple; bh=IEJVhlQKubt6dC8wtYYqpGri8d+i72lLqxRL/avBFaY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Uai1OMxitC0ZUeAsuOYa7YTS3rdGZvdH5813DwXmdedAOBzUVszpnzUEcYAWVVZjXoIq8g28DTbgDWZTS2cD6s6XV/0eTJ4AZBq+7U57Lsw1RV4qCUhfSX0WAf4CwfX7M0Ml9l1VOIEw4oke4sACc64eoydhAlFHO4REl+eHF+o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b=dZ/Z7DSK; arc=none smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org 
header.i=@80x24.org header.b="dZ/Z7DSK" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 6C1FE1F513; Fri, 23 Aug 2024 22:46:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1724453191; bh=IEJVhlQKubt6dC8wtYYqpGri8d+i72lLqxRL/avBFaY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dZ/Z7DSKAhP9fDrSB69sc8F0sOOoHzp28gGNF+/7gLo6X04gARqHKVEm0n9WMo4eW 0uuRHCsPzM6lCGzdxsWBHDM7uleT2hWOVItfo2l1pKjq22slahzLfozX0vxbbboE2N 5PpvOkVRr/phvgKuy4ieP5FGArAiE7qy8S4QV2gE= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 02/10] packfile: allow content-limit for cat-file Date: Fri, 23 Aug 2024 22:46:22 +0000 Message-ID: <20240823224630.1180772-3-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff King Avoid unnecessary round trips to the object store to speed up cat-file contents retrievals. The majority of packed objects don't benefit from the streaming interface at all and we end up having to load them in core anyways to satisfy our streaming API. This drops the runtime of `git cat-file --batch-all-objects --unordered --batch' on git.git from ~7.1s to ~6.1s on Jeff's machine. 
[ew: commit message] Signed-off-by: Jeff King Signed-off-by: Eric Wong --- builtin/cat-file.c | 17 +++++++++++++++-- object-file.c | 6 ++++++ object-store-ll.h | 1 + packfile.c | 13 ++++++++++++- 4 files changed, 34 insertions(+), 3 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 18fe58d6b8..bc4bb89610 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -280,6 +280,7 @@ struct expand_data { off_t disk_size; const char *rest; struct object_id delta_base_oid; + void *content; /* * If mark_query is true, we do not expand anything, but rather @@ -383,7 +384,10 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d assert(data->info.typep); - if (data->type == OBJ_BLOB) { + if (data->content) { + batch_write(opt, data->content, data->size); + FREE_AND_NULL(data->content); + } else if (data->type == OBJ_BLOB) { if (opt->buffer_output) fflush(stdout); if (opt->transform_mode) { @@ -801,9 +805,18 @@ static int batch_objects(struct batch_options *opt) /* * If we are printing out the object, then always fill in the type, * since we will want to decide whether or not to stream. + * + * Likewise, grab the content in the initial request if it's small + * and we're not planning to filter it. 
*/ - if (opt->batch_mode == BATCH_MODE_CONTENTS) + if (opt->batch_mode == BATCH_MODE_CONTENTS) { data.info.typep = &data.type; + if (!opt->transform_mode) { + data.info.sizep = &data.size; + data.info.contentp = &data.content; + data.info.content_limit = big_file_threshold; + } + } if (opt->all_objects) { struct object_cb_data cb; diff --git a/object-file.c b/object-file.c index 065103be3e..1cc29c3c58 100644 --- a/object-file.c +++ b/object-file.c @@ -1492,6 +1492,12 @@ static int loose_object_info(struct repository *r, if (!oi->contentp) break; + if (oi->content_limit && *oi->sizep > oi->content_limit) { + git_inflate_end(&stream); + oi->contentp = NULL; + goto cleanup; + } + *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid); if (*oi->contentp) goto cleanup; diff --git a/object-store-ll.h b/object-store-ll.h index c5f2bb2fc2..b71a15f590 100644 --- a/object-store-ll.h +++ b/object-store-ll.h @@ -289,6 +289,7 @@ struct object_info { struct object_id *delta_base_oid; struct strbuf *type_name; void **contentp; + size_t content_limit; /* Response */ enum { diff --git a/packfile.c b/packfile.c index 4028763947..c12a0515b3 100644 --- a/packfile.c +++ b/packfile.c @@ -1529,7 +1529,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, * We always get the representation type, but only convert it to * a "real" type later if the caller is interested. 
*/ - if (oi->contentp) { + if (oi->contentp && !oi->content_limit) { *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep, &type); if (!*oi->contentp) @@ -1555,6 +1555,17 @@ int packed_object_info(struct repository *r, struct packed_git *p, *oi->sizep = size; } } + + if (oi->contentp) { + if (oi->sizep && *oi->sizep < oi->content_limit) { + *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, + oi->sizep, &type); + if (!*oi->contentp) + type = OBJ_BAD; + } else { + *oi->contentp = NULL; + } + } } if (oi->disk_sizep) { From patchwork Fri Aug 23 22:46:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776098 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DE5271494D6 for ; Fri, 23 Aug 2024 22:46:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453221; cv=none; b=Jix5czwYggxFIqG4XD6h+cTKentRlKSJIcAoCsLzwv7RFdcCN9VwQiXbga55MX1MRKm0UZTsZI50/aynXLW+E4hF5D8mRBzYyJPa6JF/WToWfEkYMzvn4Z88J0IOHdjtyCfriQaUJxLdtuS81kjLKF2UI1QsZRlaNHPM+LCopw8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453221; c=relaxed/simple; bh=0uSYoHBRtgj/qBAnm1Qlgx5WLjkFSKGVnkr45CCe6uA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VpvFQJdi9O8i2LG7rm/aKGkmfuEV8J82mFCOJ80IcgHJjERp2kOutifzwDQSqgI5lPkbsJ4WeDVPIms89Gc3DK4i16aQu9EQpiu3+hVfPmHUH0bwg46LW7cynqbt1+dpzL6huM5wKsGvDGlm5qRFYLAz0f/Eh1NhcSz0Ebuk7nA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org 
header.b=LZLbjYZK; arc=none smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b="LZLbjYZK" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id A1DEE1F518; Fri, 23 Aug 2024 22:46:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1724453191; bh=0uSYoHBRtgj/qBAnm1Qlgx5WLjkFSKGVnkr45CCe6uA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LZLbjYZKYkMzFi7xL1PPwrdt8Vw0uVtmT/+r8vj3lQiTqVvjYE9Q5k9r7On/KNEvv koJ+b7PmYwRLvcNRmrNGUSzOe0FBDDhVQG1W4Fnm/p8cdCbYPDxH04/FXpzgOvH3dN zzMeysPgl1WZTGG+/U+oG7uegZdoiAqfkBc5yRDg= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 03/10] packfile: fix off-by-one in content_limit comparison Date: Fri, 23 Aug 2024 22:46:23 +0000 Message-ID: <20240823224630.1180772-4-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0

object-file.c::loose_object_info() accepts objects matching content_limit exactly, so it follows that packfile handling should match loose object handling and also slurp objects whose size matches the content_limit exactly.

This change is merely for consistency with the majority of existing code and there is no user-visible change in nearly all cases. The only exception is the corner case where the object size matches content_limit exactly, in which case users will see a speedup from avoiding an extra lookup.
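
As a rough illustration of the boundary being fixed (a minimal sketch, not code from this series: within_content_limit() is a hypothetical helper, and only the `<=' comparison mirrors the patch), the inclusive check behaves like this:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of packfile.c: returns 1 when an object
 * of `size' bytes may be slurped into memory under `content_limit',
 * using the inclusive comparison this patch switches to.
 */
static int within_content_limit(unsigned long size, size_t content_limit)
{
	return size <= content_limit;
}
```

With a limit of 100, sizes 99 and 100 are slurped, and 101 is left for the caller to stream.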
Signed-off-by: Eric Wong --- packfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packfile.c b/packfile.c index c12a0515b3..8ec86d2d69 100644 --- a/packfile.c +++ b/packfile.c @@ -1557,7 +1557,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, } if (oi->contentp) { - if (oi->sizep && *oi->sizep < oi->content_limit) { + if (oi->sizep && *oi->sizep <= oi->content_limit) { *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep, &type); if (!*oi->contentp) From patchwork Fri Aug 23 22:46:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776099 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 122B11494D6 for ; Fri, 23 Aug 2024 22:47:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453228; cv=none; b=b6150xlddYb/hCS+B+D7YRps3UazH6fq9uD9rOfxUwFxLNMQa9Pnd0dUD8EOPtdeuOdrqmoDNkvFH7A1aBQeUAx2dEYoaGCyTqxVj4IfAcEO2+REiiuMHiCAUj1L4RYOuA8LhqYLz+3I+1z0LyK/P4zL02ZNlUDnn9yqDYj3TO0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453228; c=relaxed/simple; bh=ruXvqtXfHI9fbe8aylVbfYydgcKGcefYxmqgx+bQKAU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RqkM1nyKD+K/eUhtjzQsjrLClIrgWEQt7wxh6qnXeR7LP/NWfmk/HoadFyd1zPsCtjE/nMGYVi/MRbDPBQ/+G1YcdOVTGvC8qUrbEo50IN0Y87XeTnetxtBt83UC9Ubh1znNoHuvcvM4G+xmqHG79tRbgNQV02qSb7pYuE5ybO4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b=bJElrAGV; arc=none 
smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b="bJElrAGV" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id D101A1F51A; Fri, 23 Aug 2024 22:46:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1724453191; bh=ruXvqtXfHI9fbe8aylVbfYydgcKGcefYxmqgx+bQKAU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=bJElrAGV2xlJbGjHXAMk+4yOP9h3vDFXGyiE7iDWaNQ8ccYQD4rnE9Dpgpd42uu4e b4/787v/6pIX93xQKJZMm72980bonk/CJOA4PECv8wS4QIePyRF/nON6f3Rev+wV5b LPOdbMBKuoDaPdVEB7SpVIWK2qHLR8dI5f5LCuOA= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 04/10] packfile: inline cache_or_unpack_entry Date: Fri, 23 Aug 2024 22:46:24 +0000 Message-ID: <20240823224630.1180772-5-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 We need to check delta_base_cache anyways to fill in the `whence' field in `struct object_info'. Inlining (and getting rid of) cache_or_unpack_entry() makes it easier to only do the hashmap lookup once and avoid a redundant lookup later on. This code reorganization will also make an optimization to use the cache entry directly easier to implement in the next commit. 
Signed-off-by: Eric Wong --- packfile.c | 48 +++++++++++++++++++++--------------------------- 1 file changed, 21 insertions(+), 27 deletions(-) diff --git a/packfile.c b/packfile.c index 8ec86d2d69..0a90a5ed67 100644 --- a/packfile.c +++ b/packfile.c @@ -1444,23 +1444,6 @@ static void detach_delta_base_cache_entry(struct delta_base_cache_entry *ent) free(ent); } -static void *cache_or_unpack_entry(struct repository *r, struct packed_git *p, - off_t base_offset, unsigned long *base_size, - enum object_type *type) -{ - struct delta_base_cache_entry *ent; - - ent = get_delta_base_cache_entry(p, base_offset); - if (!ent) - return unpack_entry(r, p, base_offset, type, base_size); - - if (type) - *type = ent->type; - if (base_size) - *base_size = ent->size; - return xmemdupz(ent->data, ent->size); -} - static inline void release_delta_base_cache(struct delta_base_cache_entry *ent) { free(ent->data); @@ -1521,20 +1504,35 @@ int packed_object_info(struct repository *r, struct packed_git *p, off_t obj_offset, struct object_info *oi) { struct pack_window *w_curs = NULL; - unsigned long size; off_t curpos = obj_offset; enum object_type type; + struct delta_base_cache_entry *ent; /* * We always get the representation type, but only convert it to * a "real" type later if the caller is interested. 
*/ - if (oi->contentp && !oi->content_limit) { - *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep, - &type); + oi->whence = OI_PACKED; + ent = get_delta_base_cache_entry(p, obj_offset); + if (ent) { + oi->whence = OI_DBCACHED; + type = ent->type; + if (oi->sizep) + *oi->sizep = ent->size; + if (oi->contentp) { + if (!oi->content_limit || + ent->size <= oi->content_limit) + *oi->contentp = xmemdupz(ent->data, ent->size); + else + *oi->contentp = NULL; /* caller must stream */ + } + } else if (oi->contentp && !oi->content_limit) { + *oi->contentp = unpack_entry(r, p, obj_offset, &type, + oi->sizep); if (!*oi->contentp) type = OBJ_BAD; } else { + unsigned long size; type = unpack_object_header(p, &w_curs, &curpos, &size); if (oi->sizep) { @@ -1558,8 +1556,8 @@ int packed_object_info(struct repository *r, struct packed_git *p, if (oi->contentp) { if (oi->sizep && *oi->sizep <= oi->content_limit) { - *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, - oi->sizep, &type); + *oi->contentp = unpack_entry(r, p, obj_offset, + &type, oi->sizep); if (!*oi->contentp) type = OBJ_BAD; } else { @@ -1608,10 +1606,6 @@ int packed_object_info(struct repository *r, struct packed_git *p, } else oidclr(oi->delta_base_oid, the_repository->hash_algo); } - - oi->whence = in_delta_base_cache(p, obj_offset) ? 
OI_DBCACHED : - OI_PACKED; - out: unuse_pack(&w_curs); return type; From patchwork Fri Aug 23 22:46:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776100 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DCE68185B72 for ; Fri, 23 Aug 2024 22:47:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453235; cv=none; b=gOZnS6fZAm3IAO8usBmEWVYBk+djQJvVaiIRIYhLWfgXANl41O/KNo2wp0tGSjbIBvYX5VRJ9oZWl4skEFge4znDbpwlRj7tBJ8Z/ynHOPdMtgIZMtZmS0OvktW7nDJtfLcDNu7havWF7fyStcdSzIf9Q8cmDDOcK+Plp/Zzzj4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453235; c=relaxed/simple; bh=6OVRRV5hJacNEh3iVoby8LVAWThACnoCmwApPeqpLWI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Bo5aeEJn1e+UzgPZaHrWt7uVlg8V8oadhGvbFml+AuXLl0lNp4OeTaAsZJt5bzEsUnrgtxeAcxoLFAOCETQPwi8WqMV4vGPNi3od/Mcnyee5bp0EcDFG+DrnhfwlWrYMIi/mnfOtnha7AeFj09iftt46nZQbi+av4Rlv7aNB4Ys= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b=P+Vi2X0W; arc=none smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b="P+Vi2X0W" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 0FF8D1F51B; Fri, 
23 Aug 2024 22:46:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1724453192; bh=6OVRRV5hJacNEh3iVoby8LVAWThACnoCmwApPeqpLWI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=P+Vi2X0WHpgYO/TRd8NIeKfqP1pGabAkh9gcsb1UKDV5Us55H9OTDdddB22B0Bhux ARHuc6OWATAtlv5QcB2U4LbiSzSQnhrh0Su94u4jXUuW1m8SdRuzk98XzFfipm+3EP CaboVW5tUe+xVOQ/eFuUzS5Trah/sAj6vBbxooGM= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 05/10] cat-file: use delta_base_cache entries directly Date: Fri, 23 Aug 2024 22:46:25 +0000 Message-ID: <20240823224630.1180772-6-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 For objects already in the delta_base_cache, we can safely use one entry at-a-time directly to avoid the malloc+memcpy+free overhead. For a 1MB delta base object, this eliminates the speed penalty of duplicating large objects into memory and speeds up those 1MB delta base cached content retrievals by roughly 30%. While only 2-7% of objects are delta bases in repos I've looked at, this avoids up to 96MB of duplicated memory in the worst case with the default git config. The new delta_base_cache_lock is a simple single-threaded assertion to ensure cat-file (and similar) is the exclusive user of the delta_base_cache. In other words, we cannot have diff or similar commands using two or more entries directly from the delta base cache. The new lock has nothing to do with parallel access via multiple threads at the moment. 
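
A minimal standalone sketch of the single-borrower assertion described above. The names cache_lock, lock_cache(), unlock_cache() and borrow_entry() are hypothetical stand-ins, not the identifiers used in packfile.c; only the increment-then-assert pattern is taken from this patch:

```c
#include <assert.h>

/* Hypothetical stand-in for the delta_base_cache_lock counter. */
static int cache_lock;

static void lock_cache(void)
{
	cache_lock++;
	/* A second direct user while the first is active trips this. */
	assert(cache_lock == 1);
}

static void unlock_cache(void)
{
	cache_lock--;
	assert(cache_lock == 0);
}

/*
 * Hand out the cached buffer itself instead of a copy.  The caller
 * must call unlock_cache() before anything may modify the cache.
 */
static const char *borrow_entry(const char *cached)
{
	lock_cache();
	return cached;
}
```

Borrowing a second entry before releasing the first would abort via assert(), which is the misuse (e.g. a diff-like caller holding two entries) the commit message warns about.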
Signed-off-by: Eric Wong --- builtin/cat-file.c | 16 +++++++++++++++- object-file.c | 5 +++++ object-store-ll.h | 8 ++++++++ packfile.c | 33 ++++++++++++++++++++++++++++++--- packfile.h | 4 ++++ 5 files changed, 62 insertions(+), 4 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index bc4bb89610..8debcdca3e 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -386,7 +386,20 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d if (data->content) { batch_write(opt, data->content, data->size); - FREE_AND_NULL(data->content); + switch (data->info.whence) { + case OI_CACHED: + /* + * only blame uses OI_CACHED atm, so it's unlikely + * we'll ever hit this path + */ + BUG("TODO OI_CACHED support not done"); + case OI_LOOSE: + case OI_PACKED: + FREE_AND_NULL(data->content); + break; + case OI_DBCACHED: + unlock_delta_base_cache(); + } } else if (data->type == OBJ_BLOB) { if (opt->buffer_output) fflush(stdout); @@ -815,6 +828,7 @@ static int batch_objects(struct batch_options *opt) data.info.sizep = &data.size; data.info.contentp = &data.content; data.info.content_limit = big_file_threshold; + data.info.direct_cache = 1; } } diff --git a/object-file.c b/object-file.c index 1cc29c3c58..19100e823d 100644 --- a/object-file.c +++ b/object-file.c @@ -1586,6 +1586,11 @@ static int do_oid_object_info_extended(struct repository *r, oidclr(oi->delta_base_oid, the_repository->hash_algo); if (oi->type_name) strbuf_addstr(oi->type_name, type_name(co->type)); + /* + * Currently `blame' is the only command which creates + * OI_CACHED, and direct_cache is only used by `cat-file'. 
+ */ + assert(!oi->direct_cache); if (oi->contentp) *oi->contentp = xmemdupz(co->buf, co->size); oi->whence = OI_CACHED; diff --git a/object-store-ll.h b/object-store-ll.h index b71a15f590..669bb93784 100644 --- a/object-store-ll.h +++ b/object-store-ll.h @@ -298,6 +298,14 @@ struct object_info { OI_PACKED, OI_DBCACHED } whence; + + /* + * Set if caller is able to use OI_DBCACHED entries without copying. + * This only applies to OI_DBCACHED entries at the moment, + * not OI_CACHED or any other type of entry. + */ + unsigned direct_cache:1; + union { /* * struct { diff --git a/packfile.c b/packfile.c index 0a90a5ed67..40c6c2e387 100644 --- a/packfile.c +++ b/packfile.c @@ -1362,6 +1362,14 @@ static enum object_type packed_to_object_type(struct repository *r, static struct hashmap delta_base_cache; static size_t delta_base_cached; +/* + * Ensures only a single object is used at-a-time via oi->direct_cache. + * Using two objects directly at once (e.g. diff) would cause corruption + * since populating the cache may invalidate existing entries. + * This lock has nothing to do with parallelism at the moment. 
+ */ +static int delta_base_cache_lock; + static LIST_HEAD(delta_base_cache_lru); struct delta_base_cache_key { @@ -1444,6 +1452,18 @@ static void detach_delta_base_cache_entry(struct delta_base_cache_entry *ent) free(ent); } +static void lock_delta_base_cache(void) +{ + delta_base_cache_lock++; + assert(delta_base_cache_lock == 1); +} + +void unlock_delta_base_cache(void) +{ + delta_base_cache_lock--; + assert(delta_base_cache_lock == 0); +} + static inline void release_delta_base_cache(struct delta_base_cache_entry *ent) { free(ent->data); @@ -1453,6 +1473,7 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent) void clear_delta_base_cache(void) { struct list_head *lru, *tmp; + assert(!delta_base_cache_lock); list_for_each_safe(lru, tmp, &delta_base_cache_lru) { struct delta_base_cache_entry *entry = list_entry(lru, struct delta_base_cache_entry, lru); @@ -1466,6 +1487,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset, struct delta_base_cache_entry *ent; struct list_head *lru, *tmp; + assert(!delta_base_cache_lock); /* * Check required to avoid redundant entries when more than one thread * is unpacking the same object, in unpack_entry() (since its phases I @@ -1520,11 +1542,16 @@ int packed_object_info(struct repository *r, struct packed_git *p, if (oi->sizep) *oi->sizep = ent->size; if (oi->contentp) { - if (!oi->content_limit || - ent->size <= oi->content_limit) + /* ignore content_limit if avoiding copy from cache */ + if (oi->direct_cache) { + lock_delta_base_cache(); + *oi->contentp = ent->data; + } else if (!oi->content_limit || + ent->size <= oi->content_limit) { *oi->contentp = xmemdupz(ent->data, ent->size); - else + } else { *oi->contentp = NULL; /* caller must stream */ + } } } else if (oi->contentp && !oi->content_limit) { *oi->contentp = unpack_entry(r, p, obj_offset, &type, diff --git a/packfile.h b/packfile.h index eb18ec15db..94941bbe80 100644 --- a/packfile.h +++ b/packfile.h @@ -210,4 
+210,8 @@ int is_promisor_object(const struct object_id *oid); int load_idx(const char *path, const unsigned int hashsz, void *idx_map, size_t idx_size, struct packed_git *p); +/* + * release lock acquired via oi->direct_cache + */ +void unlock_delta_base_cache(void); #endif From patchwork Fri Aug 23 22:46:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776101 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 98D5A185B72 for ; Fri, 23 Aug 2024 22:47:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453241; cv=none; b=JEa5btVl4RtSS4kaCaArSd3L5YVkoJ5ysDIR6+C4Ad9Qg4IyYWdBkr6tc0VH7DRa3F4EkDfpdKK3/ipOV+92i9RqJTVhCeh1k4trxHHQFqHrMVnmPuZxma7NB6oGdFWuN/ihhE6JJJfMUQNSlbi5wFYl4cIraINZ7W5Ys1Sfjew= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453241; c=relaxed/simple; bh=zb2xoGXnokuGuwwvXQvRs+zGHt7IWeuNo1s76QQNlNQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=AezbtQznJfFjtBRht/9onTekHSO/YETmENx5TbDyInz6SWigl1HGgsqwJ7xZN42fZ7K/LLT7WO27ThU0YFAxkBA8J8zPrcIsmWPdusnge48pSCDWl8EFhoUK1Xdd0zmdeN9apwXPkDS4piMSlWQLl6oFcz7Gb/zWIjAjaPUvtDY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b=aef2FtuY; arc=none smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b="aef2FtuY" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 4056D1F543; Fri, 23 Aug 2024 22:46:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1724453192; bh=zb2xoGXnokuGuwwvXQvRs+zGHt7IWeuNo1s76QQNlNQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=aef2FtuYMxaf0+88QeMLBe11+ZUIjykwu8oh9/K0y4QZqYHxM1cOzXh9EMYPhiMav d1XkbDWbagV6MS13R99/M0PcBbXR8B89QqLkp4+MVmcOrFy62Zk8r1WgRzU7h4C1c1 yfUOlSWa7++u0T0Y5LYDV6wrjrSuiitr+5NecrJk= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 06/10] packfile: packed_object_info avoids packed_to_object_type Date: Fri, 23 Aug 2024 22:46:26 +0000 Message-ID: <20240823224630.1180772-7-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 For entries in the delta base cache, packed_to_object_type calls can be omitted. This prepares us to bypass content_limit for non-blob types in the following commit. 
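
To illustrate the idea (a sketch under assumptions: the enum, resolve_slowly() and final_type_for() are made up for this example and merely stand in for the real object types and packed_to_object_type()), the expensive resolution only runs when no final type is already known:

```c
#include <assert.h>

/* Hypothetical object types; a negative value means "not known yet". */
enum sketch_type { SK_BAD = -1, SK_COMMIT = 1, SK_BLOB = 3 };

/* Counts how often the expensive path is taken. */
static int slow_resolutions;

/* Stand-in for the costly delta-chain walk. */
static enum sketch_type resolve_slowly(void)
{
	slow_resolutions++;
	return SK_BLOB;
}

/* Reuse an already-known type; fall back to the slow path otherwise. */
static enum sketch_type final_type_for(enum sketch_type known)
{
	enum sketch_type final_type = known;

	if (final_type < 0)
		final_type = resolve_slowly();
	return final_type;
}
```

A type taken from a cache entry is returned as-is, while an unknown (negative) type triggers exactly one slow resolution.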
Signed-off-by: Eric Wong --- packfile.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/packfile.c b/packfile.c index 40c6c2e387..94d20034e4 100644 --- a/packfile.c +++ b/packfile.c @@ -1527,7 +1527,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, { struct pack_window *w_curs = NULL; off_t curpos = obj_offset; - enum object_type type; + enum object_type type, final_type = OBJ_BAD; struct delta_base_cache_entry *ent; /* @@ -1538,7 +1538,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, ent = get_delta_base_cache_entry(p, obj_offset); if (ent) { oi->whence = OI_DBCACHED; - type = ent->type; + final_type = type = ent->type; if (oi->sizep) *oi->sizep = ent->size; if (oi->contentp) { @@ -1556,6 +1556,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, } else if (oi->contentp && !oi->content_limit) { *oi->contentp = unpack_entry(r, p, obj_offset, &type, oi->sizep); + final_type = type; if (!*oi->contentp) type = OBJ_BAD; } else { @@ -1585,6 +1586,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, if (oi->sizep && *oi->sizep <= oi->content_limit) { *oi->contentp = unpack_entry(r, p, obj_offset, &type, oi->sizep); + final_type = type; if (!*oi->contentp) type = OBJ_BAD; } else { @@ -1606,17 +1608,17 @@ int packed_object_info(struct repository *r, struct packed_git *p, } if (oi->typep || oi->type_name) { - enum object_type ptot; - ptot = packed_to_object_type(r, p, obj_offset, - type, &w_curs, curpos); + if (final_type < 0) + final_type = packed_to_object_type(r, p, obj_offset, + type, &w_curs, curpos); if (oi->typep) - *oi->typep = ptot; + *oi->typep = final_type; if (oi->type_name) { - const char *tn = type_name(ptot); + const char *tn = type_name(final_type); if (tn) strbuf_addstr(oi->type_name, tn); } - if (ptot < 0) { + if (final_type < 0) { type = OBJ_BAD; goto out; } From patchwork Fri Aug 23 22:46:27 2024 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776102 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 06719185B72 for ; Fri, 23 Aug 2024 22:47:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453249; cv=none; b=rE2yB9naMcn8Bd/F0wkmjIMOZNZvv0Otibu2ZVgK16KJzQXJXrDK+kXPeZ/hSpl4rrzBE0IFIJu2CCzzWlpGx3rwdsUdEhMpHLgy+5JoqGrxknLtY2+taCAVXnrsogVULhpR7pWE6N0xyV8SmwD2Gk4GY/OhZwNHVsAEJtwzjn8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453249; c=relaxed/simple; bh=7VXnokmq29XlxhRRe8WntaEmwqhoE7gjQSeuo3pAJKw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PXaKzBS0Bbxw7OqivEp91S2OBSoLyAATaDYyetivh11dc0LzV0P4jFA2SYMjqKFOqyR5m+29IFhM7nQ/Zrl1Yg3SHTHZAGxMEjPtYuJFQPCu6IBM8bzPiUwya955jj9nLnYiiGsTM5PnVmNiv18tlNa99bZvHnr+TXSJTBRuhz0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b=4b4hqEaK; arc=none smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b="4b4hqEaK" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 737141F549; Fri, 23 Aug 2024 22:46:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1724453192; 
bh=7VXnokmq29XlxhRRe8WntaEmwqhoE7gjQSeuo3pAJKw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=4b4hqEaK9wodkTtKL/hI89TWIvDEYpWYrphX+qYpwc21juJSxAeMnzypRk37MCF6C QHSebCXnaaJel/GN9w0y0jUwqWVbxHSoDdSVTI/XuvVqjYs0jIStQa8NW+S3/b8z+Q W0jB7GHsa+w2JmeBcklwsjtnrUhIH37ALmX/ypUo= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 07/10] object_info: content_limit only applies to blobs Date: Fri, 23 Aug 2024 22:46:27 +0000 Message-ID: <20240823224630.1180772-8-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Streaming is only supported for blobs, so we'd end up having to slurp all the other object types into memory regardless. So slurp all the non-blob types up front when requesting content since we always handle them in-core, anyways. Signed-off-by: Eric Wong --- builtin/cat-file.c | 21 +++++++++++++++++++-- object-file.c | 3 ++- packfile.c | 8 +++++--- t/t1006-cat-file.sh | 19 ++++++++++++++++--- 4 files changed, 42 insertions(+), 9 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 8debcdca3e..2aedd62324 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -385,7 +385,24 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d assert(data->info.typep); if (data->content) { - batch_write(opt, data->content, data->size); + void *content = data->content; + unsigned long size = data->size; + + data->content = NULL; + if (use_mailmap && (data->type == OBJ_COMMIT || + data->type == OBJ_TAG)) { + size_t s = size; + + if (data->info.whence == OI_DBCACHED) { + content = xmemdupz(content, s); + data->info.whence = OI_PACKED; + } + + content = replace_idents_using_mailmap(content, &s); + size = cast_size_t_to_ulong(s); + } + + batch_write(opt, content, size); switch (data->info.whence) { case 
OI_CACHED: /* @@ -395,7 +412,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d BUG("TODO OI_CACHED support not done"); case OI_LOOSE: case OI_PACKED: - FREE_AND_NULL(data->content); + free(content); break; case OI_DBCACHED: unlock_delta_base_cache(); diff --git a/object-file.c b/object-file.c index 19100e823d..59842cfe1b 100644 --- a/object-file.c +++ b/object-file.c @@ -1492,7 +1492,8 @@ static int loose_object_info(struct repository *r, if (!oi->contentp) break; - if (oi->content_limit && *oi->sizep > oi->content_limit) { + if (oi->content_limit && *oi->typep == OBJ_BLOB && + *oi->sizep > oi->content_limit) { git_inflate_end(&stream); oi->contentp = NULL; goto cleanup; diff --git a/packfile.c b/packfile.c index 94d20034e4..a592e0b32c 100644 --- a/packfile.c +++ b/packfile.c @@ -1546,7 +1546,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, if (oi->direct_cache) { lock_delta_base_cache(); *oi->contentp = ent->data; - } else if (!oi->content_limit || + } else if (type != OBJ_BLOB || !oi->content_limit || ent->size <= oi->content_limit) { *oi->contentp = xmemdupz(ent->data, ent->size); } else { @@ -1583,10 +1583,12 @@ int packed_object_info(struct repository *r, struct packed_git *p, } if (oi->contentp) { - if (oi->sizep && *oi->sizep <= oi->content_limit) { + final_type = packed_to_object_type(r, p, obj_offset, + type, &w_curs, curpos); + if (final_type != OBJ_BLOB || (oi->sizep && + *oi->sizep <= oi->content_limit)) { *oi->contentp = unpack_entry(r, p, obj_offset, &type, oi->sizep); - final_type = type; if (!*oi->contentp) type = OBJ_BAD; } else { diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index ff9bf213aa..841e8567e9 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -622,20 +622,33 @@ test_expect_success 'confirm that neither loose blob is a delta' ' test_cmp expect actual ' +test_expect_success 'setup delta base tests' ' + foo="$(git rev-parse HEAD:foo)" && + foo_plus="$(git 
rev-parse HEAD:foo-plus)" && + git repack -ad +' + # To avoid relying too much on the current delta heuristics, # we will check only that one of the two objects is a delta # against the other, but not the order. We can do so by just # asking for the base of both, and checking whether either # oid appears in the output. test_expect_success '%(deltabase) reports packed delta bases' ' - git repack -ad && git cat-file --batch-check="%(deltabase)" <blobs >actual && { - grep "$(git rev-parse HEAD:foo)" actual || - grep "$(git rev-parse HEAD:foo-plus)" actual + grep "$foo" actual || grep "$foo_plus" actual } ' +test_expect_success 'delta base direct cache use succeeds w/o asserting' ' + commands="info $foo +info $foo_plus +contents $foo_plus +contents $foo" && + echo "$commands" >in && + git cat-file --batch-command <in >out +' + test_expect_success 'setup bogus data' ' bogus_short_type="bogus" && bogus_short_content="bogus" && From patchwork Fri Aug 23 22:46:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776103 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A627A185B72 for ; Fri, 23 Aug 2024 22:47:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453256; cv=none; b=r2E/7znHXi6uZm+6Y6wBZdvDPuAAJxDP7kqJ0R8fNoL5IM+ab3bIrrkfSk1PS0fVDmMxDY4cHNRor2XKhcZw+go8Wm3OU7B4a1oGVTun4EM1OuxP/ijaBQrVn4gjs69XTDQ1O/Omy8/yRKT64HhKXths8MPy8QM1VoMjnQtQdaE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453256; c=relaxed/simple; bh=nGH6j8wnUTCvWDYLOlKwCPIY7BvMqe0mk8IAjgfDtFE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=Bl0FgAI7G0K0O62ORP/qPmcSKDWSFouIu5YebqBt1zj/Upgz5CB8LUnaQ7Tuikuu05TN13uEXGXtKdf89p61Bj5njQuemMdt2CffcjYxlMSbXtNY1dog9y0yts9rlrRkDFVaEjKHgR9AOZtYHtLV7C2WGm0qK91JSz/kQ44GoD4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b=BFiE2oHP; arc=none smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b="BFiE2oHP" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id A68CD1F566; Fri, 23 Aug 2024 22:46:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1724453192; bh=nGH6j8wnUTCvWDYLOlKwCPIY7BvMqe0mk8IAjgfDtFE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=BFiE2oHP0pHTXmdZX7K19rFI8rcvlcZxP0l9bzmclCwc+PED/sCT4sbgQ1qJseQzP msEomdY0tubgomvTD4KQc69VoPc6/8mCuwXw6mZPnBh1+obCL4ITQxr6Wo2DRbhizx R6pKS+J+clYvfOhNqOELGyf7FsKBW0u9hp2WJnws= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 08/10] cat-file: batch-command uses content_limit Date: Fri, 23 Aug 2024 22:46:28 +0000 Message-ID: <20240823224630.1180772-9-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 As with the normal `--batch' mode, we can use the content_limit round trip optimization to avoid a redundant lookup. 
The only tricky thing here is we need to enable/disable setting the object_info.contentp field depending on whether we hit an `info' or `contents' command. t1006 is updated to ensure we can switch back and forth between `info' and `contents' commands without problems. Signed-off-by: Eric Wong --- builtin/cat-file.c | 32 ++++++-------------------------- 1 file changed, 6 insertions(+), 26 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 2aedd62324..067cdbdbf9 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -417,7 +417,8 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d case OI_DBCACHED: unlock_delta_base_cache(); } - } else if (data->type == OBJ_BLOB) { + } else { + assert(data->type == OBJ_BLOB); if (opt->buffer_output) fflush(stdout); if (opt->transform_mode) { @@ -452,30 +453,6 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d stream_blob(oid); } } - else { - enum object_type type; - unsigned long size; - void *contents; - - contents = repo_read_object_file(the_repository, oid, &type, - &size); - if (!contents) - die("object %s disappeared", oid_to_hex(oid)); - - if (use_mailmap) { - size_t s = size; - contents = replace_idents_using_mailmap(contents, &s); - size = cast_size_t_to_ulong(s); - } - - if (type != data->type) - die("object %s changed type!?", oid_to_hex(oid)); - if (data->info.sizep && size != data->size && !use_mailmap) - die("object %s changed size!?", oid_to_hex(oid)); - - batch_write(opt, contents, size); - free(contents); - } } static void print_default_format(struct strbuf *scratch, struct expand_data *data, @@ -689,6 +666,7 @@ static void parse_cmd_contents(struct batch_options *opt, struct expand_data *data) { opt->batch_mode = BATCH_MODE_CONTENTS; + data->info.contentp = &data->content; batch_one_object(line, output, opt, data); } @@ -698,6 +676,7 @@ static void parse_cmd_info(struct batch_options *opt, struct expand_data *data) { 
opt->batch_mode = BATCH_MODE_INFO; + data->info.contentp = NULL; batch_one_object(line, output, opt, data); } @@ -839,7 +818,8 @@ static int batch_objects(struct batch_options *opt) * Likewise, grab the content in the initial request if it's small * and we're not planning to filter it. */ - if (opt->batch_mode == BATCH_MODE_CONTENTS) { + if ((opt->batch_mode == BATCH_MODE_CONTENTS) || + (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH)) { data.info.typep = &data.type; if (!opt->transform_mode) { data.info.sizep = &data.size; From patchwork Fri Aug 23 22:46:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776105 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D86A9194AD5 for ; Fri, 23 Aug 2024 22:47:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453263; cv=none; b=DJt+/DYYqOHnhrFyTzYINbzwh4tbU7ZbmcgCKe+UDb74xFTlL31BRyeQTVg2NrTEZ5XgT7AZk+2AdhMpxIVfkW8NNwLylf0bY25me0gJ9Y707cuSZRs64NPX1v7MgSlxthsrBG38msOj4r5thsAgLlj0TqsEW968PEY9Wese8m0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453263; c=relaxed/simple; bh=SNOgkbi/GL4g3IV00zCRF6AWzeuhauu4CMmP6IucLO0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KaZuoP6MsKMznp9xs7ezBEYHdT2NjEE3UsOCLEf+nukT9nLg5b29kruCuc+lvAKYTej/74K84Ia1l3G/0iBwqRO0B/OpSJDczFtkbaBSZzP0ye9tsrL4ZN/Ed5ScZVF5kVChwNpKosF+HPPDSH8Mf8P3bt8oDy9kVddp8mm8iLk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b=Jx3uJ96H; 
arc=none smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b="Jx3uJ96H" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id DA3441F569; Fri, 23 Aug 2024 22:46:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1724453192; bh=SNOgkbi/GL4g3IV00zCRF6AWzeuhauu4CMmP6IucLO0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Jx3uJ96HmD3IcoD747yYBC1G74/ofbB4dtziY02JMAqq5vaPFYVFbwIrlMZuKyEM6 ElycSvQSjVdUZIoGGUsIj8c90bdeO3wvhv8jbqMioeolj8V7TCjnzEKAqyMurn5QQL HRQEiJ7hxkk0b1hskg7oYqUv4KQGk7nGs4NOLWUQ= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 09/10] cat-file: batch_write: use size_t for length Date: Fri, 23 Aug 2024 22:46:29 +0000 Message-ID: <20240823224630.1180772-10-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 fwrite(3) and write(2), and all of our wrappers for them use size_t while object size is `unsigned long', so there's no excuse to use a potentially smaller representation. 
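A minimal sketch of the truncation concern described above, not part of the patch. The helper names `fits_in_int`, `fits_in_size_t`, and `check_examples` are hypothetical and exist only for illustration. It shows that an object size just past INT_MAX cannot be carried by the old `int len' parameter, while the same value converts to size_t without loss on common ABIs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if `size` can be passed as an `int` length. */
static int fits_in_int(unsigned long size)
{
	return size <= (unsigned long)INT_MAX;
}

/* Hypothetical helper: nonzero if `size` survives a round trip via size_t. */
static int fits_in_size_t(unsigned long size)
{
	return (unsigned long)(size_t)size == size;
}

/* Sanity checks for the two helpers above. */
static void check_examples(void)
{
	unsigned long small = 4096UL;
	unsigned long big = (unsigned long)INT_MAX + 1UL;

	assert(fits_in_int(small));
	assert(fits_in_size_t(small));
	/* One byte past INT_MAX no longer fits the old `int len`. */
	assert(!fits_in_int(big));
	assert(fits_in_size_t(big));
}
```

Calling `check_examples()` from a test program is enough to exercise both helpers.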
Signed-off-by: Eric Wong --- builtin/cat-file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 067cdbdbf9..bf81054662 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -369,7 +369,7 @@ static void expand_format(struct strbuf *sb, const char *start, } } -static void batch_write(struct batch_options *opt, const void *data, int len) +static void batch_write(struct batch_options *opt, const void *data, size_t len) { if (opt->buffer_output) { if (fwrite(data, 1, len, stdout) != len) From patchwork Fri Aug 23 22:46:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 13776106 Received: from dcvr.yhbt.net (dcvr.yhbt.net [173.255.242.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0F169192B6D for ; Fri, 23 Aug 2024 22:47:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=173.255.242.215 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453270; cv=none; b=UljxHAhpM3/u05GyxoFyR75ZqVv0IB2iFy+pi89s8pQU5cO6g6zI2jmVpydCdqB56hCy2ZF7pbR6ddjQAqf4rjG41JLRbM3SGqmb3ZJEh0XfFCXh52ptI3DQpGKAFAN14RIhTxpOyqtQ/XFpOSSoUF8HW5ICg1qq9CYcnsM8Q3c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724453270; c=relaxed/simple; bh=QDHxFk6B61QLOaBhQf04GpmTq84ETy545BL6MpebPB0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Q2XRnVsONurAWt89KISJMrI9Hstpz2EnZtyZdf0HGYdk3KONYYpGLvwZwZ/CC8DEs/ohS260DN/SBYcqcE2+u4QthHafaSsOEG9gR6sXvM77WyfZNKRTne7HFgozH22OKLzai567kz3AfjY2TJhnRUndtXQABAQ8AFGoKY0aRyc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org; spf=pass smtp.mailfrom=80x24.org; dkim=pass (1024-bit key) header.d=80x24.org 
header.i=@80x24.org header.b=rW+hePpn; arc=none smtp.client-ip=173.255.242.215 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=80x24.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=80x24.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=80x24.org header.i=@80x24.org header.b="rW+hePpn" Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 17B841F56A; Fri, 23 Aug 2024 22:46:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1724453193; bh=QDHxFk6B61QLOaBhQf04GpmTq84ETy545BL6MpebPB0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rW+hePpn8z0gAjLEFDq+GthGXrYAjvFdmLZTaChmK7+QfwYxx2bO94aAmYlKQKUPv vq1IrqwQGxFfKMy4op3uUHiobspFOobkwAAbDdj71tR1jPDMDqnHDbwXyJGm/RToNL hwzkLJ6h7Dp0gvIgV2XE5UgR6jYMSlXvFLh4U+Ls= From: Eric Wong To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt Subject: [PATCH v2 10/10] cat-file: use writev(2) if available Date: Fri, 23 Aug 2024 22:46:30 +0000 Message-ID: <20240823224630.1180772-11-e@80x24.org> In-Reply-To: <20240823224630.1180772-1-e@80x24.org> References: <20240823224630.1180772-1-e@80x24.org> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Using writev here is 20-40% faster than three write syscalls in succession for smaller (1-10k) objects in the delta base cache. This advantage decreases as object sizes approach pipe size (64k on Linux). writev reduces wakeups and syscalls on the read side as well: each write(2) syscall may trigger one or more corresponding read(2) syscalls in the reader. Attempting atomicity in the writer via writev also reduces the likelihood of non-blocking readers failing with EAGAIN and having to call poll(2) or select(2) before attempting to read again. 
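A minimal sketch of the idea, not the code from this patch: the three parts of a batch response (header, object body, trailing delimiter) are gathered into a single writev(2) call, so a reader on a pipe can pick them up with one read(2). The names `write_response` and `demo_roundtrip`, and the "abc blob 5" header, are hypothetical placeholders. Short writes are not retried in this sketch.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send "<hdr><body><delim>" with one writev(2).
 * Returns whatever writev(2) returns; a short write is possible and is
 * left to the caller to handle.
 */
static ssize_t write_response(int fd, const char *hdr, const void *body,
			      size_t body_len, char delim)
{
	struct iovec iov[3];

	assert(hdr != NULL);
	iov[0].iov_base = (void *)hdr;
	iov[0].iov_len = strlen(hdr);
	iov[1].iov_base = (void *)body;
	iov[1].iov_len = body_len;
	iov[2].iov_base = &delim;
	iov[2].iov_len = 1;

	return writev(fd, iov, 3);
}

/*
 * Round trip through a pipe: returns 1 if a single read(2) sees the
 * whole response written by a single writev(2), 0 otherwise.
 */
static int demo_roundtrip(void)
{
	static const char expect[] = "abc blob 5\nhello\n";
	const size_t len = sizeof(expect) - 1;
	char buf[64];
	int fds[2];
	int ok;

	if (pipe(fds) < 0)
		return 0;
	ok = write_response(fds[1], "abc blob 5\n", "hello", 5, '\n') ==
		(ssize_t)len;
	ok = ok && read(fds[0], buf, sizeof(buf)) == (ssize_t)len;
	ok = ok && !memcmp(buf, expect, len);
	close(fds[0]);
	close(fds[1]);
	return ok;
}
```

With three separate write(2) calls, the same reader could wake up after any one of them and need up to three read(2) calls to collect the response.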
Unfortunately, writev turns into a small (1-3%) slowdown for gigantic objects of a megabyte or more, even after increasing pipe size to 1MB via the F_SETPIPE_SZ fcntl(2) op. This slowdown is acceptable to me since the vast majority of objects are 64K or less for projects I've looked at. Relying on stdio buffering and fflush(3) after each response was considered for users without --buffer, but historically cat-file defaults to being compatible with non-blocking stdout and able to poll(2) after hitting EAGAIN on write(2). Using stdio on files with the O_NONBLOCK flag is (AFAIK) unspecified and likely subject to portability problems, and is thus avoided. Signed-off-by: Eric Wong --- Makefile | 3 +++ builtin/cat-file.c | 62 ++++++++++++++++++++++++++++++------------- config.mak.uname | 5 ++++ git-compat-util.h | 10 +++++++ wrapper.c | 18 +++++++++++++ wrapper.h | 1 + write-or-die.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++ write-or-die.h | 2 ++ 8 files changed, 149 insertions(+), 18 deletions(-) diff --git a/Makefile b/Makefile index 3eab701b10..c7a062de00 100644 --- a/Makefile +++ b/Makefile @@ -1844,6 +1844,9 @@ ifdef NO_PREAD COMPAT_CFLAGS += -DNO_PREAD COMPAT_OBJS += compat/pread.o endif +ifdef HAVE_WRITEV + COMPAT_CFLAGS += -DHAVE_WRITEV +endif ifdef NO_FAST_WORKING_DIRECTORY BASIC_CFLAGS += -DNO_FAST_WORKING_DIRECTORY endif diff --git a/builtin/cat-file.c b/builtin/cat-file.c index bf81054662..016b7d26a7 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -280,7 +280,7 @@ struct expand_data { off_t disk_size; const char *rest; struct object_id delta_base_oid; - void *content; + struct git_iovec iov[3]; /* * If mark_query is true, we do not expand anything, but rather @@ -378,17 +378,42 @@ static void batch_write(struct batch_options *opt, const void *data, size_t len) write_or_die(1, data, len); } -static void print_object_or_die(struct batch_options *opt, struct expand_data *data) +static void batch_writev(struct batch_options *opt, struct 
expand_data *data, + const struct strbuf *hdr, size_t size) +{ + data->iov[0].iov_base = hdr->buf; + data->iov[0].iov_len = hdr->len; + data->iov[1].iov_len = size; + + /* + * Copying a (8|16)-byte iovec for a single byte is gross, but my + * attempt to stuff output_delim into the trailing NUL byte of + * iov[1].iov_base (and restoring it after writev(2) for the + * OI_DBCACHED case) to drop iovcnt from 3->2 wasn't faster. + */ + data->iov[2].iov_base = &opt->output_delim; + data->iov[2].iov_len = 1; + + if (opt->buffer_output) + fwritev_or_die(stdout, data->iov, 3); + else + writev_or_die(1, data->iov, 3); + + /* writev_or_die may move iov[1].iov_base, so it's invalid */ + data->iov[1].iov_base = NULL; +} + +static void print_object_or_die(struct batch_options *opt, + struct expand_data *data, struct strbuf *hdr) { const struct object_id *oid = &data->oid; assert(data->info.typep); - if (data->content) { - void *content = data->content; + if (data->iov[1].iov_base) { + void *content = data->iov[1].iov_base; unsigned long size = data->size; - data->content = NULL; if (use_mailmap && (data->type == OBJ_COMMIT || data->type == OBJ_TAG)) { size_t s = size; @@ -399,10 +424,10 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d } content = replace_idents_using_mailmap(content, &s); + data->iov[1].iov_base = content; size = cast_size_t_to_ulong(s); } - - batch_write(opt, content, size); + batch_writev(opt, data, hdr, size); switch (data->info.whence) { case OI_CACHED: /* @@ -419,8 +444,6 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d } } else { assert(data->type == OBJ_BLOB); - if (opt->buffer_output) - fflush(stdout); if (opt->transform_mode) { char *contents; unsigned long size; @@ -447,10 +470,15 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d oid_to_hex(oid), data->rest); } else BUG("invalid transform_mode: %c", opt->transform_mode); - batch_write(opt, contents, 
size); + data->iov[1].iov_base = contents; + batch_writev(opt, data, hdr, size); free(contents); } else { + batch_write(opt, hdr->buf, hdr->len); + if (opt->buffer_output) + fflush(stdout); stream_blob(oid); + batch_write(opt, &opt->output_delim, 1); } } } @@ -519,12 +547,10 @@ static void batch_object_write(const char *obj_name, strbuf_addch(scratch, opt->output_delim); } - batch_write(opt, scratch->buf, scratch->len); - - if (opt->batch_mode == BATCH_MODE_CONTENTS) { - print_object_or_die(opt, data); - batch_write(opt, &opt->output_delim, 1); - } + if (opt->batch_mode == BATCH_MODE_CONTENTS) + print_object_or_die(opt, data, scratch); + else + batch_write(opt, scratch->buf, scratch->len); } static void batch_one_object(const char *obj_name, @@ -666,7 +692,7 @@ static void parse_cmd_contents(struct batch_options *opt, struct expand_data *data) { opt->batch_mode = BATCH_MODE_CONTENTS; - data->info.contentp = &data->content; + data->info.contentp = &data->iov[1].iov_base; batch_one_object(line, output, opt, data); } @@ -823,7 +849,7 @@ static int batch_objects(struct batch_options *opt) data.info.typep = &data.type; if (!opt->transform_mode) { data.info.sizep = &data.size; - data.info.contentp = &data.content; + data.info.contentp = &data.iov[1].iov_base; data.info.content_limit = big_file_threshold; data.info.direct_cache = 1; } diff --git a/config.mak.uname b/config.mak.uname index 85d63821ec..8ce8776657 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -69,6 +69,7 @@ ifeq ($(uname_S),Linux) BASIC_CFLAGS += -std=c99 endif LINK_FUZZ_PROGRAMS = YesPlease + HAVE_WRITEV = YesPlease endif ifeq ($(uname_S),GNU/kFreeBSD) HAVE_ALLOCA_H = YesPlease @@ -77,6 +78,7 @@ ifeq ($(uname_S),GNU/kFreeBSD) DIR_HAS_BSD_GROUP_SEMANTICS = YesPlease LIBC_CONTAINS_LIBINTL = YesPlease FREAD_READS_DIRECTORIES = UnfortunatelyYes + HAVE_WRITEV = YesPlease endif ifeq ($(uname_S),UnixWare) CC = cc @@ -292,6 +294,7 @@ ifeq ($(uname_S),FreeBSD) PAGER_ENV = LESS=FRX LV=-c MORE=FRX 
FREAD_READS_DIRECTORIES = UnfortunatelyYes FILENO_IS_A_MACRO = UnfortunatelyYes + HAVE_WRITEV = YesPlease endif ifeq ($(uname_S),OpenBSD) NO_STRCASESTR = YesPlease @@ -307,6 +310,7 @@ ifeq ($(uname_S),OpenBSD) PROCFS_EXECUTABLE_PATH = /proc/curproc/file FREAD_READS_DIRECTORIES = UnfortunatelyYes FILENO_IS_A_MACRO = UnfortunatelyYes + HAVE_WRITEV = YesPlease endif ifeq ($(uname_S),MirBSD) NO_STRCASESTR = YesPlease @@ -329,6 +333,7 @@ ifeq ($(uname_S),NetBSD) HAVE_BSD_KERN_PROC_SYSCTL = YesPlease CSPRNG_METHOD = arc4random PROCFS_EXECUTABLE_PATH = /proc/curproc/exe + HAVE_WRITEV = YesPlease endif ifeq ($(uname_S),AIX) DEFAULT_PAGER = more diff --git a/git-compat-util.h b/git-compat-util.h index ca7678a379..afde8abc99 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -388,6 +388,16 @@ static inline int git_setitimer(int which UNUSED, #define setitimer(which,value,ovalue) git_setitimer(which,value,ovalue) #endif +#ifdef HAVE_WRITEV +#include <sys/uio.h> +#define git_iovec iovec +#else /* !HAVE_WRITEV */ +struct git_iovec { + void *iov_base; + size_t iov_len; +}; +#endif /* !HAVE_WRITEV */ + #ifndef NO_LIBGEN_H #include <libgen.h> #else diff --git a/wrapper.c b/wrapper.c index f87d90bf57..066c772145 100644 --- a/wrapper.c +++ b/wrapper.c @@ -262,6 +262,24 @@ ssize_t xwrite(int fd, const void *buf, size_t len) } } +#ifdef HAVE_WRITEV +ssize_t xwritev(int fd, const struct iovec *iov, int iovcnt) +{ + while (1) { + ssize_t nr = writev(fd, iov, iovcnt); + + if (nr < 0) { + if (errno == EINTR) + continue; + if (handle_nonblock(fd, POLLOUT, errno)) + continue; + } + + return nr; + } +} +#endif /* !HAVE_WRITEV */ + /* * xpread() is the same as pread(), but it automatically restarts pread() * operations with a recoverable error (EAGAIN and EINTR). 
xpread() DOES diff --git a/wrapper.h b/wrapper.h index 1b2b047ea0..3d33c63d4f 100644 --- a/wrapper.h +++ b/wrapper.h @@ -16,6 +16,7 @@ void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_ int xopen(const char *path, int flags, ...); ssize_t xread(int fd, void *buf, size_t len); ssize_t xwrite(int fd, const void *buf, size_t len); +ssize_t xwritev(int fd, const struct git_iovec *, int iovcnt); ssize_t xpread(int fd, void *buf, size_t len, off_t offset); int xdup(int fd); FILE *xfopen(const char *path, const char *mode); diff --git a/write-or-die.c b/write-or-die.c index 01a9a51fa2..227b051165 100644 --- a/write-or-die.c +++ b/write-or-die.c @@ -107,3 +107,69 @@ void fflush_or_die(FILE *f) if (fflush(f)) die_errno("fflush error"); } + +void fwritev_or_die(FILE *fp, const struct git_iovec *iov, int iovcnt) +{ + int i; + + for (i = 0; i < iovcnt; i++) { + size_t n = iov[i].iov_len; + + if (fwrite(iov[i].iov_base, 1, n, fp) != n) + die_errno("unable to write to FD=%d", fileno(fp)); + } +} + +/* + * note: we don't care about atomicity from writev(2) right now. + * The goal is to avoid allocations+copies in the writer and + * reduce wakeups+syscalls in the reader. + * n.b. @iov is not const since we modify it to avoid allocating + * on partial write. 
+ */ +#ifdef HAVE_WRITEV +void writev_or_die(int fd, struct git_iovec *iov, int iovcnt) +{ + int i; + + while (iovcnt > 0) { + ssize_t n = xwritev(fd, iov, iovcnt); + + /* EINVAL happens when sum of iov_len exceeds SSIZE_MAX */ + if (n < 0 && errno == EINVAL) + n = xwrite(fd, iov[0].iov_base, iov[0].iov_len); + if (n < 0) { + check_pipe(errno); + die_errno("writev error"); + } else if (!n) { + errno = ENOSPC; + die_errno("writev error"); + } + /* skip fully written iovs, retry from the first partial iov */ + for (i = 0; i < iovcnt; i++) { + if (n >= iov[i].iov_len) { + n -= iov[i].iov_len; + } else { + iov[i].iov_len -= n; + iov[i].iov_base = (char *)iov[i].iov_base + n; + break; + } + } + iovcnt -= i; + iov += i; + } +} +#else /* !HAVE_WRITEV */ + +/* + * n.b. don't use stdio fwrite here even if it's faster, @fd may be + * non-blocking and stdio isn't equipped for EAGAIN + */ +void writev_or_die(int fd, struct git_iovec *iov, int iovcnt) +{ + int i; + + for (i = 0; i < iovcnt; i++) + write_or_die(fd, iov[i].iov_base, iov[i].iov_len); +} +#endif /* !HAVE_WRITEV */ diff --git a/write-or-die.h b/write-or-die.h index 65a5c42a47..20abec211c 100644 --- a/write-or-die.h +++ b/write-or-die.h @@ -7,6 +7,8 @@ void fprintf_or_die(FILE *, const char *fmt, ...); void fwrite_or_die(FILE *f, const void *buf, size_t count); void fflush_or_die(FILE *f); void write_or_die(int fd, const void *buf, size_t count); +void writev_or_die(int fd, struct git_iovec *, int iovcnt); +void fwritev_or_die(FILE *, const struct git_iovec *, int iovcnt); /* * These values are used to help identify parts of a repository to fsync.