From patchwork Mon Apr 8 12:16:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Patrick Steinhardt
X-Patchwork-Id: 13621044
Date: Mon, 8 Apr 2024 14:16:54 +0200
From: Patrick Steinhardt
To: git@vger.kernel.org
Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler
Subject: [PATCH v2 07/10] reftable/block: reuse uncompressed blocks
X-Mailing-List: git@vger.kernel.org

The reftable backend stores reflog entries in a compressed format and
thus needs to uncompress blocks before one can read records from them.
For each reflog block we thus have to allocate an array that we can
decompress the block contents into. This array is discarded whenever
the table iterator moves to the next block. Consequently, we allocate a
new array for every block, which is quite wasteful.

Refactor the code to reuse the uncompressed block data when moving the
block reader to a new block. This significantly reduces the number of
allocations when iterating through many compressed blocks. The following
measurements were taken with `git reflog list` when listing 100k reflogs.

Before:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 45,755 allocs, 45,633 frees, 254,779,456 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 23,028 allocs, 22,906 frees, 162,813,547 bytes allocated

Signed-off-by: Patrick Steinhardt
---
 reftable/block.c  | 14 ++++++--------
 reftable/block.h  |  4 ++++
 reftable/reader.c | 27 ++++++++++++++++-----------
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 0c4e71eae3..9460273290 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -186,7 +186,6 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	uint16_t restart_count = 0;
 	uint32_t restart_start = 0;
 	uint8_t *restart_bytes = NULL;
-	uint8_t *uncompressed = NULL;
 
 	reftable_block_done(&br->block);
 
@@ -202,14 +201,15 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		uLongf src_len = block->len - block_header_skip;
 
 		/* Log blocks specify the *uncompressed* size in their header. */
-		REFTABLE_ALLOC_ARRAY(uncompressed, sz);
+		REFTABLE_ALLOC_GROW(br->uncompressed_data, sz,
+				    br->uncompressed_cap);
 
 		/* Copy over the block header verbatim. It's not compressed. */
-		memcpy(uncompressed, block->data, block_header_skip);
+		memcpy(br->uncompressed_data, block->data, block_header_skip);
 
 		/* Uncompress */
 		if (Z_OK !=
-		    uncompress2(uncompressed + block_header_skip, &dst_len,
+		    uncompress2(br->uncompressed_data + block_header_skip, &dst_len,
 				block->data + block_header_skip, &src_len)) {
 			err = REFTABLE_ZLIB_ERROR;
 			goto done;
@@ -222,10 +222,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 
 		/* We're done with the input data. */
 		reftable_block_done(block);
-		block->data = uncompressed;
-		uncompressed = NULL;
+		block->data = br->uncompressed_data;
 		block->len = sz;
-		block->source = malloc_block_source();
 		full_block_size = src_len + block_header_skip;
 	} else if (full_block_size == 0) {
 		full_block_size = sz;
@@ -254,12 +252,12 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	br->restart_bytes = restart_bytes;
 
 done:
-	reftable_free(uncompressed);
 	return err;
 }
 
 void block_reader_release(struct block_reader *br)
 {
+	reftable_free(br->uncompressed_data);
 	reftable_block_done(&br->block);
 }
 
diff --git a/reftable/block.h b/reftable/block.h
index d733d45ee0..12414eb642 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -66,6 +66,10 @@ struct block_reader {
 	struct reftable_block block;
 	int hash_size;
 
+	/* Uncompressed data for log entries. */
+	unsigned char *uncompressed_data;
+	size_t uncompressed_cap;
+
 	/* size of the data, excluding restart data. */
 	uint32_t block_len;
 	uint8_t *restart_bytes;
diff --git a/reftable/reader.c b/reftable/reader.c
index dd4de294a1..aacd5f1337 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -459,6 +459,8 @@ static int reader_seek_linear(struct table_iter *ti,
 		 * we would not do a linear search there anymore.
 		 */
 		memset(&next.br.block, 0, sizeof(next.br.block));
+		next.br.uncompressed_data = NULL;
+		next.br.uncompressed_cap = 0;
 
 		err = table_iter_next_block(&next);
 		if (err < 0)
@@ -599,25 +601,28 @@ static int reader_seek_internal(struct reftable_reader *r,
 	struct reftable_reader_offsets *offs =
 		reader_offsets_for(r, reftable_record_type(rec));
 	uint64_t idx = offs->index_offset;
-	struct table_iter ti = TABLE_ITER_INIT;
-	int err = 0;
+	struct table_iter ti = TABLE_ITER_INIT, *p;
+	int err;
+
 	if (idx > 0)
 		return reader_seek_indexed(r, it, rec);
 
 	err = reader_start(r, &ti, reftable_record_type(rec), 0);
 	if (err < 0)
-		return err;
+		goto out;
+
 	err = reader_seek_linear(&ti, rec);
 	if (err < 0)
-		return err;
-	else {
-		struct table_iter *p =
-			reftable_malloc(sizeof(struct table_iter));
-		*p = ti;
-		iterator_from_table_iter(it, p);
-	}
+		goto out;
 
-	return 0;
+	REFTABLE_ALLOC_ARRAY(p, 1);
+	*p = ti;
+	iterator_from_table_iter(it, p);
+
+out:
+	if (err)
+		table_iter_close(&ti);
+	return err;
 }
 
 static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
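
For readers who want the idea in isolation: the sketch below illustrates
the grow-and-reuse pattern that the patch applies to the uncompressed
block data. It is not code from the reftable library. The names
`scratch_reader`, `scratch_reserve`, `scratch_load`, `scratch_release`
and `demo_count_allocs` are hypothetical, plain `realloc()` stands in for
`REFTABLE_ALLOC_GROW`, and the growth policy (double, or jump to the
requested size) is an assumption that may differ from what the macro
actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader that owns one scratch buffer for all blocks. */
struct scratch_reader {
	uint8_t *buf;     /* reused across blocks */
	size_t cap;       /* allocated capacity in bytes */
	size_t len;       /* valid bytes of the current block */
	size_t nr_allocs; /* (re)allocation counter, for illustration only */
};

/* Ensure the buffer holds at least `need` bytes; reallocate only if not. */
static int scratch_reserve(struct scratch_reader *r, size_t need)
{
	size_t new_cap;
	uint8_t *p;

	if (need <= r->cap)
		return 0; /* existing buffer is large enough, reuse it */

	/* Assumed policy: double the capacity, or jump straight to `need`. */
	new_cap = r->cap * 2;
	if (new_cap < need)
		new_cap = need;

	p = realloc(r->buf, new_cap);
	if (!p)
		return -1;
	r->buf = p;
	r->cap = new_cap;
	r->nr_allocs++;
	return 0;
}

/* Stand-in for "decompress the next block into the reader's buffer". */
static int scratch_load(struct scratch_reader *r, const uint8_t *src,
			size_t len)
{
	if (scratch_reserve(r, len) < 0)
		return -1;
	memcpy(r->buf, src, len);
	r->len = len;
	assert(r->len <= r->cap);
	return 0;
}

/* The buffer is freed once, when the reader itself is released. */
static void scratch_release(struct scratch_reader *r)
{
	free(r->buf);
	r->buf = NULL;
	r->cap = 0;
	r->len = 0;
}

/* Load `nr_blocks` equally sized blocks and report the allocation count. */
static size_t demo_count_allocs(size_t nr_blocks)
{
	struct scratch_reader r = { 0 };
	uint8_t block[64] = { 0 };
	size_t i, allocs;

	for (i = 0; i < nr_blocks; i++)
		if (scratch_load(&r, block, sizeof(block)) < 0)
			break;
	allocs = r.nr_allocs;
	scratch_release(&r);
	return allocs;
}
```

In this sketch, loading any number of 64-byte blocks through the same
reader allocates only once, because every later call finds the buffer
already large enough. Note that the buffer now belongs to the reader
rather than to the block, which is why it is freed in the release
function instead of per block.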