Message ID | 4877ab39212867e91058c60f99fe0dc2a592d583.1712209149.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | reftable: optimize write performance | expand |
On Thu, Apr 4, 2024 at 7:48 AM Patrick Steinhardt <ps@pks.im> wrote: > + /* > + * Try to add the record to the writer again. If this still fails then > + * the record does not fit into the block size. > + * > + * TODO: it would be great to have `block_writer_add()` return proper > + * error codes so that we don't have to second-guess the failure > + * mode here. > + */ The Go code returns a (size, boolean) tuple for the write routines here, but that does not really work in the Git C style. If you make the routines return error codes it suggests that the in-memory write can fail for other reasons beyond "does not fit". Not sure if that is really an improvement.
On Thu, Apr 04, 2024 at 08:58:08AM +0200, Han-Wen Nienhuys wrote: > On Thu, Apr 4, 2024 at 7:48 AM Patrick Steinhardt <ps@pks.im> wrote: > > + /* > > + * Try to add the record to the writer again. If this still fails then > > + * the record does not fit into the block size. > > + * > > + * TODO: it would be great to have `block_writer_add()` return proper > > + * error codes so that we don't have to second-guess the failure > > + * mode here. > > + */ > > The Go code returns a (size, boolean) tuple for the write routines > here, but that does not really work in the Git C style. > > If you make the routines return error codes it suggests that the > in-memory write can fail for other reasons beyond "does not fit". Not > sure if that is really an improvement. In reality, `block_writer_add()` already can fail because of different reasons: it returns `REFTABLE_API_ERROR` if the passed-in record has an empty key. This shouldn't ever happen, but it demonstrates that this is certainly an area which needs some further cleanups. Patrick
diff --git a/reftable/writer.c b/reftable/writer.c index 1d9ff0fbfa..0ad5eb8887 100644 --- a/reftable/writer.c +++ b/reftable/writer.c @@ -209,7 +209,8 @@ static int writer_add_record(struct reftable_writer *w, struct reftable_record *rec) { struct strbuf key = STRBUF_INIT; - int err = -1; + int err; + reftable_record_key(rec, &key); if (strbuf_cmp(&w->last_key, &key) >= 0) { err = REFTABLE_API_ERROR; @@ -218,27 +219,42 @@ static int writer_add_record(struct reftable_writer *w, strbuf_reset(&w->last_key); strbuf_addbuf(&w->last_key, &key); - if (!w->block_writer) { + if (!w->block_writer) writer_reinit_block_writer(w, reftable_record_type(rec)); - } - assert(block_writer_type(w->block_writer) == reftable_record_type(rec)); + if (block_writer_type(w->block_writer) != reftable_record_type(rec)) + BUG("record of type %d added to writer of type %d", + reftable_record_type(rec), block_writer_type(w->block_writer)); - if (block_writer_add(w->block_writer, rec) == 0) { + /* + * Try to add the record to the writer. If this succeeds then we're + * done. Otherwise the block writer may have hit the block size limit + * and needs to be flushed. + */ + if (!block_writer_add(w->block_writer, rec)) { err = 0; goto done; } + /* + * The current block is full, so we need to flush and reinitialize the + * writer to start writing the next block. + */ err = writer_flush_block(w); - if (err < 0) { + if (err < 0) goto done; - } - writer_reinit_block_writer(w, reftable_record_type(rec)); + + /* + * Try to add the record to the writer again. If this still fails then + * the record does not fit into the block size. + * + * TODO: it would be great to have `block_writer_add()` return proper + * error codes so that we don't have to second-guess the failure + * mode here. + */ err = block_writer_add(w->block_writer, rec); - if (err == -1) { - /* we are writing into memory, so an error can only mean it - * doesn't fit. */ + if (err) { err = REFTABLE_ENTRY_TOO_BIG_ERROR; goto done; }
Large parts of the reftable library do not conform to Git's typical code style. Refactor `writer_add_record()` such that it conforms better to it and add some documentation that explains some of its more intricate behaviour. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- reftable/writer.c | 38 +++++++++++++++++++++++++++----------- 1 file changed, 27 insertions(+), 11 deletions(-)