diff mbox series

[v2,03/10] reftable/basics: provide new `reftable_buf` interface

Message ID 0ddc8c0c896a006e4cc094390125efcec0b3cdff.1728910727.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series reftable: stop using `struct strbuf` | expand

Commit Message

Patrick Steinhardt Oct. 14, 2024, 1:02 p.m. UTC
Implement a new `reftable_buf` interface that will replace Git's own
`strbuf` interface. This is done due to three reasons:

  - The `strbuf` interfaces do not handle memory allocation failures and
    instead causes us to die. This is okay in the context of Git, but is
    not in the context of the reftable library, which is supposed to be
    usable by third-party applications.

  - The `strbuf` interface is quite deeply tied into Git, which makes it
    hard to use the reftable library as a standalone library. Any
    dependent would have to carefully extract the relevant parts of it
    to make things work, which is not all that sensible.

  - The `strbuf` interface does not use the pluggable allocators that
    can be set up via `refatble_set_alloc()`.

So we have good reasons to use our own type, and the implementation is
rather trivial. Implement our own type. Conversion of the reftable
library will be handled in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/basics.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 reftable/basics.h | 56 +++++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+)

Comments

Taylor Blau Oct. 14, 2024, 10:34 p.m. UTC | #1
On Mon, Oct 14, 2024 at 03:02:24PM +0200, Patrick Steinhardt wrote:
> Implement a new `reftable_buf` interface that will replace Git's own
> `strbuf` interface. This is done due to three reasons:
>
>   - The `strbuf` interfaces do not handle memory allocation failures and
>     instead causes us to die. This is okay in the context of Git, but is
>     not in the context of the reftable library, which is supposed to be
>     usable by third-party applications.
>
>   - The `strbuf` interface is quite deeply tied into Git, which makes it
>     hard to use the reftable library as a standalone library. Any
>     dependent would have to carefully extract the relevant parts of it
>     to make things work, which is not all that sensible.
>
>   - The `strbuf` interface does not use the pluggable allocators that
>     can be set up via `refatble_set_alloc()`.

s/refatble/reftable/.

> +/*
> + * Add the given bytes to the buffer. Returns 0 on success,
> + * REFTABLE_OUT_OF_MEMORY_ERROR on allocation failure.
> + */
> +int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len);

Is there a reason that data is a void-pointer here and not a const char
*?

Thanks,
Taylor
Patrick Steinhardt Oct. 15, 2024, 4:38 a.m. UTC | #2
On Mon, Oct 14, 2024 at 06:34:55PM -0400, Taylor Blau wrote:
> On Mon, Oct 14, 2024 at 03:02:24PM +0200, Patrick Steinhardt wrote:
> > Implement a new `reftable_buf` interface that will replace Git's own
> > `strbuf` interface. This is done due to three reasons:
> >
> >   - The `strbuf` interfaces do not handle memory allocation failures and
> >     instead causes us to die. This is okay in the context of Git, but is
> >     not in the context of the reftable library, which is supposed to be
> >     usable by third-party applications.
> >
> >   - The `strbuf` interface is quite deeply tied into Git, which makes it
> >     hard to use the reftable library as a standalone library. Any
> >     dependent would have to carefully extract the relevant parts of it
> >     to make things work, which is not all that sensible.
> >
> >   - The `strbuf` interface does not use the pluggable allocators that
> >     can be set up via `refatble_set_alloc()`.
> 
> s/refatble/reftable/.
> 
> > +/*
> > + * Add the given bytes to the buffer. Returns 0 on success,
> > + * REFTABLE_OUT_OF_MEMORY_ERROR on allocation failure.
> > + */
> > +int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len);
> 
> Is there a reason that data is a void-pointer here and not a const char
> *?

Only that it emulates `strbuf_add()`, which also uses a void pointer.

Patrick
Eric Sunshine Oct. 15, 2024, 5:10 a.m. UTC | #3
On Tue, Oct 15, 2024 at 12:38 AM Patrick Steinhardt <ps@pks.im> wrote:
> On Mon, Oct 14, 2024 at 06:34:55PM -0400, Taylor Blau wrote:
> > On Mon, Oct 14, 2024 at 03:02:24PM +0200, Patrick Steinhardt wrote:
> > > +/*
> > > + * Add the given bytes to the buffer. Returns 0 on success,
> > > + * REFTABLE_OUT_OF_MEMORY_ERROR on allocation failure.
> > > + */
> > > +int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len);
> >
> > Is there a reason that data is a void-pointer here and not a const char
> > *?
>
> Only that it emulates `strbuf_add()`, which also uses a void pointer.

The reason for that is because strbuf is a generic byte-array which
may contain embedded NULs, and the `const void *` plus `len`
emphasizes this property, whereas `const char *` would imply a
C-string with no embedded NULs.

(strbuf also happens to ensure that the contained byte-array ends with
NUL so that the .buf member can be used safely with any function
accepting a C-string, but that is a convenience for the common case
when it does not contain embedded NULs, not a promise that there won't
be embedded NULs, hence `const void *` rather than `cont char *`.)
Taylor Blau Oct. 15, 2024, 7:27 p.m. UTC | #4
On Tue, Oct 15, 2024 at 01:10:59AM -0400, Eric Sunshine wrote:
> On Tue, Oct 15, 2024 at 12:38 AM Patrick Steinhardt <ps@pks.im> wrote:
> > On Mon, Oct 14, 2024 at 06:34:55PM -0400, Taylor Blau wrote:
> > > On Mon, Oct 14, 2024 at 03:02:24PM +0200, Patrick Steinhardt wrote:
> > > > +/*
> > > > + * Add the given bytes to the buffer. Returns 0 on success,
> > > > + * REFTABLE_OUT_OF_MEMORY_ERROR on allocation failure.
> > > > + */
> > > > +int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len);
> > >
> > > Is there a reason that data is a void-pointer here and not a const char
> > > *?
> >
> > Only that it emulates `strbuf_add()`, which also uses a void pointer.
>
> The reason for that is because strbuf is a generic byte-array which
> may contain embedded NULs, and the `const void *` plus `len`
> emphasizes this property, whereas `const char *` would imply a
> C-string with no embedded NULs.

Thanks, that was the explanation I was missing. Perhaps it is worth
re-stating in the commit message here to avoid confusing readers like I
was when I first read Patrick's patch ;-).

Thanks,
Taylor
Patrick Steinhardt Oct. 16, 2024, 8:42 a.m. UTC | #5
On Tue, Oct 15, 2024 at 03:27:29PM -0400, Taylor Blau wrote:
> On Tue, Oct 15, 2024 at 01:10:59AM -0400, Eric Sunshine wrote:
> > On Tue, Oct 15, 2024 at 12:38 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > On Mon, Oct 14, 2024 at 06:34:55PM -0400, Taylor Blau wrote:
> > > > On Mon, Oct 14, 2024 at 03:02:24PM +0200, Patrick Steinhardt wrote:
> > > > > +/*
> > > > > + * Add the given bytes to the buffer. Returns 0 on success,
> > > > > + * REFTABLE_OUT_OF_MEMORY_ERROR on allocation failure.
> > > > > + */
> > > > > +int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len);
> > > >
> > > > Is there a reason that data is a void-pointer here and not a const char
> > > > *?
> > >
> > > Only that it emulates `strbuf_add()`, which also uses a void pointer.
> >
> > The reason for that is because strbuf is a generic byte-array which
> > may contain embedded NULs, and the `const void *` plus `len`
> > emphasizes this property, whereas `const char *` would imply a
> > C-string with no embedded NULs.
> 
> Thanks, that was the explanation I was missing. Perhaps it is worth
> re-stating in the commit message here to avoid confusing readers like I
> was when I first read Patrick's patch ;-).

Does it make sense to explicitly state how the interfaces look like
though? I don't do that for the other functions either, and for most of
the part I just reuse the exact same function arguments as we had with
the strbuf interface.

Patrick
Taylor Blau Oct. 16, 2024, 8:56 p.m. UTC | #6
On Wed, Oct 16, 2024 at 10:42:44AM +0200, Patrick Steinhardt wrote:
> On Tue, Oct 15, 2024 at 03:27:29PM -0400, Taylor Blau wrote:
> > On Tue, Oct 15, 2024 at 01:10:59AM -0400, Eric Sunshine wrote:
> > > On Tue, Oct 15, 2024 at 12:38 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > > On Mon, Oct 14, 2024 at 06:34:55PM -0400, Taylor Blau wrote:
> > > > > On Mon, Oct 14, 2024 at 03:02:24PM +0200, Patrick Steinhardt wrote:
> > > > > > +/*
> > > > > > + * Add the given bytes to the buffer. Returns 0 on success,
> > > > > > + * REFTABLE_OUT_OF_MEMORY_ERROR on allocation failure.
> > > > > > + */
> > > > > > +int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len);
> > > > >
> > > > > Is there a reason that data is a void-pointer here and not a const char
> > > > > *?
> > > >
> > > > Only that it emulates `strbuf_add()`, which also uses a void pointer.
> > >
> > > The reason for that is because strbuf is a generic byte-array which
> > > may contain embedded NULs, and the `const void *` plus `len`
> > > emphasizes this property, whereas `const char *` would imply a
> > > C-string with no embedded NULs.
> >
> > Thanks, that was the explanation I was missing. Perhaps it is worth
> > re-stating in the commit message here to avoid confusing readers like I
> > was when I first read Patrick's patch ;-).
>
> Does it make sense to explicitly state how the interfaces look like
> though? I don't do that for the other functions either, and for most of
> the part I just reuse the exact same function arguments as we had with
> the strbuf interface.

I don't feel very strongly about it, but I had suggested it because my
initial read of this patch confused me, and I had wondered if others may
be similarly confused.

For what it's worth, I was thinking something on the order of the
following added to the patch message:

    Note that the reftable_buf_add() function intentionally takes a "const
    void *" instead of a "const char *" (as does its strbuf counterpart,
    strbuf_add()) to emphasize that the buffer may contain NUL characters.

But, as I said, I don't feel very strongly about it.

Thanks,
Taylor
Patrick Steinhardt Oct. 17, 2024, 4:54 a.m. UTC | #7
On Wed, Oct 16, 2024 at 04:56:52PM -0400, Taylor Blau wrote:
> On Wed, Oct 16, 2024 at 10:42:44AM +0200, Patrick Steinhardt wrote:
> > On Tue, Oct 15, 2024 at 03:27:29PM -0400, Taylor Blau wrote:
> > > On Tue, Oct 15, 2024 at 01:10:59AM -0400, Eric Sunshine wrote:
> > > > On Tue, Oct 15, 2024 at 12:38 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > > > On Mon, Oct 14, 2024 at 06:34:55PM -0400, Taylor Blau wrote:
> > > > > > On Mon, Oct 14, 2024 at 03:02:24PM +0200, Patrick Steinhardt wrote:
> > > > > > > +/*
> > > > > > > + * Add the given bytes to the buffer. Returns 0 on success,
> > > > > > > + * REFTABLE_OUT_OF_MEMORY_ERROR on allocation failure.
> > > > > > > + */
> > > > > > > +int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len);
> > > > > >
> > > > > > Is there a reason that data is a void-pointer here and not a const char
> > > > > > *?
> > > > >
> > > > > Only that it emulates `strbuf_add()`, which also uses a void pointer.
> > > >
> > > > The reason for that is because strbuf is a generic byte-array which
> > > > may contain embedded NULs, and the `const void *` plus `len`
> > > > emphasizes this property, whereas `const char *` would imply a
> > > > C-string with no embedded NULs.
> > >
> > > Thanks, that was the explanation I was missing. Perhaps it is worth
> > > re-stating in the commit message here to avoid confusing readers like I
> > > was when I first read Patrick's patch ;-).
> >
> > Does it make sense to explicitly state how the interfaces look like
> > though? I don't do that for the other functions either, and for most of
> > the part I just reuse the exact same function arguments as we had with
> > the strbuf interface.
> 
> I don't feel very strongly about it, but I had suggested it because my
> initial read of this patch confused me, and I had wondered if others may
> be similarly confused.
> 
> For what it's worth, I was thinking something on the order of the
> following added to the patch message:
> 
>     Note that the reftable_buf_add() function intentionally takes a "const
>     void *" instead of a "const char *" (as does its strbuf counterpart,
>     strbuf_add()) to emphasize that the buffer may contain NUL characters.
> 
> But, as I said, I don't feel very strongly about it.

You know: let me amend the function documentation itself. That feels way
less out of place compared to putting this info into the commit message
and has the benefit that a future reader of the code will know why we
have types without digging into the commit history.

Patrick
Taylor Blau Oct. 17, 2024, 8:59 p.m. UTC | #8
On Thu, Oct 17, 2024 at 06:54:51AM +0200, Patrick Steinhardt wrote:
> > I don't feel very strongly about it, but I had suggested it because my
> > initial read of this patch confused me, and I had wondered if others may
> > be similarly confused.
> >
> > For what it's worth, I was thinking something on the order of the
> > following added to the patch message:
> >
> >     Note that the reftable_buf_add() function intentionally takes a "const
> >     void *" instead of a "const char *" (as does its strbuf counterpart,
> >     strbuf_add()) to emphasize that the buffer may contain NUL characters.
> >
> > But, as I said, I don't feel very strongly about it.
>
> You know: let me amend the function documentation itself. That feels way
> less out of place compared to putting this info into the commit message
> and has the benefit that a future reader of the code will know why we
> have types without digging into the commit history.

Good idea, thanks.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/reftable/basics.c b/reftable/basics.c
index 9a949e5cf80..65ad761da0b 100644
--- a/reftable/basics.c
+++ b/reftable/basics.c
@@ -9,6 +9,7 @@  license that can be found in the LICENSE file or at
 #define REFTABLE_ALLOW_BANNED_ALLOCATORS
 #include "basics.h"
 #include "reftable-basics.h"
+#include "reftable-error.h"
 
 static void *(*reftable_malloc_ptr)(size_t sz);
 static void *(*reftable_realloc_ptr)(void *, size_t);
@@ -69,6 +70,79 @@  void reftable_set_alloc(void *(*malloc)(size_t),
 	reftable_free_ptr = free;
 }
 
+void reftable_buf_init(struct reftable_buf *buf)
+{
+	struct reftable_buf empty = REFTABLE_BUF_INIT;
+	*buf = empty;
+}
+
+void reftable_buf_release(struct reftable_buf *buf)
+{
+	reftable_free(buf->buf);
+	reftable_buf_init(buf);
+}
+
+void reftable_buf_reset(struct reftable_buf *buf)
+{
+	if (buf->alloc) {
+		buf->len = 0;
+		buf->buf[0] = '\0';
+	}
+}
+
+int reftable_buf_setlen(struct reftable_buf *buf, size_t len)
+{
+	if (len > buf->len)
+		return -1;
+	if (len == buf->len)
+		return 0;
+	buf->buf[len] = '\0';
+	buf->len = len;
+	return 0;
+}
+
+int reftable_buf_cmp(const struct reftable_buf *a, const struct reftable_buf *b)
+{
+	size_t len = a->len < b->len ? a->len : b->len;
+	if (len) {
+		int cmp = memcmp(a->buf, b->buf, len);
+		if (cmp)
+			return cmp;
+	}
+	return a->len < b->len ? -1 : a->len != b->len;
+}
+
+int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len)
+{
+	size_t newlen = buf->len + len;
+
+	if (newlen + 1 > buf->alloc) {
+		char *reallocated = buf->buf;
+		REFTABLE_ALLOC_GROW(reallocated, newlen + 1, buf->alloc);
+		if (!reallocated)
+			return REFTABLE_OUT_OF_MEMORY_ERROR;
+		buf->buf = reallocated;
+	}
+
+	memcpy(buf->buf + buf->len, data, len);
+	buf->buf[newlen] = '\0';
+	buf->len = newlen;
+
+	return 0;
+}
+
+int reftable_buf_addstr(struct reftable_buf *buf, const char *s)
+{
+	return reftable_buf_add(buf, s, strlen(s));
+}
+
+char *reftable_buf_detach(struct reftable_buf *buf)
+{
+	char *result = buf->buf;
+	reftable_buf_init(buf);
+	return result;
+}
+
 void put_be24(uint8_t *out, uint32_t i)
 {
 	out[0] = (uint8_t)((i >> 16) & 0xff);
diff --git a/reftable/basics.h b/reftable/basics.h
index 4c9ef0fe6c5..bd33c34deae 100644
--- a/reftable/basics.h
+++ b/reftable/basics.h
@@ -16,6 +16,62 @@  license that can be found in the LICENSE file or at
 #include "system.h"
 #include "reftable-basics.h"
 
+struct reftable_buf {
+	size_t alloc;
+	size_t len;
+	char *buf;
+};
+#define REFTABLE_BUF_INIT { 0 }
+
+/*
+ * Initialize the buffer such that it is ready for use. This is equivalent to
+ * using REFTABLE_BUF_INIT for stack-allocated variables.
+ */
+void reftable_buf_init(struct reftable_buf *buf);
+
+/*
+ * Release memory associated with the buffer. The buffer is reinitialized such
+ * that it can be reused for subsequent operations.
+ */
+void reftable_buf_release(struct reftable_buf *buf);
+
+/*
+ * Reset the buffer such that it is effectively empty, without releasing the
+ * memory that this structure holds on to. This is equivalent to calling
+ * `reftable_buf_setlen(buf, 0)`.
+ */
+void reftable_buf_reset(struct reftable_buf *buf);
+
+/*
+ * Trim the buffer to a shorter length by updating the `len` member and writing
+ * a NUL byte to `buf[len]`. Returns 0 on success, -1 when `len` points outside
+ * of the array.
+ */
+int reftable_buf_setlen(struct reftable_buf *buf, size_t len);
+
+/*
+ * Lexicographically compare the two buffers. Returns 0 when both buffers have
+ * the same contents, -1 when `a` is lexicographically smaller than `b`, and 1
+ * otherwise.
+ */
+int reftable_buf_cmp(const struct reftable_buf *a, const struct reftable_buf *b);
+
+/*
+ * Add the given bytes to the buffer. Returns 0 on success,
+ * REFTABLE_OUT_OF_MEMORY_ERROR on allocation failure.
+ */
+int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len);
+
+/* Equivalent to `reftable_buf_add(buf, s, strlen(s))`. */
+int reftable_buf_addstr(struct reftable_buf *buf, const char *s);
+
+/*
+ * Detach the buffer from the structure such that the underlying memory is now
+ * owned by the caller. The buffer is reinitialized such that it can be reused
+ * for subsequent operations.
+ */
+char *reftable_buf_detach(struct reftable_buf *buf);
+
 /* Bigendian en/decoding of integers */
 
 void put_be24(uint8_t *out, uint32_t i);