Message ID | 54e10bb06a39d23591ea4d02665083d705682468.1587297254.git.martin.agren@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/4] strbuf: fix doc for `strbuf_attach()` | expand |
Martin Ågren <martin.agren@gmail.com> writes: > Commit 917c9a7133 ("New strbuf APIs: splice and attach.", 2007-09-15) > added `strbuf_attach()`. In the commit message, it is explicitly > discussed how using `mem == len` is ok, although possibly costly. When > this function was documented in dd613e6b87 ("Strbuf documentation: > document most functions", 2008-06-04), this distinction was missed and > the documentation simply forbids this case. In other words, mem==len implies that the original lacks the terminating '\0', so attach needs to reallocate from the get go, so discouraging such use may make sense, but it is too strong to forbid it, as the strbuf invariant is held safely? If so, the description makes sense to me. > The function handles reallocation and truncation, yet the docs say that > the "amount [of allocated memory] must be larger than the string length, > because the string you pass is supposed to be a NUL-terminated string". IOW, _attach() does not mind if the original lacks '\0' at the end. > diff --git a/strbuf.h b/strbuf.h > index ce8e49c0b2..2a462f70cc 100644 > --- a/strbuf.h > +++ b/strbuf.h > @@ -112,10 +112,12 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz); > /** > * Attach a string to a buffer. You should specify the string to attach, > * the current length of the string and the amount of allocated memory. > + * The amount must be at least as large as the string length. If the two > + * lengths are equal, reallocation will be handled as appropriate and in > + * any case, the string will be NUL-truncated as implied by `len`. NUL-truncated? Ah, if mem and len are the same, the string is reallocated to fit an extra byte to NUL-terminate, to make sure strlen(sb->buf)==len holds. Makes sense. > + * > + * This string _must_ be malloc()ed, and after attaching, the pointer > + * cannot be relied upon anymore, and neither be free()d directly. That's a good thing to explain. > */ > void strbuf_attach(struct strbuf *sb, void *str, size_t len, size_t mem);
On Mon, 20 Apr 2020 at 19:30, Junio C Hamano <gitster@pobox.com> wrote: > > Martin Ågren <martin.agren@gmail.com> writes: > > > The function handles reallocation and truncation, yet the docs say that > > the "amount [of allocated memory] must be larger than the string length, > > because the string you pass is supposed to be a NUL-terminated string". > > IOW, _attach() does not mind if the original lacks '\0' at the end. Right, nor if it lacks the space for it. > > diff --git a/strbuf.h b/strbuf.h > > index ce8e49c0b2..2a462f70cc 100644 > > --- a/strbuf.h > > +++ b/strbuf.h > > @@ -112,10 +112,12 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz); > > /** > > * Attach a string to a buffer. You should specify the string to attach, > > * the current length of the string and the amount of allocated memory. > > + * The amount must be at least as large as the string length. If the two > > + * lengths are equal, reallocation will be handled as appropriate and in > > + * any case, the string will be NUL-truncated as implied by `len`. > > NUL-truncated? Ah, if mem and len are the same, the string is reallocated > to fit an extra byte to NUL-terminate, to make sure strlen(sb->buf)==len > holds. Makes sense. Exactly. NUL-terminated would be better. I think I'll split that last sentence and replace it with something like the following: If the two lengths are equal, reallocation will be handled as needed. And regardless, the string will be NUL-terminated at `len`. (One might still have strlen(sb->buf) < len though. This just guarantees "<=") Martin
diff --git a/strbuf.h b/strbuf.h index ce8e49c0b2..2a462f70cc 100644 --- a/strbuf.h +++ b/strbuf.h @@ -112,10 +112,12 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz); /** * Attach a string to a buffer. You should specify the string to attach, * the current length of the string and the amount of allocated memory. - * The amount must be larger than the string length, because the string you - * pass is supposed to be a NUL-terminated string. This string _must_ be - * malloc()ed, and after attaching, the pointer cannot be relied upon - * anymore, and neither be free()d directly. + * The amount must be at least as large as the string length. If the two + * lengths are equal, reallocation will be handled as appropriate and in + * any case, the string will be NUL-truncated as implied by `len`. + * + * This string _must_ be malloc()ed, and after attaching, the pointer + * cannot be relied upon anymore, and neither be free()d directly. */ void strbuf_attach(struct strbuf *sb, void *str, size_t len, size_t mem);
Commit 917c9a7133 ("New strbuf APIs: splice and attach.", 2007-09-15) added `strbuf_attach()`. In the commit message, it is explicitly discussed how using `mem == len` is ok, although possibly costly. When this function was documented in dd613e6b87 ("Strbuf documentation: document most functions", 2008-06-04), this distinction was missed and the documentation simply forbids this case. The function handles reallocation and truncation, yet the docs say that the "amount [of allocated memory] must be larger than the string length, because the string you pass is supposed to be a NUL-terminated string". Several callers pass in `mem == len` meaning they are compatible with the implementation and the original commit message, but not with the documented behavior. In fact, compared to the documentation, they look outright dangerous. Later commits will convert most of those call sites to use new, simpler interfaces, but in at least one instance we really do want to use this possibility of passing the same value twice: In rerere.c, it is not safe to pass in "len, len + 1". Doing so makes `strbuf_attach()` write a trailing NUL at `result.ptr[len]` and running our test suite with AddressSanitizer will spot an out-of-bounds write due to this. That is, rerere.c really has use for the flexibility that `mem == len` gives us. Let's update the documentation to clarify that this is ok. Signed-off-by: Martin Ågren <martin.agren@gmail.com> --- strbuf.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)