Message ID | e427fe6ad383cc238c13f313dc9f11eab37a3840.1697736516.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | merge-ort: implement support for packing objects together | expand |
On Thu, Oct 19, 2023 at 01:28:51PM -0400, Taylor Blau wrote: > Continue to prepare for streaming an object's contents directly from > memory by teaching `bulk_checkin_source` how to perform reads and seeks > based on an address in memory. > > Unlike file descriptors, which manage their own offset internally, we > have to keep track of how many bytes we've read out of the buffer, and > make sure we don't read past the end of the buffer. > > Suggested-by: Junio C Hamano <gitster@pobox.com> > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > bulk-checkin.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/bulk-checkin.c b/bulk-checkin.c > index 28bc8d5ab4..60361b3e2e 100644 > --- a/bulk-checkin.c > +++ b/bulk-checkin.c > @@ -141,11 +141,15 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id > } > > struct bulk_checkin_source { > - enum { SOURCE_FILE } type; > + enum { SOURCE_FILE, SOURCE_INCORE } type; > > /* SOURCE_FILE fields */ > int fd; > > + /* SOURCE_INCORE fields */ > + const void *buf; > + size_t read; > + > /* common fields */ > size_t size; > const char *path; > @@ -157,6 +161,11 @@ static off_t bulk_checkin_source_seek_to(struct bulk_checkin_source *source, > switch (source->type) { > case SOURCE_FILE: > return lseek(source->fd, offset, SEEK_SET); > + case SOURCE_INCORE: > + if (!(0 <= offset && offset < source->size)) > + return (off_t)-1; > + source->read = offset; > + return source->read; > default: > BUG("unknown bulk-checkin source: %d", source->type); > } > @@ -168,6 +177,13 @@ static ssize_t bulk_checkin_source_read(struct bulk_checkin_source *source, > switch (source->type) { > case SOURCE_FILE: > return read_in_full(source->fd, buf, nr); > + case SOURCE_INCORE: > + assert(source->read <= source->size); Is there any guideline around when to use `assert()` vs `BUG()`? I think that this assertion here is quite critical, because when it does not hold we can end up performing out-of-bounds reads and writes. But as asserts are typically missing in non-debug builds, this safeguard would not do anything for our end users, right? Patrick > + if (nr > source->size - source->read) > + nr = source->size - source->read; > + memcpy(buf, (unsigned char *)source->buf + source->read, nr); > + source->read += nr; > + return nr; > default: > BUG("unknown bulk-checkin source: %d", source->type); > } > -- > 2.42.0.405.g86fe3250c2 >
On Mon, Oct 23, 2023 at 11:19:13AM +0200, Patrick Steinhardt wrote: > > + case SOURCE_INCORE: > > + assert(source->read <= source->size); > > Is there any guideline around when to use `assert()` vs `BUG()`? I think > that this assertion here is quite critical, because when it does not > hold we can end up performing out-of-bounds reads and writes. But as > asserts are typically missing in non-debug builds, this safeguard would > not do anything for our end users, right? I don't think we have a written guideline. My philosophy is: always use BUG(), because you will never be surprised that the assertion was not compiled in (and I think compiling without assertions is almost certainly premature optimization, especially given the way we tend to use them). -Peff
On Mon, Oct 23, 2023 at 02:58:42PM -0400, Jeff King wrote: > On Mon, Oct 23, 2023 at 11:19:13AM +0200, Patrick Steinhardt wrote: > > > > + case SOURCE_INCORE: > > > + assert(source->read <= source->size); > > > > Is there any guideline around when to use `assert()` vs `BUG()`? I think > > that this assertion here is quite critical, because when it does not > > hold we can end up performing out-of-bounds reads and writes. But as > > asserts are typically missing in non-debug builds, this safeguard would > > not do anything for our end users, right? > > I don't think we have a written guideline. My philosophy is: always use > BUG(), because you will never be surprised that the assertion was not > compiled in (and I think compiling without assertions is almost > certainly premature optimization, especially given the way we tend to > use them). > > -Peff I'm inclined to agree with your philosophy. Makes me wonder whether we should write a Coccinelle rule to catch this. But a quick-and-dirty grep in our codebase shows that such a rule would cause quite a lot of churn: $ git grep BUG\( | wc -l 677 $ git grep assert\( | wc -l 549 Probably not worth it. Patrick
Patrick Steinhardt <ps@pks.im> writes: > I'm inclined to agree with your philosophy. Makes me wonder whether we > should write a Coccinelle rule to catch this. But a quick-and-dirty grep > in our codebase shows that such a rule would cause quite a lot of churn: > > $ git grep BUG\( | wc -l > 677 > $ git grep assert\( | wc -l > 549 > > Probably not worth it. Yeah, we can stick to our usual "do not add new instances, fix them while touching near-by code" pattern for this one, I would say. Thanks.
diff --git a/bulk-checkin.c b/bulk-checkin.c index 28bc8d5ab4..60361b3e2e 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -141,11 +141,15 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id } struct bulk_checkin_source { - enum { SOURCE_FILE } type; + enum { SOURCE_FILE, SOURCE_INCORE } type; /* SOURCE_FILE fields */ int fd; + /* SOURCE_INCORE fields */ + const void *buf; + size_t read; + /* common fields */ size_t size; const char *path; @@ -157,6 +161,11 @@ static off_t bulk_checkin_source_seek_to(struct bulk_checkin_source *source, switch (source->type) { case SOURCE_FILE: return lseek(source->fd, offset, SEEK_SET); + case SOURCE_INCORE: + if (!(0 <= offset && offset < source->size)) + return (off_t)-1; + source->read = offset; + return source->read; default: BUG("unknown bulk-checkin source: %d", source->type); } @@ -168,6 +177,13 @@ static ssize_t bulk_checkin_source_read(struct bulk_checkin_source *source, switch (source->type) { case SOURCE_FILE: return read_in_full(source->fd, buf, nr); + case SOURCE_INCORE: + assert(source->read <= source->size); + if (nr > source->size - source->read) + nr = source->size - source->read; + memcpy(buf, (unsigned char *)source->buf + source->read, nr); + source->read += nr; + return nr; default: BUG("unknown bulk-checkin source: %d", source->type); }
Continue to prepare for streaming an object's contents directly from memory by teaching `bulk_checkin_source` how to perform reads and seeks based on an address in memory. Unlike file descriptors, which manage their own offset internally, we have to keep track of how many bytes we've read out of the buffer, and make sure we don't read past the end of the buffer. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- bulk-checkin.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)