Message ID | 20230317171309.73607-2-jlayton@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2,1/2] nfsd: don't replace page in rq_pages if it's a continuation of last page | expand |
On Fri, 2023-03-17 at 13:13 -0400, Jeff Layton wrote: > If rq_next_page ends up pointing outside the array, then we can corrupt > memory when we go to change its value. Ensure that it hasn't strayed > outside the array, and have svc_rqst_replace_page return -EIO without > changing anything if it has. > > Fix up nfsd_splice_actor (the only caller) to handle this case by either > returning an error or a short splice when this happens. > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/nfsd/vfs.c | 23 +++++++++++++++++------ > include/linux/sunrpc/svc.h | 2 +- > net/sunrpc/svc.c | 14 +++++++++++++- > 3 files changed, 31 insertions(+), 8 deletions(-) > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 97b38b47c563..0ebd7a65a9f0 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > struct page *page = buf->page; // may be a compound one > unsigned offset = buf->offset; > struct page *last_page; > + int ret = 0, consumed = 0; > > last_page = page + (offset + sd->len - 1) / PAGE_SIZE; > for (page += offset / PAGE_SIZE; page <= last_page; page++) { > @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > * Skip page replacement when extending the contents > * of the current page. > */ > - if (page != *(rqstp->rq_next_page - 1)) > - svc_rqst_replace_page(rqstp, page); > + if (page != *(rqstp->rq_next_page - 1)) { > + ret = svc_rqst_replace_page(rqstp, page); > + if (ret) > + break; > + } > + consumed += min_t(int, > + PAGE_SIZE - offset_in_page(offset), > + sd->len - consumed); > + offset = 0; > } > - if (rqstp->rq_res.page_len == 0) // first call > - rqstp->rq_res.page_base = offset % PAGE_SIZE; > - rqstp->rq_res.page_len += sd->len; > - return sd->len; > + if (consumed) { > + if (rqstp->rq_res.page_len == 0) // first call > + rqstp->rq_res.page_base = offset % PAGE_SIZE; Oops, this will get the page_base wrong if the first splice covers multiple pages. I'll need to respin the offset handling. > + rqstp->rq_res.page_len += consumed; > + return consumed; > + } > + return ret; > } > > static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > index 877891536c2f..9ea52f143f49 100644 > --- a/include/linux/sunrpc/svc.h > +++ b/include/linux/sunrpc/svc.h > @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int, > int (*threadfn)(void *data)); > struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, > struct svc_pool *pool, int node); > -void svc_rqst_replace_page(struct svc_rqst *rqstp, > +int svc_rqst_replace_page(struct svc_rqst *rqstp, > struct page *page); > void svc_rqst_free(struct svc_rqst *); > void svc_exit_thread(struct svc_rqst *); > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > index fea7ce8fba14..d624c02f09be 100644 > --- a/net/sunrpc/svc.c > +++ b/net/sunrpc/svc.c > @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads); > * When replacing a page in rq_pages, batch the release of the > * replaced pages to avoid hammering the page allocator. > */ > -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > { > + struct page **begin, **end; > + > + /* > + * Bounds check: make sure rq_next_page points into the rq_respages > + * part of the array. > + */ > + begin = rqstp->rq_pages; > + end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; > + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) > + return -EIO; > + > if (*rqstp->rq_next_page) { > if (!pagevec_space(&rqstp->rq_pvec)) > __pagevec_release(&rqstp->rq_pvec); > @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > > get_page(page); > *(rqstp->rq_next_page++) = page; > + return 0; > } > EXPORT_SYMBOL_GPL(svc_rqst_replace_page); >
> On Mar 17, 2023, at 1:13 PM, Jeff Layton <jlayton@kernel.org> wrote: > > If rq_next_page ends up pointing outside the array, then we can corrupt > memory when we go to change its value. Ensure that it hasn't strayed > outside the array, and have svc_rqst_replace_page return -EIO without > changing anything if it has. > > Fix up nfsd_splice_actor (the only caller) to handle this case by either > returning an error or a short splice when this happens. IMO it's not worth the extra complexity to return a short splice. This is a "should never happen" scenario in a hot I/O path. Let's keep this code as simple as possible, and use unlikely() for the error cases in both nfsd_splice_actor and svc_rqst_replace_page(). Also, since "nfsd_splice_actor ... [is] the only caller", a WARN_ON stack trace is not adding value. I still think a tracepoint is more appropriate. I suggest: trace_svc_replace_page_err(rqst); > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/nfsd/vfs.c | 23 +++++++++++++++++------ > include/linux/sunrpc/svc.h | 2 +- > net/sunrpc/svc.c | 14 +++++++++++++- > 3 files changed, 31 insertions(+), 8 deletions(-) > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 97b38b47c563..0ebd7a65a9f0 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > struct page *page = buf->page; // may be a compound one > unsigned offset = buf->offset; > struct page *last_page; > + int ret = 0, consumed = 0; > > last_page = page + (offset + sd->len - 1) / PAGE_SIZE; > for (page += offset / PAGE_SIZE; page <= last_page; page++) { > @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > * Skip page replacement when extending the contents > * of the current page. > */ > - if (page != *(rqstp->rq_next_page - 1)) > - svc_rqst_replace_page(rqstp, page); > + if (page != *(rqstp->rq_next_page - 1)) { > + ret = svc_rqst_replace_page(rqstp, page); > + if (ret) > + break; > + } > + consumed += min_t(int, > + PAGE_SIZE - offset_in_page(offset), > + sd->len - consumed); > + offset = 0; > } > - if (rqstp->rq_res.page_len == 0) // first call > - rqstp->rq_res.page_base = offset % PAGE_SIZE; > - rqstp->rq_res.page_len += sd->len; > - return sd->len; > + if (consumed) { > + if (rqstp->rq_res.page_len == 0) // first call > + rqstp->rq_res.page_base = offset % PAGE_SIZE; > + rqstp->rq_res.page_len += consumed; > + return consumed; > + } > + return ret; > } > > static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > index 877891536c2f..9ea52f143f49 100644 > --- a/include/linux/sunrpc/svc.h > +++ b/include/linux/sunrpc/svc.h > @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int, > int (*threadfn)(void *data)); > struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, > struct svc_pool *pool, int node); > -void svc_rqst_replace_page(struct svc_rqst *rqstp, > +int svc_rqst_replace_page(struct svc_rqst *rqstp, > struct page *page); > void svc_rqst_free(struct svc_rqst *); > void svc_exit_thread(struct svc_rqst *); > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > index fea7ce8fba14..d624c02f09be 100644 > --- a/net/sunrpc/svc.c > +++ b/net/sunrpc/svc.c > @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads); > * When replacing a page in rq_pages, batch the release of the > * replaced pages to avoid hammering the page allocator. > */ > -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > { > + struct page **begin, **end; > + > + /* > + * Bounds check: make sure rq_next_page points into the rq_respages > + * part of the array. > + */ > + begin = rqstp->rq_pages; > + end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; > + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) > + return -EIO; > + > if (*rqstp->rq_next_page) { > if (!pagevec_space(&rqstp->rq_pvec)) > __pagevec_release(&rqstp->rq_pvec); > @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > > get_page(page); > *(rqstp->rq_next_page++) = page; > + return 0; > } > EXPORT_SYMBOL_GPL(svc_rqst_replace_page); > > -- > 2.39.2 > -- Chuck Lever
On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote: > > On Mar 17, 2023, at 1:13 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > If rq_next_page ends up pointing outside the array, then we can > > corrupt > > memory when we go to change its value. Ensure that it hasn't strayed > > outside the array, and have svc_rqst_replace_page return -EIO > > without > > changing anything if it has. > > > > Fix up nfsd_splice_actor (the only caller) to handle this case by > > either > > returning an error or a short splice when this happens. > > IMO it's not worth the extra complexity to return a short splice. > This is a "should never happen" scenario in a hot I/O path. Let's > keep this code as simple as possible, and use unlikely() for the > error cases in both nfsd_splice_actor and svc_rqst_replace_page(). > Are there any issues with just returning an error even though we have successfully spliced some of the data into the buffer? I guess the caller will just see an EIO or whatever instead of the short read in that case? > Also, since "nfsd_splice_actor ... [is] the only caller", a WARN_ON > stack trace is not adding value. I still think a tracepoint is more > appropriate. I suggest: > > trace_svc_replace_page_err(rqst); > > Ok, I can look at adding a conditional tracepoint. > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > fs/nfsd/vfs.c | 23 +++++++++++++++++------ > > include/linux/sunrpc/svc.h | 2 +- > > net/sunrpc/svc.c | 14 +++++++++++++- > > 3 files changed, 31 insertions(+), 8 deletions(-) > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > index 97b38b47c563..0ebd7a65a9f0 100644 > > --- a/fs/nfsd/vfs.c > > +++ b/fs/nfsd/vfs.c > > @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, > > struct pipe_buffer *buf, > > struct page *page = buf->page; // may be a compound one > > unsigned offset = buf->offset; > > struct page *last_page; > > + int ret = 0, consumed = 0; > > > > last_page = page + (offset + sd->len - 1) / PAGE_SIZE; > > for (page += offset / PAGE_SIZE; page <= last_page; page++) > > { > > @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info > > *pipe, struct pipe_buffer *buf, > > * Skip page replacement when extending the > > contents > > * of the current page. > > */ > > - if (page != *(rqstp->rq_next_page - 1)) > > - svc_rqst_replace_page(rqstp, page); > > + if (page != *(rqstp->rq_next_page - 1)) { > > + ret = svc_rqst_replace_page(rqstp, page); > > + if (ret) > > + break; > > + } > > + consumed += min_t(int, > > + PAGE_SIZE - > > offset_in_page(offset), > > + sd->len - consumed); > > + offset = 0; > > } > > - if (rqstp->rq_res.page_len == 0) // first call > > - rqstp->rq_res.page_base = offset % PAGE_SIZE; > > - rqstp->rq_res.page_len += sd->len; > > - return sd->len; > > + if (consumed) { > > + if (rqstp->rq_res.page_len == 0) // first > > call > > + rqstp->rq_res.page_base = offset % > > PAGE_SIZE; > > + rqstp->rq_res.page_len += consumed; > > + return consumed; > > + } > > + return ret; > > } > > > > static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, > > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > > index 877891536c2f..9ea52f143f49 100644 > > --- a/include/linux/sunrpc/svc.h > > +++ b/include/linux/sunrpc/svc.h > > @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program > > *, unsigned int, > > int (*threadfn)(void *data)); > > struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, > > struct svc_pool *pool, int > > node); > > -void svc_rqst_replace_page(struct svc_rqst *rqstp, > > +int svc_rqst_replace_page(struct svc_rqst *rqstp, > > struct page *page); > > void svc_rqst_free(struct svc_rqst *); > > void svc_exit_thread(struct svc_rqst *); > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > > index fea7ce8fba14..d624c02f09be 100644 > > --- a/net/sunrpc/svc.c > > +++ b/net/sunrpc/svc.c > > @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads); > > * When replacing a page in rq_pages, batch the release of the > > * replaced pages to avoid hammering the page allocator. > > */ > > -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page > > *page) > > +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page > > *page) > > { > > + struct page **begin, **end; > > + > > + /* > > + * Bounds check: make sure rq_next_page points into the > > rq_respages > > + * part of the array. > > + */ > > + begin = rqstp->rq_pages; > > + end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; > > + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp- > > >rq_next_page > end)) > > + return -EIO; > > + > > if (*rqstp->rq_next_page) { > > if (!pagevec_space(&rqstp->rq_pvec)) > > __pagevec_release(&rqstp->rq_pvec); > > @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst > > *rqstp, struct page *page) > > > > get_page(page); > > *(rqstp->rq_next_page++) = page; > > + return 0; > > } > > EXPORT_SYMBOL_GPL(svc_rqst_replace_page); > > > > -- > > 2.39.2 > > > > -- > Chuck Lever > >
> On Mar 17, 2023, at 2:04 PM, Jeff Layton <jlayton@kernel.org> wrote: > > On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote: >>> On Mar 17, 2023, at 1:13 PM, Jeff Layton <jlayton@kernel.org> wrote: >>> >>> If rq_next_page ends up pointing outside the array, then we can >>> corrupt >>> memory when we go to change its value. Ensure that it hasn't strayed >>> outside the array, and have svc_rqst_replace_page return -EIO >>> without >>> changing anything if it has. >>> >>> Fix up nfsd_splice_actor (the only caller) to handle this case by >>> either >>> returning an error or a short splice when this happens. >> >> IMO it's not worth the extra complexity to return a short splice. >> This is a "should never happen" scenario in a hot I/O path. Let's >> keep this code as simple as possible, and use unlikely() for the >> error cases in both nfsd_splice_actor and svc_rqst_replace_page(). >> > > Are there any issues with just returning an error even though we have > successfully spliced some of the data into the buffer? I guess the > caller will just see an EIO or whatever instead of the short read in > that case? NFSv4 READ is probably going to truncate the XDR buffer. I'm not sure NFSv3 is so clever, so you should test it. >> Also, since "nfsd_splice_actor ... [is] the only caller", a WARN_ON >> stack trace is not adding value. I still think a tracepoint is more >> appropriate. I suggest: >> >> trace_svc_replace_page_err(rqst); > > Ok, I can look at adding a conditional tracepoint. I thought about that: it doesn't help much, since you have to explicitly test anyway to see whether to return an error. >>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>> --- >>> fs/nfsd/vfs.c | 23 +++++++++++++++++------ >>> include/linux/sunrpc/svc.h | 2 +- >>> net/sunrpc/svc.c | 14 +++++++++++++- >>> 3 files changed, 31 insertions(+), 8 deletions(-) >>> >>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >>> index 97b38b47c563..0ebd7a65a9f0 100644 >>> --- a/fs/nfsd/vfs.c >>> +++ b/fs/nfsd/vfs.c >>> @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, >>> struct pipe_buffer *buf, >>> struct page *page = buf->page; // may be a compound one >>> unsigned offset = buf->offset; >>> struct page *last_page; >>> + int ret = 0, consumed = 0; >>> >>> last_page = page + (offset + sd->len - 1) / PAGE_SIZE; >>> for (page += offset / PAGE_SIZE; page <= last_page; page++) >>> { >>> @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info >>> *pipe, struct pipe_buffer *buf, >>> * Skip page replacement when extending the >>> contents >>> * of the current page. >>> */ >>> - if (page != *(rqstp->rq_next_page - 1)) >>> - svc_rqst_replace_page(rqstp, page); >>> + if (page != *(rqstp->rq_next_page - 1)) { >>> + ret = svc_rqst_replace_page(rqstp, page); >>> + if (ret) >>> + break; >>> + } >>> + consumed += min_t(int, >>> + PAGE_SIZE - >>> offset_in_page(offset), >>> + sd->len - consumed); >>> + offset = 0; >>> } >>> - if (rqstp->rq_res.page_len == 0) // first call >>> - rqstp->rq_res.page_base = offset % PAGE_SIZE; >>> - rqstp->rq_res.page_len += sd->len; >>> - return sd->len; >>> + if (consumed) { >>> + if (rqstp->rq_res.page_len == 0) // first >>> call >>> + rqstp->rq_res.page_base = offset % >>> PAGE_SIZE; >>> + rqstp->rq_res.page_len += consumed; >>> + return consumed; >>> + } >>> + return ret; >>> } >>> >>> static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, >>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h >>> index 877891536c2f..9ea52f143f49 100644 >>> --- a/include/linux/sunrpc/svc.h >>> +++ b/include/linux/sunrpc/svc.h >>> @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program >>> *, unsigned int, >>> int (*threadfn)(void *data)); >>> struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, >>> struct svc_pool *pool, int >>> node); >>> -void svc_rqst_replace_page(struct svc_rqst *rqstp, >>> +int svc_rqst_replace_page(struct svc_rqst *rqstp, >>> struct page *page); >>> void svc_rqst_free(struct svc_rqst *); >>> void svc_exit_thread(struct svc_rqst *); >>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c >>> index fea7ce8fba14..d624c02f09be 100644 >>> --- a/net/sunrpc/svc.c >>> +++ b/net/sunrpc/svc.c >>> @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads); >>> * When replacing a page in rq_pages, batch the release of the >>> * replaced pages to avoid hammering the page allocator. >>> */ >>> -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page >>> *page) >>> +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page >>> *page) >>> { >>> + struct page **begin, **end; >>> + >>> + /* >>> + * Bounds check: make sure rq_next_page points into the >>> rq_respages >>> + * part of the array. >>> + */ >>> + begin = rqstp->rq_pages; >>> + end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; >>> + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp- >>>> rq_next_page > end)) >>> + return -EIO; >>> + >>> if (*rqstp->rq_next_page) { >>> if (!pagevec_space(&rqstp->rq_pvec)) >>> __pagevec_release(&rqstp->rq_pvec); >>> @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst >>> *rqstp, struct page *page) >>> >>> get_page(page); >>> *(rqstp->rq_next_page++) = page; >>> + return 0; >>> } >>> EXPORT_SYMBOL_GPL(svc_rqst_replace_page); >>> >>> -- >>> 2.39.2 >>> >> >> -- >> Chuck Lever >> >> > > -- > Jeff Layton <jlayton@kernel.org> -- Chuck Lever
> On Mar 17, 2023, at 1:13 PM, Jeff Layton <jlayton@kernel.org> wrote: > > If rq_next_page ends up pointing outside the array, then we can corrupt > memory when we go to change its value. Ensure that it hasn't strayed > outside the array, and have svc_rqst_replace_page return -EIO without > changing anything if it has. > > Fix up nfsd_splice_actor (the only caller) to handle this case by either > returning an error or a short splice when this happens. > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/nfsd/vfs.c | 23 +++++++++++++++++------ > include/linux/sunrpc/svc.h | 2 +- > net/sunrpc/svc.c | 14 +++++++++++++- > 3 files changed, 31 insertions(+), 8 deletions(-) > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 97b38b47c563..0ebd7a65a9f0 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > struct page *page = buf->page; // may be a compound one > unsigned offset = buf->offset; > struct page *last_page; > + int ret = 0, consumed = 0; > > last_page = page + (offset + sd->len - 1) / PAGE_SIZE; > for (page += offset / PAGE_SIZE; page <= last_page; page++) { > @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > * Skip page replacement when extending the contents > * of the current page. > */ > - if (page != *(rqstp->rq_next_page - 1)) > - svc_rqst_replace_page(rqstp, page); > + if (page != *(rqstp->rq_next_page - 1)) { > + ret = svc_rqst_replace_page(rqstp, page); > + if (ret) > + break; > + } > + consumed += min_t(int, > + PAGE_SIZE - offset_in_page(offset), > + sd->len - consumed); > + offset = 0; > } > - if (rqstp->rq_res.page_len == 0) // first call > - rqstp->rq_res.page_base = offset % PAGE_SIZE; > - rqstp->rq_res.page_len += sd->len; > - return sd->len; > + if (consumed) { > + if (rqstp->rq_res.page_len == 0) // first call > + rqstp->rq_res.page_base = offset % PAGE_SIZE; > + rqstp->rq_res.page_len += consumed; > + return consumed; > + } > + return ret; > } > > static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > index 877891536c2f..9ea52f143f49 100644 > --- a/include/linux/sunrpc/svc.h > +++ b/include/linux/sunrpc/svc.h > @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int, > int (*threadfn)(void *data)); > struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, > struct svc_pool *pool, int node); > -void svc_rqst_replace_page(struct svc_rqst *rqstp, > +int svc_rqst_replace_page(struct svc_rqst *rqstp, > struct page *page); > void svc_rqst_free(struct svc_rqst *); > void svc_exit_thread(struct svc_rqst *); > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > index fea7ce8fba14..d624c02f09be 100644 > --- a/net/sunrpc/svc.c > +++ b/net/sunrpc/svc.c > @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads); > * When replacing a page in rq_pages, batch the release of the > * replaced pages to avoid hammering the page allocator. > */ > -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > { > + struct page **begin, **end; > + > + /* > + * Bounds check: make sure rq_next_page points into the rq_respages > + * part of the array. > + */ > + begin = rqstp->rq_pages; > + end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; Would it make sense to use rq_page_end here? Just a thought. > + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) > + return -EIO; > + > if (*rqstp->rq_next_page) { > if (!pagevec_space(&rqstp->rq_pvec)) > __pagevec_release(&rqstp->rq_pvec); > @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > > get_page(page); > *(rqstp->rq_next_page++) = page; > + return 0; > } > EXPORT_SYMBOL_GPL(svc_rqst_replace_page); > > -- > 2.39.2 > -- Chuck Lever
On Fri, 2023-03-17 at 18:32 +0000, Chuck Lever III wrote: > > > On Mar 17, 2023, at 1:13 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > If rq_next_page ends up pointing outside the array, then we can corrupt > > memory when we go to change its value. Ensure that it hasn't strayed > > outside the array, and have svc_rqst_replace_page return -EIO without > > changing anything if it has. > > > > Fix up nfsd_splice_actor (the only caller) to handle this case by either > > returning an error or a short splice when this happens. > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > fs/nfsd/vfs.c | 23 +++++++++++++++++------ > > include/linux/sunrpc/svc.h | 2 +- > > net/sunrpc/svc.c | 14 +++++++++++++- > > 3 files changed, 31 insertions(+), 8 deletions(-) > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > index 97b38b47c563..0ebd7a65a9f0 100644 > > --- a/fs/nfsd/vfs.c > > +++ b/fs/nfsd/vfs.c > > @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > > struct page *page = buf->page; // may be a compound one > > unsigned offset = buf->offset; > > struct page *last_page; > > + int ret = 0, consumed = 0; > > > > last_page = page + (offset + sd->len - 1) / PAGE_SIZE; > > for (page += offset / PAGE_SIZE; page <= last_page; page++) { > > @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > > * Skip page replacement when extending the contents > > * of the current page. > > */ > > - if (page != *(rqstp->rq_next_page - 1)) > > - svc_rqst_replace_page(rqstp, page); > > + if (page != *(rqstp->rq_next_page - 1)) { > > + ret = svc_rqst_replace_page(rqstp, page); > > + if (ret) > > + break; > > + } > > + consumed += min_t(int, > > + PAGE_SIZE - offset_in_page(offset), > > + sd->len - consumed); > > + offset = 0; > > } > > - if (rqstp->rq_res.page_len == 0) // first call > > - rqstp->rq_res.page_base = offset % PAGE_SIZE; > > - rqstp->rq_res.page_len += sd->len; > > - return sd->len; > > + if (consumed) { > > + if (rqstp->rq_res.page_len == 0) // first call > > + rqstp->rq_res.page_base = offset % PAGE_SIZE; > > + rqstp->rq_res.page_len += consumed; > > + return consumed; > > + } > > + return ret; > > } > > > > static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, > > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > > index 877891536c2f..9ea52f143f49 100644 > > --- a/include/linux/sunrpc/svc.h > > +++ b/include/linux/sunrpc/svc.h > > @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int, > > int (*threadfn)(void *data)); > > struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, > > struct svc_pool *pool, int node); > > -void svc_rqst_replace_page(struct svc_rqst *rqstp, > > +int svc_rqst_replace_page(struct svc_rqst *rqstp, > > struct page *page); > > void svc_rqst_free(struct svc_rqst *); > > void svc_exit_thread(struct svc_rqst *); > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > > index fea7ce8fba14..d624c02f09be 100644 > > --- a/net/sunrpc/svc.c > > +++ b/net/sunrpc/svc.c > > @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads); > > * When replacing a page in rq_pages, batch the release of the > > * replaced pages to avoid hammering the page allocator. > > */ > > -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > > +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > > { > > + struct page **begin, **end; > > + > > + /* > > + * Bounds check: make sure rq_next_page points into the rq_respages > > + * part of the array. > > + */ > > + begin = rqstp->rq_pages; > > + end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; > > Would it make sense to use rq_page_end here? Just a thought. > That should be set to the right value, but do you trust that it wouldn't be corrupt if rq_pages got overrun? It's only 3 pointers beyond the end of the array. This should work even if rq_page_end gets clobbered. > > > + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) > > + return -EIO; > > + > > if (*rqstp->rq_next_page) { > > if (!pagevec_space(&rqstp->rq_pvec)) > > __pagevec_release(&rqstp->rq_pvec); > > @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) > > > > get_page(page); > > *(rqstp->rq_next_page++) = page; > > + return 0; > > } > > EXPORT_SYMBOL_GPL(svc_rqst_replace_page); > > > > -- > > 2.39.2 > > > > -- > Chuck Lever > >
On Fri, 2023-03-17 at 18:08 +0000, Chuck Lever III wrote: > > > On Mar 17, 2023, at 2:04 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote: > > > > On Mar 17, 2023, at 1:13 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > If rq_next_page ends up pointing outside the array, then we can > > > > corrupt > > > > memory when we go to change its value. Ensure that it hasn't strayed > > > > outside the array, and have svc_rqst_replace_page return -EIO > > > > without > > > > changing anything if it has. > > > > > > > > Fix up nfsd_splice_actor (the only caller) to handle this case by > > > > either > > > > returning an error or a short splice when this happens. > > > > > > IMO it's not worth the extra complexity to return a short splice. > > > This is a "should never happen" scenario in a hot I/O path. Let's > > > keep this code as simple as possible, and use unlikely() for the > > > error cases in both nfsd_splice_actor and svc_rqst_replace_page(). > > > > > > > Are there any issues with just returning an error even though we have > > successfully spliced some of the data into the buffer? I guess the > > caller will just see an EIO or whatever instead of the short read in > > that case? > > NFSv4 READ is probably going to truncate the XDR buffer. I'm not > sure NFSv3 is so clever, so you should test it. > Honestly, I don't have the cycles to do that sort of fault injection testing for this. If you think handling it as a short read is overblown, then tell me what you would like see here. > > > Also, since "nfsd_splice_actor ... [is] the only caller", a WARN_ON > > > stack trace is not adding value. I still think a tracepoint is more > > > appropriate. I suggest: > > > > > > trace_svc_replace_page_err(rqst); > > > > Ok, I can look at adding a conditional tracepoint. > > I thought about that: it doesn't help much, since you have to > explicitly test anyway to see whether to return an error. > Fair point. I can just make it a regular tracepoint that fires inside the conditional. > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > --- > > > > fs/nfsd/vfs.c | 23 +++++++++++++++++------ > > > > include/linux/sunrpc/svc.h | 2 +- > > > > net/sunrpc/svc.c | 14 +++++++++++++- > > > > 3 files changed, 31 insertions(+), 8 deletions(-) > > > > > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > > > index 97b38b47c563..0ebd7a65a9f0 100644 > > > > --- a/fs/nfsd/vfs.c > > > > +++ b/fs/nfsd/vfs.c > > > > @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, > > > > struct pipe_buffer *buf, > > > > struct page *page = buf->page; // may be a compound one > > > > unsigned offset = buf->offset; > > > > struct page *last_page; > > > > + int ret = 0, consumed = 0; > > > > > > > > last_page = page + (offset + sd->len - 1) / PAGE_SIZE; > > > > for (page += offset / PAGE_SIZE; page <= last_page; page++) > > > > { > > > > @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info > > > > *pipe, struct pipe_buffer *buf, > > > > * Skip page replacement when extending the > > > > contents > > > > * of the current page. > > > > */ > > > > - if (page != *(rqstp->rq_next_page - 1)) > > > > - svc_rqst_replace_page(rqstp, page); > > > > + if (page != *(rqstp->rq_next_page - 1)) { > > > > + ret = svc_rqst_replace_page(rqstp, page); > > > > + if (ret) > > > > + break; > > > > + } > > > > + consumed += min_t(int, > > > > + PAGE_SIZE - > > > > offset_in_page(offset), > > > > + sd->len - consumed); > > > > + offset = 0; > > > > } > > > > - if (rqstp->rq_res.page_len == 0) // first call > > > > - rqstp->rq_res.page_base = offset % PAGE_SIZE; > > > > - rqstp->rq_res.page_len += sd->len; > > > > - return sd->len; > > > > + if (consumed) { > > > > + if (rqstp->rq_res.page_len == 0) // first > > > > call > > > > + rqstp->rq_res.page_base = offset % > > > > PAGE_SIZE; > > > > + rqstp->rq_res.page_len += consumed; > > > > + return consumed; > > > > + } > > > > + return ret; > > > > } > > > > > > > > static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, > > > > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > > > > index 877891536c2f..9ea52f143f49 100644 > > > > --- a/include/linux/sunrpc/svc.h > > > > +++ b/include/linux/sunrpc/svc.h > > > > @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program > > > > *, unsigned int, > > > > int (*threadfn)(void *data)); > > > > struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, > > > > struct svc_pool *pool, int > > > > node); > > > > -void svc_rqst_replace_page(struct svc_rqst *rqstp, > > > > +int svc_rqst_replace_page(struct svc_rqst *rqstp, > > > > struct page *page); > > > > void svc_rqst_free(struct svc_rqst *); > > > > void svc_exit_thread(struct svc_rqst *); > > > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > > > > index fea7ce8fba14..d624c02f09be 100644 > > > > --- a/net/sunrpc/svc.c > > > > +++ b/net/sunrpc/svc.c > > > > @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads); > > > > * When replacing a page in rq_pages, batch the release of the > > > > * replaced pages to avoid hammering the page allocator. > > > > */ > > > > -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page > > > > *page) > > > > +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page > > > > *page) > > > > { > > > > + struct page **begin, **end; > > > > + > > > > + /* > > > > + * Bounds check: make sure rq_next_page points into the > > > > rq_respages > > > > + * part of the array. > > > > + */ > > > > + begin = rqstp->rq_pages; > > > > + end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; > > > > + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp- > > > > > rq_next_page > end)) > > > > + return -EIO; > > > > + > > > > if (*rqstp->rq_next_page) { > > > > if (!pagevec_space(&rqstp->rq_pvec)) > > > > __pagevec_release(&rqstp->rq_pvec); > > > > @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst > > > > *rqstp, struct page *page) > > > > > > > > get_page(page); > > > > *(rqstp->rq_next_page++) = page; > > > > + return 0; > > > > } > > > > EXPORT_SYMBOL_GPL(svc_rqst_replace_page); > > > > > > > > -- > > > > 2.39.2 > > > > > > > > > > -- > > > Chuck Lever > > > > > > > > > > -- > > Jeff Layton <jlayton@kernel.org> > > -- > Chuck Lever > >
> On Mar 17, 2023, at 2:59 PM, Jeff Layton <jlayton@kernel.org> wrote: > > On Fri, 2023-03-17 at 18:08 +0000, Chuck Lever III wrote: >> >>> On Mar 17, 2023, at 2:04 PM, Jeff Layton <jlayton@kernel.org> wrote: >>> >>> On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote: >>>>> On Mar 17, 2023, at 1:13 PM, Jeff Layton <jlayton@kernel.org> wrote: >>>>> >>>>> If rq_next_page ends up pointing outside the array, then we can >>>>> corrupt >>>>> memory when we go to change its value. Ensure that it hasn't strayed >>>>> outside the array, and have svc_rqst_replace_page return -EIO >>>>> without >>>>> changing anything if it has. >>>>> >>>>> Fix up nfsd_splice_actor (the only caller) to handle this case by >>>>> either >>>>> returning an error or a short splice when this happens. >>>> >>>> IMO it's not worth the extra complexity to return a short splice. >>>> This is a "should never happen" scenario in a hot I/O path. Let's >>>> keep this code as simple as possible, and use unlikely() for the >>>> error cases in both nfsd_splice_actor and svc_rqst_replace_page(). >>>> >>> >>> Are there any issues with just returning an error even though we have >>> successfully spliced some of the data into the buffer? I guess the >>> caller will just see an EIO or whatever instead of the short read in >>> that case? >> >> NFSv4 READ is probably going to truncate the XDR buffer. I'm not >> sure NFSv3 is so clever, so you should test it. > > Honestly, I don't have the cycles to do that sort of fault injection > testing for this. nfsd_splice_actor() has never returned an error, so IMO it is necessary to confirm that when svc_rqst_replace_page() returns an error, it doesn't create further problems. I don't see how we can avoid some kind of simple fault injection while developing the fix. Tell you what, I can take it from here if you'd like. > If you think handling it as a short read is overblown, > then tell me what you would like see here. It's not the short reads that bugs me, it's the additional code in a hot path that is worrisome. -- Chuck Lever
On Fri, 2023-03-17 at 20:55 +0000, Chuck Lever III wrote: > > > On Mar 17, 2023, at 2:59 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > On Fri, 2023-03-17 at 18:08 +0000, Chuck Lever III wrote: > > > > > > > On Mar 17, 2023, at 2:04 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote: > > > > > > On Mar 17, 2023, at 1:13 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > > > > > If rq_next_page ends up pointing outside the array, then we can > > > > > > corrupt > > > > > > memory when we go to change its value. Ensure that it hasn't strayed > > > > > > outside the array, and have svc_rqst_replace_page return -EIO > > > > > > without > > > > > > changing anything if it has. > > > > > > > > > > > > Fix up nfsd_splice_actor (the only caller) to handle this case by > > > > > > either > > > > > > returning an error or a short splice when this happens. > > > > > > > > > > IMO it's not worth the extra complexity to return a short splice. > > > > > This is a "should never happen" scenario in a hot I/O path. Let's > > > > > keep this code as simple as possible, and use unlikely() for the > > > > > error cases in both nfsd_splice_actor and svc_rqst_replace_page(). > > > > > > > > > > > > > Are there any issues with just returning an error even though we have > > > > successfully spliced some of the data into the buffer? I guess the > > > > caller will just see an EIO or whatever instead of the short read in > > > > that case? > > > > > > NFSv4 READ is probably going to truncate the XDR buffer. I'm not > > > sure NFSv3 is so clever, so you should test it. > > > > Honestly, I don't have the cycles to do that sort of fault injection > > testing for this. > > nfsd_splice_actor() has never returned an error, so IMO it is > necessary to confirm that when svc_rqst_replace_page() returns > an error, it doesn't create further problems. I don't see how > we can avoid some kind of simple fault injection while developing > the fix. > > Tell you what, I can take it from here if you'd like. > Sure! All yours! > > > If you think handling it as a short read is overblown, > > then tell me what you would like see here. > > It's not the short reads that bugs me, it's the additional > code in a hot path that is worrisome. > I wouldn't think tracking some extra stuff on the stack would show up much in profiles, but it's your call.
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 97b38b47c563..0ebd7a65a9f0 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct page *page = buf->page; // may be a compound one unsigned offset = buf->offset; struct page *last_page; + int ret = 0, consumed = 0; last_page = page + (offset + sd->len - 1) / PAGE_SIZE; for (page += offset / PAGE_SIZE; page <= last_page; page++) { @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, * Skip page replacement when extending the contents * of the current page. */ - if (page != *(rqstp->rq_next_page - 1)) - svc_rqst_replace_page(rqstp, page); + if (page != *(rqstp->rq_next_page - 1)) { + ret = svc_rqst_replace_page(rqstp, page); + if (ret) + break; + } + consumed += min_t(int, + PAGE_SIZE - offset_in_page(offset), + sd->len - consumed); + offset = 0; } - if (rqstp->rq_res.page_len == 0) // first call - rqstp->rq_res.page_base = offset % PAGE_SIZE; - rqstp->rq_res.page_len += sd->len; - return sd->len; + if (consumed) { + if (rqstp->rq_res.page_len == 0) // first call + rqstp->rq_res.page_base = offset % PAGE_SIZE; + rqstp->rq_res.page_len += consumed; + return consumed; + } + return ret; } static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 877891536c2f..9ea52f143f49 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int, int (*threadfn)(void *data)); struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node); -void svc_rqst_replace_page(struct svc_rqst *rqstp, +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page); void svc_rqst_free(struct svc_rqst *); void svc_exit_thread(struct svc_rqst *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index fea7ce8fba14..d624c02f09be 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads); * When replacing a page in rq_pages, batch the release of the * replaced pages to avoid hammering the page allocator. */ -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) { + struct page **begin, **end; + + /* + * Bounds check: make sure rq_next_page points into the rq_respages + * part of the array. + */ + begin = rqstp->rq_pages; + end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) + return -EIO; + if (*rqstp->rq_next_page) { if (!pagevec_space(&rqstp->rq_pvec)) __pagevec_release(&rqstp->rq_pvec); @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) get_page(page); *(rqstp->rq_next_page++) = page; + return 0; } EXPORT_SYMBOL_GPL(svc_rqst_replace_page);
If rq_next_page ends up pointing outside the array, then we can corrupt memory when we go to change its value. Ensure that it hasn't strayed outside the array, and have svc_rqst_replace_page return -EIO without changing anything if it has. Fix up nfsd_splice_actor (the only caller) to handle this case by either returning an error or a short splice when this happens. Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/nfsd/vfs.c | 23 +++++++++++++++++------ include/linux/sunrpc/svc.h | 2 +- net/sunrpc/svc.c | 14 +++++++++++++- 3 files changed, 31 insertions(+), 8 deletions(-)