Message ID | 20190425160913.1878-1-agruenba@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v3,1/2] iomap: Add a page_prepare callback |
On Thu 25-04-19 18:09:12, Andreas Gruenbacher wrote:
> Move the page_done callback into a separate iomap_page_ops structure and
> add a page_prepare callback to be called before a page is written to.  In
> gfs2, we'll want to start a transaction in page_prepare and end it in
> page_done, and other filesystems that implement data journaling will
> require the same kind of mechanism.

...

> @@ -674,9 +675,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
>  
> +	if (page_ops) {
> +		status = page_ops->page_prepare(inode, pos, len, iomap);
> +		if (status)
> +			return status;
> +	}
> +

Looks OK for now I guess, although I'm not sure if later some fs won't need
to get hold of the actual page in ->page_prepare() and then we will need to
switch to ->page_prepare() returning the page to use. But let's leave that
for a time when such fs wants to use iomap.

> @@ -780,8 +794,8 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
>  	}
>  
> -	if (iomap->page_done)
> -		iomap->page_done(inode, pos, copied, page, iomap);
> +	if (page_ops)
> +		page_ops->page_done(inode, pos, copied, page, iomap);

Looking at the code now, this is actually flawed (preexisting problem):
__iomap_write_end or generic_write_end() will release the page reference
and so you cannot just pass it to ->page_done(). That is a potential
use-after-free...

Honza
On Fri, 26 Apr 2019 at 10:30, Jan Kara <jack@suse.cz> wrote:
>
> On Thu 25-04-19 18:09:12, Andreas Gruenbacher wrote:
> > Move the page_done callback into a separate iomap_page_ops structure and
> > add a page_prepare callback to be called before a page is written to.  In
> > gfs2, we'll want to start a transaction in page_prepare and end it in
> > page_done, and other filesystems that implement data journaling will
> > require the same kind of mechanism.
>
> ...
>
> > @@ -674,9 +675,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >  	if (fatal_signal_pending(current))
> >  		return -EINTR;
> >  
> > +	if (page_ops) {
> > +		status = page_ops->page_prepare(inode, pos, len, iomap);
> > +		if (status)
> > +			return status;
> > +	}
> > +
>
> Looks OK for now I guess, although I'm not sure if later some fs won't need
> to get hold of the actual page in ->page_prepare() and then we will need to
> switch to ->page_prepare() returning the page to use. But let's leave that
> for a time when such fs wants to use iomap.

Alright.

> > @@ -780,8 +794,8 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> >  		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
> >  	}
> >  
> > -	if (iomap->page_done)
> > -		iomap->page_done(inode, pos, copied, page, iomap);
> > +	if (page_ops)
> > +		page_ops->page_done(inode, pos, copied, page, iomap);
>
> Looking at the code now, this is actually flawed (preexisting problem):
> __iomap_write_end or generic_write_end() will release the page reference
> and so you cannot just pass it to ->page_done(). That is a potential
> use-after-free...

Ouch. I'm sending a fix.

Thanks,
Andreas
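The ordering problem raised in this exchange can be made concrete with a small userspace model. This is not kernel code and does not claim to be the fix that was subsequently sent: `struct mock_page`, `mock_put_page()`, `mock_page_done()` and the two `write_end_*()` functions are hypothetical names invented for illustration, and the "page" is only a reference count plus a flag. The sketch shows that a callback run after the last reference has been dropped is handed a dead page, whereas one run before the final put is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct page: a reference count and a "freed" flag. */
struct mock_page {
	int refcount;
	bool written;
	bool freed;
};

/* Models put_page(): dropping the last reference makes the page invalid. */
static void mock_put_page(struct mock_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		page->freed = true;	/* a real page could be reused from here on */
}

/* Models the part of write-end that commits data but keeps the reference. */
static void mock_commit_write(struct mock_page *page)
{
	page->written = true;
}

/* Hypothetical ->page_done(): records whether it was handed a live page. */
static bool page_done_saw_live_page;

static void mock_page_done(struct mock_page *page)
{
	page_done_saw_live_page = page != NULL && !page->freed;
}

/* Ordering criticised in the review: reference dropped, then callback. */
void write_end_unsafe(struct mock_page *page)
{
	mock_commit_write(page);
	mock_put_page(page);	/* stands in for the helper releasing the page */
	mock_page_done(page);	/* page may already be gone here */
}

/* One possible safe ordering (an assumption): callback, then final put. */
void write_end_safe(struct mock_page *page)
{
	mock_commit_write(page);
	mock_page_done(page);	/* reference still held */
	mock_put_page(page);
}
```

In the model, `write_end_unsafe()` leaves `page_done_saw_live_page` false and `write_end_safe()` leaves it true; in the kernel the unsafe case would be an actual use-after-free rather than a flag.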
diff --git a/fs/iomap.c b/fs/iomap.c
index 97cb9d486a7d..667a822ecb7d 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -665,6 +665,7 @@ static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
 {
+	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	pgoff_t index = pos >> PAGE_SHIFT;
 	struct page *page;
 	int status = 0;
@@ -674,9 +675,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	if (fatal_signal_pending(current))
 		return -EINTR;
 
+	if (page_ops) {
+		status = page_ops->page_prepare(inode, pos, len, iomap);
+		if (status)
+			return status;
+	}
+
 	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
-	if (!page)
-		return -ENOMEM;
+	if (!page) {
+		status = -ENOMEM;
+		goto no_page;
+	}
 
 	if (iomap->type == IOMAP_INLINE)
 		iomap_read_inline_data(inode, page, iomap);
@@ -684,12 +693,16 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	else
 		status = __iomap_write_begin(inode, pos, len, page, iomap);
+
 	if (unlikely(status)) {
 		unlock_page(page);
 		put_page(page);
 		page = NULL;
 
 		iomap_write_failed(inode, pos, len);
+no_page:
+		if (page_ops)
+			page_ops->page_done(inode, pos, 0, NULL, iomap);
 	}
 
 	*pagep = page;
@@ -769,6 +782,7 @@ static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		unsigned copied, struct page *page, struct iomap *iomap)
 {
+	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	int ret;
 
 	if (iomap->type == IOMAP_INLINE) {
@@ -780,8 +794,8 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
 	}
 
-	if (iomap->page_done)
-		iomap->page_done(inode, pos, copied, page, iomap);
+	if (page_ops)
+		page_ops->page_done(inode, pos, copied, page, iomap);
 
 	if (ret < len)
 		iomap_write_failed(inode, pos, len);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0fefb5455bda..fd65f27d300e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -53,6 +53,8 @@ struct vm_fault;
  */
 #define IOMAP_NULL_ADDR -1ULL	/* addr is not valid */
 
+struct iomap_page_ops;
+
 struct iomap {
 	u64			addr; /* disk offset of mapping, bytes */
 	loff_t			offset;	/* file offset of mapping, bytes */
@@ -63,12 +65,18 @@ struct iomap {
 	struct dax_device	*dax_dev; /* dax_dev for dax operations */
 	void			*inline_data;
 	void			*private; /* filesystem private */
+	const struct iomap_page_ops *page_ops;
+};
 
-	/*
-	 * Called when finished processing a page in the mapping returned in
-	 * this iomap.  At least for now this is only supported in the buffered
-	 * write path.
-	 */
+/*
+ * Called before / after processing a page in the mapping returned in this
+ * iomap.  At least for now, this is only supported in the buffered write path.
+ * When page_prepare returns 0, page_done is called as well
+ * (possibly with page == NULL).
+ */
+struct iomap_page_ops {
+	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
+			struct iomap *iomap);
 	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
 			struct page *page, struct iomap *iomap);
 };
Move the page_done callback into a separate iomap_page_ops structure and
add a page_prepare callback to be called before a page is written to.  In
gfs2, we'll want to start a transaction in page_prepare and end it in
page_done, and other filesystems that implement data journaling will
require the same kind of mechanism.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/iomap.c            | 22 ++++++++++++++++++----
 include/linux/iomap.h | 18 +++++++++++++-----
 2 files changed, 31 insertions(+), 9 deletions(-)
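As a rough illustration of the pairing described above (start a transaction in page_prepare, end it in page_done), the following is a userspace sketch, not gfs2 or iomap code. The `struct iomap_page_ops` layout mirrors the patch, but everything else is an assumption made for the example: the trimmed-down `struct inode`, `struct page` and `struct iomap`, the `trans_depth` counter, the `myfs_*` callbacks, and the `model_write_begin()` / `model_write_end()` helpers, which only imitate the control flow of the patched functions. `long long` stands in for `loff_t`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins for kernel types; the real ones are far richer. */
struct inode { int trans_depth; };	/* hypothetical open-transaction counter */
struct page { int dummy; };
struct iomap;

/* Same shape as the structure added by the patch. */
struct iomap_page_ops {
	int (*page_prepare)(struct inode *inode, long long pos, unsigned len,
			struct iomap *iomap);
	void (*page_done)(struct inode *inode, long long pos, unsigned copied,
			struct page *page, struct iomap *iomap);
};

struct iomap {
	const struct iomap_page_ops *page_ops;
};

/* Hypothetical filesystem callback: "begin a transaction". */
static int myfs_page_prepare(struct inode *inode, long long pos, unsigned len,
		struct iomap *iomap)
{
	(void)pos; (void)len; (void)iomap;
	inode->trans_depth++;
	return 0;
}

/* Hypothetical filesystem callback: "end the transaction"; page may be NULL. */
static void myfs_page_done(struct inode *inode, long long pos, unsigned copied,
		struct page *page, struct iomap *iomap)
{
	(void)pos; (void)copied; (void)page; (void)iomap;
	assert(inode->trans_depth > 0);
	inode->trans_depth--;
}

const struct iomap_page_ops myfs_page_ops = {
	.page_prepare	= myfs_page_prepare,
	.page_done	= myfs_page_done,
};

/* Imitates iomap_write_begin(): 'avail' is the page lookup result or NULL. */
int model_write_begin(struct inode *inode, struct iomap *iomap,
		struct page *avail, struct page **pagep)
{
	const struct iomap_page_ops *page_ops = iomap->page_ops;
	int status;

	if (page_ops) {
		status = page_ops->page_prepare(inode, 0, 0, iomap);
		if (status)
			return status;	/* prepare failed: page_done is not called */
	}

	*pagep = avail;
	if (!avail) {
		/* prepare succeeded, so page_done must still run, with no page */
		if (page_ops)
			page_ops->page_done(inode, 0, 0, NULL, iomap);
		return -ENOMEM;
	}
	return 0;
}

/* Imitates the tail of iomap_write_end(). */
void model_write_end(struct inode *inode, struct iomap *iomap,
		struct page *page, unsigned copied)
{
	const struct iomap_page_ops *page_ops = iomap->page_ops;

	if (page_ops)
		page_ops->page_done(inode, 0, copied, page, iomap);
}
```

In this model the transaction counter returns to zero both after a normal begin/end pair and after a begin that fails to obtain a page, which is the guarantee stated in the new header comment ("When page_prepare returns 0, page_done is called as well").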