Message ID | 1378217781-27942-1-git-send-email-dominique.martinet@cea.fr (mailing list archive) |
---|---|
State | New, archived |
On 09/03/13 16:24, Eric Van Hensbergen wrote:
> Interesting, do me a favor and cross-post to at least linux-fsdevel --
> there have been some strong objections from the rest of the community on
> similar approaches in the past, and I'd rather we encounter them earlier
> rather than when I try to push the patch upstream. The lack of
> consistency is pretty nasty, we'll need a big disclaimer in the
> documentation if we decide to take it forward. You may also want to
> cross-check some of my earlier efforts (back in the 2.6.14 timeframe IIRC)
> if you haven't already. I had some hackery to try and force things
> through sooner.

Right, sorry about this; reposted it to everyone right now.

> FWIW, the right approach is probably to try and go for some more
> intelligent caching mechanism based on revokable leases so that the
> server can force a flush on a client if necessary -- but I understand
> that's a pretty big nut to crack, and if this solves for your use case
> perhaps it's the right start.

I tried to enable cached reads/writes to keep the original behaviour, but it pretty quickly turns back into cache=loose-without-metadata behaviour, and I wanted to avoid this -- the less cached the better as far as I'm concerned.

I'm also working on a 9P server (nfs-ganesha, https://github.com/nfs-ganesha/nfs-ganesha ), so breaking the server definitely isn't out of the question.
Something based on lock/getlock might be possible, but I don't think there is anything in the protocol that would make locks revokable, especially since every message originates from the client...
If anything, I'd rather check before reading or writing whether we have dirty pages, to avoid inconsistencies with other filesystems.
On Tue, Sep 3, 2013 at 9:44 AM, MARTINET Dominique <dominique.martinet@cea.fr> wrote:
>> FWIW, the right approach is probably to try and go for some more
>> intelligent caching mechanism based on revokable leases so that the
>> server can force a flush on a client if necessary -- but I understand
>> that's a pretty big nut to crack and if this solves for your use case
>> perhaps it's the right start.
>
> I tried to enable cached read/writes to keep the original behaviour, but
> it pretty quickly turns back into a cache=loose without metadata behaviour
> and I wanted to avoid this - the less cached the better as far as I'm
> concerned.
>
> I'm also working on a 9P server (nfs-ganesha,
> https://github.com/nfs-ganesha/nfs-ganesha ), so breaking the server
> definitely isn't out of the question.
> Something based on lock/getlock might be possible but I don't think there
> is anything in the protocol that would make them revokable, especially
> since every message originates from the client...
> If anything, I'd rather check before reading or writing if we have dirty
> pages to avoid inconsistencies with other filesystems.

An approach which has been suggested in the past is to use a 'posted read' to a synthetic file on the file server, which could be used as a communications channel to receive flush/invalidate requests and otherwise help manage the cache. Some of this was explored by Sape Mullender in recent 9p follow-on work called Pepys, and there were also some efforts by the lsub.org folks in Madrid on a 9p follow-on protocol named IX, which tried to be more aggressive in caching. Both might be good resources for inspiration.

-eric
Dominique Martinet <dominique.martinet@cea.fr> writes:

> - Add cache=mmap option
> - Try to keep most operations synchronous
>
> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
> ---
>
> Hi,
>
> I brought up the issue of mmap being read-only with cache=none a bit
> ago, and worked on it since.
>
> Here's what this patch does:
> - Add an option cache=mmap (or just mmap, like other caches), nothing
>   changes for default or other caches.
>   The following assumes cache=mmap.
> - fills in writeback_fid on open (this could be done in
>   v9fs_file_mmap since we don't enable buffered write either, but I
>   wanted to keep the code shared with other caches, so we just have
>   useless writeback fids most of the time)
> - flush mmap data synchronously on munmap (vma->vm_op->close), this
>   is because we don't want to use the cache for read, so we need the
>   modifications to take place immediately. I'm not quite sure why I had
>   to force it, but it's lost otherwise (with buffered reads, you'd read
>   from the mmap pages anyway and it wouldn't matter if it's flushed
>   later - this doesn't really work for this case)

Can you explain why cache=loose doesn't work and why we need this alternate approach?

-aneesh
Hi,

Aneesh Kumar K.V wrote on Sun, Sep 08, 2013:
> Can you explain why cache=loose doesn't work and why we need this
> alternate approach ?

I'm not sure what happens, I suppose it would need more investigation, but on a 9p mountpoint with cache=loose (or fscache) I get "Numerical result out of range" errors all around after a few test runs. (I'm using a homebrew test suite, but basically a directory with more than 512 files in it just breaks the whole mount point... I have to umount/remount to clear the error.)

The other reason is that the approach is quite different, even for the actual data. cache=loose relies on the page cache for all reads and writes, not only mmap, and for various reasons I'd like to keep the memory footprint as small as possible, even if this costs more actual read/write operations. The server will have caching enabled and its filesystem accesses should work out that way - this might not always be the best way to go, but half of it is politics and I don't want to deal with that :)

Anyway, I definitely don't want a metadata cache on a distributed system if there is no cache coherence mechanism, so I'm afraid cache=loose or even fscache aren't going to cut it on this end. Disabling the metadata cache completely at least guarantees the information is up to date, and as long as the server has a cache and the network keeps up, the cost is reasonable compared to what cache validation would be.

Thanks,
On Sun, Sep 8, 2013 at 12:28 PM, Dominique Martinet <dominique.martinet@cea.fr> wrote:
> I'm not sure what happens, I suppose it would need more investigation,
> but on a 9p mountpoint with cache=loose (or fscache) I get "Numerical
> result out of range" errors all around after a few tests run (I'm using
> a homebrew test suite, but basically a directory with more than 512
> files in it just breaks the whole mount point... Have to
> umount/remount to clear the error)

That sounds like a corner-case bug we need to track down and fix. I'll add it to the queue. However, that doesn't invalidate your other reasons for the approach you took.

-eric
On 09/08/13 21:25, Eric Van Hensbergen wrote:
> On Sun, Sep 8, 2013 at 12:28 PM, Dominique Martinet
> <dominique.martinet@cea.fr> wrote:
>> I'm not sure what happens, I suppose it would need more investigation,
>> but on a 9p mountpoint with cache=loose (or fscache) I get "Numerical
>> result out of range" errors all around after a few tests run (I'm using
>> a homebrew test suite, but basically a directory with more than 512
>> files in it just breaks the whole mount point... Have to
>> umount/remount to clear the error)
>
> That sounds like a corner case bug we need to track down and fix. I'll
> add it to the queue.

Sounds like I should run my own tests on my own patches as well; this happens with my RFC patch too :)
Apparently some writeback fids aren't clunked properly, so a server with a limit on the number of fids will reply with RERROR(ERANGE) quickly enough. I'll look into that and offer a patch shortly.
(It's pretty obvious with a wonder-verbose debug flag: simply running "touch /mnt/non-existent-file" shows more walks than clunks.)
On 09/09/13 14:34, MARTINET Dominique wrote:
> Sounds like I should run my own tests on my own patches as well, this
> happens with my rfc patch as well :)
> Apparently some writeback fids aren't clunked properly, so a server with
> a maximum number of fids limitation will reply with RERROR(ERANGE)
> quickly enough.

Actually, it looks like I forgot an "if (v9ses->cache)" that ought to have been changed to "if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)" (for dirent operations), so the issue isn't present in cache=mmap as expected once that's fixed (and writeback fids are clunked properly).

The issue would appear to be that the client's cache can hold more inodes than the server is configured to accept (the server is set pretty low, at 1024 fids, at the moment), and that writeback fids are only clunked when the inode is evicted from the cache (and maybe the actual fid as well).

I am, however, not quite sure how to fix that... Raising the limit on the server would be an obvious first step, but is there a (configurable?) limit on how many entries the cache can hold?

I'll post a v2 for cache=mmap after I fix some more details (I'd particularly like to flush only the affected cache pages on vma close).
On Mon, Sep 9, 2013 at 8:35 AM, MARTINET Dominique <dominique.martinet@cea.fr> wrote:
> The issue would appear to be that the client's cache can hold more inodes
> than the server is configured to accept (it's set pretty low to 1024 fids
> at the moment), and that writeback fids are only clunked when the inode is
> evicted from the cache (and maybe the actual fid as well)
>
> I however am not quite sure on how to fix that... Raising the limit on the
> server would be an obvious first step, but is there a (configurable?) limit
> on how many entries the cache can hold?

Makes sense; cache=loose was really mostly a hack intended to see what the upper bound on a cache would deliver with v9fs. A limit would make sense, and a negotiated limit with the server might make even more sense (but would require protocol extensions). I suppose the client could "figure this out" once it gets error messages back and then start releasing cached entries in order to retry requests -- but that seems a bit ugly. It seems like the "right" way really would be to go back and architect a proper cache model and incorporate it into the protocol.

-eric
diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 58e6cbc..1a2b0c8 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -56,7 +56,7 @@ enum {
 	/* Options that take no arguments */
 	Opt_nodevmap,
 	/* Cache options */
-	Opt_cache_loose, Opt_fscache,
+	Opt_cache_loose, Opt_fscache, Opt_mmap,
 	/* Access options */
 	Opt_access, Opt_posixacl,
 	/* Error token */
@@ -74,6 +74,7 @@ static const match_table_t tokens = {
 	{Opt_cache, "cache=%s"},
 	{Opt_cache_loose, "loose"},
 	{Opt_fscache, "fscache"},
+	{Opt_mmap, "mmap"},
 	{Opt_cachetag, "cachetag=%s"},
 	{Opt_access, "access=%s"},
 	{Opt_posixacl, "posixacl"},
@@ -91,6 +92,9 @@ static int get_cache_mode(char *s)
 	} else if (!strcmp(s, "fscache")) {
 		version = CACHE_FSCACHE;
 		p9_debug(P9_DEBUG_9P, "Cache mode: fscache\n");
+	} else if (!strcmp(s, "mmap")) {
+		version = CACHE_MMAP;
+		p9_debug(P9_DEBUG_9P, "Cache mode: mmap\n");
 	} else if (!strcmp(s, "none")) {
 		version = CACHE_NONE;
 		p9_debug(P9_DEBUG_9P, "Cache mode: none\n");
@@ -220,6 +224,9 @@ static int v9fs_parse_options(struct v9fs_session_info *v9ses, char *opts)
 		case Opt_fscache:
 			v9ses->cache = CACHE_FSCACHE;
 			break;
+		case Opt_mmap:
+			v9ses->cache = CACHE_MMAP;
+			break;
 		case Opt_cachetag:
 #ifdef CONFIG_9P_FSCACHE
 			v9ses->cachetag = match_strdup(&args[0]);
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index a8e127c..099c771 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -64,6 +64,7 @@ enum p9_session_flags {
 
 enum p9_cache_modes {
 	CACHE_NONE,
+	CACHE_MMAP,
 	CACHE_LOOSE,
 	CACHE_FSCACHE,
 };
diff --git a/fs/9p/v9fs_vfs.h b/fs/9p/v9fs_vfs.h
index dc95a25..b83ebfb 100644
--- a/fs/9p/v9fs_vfs.h
+++ b/fs/9p/v9fs_vfs.h
@@ -50,6 +50,8 @@ extern const struct dentry_operations v9fs_dentry_operations;
 extern const struct dentry_operations v9fs_cached_dentry_operations;
 extern const struct file_operations v9fs_cached_file_operations;
 extern const struct file_operations v9fs_cached_file_operations_dotl;
+extern const struct file_operations v9fs_mmap_file_operations;
+extern const struct file_operations v9fs_mmap_file_operations_dotl;
 extern struct kmem_cache *v9fs_inode_cache;
 
 struct inode *v9fs_alloc_inode(struct super_block *sb);
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 9ff073f..c71e886 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -202,6 +202,8 @@ static int v9fs_vfs_writepage(struct page *page, struct writeback_control *wbc)
 {
 	int retval;
 
+	p9_debug(P9_DEBUG_VFS, "page %p\n", page);
+
 	retval = v9fs_vfs_writepage_locked(page);
 	if (retval < 0) {
 		if (retval == -EAGAIN) {
@@ -282,6 +284,9 @@ static int v9fs_write_begin(struct file *filp, struct address_space *mapping,
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
 	struct inode *inode = mapping->host;
 
+
+	p9_debug(P9_DEBUG_VFS, "filp %p, mapping %p\n", filp, mapping);
+
 	v9inode = V9FS_I(inode);
start:
 	page = grab_cache_page_write_begin(mapping, index, flags);
@@ -312,6 +317,8 @@ static int v9fs_write_end(struct file *filp, struct address_space *mapping,
 	loff_t last_pos = pos + copied;
 	struct inode *inode = page->mapping->host;
 
+	p9_debug(P9_DEBUG_VFS, "filp %p, mapping %p\n", filp, mapping);
+
 	if (unlikely(copied < len)) {
 		/*
 		 * zero out the rest of the area
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index d384a8b..666a08e 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -45,6 +45,7 @@
 #include "cache.h"
 
 static const struct vm_operations_struct v9fs_file_vm_ops;
+static const struct vm_operations_struct v9fs_mmap_file_vm_ops;
 
 /**
  * v9fs_file_open - open a file (or directory)
@@ -106,7 +107,9 @@ int v9fs_file_open(struct inode *inode, struct file *file)
 	}
 	mutex_unlock(&v9inode->v_mutex);
 #ifdef CONFIG_9P_FSCACHE
-	if (v9ses->cache)
+	/* previous check would also set cookie with CACHE_LOOSE?
+	 * set_cookie does a check if v9inode->fscache anyway... */
+	if (v9ses->cache == CACHE_FSCACHE)
 		v9fs_cache_inode_set_cookie(inode, file);
 #endif
 	return 0;
@@ -583,17 +586,33 @@ int v9fs_file_fsync_dotl(struct file *filp, loff_t start, loff_t end,
 }
 
 static int
-v9fs_file_mmap(struct file *file, struct vm_area_struct *vma)
+v9fs_file_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	int retval;
 
-	retval = generic_file_mmap(file, vma);
+
+	retval = generic_file_mmap(filp, vma);
 	if (!retval)
 		vma->vm_ops = &v9fs_file_vm_ops;
 
 	return retval;
 }
 
+/* TODO: merge this with the previous function
+ * and set the right vm_ops depending on v9ses->cache */
+static int
+v9fs_mmap_file_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	int retval;
+
+
+	retval = generic_file_mmap(filp, vma);
+	if (!retval)
+		vma->vm_ops = &v9fs_mmap_file_vm_ops;
+
+	return retval;
+}
+
 static int
 v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
@@ -662,6 +681,22 @@ v9fs_cached_file_read(struct file *filp, char __user *data, size_t count,
 	return do_sync_read(filp, data, count, offset);
 }
 
+/**
+ * v9fs_mmap_file_read - read from a file
+ * @filp: file pointer to read
+ * @data: user data buffer to read data into
+ * @count: size of buffer
+ * @offset: offset at which to read data
+ *
+ */
+static ssize_t
+v9fs_mmap_file_read(struct file *filp, char __user *data, size_t count,
+		    loff_t *offset)
+{
+	/* TODO: Check if there are dirty pages */
+	return v9fs_file_read(filp, data, count, offset);
+}
+
 static ssize_t
 v9fs_direct_write(struct file *filp, const char __user * data,
 		  size_t count, loff_t *offsetp)
@@ -732,12 +767,43 @@ v9fs_cached_file_write(struct file *filp, const char __user * data,
 	return do_sync_write(filp, data, count, offset);
 }
 
+
+/**
+ * v9fs_mmap_file_write - write to a file
+ * @filp: file pointer to write
+ * @data: data buffer to write data from
+ * @count: size of buffer
+ * @offset: offset at which to write data
+ *
+ */
+static ssize_t
+v9fs_mmap_file_write(struct file *filp, const char __user *data,
+		     size_t count, loff_t *offset)
+{
+	/* TODO: invalidate mmaps on filp's inode between offset and offset+count */
+	return v9fs_file_write(filp, data, count, offset);
+}
+
+static void v9fs_mmap_vm_close(struct vm_area_struct *vma)
+{
+	p9_debug(P9_DEBUG_VFS, "9p VMA close, %p, flushing", vma);
+	write_inode_now(file_inode(vma->vm_file), 1);
+}
+
+
 static const struct vm_operations_struct v9fs_file_vm_ops = {
 	.fault = filemap_fault,
 	.page_mkwrite = v9fs_vm_page_mkwrite,
 	.remap_pages = generic_file_remap_pages,
 };
 
+static const struct vm_operations_struct v9fs_mmap_file_vm_ops = {
+	.close = v9fs_mmap_vm_close,
+	.fault = filemap_fault,
+	.page_mkwrite = v9fs_vm_page_mkwrite,
+	.remap_pages = generic_file_remap_pages,
+};
+
 
 const struct file_operations v9fs_cached_file_operations = {
 	.llseek = generic_file_llseek,
@@ -788,3 +854,26 @@ const struct file_operations v9fs_file_operations_dotl = {
 	.mmap = generic_file_readonly_mmap,
 	.fsync = v9fs_file_fsync_dotl,
 };
+
+const struct file_operations v9fs_mmap_file_operations = {
+	.llseek = generic_file_llseek,
+	.read = v9fs_mmap_file_read,
+	.write = v9fs_mmap_file_write,
+	.open = v9fs_file_open,
+	.release = v9fs_dir_release,
+	.lock = v9fs_file_lock,
+	.mmap = v9fs_mmap_file_mmap,
+	.fsync = v9fs_file_fsync,
+};
+
+const struct file_operations v9fs_mmap_file_operations_dotl = {
+	.llseek = generic_file_llseek,
+	.read = v9fs_mmap_file_read,
+	.write = v9fs_mmap_file_write,
+	.open = v9fs_file_open,
+	.release = v9fs_dir_release,
+	.lock = v9fs_file_lock_dotl,
+	.flock = v9fs_file_flock_dotl,
+	.mmap = v9fs_mmap_file_mmap,
+	.fsync = v9fs_file_fsync_dotl,
+};
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 25b018e..d52179c 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -299,15 +299,20 @@ int v9fs_init_inode(struct v9fs_session_info *v9ses,
 	case S_IFREG:
 		if (v9fs_proto_dotl(v9ses)) {
 			inode->i_op = &v9fs_file_inode_operations_dotl;
-			if (v9ses->cache)
+			if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 				inode->i_fop =
 					&v9fs_cached_file_operations_dotl;
+			else if (v9ses->cache == CACHE_MMAP)
+				inode->i_fop = &v9fs_mmap_file_operations_dotl;
 			else
 				inode->i_fop = &v9fs_file_operations_dotl;
 		} else {
 			inode->i_op = &v9fs_file_inode_operations;
-			if (v9ses->cache)
-				inode->i_fop = &v9fs_cached_file_operations;
+			if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
+				inode->i_fop =
+					&v9fs_cached_file_operations;
+			else if (v9ses->cache == CACHE_MMAP)
+				inode->i_fop = &v9fs_mmap_file_operations;
 			else
 				inode->i_fop = &v9fs_file_operations;
 		}
@@ -816,7 +821,7 @@ struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
 	 * unlink. For cached mode create calls request for new
 	 * inode. But with cache disabled, lookup should do this.
 	 */
-	if (v9ses->cache)
+	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb);
 	else
 		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
@@ -906,7 +911,7 @@ v9fs_vfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	file->private_data = fid;
 #ifdef CONFIG_9P_FSCACHE
-	if (v9ses->cache)
+	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 		v9fs_cache_inode_set_cookie(dentry->d_inode, file);
 #endif
@@ -1485,7 +1490,7 @@ int v9fs_refresh_inode(struct p9_fid *fid, struct inode *inode)
 	 */
 	i_size = inode->i_size;
 	v9fs_stat2inode(st, inode, inode->i_sb);
-	if (v9ses->cache)
+	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 		inode->i_size = i_size;
 	spin_unlock(&inode->i_lock);
 out:
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 53687bb..710d5c6 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -362,7 +362,7 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
 		goto err_clunk_old_fid;
 	file->private_data = ofid;
 #ifdef CONFIG_9P_FSCACHE
-	if (v9ses->cache)
+	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 		v9fs_cache_inode_set_cookie(inode, file);
 #endif
 	*opened |= FILE_CREATED;
@@ -725,7 +725,7 @@ v9fs_vfs_symlink_dotl(struct inode *dir, struct dentry *dentry,
 	}
 
 	v9fs_invalidate_inode_attr(dir);
-	if (v9ses->cache) {
+	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
 		/* Now walk from the parent so we can get an unopened fid. */
 		fid = p9_client_walk(dfid, 1, &name, 1);
 		if (IS_ERR(fid)) {
@@ -983,7 +983,7 @@ int v9fs_refresh_inode_dotl(struct p9_fid *fid, struct inode *inode)
 	 */
 	i_size = inode->i_size;
 	v9fs_stat2inode_dotl(st, inode);
-	if (v9ses->cache)
+	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 		inode->i_size = i_size;
 	spin_unlock(&inode->i_lock);
 out:
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 2756dcd..60e5763 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -282,7 +282,7 @@ static int v9fs_drop_inode(struct inode *inode)
 {
 	struct v9fs_session_info *v9ses;
 	v9ses = v9fs_inode2v9ses(inode);
-	if (v9ses->cache)
+	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 		return generic_drop_inode(inode);
 	/*
 	 * in case of non cached mode always drop the
@@ -325,10 +325,11 @@ static int v9fs_write_inode_dotl(struct inode *inode,
 	 * send an fsync request to server irrespective of
 	 * wbc->sync_mode.
 	 */
-	p9_debug(P9_DEBUG_VFS, "%s: inode %p\n", __func__, inode);
 	v9inode = V9FS_I(inode);
+	p9_debug(P9_DEBUG_VFS, "%s: inode %p, writeback_fid %p\n",
+		 __func__, inode, v9inode->writeback_fid);
 	if (!v9inode->writeback_fid)
 		return 0;
+
 	ret = p9_client_fsync(v9inode->writeback_fid, 0);
 	if (ret < 0) {
 		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
- Add cache=mmap option
- Try to keep most operations synchronous

Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
---

Hi,

I brought up the issue of mmap being read-only with cache=none a bit ago, and have worked on it since.

Here's what this patch does:
- Adds an option, cache=mmap (or just "mmap", like the other caches); nothing changes for the default or the other cache modes. The following assumes cache=mmap.
- Fills in writeback_fid on open (this could be done in v9fs_file_mmap since we don't enable buffered writes either, but I wanted to keep the code shared with the other caches, so we just have useless writeback fids most of the time).
- Flushes mmap data synchronously on munmap (vma->vm_op->close); this is because we don't want to use the cache for reads, so we need the modifications to take place immediately. I'm not quite sure why I had to force it, but the data is lost otherwise (with buffered reads, you'd read from the mmap'd pages anyway and it wouldn't matter if they're flushed later - that doesn't work for this case).

What works:
- msync(addr, length, MS_SYNC) (MS_SYNC might just be the default behaviour) does sync correctly;
- munmap does a sync as well, obviously;
- no msync, or msync with MS_ASYNC, will eventually flush when the kernel decides to.

What doesn't:
- If you read a file after writing to an mmap'd section with no msync (or msync with MS_ASYNC) and no munmap, you will read what the server knows about -- so the old data. With buffered reads, you'd expect the read to find out about the map and use it; we don't. That is, however, what the documentation says, so I don't see it as a bug. What could probably be done would be to check on read whether we have dirty mapped pages and flush them before reading, but I'm not too sure I like that idea.
- Likewise, if you write with a map open, the map won't be updated unless something else invalidates it. I guess we should do that the way the direct write function does.
All in all, while I'm unhappy that I had to use write_inode_now() in vm_ops->close, I would guess it works for what it is (i.e. an unsafe feature on a networked filesystem); at least it seems better than returning ENOTSUPP for my own usage, given that read mmap isn't much safer anyway -- and that's why it's an option.

There are a few question/TODO marks and p9_debug lines that could go away, but I figured it probably makes more sense to leave them in for an RFC patch. (Also a few lines over 80 characters, which I'll fix for the real deal.)

Comments more than welcome, thanks!

 fs/9p/v9fs.c           |  9 ++++-
 fs/9p/v9fs.h           |  1 +
 fs/9p/v9fs_vfs.h       |  2 ++
 fs/9p/vfs_addr.c       |  7 ++++
 fs/9p/vfs_file.c       | 95 ++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/9p/vfs_inode.c      | 17 +++++----
 fs/9p/vfs_inode_dotl.c |  6 ++--
 fs/9p/vfs_super.c      |  5 +--
 8 files changed, 127 insertions(+), 15 deletions(-)