Message ID | 20190112203816.85534-2-joel@joelfernandes.org (mailing list archive) |
---|---|
State | Mainlined |
Commit | ab3948f58ff841e51feb845720624665ef5b7ef3 |
Series | Add a future write seal to memfd |
On Sat, Jan 12, 2019 at 12:38 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>
> Android uses ashmem for sharing memory regions. We are looking forward to
> migrating all usecases of ashmem to memfd so that we can possibly remove
> the ashmem driver from staging in the future, while also benefiting from
> using memfd and contributing to it. Note that staging drivers are not ABI
> and can generally be removed at any time.
>
> One of the main usecases Android has is the ability to create a region and
> mmap it as writeable, then add protection against making any "future"
> writes while keeping the existing already-mmap'ed writeable region active.
> This allows us to implement a usecase where receivers of the shared
> memory buffer get a read-only view, while the sender continues to
> write to the buffer. See the CursorWindow documentation in Android for
> more details:
> https://developer.android.com/reference/android/database/CursorWindow
>
> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support it, this patch adds a new F_SEAL_FUTURE_WRITE seal, which
> prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active.
>
> A better way to implement the F_SEAL_FUTURE_WRITE seal was discussed [1]
> last week, where we don't need to modify core VFS structures to get the
> same behavior of the seal. This solves several side-effects pointed out
> by Andy. Self-tests are provided in a later patch to verify the expected
> semantics.
>
> [1] https://lore.kernel.org/lkml/20181111173650.GA256781@google.com/
>
> [Thanks a lot to Andy for suggestions to improve the code]
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  fs/hugetlbfs/inode.c       |  2 +-
>  include/uapi/linux/fcntl.h |  1 +
>  mm/memfd.c                 |  3 ++-
>  mm/shmem.c                 | 25 ++++++++++++++++++++++---
>  4 files changed, 26 insertions(+), 5 deletions(-)

Acked-by: John Stultz <john.stultz@linaro.org>
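For context, here is a minimal userspace sketch (not part of the patch; the memfd name, buffer size, and error handling are illustrative) of the sender/receiver flow the changelog describes: the sender maps the memfd writable, adds F_SEAL_FUTURE_WRITE, keeps writing through its existing mapping, and any later MAP_SHARED + PROT_WRITE mapping on the fd is refused.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010	/* value added by this patch */
#endif

int main(void)
{
	const size_t size = 4096;	/* illustrative buffer size */
	int fd = memfd_create("cursor-window", MFD_ALLOW_SEALING);

	if (fd < 0 || ftruncate(fd, size) < 0)
		return 1;

	/* Sender: a writable mapping created before sealing stays usable. */
	char *w = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (w == MAP_FAILED)
		return 1;

	if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) < 0)
		return 1;

	strcpy(w, "sender still writes here");	/* existing mapping unaffected */

	/* Receiver: after the fd is passed along, only read-only views work. */
	char *r = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	if (r != MAP_FAILED)
		printf("read-only view: %s\n", r);

	/* A new shared writable mapping is now refused (EPERM). */
	if (mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) == MAP_FAILED)
		perror("mmap(PROT_WRITE) after F_SEAL_FUTURE_WRITE");

	return 0;
}
```

In a real CursorWindow-style setup the fd would be handed to the receiver over a unix domain socket (SCM_RIGHTS) or binder; both sides are shown in one process here only for brevity.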
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 53ea3cef526e..3daf471bbd92 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -557,7 +557,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		inode_lock(inode);
 
 		/* protected by i_mutex */
-		if (info->seals & F_SEAL_WRITE) {
+		if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
 			inode_unlock(inode);
 			return -EPERM;
 		}
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..a2f8658f1c55 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
 #define F_SEAL_SHRINK	0x0002	/* prevent file from shrinking */
 #define F_SEAL_GROW	0x0004	/* prevent file from growing */
 #define F_SEAL_WRITE	0x0008	/* prevent writes */
+#define F_SEAL_FUTURE_WRITE	0x0010  /* prevent future writes while mapped */
 /* (1U << 31) is reserved for signed error codes */
 
 /*
diff --git a/mm/memfd.c b/mm/memfd.c
index 97264c79d2cd..650e65a46b9c 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -131,7 +131,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
 #define F_ALL_SEALS (F_SEAL_SEAL | \
 		     F_SEAL_SHRINK | \
 		     F_SEAL_GROW | \
-		     F_SEAL_WRITE)
+		     F_SEAL_WRITE | \
+		     F_SEAL_FUTURE_WRITE)
 
 static int memfd_add_seals(struct file *file, unsigned int seals)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index 6ece1e2fe76e..3c98cc9655b4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2125,6 +2125,24 @@ int shmem_lock(struct file *file, int lock, struct user_struct *user)
 
 static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 {
+	struct shmem_inode_info *info = SHMEM_I(file_inode(file));
+
+	if (info->seals & F_SEAL_FUTURE_WRITE) {
+		/*
+		 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
+		 * "future write" seal active.
+		 */
+		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
+			return -EPERM;
+
+		/*
+		 * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
+		 * read-only mapping, take care to not allow mprotect to revert
+		 * protections.
+		 */
+		vma->vm_flags &= ~(VM_MAYWRITE);
+	}
+
 	file_accessed(file);
 	vma->vm_ops = &shmem_vm_ops;
 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE) &&
@@ -2375,8 +2393,9 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 	pgoff_t index = pos >> PAGE_SHIFT;
 
 	/* i_mutex is held by caller */
-	if (unlikely(info->seals & (F_SEAL_WRITE | F_SEAL_GROW))) {
-		if (info->seals & F_SEAL_WRITE)
+	if (unlikely(info->seals & (F_SEAL_GROW |
+				   F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
+		if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))
 			return -EPERM;
 		if ((info->seals & F_SEAL_GROW) && pos + len > inode->i_size)
 			return -EPERM;
@@ -2639,7 +2658,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq);
 
 		/* protected by i_mutex */
-		if (info->seals & F_SEAL_WRITE) {
+		if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
 			error = -EPERM;
 			goto out;
 		}
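One non-obvious part of the shmem_mmap() hunk is clearing VM_MAYWRITE: it keeps mprotect() from upgrading a post-seal read-only MAP_SHARED mapping back to writable. Below is a small sketch of that behavior from userspace, using the same hypothetical setup as the earlier example (again illustrative, not taken from the patch or its selftests):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010	/* value added by this patch */
#endif

int main(void)
{
	const size_t size = 4096;	/* illustrative buffer size */
	int fd = memfd_create("sealed", MFD_ALLOW_SEALING);

	if (fd < 0 || ftruncate(fd, size) < 0)
		return 1;
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) < 0)
		return 1;

	/* Read-only MAP_SHARED mappings are still allowed after sealing... */
	void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/*
	 * ...but because shmem_mmap() cleared VM_MAYWRITE on this vma,
	 * mprotect() cannot reintroduce a write path (it fails with EACCES).
	 */
	if (mprotect(p, size, PROT_READ | PROT_WRITE) < 0)
		perror("mprotect(PROT_WRITE) on sealed mapping");

	return 0;
}
```

Without the VM_MAYWRITE change, the mprotect() call above would succeed and silently bypass the seal for read-only MAP_SHARED mappings created after F_SEAL_FUTURE_WRITE was added.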