diff mbox series

fs: use KERNEL_DS instead of get_ds()

Message ID 20190301200835.18286-1-jannh@google.com (mailing list archive)
State New, archived
Headers show
Series fs: use KERNEL_DS instead of get_ds() | expand

Commit Message

Jann Horn March 1, 2019, 8:08 p.m. UTC
get_ds() is a legacy name for KERNEL_DS; all architectures #define it to
KERNEL_DS, and almost every user of set_fs() uses the name KERNEL_DS.

Let the VFS also use KERNEL_DS so that we can get rid of get_ds() at some
point.

Signed-off-by: Jann Horn <jannh@google.com>
---
 fs/read_write.c | 6 +++---
 fs/splice.c     | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

Comments

Al Viro March 2, 2019, 3:40 a.m. UTC | #1
On Fri, Mar 01, 2019 at 09:08:35PM +0100, Jann Horn wrote:
> get_ds() is a legacy name for KERNEL_DS; all architectures #define it to
> KERNEL_DS,

not quite - h8300 and m68k have it as (equivalent) static inline.

> and almost every user of set_fs() uses the name KERNEL_DS.
> 
> Let the VFS also use KERNEL_DS so that we can get rid of get_ds() at some
> point.

No objections, but that kind of stuff might be better done in a different
way: ask Linus to run this

git grep -n -w get_ds | tr ':' ' ' | while read file line junk; do
	case "$file" in
	*include*|tools*)
		;;
	*)
		ed $file <<EOF
${line}s/get_ds()/KERNEL_DS/
w
q
EOF
		;;
	esac
done

just before -rc1, then follow it up with "kill unused get_ds()" patch
right after -rc1.
Linus Torvalds March 5, 2019, 12:23 a.m. UTC | #2
On Fri, Mar 1, 2019 at 7:40 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> No objections, but that kind of stuff might be better done in a different
> way: ask Linus to run this

Your script is disgusting, and I will not quote it for posterity for
that reason. I will just say that git has a "path exclusion" thing
that you can use to make it much more streamlined.

And I ended up going a bit further, and just got rid of it all in
commit 736706bee329 ("get rid of legacy 'get_ds()' function")

                  Linus
Christoph Hellwig March 8, 2019, 2:01 p.m. UTC | #3
On Mon, Mar 04, 2019 at 04:23:06PM -0800, Linus Torvalds wrote:
> Your script is disgusting, and I will not quote it for posterity for
> that reason. I will just say that git has a "path exclusion" thing
> that you can use to make it much more streamlined.
> 
> And I ended up going a bit further, and just got rid of it all in
> commit 736706bee329 ("get rid of legacy 'get_ds()' function")

Any chance we could just retire the legacy FS/DS names that are
horribly misleading these days?  E.g. turn the whole thing into:

	uaccess_kernel_enable();

	...

	uaccess_kernel_disable();

which for now turn into the existing calls with a nesting counter
in task_struct, with the hopes of cleaning all that mess up
eventually.
Al Viro March 8, 2019, 2:23 p.m. UTC | #4
On Fri, Mar 08, 2019 at 06:01:42AM -0800, Christoph Hellwig wrote:
> On Mon, Mar 04, 2019 at 04:23:06PM -0800, Linus Torvalds wrote:
> > Your script is disgusting, and I will not quote it for posterity for
> > that reason. I will just say that git has a "path exclusion" thing
> > that you can use to make it much more streamlined.
> > 
> > And I ended up going a bit further, and just got rid of it all in
> > commit 736706bee329 ("get rid of legacy 'get_ds()' function")
> 
> Any chance we could just retire the legacy FS/DS names that are
> horribly misleading these days?  E.g. turn the whole thing into:
> 
> 	uaccess_kernel_enable();
> 
> 	...
> 
> 	uaccess_kernel_disable();
> 
> which for now turn into the existing calls with a nesting counter
> in task_struct, with the hopes of cleaning all that mess up
> eventually.

You do realize that nested pairs of that sort are not all there is?
Even leaving m68k aside (there the same registers that select
userland or kernel for that kind of access can be used e.g. for
writeback control, or to switch to accessing sun3 MMU tables, etc.)
there are
	* temporary switches to USER_DS in things like unaligned
access handlers, etc., where the kernel is doing emulation of possibly
userland insns; similar for oops code dumping, etc.
	* use_mm()/unuse_mm() should probably switch to USER_DS and
back, rather than doing that in callers.
	* switch to USER_DS (and no, it's *not* "USER_DS unless we started
with KERNEL_DS" - nested counter is no-go here) for perf callbacks.
	* regular non-paired switches to USER_DS: do_exit() and
flush_old_exec().
Christoph Hellwig March 8, 2019, 4:20 p.m. UTC | #5
On Fri, Mar 08, 2019 at 02:23:31PM +0000, Al Viro wrote:
> You do realize that nested pairs of that sort are not all there is?
> Even leaving m68k aside (there the same registers that select
> userland or kernel for that kind of access can be used e.g. for
> writeback control, or to switch to accessing sun3 MMU tables, etc.)

Yes.  And the whole point is to keep these uses clear and separate.

> there are
> 	* temporary switches to USER_DS in things like unaligned
> access handlers, etc., where the kernel is doing emulation of possibly
> userland insns; similar for oops code dumping, etc.
> 	* use_mm()/unuse_mm() should probably switch to USER_DS and
> back, rather than doing that in callers.
> 	* switch to USER_DS (and no, it's *not* "USER_DS unless we started
> with KERNEL_DS" - nested counter is no-go here) for perf callbacks.
> 	* regular non-paired switches to USER_DS: do_exit() and
> flush_old_exec().

And that is probably the close to full list of callers that want
to explicitly enable access to the user address space, and thus
mark the thread as a user thread (and occasionally clear that in e.g.
unuse_mm).

Unless I'm completely missing something our general rule of thumb
should be:

 - threads are started with uaccess kernel turned on (count = 1)
 - if we execute in userspace we switch to user uaccess (count = 0)
   - same for use_mm style threads that want user access
 - every current random kernel code override increments the refcount
   and drops the reference when done
 - force uaccess cases like do_exit or the validation check on
   return to userspace force it back to 0.

Initially each 1 > 0 transition (decrement or force) will do
set_fs(USER_DS), each 0 > 1 transition will do set_fs(KERNEL_DS).

Then later architectures can kill the set_fs API, and potentially
optimize things by getting rid of the addr_limit field in its current
form.
diff mbox series

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index ff3c5e6f87cf..30df848b7451 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -426,7 +426,7 @@  ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 	ssize_t result;
 
 	old_fs = get_fs();
-	set_fs(get_ds());
+	set_fs(KERNEL_DS);
 	/* The cast to a user pointer is valid due to the set_fs() */
 	result = vfs_read(file, (void __user *)buf, count, pos);
 	set_fs(old_fs);
@@ -499,7 +499,7 @@  ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 		return -EINVAL;
 
 	old_fs = get_fs();
-	set_fs(get_ds());
+	set_fs(KERNEL_DS);
 	p = (__force const char __user *)buf;
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
@@ -521,7 +521,7 @@  ssize_t kernel_write(struct file *file, const void *buf, size_t count,
 	ssize_t res;
 
 	old_fs = get_fs();
-	set_fs(get_ds());
+	set_fs(KERNEL_DS);
 	/* The cast to a user pointer is valid due to the set_fs() */
 	res = vfs_write(file, (__force const char __user *)buf, count, pos);
 	set_fs(old_fs);
diff --git a/fs/splice.c b/fs/splice.c
index de2ede048473..ec8b54b5c0d2 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -357,7 +357,7 @@  static ssize_t kernel_readv(struct file *file, const struct kvec *vec,
 	ssize_t res;
 
 	old_fs = get_fs();
-	set_fs(get_ds());
+	set_fs(KERNEL_DS);
 	/* The cast to a user pointer is valid due to the set_fs() */
 	res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
 	set_fs(old_fs);