Message ID | CAGXu5j+o6cf0PrnxD+W+09F7UY6JL+h7Bn5_4T2C0Z_fTZAx-w@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Nov 7, 2017 at 1:22 PM, Kees Cook <keescook@chromium.org> wrote: > > Linus, what do you have in mind for the root-only "yes we really need > the actual address output" exceptions? I am convinced that absolutely none of them should use '%pK'. So far we have actually never seen a valid case wher %pK was really the right thing to do. > For example, right now /sys/kernel/debug/kernel_page_tables > (CONFIG_X86_PTDUMP=y) needs actual address and currently uses %x. So I think it could continue to use %x, and just make sure the whole file is root-only. And that is why %pK is so wrong. It's almost never really about root. Look at /proc/kallsyms, for example. There it's mainly about kernel profiles (although there certainly have been other uses historically, and maybe some of them remain) - which we have another flag for entirely that is very much specifically about kernel profiles. > Looking other places that stand out, it seems like > /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of > %p usage. It's unclear to me if a hash is sufficient for meaningful > debugging there? Maybe not, but that is also _so_ esoteric that I suspect the right fix is to just make it root-only readable. I've never used it, we should check with people who have. I get the feeling that this is purely for PeterZ debugging. The very first commit that introduced that code actually has a (FIXME: should go into debugfs) so I suspect it never should have been user-readable to begin with. I guess it makes some things easier, but it really is *very* different from things like profiling. Profiling you often *cannot* do as root - some things you profile really shouldn't be run as root, and might even refuse to do so. So requiring you to be root just to get a kernel profile is very bad. But looking at lockdep stats? Yeah, 'sudo' isn't so big of a deal. And I really suspect that's true of a _lot_ of these %p things that really want a pointer. It's not that they really want %pK, it's that they shouldn't have been visible to regular users in the first place. Things that *do* want a pointer and should be visible to regular users are things like oops messages etc, but even there we obviously generally want to use %pS/%pF when possible (but generally %x when not - things like register contents etc that *may* contain pointers). And if they really are visible to users - because you want to cross-correlate them for things like netstat - I think the hashing is the right thing to do both for root and for regular users. > Seems like these three from dmesg could be removed? > > [ 0.000000] Base memory trampoline at [ffffa3fc40099000] 99000 size 24576 > arch/x86/realmode/init.c > > [ 0.000000] percpu: Embedded 38 pages/cpu @ffffa4007fc00000 s116944 > r8192 d30512 u524288 > mm/percpu.c > > [ 0.456395] software IO TLB [mem 0xbbfdf000-0xbffdf000] (64MB) > mapped at [ffffa3fcfbfdf000-ffffa3fcfffdefff] > lib/swiotlb.c Yes, I think the solution for a lot of the random device discovery messages etc is to just remove them. They were likely useful when the code was new and untested, and just stayed around afterwards. Linus
On Tue, Nov 7, 2017 at 1:44 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Nov 7, 2017 at 1:22 PM, Kees Cook <keescook@chromium.org> wrote: >> >> Linus, what do you have in mind for the root-only "yes we really need >> the actual address output" exceptions? > > I am convinced that absolutely none of them should use '%pK'. > > So far we have actually never seen a valid case wher %pK was really > the right thing to do. One case to keep in mind (likely for the future, once we stabilize this current plan), is to consider the situation of "untrusted root users", in the case of locked down systems where modules are disabled, etc, etc. In those cases, there's a brighter line between kernel memory and uid 0. (And when we do start looking at that class of info leaks, we'll be faced with a possible mess from %x-but-root-only usage.) >> For example, right now /sys/kernel/debug/kernel_page_tables >> (CONFIG_X86_PTDUMP=y) needs actual address and currently uses %x. > > So I think it could continue to use %x, and just make sure the whole > file is root-only. Yup, sounds good. Looks like the emerging rules for displaying virtual addresses are: - don't use %p - especially don't use %p in dmesg - really, go use %pS/%pF - if you absolutely need an address, show it with %x and have the file be root-only What should the guidance be for physical addresses? They're sensitive too (especially since they're reachable via the physmap), but we've traditionally been leaving them in dmesg, etc. >> Looking other places that stand out, it seems like >> /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of >> %p usage. It's unclear to me if a hash is sufficient for meaningful >> debugging there? > > Maybe not, but that is also _so_ esoteric that I suspect the right fix > is to just make it root-only readable. Anyone running a "real" system with LOCKDEP is out of their mind. :) > I've never used it, we should check with people who have. I get the > feeling that this is purely for PeterZ debugging. > > The very first commit that introduced that code actually has a > > (FIXME: should go into debugfs) Peter, Paul, what do you think about relocating these files? (/proc/lockdep* into debugfs) > so I suspect it never should have been user-readable to begin with. I > guess it makes some things easier, but it really is *very* different > from things like profiling. I didn't make that clear in the original report: I was running this as root just to see what even the root leaks looked like (it was surprisingly small, IMO). Those files are already root-only. >> Seems like these three from dmesg could be removed? >> >> [ 0.000000] Base memory trampoline at [ffffa3fc40099000] 99000 size 24576 >> arch/x86/realmode/init.c >> >> [ 0.000000] percpu: Embedded 38 pages/cpu @ffffa4007fc00000 s116944 >> r8192 d30512 u524288 >> mm/percpu.c >> >> [ 0.456395] software IO TLB [mem 0xbbfdf000-0xbffdf000] (64MB) >> mapped at [ffffa3fcfbfdf000-ffffa3fcfffdefff] >> lib/swiotlb.c > > Yes, I think the solution for a lot of the random device discovery > messages etc is to just remove them. They were likely useful when the > code was new and untested, and just stayed around afterwards. Yup, sounds right. (Though my physical address question from above remains: should these also drop their physical address reports?) -Kees
On Tue, 7 Nov 2017 13:44:01 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Looking other places that stand out, it seems like > > /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of > > %p usage. It's unclear to me if a hash is sufficient for meaningful > > debugging there? > > Maybe not, but that is also _so_ esoteric that I suspect the right fix > is to just make it root-only readable. Also note, I don't believe anyone should be running a LOCKDEP configured kernel in a production (secured) environment. As it adds quite a bit of overhead. It's something you run on test environments to make sure it doesn't detect any possible deadlocks. > > I've never used it, we should check with people who have. I get the > feeling that this is purely for PeterZ debugging. I've used it. But then again, I also debug lockdep ;-) > > The very first commit that introduced that code actually has a > > (FIXME: should go into debugfs) > > so I suspect it never should have been user-readable to begin with. I > guess it makes some things easier, but it really is *very* different > from things like profiling. Want me to whip up a patch to move the file? -- Steve > > Profiling you often *cannot* do as root - some things you profile > really shouldn't be run as root, and might even refuse to do so. So > requiring you to be root just to get a kernel profile is very bad. > > But looking at lockdep stats? Yeah, 'sudo' isn't so big of a deal. > >
Hi Kees, It seems I over looked your suggestions when submitting v4. My mistake. On Tue, Nov 07, 2017 at 01:22:13PM -0800, Kees Cook wrote: > On Mon, Nov 6, 2017 at 9:27 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > On Sun, Nov 5, 2017 at 9:19 PM, Tobin C. Harding <me@tobin.cc> wrote: > >> Currently we are leaking addresses from the kernel to user space. This > >> script is an attempt to find some of those leakages. Script parses > >> `dmesg` output and /proc and /sys files for hex strings that look like > >> kernel addresses. > > > > Lovely. This is great. It shows just how much totally pointless stuff > > we leak, and to normal users that really shouldn't need it. > > > > I had planned to wait for 4.15 to look at the printk hashing stuff > > etc, but this part I think I could/should merge early just because I > > think a lot of kernel developers will go "Why the f*ck would we expose > > that kernel address there?" > > > > The module sections stuff etc should likely be obviously root-only, > > although maybe I'm missing some tool that ends up using it and is > > useful to normal developers. > > > > And I'm thinking we could make kallsyms smarter too, and instead of > > depending on kptr_restrict that screws over things with much too big a > > hammer, we could make it take 'perf_event_paranoid' into account. I > > suspect that's the main user of kallsyms that would still be relevant > > to non-root. > > Linus, what do you have in mind for the root-only "yes we really need > the actual address output" exceptions? > > For example, right now /sys/kernel/debug/kernel_page_tables > (CONFIG_X86_PTDUMP=y) needs actual address and currently uses %x. > > Looking other places that stand out, it seems like > /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of > %p usage. It's unclear to me if a hash is sufficient for meaningful > debugging there? > > Seems like these three from dmesg could be removed? > > [ 0.000000] Base memory trampoline at [ffffa3fc40099000] 99000 size 24576 > arch/x86/realmode/init.c > > [ 0.000000] percpu: Embedded 38 pages/cpu @ffffa4007fc00000 s116944 > r8192 d30512 u524288 > mm/percpu.c > > [ 0.456395] software IO TLB [mem 0xbbfdf000-0xbffdf000] (64MB) > mapped at [ffffa3fcfbfdf000-ffffa3fcfffdefff] > lib/swiotlb.c > > Tobin, some other feedback on v4... > > I find the output hard to parse. Instead of: > > [27527 lockdep_chains] [ffffffffb226c628] cgroup_mutex > > Could we have: > > 27527 /proc/lockdep_chains: [ffffffffb226c628] cgroup_mutex This is what I had during development, it becomes had to parse when the message contains ':' and also if the address is not contained in braces (I'm assuming '[ffffffffb226c628] cgroup_mutex' is the message). We could use your suggested format but replace the ':' character? > At the very least, getting the full file path is needed or might not > be clear where something lives. Current dev version includes this. > And for my kernels, I needed to exclude usbmon or the script would > hang (perhaps add a read timeout to the script to detect stalling > files?) Good idea, I'll add this. > diff --git a/scripts/leaking_addresses.pl b/scripts/leaking_addresses.pl > index 282c0cc2bdea..a9b729c0a052 100644 > --- a/scripts/leaking_addresses.pl > +++ b/scripts/leaking_addresses.pl > @@ -70,6 +70,7 @@ my @skip_walk_dirs_any = ('self', > 'thread-self', > 'cwd', > 'fd', > + 'usbmon', > 'stderr', > 'stdin', > 'stdout'); > Added this. thanks. Again, sorry for missing this before v4. Tobin
On Tue, Nov 07, 2017 at 01:44:01PM -0800, Linus Torvalds wrote: > On Tue, Nov 7, 2017 at 1:22 PM, Kees Cook <keescook@chromium.org> wrote: > > > > Linus, what do you have in mind for the root-only "yes we really need > > the actual address output" exceptions? > > I am convinced that absolutely none of them should use '%pK'. > > So far we have actually never seen a valid case wher %pK was really > the right thing to do. > > > For example, right now /sys/kernel/debug/kernel_page_tables > > (CONFIG_X86_PTDUMP=y) needs actual address and currently uses %x. > > So I think it could continue to use %x, and just make sure the whole > file is root-only. > > And that is why %pK is so wrong. It's almost never really about root. > > Look at /proc/kallsyms, for example. There it's mainly about kernel > profiles (although there certainly have been other uses historically, > and maybe some of them remain) - which we have another flag for > entirely that is very much specifically about kernel profiles. > > > Looking other places that stand out, it seems like > > /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of > > %p usage. It's unclear to me if a hash is sufficient for meaningful > > debugging there? > > Maybe not, but that is also _so_ esoteric that I suspect the right fix > is to just make it root-only readable. > > I've never used it, we should check with people who have. I get the > feeling that this is purely for PeterZ debugging. > > The very first commit that introduced that code actually has a > > (FIXME: should go into debugfs) > > so I suspect it never should have been user-readable to begin with. I > guess it makes some things easier, but it really is *very* different > from things like profiling. > > Profiling you often *cannot* do as root - some things you profile > really shouldn't be run as root, and might even refuse to do so. So > requiring you to be root just to get a kernel profile is very bad. > > But looking at lockdep stats? Yeah, 'sudo' isn't so big of a deal. > > And I really suspect that's true of a _lot_ of these %p things that > really want a pointer. It's not that they really want %pK, it's that > they shouldn't have been visible to regular users in the first place. > > Things that *do* want a pointer and should be visible to regular users > are things like oops messages etc, but even there we obviously > generally want to use %pS/%pF when possible (but generally %x when not > - things like register contents etc that *may* contain pointers). This is opt-in, it means asking developers to do the right thing every time they think they need a pointer. If we hash %p as soon as someone has been bitten by it they will start using %x and sooner or later they will use %x exclusively (suggesting using %x for _really_ necessary addresses will only hasten the process). If the only real benefit of hashing %p is to clean up kruft why don't we add %pX and do tree wide substitution of all %p to %pX. People that get broken will change it back to %p and we will have half a chance of finding new abusers of %p down the track. If there is not a 'one size fits all' solution, as we have seen with kptr_restrict, then should we not be trying to make it easier for people to do the right thing _and_ easier for us to catch it when we do the wrong thing? There is already almost 30 000 users of %{x,X}, surely we don't want to promote more kernel address printing to be mixed in with all of that? > And if they really are visible to users - because you want to > cross-correlate them for things like netstat - I think the hashing is > the right thing to do both for root and for regular users. > > > Seems like these three from dmesg could be removed? > > > > [ 0.000000] Base memory trampoline at [ffffa3fc40099000] 99000 size 24576 > > arch/x86/realmode/init.c > > > > [ 0.000000] percpu: Embedded 38 pages/cpu @ffffa4007fc00000 s116944 > > r8192 d30512 u524288 > > mm/percpu.c > > > > [ 0.456395] software IO TLB [mem 0xbbfdf000-0xbffdf000] (64MB) > > mapped at [ffffa3fcfbfdf000-ffffa3fcfffdefff] > > lib/swiotlb.c > > Yes, I think the solution for a lot of the random device discovery > messages etc is to just remove them. They were likely useful when the > code was new and untested, and just stayed around afterwards. Is anyone actually going to do this? If the hashing gets done then these messages are not a risk, are _real_ kernel developers going to bother cleaning this up? Surely newbies are not going to get much love either if they submit patches that '[PATCH] remove unnecessary printk message'. thanks, Tobin.
On Tue, Nov 07, 2017 at 01:22:13PM -0800, Kees Cook wrote: > On Mon, Nov 6, 2017 at 9:27 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: [snip] > And for my kernels, I needed to exclude usbmon or the script would > hang (perhaps add a read timeout to the script to detect stalling > files?) Bother. I submitted the patch set without adding this suggestion. Even after apologizing for missing it. Better context switching required. thanks, Tobin.
On Tue, Nov 07, 2017 at 05:44:13PM -0500, Steven Rostedt wrote: > On Tue, 7 Nov 2017 13:44:01 -0800 > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > Looking other places that stand out, it seems like > > > /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of > > > %p usage. It's unclear to me if a hash is sufficient for meaningful > > > debugging there? > > > > Maybe not, but that is also _so_ esoteric that I suspect the right fix > > is to just make it root-only readable. > > Also note, I don't believe anyone should be running a LOCKDEP > configured kernel in a production (secured) environment. As it adds > quite a bit of overhead. It's something you run on test environments to > make sure it doesn't detect any possible deadlocks. > > > > > I've never used it, we should check with people who have. I get the > > feeling that this is purely for PeterZ debugging. > > I've used it. But then again, I also debug lockdep ;-) > > > > > The very first commit that introduced that code actually has a > > > > (FIXME: should go into debugfs) > > > > so I suspect it never should have been user-readable to begin with. I > > guess it makes some things easier, but it really is *very* different > > from things like profiling. > > Want me to whip up a patch to move the file? Fine by me; create /debug/lockdep/ for the 3 files or something like that. As to the actual addresses, they can be used to double check things are in fact the same object (in case of name collisions), are in static memory (as these things ought to be) etc.. But mostly they're not too important. And yes, as everybody says, LOCKDEP is debug tool; you run this on your (local) dev kernels, anything else it out of spec.
diff --git a/scripts/leaking_addresses.pl b/scripts/leaking_addresses.pl index 282c0cc2bdea..a9b729c0a052 100644 --- a/scripts/leaking_addresses.pl +++ b/scripts/leaking_addresses.pl @@ -70,6 +70,7 @@ my @skip_walk_dirs_any = ('self', 'thread-self', 'cwd', 'fd', + 'usbmon', 'stderr', 'stdin', 'stdout');