diff mbox

linux-next: Tree for Nov 7

Message ID 20171113092006.cjw2njjukt6limvb@dhcp22.suse.cz (mailing list archive)
State New, archived
Headers show

Commit Message

Michal Hocko Nov. 13, 2017, 9:20 a.m. UTC
[Cc arm and ppc maintainers]

Thanks a lot for testing!

On Sun 12-11-17 11:38:02, Joel Stanley wrote:
> On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > Hi Joel,
> >
> > On Wed 08-11-17 15:20:50, Michal Hocko wrote:
> > [...]
> >> > There are a lot of messages on the way up that look like this:
> >> >
> >> > [    2.527460] Uhuuh, elf segement at 000d9000 requested but the
> >> > memory is mapped already
> >> > [    2.540160] Uhuuh, elf segement at 000d9000 requested but the
> >> > memory is mapped already
> >> > [    2.546153] Uhuuh, elf segement at 000d9000 requested but the
> >> > memory is mapped already
> >> >
> >> > And then trying to run userspace looks like this:
> >>
> >> Could you please run with debugging patch posted
> >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz
> >
> > Did you have chance to test with this debugging patch, please?
> 
> Lots of this:
> 
> [    1.177266] Uhuuh, elf segement at 000d9000 requested but the  memory is mapped already, got 000dd000
> [    1.177555] Clashing vma [dd000, de000] flags:100873 name:(null)

This smells like the problem I've expected that mmap with hint doesn't
respect the hint even though there is no clashing mapping. The above
basically says that we didn't map at 0xd9000 but it has placed it at
0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new
mapping. find_vma returns the closest vma (with addr < vm_end) for the
given address 0xd9000 so this address cannot be mapped by any other vma.

Now that I am looking at arm's arch_get_unmapped_area it does perform
aligning for shared vmas. We do not do that for MAP_FIXED.  Powepc,
reported earlier [1] seems to suffer from the similar problem.
slice_get_unmapped_area alignes to slices, whatever that means.

I can see two possible ways around that. Either we explicitly request
non-aligned mappings via a special MAP_$FOO (e.g. MAP_FIXED_SAFE) or
simply opt out from the MAP_FIXED protection via ifdefs. The first
option sounds more generic to me but also more tricky to not introduce
other user visible effects. The later is quite straightforward. What do
you think about the following on top of the previous patch?

It is rather terse and disables the MAP_FIXED protection for arm
comletely because I couldn't find a way to make it conditional on
CACHEID_VIPT_ALIASING. But this can be always handled later. I find the
protection for other archtectures useful enough to have this working for
most architectures now and handle others specially.

[1] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
---

Comments

Russell King (Oracle) Nov. 13, 2017, 9:34 a.m. UTC | #1
On Mon, Nov 13, 2017 at 10:20:06AM +0100, Michal Hocko wrote:
> [Cc arm and ppc maintainers]
> 
> Thanks a lot for testing!
> 
> On Sun 12-11-17 11:38:02, Joel Stanley wrote:
> > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > > Hi Joel,
> > >
> > > On Wed 08-11-17 15:20:50, Michal Hocko wrote:
> > > [...]
> > >> > There are a lot of messages on the way up that look like this:
> > >> >
> > >> > [    2.527460] Uhuuh, elf segement at 000d9000 requested but the
> > >> > memory is mapped already
> > >> > [    2.540160] Uhuuh, elf segement at 000d9000 requested but the
> > >> > memory is mapped already
> > >> > [    2.546153] Uhuuh, elf segement at 000d9000 requested but the
> > >> > memory is mapped already
> > >> >
> > >> > And then trying to run userspace looks like this:
> > >>
> > >> Could you please run with debugging patch posted
> > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz
> > >
> > > Did you have chance to test with this debugging patch, please?
> > 
> > Lots of this:
> > 
> > [    1.177266] Uhuuh, elf segement at 000d9000 requested but the  memory is mapped already, got 000dd000
> > [    1.177555] Clashing vma [dd000, de000] flags:100873 name:(null)
> 
> This smells like the problem I've expected that mmap with hint doesn't
> respect the hint even though there is no clashing mapping. The above
> basically says that we didn't map at 0xd9000 but it has placed it at
> 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new
> mapping. find_vma returns the closest vma (with addr < vm_end) for the
> given address 0xd9000 so this address cannot be mapped by any other vma.
> 
> Now that I am looking at arm's arch_get_unmapped_area it does perform
> aligning for shared vmas. We do not do that for MAP_FIXED.  Powepc,
> reported earlier [1] seems to suffer from the similar problem.
> slice_get_unmapped_area alignes to slices, whatever that means.
> 
> I can see two possible ways around that. Either we explicitly request
> non-aligned mappings via a special MAP_$FOO (e.g. MAP_FIXED_SAFE) or
> simply opt out from the MAP_FIXED protection via ifdefs. The first
> option sounds more generic to me but also more tricky to not introduce
> other user visible effects. The later is quite straightforward. What do
> you think about the following on top of the previous patch?
> 
> It is rather terse and disables the MAP_FIXED protection for arm
> comletely because I couldn't find a way to make it conditional on
> CACHEID_VIPT_ALIASING. But this can be always handled later. I find the
> protection for other archtectures useful enough to have this working for
> most architectures now and handle others specially.

Can someone provide the background information for this please?
Michal Hocko Nov. 13, 2017, 2:11 p.m. UTC | #2
On Mon 13-11-17 10:20:06, Michal Hocko wrote:
> [Cc arm and ppc maintainers]
> 
> Thanks a lot for testing!
> 
> On Sun 12-11-17 11:38:02, Joel Stanley wrote:
> > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > > Hi Joel,
> > >
> > > On Wed 08-11-17 15:20:50, Michal Hocko wrote:
> > > [...]
> > >> > There are a lot of messages on the way up that look like this:
> > >> >
> > >> > [    2.527460] Uhuuh, elf segement at 000d9000 requested but the
> > >> > memory is mapped already
> > >> > [    2.540160] Uhuuh, elf segement at 000d9000 requested but the
> > >> > memory is mapped already
> > >> > [    2.546153] Uhuuh, elf segement at 000d9000 requested but the
> > >> > memory is mapped already
> > >> >
> > >> > And then trying to run userspace looks like this:
> > >>
> > >> Could you please run with debugging patch posted
> > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz
> > >
> > > Did you have chance to test with this debugging patch, please?
> > 
> > Lots of this:
> > 
> > [    1.177266] Uhuuh, elf segement at 000d9000 requested but the  memory is mapped already, got 000dd000
> > [    1.177555] Clashing vma [dd000, de000] flags:100873 name:(null)
> 
> This smells like the problem I've expected that mmap with hint doesn't
> respect the hint even though there is no clashing mapping. The above
> basically says that we didn't map at 0xd9000 but it has placed it at
> 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new
> mapping. find_vma returns the closest vma (with addr < vm_end) for the
> given address 0xd9000 so this address cannot be mapped by any other vma.
> 
> Now that I am looking at arm's arch_get_unmapped_area it does perform
> aligning for shared vmas.

Sorry for confusion here. These are not shared mappings as pointed out
by Russell in a private email. I got confused by the above flags which I
have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags
obviously so the bit 0 is VM_READ. Sorry about the confusion. The real
reason we are doing the alignment is that we do a file mapping
	/*
	 * We only need to do colour alignment if either the I or D
	 * caches alias.
	 */
	if (aliasing)
		do_align = filp || (flags & MAP_SHARED);

I am not really familiar with this architecture to understand why do we
need aliasing for file mappings, though.
Russell King (Oracle) Nov. 13, 2017, 3:09 p.m. UTC | #3
On Mon, Nov 13, 2017 at 03:11:40PM +0100, Michal Hocko wrote:
> On Mon 13-11-17 10:20:06, Michal Hocko wrote:
> > [Cc arm and ppc maintainers]
> > 
> > Thanks a lot for testing!
> > 
> > On Sun 12-11-17 11:38:02, Joel Stanley wrote:
> > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > > > Hi Joel,
> > > >
> > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote:
> > > > [...]
> > > >> > There are a lot of messages on the way up that look like this:
> > > >> >
> > > >> > [    2.527460] Uhuuh, elf segement at 000d9000 requested but the
> > > >> > memory is mapped already
> > > >> > [    2.540160] Uhuuh, elf segement at 000d9000 requested but the
> > > >> > memory is mapped already
> > > >> > [    2.546153] Uhuuh, elf segement at 000d9000 requested but the
> > > >> > memory is mapped already
> > > >> >
> > > >> > And then trying to run userspace looks like this:
> > > >>
> > > >> Could you please run with debugging patch posted
> > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz
> > > >
> > > > Did you have chance to test with this debugging patch, please?
> > > 
> > > Lots of this:
> > > 
> > > [    1.177266] Uhuuh, elf segement at 000d9000 requested but the  memory is mapped already, got 000dd000
> > > [    1.177555] Clashing vma [dd000, de000] flags:100873 name:(null)
> > 
> > This smells like the problem I've expected that mmap with hint doesn't
> > respect the hint even though there is no clashing mapping. The above
> > basically says that we didn't map at 0xd9000 but it has placed it at
> > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new
> > mapping. find_vma returns the closest vma (with addr < vm_end) for the
> > given address 0xd9000 so this address cannot be mapped by any other vma.
> > 
> > Now that I am looking at arm's arch_get_unmapped_area it does perform
> > aligning for shared vmas.
> 
> Sorry for confusion here. These are not shared mappings as pointed out
> by Russell in a private email. I got confused by the above flags which I
> have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags
> obviously so the bit 0 is VM_READ. Sorry about the confusion. The real
> reason we are doing the alignment is that we do a file mapping
> 	/*
> 	 * We only need to do colour alignment if either the I or D
> 	 * caches alias.
> 	 */
> 	if (aliasing)
> 		do_align = filp || (flags & MAP_SHARED);
> 
> I am not really familiar with this architecture to understand why do we
> need aliasing for file mappings, though.

I think it's there so that flush_dcache_page() works - possibly
get_user_pages() being used on a private mapping of page cache pages,
but that's guessing.

I'm afraid I don't remember all the details, this is code from around
15 years ago, and I'd be very nervous about changing it now without
fully understanding the issues.
Michal Hocko Nov. 13, 2017, 3:31 p.m. UTC | #4
On Mon 13-11-17 15:09:09, Russell King - ARM Linux wrote:
> On Mon, Nov 13, 2017 at 03:11:40PM +0100, Michal Hocko wrote:
> > On Mon 13-11-17 10:20:06, Michal Hocko wrote:
> > > [Cc arm and ppc maintainers]
> > > 
> > > Thanks a lot for testing!
> > > 
> > > On Sun 12-11-17 11:38:02, Joel Stanley wrote:
> > > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > > > > Hi Joel,
> > > > >
> > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote:
> > > > > [...]
> > > > >> > There are a lot of messages on the way up that look like this:
> > > > >> >
> > > > >> > [    2.527460] Uhuuh, elf segement at 000d9000 requested but the
> > > > >> > memory is mapped already
> > > > >> > [    2.540160] Uhuuh, elf segement at 000d9000 requested but the
> > > > >> > memory is mapped already
> > > > >> > [    2.546153] Uhuuh, elf segement at 000d9000 requested but the
> > > > >> > memory is mapped already
> > > > >> >
> > > > >> > And then trying to run userspace looks like this:
> > > > >>
> > > > >> Could you please run with debugging patch posted
> > > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz
> > > > >
> > > > > Did you have chance to test with this debugging patch, please?
> > > > 
> > > > Lots of this:
> > > > 
> > > > [    1.177266] Uhuuh, elf segement at 000d9000 requested but the  memory is mapped already, got 000dd000
> > > > [    1.177555] Clashing vma [dd000, de000] flags:100873 name:(null)
> > > 
> > > This smells like the problem I've expected that mmap with hint doesn't
> > > respect the hint even though there is no clashing mapping. The above
> > > basically says that we didn't map at 0xd9000 but it has placed it at
> > > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new
> > > mapping. find_vma returns the closest vma (with addr < vm_end) for the
> > > given address 0xd9000 so this address cannot be mapped by any other vma.
> > > 
> > > Now that I am looking at arm's arch_get_unmapped_area it does perform
> > > aligning for shared vmas.
> > 
> > Sorry for confusion here. These are not shared mappings as pointed out
> > by Russell in a private email. I got confused by the above flags which I
> > have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags
> > obviously so the bit 0 is VM_READ. Sorry about the confusion. The real
> > reason we are doing the alignment is that we do a file mapping
> > 	/*
> > 	 * We only need to do colour alignment if either the I or D
> > 	 * caches alias.
> > 	 */
> > 	if (aliasing)
> > 		do_align = filp || (flags & MAP_SHARED);
> > 
> > I am not really familiar with this architecture to understand why do we
> > need aliasing for file mappings, though.
> 
> I think it's there so that flush_dcache_page() works - possibly
> get_user_pages() being used on a private mapping of page cache pages,
> but that's guessing.

I fail to see how the mixure of MAP_FIXED and regular mapping of the
same file work then, but as I've said I really do not understand this
code.

> I'm afraid I don't remember all the details, this is code from around
> 15 years ago, and I'd be very nervous about changing it now without
> fully understanding the issues.

Ohh, absolutely! I didn't dare to touch this code and that's why I took
the easy way and simply opt-out from the harding for all those archs
that are basically sharing this pattern. But after a closer look it
seems that we can really introduce MAP_FIXED_SAFE that would keep the
arch mmap code intact yet we would get the hardening for all archs.
It would allow also allow a safer MAP_FIXED semantic for userspace.
diff mbox

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 61a0cb15067e..018d041a30e6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -99,6 +99,7 @@  config ARM
 	select PERF_USE_VMALLOC
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
+	select ARCH_ALIGNED_MMAPS
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
 	help
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 2f629e0551e9..156f69c09c7f 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -368,6 +368,7 @@  config PPC_MM_SLICES
 	bool
 	default y if PPC_STD_MMU_64
 	default n
+	select ARCH_ALIGNED_MMAPS
 
 config PPC_HAVE_PMU_SUPPORT
        bool
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a22718de42db..d23eb89f31c0 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -345,13 +345,19 @@  static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
 		unsigned long size, int prot, int type, unsigned long off)
 {
 	unsigned long map_addr;
+	unsigned long map_type = type;
 
 	/*
 	 * If caller requests the mapping at a specific place, make sure we fail
 	 * rather than potentially clobber an existing mapping which can have
-	 * security consequences (e.g. smash over the stack area).
+	 * security consequences (e.g. smash over the stack area). Be careful
+	 * about architectures which do not respect the address hint due to
+	 * aligning restrictions for !fixed mappings.
 	 */
-	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
+	if (!IS_ENABLED(ARCH_ALIGNED_MMAPS))
+		map_type &= ~MAP_FIXED;
+
+	map_addr = vm_mmap(filep, addr, size, prot, map_type, off);
 	if (BAD_ADDR(map_addr))
 		return map_addr;