Message ID | 20171113092006.cjw2njjukt6limvb@dhcp22.suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Nov 13, 2017 at 10:20:06AM +0100, Michal Hocko wrote: > [Cc arm and ppc maintainers] > > Thanks a lot for testing! > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > Hi Joel, > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > [...] > > >> > There are a lot of messages on the way up that look like this: > > >> > > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > > > >> > And then trying to run userspace looks like this: > > >> > > >> Could you please run with debugging patch posted > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz > > > > > > Did you have chance to test with this debugging patch, please? > > > > Lots of this: > > > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > This smells like the problem I've expected that mmap with hint doesn't > respect the hint even though there is no clashing mapping. The above > basically says that we didn't map at 0xd9000 but it has placed it at > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > mapping. find_vma returns the closest vma (with addr < vm_end) for the > given address 0xd9000 so this address cannot be mapped by any other vma. > > Now that I am looking at arm's arch_get_unmapped_area it does perform > aligning for shared vmas. We do not do that for MAP_FIXED. Powepc, > reported earlier [1] seems to suffer from the similar problem. > slice_get_unmapped_area alignes to slices, whatever that means. > > I can see two possible ways around that. Either we explicitly request > non-aligned mappings via a special MAP_$FOO (e.g. MAP_FIXED_SAFE) or > simply opt out from the MAP_FIXED protection via ifdefs. The first > option sounds more generic to me but also more tricky to not introduce > other user visible effects. The later is quite straightforward. What do > you think about the following on top of the previous patch? > > It is rather terse and disables the MAP_FIXED protection for arm > comletely because I couldn't find a way to make it conditional on > CACHEID_VIPT_ALIASING. But this can be always handled later. I find the > protection for other archtectures useful enough to have this working for > most architectures now and handle others specially. Can someone provide the background information for this please?
On Mon 13-11-17 10:20:06, Michal Hocko wrote: > [Cc arm and ppc maintainers] > > Thanks a lot for testing! > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > Hi Joel, > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > [...] > > >> > There are a lot of messages on the way up that look like this: > > >> > > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > > > >> > And then trying to run userspace looks like this: > > >> > > >> Could you please run with debugging patch posted > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz > > > > > > Did you have chance to test with this debugging patch, please? > > > > Lots of this: > > > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > This smells like the problem I've expected that mmap with hint doesn't > respect the hint even though there is no clashing mapping. The above > basically says that we didn't map at 0xd9000 but it has placed it at > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > mapping. find_vma returns the closest vma (with addr < vm_end) for the > given address 0xd9000 so this address cannot be mapped by any other vma. > > Now that I am looking at arm's arch_get_unmapped_area it does perform > aligning for shared vmas. Sorry for confusion here. These are not shared mappings as pointed out by Russell in a private email. I got confused by the above flags which I have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags obviously so the bit 0 is VM_READ. Sorry about the confusion. The real reason we are doing the alignment is that we do a file mapping /* * We only need to do colour alignment if either the I or D * caches alias. */ if (aliasing) do_align = filp || (flags & MAP_SHARED); I am not really familiar with this architecture to understand why do we need aliasing for file mappings, though.
On Mon, Nov 13, 2017 at 03:11:40PM +0100, Michal Hocko wrote: > On Mon 13-11-17 10:20:06, Michal Hocko wrote: > > [Cc arm and ppc maintainers] > > > > Thanks a lot for testing! > > > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > > Hi Joel, > > > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > > [...] > > > >> > There are a lot of messages on the way up that look like this: > > > >> > > > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > > > >> > memory is mapped already > > > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > > > >> > memory is mapped already > > > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > > > >> > memory is mapped already > > > >> > > > > >> > And then trying to run userspace looks like this: > > > >> > > > >> Could you please run with debugging patch posted > > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz > > > > > > > > Did you have chance to test with this debugging patch, please? > > > > > > Lots of this: > > > > > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > > > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > > > This smells like the problem I've expected that mmap with hint doesn't > > respect the hint even though there is no clashing mapping. The above > > basically says that we didn't map at 0xd9000 but it has placed it at > > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > > mapping. find_vma returns the closest vma (with addr < vm_end) for the > > given address 0xd9000 so this address cannot be mapped by any other vma. > > > > Now that I am looking at arm's arch_get_unmapped_area it does perform > > aligning for shared vmas. > > Sorry for confusion here. These are not shared mappings as pointed out > by Russell in a private email. I got confused by the above flags which I > have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags > obviously so the bit 0 is VM_READ. Sorry about the confusion. The real > reason we are doing the alignment is that we do a file mapping > /* > * We only need to do colour alignment if either the I or D > * caches alias. > */ > if (aliasing) > do_align = filp || (flags & MAP_SHARED); > > I am not really familiar with this architecture to understand why do we > need aliasing for file mappings, though. I think it's there so that flush_dcache_page() works - possibly get_user_pages() being used on a private mapping of page cache pages, but that's guessing. I'm afraid I don't remember all the details, this is code from around 15 years ago, and I'd be very nervous about changing it now without fully understanding the issues.
On Mon 13-11-17 15:09:09, Russell King - ARM Linux wrote: > On Mon, Nov 13, 2017 at 03:11:40PM +0100, Michal Hocko wrote: > > On Mon 13-11-17 10:20:06, Michal Hocko wrote: > > > [Cc arm and ppc maintainers] > > > > > > Thanks a lot for testing! > > > > > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > > > Hi Joel, > > > > > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > > > [...] > > > > >> > There are a lot of messages on the way up that look like this: > > > > >> > > > > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > > > > >> > memory is mapped already > > > > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > > > > >> > memory is mapped already > > > > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > > > > >> > memory is mapped already > > > > >> > > > > > >> > And then trying to run userspace looks like this: > > > > >> > > > > >> Could you please run with debugging patch posted > > > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz > > > > > > > > > > Did you have chance to test with this debugging patch, please? > > > > > > > > Lots of this: > > > > > > > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > > > > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > > > > > This smells like the problem I've expected that mmap with hint doesn't > > > respect the hint even though there is no clashing mapping. The above > > > basically says that we didn't map at 0xd9000 but it has placed it at > > > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > > > mapping. find_vma returns the closest vma (with addr < vm_end) for the > > > given address 0xd9000 so this address cannot be mapped by any other vma. > > > > > > Now that I am looking at arm's arch_get_unmapped_area it does perform > > > aligning for shared vmas. > > > > Sorry for confusion here. These are not shared mappings as pointed out > > by Russell in a private email. I got confused by the above flags which I > > have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags > > obviously so the bit 0 is VM_READ. Sorry about the confusion. The real > > reason we are doing the alignment is that we do a file mapping > > /* > > * We only need to do colour alignment if either the I or D > > * caches alias. > > */ > > if (aliasing) > > do_align = filp || (flags & MAP_SHARED); > > > > I am not really familiar with this architecture to understand why do we > > need aliasing for file mappings, though. > > I think it's there so that flush_dcache_page() works - possibly > get_user_pages() being used on a private mapping of page cache pages, > but that's guessing. I fail to see how the mixure of MAP_FIXED and regular mapping of the same file work then, but as I've said I really do not understand this code. > I'm afraid I don't remember all the details, this is code from around > 15 years ago, and I'd be very nervous about changing it now without > fully understanding the issues. Ohh, absolutely! I didn't dare to touch this code and that's why I took the easy way and simply opt-out from the harding for all those archs that are basically sharing this pattern. But after a closer look it seems that we can really introduce MAP_FIXED_SAFE that would keep the arch mmap code intact yet we would get the hardening for all archs. It would allow also allow a safer MAP_FIXED semantic for userspace.
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 61a0cb15067e..018d041a30e6 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -99,6 +99,7 @@ config ARM select PERF_USE_VMALLOC select RTC_LIB select SYS_SUPPORTS_APM_EMULATION + select ARCH_ALIGNED_MMAPS # Above selects are sorted alphabetically; please add new ones # according to that. Thanks. help diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 2f629e0551e9..156f69c09c7f 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -368,6 +368,7 @@ config PPC_MM_SLICES bool default y if PPC_STD_MMU_64 default n + select ARCH_ALIGNED_MMAPS config PPC_HAVE_PMU_SUPPORT bool diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index a22718de42db..d23eb89f31c0 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -345,13 +345,19 @@ static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr, unsigned long size, int prot, int type, unsigned long off) { unsigned long map_addr; + unsigned long map_type = type; /* * If caller requests the mapping at a specific place, make sure we fail * rather than potentially clobber an existing mapping which can have - * security consequences (e.g. smash over the stack area). + * security consequences (e.g. smash over the stack area). Be careful + * about architectures which do not respect the address hint due to + * aligning restrictions for !fixed mappings. */ - map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off); + if (!IS_ENABLED(ARCH_ALIGNED_MMAPS)) + map_type &= ~MAP_FIXED; + + map_addr = vm_mmap(filep, addr, size, prot, map_type, off); if (BAD_ADDR(map_addr)) return map_addr;