Message ID | alpine.LRH.2.02.1401041241590.4648@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive) |
---|---|
State | Awaiting Upstream, archived |
On 4-Jan-14, at 12:45 PM, Mikulas Patocka wrote:

> * flush_dcache_page asks for the list of userspace mappings, however that
>   page->mapping field is reused by the slab subsystem for a different
>   purpose. This causes the crash.

I'd noticed the other day that the parisc implementation of
flush_dcache_page() should return if "!mapping || mapping != page->mapping"
is true. This would have avoided crash.

Dave
--
John David Anglin	dave.anglin@bell.net

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Sat, 4 Jan 2014, John David Anglin wrote:

> On 4-Jan-14, at 12:45 PM, Mikulas Patocka wrote:
>
> > * flush_dcache_page asks for the list of userspace mappings, however that
> >   page->mapping field is reused by the slab subsystem for a different
> >   purpose. This causes the crash.
>
> I'd noticed the other day that the parisc implementation of
> flush_dcache_page() should return if "!mapping || mapping != page->mapping"
> is true. This would have avoided crash.
>
> Dave

I think no.

page_mapping returns NULL if the page has only anonymous mapping and it is
not placed in the swap cache. In this case, you need to flush the kernel
cache.

Maybe you could skip cache flush if the page is neither anonymous nor
file-backed, but I haven't seen this condition in other architectures'
flush_dcache_page.

Mikulas

On 4-Jan-14, at 2:55 PM, Mikulas Patocka wrote:

> On Sat, 4 Jan 2014, John David Anglin wrote:
>
>> On 4-Jan-14, at 12:45 PM, Mikulas Patocka wrote:
>>
>>> * flush_dcache_page asks for the list of userspace mappings, however that
>>>   page->mapping field is reused by the slab subsystem for a different
>>>   purpose. This causes the crash.
>>
>> I'd noticed the other day that the parisc implementation of
>> flush_dcache_page() should return if "!mapping || mapping != page->mapping"
>> is true. This would have avoided crash.
>>
>> Dave
>
> I think no.
>
> page_mapping returns NULL if the page has only anonymous mapping and it is
> not placed in the swap cache. In this case, you need to flush the kernel
> cache.

The suggestion is to add the "mapping != page->mapping" to the current NULL
check. It occurs after the kernel cache flush.

It doesn't seem right to flush the vma mappings associated with swap address
space and that appears to be happening with current code.

Dave
--
John David Anglin	dave.anglin@bell.net

On Sat, 4 Jan 2014, John David Anglin wrote:

> On 4-Jan-14, at 2:55 PM, Mikulas Patocka wrote:
>
> > On Sat, 4 Jan 2014, John David Anglin wrote:
> >
> > > On 4-Jan-14, at 12:45 PM, Mikulas Patocka wrote:
> > >
> > > > * flush_dcache_page asks for the list of userspace mappings, however
> > > >   that page->mapping field is reused by the slab subsystem for a
> > > >   different purpose. This causes the crash.
> > >
> > > I'd noticed the other day that the parisc implementation of
> > > flush_dcache_page() should return if
> > > "!mapping || mapping != page->mapping" is true. This would have
> > > avoided crash.
> > >
> > > Dave
> >
> > I think no.
> >
> > page_mapping returns NULL if the page has only anonymous mapping and it is
> > not placed in the swap cache. In this case, you need to flush the kernel
> > cache.
>
> The suggestion is to add the "mapping != page->mapping" to the current NULL
> check. It occurs after the kernel cache flush.

"if (!mapping || mapping != page->mapping) return;" returns if the mapping
is NULL (and that is wrong because the variable mapping is NULL for
anonymous pages).

You could probably return "if (!mapping && !PageAnon(page))", but the other
architectures aren't doing it.

> It doesn't seem right to flush the vma mappings associated with swap address
> space and that appears to be happening with current code.
>
> Dave
> --
> John David Anglin	dave.anglin@bell.net

I suppose that "vma_interval_tree_foreach" is empty operation for swap
address space. Or isn't it?

Mikulas

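The two early-return conditions debated in this exchange can be compared in a small userspace model. This is only a sketch: the struct layout, `model_page_mapping` and both predicate names are hypothetical stand-ins chosen for illustration, not the kernel's definitions, and the model ignores the swap cache entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel type; not the real definition. */
struct address_space { int id; };

struct page {
    void *mapping; /* raw page->mapping field (file mapping or anon data) */
    bool anon;     /* models PageAnon(page) */
};

/* Models page_mapping(): an anonymous page outside the swap cache gives NULL. */
static struct address_space *model_page_mapping(const struct page *p)
{
    return p->anon ? NULL : (struct address_space *)p->mapping;
}

/* The quoted test "!mapping || mapping != page->mapping", used as an early return. */
static bool skip_flush_quoted(const struct page *p)
{
    struct address_space *mapping = model_page_mapping(p);
    return !mapping || (void *)mapping != p->mapping;
}

/* The alternative "!mapping && !PageAnon(page)" mentioned in the reply. */
static bool skip_flush_alternative(const struct page *p)
{
    return !model_page_mapping(p) && !p->anon;
}
```

In this model both predicates agree for a file-backed page (no skip) and for a page that is neither file-backed nor anonymous (skip). They differ only for an anonymous page, where the quoted test returns early and the alternative does not.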
On Sat, Jan 04, 2014 at 12:45:45PM -0500, Mikulas Patocka wrote:
> The patch 8456a648cf44f14365f1f44de90a3da2526a4776 causes crash in the
> LVM2 testsuite on PA-RISC (the crashing test is fsadm.sh). The testsuite
> doesn't crash on 3.12, crashes on 3.13-rc1 and later.
>
>  Bad Address (null pointer deref?): Code=15 regs=000000413edd89a0 (Addr=000006202224647d)
>  CPU: 3 PID: 24008 Comm: loop0 Not tainted 3.13.0-rc6 #5
>  task: 00000001bf3c0048 ti: 000000413edd8000 task.ti: 000000413edd8000
>
>       YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
>  PSW: 00001000000001101111100100001110 Not tainted
>  r00-03  000000ff0806f90e 00000000405c8de0 000000004013e6c0 000000413edd83f0
>  r04-07  00000000405a95e0 0000000000000200 00000001414735f0 00000001bf349e40
>  r08-11  0000000010fe3d10 0000000000000001 00000040829c7778 000000413efd9000
>  r12-15  0000000000000000 000000004060d800 0000000010fe3000 0000000010fe3000
>  r16-19  000000413edd82a0 00000041078ddbc0 0000000000000010 0000000000000001
>  r20-23  0008f3d0d83a8000 0000000000000000 00000040829c7778 0000000000000080
>  r24-27  00000001bf349e40 00000001bf349e40 202d66202224640d 00000000405a95e0
>  r28-31  202d662022246465 000000413edd88f0 000000413edd89a0 0000000000000001
>  sr00-03 000000000532c000 0000000000000000 0000000000000000 000000000532c000
>  sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>
>  IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401fe42c 00000000401fe430
>   IIR: 539c0030    ISR: 00000000202d6000  IOR: 000006202224647d
>   CPU:        3   CR30: 000000413edd8000 CR31: 0000000000000000
>   ORIG_R28: 00000000405a95e0
>   IAOQ[0]: vma_interval_tree_iter_first+0x14/0x48
>   IAOQ[1]: vma_interval_tree_iter_first+0x18/0x48
>   RP(r2): flush_dcache_page+0x128/0x388
>  Backtrace:
>   [<000000004013e6c0>] flush_dcache_page+0x128/0x388
>   [<0000000010fe6ca0>] lo_splice_actor+0x90/0x148 [loop]
>   [<00000000402579b0>] splice_from_pipe_feed+0xc0/0x1d0
>   [<00000000402580a4>] __splice_from_pipe+0xac/0xc0
>   [<0000000010fe6bbc>] lo_direct_splice_actor+0x1c/0x70 [loop]
>   [<000000004025854c>] splice_direct_to_actor+0xec/0x228
>   [<0000000010fe63ac>] lo_receive+0xe4/0x298 [loop]
>   [<0000000010fe69d8>] loop_thread+0x478/0x640 [loop]
>   [<000000004018975c>] kthread+0x134/0x168
>   [<000000004012c020>] end_fault_vector+0x20/0x28
>   [<00000000115e0098>] xfs_setsize_buftarg+0x0/0x90 [xfs]
>
>  Kernel panic - not syncing: Bad Address (null pointer deref?)
>
> The patch 8456a648cf44f14365f1f44de90a3da2526a4776 changes the page
> structure so that the slab subsystem reuses the page->mapping field.
>
> The crash happens in the following way:
> * XFS allocates some memory from slab and issues a bio to read data into
>   it.
> * the bio is sent to the loopback device.
> * lo_receive creates an actor and calls splice_direct_to_actor.
> * lo_splice_actor copies data to the target page.
> * lo_splice_actor calls flush_dcache_page because the page may be mapped
>   by userspace. In that case we need to flush the kernel cache.
> * flush_dcache_page asks for the list of userspace mappings, however that
>   page->mapping field is reused by the slab subsystem for a different
>   purpose. This causes the crash.
>
> Note that other architectures without coherent caches (sparc, arm, mips)
> also call page_mapping from flush_dcache_page, so they may crash in the
> same way.
>
> This patch fixes this bug by testing if the page is a slab page in
> page_mapping and returning NULL if it is.
>
>
> The patch also fixes VM_BUG_ON(PageSlab(page)) that could happen in
> earlier kernels in the same scenario on architectures without cache
> coherence when CONFIG_DEBUG_VM is enabled - so it should be backported to
> stable kernels.
>
>
> In the old kernels, the function page_mapping is placed in
> include/linux/mm.h, so you should modify the patch accordingly when
> backporting it.
>
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
>
> ---
>  mm/util.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> Index: linux-3.13-rc6/mm/util.c
> ===================================================================
> --- linux-3.13-rc6.orig/mm/util.c	2014-01-04 00:06:07.000000000 +0100
> +++ linux-3.13-rc6/mm/util.c	2014-01-04 00:24:42.000000000 +0100
> @@ -390,7 +390,10 @@ struct address_space *page_mapping(struc
>  {
>  	struct address_space *mapping = page->mapping;
>
> -	VM_BUG_ON(PageSlab(page));
> +	/* This happens if someone calls flush_dcache_page on slab page */
> +	if (unlikely(PageSlab(page)))
> +		return NULL;
> +
>  	if (unlikely(PageSwapCache(page))) {
>  		swp_entry_t entry;
>
> --

Hello,

I'm surprised that this VM_BUG_ON() has not been triggered until now. It was
introduced in 2007 by commit (b5fab14). Maybe there is no person who test
with CONFIG_DEBUG_VM.

There is one more bug report same as this.
* possible regression on 3.13 when calling flush_dcache_page
  (lkml.org/lkml/2013/12/12/255)

As mentioned in the description of commit (b5fab14), slab object may not be
properly aligned and use of page oriented function to this object can be
dangerous. I searched the XFS code and found that they only try to allocate
multiple of 512 bytes, so there is no problem for now. But, IMHO, it is better
not to use slab objects for this purpose.

And I rapidly searched every callsites of page_mapping() and, IMHO, this
patch would work correctly. But possibly reverting original commit is
better solution.

Hello, Pekka and Christoph.
Could you teach me which direction we have to go?

Thanks.

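The control flow that the patch gives page_mapping() can be sketched as a userspace model. This is an illustration under assumptions, not the kernel source: the struct fields, `MODEL_PAGE_MAPPING_ANON`, `model_swap_space` and `model_page_mapping` are hypothetical names, and the swap-cache and anonymous-page branches that follow the quoted hunk are modelled loosely because the hunk does not show them in full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tag bit marking an anonymous-mapping pointer in page->mapping. */
#define MODEL_PAGE_MAPPING_ANON ((uintptr_t)1)

/* Hypothetical stand-ins for the kernel types. */
struct address_space { int id; };

struct page {
    bool slab;      /* models PageSlab(page) */
    bool swapcache; /* models PageSwapCache(page) */
    void *mapping;  /* raw page->mapping; slab reuses this field */
};

/* Placeholder for whatever address space the swap cache lookup yields. */
static struct address_space model_swap_space = { 0 };

static struct address_space *model_page_mapping(const struct page *page)
{
    /* The check added by the patch: never interpret slab data as a mapping. */
    if (page->slab)
        return NULL;

    if (page->swapcache)
        return &model_swap_space;

    /* Anonymous pages carry a tagged pointer and have no address_space. */
    if ((uintptr_t)page->mapping & MODEL_PAGE_MAPPING_ANON)
        return NULL;

    return (struct address_space *)page->mapping;
}
```

With the slab check first, a caller such as flush_dcache_page() sees NULL for a slab page and never walks a mapping tree built from slab-private data, which is the failure shown in the backtrace.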
Hi

On Mon, 6 Jan 2014, Joonsoo Kim wrote:

> Hello,
>
> I'm surprised that this VM_BUG_ON() has not been triggered until now. It was
> introduced in 2007 by commit (b5fab14). Maybe there is no person who test
> with CONFIG_DEBUG_VM.

Last time I tried it, PA-RISC didn't work with CONFIG_DEBUG_VM at all.

> There is one more bug report same as this.
> * possible regression on 3.13 when calling flush_dcache_page
>   (lkml.org/lkml/2013/12/12/255)

That link doesn't show anything.

> As mentioned in the description of commit (b5fab14), slab object may not be
> properly aligned and use of page oriented function to this object can be
> dangerous. I searched the XFS code and found that they only try to allocate
> multiple of 512 bytes, so there is no problem for now. But, IMHO, it is better
> not to use slab objects for this purpose.

If slab debugging is enabled, kmalloc memory is not aligned.

In XFS in xfs_buf_allocate_memory they test if the kmalloc memory crosses
page boundary - if it does, they free the kmalloc memory and allocate a
full page. Maybe this approach could still run into problems with some
bus-master adapters that assume alignment in hardware...

dm-bufio also does I/O to slab-allocated buffers, but it allocates the
object from slab (not kmalloc) with proper alignment.

> And I rapidly searched every callsites of page_mapping() and, IMHO, this
> patch would work correctly. But possibly reverting original commit is
> better solution.

Reverting the original commit wouldn't fix that VM_BUG_ON.

> Hello, Pekka and Christoph.
> Could you teach me which direction we have to go?
>
> Thanks.

Mikulas

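The page-boundary test described for xfs_buf_allocate_memory can be illustrated with a short sketch. It assumes a 4096-byte page, and `MODEL_PAGE_SIZE`, `MODEL_PAGE_MASK` and `crosses_page_boundary` are hypothetical names for this example rather than the XFS code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the illustration; real kernels define PAGE_SIZE per arch. */
#define MODEL_PAGE_SIZE ((uintptr_t)4096)
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/*
 * Returns true when a buffer of `size` bytes starting at `addr` does not fit
 * inside a single page, i.e. its first and last bytes lie in different pages.
 * `size` must be non-zero.
 */
static bool crosses_page_boundary(uintptr_t addr, size_t size)
{
    uintptr_t first_page = addr & MODEL_PAGE_MASK;
    uintptr_t last_page = (addr + size - 1) & MODEL_PAGE_MASK;

    return first_page != last_page;
}
```

A 512-byte buffer that starts 256 bytes before a page boundary crosses it, while the same buffer starting exactly on the boundary does not. In the approach described above, the first case is the one where the kmalloc memory would be freed and a full page allocated instead.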
On Mon, Jan 06, 2014 at 12:54:22PM -0500, Mikulas Patocka wrote:
> Hi
>
> On Mon, 6 Jan 2014, Joonsoo Kim wrote:
>
> > Hello,
> >
> > I'm surprised that this VM_BUG_ON() has not been triggered until now. It was
> > introduced in 2007 by commit (b5fab14). Maybe there is no person who test
> > with CONFIG_DEBUG_VM.
>
> Last time I tried it, PA-RISC didn't work with CONFIG_DEBUG_VM at all.
>
> > There is one more bug report same as this.
> > * possible regression on 3.13 when calling flush_dcache_page
> >   (lkml.org/lkml/2013/12/12/255)
>
> That link doesn't show anything.
>
> > As mentioned in the description of commit (b5fab14), slab object may not be
> > properly aligned and use of page oriented function to this object can be
> > dangerous. I searched the XFS code and found that they only try to allocate
> > multiple of 512 bytes, so there is no problem for now. But, IMHO, it is better
> > not to use slab objects for this purpose.
>
> If slab debugging is enabled, kmalloc memory is not aligned.
>
> In XFS in xfs_buf_allocate_memory they test if the kmalloc memory crosses
> page boundary - if it does, they free the kmalloc memory and allocate a
> full page. Maybe this approach could still run into problems with some
> bus-master adapters that assume alignment in hardware...
>
>
> dm-bufio also does I/O to slab-allocated buffers, but it allocates the
> object from slab (not kmalloc) with proper alignment.

Hello,

Okay. I see.
Thanks for good explanation.

> > And I rapidly searched every callsites of page_mapping() and, IMHO, this
> > patch would work correctly. But possibly reverting original commit is
> > better solution.
>
> Reverting the original commit wouldn't fix that VM_BUG_ON.

Initially, I thought that VM_BUG_ON() isn't wrong and it was better to remove
the callsites where do I/O with slab-allocated buffers, because doing I/O
with slab-allocated buffers needs a great care. So I didn't fully agree with
your patch and recommended to revert original commit yesterday. After reverting
that, I would attempt to remove the callsites.

But, now, I change my thought, because of your explanation. There are already
some users to do I/O with slab-allocated buffers and they already did it with
some cares, so I guess that admitting this usage is more beneficial than
forbidding it.

Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

On 01/07/2014 02:41 AM, Joonsoo Kim wrote:
> [...]
> But, now, I change my thought, because of your explanation. There are already
> some users to do I/O with slab-allocated buffers and they already did it with
> some cares, so I guess that admitting this usage is more beneficial than
> forbidding it.
>
> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

I can queue up this patch in my next pull-request for the parisc-tree which
I plan to send tomorrow, unless people want this patch to go via mm-tree or
similar...
Please let me know.

Helge

On Wed, Jan 8, 2014 at 11:05 PM, Helge Deller <deller@gmx.de> wrote:
> On 01/07/2014 02:41 AM, Joonsoo Kim wrote:
>> [...]
>> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> I can queue up this patch in my next pull-request for the parisc-tree which
> I plan to send tomorrow, unless people want this patch to go via mm-tree or
> similar...
> Please let me know.

The patch looks good to me but it probably should go through Andrew's tree.

Acked-by: Pekka Enberg <penberg@kernel.org>

On 01/08/2014 10:37 PM, Pekka Enberg wrote:
> On Wed, Jan 8, 2014 at 11:05 PM, Helge Deller <deller@gmx.de> wrote:
>> [...]
>> I can queue up this patch in my next pull-request for the parisc-tree which
>> I plan to send tomorrow, unless people want this patch to go via mm-tree or
>> similar...
>> Please let me know.
>
> The patch looks good to me but it probably should go through Andrew's tree.
>
> Acked-by: Pekka Enberg <penberg@kernel.org>

Absolutely fine with me.
Andrew, can you please pick it up for 3.13?

Thanks,
Helge

On Wed, 8 Jan 2014 23:37:49 +0200 Pekka Enberg <penberg@kernel.org> wrote:
> The patch looks good to me but it probably should go through Andrew's tree.
yup.
page_mapping() will be called quite frequently, and adding a new
test-n-branch in there will be somewhat costly. We might end up with a
better kernel if we were to instead revert 8456a648cf44f. How useful
was that patch?
On Wed, Jan 08, 2014 at 01:59:30PM -0800, Andrew Morton wrote:
> On Wed, 8 Jan 2014 23:37:49 +0200 Pekka Enberg <penberg@kernel.org> wrote:
>
> > The patch looks good to me but it probably should go through Andrew's tree.
>
> yup.
>
> page_mapping() will be called quite frequently, and adding a new
> test-n-branch in there will be somewhat costly. We might end up with a
> better kernel if we were to instead revert 8456a648cf44f. How useful
> was that patch?

Hello,

Performance effect of this patch was described in the cover-letter, but
I missed to attach it to patch description. Sorry about that.

In summary, this patch saves some memory and decreases cache-footprint
so that it increases performance.

Here goes the description in cover-letter.

Below is some numbers of 'cat /proc/slabinfo'.

* Before *
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512          527    600    512    8    1 : tunables   54   27    0 : slabdata     75     75      0
kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0
kmalloc-192         1040   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0
kmalloc-96           750    750    128   30    1 : tunables  120   60    0 : slabdata     25     25      0
kmalloc-64          2773   2773     64   59    1 : tunables  120   60    0 : slabdata     47     47      0
kmalloc-128          660    690    128   30    1 : tunables  120   60    0 : slabdata     23     23      0
kmalloc-32         11200  11200     32  112    1 : tunables  120   60    0 : slabdata    100    100      0
kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0

* After *
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512          525    640    512    8    1 : tunables   54   27    0 : slabdata     80     80      0
kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0
kmalloc-192         1016   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0
kmalloc-96           560    620    128   31    1 : tunables  120   60    0 : slabdata     20     20      0
kmalloc-64          2148   2280     64   60    1 : tunables  120   60    0 : slabdata     38     38      0
kmalloc-128          647    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0
kmalloc-32         11360  11413     32  113    1 : tunables  120   60    0 : slabdata    101    101      0
kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0

kmem_caches consisting of objects less than or equal to 128 byte have one more
objects in a slab. You can see it at objperslab.

Here are the performance results on my 4 cpus machine.

* Before *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       238,309,671 cache-misses                                                  ( +-  0.40% )

      12.010172090 seconds time elapsed                                          ( +-  0.21% )

* After *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       229,945,138 cache-misses                                                  ( +-  0.23% )

      11.627897174 seconds time elapsed                                          ( +-  0.14% )

cache-misses are reduced by this patchset, roughly 5%.
And elapsed times are also improved by 3.1% to baseline.

Thanks.

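The relative improvements can be recomputed directly from the counters quoted in this message. The helper below is a minimal sketch, and `percent_reduction` is a name made up for this example. Fed the quoted figures, it gives a cache-miss reduction of about 3.5% and an elapsed-time reduction of about 3.2%, so the "roughly 5%" in the summary looks like a generous rounding of the first figure.

```c
#include <assert.h>

/*
 * Relative reduction from `before` to `after`, as a percentage of `before`.
 * `before` must be non-zero.
 */
static double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

For example, `percent_reduction(238309671.0, 229945138.0)` covers the cache-miss counters and `percent_reduction(12.010172090, 11.627897174)` covers the elapsed times.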
On Thu, 9 Jan 2014 09:13:31 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> On Wed, Jan 08, 2014 at 01:59:30PM -0800, Andrew Morton wrote:
> > On Wed, 8 Jan 2014 23:37:49 +0200 Pekka Enberg <penberg@kernel.org> wrote:
> >
> > > The patch looks good to me but it probably should go through Andrew's tree.
> >
> > yup.
> >
> > page_mapping() will be called quite frequently, and adding a new
> > test-n-branch in there will be somewhat costly. We might end up with a
> > better kernel if we were to instead revert 8456a648cf44f. How useful
> > was that patch?
>
> Hello,
>
> Performance effect of this patch was decribed in the cover-letter, but
> I missed to attach it to patch description. Sorry about that.
>
> In summary, this patch saves some memory and decreases cache-footprint
> so that it increases performance.
>
> Here goes the description in cover-letter.
>
> ...
>
> cache-misses are reduced by this patchset, roughly 5%.
> And elapsed times are also improved by 3.1% to baseline.

ah, OK, thanks, useful. A few instructions added to page_mapping()
won't have effects like that!

On Thu, Jan 9, 2014 at 2:19 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
>> cache-misses are reduced by this patchset, roughly 5%.
>> And elapsed times are also improved by 3.1% to baseline.
>
> ah, OK, thanks, useful. A few instructions added to page_mapping()
> won't have effects like that!

Yup, I merged the series because the numbers were so impressive.
There's a link to the cover letter in merge commit 24f971a but it
would have been better to include them in the changelog itself.

                        Pekka

Hi Mikulas, On Sat, Jan 04, 2014 at 12:45:45PM -0500, Mikulas Patocka wrote: > The patch 8456a648cf44f14365f1f44de90a3da2526a4776 causes crash in the > LVM2 testsuite on PA-RISC (the crashing test is fsadm.sh). The testsuite > doesn't crash on 3.12, crashes on 3.13-rc1 and later. > > Bad Address (null pointer deref?): Code=15 regs=000000413edd89a0 (Addr=000006202224647d) > CPU: 3 PID: 24008 Comm: loop0 Not tainted 3.13.0-rc6 #5 > task: 00000001bf3c0048 ti: 000000413edd8000 task.ti: 000000413edd8000 > > YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI > PSW: 00001000000001101111100100001110 Not tainted > r00-03 000000ff0806f90e 00000000405c8de0 000000004013e6c0 000000413edd83f0 > r04-07 00000000405a95e0 0000000000000200 00000001414735f0 00000001bf349e40 > r08-11 0000000010fe3d10 0000000000000001 00000040829c7778 000000413efd9000 > r12-15 0000000000000000 000000004060d800 0000000010fe3000 0000000010fe3000 > r16-19 000000413edd82a0 00000041078ddbc0 0000000000000010 0000000000000001 > r20-23 0008f3d0d83a8000 0000000000000000 00000040829c7778 0000000000000080 > r24-27 00000001bf349e40 00000001bf349e40 202d66202224640d 00000000405a95e0 > r28-31 202d662022246465 000000413edd88f0 000000413edd89a0 0000000000000001 > sr00-03 000000000532c000 0000000000000000 0000000000000000 000000000532c000 > sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401fe42c 00000000401fe430 > IIR: 539c0030 ISR: 00000000202d6000 IOR: 000006202224647d > CPU: 3 CR30: 000000413edd8000 CR31: 0000000000000000 > ORIG_R28: 00000000405a95e0 > IAOQ[0]: vma_interval_tree_iter_first+0x14/0x48 > IAOQ[1]: vma_interval_tree_iter_first+0x18/0x48 > RP(r2): flush_dcache_page+0x128/0x388 > Backtrace: > [<000000004013e6c0>] flush_dcache_page+0x128/0x388 > [<0000000010fe6ca0>] lo_splice_actor+0x90/0x148 [loop] > [<00000000402579b0>] splice_from_pipe_feed+0xc0/0x1d0 > [<00000000402580a4>] __splice_from_pipe+0xac/0xc0 > [<0000000010fe6bbc>] 
lo_direct_splice_actor+0x1c/0x70 [loop] > [<000000004025854c>] splice_direct_to_actor+0xec/0x228 > [<0000000010fe63ac>] lo_receive+0xe4/0x298 [loop] > [<0000000010fe69d8>] loop_thread+0x478/0x640 [loop] > [<000000004018975c>] kthread+0x134/0x168 > [<000000004012c020>] end_fault_vector+0x20/0x28 > [<00000000115e0098>] xfs_setsize_buftarg+0x0/0x90 [xfs] > > Kernel panic - not syncing: Bad Address (null pointer deref?) > > The patch 8456a648cf44f14365f1f44de90a3da2526a4776 changes the page > structure so that the slab subsystem reuses the page->mapping field. > > The crash happens in the following way: > * XFS allocates some memory from slab and issues a bio to read data into > it. > * the bio is sent to the loopback device. > * lo_receive creates an actor and calls splice_direct_to_actor. > * lo_splice_actor copies data to the target page. > * lo_splice_actor calls flush_dcache_page because the page may be mapped > by userspace. In that case we need to flush the kernel cache. > * flush_dcache_page asks for the list of userspace mappings, however that > page->mapping field is reused by the slab subsystem for a different > purpose. This causes the crash. > > Note that other architectures without coherent caches (sparc, arm, mips) > also call page_mapping from flush_dcache_page, so they may crash in the > same way. > > This patch fixes this bug by testing if the page is a slab page in > page_mapping and returning NULL if it is. > > > The patch also fixes VM_BUG_ON(PageSlab(page)) that could happen in > earlier kernels in the same scenario on architectures without cache > coherence when CONFIG_DEBUG_VM is enabled - so it should be backported to > stable kernels. > > > In the old kernels, the function page_mapping is placed in > include/linux/mm.h, so you should modify the patch accordingly when > backporting it. 
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
>
> ---
>  mm/util.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> Index: linux-3.13-rc6/mm/util.c
> ===================================================================
> --- linux-3.13-rc6.orig/mm/util.c	2014-01-04 00:06:07.000000000 +0100
> +++ linux-3.13-rc6/mm/util.c	2014-01-04 00:24:42.000000000 +0100
> @@ -390,7 +390,10 @@ struct address_space *page_mapping(struc
>  {
>  	struct address_space *mapping = page->mapping;
>
> -	VM_BUG_ON(PageSlab(page));
> +	/* This happens if someone calls flush_dcache_page on slab page */
> +	if (unlikely(PageSlab(page)))
> +		return NULL;
> +
>  	if (unlikely(PageSwapCache(page))) {
>  		swp_entry_t entry;

I don't think that this is the correct fix. According to cachetlb.txt,
flush_(kernel_)dcache_page() is not supposed to be called with a slab page
in the first place. There is code in the kernel to avoid that (see for
example the discussion in [1] and [2]).

Also, on ARM, page_mapping() == NULL results in flush_(kernel_)dcache_page()
assuming that the page is an anon page. Consequently, it would flush the
slab page, which makes no sense.

Thus, I think we either need to add the check to the original caller of
flush_dcache_page(), or we allow flush_(kernel_)dcache_page() to be called
with slab pages and put the check there (this has been proposed by Russell
King once [3], but would affect multiple architectures).

- Simon

[1] https://lkml.org/lkml/2013/10/24/414
[2] https://lkml.org/lkml/2013/10/28/432
[3] https://lkml.org/lkml/2013/10/27/89
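
The caller-side alternative mentioned in this reply can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct page_model`, `flush_dcache_page_model()` and `finish_copy_model()` are hypothetical stand-ins invented for illustration, and the `slab` flag merely models what `PageSlab(page)` would report. The model flush function asserts that it never sees a slab page, mirroring the documented expectation, and the caller skips the flush for slab pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct page. */
struct page_model {
	bool slab;        /* models PageSlab(page) */
	void *mapping;    /* models page->mapping */
	int flush_count;  /* counts model flushes, for illustration only */
};

/*
 * Models an architecture's flush_dcache_page(). The assert stands in
 * for the rule that this function must not be handed a slab page.
 */
static void flush_dcache_page_model(struct page_model *page)
{
	assert(page != NULL);
	assert(!page->slab);
	page->flush_count++;
}

/*
 * Models a caller that has just copied data into the page. The slab
 * check lives here, so the flush routine never sees a slab page.
 */
static void finish_copy_model(struct page_model *page)
{
	if (!page->slab)
		flush_dcache_page_model(page);
}
```

With the check in the caller, the flush routine itself needs no knowledge of slab pages, at the cost of every such caller having to carry the guard.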
Index: linux-3.13-rc6/mm/util.c
===================================================================
--- linux-3.13-rc6.orig/mm/util.c	2014-01-04 00:06:07.000000000 +0100
+++ linux-3.13-rc6/mm/util.c	2014-01-04 00:24:42.000000000 +0100
@@ -390,7 +390,10 @@ struct address_space *page_mapping(struc
 {
 	struct address_space *mapping = page->mapping;

-	VM_BUG_ON(PageSlab(page));
+	/* This happens if someone calls flush_dcache_page on slab page */
+	if (unlikely(PageSlab(page)))
+		return NULL;
+
 	if (unlikely(PageSwapCache(page))) {
 		swp_entry_t entry;
The patch 8456a648cf44f14365f1f44de90a3da2526a4776 causes crash in the
LVM2 testsuite on PA-RISC (the crashing test is fsadm.sh). The testsuite
doesn't crash on 3.12, crashes on 3.13-rc1 and later.

 Bad Address (null pointer deref?): Code=15 regs=000000413edd89a0 (Addr=000006202224647d)
 CPU: 3 PID: 24008 Comm: loop0 Not tainted 3.13.0-rc6 #5
 task: 00000001bf3c0048 ti: 000000413edd8000 task.ti: 000000413edd8000

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
 PSW: 00001000000001101111100100001110 Not tainted
 r00-03 000000ff0806f90e 00000000405c8de0 000000004013e6c0 000000413edd83f0
 r04-07 00000000405a95e0 0000000000000200 00000001414735f0 00000001bf349e40
 r08-11 0000000010fe3d10 0000000000000001 00000040829c7778 000000413efd9000
 r12-15 0000000000000000 000000004060d800 0000000010fe3000 0000000010fe3000
 r16-19 000000413edd82a0 00000041078ddbc0 0000000000000010 0000000000000001
 r20-23 0008f3d0d83a8000 0000000000000000 00000040829c7778 0000000000000080
 r24-27 00000001bf349e40 00000001bf349e40 202d66202224640d 00000000405a95e0
 r28-31 202d662022246465 000000413edd88f0 000000413edd89a0 0000000000000001
 sr00-03 000000000532c000 0000000000000000 0000000000000000 000000000532c000
 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000

 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401fe42c 00000000401fe430
  IIR: 539c0030 ISR: 00000000202d6000 IOR: 000006202224647d
  CPU: 3 CR30: 000000413edd8000 CR31: 0000000000000000
  ORIG_R28: 00000000405a95e0
  IAOQ[0]: vma_interval_tree_iter_first+0x14/0x48
  IAOQ[1]: vma_interval_tree_iter_first+0x18/0x48
  RP(r2): flush_dcache_page+0x128/0x388
 Backtrace:
  [<000000004013e6c0>] flush_dcache_page+0x128/0x388
  [<0000000010fe6ca0>] lo_splice_actor+0x90/0x148 [loop]
  [<00000000402579b0>] splice_from_pipe_feed+0xc0/0x1d0
  [<00000000402580a4>] __splice_from_pipe+0xac/0xc0
  [<0000000010fe6bbc>] lo_direct_splice_actor+0x1c/0x70 [loop]
  [<000000004025854c>] splice_direct_to_actor+0xec/0x228
  [<0000000010fe63ac>] lo_receive+0xe4/0x298 [loop]
  [<0000000010fe69d8>] loop_thread+0x478/0x640 [loop]
  [<000000004018975c>] kthread+0x134/0x168
  [<000000004012c020>] end_fault_vector+0x20/0x28
  [<00000000115e0098>] xfs_setsize_buftarg+0x0/0x90 [xfs]

 Kernel panic - not syncing: Bad Address (null pointer deref?)

The patch 8456a648cf44f14365f1f44de90a3da2526a4776 changes the page
structure so that the slab subsystem reuses the page->mapping field.

The crash happens in the following way:
* XFS allocates some memory from slab and issues a bio to read data into
  it.
* the bio is sent to the loopback device.
* lo_receive creates an actor and calls splice_direct_to_actor.
* lo_splice_actor copies data to the target page.
* lo_splice_actor calls flush_dcache_page because the page may be mapped
  by userspace. In that case we need to flush the kernel cache.
* flush_dcache_page asks for the list of userspace mappings, however that
  page->mapping field is reused by the slab subsystem for a different
  purpose. This causes the crash.

Note that other architectures without coherent caches (sparc, arm, mips)
also call page_mapping from flush_dcache_page, so they may crash in the
same way.

This patch fixes this bug by testing if the page is a slab page in
page_mapping and returning NULL if it is.

The patch also fixes VM_BUG_ON(PageSlab(page)) that could happen in
earlier kernels in the same scenario on architectures without cache
coherence when CONFIG_DEBUG_VM is enabled - so it should be backported to
stable kernels.

In the old kernels, the function page_mapping is placed in
include/linux/mm.h, so you should modify the patch accordingly when
backporting it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 mm/util.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
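
The behaviour the patch gives page_mapping() can be modelled in user space. The following is a minimal sketch, not the kernel implementation: `struct page_model`, `struct address_space_model` and `page_mapping_model()` are hypothetical names, the union is an assumed way to picture the slab subsystem reusing the storage of page->mapping, and the swap-cache branch of the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space. */
struct address_space_model {
	int nr_user_mappings;
};

/*
 * Hypothetical stand-in for struct page. The union models the same
 * storage holding either a mapping pointer or slab-owned data.
 */
struct page_model {
	bool slab;  /* models PageSlab(page) */
	union {
		struct address_space_model *mapping;  /* page cache owner */
		void *slab_private;                   /* slab-owned data */
	} u;
};

/*
 * Models the patched page_mapping(): for a slab page the field does
 * not hold an address_space pointer, so NULL is returned instead of
 * handing slab data to the caller as if it were a mapping.
 */
static struct address_space_model *
page_mapping_model(const struct page_model *page)
{
	assert(page != NULL);
	if (page->slab)
		return NULL;
	return page->u.mapping;
}
```

Without the slab check, a caller walking the returned "mapping" would be interpreting slab-owned data as a `struct address_space`, which is the kind of dereference seen in the backtrace above.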