[1/1] mm: memcg/slab: Call flush_memcg_workqueue() only if memcg workqueue is valid

Message ID 20200103085503.1665-1-ahuang12@lenovo.com (mailing list archive)
State New, archived

Commit Message

Adrian Huang Jan. 3, 2020, 8:55 a.m. UTC
From: Adrian Huang <ahuang12@lenovo.com>

When booting with amd_iommu=off, the following WARNING message
appears:
  AMD-Vi: AMD IOMMU disabled on kernel command-line
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:2772 flush_workqueue+0x42e/0x450
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc3-amd-iommu #6
  Hardware name: Lenovo ThinkSystem SR655-2S/7D2WRCZ000, BIOS D8E101L-1.00 12/05/2019
  RIP: 0010:flush_workqueue+0x42e/0x450
  Code: ff 0f 0b e9 7a fd ff ff 4d 89 ef e9 33 fe ff ff 0f 0b e9 7f fd ff ff 0f 0b e9 bc fd ff ff 0f 0b e9 a8 fd ff ff e8 52 2c fe ff <0f> 0b 31 d2 48 c7 c6 e0 88 c5 95 48 c7 c7 d8 ad f0 95 e8 19 f5 04
  RSP: 0000:ffffffff96203d80 EFLAGS: 00010246
  RAX: ffffffff96203dc8 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: ffffffff96a63120 RSI: ffffffff95efcba2 RDI: ffffffff96203dc0
  RBP: ffffffff96203e08 R08: 0000000000000000 R09: ffffffff962a1828
  R10: 00000000f0000080 R11: dead000000000100 R12: ffff8d8a87c0a770
  R13: dead000000000100 R14: 0000000000000456 R15: ffffffff96203da0
  FS:  0000000000000000(0000) GS:ffff8d8dbd000000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8d91cfbff000 CR3: 000000078920a000 CR4: 00000000000406b0
  Call Trace:
   ? wait_for_completion+0x51/0x180
   kmem_cache_destroy+0x69/0x260
   iommu_go_to_state+0x40c/0x5ab
   amd_iommu_prepare+0x16/0x2a
   irq_remapping_prepare+0x36/0x5f
   enable_IR_x2apic+0x21/0x172
   default_setup_apic_routing+0x12/0x6f
   apic_intr_mode_init+0x1a1/0x1f1
   x86_late_time_init+0x17/0x1c
   start_kernel+0x480/0x53f
   secondary_startup_64+0xb6/0xc0
  ---[ end trace 30894107c3749449 ]---
  x2apic: IRQ remapping doesn't support X2APIC mode
  x2apic disabled

The warning is triggered by the call to kmem_cache_destroy()
in free_iommu_resources(). The call path is:
  free_iommu_resources
    kmem_cache_destroy
      flush_memcg_workqueue
        flush_workqueue

The root cause is that the IOMMU subsystem is initialized before the
workqueue subsystem, so the variable 'wq_online' is still 'false' at
that point. As a result, the check 'if (WARN_ON(!wq_online))' in
flush_workqueue() evaluates to 'true' and the warning fires.

Since 'memcg_kmem_cache_wq' has not been allocated yet at that
point, there is nothing to flush. Check the pointer in
flush_memcg_workqueue() and call flush_workqueue() only when it is
valid. This avoids the WARNING message triggered by flush_workqueue().
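
The ordering problem and the guard can be illustrated with a small userspace sketch. This is not kernel code: the struct, the `mock_` functions, the `warnings` counter, and the `flush_unguarded()` / `flush_guarded()` helpers are hypothetical stand-ins that only mimic the early-boot state described above (`wq_online` false, `memcg_kmem_cache_wq` still NULL). The kernel's `likely()` hint is omitted because it does not change behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-ins for kernel state; none of this is the kernel's code. */
struct workqueue_struct {
	int flushed;	/* how many times this queue was flushed */
};

static bool wq_online;					/* false until workqueues are up */
static struct workqueue_struct *memcg_kmem_cache_wq;	/* NULL until allocated */
static int warnings;					/* counts simulated WARN_ON hits */

/* Mock of flush_workqueue(): warns and bails out if called too early. */
static void mock_flush_workqueue(struct workqueue_struct *wq)
{
	if (!wq_online) {
		warnings++;
		fprintf(stderr, "WARNING: flush before workqueue init\n");
		return;
	}
	assert(wq != NULL);
	wq->flushed++;
}

/* Behavior before the patch: flush unconditionally. */
static void flush_unguarded(void)
{
	mock_flush_workqueue(memcg_kmem_cache_wq);
}

/* Behavior after the patch: flush only if the workqueue exists. */
static void flush_guarded(void)
{
	if (memcg_kmem_cache_wq)
		mock_flush_workqueue(memcg_kmem_cache_wq);
}

/* Mock of later boot: workqueues come online, memcg workqueue is allocated. */
static void mock_workqueue_init(struct workqueue_struct *wq)
{
	wq_online = true;
	memcg_kmem_cache_wq = wq;
}
```

In this sketch, calling `flush_unguarded()` before `mock_workqueue_init()` bumps the `warnings` counter, while `flush_guarded()` returns silently. Once the mock workqueue has been set up, the guarded variant flushes as usual, so the check costs nothing in the normal case.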

Cc: Shakeel Butt <shakeelb@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Reported-by: Xiaochun Lee <lixc17@lenovo.com> 
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
---
 mm/slab_common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Shakeel Butt Jan. 3, 2020, 6:33 p.m. UTC | #1
On Fri, Jan 3, 2020 at 12:55 AM Adrian Huang <adrianhuang0701@gmail.com> wrote:
>
> From: Adrian Huang <ahuang12@lenovo.com>
>
> When booting with amd_iommu=off, the following WARNING message
> appears:
>   AMD-Vi: AMD IOMMU disabled on kernel command-line
>   ------------[ cut here ]------------
>   WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:2772 flush_workqueue+0x42e/0x450
>   Modules linked in:
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc3-amd-iommu #6
>   Hardware name: Lenovo ThinkSystem SR655-2S/7D2WRCZ000, BIOS D8E101L-1.00 12/05/2019
>   RIP: 0010:flush_workqueue+0x42e/0x450
>   Code: ff 0f 0b e9 7a fd ff ff 4d 89 ef e9 33 fe ff ff 0f 0b e9 7f fd ff ff 0f 0b e9 bc fd ff ff 0f 0b e9 a8 fd ff ff e8 52 2c fe ff <0f> 0b 31 d2 48 c7 c6 e0 88 c5 95 48 c7 c7 d8 ad f0 95 e8 19 f5 04
>   RSP: 0000:ffffffff96203d80 EFLAGS: 00010246
>   RAX: ffffffff96203dc8 RBX: 0000000000000000 RCX: 0000000000000000
>   RDX: ffffffff96a63120 RSI: ffffffff95efcba2 RDI: ffffffff96203dc0
>   RBP: ffffffff96203e08 R08: 0000000000000000 R09: ffffffff962a1828
>   R10: 00000000f0000080 R11: dead000000000100 R12: ffff8d8a87c0a770
>   R13: dead000000000100 R14: 0000000000000456 R15: ffffffff96203da0
>   FS:  0000000000000000(0000) GS:ffff8d8dbd000000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: ffff8d91cfbff000 CR3: 000000078920a000 CR4: 00000000000406b0
>   Call Trace:
>    ? wait_for_completion+0x51/0x180
>    kmem_cache_destroy+0x69/0x260
>    iommu_go_to_state+0x40c/0x5ab
>    amd_iommu_prepare+0x16/0x2a
>    irq_remapping_prepare+0x36/0x5f
>    enable_IR_x2apic+0x21/0x172
>    default_setup_apic_routing+0x12/0x6f
>    apic_intr_mode_init+0x1a1/0x1f1
>    x86_late_time_init+0x17/0x1c
>    start_kernel+0x480/0x53f
>    secondary_startup_64+0xb6/0xc0
>   ---[ end trace 30894107c3749449 ]---
>   x2apic: IRQ remapping doesn't support X2APIC mode
>   x2apic disabled
>
> The warning is caused by the calling of 'kmem_cache_destroy()'
> in free_iommu_resources(). Here is the call path:
>   free_iommu_resources
>     kmem_cache_destroy
>       flush_memcg_workqueue
>         flush_workqueue
>
> The root cause is that the IOMMU subsystem runs before the
> workqueue subsystem, which the variable 'wq_online' is still 'false'.
> This leads to the statement 'if (WARN_ON(!wq_online))' in
> flush_workqueue() is 'true'.
>
> Since the variable 'memcg_kmem_cache_wq' is not allocated
> during the time, it is unnecessary to call flush_memcg_workqueue().
> This prevents the WARNING message triggered by flush_workqueue().
>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> Reported-by: Xiaochun Lee <lixc17@lenovo.com>
> Signed-off-by: Adrian Huang <ahuang12@lenovo.com>

Fixes: 92ee383f6daab ("mm: fix race between kmem_cache destroy, create and deactivate")

Reviewed-by: Shakeel Butt <shakeelb@google.com>

Should this be backported to stable trees?

> ---
>  mm/slab_common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index f0ab6d4ceb4c..0d95ddea13b0 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -903,7 +903,8 @@ static void flush_memcg_workqueue(struct kmem_cache *s)
>          * deactivates the memcg kmem_caches through workqueue. Make sure all
>          * previous workitems on workqueue are processed.
>          */
> -       flush_workqueue(memcg_kmem_cache_wq);
> +       if (likely(memcg_kmem_cache_wq))
> +               flush_workqueue(memcg_kmem_cache_wq);
>
>         /*
>          * If we're racing with children kmem_cache deactivation, it might
> --
> 2.17.1
>
Adrian Huang12 Jan. 8, 2020, 3:17 p.m. UTC | #2
> -----Original Message-----
> From: Shakeel Butt <shakeelb@google.com>
> Sent: Saturday, January 4, 2020 2:33 AM
> To: Adrian Huang <adrianhuang0701@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>; Pekka Enberg <penberg@kernel.org>;
> David Rientjes <rientjes@google.com>; Joonsoo Kim
> <iamjoonsoo.kim@lge.com>; Andrew Morton <akpm@linux-foundation.org>;
> Linux MM <linux-mm@kvack.org>; Adrian Huang12 <ahuang12@lenovo.com>;
> Joerg Roedel <jroedel@suse.de>
> Subject: [External] Re: [PATCH 1/1] mm: memcg/slab: Call
> flush_memcg_workqueue() only if memcg workqueue is valid
> 
> Fixes: 92ee383f6daab ("mm: fix race between kmem_cache destroy, create and
> deactivate")
> 
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

Thanks for the review. Really appreciated. 

Hi Andrew, would it be possible to add Shakeel's Reviewed-by tag to this patch? Link: http://ozlabs.org/~akpm/mmots/broken-out/mm-memcg-slab-call-flush_memcg_workqueue-only-if-memcg-workqueue-is-valid.patch

Thanks. 

> 
> Should this be backported to stable trees?
> 

-- Adrian
Patch

diff --git a/mm/slab_common.c b/mm/slab_common.c
index f0ab6d4ceb4c..0d95ddea13b0 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -903,7 +903,8 @@  static void flush_memcg_workqueue(struct kmem_cache *s)
 	 * deactivates the memcg kmem_caches through workqueue. Make sure all
 	 * previous workitems on workqueue are processed.
 	 */
-	flush_workqueue(memcg_kmem_cache_wq);
+	if (likely(memcg_kmem_cache_wq))
+		flush_workqueue(memcg_kmem_cache_wq);
 
 	/*
 	 * If we're racing with children kmem_cache deactivation, it might