From patchwork Thu Dec 21 09:14:32 2017
X-Patchwork-Submitter: Guoqing Jiang
X-Patchwork-Id: 10127221
Subject: Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on
 __kmem_cache_shutdown() after rmmod
To: Paolo Valente
Cc: Holger Hoffstätte, linux-block,
 DAVIDE FERRARI <162996@studenti.unimore.it>
References: <9c96f320-a7db-4db0-adbc-34f9510ce84d@suse.com>
From: Guoqing Jiang
Date: Thu, 21 Dec 2017 17:14:32 +0800
X-Mailing-List: linux-block@vger.kernel.org

On 12/21/2017 03:53 PM, Paolo Valente wrote:
>
>> On 21 Dec 2017, at 08:08, Guoqing Jiang wrote:
>>
>> Hi,
>>
>>
>> On 12/08/2017 08:34 AM, Holger Hoffstätte wrote:
>>> So plugging in a USB device with BFQ as the scheduler now works without
>>> a hiccup (probably thanks to Ming Lei's last patch), but of course I found
>>> another problem. Unmounting the device after use, changing the scheduler
>>> back to deadline or kyber and rmmod'ing the BFQ module reproducibly gives me:
>>>
>>> kernel: =============================================================================
>>> kernel: BUG bfq_queue (Tainted: G    B   ): Objects remaining in bfq_queue on __kmem_cache_shutdown()
>>> kernel: -----------------------------------------------------------------------------
>>> kernel:
>>> kernel: INFO: Slab 0xffffea001601fc00 objects=37 used=3 fp=0xffff8805807f0360 flags=0x8000000000008100
>>> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B    4.14.5 #1
>>> kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
>>> kernel: Call Trace:
>>> kernel:  dump_stack+0x46/0x5e
>>> kernel:  slab_err+0x9e/0xb0
>>> kernel:  ? on_each_cpu_mask+0x35/0x40
>>> kernel:  ? ksm_migrate_page+0x60/0x60
>>> kernel:  ? __kmalloc+0x1c9/0x1d0
>>> kernel:  __kmem_cache_shutdown+0x177/0x350
>>> kernel:  shutdown_cache+0xf/0x130
>>> kernel:  kmem_cache_destroy+0x19e/0x1b0
>>> kernel:  SyS_delete_module+0x168/0x230
>>> kernel:  ? exit_to_usermode_loop+0x39/0x80
>>> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
>>> kernel: RIP: 0033:0x7f53e4136b97
>>> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>>> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f53e4136b97
>>> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 00000000006247f8
>>> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: 0000000000000000
>>> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: 0000000000624790
>>> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: 0000000000623260
>>> kernel: INFO: Object 0xffff8805807f0000 @offset=0
>>> kernel: INFO: Object 0xffff8805807f01b0 @offset=432
>>> kernel: INFO: Object 0xffff8805807f3cc0 @offset=15552
>>> kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects
>>> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B    4.14.5 #1
>>> kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
>>> kernel: Call Trace:
>>> kernel:  dump_stack+0x46/0x5e
>>> kernel:  kmem_cache_destroy+0x191/0x1b0
>>> kernel:  SyS_delete_module+0x168/0x230
>>> kernel:  ? exit_to_usermode_loop+0x39/0x80
>>> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
>>> kernel: RIP: 0033:0x7f53e4136b97
>>> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>>> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f53e4136b97
>>> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 00000000006247f8
>>> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: 0000000000000000
>>> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: 0000000000624790
>>> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: 0000000000623260
>>>
>> I also encountered a similar bug; it seems there are some async objects which
>> are not freed when BFQ_GROUP_IOSCHED is built as "Y".
>>
>> If I revert part of commit e21b7a0b988772e82e7147e1c659a5afe2ae003c ("block, bfq: add
>> full hierarchical scheduling and cgroups support") as below, then I can't see the bug
>> any more with a simple test.
>>
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 889a8549d97f..c640e64e042f 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -4710,6 +4710,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
>>  	spin_lock_irq(&bfqd->lock);
>>  	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
>>  		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
>> +	bfq_put_async_queues(bfqd, bfqd->root_group);
>>  	spin_unlock_irq(&bfqd->lock);
>>
>>  	hrtimer_cancel(&bfqd->idle_slice_timer);
>> @@ -4718,7 +4719,6 @@ static void bfq_exit_queue(struct elevator_queue *e)
>>  	blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq);
>>  #else
>>  	spin_lock_irq(&bfqd->lock);
>> -	bfq_put_async_queues(bfqd, bfqd->root_group);
>>  	kfree(bfqd->root_group);
>>  	spin_unlock_irq(&bfqd->lock);
>>
>> But perhaps we should do it inside blkcg_deactivate_policy, right?
>>
> Thank you very much for investigating this. And yes, I removed that
> bfq_put_async_queues invocation a long time ago, because it had to be done
> in blkcg_deactivate_policy; and actually it was invoked as expected. We are
> trying to understand what is going wrong now.

Thanks for the information. I found another way; does it make sense?
Regards,
Guoqing

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index da1525ec4c87..0a070daf96c7 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -474,7 +474,10 @@ static void bfq_pd_init(struct blkg_policy_data *pd)
 static void bfq_pd_free(struct blkg_policy_data *pd)
 {
 	struct bfq_group *bfqg = pd_to_bfqg(pd);
+	struct bfq_data *bfqd = bfqg->bfqd;
 
+	if (bfqd)
+		bfq_put_async_queues(bfqd, bfqg);
 	bfqg_stats_exit(&bfqg->stats);
 	bfqg_put(bfqg);
 }
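
For context on why bfq_pd_free() is a plausible hook: bfq_exit_queue() calls
blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq) (visible in the first diff
above), and blkcg_deactivate_policy() then invokes the policy's per-group callbacks.
The sketch below is from memory and trimmed to the relevant fields, not verbatim
kernel source; it only shows the wiring that would make a bfq_put_async_queues()
call in bfq_pd_free() run once per bfq_group rather than only for bfqd->root_group.

/*
 * Rough sketch of the teardown path on rmmod (abbreviated, from memory;
 * not verbatim kernel source):
 *
 *   bfq_exit_queue()
 *     -> blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq)
 *          -> for each group of the queue that has bfq policy data:
 *               pd_offline_fn()  ->  bfq_pd_offline()
 *               pd_free_fn()     ->  bfq_pd_free()
 *
 * blkcg_policy_bfq in block/bfq-cgroup.c wires those callbacks roughly
 * like this (cftypes and cpd_* callbacks trimmed):
 */
struct blkcg_policy blkcg_policy_bfq = {
	.pd_alloc_fn	= bfq_pd_alloc,
	.pd_init_fn	= bfq_pd_init,
	.pd_offline_fn	= bfq_pd_offline,
	.pd_free_fn	= bfq_pd_free,	/* runs once per bfq_group */
};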
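
Independent of bfq, the report Holger quoted is the generic slab complaint for a
cache destroyed while objects are still allocated. The hypothetical demo module
below (name, size and identifiers made up purely for illustration) reproduces the
same "Objects remaining ... on __kmem_cache_shutdown()" / "Slab cache still has
objects" messages on rmmod when slab debugging is enabled, which is what an unfreed
bfq_queue (e.g. a leaked async queue) looks like from the allocator's point of view.

/* leak_demo.c: hypothetical module, only to illustrate the warning class. */
#include <linux/module.h>
#include <linux/slab.h>

static struct kmem_cache *demo_cache;
static void *leaked;

static int __init leak_demo_init(void)
{
	/* 432 is an arbitrary object size, unrelated to struct bfq_queue */
	demo_cache = kmem_cache_create("leak_demo", 432, 0, 0, NULL);
	if (!demo_cache)
		return -ENOMEM;

	leaked = kmem_cache_alloc(demo_cache, GFP_KERNEL);
	if (!leaked) {
		kmem_cache_destroy(demo_cache);
		return -ENOMEM;
	}
	return 0;
}

static void __exit leak_demo_exit(void)
{
	/*
	 * Intentionally no kmem_cache_free(demo_cache, leaked) here, so
	 * kmem_cache_destroy() finds a live object and complains -- the
	 * same situation bfq is in when async bfq_queues are still held
	 * at elevator exit.
	 */
	kmem_cache_destroy(demo_cache);
}

module_init(leak_demo_init);
module_exit(leak_demo_exit);
MODULE_LICENSE("GPL");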