From patchwork Thu Jun 28 19:57:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10495005
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 3CABF60230
 for ; Thu, 28 Jun 2018 19:58:16 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3A9ED2A3B8
 for ; Thu, 28 Jun 2018 19:58:16 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 2EC462A3BD; Thu, 28 Jun 2018 19:58:16 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED,
 DKIM_VALID, MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable
 version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C0A5B2A3B8
 for ; Thu, 28 Jun 2018 19:58:15 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S965533AbeF1T6C (ORCPT ); Thu, 28 Jun 2018 15:58:02 -0400
Received: from mail-it0-f68.google.com ([209.85.214.68]:51538 "EHLO
 mail-it0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S965531AbeF1T57 (ORCPT ); Thu, 28 Jun 2018 15:57:59 -0400
Received: by mail-it0-f68.google.com with SMTP id 128-v6so14520556itf.1
 for ; Thu, 28 Jun 2018 12:57:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=kernel-dk.20150623.gappssmtp.com; s=20150623;
 h=subject:to:cc:references:from:message-id:date:user-agent
 :mime-version:in-reply-to:content-language:content-transfer-encoding;
 bh=4rKmfEUuk51mfhwjtRbJDcxJv9lntII7cLxnKXAD0S8=;
 b=Hsc0MzsTpZdi/5r8tkb+GPbYECoIwI6MW2CidskOOLkPzbFa2BkTyn4hhy3wnZRLtR
 MQSMBYfs4IKudcSIOwnDouDdrPXUZaN/56N2hFgdyWV58nddO1CAasFlrFIw0TwsX1h/
 wnIY/1eNMJl8u+zrOLimuK/7ahEOaUSfZ93w/udvrRNv/4ZKcgf3lQdDsk+fb5E32K7m
 ZmRHYFnU59NV4VeERntpCQNZebIaW6otW7MLVzYxsIxpA0pY1FlGvtoaIH4rFkZ4Zs4j
 dGh6WW6fGJjSiXMF7tEjkXOtX7KuaMWFpZPF2B9WsW3E/O0Y39Mj2Aox57ixzpy2+gjW
 XcUA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:subject:to:cc:references:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-language
 :content-transfer-encoding;
 bh=4rKmfEUuk51mfhwjtRbJDcxJv9lntII7cLxnKXAD0S8=;
 b=HNiBSvyB6eZ3wyA6vyn+KHD96uj+jJmYSuiMQ0+9AYwctw9psrr2qx6WGwwzn+bOlG
 CuGsMYTS7+KiPShT3eXed+23XHHRFFc8YcqzNzmSZjVYHuTNJTdb71YQgfV54qbmQude
 04xcA5JugnEJzNf7TG/jMu6TW8z1y+LwInVb9Qz6KJO5manpX1ZdAWEAaj4PRmGYs6kJ
 WjjlOiHrAVWBulW+egXOd2RhXGtlHkqJVka1JW+7tCLCZ+N+eGRAFqW1dOmT6a70PZ5D
 5Z6YVo2N5zWFv0IC7sxwYCW3avCX/674oSFdoGzkjU4b65jH0boNEWu+yQvkn+kphLaq
 sZow==
X-Gm-Message-State: APt69E1GAH18VFUcwGdDb643CTlK10f/NthNGV1RVjZFQ6lWiua0wsz7
 6dWZT1kYCrn9k7ot4iCLxzfHAA==
X-Google-Smtp-Source: AAOMgpdFbdBG/bBlrP7aLVWdSOSgMpA8lU+h3bujH+fH/7h7PUb5EYOz21SagLWHL/uEMBpqOe/+Fg==
X-Received: by 2002:a24:308b:: with SMTP id q133-v6mr8878041itq.55.1530215878515;
 Thu, 28 Jun 2018 12:57:58 -0700 (PDT)
Received: from [192.168.1.212] (107.191.0.158.static.utbb.net.
 [107.191.0.158]) by smtp.gmail.com with ESMTPSA id
 135-v6sm773950ity.19.2018.06.28.12.57.56
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 28 Jun 2018 12:57:57 -0700 (PDT)
Subject: Re: Oops in kmem_cache_free() via bioset_exit() (was Re:
 [next-20180601][nvme][ppc] Kernel Oops is triggered when creating lvm
 snapshots on nvme disks)
To: Michael Ellerman , Abdul Haleem
Cc: linuxppc-dev , linux-fsdevel , linux-next , linux-kernel ,
 linux-scsi , Stephen Rothwell , sachinp , sim , manvanth , Brian King ,
 linux-block@vger.kernel.org, Kent Overstreet
References: <1530003645.24245.7.camel@abdul.in.ibm.com>
 <87d0wd7jph.fsf@concordia.ellerman.id.au>
 <1530176707.24245.12.camel@abdul.in.ibm.com>
 <87a7rf55vy.fsf@concordia.ellerman.id.au>
From: Jens Axboe
Message-ID: <32e06233-3e58-10af-40d9-a22d9e5c4e96@kernel.dk>
Date: Thu, 28 Jun 2018 13:57:55 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.0
MIME-Version: 1.0
In-Reply-To: <87a7rf55vy.fsf@concordia.ellerman.id.au>
Content-Language: en-US
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On 6/28/18 8:42 AM, Michael Ellerman wrote:
> Kent, Jens,
>
> This looks like it might be related to the recent bioset changes?
>
> cheers
>
> Abdul Haleem writes:
>> On Tue, 2018-06-26 at 23:36 +1000, Michael Ellerman wrote:
>>> Abdul Haleem writes:
> ...
>> I was able to reproduce again with slub_debug=FZP and DEBUG_INFO enabled
>> on 4.17.0-rc7-next-20180601, but not much traces other than the Oops stack trace
>
> Are you still testing on that revision? It's nearly a month old.
>
> Please try to reproduce on mainline or today's linux-next.
>
>
>> the faulty instruction points to below code path :
>>
>> gdb -batch vmlinux -ex 'list *(0xc000000000304fe0)'
>> 0xc000000000304fe0 is in kmem_cache_free (mm/slab.h:231).
>> 226	}
>> 227
>> 228	static inline bool slab_equal_or_root(struct kmem_cache *s,
>> 229					      struct kmem_cache *p)
>> 230	{
>> 231		return p == s || p == s->memcg_params.root_cache;
>> 232	}
>
> And s is NULL.
>
> Called via:
>   kmem_cache_free+0x210/0x2a0
>   mempool_free_slab+0x24/0x40
>   mempool_exit+0x50/0x90
>   bioset_exit+0x40/0x1d0
>   dm_io_client_destroy+0x2c/0x50
>   dm_bufio_client_destroy+0x1fc/0x2d0 [dm_bufio]
>   persistent_read_metadata+0x430/0x660 [dm_snapshot]
>   snapshot_ctr+0x5c8/0x7a0 [dm_snapshot]
>   dm_table_add_target+0x19c/0x3c0
>   table_load+0x104/0x450
>   ctl_ioctl+0x1f8/0x570
>   dm_ctl_ioctl+0x18/0x30
>   do_vfs_ioctl+0xcc/0x9e0
>   ksys_ioctl+0x5c/0xe0
>   sys_ioctl+0x20/0x80
>   system_call+0x58/0x6c
>
> So looks like we did:
>
>   kmem_cache_free(NULL
>
> Probably a bad error path that frees before the cache has been allocated.
>
> mempool_init_node() calls mempool_exit() on a partially initialised
> mempool, which looks fishy, though you're not hitting that patch AFAICS.

The slab cache is set up elsewhere; it's pending_cache. So if
pending_cache is NULL, then yeah, an exit there will barf. I'd try
something like the below, but from the trace, we already basically
see the path.

diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 0c964ac107c2..ebfa2f89ffdd 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -59,6 +59,7 @@ void mempool_free_slab(void *element, void *pool_data);
 static inline int
 mempool_init_slab_pool(mempool_t *pool, int min_nr, struct kmem_cache *kc)
 {
+	BUG_ON(!kc);
 	return mempool_init(pool, min_nr, mempool_alloc_slab,
 			    mempool_free_slab, (void *) kc);
 }
diff --git a/mm/mempool.c b/mm/mempool.c
index b54f2c20e5e0..060f44acd0df 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -508,7 +508,9 @@ EXPORT_SYMBOL(mempool_alloc_slab);
 void mempool_free_slab(void *element, void *pool_data)
 {
 	struct kmem_cache *mem = pool_data;
-	kmem_cache_free(mem, element);
+
+	if (!WARN_ON(!mem))
+		kmem_cache_free(mem, element);
 }
 EXPORT_SYMBOL(mempool_free_slab);
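
For anyone who wants to poke at the teardown path outside the kernel, here is a rough userspace sketch of what the mm/mempool.c hunk above is guarding against. It is only a model under stated assumptions: every model_* name is made up for illustration, malloc()/free() stand in for the slab allocator, and the free callback returns a status purely so the result can be checked (mempool_free_slab() itself returns void). It shows a pool whose pool_data is NULL being torn down without dereferencing the missing cache.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct kmem_cache; not the kernel type. */
struct model_cache {
	int nr_freed;		/* objects handed back to this cache */
};

/*
 * Hypothetical stand-in for mempool_t: preallocated elements plus the
 * opaque pool_data pointer that the free callback receives.
 */
struct model_pool {
	void **elements;
	int curr_nr;
	void *pool_data;
	int nr_warned;		/* model-only count of how often the guard fired */
};

/*
 * Models kmem_cache_free(): it dereferences the cache, so a NULL cache
 * would fault here, much like the reported oops.
 */
void model_cache_free(struct model_cache *cache, void *element)
{
	cache->nr_freed++;
	free(element);
}

/*
 * Models the guarded mempool_free_slab(): warn and skip when no cache is
 * attached. Returns 1 if the element was freed, 0 if it was skipped.
 */
int model_free_slab(void *element, void *pool_data)
{
	struct model_cache *cache = pool_data;

	if (!cache) {
		fprintf(stderr, "model: free with NULL cache, skipping\n");
		return 0;
	}
	model_cache_free(cache, element);
	return 1;
}

/*
 * Preallocate min_nr elements. Unlike the BUG_ON() in the header hunk,
 * the model accepts a NULL cache so the teardown path can be exercised.
 */
int model_pool_init(struct model_pool *pool, int min_nr,
		    struct model_cache *cache)
{
	assert(min_nr > 0);

	pool->elements = calloc(min_nr, sizeof(void *));
	if (!pool->elements)
		return -1;
	pool->curr_nr = 0;
	pool->pool_data = cache;
	pool->nr_warned = 0;

	while (pool->curr_nr < min_nr) {
		void *element = malloc(16);

		if (!element)
			return -1;	/* caller is expected to call model_pool_exit() */
		pool->elements[pool->curr_nr++] = element;
	}
	return 0;
}

/* Models mempool_exit(): hand every remaining element to the free callback. */
void model_pool_exit(struct model_pool *pool)
{
	while (pool->curr_nr > 0) {
		void *element = pool->elements[--pool->curr_nr];

		if (!model_free_slab(element, pool->pool_data)) {
			pool->nr_warned++;
			/* Model-only cleanup; the patch would leave the element unfreed. */
			free(element);
		}
	}
	free(pool->elements);
	pool->elements = NULL;
}
```

Initialising one pool with a valid model_cache and another with NULL, then calling model_pool_exit() on both, should bump nr_freed for the first and nr_warned for the second. As with the WARN_ON() in the patch, the guard only makes the bad path visible; whoever set up the pool without a cache still needs fixing.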