From patchwork Thu May 31 08:13:12 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 10440515
In-Reply-To: <5B0FA1CF.5070208@easystack.cn>
References: <1527564161-17328-1-git-send-email-dongsheng.yang@easystack.cn>
 <5B0FA1CF.5070208@easystack.cn>
From: Ilya Dryomov
Date: Thu, 31 May 2018 10:13:12 +0200
Subject: Re: [PATCH 1/2] rbd: don't queue watch delayed work when we are
 removing device
To: Dongsheng Yang
Cc: Jason Dillaman, Ceph Development
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: ceph-devel@vger.kernel.org

On Thu, May 31, 2018 at 9:18 AM, Dongsheng Yang wrote:
>
> On 05/31/2018 03:11 AM, Ilya Dryomov wrote:
>>
>> On Tue, May 29, 2018 at 5:22 AM, Dongsheng Yang
>> wrote:
>>>
>>> We will cancel all watch delayed work in
>>> cancel_delayed_work_sync(&rbd_dev->watch_dwork);
>>> If we queue delayed work after this, there will be a use-after-free
>>> problem:
>>>
>>> [ 549.932085] BUG: unable to handle kernel NULL pointer dereference at
>>> 0000000000000000
>>> [ 549.934134] PGD 0 P4D 0
>>> [ 549.935145] Oops: 0000 [#1] SMP PTI
>>> [ 549.936283] Modules linked in: rbd(OE) libceph(OE) tcp_diag udp_diag
>>> inet_diag unix_diag af_packet_diag netlink_diag dns_resolver ebtable_filter
>>> ebtables ip6table_filter ip6_tables iptable_filter sg cfg80211 rfkill
>>> snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul
>>> crc32_pclmul ghash_clmulni_intel snd_hda_core pcbc snd_hwdep snd_seq mbcache
>>> aesni_intel snd_seq_device jbd2 crypto_simd nfsd cryptd glue_helper snd_pcm
>>> snd_timer auth_rpcgss pcspkr snd virtio_balloon nfs_acl soundcore i2c_piix4
>>> lockd grace sunrpc ip_tables xfs libcrc32c virtio_console virtio_blk
>>> ata_generic pata_acpi 8139too qxl drm_kms_helper syscopyarea sysfillrect
>>> sysimgblt fb_sys_fops ttm drm ata_piix libata crc32c_intel virtio_pci 8139cp
>>> virtio_ring i2c_core mii virtio floppy serio_raw dm_mirror dm_region_hash
>>> [ 549.951835] dm_log dm_mod dax [last unloaded: libceph]
>>> [ 549.953490] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE
>>> 4.17.0-rc6+ #13
>>> [ 549.955502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>>> [ 549.957246] RIP: 0010:__queue_work+0x6a/0x3b0
>>> [ 549.958744] RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
>>> [ 549.960374] RAX: ffff9427deca8400 RBX: 0000000000000000 RCX:
>>> 0000000000000000
>>> [ 549.962297] RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI:
>>> 0000000000000000
>>> [ 549.964216] RBP: ffff942783e39e00 R08: ffff9427deca8400 R09:
>>> ffff9427df1c3f00
>>> [ 549.966136] R10: 0000000000000004 R11: 0000000000000005 R12:
>>> ffff9427cfb85970
>>> [ 549.968070] R13: 0000000000002000 R14: 000000000001eca0 R15:
>>> 0000000000000007
>>> [ 549.969999] FS: 0000000000000000(0000) GS:ffff9427df1c0000(0000)
>>> knlGS:0000000000000000
>>> [ 549.972069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 549.973775] CR2: 0000000000000000 CR3: 00000004c900a005 CR4:
>>> 00000000000206e0
>>> [ 549.975695] Call Trace:
>>> [ 549.976900]
>>> [ 549.978033] ? __queue_work+0x3b0/0x3b0
>>> [ 549.979442] call_timer_fn+0x2d/0x130
>>> [ 549.980824] run_timer_softirq+0x16e/0x430
>>> [ 549.982263] ? tick_sched_timer+0x37/0x70
>>> [ 549.983691] __do_softirq+0xd2/0x280
>>> [ 549.985035] irq_exit+0xd5/0xe0
>>> [ 549.986316] smp_apic_timer_interrupt+0x6c/0x130
>>> [ 549.987835] apic_timer_interrupt+0xf/0x20
>>>
>>> This patch forbid to queue watch_dwork when we are removing device.
>>>
>>> Signed-off-by: Dongsheng Yang
>>> ---
>>>  drivers/block/rbd.c | 10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 2b4e90d..d1d8f46 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -3475,9 +3475,13 @@ static void rbd_reregister_watch(struct
>>> work_struct *work)
>>>                         set_bit(RBD_DEV_FLAG_BLACKLISTED,
>>> &rbd_dev->flags);
>>>                         wake_requests(rbd_dev, true);
>>>                 } else {
>>> -                       queue_delayed_work(rbd_dev->task_wq,
>>> -                                          &rbd_dev->watch_dwork,
>>> -                                          RBD_RETRY_DELAY);
>>> +                       spin_lock_irq(&rbd_dev->lock);
>>> +                       if (!test_bit(RBD_DEV_FLAG_REMOVING,
>>> &rbd_dev->flags)) {
>>> +                               queue_delayed_work(rbd_dev->task_wq,
>>> +                                                  &rbd_dev->watch_dwork,
>>> +                                                  RBD_RETRY_DELAY);
>>> +                       }
>>> +                       spin_unlock_irq(&rbd_dev->lock);
>>>                 }
>>>                 mutex_unlock(&rbd_dev->watch_mutex);
>>>                 return;
>>
>> Hi Dongsheng,
>>
>> What made you think it is rbd (or ceph) related?  Do you know what gets
>> dereferenced?  Is it reproducible?
>
>
> Hi Ilya,
>     There is a simple reproduce script:
>
> ./vstart.sh -k -l --bluestore
> rbd map -o osd_request_timeout=10 test1
> time dd if=/dev/zero of=/dev/rbd0 bs=64K count=1000 oflag=direct &
> sleep 1
> ps -ef|grep -E "ceph-mon|ceph-osd"|gawk '{print "kill -9 "$2}'|bash
> rbd unmap -o force /dev/rbd0
>
> But maybe you need to run this script in a loop, because that's not 100%
> happen.

Ah, I see.  I think a more correct fix would be to move ->watch_dwork
cancellation:

This way rbd_reregister_watch() should either bail out early because
the watch is UNREGISTERED at that point or just get cancelled.

Thanks,

                Ilya
---
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 2b4e90d06822..23ae0df7a978 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3400,7 +3400,6 @@ static void cancel_tasks_sync(struct rbd_device *rbd_dev)
 {
 	dout("%s rbd_dev %p\n", __func__, rbd_dev);
 
-	cancel_delayed_work_sync(&rbd_dev->watch_dwork);
 	cancel_work_sync(&rbd_dev->acquired_lock_work);
 	cancel_work_sync(&rbd_dev->released_lock_work);
 	cancel_delayed_work_sync(&rbd_dev->lock_dwork);
@@ -3418,6 +3417,7 @@ static void rbd_unregister_watch(struct rbd_device *rbd_dev)
 	rbd_dev->watch_state = RBD_WATCH_STATE_UNREGISTERED;
 	mutex_unlock(&rbd_dev->watch_mutex);
 
+	cancel_delayed_work_sync(&rbd_dev->watch_dwork);
 	ceph_osdc_flush_notifies(&rbd_dev->rbd_client->client->osdc);
 }
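
For illustration only, here is a rough userspace sketch of the ordering
argument above: set the state to UNREGISTERED under the mutex first, then
cancel, so that a work item running late sees the new state and does not
re-arm itself.  This is not kernel code.  Every name in it (fake_dev,
fake_reregister_watch(), fake_unregister_watch(), the work_pending flag) is
a made-up stand-in, and it assumes the work function checks the watch state
under watch_mutex before requeueing.

```c
/*
 * Userspace analogy only, not kernel code.  All names are hypothetical
 * stand-ins for the rbd ones discussed in this thread.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum watch_state { WATCH_UNREGISTERED, WATCH_REGISTERED, WATCH_ERROR };

struct fake_dev {
	pthread_mutex_t watch_mutex;
	enum watch_state watch_state;
	bool work_pending;	/* models a queued watch_dwork */
};

static void fake_dev_init(struct fake_dev *dev, enum watch_state state)
{
	pthread_mutex_init(&dev->watch_mutex, NULL);
	dev->watch_state = state;
	dev->work_pending = false;
}

/* Models the work function: it re-arms itself only in the ERROR state. */
static void fake_reregister_watch(struct fake_dev *dev)
{
	pthread_mutex_lock(&dev->watch_mutex);
	dev->work_pending = false;		/* the work is running now */
	if (dev->watch_state == WATCH_ERROR)
		dev->work_pending = true;	/* pretend the retry failed: requeue */
	/* any other state: bail out early, nothing is re-armed */
	pthread_mutex_unlock(&dev->watch_mutex);
}

/* Models teardown with the cancellation placed after the state change. */
static void fake_unregister_watch(struct fake_dev *dev)
{
	pthread_mutex_lock(&dev->watch_mutex);
	dev->watch_state = WATCH_UNREGISTERED;
	pthread_mutex_unlock(&dev->watch_mutex);

	/*
	 * Stand-in for cancel_delayed_work_sync(): drop whatever is still
	 * queued.  Single-threaded sketch, so no locking is modelled here.
	 */
	dev->work_pending = false;
	assert(dev->watch_state == WATCH_UNREGISTERED);
}
```

In this model, a call to fake_reregister_watch() made after
fake_unregister_watch() leaves work_pending false, which is the "bail out
early" case.  With the cancellation done before the state change instead,
a run in between could still see WATCH_ERROR and requeue.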