From patchwork Fri Jun 1 09:27:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 10442833 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 3B3C5602BC for ; Fri, 1 Jun 2018 09:27:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3378128EC3 for ; Fri, 1 Jun 2018 09:27:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 27B7B28ED6; Fri, 1 Jun 2018 09:27:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C013C28EC3 for ; Fri, 1 Jun 2018 09:27:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750794AbeFAJ1c (ORCPT ); Fri, 1 Jun 2018 05:27:32 -0400 Received: from m6562.mail.qiye.163.com ([123.126.65.62]:25926 "EHLO m6562.mail.qiye.163.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750760AbeFAJ1a (ORCPT ); Fri, 1 Jun 2018 05:27:30 -0400 Received: from yds-pc.domain (unknown [218.94.118.90]) by smtp12 (Coremail) with SMTP id WtOowABXPPhsERFbiapFAA--.235S2; Fri, 01 Jun 2018 17:27:08 +0800 (CST) Subject: Re: [PATCH 1/2] rbd: don't queue watch delayed work when we are removing device To: Ilya Dryomov References: <1527564161-17328-1-git-send-email-dongsheng.yang@easystack.cn> <5B0FA1CF.5070208@easystack.cn> <5B0FB37B.9040608@easystack.cn> Cc: Jason Dillaman , Ceph Development From: Dongsheng Yang Message-ID: <5B11116C.3080004@easystack.cn> Date: Fri, 1 Jun 2018 17:27:08 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 In-Reply-To: X-CM-TRANSID: WtOowABXPPhsERFbiapFAA--.235S2 X-Coremail-Antispam: 1Uf129KBjvJXoW3Jw43KF4rZw1UKw15Cw4kZwb_yoW3tF1kpr 47JFWUKFW8Jr1jqw1jvwn8XF15t3WqkFyDWry0yw17Cr9Ygrn7Ar1IkFyUCF1UGry5Ar47 Jr45X3yava4UG3DanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x0pRsmRUUUUUU= X-Originating-IP: [218.94.118.90] X-CM-SenderInfo: 5grqw2pkhqwhp1dqwq5hdv52pwdfyhdfq/1tbieRFYelqrgIynzgAAse Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 05/31/2018 10:42 PM, Ilya Dryomov wrote: > On Thu, May 31, 2018 at 10:34 AM, Dongsheng Yang > wrote: >> >> On 05/31/2018 04:13 PM, Ilya Dryomov wrote: >>> On Thu, May 31, 2018 at 9:18 AM, Dongsheng Yang >>> wrote: >>>> On 05/31/2018 03:11 AM, Ilya Dryomov wrote: >>>>> On Tue, May 29, 2018 at 5:22 AM, Dongsheng Yang >>>>> wrote: >>>>>> We will cancel all watch delayed work in >>>>>> cancel_delayed_work_sync(&rbd_dev->watch_dwork); >>>>>> If we queue delayed work after this, there will be a use-after-free >>>>>> problem: >>>>>> >>>>>> [ 549.932085] BUG: unable to handle kernel NULL pointer dereference at >>>>>> 0000000000000000 >>>>>> [ 549.934134] PGD 0 P4D 0 >>>>>> [ 549.935145] Oops: 0000 [#1] SMP PTI >>>>>> [ 549.936283] Modules linked in: rbd(OE) libceph(OE) tcp_diag udp_diag >>>>>> inet_diag unix_diag af_packet_diag netlink_diag dns_resolver >>>>>> ebtable_filter >>>>>> ebtables ip6table_filter ip6_tables iptable_filter sg cfg80211 rfkill >>>>>> snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul >>>>>> crc32_pclmul ghash_clmulni_intel snd_hda_core pcbc snd_hwdep snd_seq >>>>>> mbcache >>>>>> aesni_intel snd_seq_device jbd2 crypto_simd nfsd cryptd glue_helper >>>>>> snd_pcm >>>>>> snd_timer auth_rpcgss pcspkr snd virtio_balloon nfs_acl soundcore >>>>>> i2c_piix4 >>>>>> lockd grace sunrpc ip_tables xfs libcrc32c virtio_console virtio_blk >>>>>> ata_generic pata_acpi 8139too qxl drm_kms_helper syscopyarea >>>>>> sysfillrect >>>>>> sysimgblt fb_sys_fops ttm drm ata_piix libata crc32c_intel virtio_pci >>>>>> 8139cp >>>>>> virtio_ring i2c_core mii virtio floppy serio_raw dm_mirror >>>>>> dm_region_hash >>>>>> [ 549.951835] dm_log dm_mod dax [last unloaded: libceph] >>>>>> [ 549.953490] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE >>>>>> 4.17.0-rc6+ #13 >>>>>> [ 549.955502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 >>>>>> [ 549.957246] RIP: 0010:__queue_work+0x6a/0x3b0 >>>>>> [ 549.958744] RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086 >>>>>> [ 549.960374] RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: >>>>>> 0000000000000000 >>>>>> [ 549.962297] RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: >>>>>> 0000000000000000 >>>>>> [ 549.964216] RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: >>>>>> ffff9427df1c3f00 >>>>>> [ 549.966136] R10: 0000000000000004 R11: 0000000000000005 R12: >>>>>> ffff9427cfb85970 >>>>>> [ 549.968070] R13: 0000000000002000 R14: 000000000001eca0 R15: >>>>>> 0000000000000007 >>>>>> [ 549.969999] FS: 0000000000000000(0000) GS:ffff9427df1c0000(0000) >>>>>> knlGS:0000000000000000 >>>>>> [ 549.972069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> [ 549.973775] CR2: 0000000000000000 CR3: 00000004c900a005 CR4: >>>>>> 00000000000206e0 >>>>>> [ 549.975695] Call Trace: >>>>>> [ 549.976900] >>>>>> [ 549.978033] ? __queue_work+0x3b0/0x3b0 >>>>>> [ 549.979442] call_timer_fn+0x2d/0x130 >>>>>> [ 549.980824] run_timer_softirq+0x16e/0x430 >>>>>> [ 549.982263] ? tick_sched_timer+0x37/0x70 >>>>>> [ 549.983691] __do_softirq+0xd2/0x280 >>>>>> [ 549.985035] irq_exit+0xd5/0xe0 >>>>>> [ 549.986316] smp_apic_timer_interrupt+0x6c/0x130 >>>>>> [ 549.987835] apic_timer_interrupt+0xf/0x20 >>>>>> >>>>>> This patch forbid to queue watch_dwork when we are removing device. >>>>>> >>>>>> Signed-off-by: Dongsheng Yang >>>>>> --- >>>>>> drivers/block/rbd.c | 10 +++++++--- >>>>>> 1 file changed, 7 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c >>>>>> index 2b4e90d..d1d8f46 100644 >>>>>> --- a/drivers/block/rbd.c >>>>>> +++ b/drivers/block/rbd.c >>>>>> @@ -3475,9 +3475,13 @@ static void rbd_reregister_watch(struct >>>>>> work_struct *work) >>>>>> set_bit(RBD_DEV_FLAG_BLACKLISTED, >>>>>> &rbd_dev->flags); >>>>>> wake_requests(rbd_dev, true); >>>>>> } else { >>>>>> - queue_delayed_work(rbd_dev->task_wq, >>>>>> - &rbd_dev->watch_dwork, >>>>>> - RBD_RETRY_DELAY); >>>>>> + spin_lock_irq(&rbd_dev->lock); >>>>>> + if (!test_bit(RBD_DEV_FLAG_REMOVING, >>>>>> &rbd_dev->flags)) { >>>>>> + queue_delayed_work(rbd_dev->task_wq, >>>>>> + >>>>>> &rbd_dev->watch_dwork, >>>>>> + RBD_RETRY_DELAY); >>>>>> + } >>>>>> + spin_unlock_irq(&rbd_dev->lock); >>>>>> } >>>>>> mutex_unlock(&rbd_dev->watch_mutex); >>>>>> return; >>>>> Hi Dongsheng, >>>>> >>>>> What made you think it is rbd (or ceph) related? Do you know what gets >>>>> dereferenced? Is it reproducible? >>>> >>>> Hi Ilya, >>>> There is a simple reproduce script: >>>> >>>> ./vstart.sh -k -l --bluestore >>>> rbd map -o osd_request_timeout=10 test1 >>>> time dd if=/dev/zero of=/dev/rbd0 bs=64K count=1000 oflag=direct & >>>> sleep 1 >>>> ps -ef|grep -E "ceph-mon|ceph-osd"|gawk '{print "kill -9 "$2}'|bash >>>> rbd unmap -o force /dev/rbd0 >>>> >>>> But maybe you need to run this script in a loop, because that's not 100% >>>> happen. >>> Ah, I see. I think a more correct fix would be to move ->watch_dwork >>> cancellation: >>> >>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c >>> index 2b4e90d06822..23ae0df7a978 100644 >>> --- a/drivers/block/rbd.c >>> +++ b/drivers/block/rbd.c >>> @@ -3400,7 +3400,6 @@ static void cancel_tasks_sync(struct rbd_device >>> *rbd_dev) >>> { >>> dout("%s rbd_dev %p\n", __func__, rbd_dev); >>> >>> - cancel_delayed_work_sync(&rbd_dev->watch_dwork); >>> cancel_work_sync(&rbd_dev->acquired_lock_work); >>> cancel_work_sync(&rbd_dev->released_lock_work); >>> cancel_delayed_work_sync(&rbd_dev->lock_dwork); >>> @@ -3418,6 +3417,7 @@ static void rbd_unregister_watch(struct >>> rbd_device *rbd_dev) >>> rbd_dev->watch_state = RBD_WATCH_STATE_UNREGISTERED; >>> mutex_unlock(&rbd_dev->watch_mutex); >>> >>> + cancel_delayed_work_sync(&rbd_dev->watch_dwork); >>> ceph_osdc_flush_notifies(&rbd_dev->rbd_client->client->osdc); >>> } >>> >>> This way rbd_reregister_watch() should either bail out early because >>> the watch is UNREGISTERED at that point or just get cancelled. >> >> Sounds good. what's your suggestion about the [2/2] about lock_dwork? > I don't think it's needed. In order for rbd_acquire_lock() to possibly > requeue itself, it must be queued after it is cancelled. > > Looking at the places where ->lock_dwork is queued: > > - rbd_wait_state_locked(), i.e. new I/O request -- shoudn't happen at > that point > > - rbd_reacquire_lock(), from rbd_reregister_watch() if LOCKED -- again > shouldn't happen at that point because it is preceded by rbd_unlock() > in rbd_dev_image_unlock() Hi Ilya, That's true. So get back to the fix for watch_dwork, maybe we can move the all cancel_tasks_sync() after unregister, to put the all taks canceling togather. Something like that: if (rbd_dev->watch_state == RBD_WATCH_STATE_REGISTERED) @@ -3427,6 +3426,7 @@ static void rbd_unregister_watch(struct rbd_device *rbd_dev) rbd_dev->watch_state = RBD_WATCH_STATE_UNREGISTERED; mutex_unlock(&rbd_dev->watch_mutex); + cancel_tasks_sync(rbd_dev); ceph_osdc_flush_notifies(&rbd_dev->rbd_client->client->osdc); } Thanx Yang > > If you are seeing a similar crash on ->lock_dwork, the bug is somewhere > else. > > Thanks, > > Ilya > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > --- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index c450016..d1a1e78 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -3419,7 +3419,6 @@ static void cancel_tasks_sync(struct rbd_device *rbd_dev) static void rbd_unregister_watch(struct rbd_device *rbd_dev) { WARN_ON(waitqueue_active(&rbd_dev->lock_waitq)); - cancel_tasks_sync(rbd_dev); mutex_lock(&rbd_dev->watch_mutex);