From patchwork Wed Dec 12 13:46:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10726401
To: linux-scsi
Cc: "Martin K.
 Petersen"
From: Jens Axboe
Subject: [PATCH] sd: use mempool for discard special page
Message-ID: <6da7da45-ad80-65df-f1f0-81a2d488459c@kernel.dk>
Date: Wed, 12 Dec 2018 06:46:55 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1
MIME-Version: 1.0
Content-Language: en-US
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

When boxes are run near (or to) OOM, we have a problem with the discard
page allocation in sd. If we fail allocating the special page, we return
busy, and it'll get retried. But since ordering is honored for dispatch
requests, we can keep retrying this same IO and failing. Behind that IO
could be requests that want to free memory, but they never get the
chance. This means you get repeated spews of traces like this:

[1201401.625972] Call Trace:
[1201401.631748] dump_stack+0x4d/0x65
[1201401.639445] warn_alloc+0xec/0x190
[1201401.647335] __alloc_pages_slowpath+0xe84/0xf30
[1201401.657722] ? get_page_from_freelist+0x11b/0xb10
[1201401.668475] ? __alloc_pages_slowpath+0x2e/0xf30
[1201401.679054] __alloc_pages_nodemask+0x1f9/0x210
[1201401.689424] alloc_pages_current+0x8c/0x110
[1201401.699025] sd_setup_write_same16_cmnd+0x51/0x150
[1201401.709987] sd_init_command+0x49c/0xb70
[1201401.719029] scsi_setup_cmnd+0x9c/0x160
[1201401.727877] scsi_queue_rq+0x4d9/0x610
[1201401.736535] blk_mq_dispatch_rq_list+0x19a/0x360
[1201401.747113] blk_mq_sched_dispatch_requests+0xff/0x190
[1201401.758844] __blk_mq_run_hw_queue+0x95/0xa0
[1201401.768653] blk_mq_run_work_fn+0x2c/0x30
[1201401.777886] process_one_work+0x14b/0x400
[1201401.787119] worker_thread+0x4b/0x470
[1201401.795586] kthread+0x110/0x150
[1201401.803089] ? rescuer_thread+0x320/0x320
[1201401.812322] ? kthread_park+0x90/0x90
[1201401.820787] ? do_syscall_64+0x53/0x150
[1201401.829635] ret_from_fork+0x29/0x40

Ensure that the discard page allocation has a mempool backing, so we
know we can make progress.

Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
---

We actually hit this in production, it's not a theoretical issue.

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4a6ed2fc8c71..a1a44f52e0e8 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -133,6 +133,7 @@ static DEFINE_MUTEX(sd_ref_mutex);
 
 static struct kmem_cache *sd_cdb_cache;
 static mempool_t *sd_cdb_pool;
+static mempool_t *sd_page_pool;
 
 static const char *sd_cache_types[] = {
 	"write through", "none", "write back",
@@ -759,9 +760,10 @@ static blk_status_t sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
 	unsigned int data_len = 24;
 	char *buf;
 
-	rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+	rq->special_vec.bv_page = mempool_alloc(sd_page_pool, GFP_ATOMIC);
 	if (!rq->special_vec.bv_page)
 		return BLK_STS_RESOURCE;
+	clear_highpage(rq->special_vec.bv_page);
 	rq->special_vec.bv_offset = 0;
 	rq->special_vec.bv_len = data_len;
 	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -793,9 +795,10 @@ static blk_status_t sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd,
 	u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
 	u32 data_len = sdp->sector_size;
 
-	rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+	rq->special_vec.bv_page = mempool_alloc(sd_page_pool, GFP_ATOMIC);
 	if (!rq->special_vec.bv_page)
 		return BLK_STS_RESOURCE;
+	clear_highpage(rq->special_vec.bv_page);
 	rq->special_vec.bv_offset = 0;
 	rq->special_vec.bv_len = data_len;
 	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -824,9 +827,10 @@ static blk_status_t sd_setup_write_same10_cmnd(struct scsi_cmnd *cmd,
 	u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
 	u32 data_len = sdp->sector_size;
 
-	rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+	rq->special_vec.bv_page = mempool_alloc(sd_page_pool, GFP_ATOMIC);
 	if (!rq->special_vec.bv_page)
 		return BLK_STS_RESOURCE;
+	clear_highpage(rq->special_vec.bv_page);
 	rq->special_vec.bv_offset = 0;
 	rq->special_vec.bv_len = data_len;
 	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -1277,7 +1281,7 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 	u8 *cmnd;
 
 	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
-		__free_page(rq->special_vec.bv_page);
+		mempool_free(rq->special_vec.bv_page, sd_page_pool);
 
 	if (SCpnt->cmnd != scsi_req(rq)->cmd) {
 		cmnd = SCpnt->cmnd;
@@ -3614,6 +3618,13 @@ static int __init init_sd(void)
 		goto err_out_cache;
 	}
 
+	sd_page_pool = mempool_create_page_pool(SD_MEMPOOL_SIZE, 0);
+	if (!sd_page_pool) {
+		printk(KERN_ERR "sd: can't init discard page pool\n");
+		err = -ENOMEM;
+		goto err_out_ppool;
+	}
+
 	err = scsi_register_driver(&sd_template.gendrv);
 	if (err)
 		goto err_out_driver;
@@ -3621,6 +3632,9 @@ static int __init init_sd(void)
 	return 0;
 
 err_out_driver:
+	mempool_destroy(sd_page_pool);
+
+err_out_ppool:
 	mempool_destroy(sd_cdb_pool);
 
 err_out_cache:
@@ -3647,6 +3661,7 @@ static void __exit exit_sd(void)
 
 	scsi_unregister_driver(&sd_template.gendrv);
 	mempool_destroy(sd_cdb_pool);
+	mempool_destroy(sd_page_pool);
 	kmem_cache_destroy(sd_cdb_cache);
 
 	class_unregister(&sd_disk_class);
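
For readers less familiar with why a mempool lets the discard path make
progress, the idea can be sketched in userspace C. This is only a toy model
under stated assumptions, not the kernel's mempool implementation: every
toy_* name is hypothetical, TOY_POOL_MIN merely stands in for
SD_MEMPOOL_SIZE, and memory pressure is simulated with a flag. What it
tries to show is the ordering the patch relies on: try the normal allocator
first, fall back to a preallocated reserve when that fails, and refill the
reserve on free before handing memory back to the system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_POOL_MIN  2     /* hypothetical reserve size, stands in for SD_MEMPOOL_SIZE */
#define TOY_ELEM_SIZE 4096  /* stand-in for one page */

struct toy_pool {
	void *reserve[TOY_POOL_MIN]; /* preallocated elements */
	size_t nr_reserved;          /* how many reserve slots are filled */
	bool simulate_oom;           /* when true, the primary allocator "fails" */
};

/* Primary allocator: plain malloc(), unless we pretend to be out of memory. */
static void *toy_primary_alloc(struct toy_pool *pool)
{
	if (pool->simulate_oom)
		return NULL;
	return malloc(TOY_ELEM_SIZE);
}

/* Release whatever is still sitting in the reserve. */
static void toy_pool_destroy(struct toy_pool *pool)
{
	while (pool->nr_reserved > 0)
		free(pool->reserve[--pool->nr_reserved]);
}

/* Fill the reserve up front, while memory is still available. */
static bool toy_pool_init(struct toy_pool *pool)
{
	pool->nr_reserved = 0;
	pool->simulate_oom = false;
	while (pool->nr_reserved < TOY_POOL_MIN) {
		void *elem = malloc(TOY_ELEM_SIZE);

		if (!elem) {
			toy_pool_destroy(pool);
			return false;
		}
		pool->reserve[pool->nr_reserved++] = elem;
	}
	return true;
}

/*
 * Try the primary allocator first; dip into the reserve only if it fails.
 * NULL is still possible once the reserve is empty, so the caller keeps
 * its retry path (BLK_STS_RESOURCE in the patch).
 */
static void *toy_pool_alloc(struct toy_pool *pool)
{
	void *elem = toy_primary_alloc(pool);

	if (elem)
		return elem;
	if (pool->nr_reserved > 0)
		return pool->reserve[--pool->nr_reserved];
	return NULL;
}

/* Refill the reserve before returning memory to the system. */
static void toy_pool_free(struct toy_pool *pool, void *elem)
{
	if (pool->nr_reserved < TOY_POOL_MIN)
		pool->reserve[pool->nr_reserved++] = elem;
	else
		free(elem);
}
```

With simulate_oom set, the first TOY_POOL_MIN allocations still succeed from
the reserve, and each free makes one more allocation possible. That matches
the situation in the patch: a completed discard returns its page through
sd_uninit_command(), so the retried request at the head of the queue can
eventually get one. Elements handed out from the reserve are recycled and
not zeroed, which is presumably why the patch adds clear_highpage() after
each mempool_alloc() in place of the old __GFP_ZERO.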